[This Transcript is Unedited]
DEPARTMENT OF HEALTH AND HUMAN SERVICES
NATIONAL COMMITTEE ON VITAL AND HEALTH STATISTICS
Working Group on HHS Data Access and Use
June 22, 2012
Doubletree Hilton Hotel
8727 Colesville Road
Silver Spring, MD 20910
Proceedings by:
CASET Associates, Ltd.
Fairfax, Virginia 22030
caset@caset.net
CONTENTS
- Working Group on Data Access and Use
P R O C E E D I N G S
Agenda Item: Working Group on Data Access and Use
DR. CARR: Welcome. The Working Group on HHS Data Access and Use is about to begin. This is the absolutely long awaited, delightful, wonderful day, because we have been looking forward to this for a long time. We will begin with introductions. The convention for the committee members is to say who you are, where you are from, and whether you have any conflicts about what is being discussed. Although I have to say, I think it is almost an impossibility; if you are eligible for the committee, you are probably very conflicted, in any event, to disclose. For the Work Group members, is that the expectation as well?
MS. GREENBERG: The Work Group members are not special government employees. Just in the interest of transparency and sharing, I think each of them could tell us a little more than what is typical for this first meeting.
DR. CARR: That is right. Maybe a paragraph.
MS. GREENBERG: Once we convene the group I think everybody is the same. We are asking, I think, Debbie, we are going to ask for the bio sketches also, so we will be following up with you on that on everybody, so you can have those.
DR. CARR: I will begin. Again, I am Justine Carr. I am the chair of NCVHS. I am Chief Medical Officer for Steward Health Care System, which is an integrated care delivery system of 10 hospitals and 2,000 physicians in Eastern Mass. We are very much immersed in the new world of accountable care because we were one of the original 30 pioneer projects. With that I will turn to my left.
MR. SCANLON: Good afternoon everyone. I am Jim Scanlon. I am with HHS. I am the Deputy Assistant Secretary for Planning and Evaluation at HHS. I am the staff director for the full committee here too.
Our office at HHS has been the think tank for the Secretary, so we get to try things that the programs would be afraid to do, or would get in trouble for if the public even thought they were getting involved. And we have often, in the past, been asked to incubate new concepts and ideas. We staff Todd Park, for example, on a number of issues there. We also manage, within HHS, a similar group, the HHS Data Council, which from an internal point of view discusses our data plans and strategies and so on. Thanks again for agreeing to serve, and welcome.
DR. COHEN: I am Bruce Cohen from the Massachusetts Department of Public Health. I am Director of Research in Epidemiology and the Bureau of Health Statistics. I am on the board of NAPHSIS, the National Association for Public Health Statistics and Information Systems. I have an abiding passion as a resident data geek. I have been chair of our IRB and Research Data Access Committee at DPH. I have been involved with community data and a lot of community needs assessment. I do lots of community data training. I have been developing web-based data query systems for the State of Massachusetts for 20 years.
I was at NCHS before I moved up the food chain to the state. Maybe I should strike that from the record. But I really am passionate about providing data for community and policy decision making. And I am really excited about this opportunity.
DR. KAUSHAL: I am Mohit Kaushal. I am excited to be here. ER MD guy, by way of background, faculty of GW still. Then to business school. Joined the dark side as a venture capitalist for a number of years, and then ended up coming to the administration. Built the first health care team in the Federal Communications Commission. Started all the regulatory processes with the FDA around networks and really thinking about what is the prerequisite connectivity we need to enable all the data-driven services that we see in other industries.
And then I left there about 18 months ago to join the West Health Initiative. I think the best way of thinking of us is we are a not-for-profit with a pretty unique financing structure, a husband and wife giving away their legacy to this aim, which is reducing the cost of health care. We have a policy center in Washington, DC where we do general convening. As an example, we held the CMS Innovation Center’s first event earlier this year.
We have a fellows program, Peter Newman from Tufts is our first fellow, trying to highlight where regulation of reimbursement can be moved to save billions of dollars. We are hosting with Greg Downey, who may be here later on, the first data innovator in residence with the CTO’s department there. Essentially to be a real conduit between the data that government is releasing and all the entrepreneurs and private sector out there who are crying for this raw data to turn into something meaningful.
The other part of our organization: we run a philanthropic venture fund, so we invest, we don’t take board seats, in companies we hope to accelerate to fulfill our mission. And then in the institute we have about 20 engineers. We actually innovate from the ground up on things that we don’t think organizations would do. Interoperability is a key focus of ours, as an example. We think you need the data liquidity, again, to enable the services and the systems that we want to perpetuate. Again, very excited to be here. Thank you.
DR. CRAVER: Good afternoon. I am Jim Craver. I am not Ed Sondik. He has asked me to stand in for him. I am the co-acting director of the Office of Analysis and Epidemiology at the National Center for Health Statistics, co-acting with Jennifer Madans, that I believe some of you know.
Out of my area we have the Health Indicators Warehouse. We take very seriously the fact that NCHS can be a wholesale shop to make data open and to liberate it to maybe be on the frontlines of some of that liberation. The technical folks in my area are perhaps more knowledgeable and aware of some of the risks involved in that, but also of the potential. I am very happy to be here in Dr. Sondik’s stead and hope to learn a lot and bring a lot to the table. Thank you.
MS. QUEEN: I am Susan Queen, the director of the division of data policy in Jim’s office at ASPE. And formerly from NCHS, many years ago, working on the NHANES survey for a number of years. My background is survey statistician. At HRSA, I learned kind of a bigger picture of the department’s data systems, public health, briefly at SAMHSA, in their office – well it was the Office of Applied Studies, now it is CBHSQ, the Center for Behavioral Health Statistics and Quality, and then here at ASPE and have a lot of interest in data liberation, particularly looking at issues of the mosaic effect, privacy protections, et cetera. I look forward to working with you all.
MS. KLOSS: I am Linda Kloss, a member of the full committee and I am a health information management professional, and currently a consultant on health information management policy and strategy. I look forward to hearing this first launch meeting. Thank you.
DR. CROWLEY: Hi. Good afternoon. I am Kenyon Crowley. I am Director of Health Innovation at the University of Maryland’s Center of Excellence in Health IT Research, as well as helping lead programs in our Center for Health Information and Decision Systems. Our research is principally focused on how we adopt information systems in health care, looking at strategic, policy, and technical considerations. Then we work across the University of Maryland system, both clinical institutes and our computer scientists and others throughout the system, to understand how we can make the best use of information systems in health care. I am delighted to be here.
Something of interest to the committee, we are working with the military health system right now to free up their data assets for the scientific community. We have been working through a number of those issues and hope to collaborate with HHS in different ways to do that.
Additional background, some of our key research areas include consumer informatics, big data analytics. We have a social media and health care group and then we do a lot of work on care transitions and health information exchange. And we are running some programs for the health benefits exchange as well. Before Maryland, I was in private industry with Johnson and Johnson, I worked with their health IT strategy group. Just again, delighted to be here. Thanks.
DR. VAUGHAN: Good afternoon. My name is Dr. Leah Vaughan. I am a physician and epidemiologist based in the San Francisco Bay area. I am the Director of the Health Policy Group, which is an umbrella organization in the social enterprise space, managing a series of projects. My particular expertise is in geographic information systems and geospatial/geolocation data.
Early and long-time advocate in open data and open access initiatives, both in California and nationally. I manage the social media for an international humanitarian community, and I am also the co-founder of Medan(?) Start Up in the Bay area.
DR. MAYS: My name is Vickie Mays. I am at UCLA. I am a professor of psychology and also a professor of health services in the School of Public Health. Probably the most relevant in terms of this particular group, is that I just received five years of funding for my minority health disparities where it is all being driven by the use of technology. I had my first experience in terms of this past quarter, I was teaching engineers and computer science students with the director of Wireless Institute, for the development of potential apps and other things for things in disparities. I am going to continue to co-teach with him and will be teaching also with the nano-tech center, in terms of developing things in the area of how specific to health disparities.
MR. DAVENHALL: My name is Bill Davenhall, and I am a Global Marketing Manager for a private company called ESRI, which is the world’s largest software developer for geographical information systems. Most of my time is spent outside of the United States right now. My background includes a degree in medical behavioral science, University of Kentucky, and prior to that I was at a hospital association of Kentucky in hospital operations for many years. I am much older than I look. I have to tell you that about 80 percent of our revenue is derived from state, local, and federal governments from around the world in providing this geographic digitalization software.
Our division in health and human services covers public health, hospitals, social services, managed care. I used to tell people everything but forest health – you know like tree health and everything, slowly I have come to realize that is part of the ecosystem. I guess I am in charge of that also.
DR. ROSENTHAL: Joshua Rosenthal, co-founder of a little health care analytics start up on pay for performance called RowdMap and background is PhD Fulbright Center for Advanced Study(?). And we have had multiple health care analytics start ups and success with them all – whether it was old school kind of Dartmouth Atlas variation to different sorts of things and peer play(?) technology as well as quantifying social media and turning it into something meaningful in the health care space. Conversations, served CMS, HHS in different ways around. I also work with some of the folks at MIT and Harvard on the startups coming out of that.
I had a number of conversations with Todd and company about how do you take all this public data and do something meaningful with it in terms of have an astronomical failure rate in the health care 2.0 startup space, compared to other verticals. And also you might say our legacy folks haven’t adopted in the ways we might have hoped. I’ve had numerous conversations around that and the most recent turned into an educational piece at HDI, which Todd was graciously and kindly referencing. How do you take that and turn it meaningful? I am very glad to be here and it is very much in line for a number of conversations.
DR. SUAREZ: Good afternoon everyone. My name is Walter Suarez. I am the director of Health IT Strategy and Policy for Kaiser Permanente. I have actually been in the standards world for many years, going back to before HIPAA even started, when I was working in the State of Minnesota. The State of Minnesota actually passed a law in 1994 that established the first round of state-mandated standards similar to HIPAA, transactions and identifiers. We like to think that the federal government thought it was a really good idea, and they cut and pasted a lot of it and put it into something called the Health Insurance Portability and Accountability Act in 1996. I have worked in this space for almost 20 years now.
At Kaiser, basically I am responsible for helping give shape to our enterprise-wide health IT, HIE, and now meaningful use strategy and implementation. As it happens, we are in ten different states and the District of Columbia. We have to address issues and applications that are unique in different states, also for some things related to health IT and HIEs.
I am also a member of the National Committee. I co-chair the Standards Subcommittee of the NCVHS. I am also a member of the Board of Directors of the Public Health Data Standards Consortium. I like to think that public health is actually my second passion. As part of the work with the consortium, I have helped create something called the Joint Public Health Informatics Taskforce, which is an organization that brings together basically the largest public health associations in the country. NAPHSIS is a member, and NAHDO and ASTHO and CSTE and others.
I am also a member of the Health IT Standards Committee, co-chair of the Standards Committee’s Privacy and Security Workgroup, and serve as liaison between NCVHS and the Standards Committee. I am based here on the East Coast, which makes it a lot easier for me to come to all these meetings. Thank you. And I am very excited to be part of this group. I am looking forward to it.
MS. GREENBERG: Good afternoon and welcome. I am sorry I am sort of being blocked by my computer, but I was so excited with these refreshments that I spilled them all over my laptop. Despite that, I am the Executive Secretary to the committee. I am Marjorie Greenberg. I have been at the National Center for Health Statistics since 1982. I have been the Executive Secretary of this committee since about 1998, but have worked the entire time I have been at the national center, and actually was sort of a groupie of the committee before that. I got my start in health data at the Commission on Professional and Hospital Activities in Ann Arbor, Michigan. Some of you may have known about CPHA.
My responsibilities at NCHS, in addition to this committee, are for international classifications, specifically the clinical modifications of ICD-9 and ICD-10. When I told someone that recently, his response was, I am sorry. Actually I am proud of it and optimistic that we will move along as has been proposed.
In that capacity, I am head of a World Health Organization collaborating center for the family of international classifications, which is principally the ICD and then a functional status or disability classification, the International Classification of Functioning, Disability and Health. That is one of my passions as well.
Like Walter, he and I were founding members of the Public Health Data Standards Consortium, and very much involved in public health standards and informatics and I have a small informatics group as well. That keeps us and my little staff pretty busy.
We are delighted to be supporting this new Working Group in any way we can. I think you have met my colleagues in crime here, but the executive secretary, my wonderful staff, Debbie Jackson, the actress in the group, and Nicole Cooper who will help in many ways, et cetera. I think you met Susan Kanaan who is a writer. She is not officially on my staff, but I consider her part of us. Hetty Kahn who is an informatics specialist on my staff and so works with this committee. Marietta Squire, who I am sure you have heard from and you will be hearing from her and Nicole quite a lot. And if you do have problems or issues with any of the logistics or any questions, please let us know, and feel free to contact them. And Katherine Jones, who is the team leader for the NCVHS team on my staff.
We will try to make your service on this Working Group as convenient as possible, within the constraints, of course, of working with the Federal Advisory Committee Act and with the federal government. We will try to be as user friendly as possible and as accommodating as possible, because we know you are all volunteers, and that includes everybody on the committee really. We deeply appreciate your willingness to provide advice to the Department.
DR. TANG: I am Paul Tang, Palo Alto Medical Foundation. I am an internist and chief innovation and technology officer for our group. I have done a lot of medical informatics for the past 30 years – I may be older than Bill. Spent a fair amount of time in the past ten years in health information and health information technology policy in Washington.
DR. CARR: And now we have folks on the phone. Patrick Remington.
DR. REMINGTON: I am Patrick Remington. I am a professor and associate dean for public health at the University of Wisconsin in Madison. I have been a physician epidemiologist for almost 30 years now, beginning at the CDC working on the Behavioral Risk Factor Surveillance System. About 10 years at the State Health Department, working as the chronic disease program chief medical officer, and then 15 years here at the University of Wisconsin.
For a decade we worked on a way to engage our counties in improving population health by ranking the health of counties. We brought that approach to the Robert Wood Johnson Foundation now three years ago, and do that annually for all our counties in all 50 states.
I am very interested in not only the accessibility of data, but its use, and in particular communicating it to policy makers, communities, the media, and the general public. I am very excited to be part of this workgroup and I appreciate being invited.
DR. CARR: Is Leslie on the call? Guess not.
(Introductions around the room.)
Agenda Item: Discuss Work Expectations and Schedule
DR. CARR: We really have had a lot of exciting discussion today. I mentioned that Todd was here earlier doing his Todd thing and getting everybody even more excited. I think what has also been exciting is that, although we haven’t quite formally introduced the new Work Group members to the full committee, those who were able to be here and listen in got a little flavor of what we do, and really the kind of crossroads we are at right now: managing old systems, thinking about new systems, and trying to think about how we can point to convergence.
But I think today is going to be a good orientation for you on hearing from Jim and Greg or Niles, and then maybe have a chance for some discussion about next steps.
MR. SCANLON: Well, again, thank you all for agreeing to serve. When the Secretary and Todd asked us to look at how we can provide HHS, on a systematic basis, with the best advice and assistance in the area of getting our data out where it can be used in innovative and useful ways, we thought about this committee and it was a natural fit. We thought what we needed to do was to augment it through a working group that could focus on this issue specifically, through the various expertise you see around the table.
And I am happy to say just about everybody who we asked, and all of you highly recommended, just about everyone accepted and we are very pleased to have you today.
Generally your role will be to assist and advise HHS. And it is really a fairly wide open role, given the data that we have or are developing. We can go through kind of a briefing process. Some of you know very much about it, some of you know very much about specific parts, but we will try to bring you up to date, briefing wise, on pretty much all of our major data systems that are intended for public use, at least for research, analytic, and public health use, obviously not the internal personnel systems. Some of them are meant to be used by researchers, analysts, public health, quality measures and so on.
Some of them really have other purposes, as you know, but we can re-purpose the data or use the data really to promote public health and health care more generally. And some of the data are quantitative in nature. Other data, as you know, are really more location based, like where our community health centers are located, or where our maternal and child health grantees are located, or where quit lines are – the numbers of the various cigarette quit lines. Some of it is quantitative data. Others are more individual oriented or location based. It is really for individuals to get help or to find out more.
Again, we are hopeful that you can bring us your perspectives to look at what data do we have exactly. How do we disseminate it now? And there are a variety of means, as you know. We will talk a little bit about that today.
And then are there ways that we can make better use — you can recommend that HHS, if we could do something differently, if we can provide data somewhat differently or even tweaks here and there or some new data. Is there a way to make it available for the various communities that we would like to see here?
In many ways the research community, I think, are sort of aficionados; they sort of know what data we have. They know exactly where to look and they have well established relationships, but clearly there is room for improvement there. And the public health community, for the most part, and the statistical community sort of know what we have exactly. They know more acronyms. They know it exactly. We sort of accept that the way it is now is the way it has always been and will be, and it doesn’t have to be that way.
I think we are really seeing now that we are getting increasingly more demand for the data for all those purposes, for health, public health, patient decision making, provider quality measures, community decision making, community health indicators.
For all of those various communities, I think everyone believes we could do much better. Just starting with the data we have if it can be packaged and made available in ways. And, again, we are always — the limiting factor in many ways as you know, has been in many cases the privacy and confidentiality. Can we do this in a way as we go to increasingly more local levels, but we still are able to protect privacy and confidentiality? Again, that is something you have all worked with.
We have probably pushed within HHS the data, at least as far as people believe in terms of geographic level data, we are always trying to get it lower. Typically it is national data and in some cases it is state data.
In the case of CMS, where we have all the administrative claims for Medicare, we literally have a universe of that whole set of data, and Niles and others are trying to put that in a meaningful way.
So let me talk a little bit about sort of what I envision the process to be and a little bit about how we disseminate data now. Then I really would like to open it up just to see what would be most useful to help the committee.
In terms of the data we make available now, we have tried to push most of the data in machine readable form, or datasets and tools, to one website, healthdata.gov. We gave you, in your packet, just an introduction to the new platform that we have changed to for healthdata.gov. It is hopefully more user friendly. It is working now. But our policy basically is to move the data to that website so that people can get it without a lot of fuss, without having to come to us, without having to go into our research data center, wherever we can make it available by putting it there. That is one part of our strategy and we will try to keep moving that forward.
We have, I think, probably close to 300 tools and datasets that we have already posted since 2009. We have a process whereby we are posting. We are constantly looking for more data to post and make it available, as well.
One of the other areas as Jim mentioned, is the Health Indicators Warehouse. Now the data here is more indicator type data. About 1100 or more indicators, largely county level I think, but could be other levels as well. Again, we are pushing trying to get indicator type data there as well.
Our agencies have public use data files. I think many of you are familiar with this. Though I have to say with privacy and now that we have experts who make a living trying to re-identify some of our participants and patients, we probably have a little less of the public use data. But basically it would be our survey data where it is this probability sample of the population.
We publish it in a way where all the identifiers are taken away and the geographic level is limited so that you couldn’t — it would be hard to piece together the re-identification. We used to have a number of public use data files available that way. We still do for our major surveys. And all you have to do is you go to an agency. You sign an agreement that you won’t re-identify, try to re-identify, and use only for statistical purposes, and basically it is yours.
But I have to say these are all over the place. You would have to be an aficionado to know where to get it. We could clearly do a better job there, pulling it together in one source. And it is not exactly what we can publish so far on healthdata.gov. You need some sort of an agreement. But we will ask you to look at that as well.
Then, as we move along from public use data to data with a data use agreement with the agency, we have other data that is really restricted access. It can’t be made available publicly. Even if all identifiers have been deleted, as you know, there is still enough information, even on claims data, that an individual who really knows the area and knows the dates of procedures and so on could piece it together; it just creates the conditions whereby the potential for identification exists. Our agencies have developed very sophisticated means to make that available.
Generally it is made available through a research data assistance center, where the researcher, or whoever the data user is, would go and request the data, Medicare data, Medicaid data and other data, and then use it for their purposes. You would agree to a data use agreement that you weren’t going to disclose it or sell it or, obviously, contact the people who are part of that.
And then there are other ways where really the data is so protected that you have to go — you literally send your request into the agency and ask for — you tell them what your analysis is. These are the research quantitative approaches and then they will run it for you. NCHS has a center like that, AHRQ, and there are a couple others like that. Even there I think we could do a better job of making them more available.
We have everything from just here it is, we collected it, there is no identifier here, you can have it all, and it is on healthdata.gov. To more and more restricted use, and largely because of confidentiality promises to respondents. Again, there is a scale of how it is made available. A lot of programmatic data, survey data, public health data. Many of our programs as I said, collect data as part of just managing the program, but they produce very interesting and useful data as a byproduct and virtually everyone now is at the point where we get data that can be used to help identify the program and help identify areas of need and so on, and the data can be made available, as well. We will take you through these various types – probably agency by agency.
We have in your packet just an abbreviated guide to HHS surveys and data. That is probably the two dozen most frequently requested data sources. But we have a much more detailed data inventory under development, which is being updated, where a lot of other data, including program reporting data, is available.
Again, we will be asking you to think about from the communities you represent, when you see that data, how might it be used, at what level, for what uses, and how best can we make it available.
And, again, we have in HHS, as I said, really probably world class researchers, epidemiologists, statisticians, statistical folks. We know how to collect the data and get the data. Get it to a certain point. But I think we are not experts at the applications that you know about, to take it from that point. We could do it as a hobby, which we have kind of been doing, and now we are taking it more seriously and we have a concerted effort to do this, thanks to Todd and others, to make this available. But we really need your help, in that you make a living doing this. You know what is needed better than we do. You could probably look at what we have and say, gee, if only that were made available at a county level, or if that were available as an indicator, or if you mashed up your data with EPA or with Transportation or with HUD, which we can do, how much more useful it might be. Generally that is the stage.
The process, I think, can be whatever you think would be useful. But we would probably start by briefing you on the various data systems. For some of you it will be old information; for others I think it would be fairly new information. But we will take you through those. And we will give you more of an in-depth view of healthdata.gov and the Health Indicators Warehouse and the other dissemination approaches that we have now.
And then we will be looking for your guidance, even ideas about ways to protect identity that you have found to work that we haven’t tried or are not aware of. So I think this is generally the way we will be heading.
We will try to do a lot of our work via telephone calls and via sharing information over the web. I think in many cases, we will probably ask you to react to some of our thoughts and plans before we make a big commitment, just to see if our thinking is in the right direction. And then in other cases, we will ask you to sort of give us ideas in a direction rather than reacting to specific resources, and so on. Really our charge here is fairly open.
I think many of you who were here for the earlier parts of the meeting saw how much of what was discussed there produces the data, hopefully in a standardized form, with high-quality content, that can be used for many of the purposes. Many of you will remember that the Medicare data, administrative data, that are so prized and useful now, took at least 10 years to get from the administrative stage to the stage where they were uniform and high quality and so on.
The vital statistics system, which is probably 130 years old, which started out as — that is an administrative record system. I don’t know, Bruce, when we actually got it where we could start counting and analyzing. It takes a while. One of our former chairs said that it was Florence Nightingale who first started this work – analyzing – to understand patterns and use and so on.
We are really, I think, now at a stage where the technology, where several – I am going into the Todd talk now – we are seeing forces converge that can really make a big difference, and more than just a next step, but possibly some leapfrogging-type possibilities. And technology can clearly support us here; the web technologies can clearly do this in the way of getting the data out. I remember at one meeting, Bill, you were describing the rainbow series at NCHS. This started probably in the 1960s at NCHS, where each of their surveys would publish the results in a colored report. I hate to admit that I co-authored some of those myself. We are now at the point where we could do those as well, but really there are a lot of other ways to get that kind of data out, and that is fairly statistics research journal kind of information.
Let me stop there and see if folks have questions or have ideas about how to proceed. We want to be efficient. I don’t think we want to do things just to have rules or procedures. We would like to start perhaps with some — after you get your feet a little wet, maybe with some low-hanging fruit that we could — because we are constantly trying to develop these areas anyway.
And then the other in terms of how we can work, certainly your own expertise, the information we present to you, you can hold a meeting or a hearing where you can call in various communities that you want to hear from. You may want to hear from the venture capital community. You may want to hear from the patient community. You may want to hear from the consumer community, from the local public health community, from families. You can imagine a number of these. There are various ways to collect information, but hearings are one way as well. Virtually any part of HHS we can brief you on in terms of what we are planning.
DR. CARR: I was just going to say not everyone was here this morning when Todd had a couple of asks, so I think I will, on his behalf, share some of the asks. There were three basically. One is health data supply improvements – so that is following on what you were saying, Jim. Can we get from a one-year lag to a one-month lag? And you mentioned also about the Niles and new Office of Information Products and Data Analysis. I think it was something around mind-blowing pace – need input from the Work Group on where to go beyond that. Looking at the Work Group as sort of a reactor panel.
MR. SCANLON: Set the standard high.
DR. CARR: Mind-blowing pace – write that down. The second ask: how to bridge the gap from ideation to beta testing, especially in the live environment. We talked a bit about what Paul et al are doing at the Palo Alto Medical Foundation, where they are creating an incubator. We have created a challenge and the winner gets to work with doctors of Palo Alto Medical Foundation. I didn’t get all the notes on here. But basically – I don’t know if you want to say a word about it. But you get the idea.
And I think it kind of runs through even to the work of the committee today, because we have been talking about the big picture policy things and what we have been learning is that when we have an idea — what we heard at our hearing was that things don’t always work that well at the local level. But the idea, and getting it to local environment needs work. I think that it is a great point that he made that it is really we are getting the outcome we are looking for. We are going to talk about the process, but if you don’t get to the outcome we have wasted our time.
And then his third point was just about health data analytics: massively hot, quote, shortage relative to demand, grad programs, how to scale, acquisition of expertise, open data initiatives, launching copies and education and safety online universities, et cetera. Data analytics is the number one bottleneck. I am not quite sure how to crosswalk that. But I think the two things are being a reactor panel to things that are afoot, and then really thinking about how we can get a granular idea into beta testing.
I think the other thing – you made this point yesterday – is that there are going to be sort of two directions in this committee. When it is a reactor panel, it is HHS asking for a response from the group. As we touch on things on policy, privacy, things like that, we want to have the deliberation of the Work Group and then have it come up through the committee in the customary way of vetting that and marrying it up with work previously done.
Greg, did you want to add anything?
MR. DOWNING: I think many of you are already sort of well immersed in the culture change around these uses of data and the directions they are going, and have been involved in a good number of the events. If you were here for Todd, I think he gave a perspective of the trajectory. The applications and services that are emerging from the uses of these data are pretty staggering actually over the last three years, as we have been tracking them, from the uses of consumer health data, from EHRs, and from a lot of the resources that have been emerging from HHS.
I think the one thing we have noticed is that the degree of collaboration across agencies has been a pretty significant thing and that the infrastructure and capabilities of really tapping into the know how within the organization, is one of our prime goals is that not only just publishing data, but key understandings about them.
I am sure he mentioned this morning about the digital services strategy, but this will have a transformative effect within a lot of the deliverables that are published, from NLM to surveys and to content on the web. The technological platforms that are emerging are really quite exciting in terms of improving the liquidity of not just data but also content, images, concepts, tables, what have you. There is a substantial amount of work going on across the Department on adapting common principles to improve the uses of data overall. We have lots of websites. That in itself has been a challenge, but the ability to tag information and develop a new capability for serving it up, and machine-to-machine interactions, are going to be a rapidly emerging area.
Jim, did he mention our new CTO this morning?
MR. SCANLON: Yes.
MR. DOWNING: We have both the chief information officer and a new chief technology officer, who are well known for their innovations in the government space as well as in the private sector. The Secretary and Deputy Secretary have been very supportive and committed to continuing their own interest in this. They are frequently asking to hear about new examples. They are very much aware of this subcommittee and they are interested in what ideas you bring to the table. We are grateful for your service and to the NCVHS for being able to sponsor this subcommittee.
We have a number of new staff within HHS’ offices to support information needs and content that you are all looking for in examples. I know Todd expects that all of you are going to be messengers and active participants in the data future. If you really get good at it, he will make you a ninja, but that is pretty hard.
Anyway I just want to thank you. It has been a really important journey for all of the Department. And I think many of you have made enormous contributions that these are now becoming well recognized not only in health and health care, but beyond that, and the whole aspect of knowledge economies within health care that the value of information is definitely present in terms of the missions of improving access and quality and the efficiency of health care systems overall. I just want to say thank you and we are here to help.
DR. CARR: I want to add that Kalahn Taylor-Clark has joined us. Welcome. What we did was we went around and just said a few words about where you are from and the work that you have been doing.
DR. TAYLOR-CLARK: I apologize for my tardiness. I was actually at another board meeting. Today has been the day. And then I got stuck in Silver Spring traffic. My name is Kalahn Taylor-Clark and I am the director of health policy at the National Partnership for Women and Families, where I lead a large portfolio of issues of health policies ranging from health information technology to disparities and access in quality of care, to a lot of the work that the innovation center is putting out with the new programs around comprehensive primary care, partnership for patients, ACOs and the like.
A lot of what I have been focused on at the National Partnership is around patient engagement, very specifically, how do we define that. How do we incorporate that concept into the work of the grantees? That is where a lot of my work was.
I come to you in a different capacity of course, with the health information technology background. I see Bruce Cohen over there, which makes me excited because I worked with him when I was the director of research at the Brookings Institution at the Engelberg Center for Health Care Reform, where I led their health equity portfolio and their patient-centeredness portfolio. That work was really around trying to get race, ethnicity, and language data. Now we are actually extending that to sexual orientation and gender identity and some of these other data elements into HHS systems. I am very excited to be here. I look forward to working with everyone.
DR. CARR: We have painted quite a broad landscape, and I think on everybody’s mind is what is the next thing. What is going to happen? What will be the next thing? Perhaps it might be worth just going around the room to hear from folks, what are the intersections that you see and then begin to craft a strategy.
MR. SCANLON: Or if you have questions or suggestions. Does everyone have a sense though of the direction we are trying to go in?
DR. KAUSHAL: I think it would be very useful to drive focus if there is a portfolio of problems that you want help with. We should get that done as early as possible. I think between all of us we probably have a range of problems or ideas that we have. Maybe the first step is amalgamating everyone’s ideas – and maybe we can do this over email – but then trying to find a smaller number that we can focus on. I come to this from the demand side. We need to get data quickly. How do you build a scalable architecture? How do you get access? Those are just some of the things that I am thinking of off the top of my head. I think we all have our own ideas and we should get them out on the table.
MR. SCANLON: Does everyone feel reasonably familiar with the data, the kinds of data we are discussing here? In which case we won’t have to spend as much time going over.
DR. VAUGHAN: Certainly from an epidemiology point of view, that is for sure. I guess what I would be interested in understanding better is what does success look like for what we are going to generate? What are the measures of success or the timeline of a success? What is news you can use short, medium, and longer term that will carry this really important series of issues forward?
MR. DAVENHALL: I would be a little bit more interested in the meta-data part of this. What is the data? What is the geographical specificity of that data? Who are the potential audiences or who is the audience today? And that is the other thing we talked about yesterday was do we call these people customers or audience or constituents. I don’t know exactly. Do we need to settle on something so we are all talking about the same thing. There were two uses today, and then we can help you identify who some of the potential users are that perhaps aren’t included in that yet. But it just goes to regard to the briefing. Before you start giving the briefing you probably ought to get some consensus of what we would expect. What would be some of the aspects of that briefing.
DR. ROSENTHAL: For whatever it is worth in the spirit of real time based on Todd’s wants and needs, and that the nice thing about Todd is he doesn’t tell you where he thinks you should go, but where you should go. I put together some notes and material around those three or four topics with what he said, where I think he was leading, and some of my own thoughts around that as well, I can share that at leisure including meta-data.
I think it might be worth also thinking about some things we don’t want to think about. There are a lot of assumptions in the room in terms of privacy. We have already heard a few of them. And being a product development guy and a statistician, somebody who likes to look at data, I would like to test some of these. I heard you a little bit earlier in the full committee saying let’s do these panoramic views and look-arounds. You have heard about that: the people who re-identify things. I have never encountered those individuals, and I have a pretty good sense of the market. I would like to do a little bit of a look-around and see who does that, what companies offer that service publicly or not publicly.
Maybe even look through now that you have instituted instead of this platform and you are able to use that with your own data-driven product development, maybe look at what files. People look at it as quantitative data behind what is interesting and what is not interesting. I would like to look at some of those at risk files that have the potential for re-identification to see how many times are those downloaded a month and get some sense of the risk.
Aside from looking at what you want to look at, what we may have in terms of ideas, I would like to potentially challenge some basic assumptions. I think that might be worthwhile as long as we have this opportunity.
DR. REMINGTON: This is Pat Remington. Can you hear me?
DR. CARR: A little bit louder, but yes.
DR. REMINGTON: Okay. Well, I just like to second that comment about identifiable data. I think what I have seen over the years is the slow but progressive redefinition of what is potentially identifiable and certainly well intended. Again, I think the end result is that we sometimes tie both hands behind our back and claim that we can’t use the data because of fear of confidentiality.
I agree that we ought to look at objective measures of not only the actual misuse of the data, but potential and then try and balance it with the opportunity costs when we don’t provide local area data.
Again, I just think that question is fundamental to the use of data where many people argue that the data is most useful the closest to the community you can get, but that always bumps up against confidentiality concerns. Some of which I think are real, but most of which I think are perceived.
DR. ROSENTHAL: Another tenet to that I would suggest also — I forget what the terminology typically used for this – scan or walk-about. I am sure I am using the wrong term. I knew it was something kind of original something like that. The idea where you look around and you basically say let’s talk to some industry experts and see are there businesses around that. Are there services? Are there clients? Are there industry reports indicating that this is real and if so, what is extent of that. It may very well be. I don’t have a sense of what is real and not real so I would like some qualitative information to answer this.
MR. SCANLON: I think a fair amount of that came up in HIPAA. You don’t need a lot of them. All you need is the threat of one that gets the gatekeepers. It is not necessarily that it happens, it is that the privacy folks and the gatekeepers throw one in front of you all the time. But I think you are right. In that we think we actually employ some of these people to see if it could happen. But I think you are right. I think we ought to look a little more into it. I would look at it in a positive way. Are there ways to make them available without doing this?
DR. ROSENTHAL: Get a sense of what might really be going on there might be some other measures that might be quicker and more meaningful than just pumping out synthetic files. I am saying getting a real accurate use of what is an instance of threat, but what does a market threat really look like. Then you can start thinking about some of the architectures that work.
MR. CROWLEY: One of the themes that seems to resonate throughout the charge here and what we are speaking about is, is sort of how do we create this learning community around the data. There is a lot of different roles to be played by different stakeholders from the business side to the public side to the research and academics. What is the structure of the public/private partnership and what is the structure of the tools and mechanisms that enable the feedback from these different groups that will help answer all these questions? I was really delighted to see some of the changes to healthdata.gov where they are enabling some of these social features to enable that feedback that just – the theme has been moved forward to think about what are the specific outcomes, the metrics, and what we want to walk away with. For a guiding principle what is that learning community that we can maybe enable through the design and use and sharing of these type of data elements.
DR. ROSENTHAL: Just broadly speaking, are you guys interested or at least open to partnerships, not just Stanford or MIT, but some of the platforms they have coming online? Todd hinted around about that, things like that. The Stanford data mining certificate, that is real courseware; $10,000 gives you a mark of excellence in terms of very high end integration. I guess, are you open to those sorts of public/private partnerships, or is it strictly internal that we should be considering?
MR. SCANLON: I think as long as – it is different perspectives, but certainly you could advise and recommend that HHS pursue those as a direction starting with our data presumably, or at least partnering with the data. Sure, I think we are wide open here.
But the starting point I think — and I think this is where Todd – it has to be data that we have or could get either alone or in combination or could be developed. That is one of the dimensions. And then I think you are right, Josh, are other partners in all of this who could do the whole system and we don’t have to do it all ourselves.
DR. ROSENTHAL: And particularly piggyback on the educational piece of it, not just data, but how to turn it into information and that lends itself towards other players and partnership I think in a slightly different way.
DR. COHEN: That is certainly a path that I would like to see us follow. I guess there are two independent paths that intersect. One is liberating the data so people can get access to it and figure out what they want to do with it. And the other is transforming data into decision-making tools for whatever the audiences are. The audiences I would like to focus on are community and individuals, and being self serving, local government to make better policy decisions.
At some level I think there is never enough data out there, but there is a lot of data out there and where we lag is transforming numbers into decision-making tools for our target audience. That is the space that we can contribute in and that I would like to see us move towards.
DR. VAUGHAN: Actually let me just jump in before we lose the thread. I think there is also perhaps another way to look at it, in terms of there being a spectrum from what is completely open data without restriction through to something that is open access, where there are real concerns, whether they are real or, as a practical matter, hard wired into the system, that some data are always going to be private, but there should be open access for qualified researchers, people who are agreeing to these other structures and requirements, to look at those data freely, to look at other people’s analyses freely, to look at funded research freely, and to see whether these numbers are adding up to real, clear, coherent, and responsible decision making.
I think along with that though is the consideration somewhere in all this of what does human subjects review look like now, because you have a whole group of developers and engineers who are using data without any background training or framework and, whether it is individual data or not, making assumptions about large geographical areas and uniformity, and drawing conclusions, and that can inform policies sometimes. And perhaps that is true and perhaps it is not. How do we innovate human subjects review in a way that is responsive and responsible to communities and individuals who are entrusting us with information and stewardship of their data moving forward?
DR. ROSENTHAL: That type of information is exactly what I am talking about putting in a learning center. If one of the problems is there is a lack of young intelligence, you can use this data and there is going to be a shortcoming in terms of work edge, one issue is how do you actually teach the people the content of the data. And we have talked about different systems and things like that.
The other questions surrounding it, earlier this morning we talked about privacy or we talked about interpretation and that sort of kind of second-order type of material. I think it will be fantastic.
MR. SCANLON: This comes up all the time. This would be very useful if you could — almost every data set we make available or decide not to make available, besides the data quality and the other issues, people are worried about so it is free now and anyone can analyze it or say anything. What happens there is I think, we have argued about, the director of NCHS might say, maybe I am not going to worry about it. Once it is out there it is up to the society itself to worry about it. And there are others who say that yes, I have to worry about it. I have to be sure that it will only be used for good purposes. If you could help us think some of that with some ways that would be very helpful to us.
MR. CRAVER: Representing NCHS, I would be remiss if I didn’t say that the mosaic effect is something that we are very concerned about, and the potential for inappropriate disclosure of confidentiality is written into our legislation. It is one of the limitations that we are very well aware of. We spend a lot of time doing disclosure review of all of the datasets that we release and take that very seriously.
What we probably need help with is the mosaic effect. It is not just the disclosure risk of one particular dataset, but it is when mash ups occur and people are potentially re-identifying or doing things that maybe they have no ill intent, but are opening the door to re-identifying a survey respondent. I do think that there is a difference to consider when we are talking about administrative datasets versus survey data sets.
One of the things that you hear about is our response rate. There is a very real fear, whether or not objective examples can be pointed to. I think it is a good question. But there is a very real fear that if there were disclosure of someone’s data, if confidentiality was violated, our response rate would be hurt. And we struggle enough as it is. You have to know that the fear and mistrust of the government causes doors to be slammed in the face of our interviewers. It makes the interview very expensive and collecting this data very expensive.
I think it is something that we need to consider. I absolutely appreciate wanting to get very real examples of individuals who are attempting to do this, who are doing this, who are selling their services as big data miners to commercial and other enterprises. Whether they will talk to us or not I don’t know. Whether we can afford their per hour rate, I am not sure. But the mosaic effect is something that we absolutely have to address.
MS. QUEEN: Leah, when you mentioned human subjects protection, we may have to rethink what is on the consent forms, because currently people are consenting for their data to be used for some research and statistical purposes. There may need to be a revisiting of the language that is used there for informed consent.
DR. COHEN: Who owns the data is probably the question that we should start with, and there are a lot of considerations. The fundamental identity data in this country, the birth data and death data, are owned by the 57 jurisdictions that are responsible for collecting those data, and there are 57 sets – there are actually more than 57 sets – of laws and regulations that govern their release, which is an issue that NCHS works with the jurisdictions to try to address. But as we liberate the data, we are certainly going to have to be concerned with some of those constraints.
DR. CARR: I hear three themes that tie very much with the fundamental theme of NCVHS, which is stewardship – data stewardship. I think Josh’s question is a good one that we get quantification around the magnitude of the risk. Second, that we identify learning for users of this data of understanding how to use data so that we are not creating misinformation. And the third is, obviously respectful for the regulatory oversight.
I was struck by the fact that as we – was it Todd that mentioned the insurance, where you could look up insurance? Anyway, I went to a website and they are asking me my name, my husband’s name, our height, our weight, if we smoked. I actually didn’t go all the way through it. People are giving up amazing amounts of data that is readily identifiable, that they are giving willingly. And it is interesting, when you look at something like that insurance thing – I put in a fake date of birth, just so you know, so they couldn’t find anybody.
Thinking about we have to be respectful of the consent and the regulatory oversight in those datasets that are very strictly regulated, but there is also all the rest of it, which speaks to the mind of the US public. They are in many cases, okay with giving up data because it is going to help them. Someone will call them and tell them what health insurance to get. Fair enough. That is what they want to do.
But I think these two themes about what is the risk and secondly, is there evidence that it happens or where or how or by whom. And then secondly, are people educated users of data and have we afforded them, and Todd mentioned this this morning, the opportunity to learn how to use data.
DR. ROSENTHAL: I would like to say something really quickly, to nuance it a little bit. There is definitely an issue here of respecting an individual who wants to contribute to this, like outside of where I am legislative where I can’t share. Individuals may want to share because you give them value. That is why I let the credit card send me things. That may happen on the medical side as well. If that is the case, and you have individuals who want to contribute that, or for altruistic reasons as research, it might be worth thinking about.
Just got done hearing Todd say this fantastic mandate where we are going to push any IT system it has to be machine readable and that is a mandate. How about considering something as wild and crazy as a mandate where we respect individuals who want to contribute either at different levels for altruistic reasons, to get value out of something, et cetera, et cetera.
My comment isn’t so much about does this happen or does it not happen or mosaic. I think, number one, you want to understand where the business value is in terms of where are you going to have market forces. As I look at some of the data, county is not useful to me. Neighborhood is not useful to me. Contract is. It is a higher level grain, but it is useful because I can walk it over to the payer and compare. I may choose to do county. I may choose to do contract. But knowing where I am creating business value, which I heard someone mention, opportunity cost on an individual, but we are responsible for both of these: loss and gain. Instead of just risk assessment and knowing the particular, if you are crafting legislation and recommendations and spending money, you want to know what is the nature and the magnitude of the risk, and I don’t have a sense of that. My general sense is there is not a good sense of that. Not just knowing the magnitude of it, but the nature of it allows you to do crazy things.
Maybe I will just do synthetic file and everything else is public/private. Maybe it is information. That is one thing I haven’t heard too much about. I have heard a little bit around using information in a meaningful way.
If I look outside the health care data space, I see a lot of information through kind of public data explorers. I push the data into one of these explorers like Tableau Public or Google Public and they do public/private partnerships I think. Read/Write Web did one where the winner was comorbidity factors for juvenile diabetes. Hundreds of thousands of people using this thing, using your data. You are not involved in it. But the point is it is not using your data, the data is secure. They are playing with analysis of the same file, which goes to the privacy issue, et cetera. My point is there are nice workarounds that could solve all needs potentially once you get a real sense of the nature and risk of it.
MS. QUEEN: Can you explain what you mean by local level, county level, contract?
DR. ROSENTHAL: I am using an example from like impact work(?) where they are talking about PubSim file and they are saying what should we do for this mosaic effect. That was a call three months ago or something around that. They are saying what level of grain can we release this data at. Some people are pushing for county or for this or that just because they always want more, more, more. You have the private guys as data pigs or whatever, saying I want more, more, more.
Then when you take a step back and say what do you want to use this for. I want to use this to tie it to pay for performance for payer populations for MA. Well, the unit of analysis and payment is the contract. Anything more granular isn’t going to be helpful. Now, you have a decision to make. If you want to be able to crosswalk over to the payer side where you can create a decent amount of value, that level of grain is contract. You don’t need to go further for that business goal. You can at least make the decision informed. Here is the mosaic effect risk; are we going to do it? And then with the hope, or to balance the reward, of being able to walk member level claims condition prevalence over to contract P for P.
I am not suggesting that you do that or not. I am just suggesting that that is an intelligent way to have that conversation rather than people pounding on the table saying, I want more, and people saying no, no, it is scary.
MR. DAVENHALL: Part of the thing you ought to do – part of the strategy would be to have those hearings. I am just sitting here thinking about all the people I know who have specialized for 30 years in examining spatial risk of identification geographically in public health. You need to hear from these people because they have invented ways to mask this data – they call it jittering. It is a term that is used. A technical term.
That is the part I would like to just say. I just came from a meeting in the social services area where basically we were told that if someone says they can’t share the data, ask them to see the statute. Start the conversation really hard. Say, show me the statute that says we can’t share it. Because they said most of the times there aren’t very many prohibitions. Even with HIPAA. It is about how you are going to share it and the conditions under which that sharing can occur.
Part of me wants to say, I like this idea, Josh, of getting some people in here who can talk to us collectively about what are those potential risks. There is no book. There is no magazine that I know that I can read, that sort of chronicles all the reidentifications that have occurred that have landed in court at huge settlements. Is anybody really aware of those?
MS. QUEEN: We’ve got some for HIPAA.
MR. DAVENHALL: Is it a secret?
MS. QUEEN: No. They change the penalties.
MR. SCANLON: Those are formal HIPAA complaints and certainly we count data breaches, but that is different. That is a data breach where the holder of the data messes up. The disclosures this way are very – first of all I don’t know how – it would be hard to count, but there are folks who, at the beginning part of HIPAA you will remember, actually published some literature in which they just showed you how you could do it by mashing hospital discharge data to other data. But we have overcome some of that now by the kind of data we release. We should just get one of those red hat folks to show you how you do it. Then it can be done. It is not how many. You are right. How many times –
DR. ROSENTHAL: On the other side there. You can get your major in comp sci and resynthesizing.
(Simultaneous comments)
DR. ROSENTHAL: There is a newly emerging specialization in computer science where it re-synthesizes while maintaining statistical integrity for your questions. The point is there is that and there is opt in or there is pure synthetic. There are other ways to go about it. Once you do a real needs assessment.
PARTICIPANT: Todd talked about this morning. It seems like this is high on his agenda now. Will there be one of these that the department will create a corollary to this at the departmental level?
MR. SCANLON: We will certainly have an effort. That is an administration wide plan. The new version on – it is digital.gov. It is more than e.gov. I think every department will try to adopt everything there.
DR. VAUGHAN: I would be curious whether we could go back to the educational point because that covers a whole lot of ground. We talked earlier today about bringing on the community as a level of analysis engagement and participation and we have different communities. We have patient civilian members who aren’t members of the health team. But I think to a very real extent, I wouldn’t automatically assume that people in local government, in state government, in county government, feel that the data flows as easily or as transparently as it might right now, especially given the budget constraints.
How might we look at approaching education of these other colleagues within public health medicine, let alone the public, let alone technology or those different – they are interactive, they have to be interoperable, but what might some of that look like? You had brought that up so maybe you might have some thoughts.
MR. CROWLEY: A few things, education is one piece, making sure people understand what the resources are. I mean patients also means researchers and research communities. As people are developing new insights using these datasets and finding new ways to do new algorithms and new ways to share data and new ways to create mash ups and integrate these things to do new things. How do we feed that back into communities so people can leverage those findings in an intelligent way?
Likewise, within these datasets where there are pieces that aren’t as integrative that people are having difficulty using. If there are specific problems that could prevent them from using the analysis even though on the surface they should be able to, how are we creating the mechanisms for that to be fed back in from the community so that we can have a collective problem solving experience?
One concept we have been looking at that is interesting about open software and sort of those algorithms. You can almost think about open research data. As you start creating new mechanisms for using datasets to get new insights and inquiries whether it is comparative effectiveness or other things. How do you create those algorithms in a community environment to allow others to leverage that, build on that, and create that sort of open learning community?
DR. ROSENTHAL: I think it might be worth bucketing or just kind of refining our terms in what you want to accomplish with learning and in terms of learning what the data is, in terms of learning and how to create value, and if one of the goals is to spark innovation and to let the market take – you are liberating this data because you are not going to build the products yourself, then the market will take over. It is probably not going to be just a public or a social goods service. If folks outside the space have a high failure rate partially because of a series of perverse incentives from supply inside, that sort of education is one piece around it. Then you have the meta-data and what have you. So identifying who the audiences are and learning and what you expect from them. I don’t expect the local community level to be creating scalable things to do. I do expect them to do other things. What are the content buckets for that learning? How do you go about doing it?
Meta-data would be one piece of that as well. But any sort of integrated learning platform you can do internally, but right now you are witnessing an explosion. I heard Todd kind of hint around about that from university software, et cetera. Those sorts of partnerships might be interesting once you define what you want to get out of learning.
MR. SCANLON: That is half a dozen or more areas, which is quite good. Do you think that that one point you mentioned, Josh, about the — what are we assuming about – in order to extend this market, create this market (operator interruption). You are right. We have a number of assumptions or not even at the stage of assumptions yet.
DR. COHEN: One term that we are talking about very freely, and I am not sure we have agreed upon definitions. I think we should – one thing we need to do is agree upon definitions about how we use the terms amongst ourselves. When we talk about health data, are we talking about medical care delivery data or are we talking about public health surveillance data? There could be very different audiences and very different applications for the use of these data. Some audiences might want both. Some might just focus on one. Your example is at the contract level, when you are talking about paid claims and insurance data and EHR data, that is obviously a contract level. For public health surveillance data, neighborhoods is probably perfect for some audiences.
DR. ROSENTHAL: That is my example of that. People wanted to marry it. How would you —
DR. COHEN: I totally agree. I think we are investigating both of those streams. But we really need to be clear that there are distinctions in the use of target audiences for them.
DR. ROSENTHAL: And that actually raises a really good question how earlier in the full committee you are talking about speed and things in 2012 and 2010 – part of this is going to be how do you envision the mechanics of this. Where do you want to place – you have this continuum, right, between speed and completeness and precision. I thought I heard a little bit earlier that we want to introduce, or slide the slider a little more towards speed rather than completeness or precision. Any guidance you guys could give us on that would be helpful meaning typically there is a protocol or culture for academics in the room in terms of how you go about doing things. If you want to slide all the way over and say, let’s get drafts out by next week and see what happens or is that too crazy?
DR. CARR: Yes.
MR. SCANLON: We actually like to discuss for the administrative data, for claims data, and for Medicaid and Medicare, and for our surveys that providers – Virtually every agency has moved up its production schedule and there are ways to do that in really quite interesting ways that everyone thinks would — you could still do a very nice analysis even with these. We could discuss some of those as well.
The problem with some of the survey is there a production. You have to collect the data, and we are speeding up those. It is like we move for example, many of the surveys, probably within nine months now after they collect the data and then it is released within — we would like to see that.
DR. ROSENTHAL: I apologize, I don’t mean data latency. I mean actually kind of speed of committees, recommendations. There will be a tradeoff there. If you want to go faster that means you are going to have to leave some precisioning on the table and are you comfortable or not.
MR. SCANLON: And the other thing is what does it take to do it if it is like the discussions this morning, where it was literally developing recommendations for regulations where you basically punish people if they don’t obey. That is the force of the law. We are not talking about that. That is why it is a very slow public process.
I think here we are talking about — this is the model of innovation I think. And in many ways is data we have already agreed to release. I am hoping there won’t be many obstacles in the way. But, again, I think we need to be thinking. You will see what the model is now I think. And you will see healthdata.gov is a model, and then you will see some of the other models, and if you think there are other ways to do this.
I think looking at the risk, empirically looking at the risk of disclosure, certainly there are ways to get around that, I think it really for example, just prevents many agencies from issuing anything at a lower level than that. These are excellent ideas, though.
Really recommendations we have ten good agencies at HHS. Some of them are renowned throughout the world in terms of research and largest health payment system in the world.
There is a lot of data at FDA that hasn’t been liberated in terms of very specific consumer data that would be helpful instead of just collecting it in terms of adverse report. Some of the data hasn’t been, as Todd would say, curated and got to that point where it is disseminated yet. Some of it is there but it could be better. Clearly it could be better. I think that is where we are looking for your thinking.
DR. CARR: Jim I wanted to say something. But I think in our remaining 25 minutes we do have to land the plane, and I think the key question is are we in the reactive mode that you will tee up something for us to look at or do you want as has been suggested, a couple of things in the proactive mode in the domain of education and also looking at risk of re-identification.
DR. COHEN: Maybe this is another topic that doesn’t fit in this set or maybe it can be shoehorned in or just put in the parking lot. I think it would be interesting to consider the limits, but also the possibilities of using EMR data for public health analyses. I think one of the assumptions is you get a very rich dataset with electronic medical records. The problem is that is only people who are going to the doctor and to then extrapolate those conclusions to the population is problematic, but solvable probably. Just to put that on the table as another area for discussion.
DR. TAYLOR-CLARK: I apologize for having been late, so please excuse me if you have already covered this. I read the charge of what we were asked to think about. Having reflected on what everybody is saying here, I think I am a little bit confused. Let me see if I can clarify.
What I am hearing is trying to land a plane where you have a lot of different data sources being thrown out. We have a lot of different users of those data being thrown out as potentials for our charge. I am now thinking that there are 10 agencies with lots of data that exist in the HHS. There are some that are untapped, like you mentioned the FDA.
My question is whether there is the ability to get an environmental scan to look at all of those data and to see who uses them, who doesn’t use them, where are those data, how do they get used currently, and where potentially the gaps are I think, we could potentially fill some of that in which is there are gaps here, here, and here. Those are environmental questions.
I like what you said about this learning community because the next step to me would say, now we need a learning community to both share what we have learned in this environmental scan, but also to learn what folks on the ground like Bruce, and others in states and communities are actually doing with these data, and what their limitations are with these data.
And I appreciate the point of EMRs, but we are just trying to get access to EMRs. The fact that this committee would take on that seemingly behemoth of an issue it seems behemoth to me. But I will speak for myself.
My question is whether we might be able to focus a bit on the HHS agencies who have data that is either tapped or untapped, and then moving on to better understand this learning community that I think would be extraordinarily important because then we can decipher between who the audiences of the data will be and who the users will be. That goes from patients which is of course where I care about, how they might use it and patient advocates, for example, to states and local communities, to even the feds who have yet to use some of the data that they even have. I think that that would be an amazing charge for this group, and I would hope that we could move forward somehow in that.
MR. SCANLON: I think that is number one of the charge that the department drafted up. We really want to start with what we have. It is being made available or could be made available, but it is what we have already. EHRs as you say, we have a long way to go. We are happy that the adoption is one in five, but we should focus where we actually have the data and people are saying if you did this or you released it at this level or made it more frequent, that is sort of the first step.
It doesn’t preclude you looking at future data sources obviously. We certainly would at HHS in terms of our own interest I think I would say this, if you could help us with what we already have. That could be improved. The value could be greatly enhanced if either it is unused or untapped or the way it is being made available in an agency’s website is just not useful. You need an epidemiology degree to even analyze it.
Some of this starts fairly basic. We could go that if — I think we may have a mixed couple of processes here, sort of that orientation part. We could certainly arrange for that in terms of what does the agency portfolio look like now. And I think we will stick to the main stuff now and not go after the smaller stuff, and then we can come back.
But the second is the proactive kinds of thought about how can we remove some of the barriers or extend the data through partners, various kinds of partners. And obviously the audience analysis is the biggest thing. That is hard to do. Surveys are not going to do that for you. You really have to be part of the audience or maybe a hearing or maybe some sort of an issue.
DR. CARR: Again, trying to tie it in with the working committee. I think the environmental scan is something that — we used that term earlier as well, to understand what is going on in administrative simplification. But in fact, key to that, is knowing the data. We are being asked to make a claims attachment to a bill to transmit data, and fundamental to that is to say, no, here is all the rest of the data that is there for other things.
Second, the idea of what is available, how is it currently being used and what are the obstacles, I think would again, keep us grounded in that. In addition the committee has just earlier this year, had a series of hearings from communities. We should provide everybody with a copy of that report. Susan Kanaan did phenomenal work on this. We say it is from the committee, but she did yeoman’s work. It was really very, very important because they are doing great things and very simple things are what they need. It would be an easy fix and an early win.
I appreciate your perspective because I think we probably need to walk before we run and we need to define the problem before we solve the problem. And maybe we start with the fundamentals, as you said, the liberated data, and then where we go from there.
MR. SCANLON: Would it help if — and, again, I think the best way to do this is an interactive way. Niles actually probably knows more about CMS data and where things are going. We have — NCHS, for example, I think interacting with the program office is probably a better way. We can give you written material, but you probably know that already. But we could send you just to sort of familiarize yourself, some of the selected major data agencies and their programs, and then maybe we would schedule for the next meeting more of an interactive discussion with this. This is what we issue. This is how we do it. In every case there is different ways of mixing and matching. This is what we have heard from the constituents. This is what we are worried about. This is why we don’t do that, and kind of in an interactive way would that, maybe for the next meeting, maybe half the meeting would be that. And we would send you written material. But I think in this case, I think you need the interaction with the program, the folks who do this. Does that make sense?
The other thing is I think that all of you have discussed, I think our agencies think the limiting factor besides basic resources, are producing the data, is probably the privacy/confidentiality issue. It just comes up so many times. It is almost ingrained in thinking now. If we could just open that up a bit, maybe that would be the second major theme we would look at.
DR. ROSENTHAL: In the first piece I think that sounds fantastic. The thing I would like your thought on or at least my assumption is that you put the data out there because you don’t want to take it to market yourself. You hope that the market will kick in and use this. In terms of the constituency, I always hear community and federal and state, rarely in that kind of traditional mantra, do I hear private company, for profit company, that has been conspicuously absent repeatedly. On the HDI side, that is almost all that is. Those are private companies coming in. You are hoping the market will adopt it just like weather and geolocations, et cetera. That piece of it would be really –
DR. CARR: We look at how is it being used. There is the traditional. But then really just taking all of the output from the Datapalooza would be the complement. I think that is right of what is the data, how is it being used the old way, the new way, and what are the challenges and data use and privacy and confidentiality? If we grounded ourselves in those fundamental questions I think from there we could build out next steps.
MR. DAVENHALL: I totally concur with that process of working at the low-hanging fruit. There is so much data that is available now with a little tweaking could be really delighters. To give you an example. People complain to me constantly about the quality of the geolocation of health care facilities that come out of the CMS files. Just a little work there to get that right would please a lot of people.
Now, there are also a lot of companies who make a ton of money by taking your dirty data and cleaning it. You are going to make some people unhappy. But in the long part of it, if you are going to have open data that everybody touches, that would probably be where you would want to – again, it is a piece of low-hanging fruit. I would say there is enough of that that we could get some real value for the effort.
DR. CARR: Greg, is there any – I know we are putting out the Datapalooza stuff online is in last year’s, but do we have any cataloguing of what we got, what we saw, what datasets are being used? Each one was fascinating unto itself. But understanding what was easy to use, where did people go. There is another layer of learning when we look at the aggregate trending of who used the data and how they used it.
MR. DOWNING: From the data supply side, we have a fairly extensive catalog of holdings that are going into healthdata.gov that are categorized by particular application areas that we can provide you. It is fairly extensive, but we can flip through it and see if the general dimensions of data elements captured in meta-data. We held off on hitting you with that one this time.
From the forum, the reason the forum was established, the Palooza, was that we unlike many other countries who published data, we don’t track who uses it. We have fairly limited scales of numerical data representing when it is used, how many people use it. The notion of being able to figure out what people were doing is that the Palooza is really a concept to see what applications and services came out of that.
All of the presentations both in video form and PowerPoints from the afternoon sessions, are on hdiforum.org. They are all loaded as of today. If it is a cold snowy day in July and you are looking for something to do, you can just go online and get that. There are probably at least 70 collections of what people have done with them, and various formats of demonstration of the applications themselves.
I think those are probably the leading edge places to start to get a sense of what kinds of uses are coming from it. We have other forums that we are trying to establish on healthdata.gov to identify what community needs are. That is a growing resource from the standpoint that you all may want to follow as to what kind of questions and incoming comments that are there.
We actually get probably — I don’t know, Jim, you get some on your site, but probably at least a dozen or more inquiries a week about what people are interested in knowing about particular datasets. And then we will often refer them to program experts within the department.
Agenda Item: Next Steps
DR. CARR: Next steps then. We didn’t need our charter, but I think it was distributed and I think we covered it.
MR. SCANLON: It is a discussion charter, but if you have – I think everything that was discussed today was in the charter.
DR. SUAREZ: There are some important new items that we haven’t talked about from the charter. As I was looking at the charter, there is like three areas around which we would make recommendations. The final product as I understand it, is a set of recommendations or at some point on all these issues. By one day a set of recommendations, but progressively as we discuss and develop things, prepare some new recommendations on different topics.
The three topics I see are this area of monitor and identify issues and opportunities and make recommendations to improve data access and innovative use.
The second one is how to promote and facilitate communications to the public. That is totally different from what we talked about, it is really communication, education, and education outreach.
And the third one is to facilitate access to expert opinion and public input regarding policies, procedures and infrastructure to include data. We have three chunks of domains to focus on. And then I was trying to frame the scope. The scope is HHS data primarily.
MR. SCANLON: Based upon a – right.
DR. SUAREZ: We are not talking about the Department of Homeland Security data or the federal agency data that are outside of HHS. The VA, for example, has a lot of data.
MR. SCANLON: We can certainly deal with other agencies, but I think we want to start with HHS.
DR. SUAREZ: That is what I want to point out. The scoping of the HHS data, government data, public. But I think it is important to consider within this scope, the three major areas to focus on in terms of, plus the ones that have been mentioned, education – not just to the public.
DR. CARR: Thanks, Walter. So actually building on that as we look at item number two, identify and monitor trends and capability and traditional and new information dissemination, data access strategies. That was the one you were referring to. I think that the landscape is a bit about that. What is there, what is being used. Facilitate HHS access to expert opinions – we talked about that. And the other one was the —
DR. SUAREZ: Promoting and facilitating communication to the public.
DR. CARR: That is right, communication to the public. But I think if we start with what we said, we will be aligned with our second element. Then just to get — what happens next, Jim? Do we plan to meet in September?
MR. SCANLON: We will meet at the full committee meeting in September.
DR. CARR: We have to get the dates out to folks – and actually I would favor – I know you are having a retreat part of the day, but meeting prior to the full committee so that we would be informing the rest of the committee on the findings. And then over the summer, are we going to do some of the dissemination of information about — maybe a webinar or something.
MR. SCANLON: You certainly could do a webinar or if you — I think the first part was probably the familiarity with healthdata.gov and our analysis of — what we have there now. The it of what we are discussing, is kind of there now. I think we can add to that, but healthdata.gov. We have some meta-data. I think those are the basic systems we are talking about. I still would like to have some — maybe we could schedule by phone or certainly at the next meeting.
I think you would all benefit if we had maybe the three biggest producers at HHS, it would include CMS, probably NCHS, and I have to think about the third. Just go through each one of them in terms of interactive way about what do they do now? What is their philosophy? How far do they push the data? What is the reaction they are getting? What do they see as the limitation for a further — if the idea is geographic detail, what do they see as the issue? And even the way they disseminate it because you will see that some of this is public and some of it is hard to get. And then I think we can question some of those. That is the review of the portfolio part of it.
And then I would like to get started on the risk assessment, though I am remembering that the literature is a little old. Let us do a little digging there.
There is another member or two, of our workgroup from the privacy community, and they may be familiar with the folks that actually do this work now. In fact, we might have one of them as a consultant at HHS. One would be the data supply, where we are now and what do we see coming, including, as you said, Josh, how often it is made available. What is a limiting factor?
And then looking at this issue of one of the limiting factors that people believe, which is the privacy and the detail. Does that make sense?
DR. SUAREZ: Yes. I was just going to suggest that you framed, I was counting, about seven or eight questions, for each of the data —
MR. SCANLON: What will we want to know about the data?
DR. SUAREZ: Exactly. I think it would be helpful to write out those questions. The first step is doing the inventory of data sources, but the next step is going to one data source and start asking questions about that data source. We want to have that set of eight or nine or ten questions when we are looking at that data source, to be able to answer those questions. And then the next level is addressing the issue of risks.
MR. SCANLON: Rather than an inventory, I would rather – I mean, some of these are major sources and everybody knows about them and complains about them or likes them. There are other things that are — we have them, they are there, but it hasn’t been an issue so far. It is like chasing after the tail. We will get there sooner. We will get there. But I think if we went into some depth with one or two bigger ones you sort of work through the analytic process. That would help with some of the others. Jim, I would think the warehouse we might want to spend some time on this.
MR. CRAVER: Another resource that I was thinking about actually was Health United States. It was just released. Some of you should have gotten copies of the NBRI(?) version. If you haven’t looked at Health United States recently, and it is available online and inside that cover points you to the website. But there is an appendix in there that is incredibly detailed and valuable about their various data sources that are used in Health United States most of which, not all, but most of which are HHS data sources. It might help you. I think it is a very good suggestion. What is the matrix? What are the column headings of the question that you would want to know for the data sources listed in — column A? There is a high amount of consistency in that appendix of the description of each of those data sources. It really is — I don’t know. Price of the publication is worth the appendix itself.
DR. ROSENTHAL: I think I know the answer to this. Also, I hadn’t asked it anyway. In the essence of kind of speed and how much can they get done and simultaneous tracks while maintaining its clarity. Is there a working kind of Google doc with security or is there some sort of collaborative environment if we are going to recommend it for others –
MR. SCANLON: We will send out the information for the group — so we can have discussions. The very nature of this kind of activity is to keep going.
And I think you will see – well, some of you furnished me with all the data. In one way it is quite interesting. But it probably adds up. If you don’t count Niles’ administrative data, it is probably less than $400 million a year for all the surveys and for many of the major data systems. As a fraction of GDP or national health expenditures, it is quite small. You will be surprised to see it.
Now the administrative data is basically where the agency has re-purposed what it needs anyway and is making it available as an analytic resource. It is not a big — whenever we have to try to get additional resources we indicate what a small part of national health expenditures data investment —
MS. GREENBERG: I totally agree with the fact that we did not sit here and wordsmith the charge. I think the purpose of the charge is just really to give you the breadth of what this is about. I think we did send it to you. I would just ask that if you have any questions about any particular piece of the charge or you think something is questionable or that something is missing if you could just – I will send out an email with this request. Then just reply all and we can just have some email back and forth about it. We can also do it on CHIP, but we will find a way to do. But just so that we have a sense that at least we are okay with this charge and then we will just use it. What will we want for it will be the work plan what we are kind of developing right now. We will after sort of figuring out what kind of lead time we need to – up the programs. We will send a poll for this webinar.
DR. SUAREZ: You mentioned the word work plan. Is that something that will take shape not necessarily on a document or something, but maybe a PowerPoint that describes this is the –
MS. GREENBERG: Then the webinar can be — between now and the end of the year I think we could have some kind of — you guys need to do this in your schedule. We will work on that with Justine also.
DR. CARR: Well, thank you everyone. Very exciting to meet all of you and even more exciting to hear your brilliant thought processes. We look forward to working with you. Thanks very much. The meeting is adjourned.
(Whereupon, at 5:00 p.m., the meeting was adjourned.)