Transcript of the September 23, 2014 NCVHS Working Group on Data Access and Use
[This Transcript is Unedited]
National Committee on Vital and Health Statistics
Subcommittee on Standards
Working Group on Data Access and Use
September 23, 2014
National Center for Health Statistics
3311 Toledo Road
Hyattsville, MD 20782
CASET Associates, Ltd.
- Update from the Office of the Chief Technology Officer and Working Group Discussion
- Building a Framework for Guiding Principles for Data Access and Use
- National Health Interview Survey (NHIS) Data Release
- NCVHS Usability and Access Parameters for Various Use Cases
P R O C E E D I N G S
DR. MAYS: Welcome everyone. We are now moving into the NCVHS work group on data access and use. Welcome to those of you who have come just for the meeting. We have not seen you before so welcome. We welcome those who are in the audience and have stayed with us.
Let me just briefly say what we are going to do in terms of going around the table and then we will do that. Introductions. We will do introductions. After the introductions, I want to spend a little time talking about what the charge is. It is very important that we have that as our background so that as we work today, we are working in a way in which we are trying to really address the charge that we have. One of the things I have learned about group awesome, that is your name for me, that group awesome can be everywhere and do lots of different things. But I am going to have us try and do some awesome things on behalf of HHS. I think if our charge is before us that will help.
After that, Damon is going to give us an update. If you remember, we have this different schedule we are going to try and do. After that, we are going to talk about some guiding principles in terms of access and use that might be helpful to the various agencies. I will talk about that in more detail. Then we have a guest coming in to talk about the NHIS as an example of ways in which they release, put up the data, and we are going to try and be helpful to that person.
And then we are going to have a discussion about some of these issues in terms of some of the parameters for various use cases. That is one of the things I really want to try and promote is making sure that as we are giving examples, we know who that example fits for.
And then finally I am trying to have this announcement time so that many of you who have different groups, meetings, and activities coming up will be able to share them with us. And then we will adjourn. We have some materials on the table. We will actually do that. That will be here.
Let’s start with introductions.
DR. VAUGHAN: Dr. Leah Vaughan. No conflicts.
MR. DAVENHALL: Bill Davenhall. No conflicts.
MS. JACKSON: Debbie Jackson, National Center for Health Statistics, CDC, staff to the National Committee.
DR. COHEN: Bruce Cohen, member of the Full Committee. No conflicts.
And before we move on, Vickie has given me special dispensation to make my announcements since I have to leave early. My announcement has to do with a roundtable that we are having. We discussed this morning. If you already received copies of these handouts, you do not need them. But we are having a community data engagement roundtable October 27 and 28. The whole idea of this roundtable is to bring together community data users, data connectors, which are an emerging group, and data providers to have a day of conversation about what advice and guidance we can provide to data providers particularly government data providers about how data providers can disseminate information in more useful forms to the community.
We have an exciting list of invitees. I am here to invite you all to join us. It would be our pleasure to have you all with us. We have folks from San Francisco, Douglas County, Nebraska, Seattle, New Orleans, and Sonoma County. We have Robert Wood Johnson and hopefully the Kellogg Foundation is going to be coming too, from the community commons. We are working on getting folks from Chicago.
The whole idea is really to get folks into two days of interactive conversation about how we can provide, I am a data provider primarily, data that works better for communities. Damon, we would love to see you there and certainly folks from data.gov. If you have any questions, feel free to ask me or write to me and I will provide you more details.
MS. KLOSS: Linda Kloss, member of the Full Committee, co-chair of Privacy, Confidentiality, and Security Subcommittee, and member of the Standards Subcommittee and no conflicts.
DR. FRANCIS: Leslie Francis at University of Utah, member of the Full Committee and the Data Working Group and co-chair of the Privacy Subcommittee of NCVHS.
DR. QUEEN: Susan Queen, director of the Division of Data Policy in the Office of the Assistant Secretary for Planning and Evaluation, staff to the Committee.
MS. BRADLEY: Lily Bradley, Office of the Assistant Secretary for Planning and Evaluation, staff to the Committee.
DR. MAYS: Vickie Mays, chair of the working group, member of the Full Committee, Privacy and Populations.
MR. DAVIS: Damon Davis, HHS IDEA Lab, director of the Health Data Initiative for the Department of Health and Human Services.
DR. SUAREZ: Walter Suarez, executive director of Health IT Policy and Strategy for Kaiser Permanente. I am a member of the National Committee, co-chair of the Standards Subcommittee, member of this work group and no conflicts.
DR. ROSENTHAL: Joshua Rosenthal, co-founder of RowdMap. No conflicts.
DR. KAUSHAL: Mo Kaushal, member of the Working Group.
MR. CROWLEY: Kenyon Crowley, deputy director of Center for Health Information and Decision Systems at the University of Maryland, member of the Working Group and no conflicts.
DR. GREEN: Larry Green, member of no subcommittees, chair of the National Committee on Vital and Health Statistics from University of Colorado, Denver.
MR. SAVAGE: Mark Savage with the National Partnership for Women and Families. I am the director of Health IT Policy and Programs.
DR. MAYS: Okay. Part of what I would like to do is draw our attention to what are our goals.
MS. DEUTSCH: Terri Deutsch, CMS and lead staff to the Standards Subcommittee.
MS. WEN: Mei Wen(?), Office of Minority Health.
MS. KANAAN: Susan Kanaan, writer for the Committee.
MS. JEAN PAUL: Tammara Jean Paul, CDC, NCHS.
MS. JONES: Katherine Jones, CDC, NCHS, staff to the Committee.
MS. COOPER: Nicole Cooper, staff to the Committee.
DR. NILSEN: Wendy Nilsen from the National Institutes of Health.
DR. MAYS: Is there anyone on the phone? I think we are good.
One of the things I want us to do is just briefly go through what the charge is. I think that it is important each time that we keep this in mind as to what it is we are supposed to be doing. I am going to ask you to look specifically at the seven areas that we were asked to look at. We were asked to review the current portfolio of HHS data resources. Notice that it is asking us to do administrative, operational, survey, public health and research data. That covers a very wide berth. That is going to be important as we start to think about data systems that we may want to look at and to look at the current policies, mechanisms, and approaches for promoting access and innovative use and applications of HHS data to improve health and health care. Yes, we are going to look at innovative uses, but I think we also need to do it within the constraints of budget and issues like that. We can think about those things. We can put them out. But it may be in terms of when we recommend things, we will think about it within the context of what people are able to contribute in this resource deficit environment.
The other thing that we are being asked to do is to identify and monitor trends and capabilities in traditional and new information dissemination and data access strategies, development and technology, including social media and their application by the data, technology, and innovation community and advise HHS on opportunities. That is something that I think in particular we are probably going to find ourselves doing when we are talking with Damon. Remember to keep social media in front of us.
Three is identify and monitor the types of data and information needed by all participants in the health system. This is where this issue of the use case I was talking about comes in. As I said, it is pretty imperfect. But here they are asking us to think about consumers, patients, providers, plans, payers, communities, state and local government and the policy research, public health, and stakeholder communities with attention to content, quality, technology, and audience issues. Again, those qualities I want you also to keep in mind are content, quality, technology, and audience issues. We can recommend and we can think about things, but also add that to it.
Identify and study areas of opportunity to improve data access and application, and associated privacy, technology, and other data policy issues. Again, I am trying to make sure that we are thinking about kind of not just we fix this thing, but we also want to think about what are the higher-level policy issues that we want to be able to comment on.
We are to serve as a forum for promoting and facilitating creative communication to the public, key stakeholders, and the technology and innovation community about the availability of HHS data and opportunities for its use. When we start thinking about opportunities for use, I was thinking we need to also think beyond just HHS. Should we be thinking about linkages? If we are worrying about health and we take, for example, what really does create ill health, a lot of times that is beyond just a health variable, but it may also include things like transportation, schooling, et cetera.
I am going to say to us that we need to remember that we are to do fixes within HHS, recommended fixes, but some of those fixes may extend us outside of HHS. It seems like it is perfectly fine by our charter to be able to think about that.
Facilitate their access to expert opinion and public input regarding policies, procedures, and infrastructure to improve data access and innovative use. I think what that says is that we can reach out. You are an example of that reach out, but that we can also reach out beyond that when we think we need additional information.
Finally, advise HHS on understanding and evaluation of how HHS data is being applied and the value that it is generating. This issue of the value that it is generating I think is something that I would like to have at some point in time as an agenda item for us to really talk about. Return on investment is part of how value should be viewed by us and how we can achieve some of those perspectives. That, I am going to put as a later agenda item.
Any comments, questions, or thoughts about what the charge is? Anything that you see that you think we should take up at some point at a meeting?
MR. DAVIS: This is Damon. One of the things that struck me as an opportunity for this group is assisting in the value proposition for the openness of the data that we provide. It continues, as was said, to be an austere environment of making data available. Needless to say, when we go from these activities that are research based or contract based or whatever, they have specific dollar amounts allocated to them. And then to turn around at the end and then say can you make that data available does not necessarily acknowledge that that is an activity that requires additional resources. If we can in many different ways and through different use cases demonstrate the value of the data for its alternative use I think that that can begin to make the case to the department as to creating some budgetary and line items that are going to actually be allocated directly towards the effort for data’s liberation. I would encourage the group to try to explore that in any way possible.
DR. COHEN: Following up on what Damon said, it would be wonderful to get a handle on just exactly how our data are being used. I do not know that we have any sense about secondary and tertiary uses of our data. It would help establish that value proposition. I find our data, federal health data, and state government data being used in ways that I had never imagined before. It would be wonderful to document that.
MR. SAVAGE: When I look at the charge in some ways, it is backwards looking, how are we using things, and yet the needs are forward looking. How do we position ourselves to have what we need in three years from now, which usually means starting now to design it. I would just throw out that, yes, it is looking backwards, but to be creative about looking forward as well and trying to anticipate what we are going to need.
MR. CROWLEY: Building on what Mark said, as we look forward on that roadmap, I think this charter and I think it was a good idea to review these points, I think we can think about how can we place this within a framework to look out three years. And for each of these charter items, we should be in some way hitting each of these. As we start to decide which programs or which initiatives or which projects are the most useful for the team to pursue, use that as the framework to help with that decision and that three-year roadmap.
DR. MAYS: Anyone else?
I think part of what we probably should do is after the meeting to think about how to use these to help structure some of what we are doing. We will not be the slave to them, but at the same time, I think we need to be responsible to them. We need to figure out what the outcomes are and what points we want to hit. I do like introducing this notion of where we are and where we need to be is quite different in the sense of thinking about the future may mean creating something a little bit different. I do want to come back to this discussion, but I wanted to get us started this way. We are all starting in the same place. We are starting with very different perspectives on it, but at least we are guided by the bounds of what it is that the committee, the workgroup should be focusing on.
Let me turn it over to Damon. The purpose of this is to get an update as well as any request that Damon may have on behalf of the department. We will see what we can be helpful in doing.
Agenda Item: Update from the Office of the Chief Technology Officer and Working Group Discussion
MR. DAVIS: First of all, thank you guys very much for having me back. I appreciate it. I will apologize in advance. I have been out on personal leave. I did not have the opportunity to prepare nearly as much as I would have liked to have for this. I may not have as many requests as I would have liked. But thank you again for the opportunity to give an update as to what has been happening at the department. I am looking forward to a really good conversation.
Once again, I serve as the director of the Health Data Initiative where I look across the department about the various opportunities to open data, our policy landscape, and all of those different kinds of things that are going to be supportive of making data available on healthdata.gov and through other means and through other channels. I am just going to give you a couple of highlights of some things that have been happening in recent times and I look forward to some real good dialogue about these activities as well as additional ideas for how we could evolve, change, or somehow improve the things that we are focused on.
First of all, with regard to healthdata.gov, you will know that that is our platform for open data, widely available for anybody to utilize data from various operating divisions, or OpDivs, across the department. Healthdata.gov has grown to over 1600 data sets. That is a very significant improvement in terms of data set availability. I believe last year’s number was about 300. It is a multiple hundred percent improvement in data availability. That data is comprised of over 1000 HHS data sets as well as data from USDA, ten different states, and three major cities across the United States. This sort of subscribes to our goal of trying to make data available not only from the federal level, but also from state and local levels with the idea that health data really is something that should be considered where you are, where you live in your community as well as what you look like from the population health or a broader perspective.
We are working on getting data onto healthdata.gov from some other federal agencies. To name a few, we are thinking about the Department of Veterans Affairs, the Consumer Financial Protection Bureau, CFPB, Department of Education and others who you may not necessarily think of as having data related to health, but they in fact do.
Another related activity in the open data space has been an improvement in our metadata across the department. Just as a little bit of background, the Office of Science and Technology Policy and OMB have been very focused on getting every single department in government to have open data as well as common core metadata that they make available out of their data sets.
They have a quarterly report-in on your data catalog, your data quality, et cetera. At the last quarterly report, which was in May, HHS had a 68.7 percent metadata quality score on the Project Open Data dashboard. I can provide the link to that a little bit later.
This was mostly due to the fact that we actually were reporting in both HHS data sets as well as our state and local data sets. You can imagine that the states were not necessarily complying with the common core metadata standards at the federal level, therefore, diminishing the quality of the HHS metadata.
What we did was we stripped out the state level data and we reported only in the HHS data and that significantly improved our metadata scores. We are very much in compliance with the common core metadata requirements out of the open data policy. We are now sitting at approximately a 97 percent metadata quality score. But there are still improvements to be made on our metadata. We really do want to be engaged in a focused activity that is going to have discoverability of what is in the data set without actually having to download it and look at all of the columns, be something that is actively pursued across the multiple operating divisions. That is going to take a significant effort. But that is one of the things that we would very much like to focus on.
To continue, we just held a health data leads quarterly meeting. Just as a reminder for everybody, the operating divisions each have a health data lead. One person that I reach out to and am a liaison with across the various operating divisions, but you can imagine that with an organization like CMS or CDC with many different centers and divisions and other offices that can be a very challenging relationship to manage with just two people. Many of the different operating divisions have actually set up health data leads organizations of their own inside the operating division.
For example, our Administration for Children and Families set that organization up to have health data leads from each of its various centers. And then SAMHSA said that sounds like a great idea. Could I talk to ACF about how they implemented this? And now SAMHSA has health data leads that are representative of their various divisions. The open data workforce if I could is expanding across the department, which is very valuable to have as well.
What we are now doing is going around and having individual quarterly operating division health data leads meetings. Instead of having a large group meeting like what we have here where we discuss broad topics, we are now engaged in a process of actually having individualized operating division meetings where we are trying to understand what the data needs of various operating divisions, what their activities are, data that will be released, et cetera.
Another thing that we are focused on is the public access memo for research data. You may recall that this is sometimes referred to as the Holdren memo. John Holdren, who leads OSTP, basically indicated some time ago that any agency in government who has a research budget of over $100 million is required to create a public access plan for their research data. You can imagine that HHS has many different operating divisions that have a $100 million research budget. The ones that have to be complicit or compliant with this plan are the Agency for Healthcare Research and Quality, Centers for Disease Control and Prevention, National Institutes of Health, Food and Drug Administration, and we had a volunteer organization, our Assistant Secretary for Preparedness and Response or ASPR, volunteer to participate in the Holdren memo compliance with the department.
Just as a quick FYI, an update for you, we are exploring the formation of a work group that would tie together the chief information officers from across those operating divisions as well as the health data leads from each of those divisions so that we could have a more comprehensive exploratory set of individuals who are looking for the research data that can be made available out of that $100 million research budget.
We have also submitted a plan. OMB requires that each division submit a public access plan. Those plans have been submitted to OMB and, as you could probably imagine, they have been with OMB for quite a while. But we do not have any expected date on the responses back to that. I would love if somebody would just take a note for a future NCVHS work group meeting that perhaps we look for a public access memo update once we have received feedback from OMB.
Changing topics slightly. Let’s stick with the department for a second. Another thing that came up in the charge of the group that I think is pertinent here is the idea of exploring the value of the data and understanding the opportunities to make data available, not just in a retroactive fashion, looking back as to what we have made available, what should be made available going forward.
Another activity that I am currently engaged in with the health data leads from the various operating divisions is basically a mapping of the HHS strategic plan to the access to open data that we have available at the department. You may remember that earlier this summer, the department released its own strategic plan, which lays out at various levels our various goals for public health, for decreased costs, for improvements in quality, et cetera. It is a very narrative explanation of what it is that we would like to try to accomplish as a department.
One of the things that the IDEA Lab recognized was this is actually an opportunity for us to say where do we have data that is actually supportive of these strategic goals and where do we not have data that is supportive of these strategic goals. Those are some opportunities for us to make some data available.
With that said, I am currently working with our various health data leads to create the map of what the strat plan looks like from a departmental level as it coincides with the open data that we make available across the department. The goal is really to show the areas where the data is already available that meets the strategic plan, but also, again, to uncover the places that we do not currently have data available so that we can have a strategic focus on the places where we want to get data liberated in support of those goals. The goal is not necessarily to take every single data set on healthdata.gov and assign it to a strategic goal. Although, theoretically, each one should be tied to a goal. But there are a lot of data sets that are not necessarily of the highest value to the broadest number of people. We do not necessarily want to go through a super exhaustive process of mapping every single data set as much as looking for the strategic data sets and data systems that are supportive of some of our strategic goals.
Another thing that came up in the charge for the group is, as was said, the idea of proving the value of the data. There is an organization out of New York University called GovLab who worked with OSTP to develop what they call the Open Data 500. The Open Data 500 is basically an attempt to try to identify 500 companies that are using government open data as basically an integral part of what their business model really is. If you were to go to opendata500.com, that was one of the things that we asked you guys to have a look at. The idea there was that they are going to be identifying companies that are using open data as part of the secret sauce for their business model, but also trying to document the value of government open data as an opportunity and an economic value for the US economy or any component of the global economy.
There have been two round tables that the Open Data 500 NYU group has done with two other departments. They have already spoken with USDA. They have also spoken with the Department of Commerce. Let me tell you about the structure of the meetings. Basically, the meetings are set up to have individual speakers tell a little bit about the operating division that they work for in that department as well as a little bit about valuable data that they present. The roundtables are then broken out into a series of other more focused topic related discussions. You probably have figured that I am driving to the point that I am considering having an open data roundtable for HHS that will involve the various operating divisions. I would love for you to be participatory in some of those discussions in any of the focal areas that you see fit.
I have not yet decided what the format is going to be. Obviously, Department of Health and Human Services is a massive organization with multiple different activities. We could have 2000 roundtables. It is going to be challenging for us to figure out how to make this work. I think what we may do is get a consensus of some of the operating divisions that intend to participate and then allow them to drive the agenda. If they would like to focus on their data writ large across their department or their division, that is fine. If they would like to focus on something very specific like reductions in cost or improvements to quality in the health care system or social services and their availability across the nation then we may have roundtables, breakout sessions that are focused on those various topics. I just wanted to put that out for the group to consider in terms of topics that you might find valuable for a roundtable at HHS. I would be happy to take some input on that.
Finally, on a last point, during our preparatory call we were talking a little bit about the idea of examining some of the ongoing data activities at the department and actually trying to understand the entirety of the dissemination process for that data set, that data tool, you name it. The idea being that we want to really understand whether we are doing the best that we can with the dissemination of the availability of the data. Are our social media or our blogs or other kinds of things the highest value way to get the information about the data out there, and what are some of the other alternative ways that we could be really blasting the fact that this data is available and free for your use?
We were also interested in looking at some of the flagship surveys and programs or what have you and asking how the data should be made available and better and improving the usability of the data. Again, not focusing at a granular level at every single data set that we have available, but let’s take some of our higher-level information systems and do an examination as to what their usability of the data is. What the communications, dissemination is about the availability of the data and really trying to give some feedback to some of the various operating divisions about the data that they make available, the tools that they make the data available through, et cetera.
I am going to stop right there and see if there are any questions about anything that I have said, any ideas that have been sparked or what have you. I am going to shut my mike up.
DR. MAYS: Okay. How about tents up. Tents up and we will go around. Let’s start with Leah and then we will come around.
DR. VAUGHAN: Hi Damon. Thanks for an excellent presentation as usual. Question. When you are talking about public access to the data, are you putting an umbrella around it or is there perhaps an additional category to consider for maximizing and optimizing reuse of the data? On one hand, public meaning the open data, the open part that is seen. But I wonder whether there are also opportunities in what I would frame as open access to “qualified” researchers. It seems like a great deal of that infrastructure is set back 15 or 20 years too. It does not need that process. It does not take that much money to actually really blast it open to a lot of people who, I think, would make very fine use of the data and add value to the body of work.
MR. DAVIS: I think that those are all part of the examination of what is going to happen with the research data. I will admit to not being an expert on access to research data. However, I have heard many different conversations about how it is that the research community can bring greater value to it and actually be contributory to giving input as to how they would like to receive it and what data they would like to have so that you could have at least at some high level a demand aggregation from across research communities to say here are the top 25 things that we would like to have out of NIH. We would love to have them in APIs or what have you.
DR. VAUGHAN: I think there is a huge potential community that has been yet untapped. I would love to extend that conversation.
I guess maybe a piggyback to that would be in terms of looking forward. Sometimes what I have seen certainly in state and county level is kind of an automatic renewing of the old contractors. A couple of extra lines would have as part of those contracts that are ongoing making structuring so that there is an open piece and the formats are already baked in.
MR. DAVIS: That is a really good point actually. One of the things that we have been considering in this open data space is the procurement mechanisms even at the federal level that basically expand what it is that has been set out as a requirement for this project to include the open data piece throughout or at the end or wherever that appropriate place is.
One of the things that I have been really trying to do is build some relationships with the states so that they understand what federal resources are available to them. This is not just for the federal government to use. Many of the states are doing the exact same thing that we are trying to accomplish. There is a lot of value in continuing to disseminate the efforts that we are making at a federal level down to states and cities so that if nothing else, they do not re-invent the wheel, but secondly, that they can actually be very closely aligned with federal efforts so that you start to get this real ability to make connections between data sets, between projects across geographic areas and things along those lines because they have been following along in some way with what has been happening at a federal level and have aligned their state appropriately with that.
DR. VAUGHAN: That is terrific. I think there is, again, a lot of richness and potential there. Thank you.
DR. MAYS: Before we leave that, Wendy, do you have any thoughts or comments in terms of the research data? Has there been discussion within NIH? Are there any thoughts that you have about how to have this conversation?
DR. NILSEN: I would like to say I know of them. I know they are happening. I have not been involved in them. They are happening at a very high level. I know they are happening within the department. There are conversations between all the research agencies on this. NSF is talking about this too. Everybody is talking about this. As you well know, it is complicated and I think they are trying to think about going forward as well as how do you go back. Going forward seems to me more workable at some level than how do you go back. I am not party to the exact conversation.
MR. DAVIS: In thinking about the going back piece, there is a lot of really valuable data. There is research that has been going on for many years. The challenge is how you retroactively make it so that these people who are engaged in this valuable research retain the data. Even when the research grant runs out, what is the longevity of the maintenance of that data well into the future? That speaks to the retroactive piece.
DR. VAUGHAN: I think there is a value in that too with so-called “legacy data” as people start to retire and the memory and the context for that goes away with their desktop. I think there is a premium on some of those. What are the highest priority items?
DR. NILSEN: I was just going to say I could bring this back to — our big data group is one of the ones looking at this to really sit. I can bring this back as a part of the discussion in how do we leverage NIH’s resources to help release at least some of the stuff that is going back.
DR. MAYS: I think that would actually be great because there is the issue of before and after. Some of this has to do with as — I am sure Leslie would comment — has to do with what agreements were made with the subjects and how far we can go and what are IRBs. It is a pretty big issue, but I think the value of what could come from it would be good.
Going forward, we have been told that if it is at a certain amount of money, you have to do this. This is a really big piece. There are many cohort studies that are a potential. I think this is something that we probably can think about carrying forward.
DR. COHEN: Thanks Damon. It is always exciting to hear what is going on. Leah, you reminded me. I have one legacy data set now that I am retired that I need to transfer. I have vital statistics data back to 1842 from Massachusetts. I do not know that there is any historical database like this before.
MR. DAVIS: That is institutional knowledge we would be interested in having.
DR. COHEN: That would be great if it were on healthdata.gov. I keep it hidden away. It is encrypted.
To pick up on a couple of the themes, one is the connection with states. I was struck by your initial comments about the difference between quality metadata available on federal data versus available on the state data. I was wondering if another thing for you to work on or for this committee to work on is figuring out how to encourage and support consistent metadata standards across states. I think communities are beginning to use more and more state and local data. If we had consistent approaches for metadata tagging that would be awesome. It would help everybody out.
MR. DAVIS: Let me hit on that one first before you go to your next comment. It is a fantastic point and that is literally one of the things that I have been trying to drive to the states that I am in discussions with is the idea that at least the common core metadata are available on Project Open Data through the schema that they have created.
We also are trying to bring on an entrepreneur-in-residence through the IDEA Lab’s pathways for innovation. One of the projects that we are very focused on is laying the groundwork for making database linkages, creating the metadata that is going to allow some of these data to be much more efficiently interlocked into the future. It is going to be a long process. But I think the value in it is that obviously if the federal departments are engaged in some sort of core metadata creation for research data and social services data, you can see how that was easily templated into state and localized entities who are also trying to do the same thing. It is our hope that we can blaze a trail in creating some of these metadata, for lack of better words, standards in order to allow others to build on.
DR. COHEN: I think just models and recommended standards that are disseminated. The federal connection to all these data systems is right there. It is not going to require much money or resources, but just redirecting some of the existing opportunities that we have to really reinforce the consistent metadata across a variety of state and local data collections.
My final thought or comment is actually more of an ask because you were talking about the roundtables you are trying to put together around federal data usability. This is essentially what the focus of the October roundtable is from the community perspective. If you have any suggestions about federal folks who should be involved in this conversation, just please let me or Tamara, she is no longer here, or you can just email me and we will make sure they get invited.
MR. DAVIS: It sounds good. Thank you for that.
DR. QUEEN: I was just going to ask Damon. Have you all been involved with Sarah Potter’s group within ASPE? She is the team lead for the Strategic Planning System.
MR. DAVIS: I believe so. I do not remember all the names, but I believe that is correct.
DR. QUEEN: Because she is also looking at — there are no more strategic plans to come forward unless they have a progress towards measurable goals. There is an interest in ensuring that as the department moves forward that you always have something that could be measured to indicate progress on meeting goals and data. I am sure she would want to be in touch with you.
MR. DAVIS: I will go back and look at my notes. Her name is super familiar.
DR. QUEEN: There is a strategic planning system and there are entire data elements and what is going to supply the data for this particular action plan.
MR. DAVIS: That is a really good point. We focused on the HHS Strategic Plan as a whole, which has measurable goals in it. Those are clearly the places where the data will be most valuable. I hadn’t necessarily contemplated next steps beyond this mapping of the HHS Strategic Plan, but the disparities plan and other kinds of things along those lines will be valuable places to continue. That is a really good idea and perhaps we can talk a little bit further about what other plans we have out there that would require us to map some data sets to them. I think that is a valuable exercise for us to be engaged in.
DR. QUEEN: She has quite a lot of information.
MR. DAVIS: The open data policy requires departments to be more strategic in their consideration of their data and I think that this is a perfect example of that sort of more thoughtful approach to your data.
DR. ROSENTHAL: I was going to save this for a little bit later, but as long as you are talking about strategic plans and public access plans. We have heard about the future and strategy. Metadata obviously being a huge piece of that. What are your thoughts on basically visibility or frequency, not of regular sets, but even of the new sets? For example, Part D and Part B releases make the news. Everyone goes crazy. Are they going to be repeated? Is the data going to be updated at any point in the future?
One of the core things you asked is how do businesses or entrepreneurs use this data and why they haven’t been using it as much as we might like them to and say there are all sorts of wonderful things happening, but not nearly as much as other sectors to be quite blunt about it. One of the reasons for that is we do not have visibility. Should I build a system or a business on Part D or Part B if I have no guarantee or even any visibility that is going to be updated at any point in the game? Metadata being one piece like I have droned on and on about and good progress is being made there. The other thing I would like to turn our attention to and specifics of strategic or open data plans as well as broader for the committee. If you want people to use the data and build something around it, we need to be able to give them some visibility and whether it is a priority. I know obviously budgetary things preclude some of those things from time to time. What are your thoughts on visibility into frequency of updating or repeatability of these assets in addition to —
MR. DAVIS: I think that is incredibly valuable. What I think I am hearing you saying is if I am building a business that is based on Part B data and it comes out every January 1 or at least I hope it is coming out every January 1, the last one did, how can I have any kind of reliability from the department that has a schedule of data releases, the valuable data sets that actually have some level of for lack of a better word backing, some kind of subsistence that is going to make them —
DR. ROSENTHAL: Exactly. And whether it is a business or a community intervention. Right now, we can pull up every provider in Boston. I can do basic actual footprint analysis rather than contracted footprint analysis. Now I can do efficiency curves — back pain and run it down to severity. That is great. But with those great sets, I have no visibility and are they ever going to be repeated. There is nothing in the notes about that. We have given everybody the world. We just do not know if we are going to be able to use it for more than a year.
MR. DAVIS: I think that would be a great thing to have in our metadata about each of the data sets. What is the frequency of the report that creates it? What is the actual schedule for time after the report is produced that the data is available? And is it actually going to be made available every — annually? Are there quarterly updates? Some kind of visibility. I understand.
DR. ROSENTHAL: Obviously budgetary. It is only allocated for certain parts, but I know it is important so it is not just shooting it out and getting — how do you have regular updates of it?
MR. DAVIS: It would probably be challenging, but could it be valuable to have a data schedule? Is that what you are —
DR. ROSENTHAL: Anything on this thinking. It is just like we are starting to think about metadata for the first time in a real meaningful way and that is great. When you look at other sectors where they do commercialize this stuff and build public programs on top as well. You always find metadata. You find models or ERDs. And then you find something around frequency of it or something around is it going to happen again. Huge fan — the stuff hits. Great. Eight billion downloads. That is fantastic in terms of meaningful value creation for the market. If you want to get to a number, that is the great missing link. If you want to get to a number, is the data valuable in how we are defining value? There has to be some forward visibility. Is it going to be out there again? It is one of those esoteric things and I hate to bring it up in this context, but it is actually — important.
MR. DAVIS: I appreciate the comment.
PARTICIPANT: My question was —
DR. ROSENTHAL: Basically, everything except for health care — everything else with open data that has hit and a thousand daisies have bloomed, but it hasn’t quite worked in health care in the way we would have hoped it would have. Part of it is around sharing metadata, sharing basic models. As we have talked about before, it is not rocket science. You have an ERD when you are creating the file. Just put the thing up there. I can see it. To accomplish Bruce’s goals of sharing with the states is put the thing up there you are already doing if nothing else. You do not have to put a full standards committee to review it. No offense. You can do something quick and easy. Weather and geo-location are great examples.
MR. SAVAGE: One of the strategic plans that is out there that is coming soon is the Health Information Technology Strategic Plan. My understanding is it is going to be put out in draft form at the end of the year. Maybe first quarter of 2015 for public comment. The Office of the National Coordinator is supposed to be coordinating with sister agencies. It seems like it is a really good opportunity to try to get some things woven into a process that is both trying to capture data at the individual patient level, but also at the population health level as well. One of the things we have been pushing actually is stratifying quality measures by disparity variables in order to identify and reduce disparities. They have talked about how that is important, not only for that purpose, but also to go across group populations like ACOs. You have some fairly large opportunities there and some coordination with the National Coordinator before the draft comes out might be helpful.
MR. DAVIS: It sounds good, Mark. Thank you.
DR. MAYS: Do you know the timeline?
MR. SAVAGE: I think the draft is coming out in the fourth quarter. I am guessing the end of the year. There will be an opportunity for comment, if I remember correctly, from the HIT Policy Committee and the HIT Standards Committee. There will also be an opportunity for public comment on that. It may be staged in two stages. One is to try to get some workgroup advice on it so that it is even better when it goes out for public comment. I am not sure that I have that right though.
DR. SUAREZ: Is this the report on the interoperability?
MR. SAVAGE: It is different. It is a strategic plan and they have done it before.
DR. SUAREZ: ONC strategic plan.
MR. SAVAGE: It is not just for ONC. It is for all of the HIT efforts because that is their charge.
DR. MAYS: Walter, is that something that you want to know more about?
DR. SUAREZ: I know we received a report from you yesterday about October 15 ONC releasing publicly through a joint meeting of HIT policy committee and standards committee a report. I believe that is actually the ten-year report, not the strategic report. We might be talking about two different reports. The ten-year strategic plan or the ten-year plan that they published a couple of months ago has already gone through public comment review, process and all that. It is our understanding based on the report we heard yesterday that October 15 at a joint meeting of the HIT policy and standards committee, they will release what they called the interoperability roadmap.
DR. MAYS: I think this is different. What I would suggest, Mark, is that if you see the announcements and for whatever reason we haven’t if you could send it to Lily and Lily can make sure that we send it out and then that way Walter and others, you can be informed and decide whether or not it is something that you want to offer comment on.
MR. SAVAGE: I will take care of that tomorrow.
DR. MAYS: Thank you.
DR. GREEN: Vickie, I have a question for Susan and Damon. I need to ask your indulgence to frame it just a little bit so it will make sense. This is very stimulating what is going on in the room about where opportunities lie and how many are just burgeoning out. I just ran all over GovLab and Open Data 500. GovLab appears to be trying to connect people and Open Data 500 appears to be a study of 500 companies. We keep saying the word we. I do not know who we is. NCVHS represents seven decades of work on data. We are presently deeply committed to and interested and concerned with creating enduring infrastructures that allow for proper access and use of data to improve things particularly if they relate to health. We keep spotting missing infrastructures. We have digested and I think we have articulated this whole idea of data stewardship. We think that stewardship is probably foundational to proper use of data going forward. Where in the federal enterprise is the data steward? When you say we are doing this and we are doing that, what do you mean? I am not asking this for academic purposes or to be difficult.
I am asking this because the work of the working group needs to run properly through NCVHS to make a difference. We need to understand the setting, the environment into which this work needs to go and whom it needs to go to. We could do better work, I think, if we just have a better idea about where it is going to go.
DR. QUEEN: When I think of we at the department level, it is federal systems, federal data, federal stewardship. That is why the issues of what kind of standards we could recommend for states or localities that would be something where you would want to look at what authority do we have in terms of — even recommending a model, I just think of all the efforts that have gone in over the years with NCHS, working with states for birth and death certificates. There is a careful development of relationships on a state-by-state basis that have to be nurtured and a lot of collaborative effort that goes into doing that. When I think of the we with the Department of Health and Human Services when we are disseminating data, there are certain clear statutes, regulations, authorities, policies with which we must comply or under which we are operating. They may not apply when it comes to local community, any of the things that are outside of the purview of the federal systems.
When I say we, I am thinking of the relationship to the data schedules, NCHS, AHRQ. They certainly do publish their data releases. They let you know when they are going to be released. There is an established schedule. I do not know about CMS. Within the federal system, there are things that I know we are doing. We can certainly do a better job. There are things that we can improve and look to completeness. But beyond the federal level of data, that is where I do not say we.
DR. MAYS: I want to interrogate Larry’s question just a little more and to say the issue of data stewardship, who does it belong to? Is it that NCHS has its own and NIH has its own or is it that it is in the hands of ASPE?
DR. QUEEN: And I am not thinking of ownership as much as the stewardship issue ensuring that appropriate statistical disclosure techniques have been applied for data that you have privacy and confidentiality requirements. That is the data stewardship that I would be looking for. It is not ownership. That is probably a question — I do not know if it is philosophical. For the data-producing agencies that have been disseminating data for many years such as NCHS, there may be an ownership in terms of restricted data and concerns about access to restricted individually identifiable information, but for the many years of producing public use files, they are disseminating and making it available to the public when it is out there, it is out there.
MR. DAVIS: I was going to just add basically in support that it is a federated we. It is a we as a collective as a department, policy coordination from the Office of the Secretary with the secretary’s guidance as well as consideration for the existing policy climate, open data policy, privacy and security policies and things along those lines. And then as I see it, all of the individual components within an operating division who are making sure that that division is then compliant based on the different data systems that they make available. Again, like a federated — there are privacy officers everywhere and there are open data people working on specific projects everywhere. There is a lot of need for communication and coordination across each of the divisions for that very reason.
DR. GREEN: You cannot see this on my iPhone, but that is the open data compass on Open Data 500. One side of it is a list of federal agencies and the other side is business sectors. Look at the middle. That is what I am asking about.
MR. DAVIS: What do you view as the middle? I am not following.
DR. GREEN: On the compass, it has about 300 lines connecting each to another back in different places and going through there. It implies that that will happen in a federated model just by some sort of gargantuan natural force that just harmonizes things and puts it together and makes it happen. I like this. I like this a lot. I am taking up too much time here, Vickie. I really think this work group is over the target about potentially actually helping some things move in a constructive way. Use cases where we really see access and use blossom and do something that really matters. But I am very skeptical that it is going to happen in a durable way.
The two things that I got excited about here. One turns out to not really be defined exactly what it is. It is just efforts to connect. And the other is a study. I do not think NCVHS is interested in another study. We are interested in proper data policy that is durable, enduring, supports knowledge development for generations to come and that sort of stuff. I am wondering what you think about the conclusions NCVHS has already come to that we are missing proper data stewardship, enduring data stewards, and we need to know who the hell they are so they can be in charge of this thing, not in charge in the sense of dictating, but in charge in the sense of these business sectors know who to go talk to to get a resolution and they do not want a resolution five years from now. I will hush.
DR. QUEEN: What do you mean missing data stewards? What do you mean that we are missing data stewards?
DR. GREEN: I am saying there is a tyranny in absolute freedom. That is what I am saying. Proper data access and use will not tolerate absolute freedom. There are going to be rules for when these data are ready to be released. There are going to be rules for how you express metadata. You are going to use those rules or you are not going to play. Who will enforce that? It is not enforcement in the sense of sending someone off to jail. It is enforcement in the sense of enablement and allowing progress to be made, the deliberation of some standards and this sort of thing. It is not clear to me that there is even a concept that this is missing. Just the way you are looking back at me makes me —
DR. QUEEN: For statistical data, there is the Federal Committee on Statistical Methodology. They have produced working papers and guidelines for the techniques that have to be employed prior to data release to protect the data.
DR. GREEN: If I am at the Department of Education, is that where I go?
DR. QUEEN: Yes, it is all across the federal government.
MS. BERNSTEIN: There is a lead at the Department of Education. That person sits on a federal committee of statistical agencies that think about privacy and confidentiality across the federal government.
DR. MAYS: We actually saw that group come into one of our hearings.
MS. BERNSTEIN: But this group, I think, if I may, is also talking about administrative data and other kinds of things.
PARTICIPANT: I think you are talking about something other than —
MS. BERNSTEIN: I understand, but I am saying there isn’t one — if you are asking if there is one steward for all federal data that is being released, not really. There isn’t one person and one office that is —
DR. GREEN: Actually, I do not think there is one office. I do not think there is one agency.
MS. BERNSTEIN: That is right.
DR. GREEN: And I think for what — we are talking about big data. We are talking about for health purposes, we need the Department of Labor’s data sets to be able to be lined up with the Department of Education to be lined up with HHS.
DR. MAYS: I am going to take the chair’s prerogative because there are a lot of cards up and people are starting to wave and all that kind of stuff because they really want in. And we are starting somewhat to get into the later discussions. I want you to keep both your comment and your passion so that when we get to the discussion at 4 o’clock that you can also bring that back up.
Of the cards that are up, let me just figure out if you are on this point or if you have a different point. I am going to say if you are on the point, if you keep to the point and make it short because I am going to move to other points so I can get the other people in.
MR. DAVENHALL: I am curious as to where the Healthy People program, document, plan — I do not hear much about this. I would say to me that is like the national health plan. In that document, it outlines all the data that is required to measure what is planned. In one way, I would say that is one of the guiding lights here that if you are wondering about what kinds of data the department needs to go after or create or to reposition or to refine, they probably ought to start there. The question I have is I do not really know where that fits anymore. I do not know if this is just a program of the CDC and it is for CDC purposes or it is much broader. I would like maybe to get some clarification on that.
The other thing I would have to say is you need to figure out how you want to use the Health Data Consortium who envisions itself as being a broader spokesperson for both the business, the research, and I would say somewhat the non-classical, traditional data users that are coming into the fold. It is just an operational issue of how you want to fold them into your deliberations.
MR. DAVIS: I am going to hit the second point first. I actually, as Larry was talking, was thinking that the Health Data Consortium could be valuable in this space. I know that they are very much — they have outlined some programs and some activities that they want to focus on. But one of the reasons that HDC was even created was to help at least be the demand aggregator. I don’t know that they necessarily see themselves as being in potentially an enforcement role in the stewardship conversation that Larry brought up. I think you raised a really good point that the HDC is in position to at least in some way to be valuable in this space of moderating and stewarding the data’s utilization so that external organizations who are looking to use the data could potentially go to them for some of the insights that they might eventually get from the data. Agree with the HDC comment.
In terms of the Healthy People program, I had thought about that previously. Admittedly, it has fallen off the radar for various reasons. There is a multitude of activities. But that was something that I had in my early notes as an opportunity to figure out where can we do a data mapping that is going to be instrumental in moving a program forward that is going to be part of population health across the nation. I am glad you raised that again. I have noted it here. I will definitely go back and have a look to see where it is that we can make that be part of the value proposition of the data that we put forth.
DR. MAYS: Linda.
MS. KLOSS: Thank you. My question was triggered by Josh’s comment about the timing of data coming out and thinking about this more as a life cycle management of these data sets, which is of course part of stewardship. The question is are we ever retiring any of them and are there guidelines for that. I suppose taking a more life cycle approach both to when we bring it on and when we retire data sets and do that under the umbrella of stewardship.
MR. DAVIS: That is a really good question. It is one of the things that came up in the article is you can have tons and tons of data out there and our superficial number for how many data sets you have looks great. But if people are only using 15 percent of your data sets and some of them have been out there for ten years basically untouched, you do have to consider the cost of that ongoing storage, et cetera. That is a conversation that we are trying to have. It is not one that I have an answer to, but it is definitely part of what is on the radar is as we begin to really manage data across its life cycle, how is it that we actually say these ten data sets are definitely ripe for some level of retirement and archiving?
MS. KLOSS: It is stewardship at its core.
MR. DAVIS: I agree. Thank you.
DR. MAYS: We are going to let Leslie have the last question. We are going to close this out.
DR. FRANCIS: This actually goes back to what additional data sets, not from the feds, you are having in the same hodgepodge of data sets, which is just I think it goes back to Larry’s stewardship question. Those state or other data sets might or might not meet not only metadata standards, but have been appropriately combed to avoid revealing identifiers and a whole bunch — and I just don’t know how that gets handled.
MR. DAVIS: I think what I am hearing you saying is they may have some best practices at the state level that we could use at the federal level.
DR. FRANCIS: No. What I am hearing about is if there is an increasing push to — I do not care about Massachusetts data from 1843 because after all people have been dead — because dead people over 50 years do not have privacy rights. I start worrying about more recent Massachusetts data. My point is that —
PARTICIPANT: Some of those people are still voting.
DR. FRANCIS: I spent a summer working at the civil rights division of the Department of Justice reading voter rolls and obituaries. That was a really exciting summer.
But going back to just the whole question of how do you make sure that if you put other people’s data or data from other sources up there that it meets the same standards you would be requiring —
MR. DAVIS: That is an interesting question. Keeping in mind, healthdata.gov is a catalog of where you can find data. It is not the repository for the data. We are not responsible for the data set itself. It really does ultimately — if you go to healthdata.gov and you find Bruce’s valuable data set on people back to 1843, you have left a federal website now. This is just an opportunity for you to find the data resources that you need. I would push back just a little bit in saying that we do not necessarily collect and curate that data as much as say here is another opportunity for you to find some additional data sets. Why don’t you go to Massachusetts’ website? Once you are there, it is basically incumbent upon them to have the proper stewardship of the data from their state’s perspective. I am excited by the fact that we are putting state data in the catalog, but I also want to be cautionary in making sure people understand we are not actually taking a copy of that data for healthdata.gov.
DR. MAYS: Lily.
MS. BRADLEY: Going back to Larry’s point about data stewardship, I think it is helpful for folks to know outside of the federal government that everything we do has a — we are granted authority by Congress and that is supported by the president. There are laws that give us authority to run a public health program. Unless we are also given authority to collect information then we are not allowed to collect information. There is something called the Paperwork Reduction Act, PRA. It is one of our favorite acts that creates paperwork for us, but less for the public. Any time we collect survey data or administrative data, there is a supported form. What is really great is there is a control number on it. They are reviewed every three years. They are pretty much reviewed every three years. We have to provide a supporting statement that says why we are collecting it, why it is not a burden to the public. We cite which authority both either the public law or the regulations. We publish in the CFR, the Code of Federal Regulations, what our regulations are going to be. The more we are going to require anything of the public, the more it is that we have to invite comment. We have to allow 60-day notice, 30-day notice. We have to respond to the public comments. There is a lot of the data stewardship built in, but it is so hard to answer just because each kind of data may have a different authority because we have done so many different things.
If you just actually go back to the form and then to the law, there is a paper trail back to who controls that. For instance, I would want to point out that CMS has a — office of information products and data analytics. That was — now, Chris Cox is in charge of it. Within that group, they have a privacy review board. If you are a researcher and you are requesting restricted access data, you have to submit your particular protocol of what you want to study and which data variables you are requesting and they review whether or not they consider those things to be reasonable to be part of your study. If it is not reasonable, they reject and you have to review and revise. It is a long process. There are a lot of good things in place. I am not sure that that ultimately gets to your question.
But I do think that while we might not make it overly clear, it is there and we are just more familiar with the ins and outs of where the wires cross. Finally, on Healthy People 2020, there are actually 913 indicators in the Health Indicators Warehouse that support Healthy People 2020. They are released in a very nice API. There are two kinds of ERDs that are associated with the Health Indicators Warehouse. Josh, somebody was listening to you. It is both for how the ERD maps the indicators and how the metadata are mapped to each other.
DR. MAYS: I saw that you had your card up before and then you put it down. I am going to let you have the last in this round because you actually had put it down once.
MR. SAVAGE: Thank you. The discussion about uses and archiving reminded me of the distinction between basic and applied research. That there are sometimes where we just do things and we do not know what the results are going to be, but then we make discoveries and we are thankful that somebody tried it out. I do not know how applicable that is here. I do know that there are databases that were collected a long time ago that we start turning to, for example, in environmental things trying to see what happened to the planet. We are glad that somebody wrote it down, didn’t perhaps appreciate why it was being written down at the time. I just throw that out there as maybe something for people to think about that there may be basic data gathering and we do not know what the uses are. My background — I am an advocate. I am mostly thinking about how useful something is, but maybe there are some other ways to think about it too.
MR. DAVIS: So appropriate archiving such that you can document its existence, but it is not necessarily in the front of the card catalog anymore.
MR. SAVAGE: Appropriate archiving, but also thinking about gathering data that you might not otherwise collect because you do not have a vision now for how it is useful, but it could well prove to be very useful down the road.
DR. ROSENTHAL: Or you could blow the whole thing up and do the exact opposite of that: not archive it, but just put it out to the wind so you see every other vertical. We can save that for the 4 o’clock discussion. But all of this conversation is premised on top-down, which we started off the conversation by saying we should figure out what other sectors do successfully and how we might want to run this for the next three or four years. This part of the conversation has been, as in Larry’s discussion, that we are going to go by the rules, et cetera. I do not mind that. There are certainly ways to do that. We did that with the API. Who is going to do that? You require it in an act and say anything that gets federal funding has an API. Anything that gets federal funding has metadata. There are very specific ways to do that that we can answer, but that is all top-down. There is a whole other approach, which is not only out there but dominant outside of health care, which we can go through at 4 o’clock. I won’t take any more time. I just want to throw that out there as a markedly different way of thinking about this as well.
DR. MAYS: We are going to get to talk about it.
Can I get the other slides up, the two slides that I have? Part of why I was not overly upset about the time is that you are actually beginning the next discussion in many ways. I think there is some richness that is out there that I want to make sure that we continue with.
What we are going to do in this next time. Remember that at — I think it is — was it 3:30, 3:45 — when does Chris come? At 3:30, we will have a discussion of the NHIS data. What they are going to do is to tell us not all about the data, but how they disseminate the data. Dana was great at giving you the background of where all this comes from. Part of what we are going to try and do is have a discussion about what should be some of the guiding principles that you want people to think about in terms of the dissemination of the data, how it is accessed. Again, we want to try and come up with these higher-level principles.
We will listen to the person from NHIS tell us what they do. We will try and then come back and think about what things specifically we would want to think about sharing with them relative to the NHIS data. Part of what we want to think about is when we listen to NHIS, do we have some other suggestions? Is this the best it can be? But I want you to remember whom you are talking about here. As I say, these are imperfect levels that I hope over time we are going to develop more in terms of understanding whom it is we are talking about. There is going to be the first level where there are data builders, big warehouses, people that can deal with big data.
The second is to think about a level where we are talking about people who have some level of training and skills in terms of the data. What you will offer somebody at level one may be very different than level two. And then at level three, what we are going to try and also think about are the consumers. Remember, we were told to think about the consumers and patients. When you are talking about making the data accessible and usable for whom. As I said, these three levels are very imperfect.
Our goal over time is to get clear about who falls in these various buckets so that as we make recommendations to these agencies, at the end of the day you might say your data is such that it can only be used by level one and two. We think you have a real problem with three. You are just not a three person. Or it might be that they are just at level two. The researchers love you. You are perfect for the researchers. But even the data warehouses you don’t provide enough information. It is too costly for them to take your data set and try and link it and do all these other things. Researchers ask for money and that is why they are able to use your data set in this wonderful way. We need to help people to understand that their data may be accessible and usable very well by only certain groups. It is up to them. We cannot fix it, but I think we can give them some feedback.
I think HHS’ task is really to deal with three because getting data down at those other levels, I think the word is that that is where we should push to acknowledge if you are doing it or not doing it. I do not know that we can fix it. But we can for level three be very clear about — I was going to say restaurant ratings. In LA, we have A’s and B’s and C’s. You have 1’s and 2’s and 3’s. It is like saying it just does not work. That is part of the discussion that as you give suggestions, think about who it is that that suggestion benefits.
MR. SAVAGE: Vickie, can I ask a question? Where would you see HHS falling as somebody that is using and accessing. HHS itself or federal agencies in general?
DR. MAYS: I think it depends on which — that, I think, it depends on which federal agencies. If I did a comparison of the type of data sets and the audience for NCHS, it differs from the type of data and the audience for SAMHSA. I think those two have built some very different approaches in how it is they get their data out the door and who their primary users are. If you asked me about HHS itself, I probably would put HHS up at — HHS itself is probably very much one and two and some parts three. That would be my take, but that is my bias.
Agenda Item: Building a Framework for Guiding Principles for Data Access and Use
DR. MAYS: This is Josh. He sent us out some information in terms of talking about metadata. Part of what I would like to do now is start some listing of thinking about what some of the principles should be in terms of guiding our access or use of data. Josh has put us on the table in terms of thinking about metadata. We heard that from Damon. We have been embracing the value of metadata. Any others?
And then a little bit later, Bill, I am going to ask you to — they have your slides. I am going to ask you also to bring the slides up. At this point, they are facilitators. I specifically ask a subset of you to facilitate this discussion. The reason I asked them to facilitate the discussion is they have key areas in which I want to make sure that they look and make sure we have discussion about some of their areas of expertise. Let me turn it over to them to get the discussion started.
But what we are looking for in this discussion is starting to be a listing of some of the principles in terms of guiding access or use of data. Cards go up if you want to get started. Josh, they want you to describe that.
DR. ROSENTHAL: The glossary. If you close your eyes and pretend you want to know something about data, you go to some website or something like that and you see a file and it says PQRS 2014. The question is is there a way to find out information about that before you have to download it and dig through it.
And then secondly, if you actually do download it and dig through it, you will find very frequently — you will see — let’s say you are looking at it in the simplest thing you can, Excel, you will see something in a column and it will say A, B, C, 1, 2, 3. It will have a variety of numbers in there. And you might want to know what that thing actually is, what it represents, what the definition of the thing is. Often times in HHS data, you will find columns named column. It is a real example. You will find it all day long. That is not terribly helpful for figuring out what is inside the stuff.
This is a simple definition of what we might mean by metadata. You can also use a glossary, which if I want to get familiar with a field or learn something about the data itself, I might want to have it in a lexicon or a glossary. And then you can expand out from there. The relationship between the metadata elements. If I look at what is a payer to a plan. What is an HRR to a hospital? What is a group to provider? What is a GP to a specialist? I might have little arrows going up and down and saying groups and providers are all physicians and payers and plans are all other types of entities. I might build up a classification around that. That would be one of the ways we think about what we call an ERD or an entity relationship diagram. But knowing the domain or the array — knowing all the terms that we are talking about in a data set is one of the really helpful things that you tend to find outside of health care. Is that what you wanted?
MS. KLOSS: Could you reframe that as a principle?
DR. ROSENTHAL: If you list something somewhere, you should have the definition for that thing.
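The principle just stated can be illustrated with a minimal sketch. It assumes a publisher keeps a simple data dictionary alongside each file; the column names, the definitions, and the function name below are hypothetical and are not taken from any HHS data set.

```python
# Hypothetical data dictionary: column name -> plain-language definition.
DATA_DICTIONARY = {
    "provider_id": "Identifier for the clinician or facility that billed the service.",
    "measure_code": "Code for the quality measure being reported.",
}


def undefined_columns(columns, dictionary):
    """Return the columns that have no usable definition in the dictionary."""
    missing = []
    for name in columns:
        # A blank or whitespace-only definition counts as missing.
        definition = dictionary.get(name, "").strip()
        if not definition:
            missing.append(name)
    return missing


# Header row of a hypothetical file, including a field literally named "column".
header = ["provider_id", "measure_code", "column"]
print(undefined_columns(header, DATA_DICTIONARY))
```

A check like this could run before a file is posted, so that a field named "column" is flagged before anyone has to download the file and dig through it.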
DR. MAYS: Okay. Can I get one of my facilitators to call the table? I am going to actually let you all do that. This is to facilitate the discussion.
DR. ROSENTHAL: And the question is what are the basic guiding principles for data —
DR. MAYS: We actually started this discussion somewhat. And what we want are actual examples and to the extent that — and Linda did it so well — to the extent that you can list it as a principle. That is great. Because eventually what we want to do is to give people like here is a set of principles to think about in terms of your data set and then here is a set that is specific to your particular data set.
PARTICIPANT: Is this where you want me to jump in?
DR. MAYS: In terms of facilitating the discussion, not yet, in terms of the slides you have. Let’s get some of the discussion from the group and then your slides.
MR. CROWLEY: I will jump in. I want to build on the metadata discussion specifically now, but I also want to present something to the group at some point during the discussion. I found this article, “Socio-technical Impediments of Open Data.” It is an evidence-based approach covering the literature on what is usable in open data. What are the problems, based upon the literature, about six different workshops, and interviews with “expert” data users?
There are ten specific categories that make data more useful. Rather those are impediments to the use of that data. We can go through each of these later. One category as it relates to metadata and that is one of the ten categories, which we would expect.
PARTICIPANT: Was it number one?
MR. CROWLEY: It is number eight. To seed everyone’s thinking since I am designated as facilitator for part of this. I will go ahead and list all ten and then go to the metadata question.
The impediments that open data process currently encounters across ten categories. One, availability and access. Two, findability. Three, usability. Four, understandability. Five, quality. Six, linking and combining data. Seven, comparability and compatibility. Eight, metadata. Nine, interaction with the data provider. Ten, opening and uploading of the data. Across these ten different categories. Then there is a further sub-categorization. About 118 different things. We won’t read those now. This is one framework across these ten. Across each of these, there are some specific actions. We might even call them open data heuristics that we can apply to this problem.
One, when I think about metadata and the importance of metadata, we sort of bridge that to the issue of how we make consumers and those end users better understand these data sets that may have been generated by professionals or in scientific contexts or other places. One way that seems to be worth exploring would be to let those users comment and provide feedback on that data. This gets back to the social media and the tagging question. If you had experts tag ten data sets and then you had a community of users tag ten data sets, odds are you would probably get very different tagging across these two groups.
One thing that we should explore is how can we increase the tagging, the metadata provisioning by the community members themselves to help in the understandability, the usability. I will just put that out there.
DR. MAYS: Also, there are some cheat sheets at your place just so that you know. There is one about California public health data, which Bill sent. I think you had this as part of your homework ahead of time. And then there is another one here in terms of the guidelines for the OSTP data access plan. A lot of these will list some principles and the kinds of things. It is very much the way that Kenyon is talking about as well.
MR. DAVIS: Is your list of ten in our package of information here?
MR. CROWLEY: I do not believe it is. I will forward this article —
DR. COHEN: Kenyon, I really liked that last point. Metadata for whom. Because I can see communities being interested in very different things than an app developer in terms of descriptions of what is important about the data. I do not know how to distill that into a principle in terms of what metadata should be universally provided, but I like the notion of tailoring or thinking about metadata from the perspective of different potential users. Findability and usability certainly are characteristics that differ by your background and your knowledge.
In our previous conversations with communities, it was incredibly striking about how little some folks knew about data that I would have thought were readily available. It is just because they do not live in the same space that I do. How do we connect them? What can we do to make that space more accessible to them by providing information about the information and where we locate that information so that they can grab it more easily?
MR. CROWLEY: Indeed. One thing your comments made me think of — if we want to see what do developers think or what do consumers think or what do others think, you can categorize those tags by those different subgroups. If I am going to build an application, let me see what all the developers say about this data set. Instead of going through everything to consumers and others, it may not be relevant for me. I think maybe finding that framework to segment by specific community member types might be something that we can think about as well.
DR. ROSENTHAL: This sort of gets into the top-down versus bottom-up. What you find in a lot of other areas is there is some sort of official top-down representation. But then in terms of bottom-up, if you have it classified by user types, novice to expert or if you have particular community types like we have three of them here starting out with those types. Any time you do a bottom-up kind of learning center or tagging or even ratings, if you can basically cut it by those types and say I am looking at a data set. Is it five stars in general? Is it five stars for the expert? Is it five stars for the community members? Is it five stars for the developers? These are axioms that you tend to find in other areas. If you were going to frame it into a principle, you could say in order for usability and findability, you should have both top-down and bottom-up mechanisms. Bottom-up mechanisms is a corollary. It should be defined by community types. By the pre-existent or naturally occurring would be one way to frame it.
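The idea of cutting bottom-up feedback by community type can be sketched as follows. This is only an illustration under stated assumptions: the three user types, the data set name, and the ratings are hypothetical, and no existing HHS rating system is implied.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical feedback records: (data set, user type, star rating from 1 to 5).
RATINGS = [
    ("example_dataset", "developer", 5),
    ("example_dataset", "developer", 4),
    ("example_dataset", "researcher", 5),
    ("example_dataset", "community", 2),
]


def stars_by_user_type(ratings, dataset):
    """Average the star ratings for one data set, overall and per user type."""
    grouped = defaultdict(list)
    for name, user_type, stars in ratings:
        if name == dataset:
            grouped[user_type].append(stars)

    summary = {user_type: mean(values) for user_type, values in grouped.items()}
    all_values = [stars for values in grouped.values() for stars in values]
    if all_values:
        summary["overall"] = mean(all_values)
    return summary


print(stars_by_user_type(RATINGS, "example_dataset"))
```

In this made-up example the overall average looks healthy while the community rating is low, which is the kind of gap a single general score would hide.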
MR. DAVENHALL: There is also some fundamental metadata where it would not make any difference whether you were a rich person or a poor person. It would be like having a hotel and saying we need a rating system for rich people and we need a rating system for poor people.
DR. ROSENTHAL: That is right. Bill, that is why I said top-down and bottom-up usually having both of them. In most systems, you have top-down, which is fundamental agreed upon units and that is what HHS does really well. That is committees. We can hash that out and do that. But then underneath it, if you want bottom-up, the other side of the system represented, this is the way you typically do it.
MR. DAVENHALL: Like hotels, they ask you now about your rating needs. Are you an elderly couple that is traveling or are you a young active one? And then they are differentiating. That could be the same thing as top-down and bottom-up. By the way, I am the older couple traveling.
DR. FRANCIS: I wanted to just tuck in building on what Larry said and something Linda also said to me although I don’t think to the whole group, which is we are really talking here about improving the usability and user friendliness of data. We are not talking about a general stewardship framework, which is a really important distinction. When I look at that title, essential principles guiding access and use of data, I get confused on that point. Maybe Linda has a better way to put that.
MS. KLOSS: I will offer one suggestion. I think when you develop principles like this, you need some framing mechanism. We are saying across this whole continuum of stewardship, we are focusing on access and use principles. I think that makes sense. It does not say that we are covering the whole universe like origination. I think that works as long as we keep it in that. There are probably going to be times when we back up and say there needs to be some principles like we did with our stewardship principles. What is the purpose for this data? What was the original purpose specification? Because that may be very useful to whoever is accessing and using it although it may not be a principle of access to use, but having that knowledge of other dimensions of the life cycle of that information could be useful.
I think you can drill down under access and use, but there are going to be times when things come out that aren’t really access and use. There are other parts of that. Life cycle stewardship. You may find that you need to broaden this question to principles — in which case then I guess I would say maybe there is a usefulness of the framework of stewardship that we just have been agonizing over at NCVHS with some ways to categorize some of these concepts that go beyond strictly the guidelines for access and use.
DR. FRANCIS: The reason I put in something there about use that was worrying me was data repurposing — there are some limits on when it is okay and when it is not. Some data can be repurposed however. Other data maybe not.
DR. QUEEN: Let me just ask a question for clarification. How to spell assume. I was assuming that in this discussion we were talking about data that would be disseminated to the public. For HHS, it is data that we are making publicly available. In terms of repurposing, it is a public use file versus when you are obtaining restricted data for which you can use it for this project. If you want to use it for anything else, you have to go back to the privacy board and get a re-review or make a new request.
MS. KLOSS: I really was not putting limits on it. I was just saying that there are going to be times when a guiding principle might get us into other parts of the life cycle where you need to know something about how is this data defined.
MR. DAVENHALL: Do you see a stewardship? I guess I would put it another way. Do you see that data access and use is an attribute of stewardship?
MS. KLOSS: I see it as one of the building blocks. Absolutely.
DR. MAYS: Bill, do you want to share because you actually have some slides, but if I can get you to do it probably in ten minutes rather than — because there are lots of slides. Can you say what you were thinking in terms of your slides?
DR. COHEN: I was wondering whether it would be helpful to actually take an example or some measures that we can think about. That would help me understand. This discussion is getting a little too theoretical. I would like it to be more practical. Say we took a measure, let’s say teen birth rates or smoking. At the community level, they want to see the prevalence. A researcher might want to see the sample design and the way it attached to that came up with it. Someone who is trying to build a database might want aspects about the data collection. I just need something more concrete, some substance to tie this discussion to. It would be helpful for me.
MR. DAVENHALL: What you are going to look at here is just some ideas about how to rein in the chaos that we seem to always circle when we get talking about this. Let’s go to that next slide. Another framework just to explain what a W4H2 is. It is just answering these questions. These come up over and over again. Who, what, where, when and how much and how many? We keep asking ourselves these questions. It all depends on what you are talking about.
Down at the bottom and I would say just — I don’t mean for us to read these now, but if you start to go through there, you will see it makes a difference whether you are talking about poverty or disparity when you go down this road.
DR. MAYS: You have these also.
MR. DAVENHALL: This could be easily a world factorial. If you get a couple layers deep in, you are lost. What I did was ask: where do we get this data? We get it from transactional systems. We get it from inventory data, utilization data, and cost data. Most of the people probably in this room know where some of this data is hid or hidden from view. We use the same data over and over again. You get to number seven and you realize there is not a great smorgasbord of data to select from particularly if you want to approach it for every community in America. While we have 1600 data files out there by HHS, you go through there and you can ask that one question. How much of this data is available for every zip code in America? You quickly get down to a handful of files that you could even touch. It is an academic question about whether we have a thousand or 1600 because it boils down to what I believe are just a very small number of files that represent to me the greatest stewardship opportunities in the federal government. Obviously, CMS and National Center for Health Statistics, I believe, are the two greatest sources of most of the data that we continue to use across the continuum of answering the questions of community health and human services.
It is not like there are thousands of files. There are a few, like original equipment manufacturers. There is one data set. Take apart the payment files. A, B, D. What we do is we all get into the business of refining these data sets and these data sets sort of propagate the world with other kinds of data sets. But in all reality, they are all drafting in behind these two great data stewards, the National Center and CMS. I am sure there are many that I have missed. If you try to force the issue, what is going to have the greatest good for the greatest many? You are going to come back to these two data sources that are replicable. They have been coming out over and over again with the exception of the part B, which Josh mentioned. He does not know when the next version might come out. There is no indication of whether that is a renewable resource or not.
DR. ROSENTHAL: That is just an example of the part D and part B where they made the news. They got greater fanfare than almost anything else put together and there is no indication of any forward plan with either of them or in the example I sent out yesterday about hospital mistakes. We made the data available, which is fantastic. The question is is available available this one time or is there any priority going forward? It does get into a bit of the stewardship, but is primarily making that in reference to are you able to create a value proposition from it. That is the key question you need to be able to answer.
MR. DAVENHALL: Remember that little problem we had in the healthcare.gov website where physician specialties were screwed up and locations of where doctors’ offices and practices were and then how about in the part B when one physician was billing for 25 other ophthalmologists in the same clinic. This has everything to do with stewardship of data because if you had stewardship principles you were following, no agency would ever allow data that was that inaccurate to see the light of day. That is where I would say this is where the stewardship principles have to be in play. Otherwise, what is happening in the world that I see today is a lot of this data has been static data used by the research community in which it was not important exactly when that research would be concluded. It is now moving into operational use where people are in real time connecting to this type of data.
It is like if a health care provider this afternoon moves their office practice to a different location, there are people building systems now who are actually piggy backing on that operational database to give maps to patients as they leave the offices of where that referral is located. You can begin to see the demands on our data — this is why we have this committee. We need to tell you that these things are beginning to occur that suddenly now what was good for research isn’t going to be able to satisfy the operational requirement. And the level of precision and accuracy at this data, the stewards have to begin to change.
This is something to think about. Every time you bring up about what data you are talking about for what purpose, what are you trying to measure and then try to thread your way through that so that you stay on focus. That was just explaining what each one of those lines in that framework looked like.
We ought to define what a high-value data asset is and begin to distinguish this from low value. If we are going to spend all our time talking about one state data set, California’s data set, that is not going to do the other 49 states much good. That to me would not be a high-value data set for this community to spend a lot of time on. Now, if we got to where there were 25 or 30 states that had a similar kind of situation, maybe it would turn our heads.
This is a new file. It does not actually exist today, but it could easily exist. I called it the CMS payment and use public use file, generated from existing files. If you take a look at that, I have pushed the envelope and said if you just added this phrase to it with the caveat that all confidentiality rules apply, and we go to the Census Bureau to get their methodology, since they have done a very good job of suppressing low-cell frequencies, you can meet that test.
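The suppression of low-cell frequencies mentioned here can be illustrated with a minimal sketch. It is not the Census Bureau methodology, which the discussion does not describe. The threshold of 11, the zip codes, and the counts are assumptions made only for the example.

```python
# Assumed minimum publishable count; the actual threshold would come from the
# confidentiality rules that apply to the file, which are not stated here.
SUPPRESSION_THRESHOLD = 11


def suppress_small_cells(counts_by_zip, threshold=SUPPRESSION_THRESHOLD):
    """Return a copy of the counts with small nonzero cells replaced by None."""
    published = {}
    for zip_code, count in counts_by_zip.items():
        if 0 < count < threshold:
            published[zip_code] = None  # suppressed: too few cases to publish
        else:
            published[zip_code] = count
    return published


# Hypothetical zip-code level counts.
raw_counts = {"00001": 250, "00002": 4, "00003": 0, "00004": 11}
print(suppress_small_cells(raw_counts))
```

A production approach would also need complementary suppression, so that a hidden cell cannot be recovered by subtracting the published cells from a published total. That step is left out of this sketch.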
Now, I don’t really like zip codes that well from a geographic point of view because they are not historically consistent. But here is the problem. Everybody lives in one of these and you know what it is and it has great ability to communicate descriptive differences and variability across the country. I will just say if you want to go to not necessarily the worst case, but the best case, you would force that. That principle would be if this does not have adequacy, if this does not meet the needs of every American community, why are we even discussing it.
And then you have all these various categorizations that have occurred from the professionalization of the data that would allow us to start to step through this data in such a way that people get usefulness out of it. You may want it by every ICD-10, but who could stand that data load? You can take advantage of some of the categorization that has happened and start to put out these files at a national level by zip codes in some sort of fashion where they are immediately usable.
Now, there is another thing that is happening here, which I call an analysis ready data. I actually have another name for it called smart data. This is where you combine two pieces of data together to create a new piece of information and save the poor soul from having to link it on their own. Most of the files that we get from our federal government now require extensive data linking. They will have a DRG code and then you have to go find the DRG literal if you want to show that to a mere mortal. If you think about all these files we have created in this field in terms of usability, it is long off into the future. You have to have a lot of computer knowledge and resources to be able to link all this data together so you can have some meaning. Think about the provider numbers that CMS has used over the years. How many different numbers do we have? There are four or five. We have UPIN numbers. Any doctor could tell us.
They give you these numbers in the files, but you have to be a student of this to make use of that data. There — for steward. You have this responsibility to make it understandable in clear English. That is what this means about analysis ready. Decode the information. Make it smarter. Don’t make people go through all these hoops.
MR. CROWLEY: I just had a comment there. If you think about analysis ready, analysis ready for a consumer might be an Excel file. Analysis ready for a researcher might be a Stata or SAS file. If you talk about the concept analysis ready, that needs to likely play in.
DR. COHEN: For a consumer, it is more likely to be a dashboard than an Excel file.
MR. DAVENHALL: It really depends on what you are trying to do here of course. If you want usability, you have to start answering those kinds of questions. If you expect consumers to use this information intelligently, think about the price of gasoline. Think about who collects that, how it is collected, and how the various different levels, levels one, two, or three, use it. You get down to the point where I just need to know on my OnStar. That is an operational use of the price of gas that I can get in my car as I am driving down the road. And if you follow that back, you see that we have the same kinds of issues. At each level, you have the professional who actually gets into calculating. What parts of the country are experiencing what kinds of increases in the price of the gallon of gas? Then the people that do the forecasting of what it is going to be like on Labor Day and you begin to see we have the same issues.
In our case, we have never been really focused in on following that line all the way across satisfying all the potential users that could be using that piece of data.
DR. VAUGHAN: I think just to pick up on a bit of the thread as well to not pre-judge by the way in which we publish the data who the users might be. While it is very fashionable to dis on a PDF, for consumers, that is maybe what they want and need.
For a researcher in public health to say there is an API for the most part is what are you talking about and yet they are critical users of this data. It is inexpensive. And in the rush to publish as an API, which is a changing ecosystem in and of itself, it is inexpensive to publish it in multiple formats. That is an easy fix that could open it up to a whole other universe.
The other point that I just want to double back on when you referenced the Census and all their fine work is also to maybe include in that conversation the amazing work of FGDC, which has been a collaborative across many disciplines. Federal Geographic Data Committee. They have done such fine work across departments, agencies, public, private, nonprofit to really hash out a bunch of these issues already.
MR. DAVENHALL: This first data set was answering the question what is going on in our community. Who, what, where, who, how much and answering all those questions. There is a data set.
Then we go to number two. Health service demand forecast. The next thing most communities like to do is figure out what is it we need and how much of it do we need and where do we need it. All those same questions again.
Here you have the steward of the National Center for Health Statistics who have massive amounts of data, sampled data that now could be constructed in which our statisticians at NCHS could begin to model this data so communities would not have to figure out how many AMIs they are going to have in their community. We have enough data now to know that there are use rates for every possible ICD-9, DRG, MDC, physician office visit by subspecialty that exist. And the problem is we just have not envisioned that that is something that we could do with our data and give every community in America the ability to answer the other half of the question, which is how much of this is going to happen in our community. Obviously to be adjusted by experts who come in and say the reason why we are not going to see that many AMI is we have this very successful cholesterol-lowering program in our community.
While all-payer claims sounds like a good thing, we sit already on a mountain of data that we already are supposed to be stewards of that we are really not exploiting to the full extent possible. Just an idea of how we might do that. If you start to do this kind of thing like age, sex, and race adjusted, think about from an analyst point of view, you do not have to try to do this yourself. I would say if errors are going to be made in trying to estimate the demand for a health care service, they are going to be made here because people are not going to take that into consideration when they do that.
It is almost like you need a PhD, a biostatistician to help do this. They are not cheap and they are in short supply. Someone has to take the responsibility of building those kinds of methodologies at a national level. Again, the analysis ready means that we would decode this.
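The demand forecast described above amounts to applying stratum-specific use rates to a community's population counts. A minimal Python sketch of that arithmetic follows. The strata, population counts, and rates per 1,000 are invented for illustration and are not NCHS figures.

```python
def expected_events(population_by_stratum, rate_per_1000_by_stratum):
    """Sum population * rate over strata (for example, age and sex groups)."""
    total = 0.0
    for stratum, population in population_by_stratum.items():
        total += population * rate_per_1000_by_stratum[stratum] / 1000.0
    return total


# Hypothetical community population by (age group, sex).
community = {("45-64", "F"): 2000, ("65+", "M"): 1000}

# Hypothetical national use rates per 1,000 people for one condition.
national_rates = {("45-64", "F"): 1.5, ("65+", "M"): 10.0}

estimate = expected_events(community, national_rates)  # 3.0 + 10.0 = 13.0
```

As the speaker notes, an estimate like this would still need adjustment by local experts, for instance where a prevention program is expected to lower the rate.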
You have worked with the Health Interview Survey, but this is a massive amount of data that I am afraid that sampling gives it a bad name because people think since it is sampled data, maybe we do not need it anymore. But I would argue with you. This is probably one of the greatest assets that the federal government and DHHS have. It just needs to go to the next mile.
Number three is — I call it the NCHS national low birth weight public use file. There are three states that now produce a zip code file of all the numbers of low birth weight babies that are born in their states. We reveal that to the public within six months. I said there is an easy asset to capitalize on. Have every state in the union step that up so that we get it within maybe a week.
Part of the issue that I know our committee has had over time is we live in a world of real time. We get drug into sitting here thinking about things that we — if every birth is electronically reported 99.9 percent, why do we wait 18 months before we know how many low birth weight babies are in Larry Green’s zip code that he lives in. Why? What is the answer for that? A good steward would tell us what the reasons are for that. And then we as a group of people could decide whether in fact that makes sense. As Joshua would say, it does not make sense in any other industry. If you had that data last hour, you should have it the next hour. It should be immediate. There are all sorts of caveats I am sure that Bruce is going to tell me.
DR. COHEN: I will only name six or seven of them. First of all, it is illegal for many states to provide that information.
MR. DAVENHALL: That is interesting all by itself.
DR. COHEN: There are variations in state regulations around release of data at that level.
You are right on in terms — I think there has been historically this desire to make sure from the data collectors. I do not know if they are actually the data stewards. To make sure every file is 100 percent complete and 100 percent accurate before you release any information. I do not know whether this is a cultural phenomenon that needs to be overcome, but I think people are reevaluating what it means to provide provisional and preliminary data and provide folks access with incomplete data that can be used for a variety of purposes. I see this changing at the federal level. NCHS is certainly committed to more timely data release. And many states would like to be able to do it, but they just do not have the resources to be able to accomplish essentially what you are asking.
There is the impediment of the law. There is the impediment of the resources. And there is the impediment perhaps the most egregious impediment is the politics of this situation releasing information at the local or community level without it going through the proper chain of command to make sure that we can respond to the vagaries in the data.
None of these are insurmountable, but if you ask why it takes 18 months to get the data that are electronically transmitted from a hospital to a central registry, there are reasons. It does not mean it cannot be sped up, but carefully.
DR. MAYS: Two cards went up. Let’s try and get them.
MS. KLOSS: I really like your concept of illustrating new high value. I think this discussion teases out something that might be a potential principle. But this notion of fit for use. If the data is released today without complete scrubbing, it is fit for certain levels of use, but not others. I think it is possible to describe that as a principle. There is always this tradeoff between timeliness and accuracy. That is something that can be attached to a data set and be a principle.
As you went through there, I heard five of these kinds of principles emerge. This fit for use. Provenance. Where does the data come from? I think that is something that we probably should articulate as a principle. Confidentiality. Where it applies. The small numbers issue. Metadata. And this quality timeliness cutoff. I think what you are asking Madam Chair or Dr. Chair is very reasonable. This illustrates some real good areas of principles. It could be articulated in a way that does not limit.
MR. DAVENHALL: There is this concept of operational usefulness. For example, you have to decide whether knowing a baby’s low birth weight has operational value to society. Just like you have to decide if you are in social services whether a hotline call on child abuse is an operational principle. In other words, are we required to act within so many minutes or hours or do like my state Arizona where we have 6000 that are over 18 months old of hotline reports of child abuse. I would say that to me is a problem of stewardship. It is like there is no stewardship principle there that says that each one of those needs to be handled in a certain kind of fashion with certain kinds of ability to audit and check it out.
I would say we need to expand and be more inclusive with the people who might use something like a piece of birth data from an operational data point of view. If people who are into nutritional support of low birth weight babies, if you think that is an important long-term goal of society to improve the health of children then you have to give people the information that they operate on and discover what those variabilities are. That is what I would argue, Bruce, that in some of these cases, I think it is more cultural. It is like let’s make sure people understand what the operational importance is of these data and then the rest of it will be a conversation they can figure out.
MR. DAVIS: I just wanted to ask a quick question with regard to the provisional use of the data. Does anybody know of a methodology for saying it is good enough? Let’s put it out there now with the indication that it is going to evolve over time. Here is what we have done so far. In six months, you are going to get these two additional valuable components of this data. How do we make the determination as to when it is in fact good enough for use when in fact we have already said don’t try to determine all of its uses because you never know how somebody is going to use it. How do we balance that?
DR. NILSEN: Government does this all the time — labor statistics. They are always changing.
DR. VAUGHAN: Exactly, but that is my point is how do you do that.
PARTICIPANT: On a narrower timeframe, USGS, when they are reporting out an earthquake, their first numbers are actually provisional. They have a whole system of how that is reported out. Similar for commerce, which is a longer time horizon, but in terms of actionable quick turnaround stuff, probably USGS does it best.
DR. MAYS: I thought that was either — I cannot remember if it is Medicare or Medicaid has recently decided to give us provisional data. They have a set of what you should and shouldn’t do with it. I think they tell you very specifically which analysis are not the best to use with provisional data and all else is fair.
MR. DAVIS: Open FDA has done the same thing with their APIs. What I am wondering is — here is where we cut.
DR. COHEN: There is a lot of development in this field about definition of what provisional means and what preliminary means. The way I think of provisional is a real time up to this point in time. I think of preliminary as files that are almost complete and that are useful for a broader range, but have not been completely scrubbed and then final data that is generated for use as complete data files. They are deterministic. It depends on each particular file. For vitals, it might be the percent of all births or deaths that are reported. The percent of deaths that have cause codes established so it is based on the completeness of the file and the number of records. Where there is an inter-jurisdictional exchange agreement, some states get all of their in-state data a lot before they get data from surrounding states. I think each data set establishes criteria for what it considers provisional and its use. NCHS and the states are beginning to have these conversations.
One approach for the need for real-time data is the emerging idea of the last 12 months available because year-to-date does not work because of seasonality and small numbers and how data get transmitted to the central repository whether it is the state registry or the survey entity. If you take off the 13th historic month and add the most recent month, you can get a string of 12 months that address a lot of the data so you will have virtually real-time data, but there will be enough data to use to make stable estimates. I think there is a lot of work that needs to be done about how you generate statistics around this concept, but I think it is gaining more credence in the field to produce more real-time data. This is a whole topic for the national committee or for this data workgroup to get into and how to evaluate the quality of data, different points in time to be used by whom.
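The "last 12 months available" idea described above can be sketched as a rolling window: each time a new month arrives, the oldest month drops off and the newest is added. The Python below is a minimal illustration with invented monthly counts. The function name and data layout are assumptions for the example, not an NCHS or state method.

```python
from collections import deque


def rolling_12_month_totals(monthly_counts, window=12):
    """Return (month, total) pairs, each total covering the most recent `window` months."""
    recent = deque(maxlen=window)  # the oldest month falls off automatically
    totals = []
    for month, count in monthly_counts:
        recent.append(count)
        if len(recent) == window:  # only report once a full window is available
            totals.append((month, sum(recent)))
    return totals


# Invented data: 10 events per month in 2013, then 22 in January 2014.
counts = [(f"2013-{m:02d}", 10) for m in range(1, 13)] + [("2014-01", 22)]
totals = rolling_12_month_totals(counts)
```

Because every total spans a full year, seasonal swings largely cancel out and small monthly counts are pooled, which is the stability the speaker is pointing to.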
MR. DAVENHALL: I would just like to say I would piggyback on Larry’s issue about who is the grand steward in the middle is that a group like this needs to weigh in to determine whether such a data set as low birth weights is a, I use the term loosely, business imperative for government. It means start to look at what we are trying to do as a government and say this one is going to have a higher value to the entire population. If we are unhappy with the status of the way babies are going or children are being abused and so forth, we together as a data steward say okay. This has a business imperative as an attribute of stewardship. In my personal opinion, while we collect all this data for statistics, birth statistics are not very valuable in the operational sense of improving the lives of children and infants period. Historically maybe, but not on an ongoing basis. Who is going to bell that cat? I think that is the question Larry was rhetorically asking. Whose job is that?
DR. FRANCIS: I just wanted to go back to what Linda was saying about fitness for use because there isn’t a one size fits all answer here. How quickly you need the data, how accurate it needs to be depends on what you are using it for. To use your example of a hotline, with a hotline you want somebody going out there maybe. You need identifiable data and you need it stat. On the other hand, changes in the frequency of low birth weight, it may very well be that you have some pretty good longitudinal data about how quickly these kinds of things change and which way data are trending. It is the need to spend a lot of money or confront state laws or whatever you need to do to get it very close to real time is probably much less there.
What I see here is that there are all kinds of different reasons people want data. Everything from producing an app where people can find doctor’s offices in real time. That you want real-time data. On the other hand, looking at trends of numbers of physicians in a community, maybe six months’ lag isn’t so bad if the way things trend tends to be relatively stable.
DR. MAYS: We have about 15 minutes before — I just want to make sure we understand how much time we have left for this discussion. We have about 15 minutes left before we have to move to the next discussion. In terms of comments, let’s get them in and particularly if you could say what the principle is. That would be helpful as well.
DR. KAUSHAL: I have two principles. Number one, again my bias. The intellectual capital or the volume of the sheer intensity of it will always be greater outside the committee rooms. There are 15 to 20 of us. We are nothing compared to the millions of people outside. That is one principle.
Number two, which is the next point on afterwards. I find it very difficult to come up with a hierarchy of value of data here. I think we can understand some use cases. But the best use cases are always going to be figured out by those millions of people outside. Much of the discussion has been around the encyclopedia thesis and I think we now live in a world where Wikipedia is out there. Decentralization, commoditization, and transparency will create way more value than 15 of us sitting in this room.
DR. ROSENTHAL: That was the top-down versus bottom-up. Can we say as a principle there should be some access for not only generating metadata from bottom-up, but for actually use case from bottom-up or usage from bottom-up that we shouldn’t value the data on a preset hierarchy based on what we think the market will value, market beyond business, market of communities, intellectual square, et cetera? Can we also say something to the preliminary versus other types of views like principle? Should we err on the side of getting it out earlier than later with the appropriate caveats whatever that typology may be?
MS. KLOSS: I would think any data set should have some estimate of its accuracy. Confidence intervals, however, it is expressed.
DR. FRANCIS: We need to be careful because certain forms of inaccuracy can be really scary.
DR. ROSENTHAL: Anything we are saying as general principles like absolute. Let’s have a principle to caveat every principle. I am serious. That way we can just get —
DR. FRANCIS: I am serious about how some inaccuracies could be really scary if you don’t — they are dangerous. It is not just — suppose somebody released information about five cases of Ebola in New York City or five suspected cases with a caveat.
DR. ROSENTHAL: And there is over precision. We could go through a list of 12 of these things for any one of them. All things being weighed. I am offering. My suggested principle would be erring toward the side of getting out early with accepted — some sort of scale. I would call it confidence interval, but whatever you want to call it.
DR. MAYS: I might call it benefit and harm, something where it is like — does this outweigh the harm that it might do. Is there a benefit by issuing a warning based on so few cases? I think CDC does this. One could turn to CDC for when they do this.
MS. KLOSS: What about a principle of where the data comes from? Just the provenance.
DR. MAYS: Can you talk a little bit more about that what it would actually be?
MS. KLOSS: I think we have great examples here. This comes from birth certificate information. Where did this data get gathered in the first place?
DR. ROSENTHAL: If you were doing a top-down approach where you would typically have proper generation from source and also type, is it a survey? Is it a medical record? Is it a claim? Is it a device?
DR. MAYS: Other principles? There was one of the slides that I did want you before you finished a comment on.
DR. ROSENTHAL: May I add one other principle on this analysis? Lily was mentioning that that was Niles old shop. There are departments. CMS has a department dedicated to this. When they released 1 to 30-day post-discharge for hospitals, it is actually on an index. Are you over-indexing, under-indexing? You are starting to see some of this coming out of CMS. As a principle, the ideal would be release interpreted data in addition to regular data. That is kind of, Bill, your part of having a biostat. You are seeing that at CMS at least. But if you do so, clearly delineate which is which, which is a primary versus a second order fact, which is a compound versus something real.
DR. MAYS: I would also say in addition to that to release it in ways in which you are able to use it at multiple levels. You might have an infographic for one group and statistics for another and then the file for somebody else.
DR. FRANCIS: I would also monitor, evaluate, and reassess something like that. It is really important. It might be you release something, but then you need to revisit that.
DR. MAYS: Reassess what? Let’s be clear what you want to reassess. Harm and benefit or use?
DR. FRANCIS: What you got out there. The metadata. The whole thing.
MS. KLOSS: Reassess it to update it or reassess it to determine if it is still relevant or how it has been used. I think there is a lot of learning from how it is being used.
DR. NILSEN: Isn’t that part of the bottom-up approach where you are having your users tell you how usable and how everything in rating it based on your level?
DR. ROSENTHAL: If you just allowed your users to pick a drop down one of three categories, the three categories we had, do a rating and leave comments.
MR. DAVENHALL: It is like each one of us becomes an expert. You can imagine. I look at every data file from a geographical perspective. I can tell you where the sins and the warts and pimples are in almost any file I touch from the federal government. At one point like in this thing about the problems in access and use example I am providing here is, how many errors do you need in a data file before you discount the value of the data file? Again, it is bumped up against your operational use of it and how you intend to use it. People who do workforce studies of figuring out whether we have too much or too many or not enough of a certain kind of health professional, they can seem to do okay with data that is a couple of years old. Maybe even a decade old. I do not know how they would do that. They do not seem to have a problem with it.
In this particular case I pick on, the fact that this government does not have a certified database of all health care providers with accurate geographical locations of the people we send checks to as a health care provider. I keep wondering when is this going to dawn on somebody that we need an accurate phone book.
DR. ROSENTHAL: That is probably outside the province of this. I would say that if that is the goal of like you are absolutely right like different data for different people. Anybody been to eBay? When you go to eBay you see stars? How many stars? Quality of the product? Shipping? In terms of solving that issue if we are after a solution, there is a pretty simple bottom-up solution that has been used all day long that I have mentioned three or four times. We don’t have to look into, but I am just going to throw it out there yet again. Like allowing your users to self-identify. Am I using this operationally? Am I a researcher? Am I a community member? And allowing them to rate each element of that. Is it speed? Is it size? Is it frequency? Is it latency? Is it adjudication, et cetera? If you are after principles around that like allow your users to self-identify and rate different aspects of it and then I can go in and sort and see. Show me the best data sources for this or that based on my peers’ reviews. It is not objective top-down. It is not something Walter would be doing, but it is something that is fast and easy to get done.
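The bottom-up rating idea described above (users self-identify, rate individual aspects of a data set, and others sort by their peers' ratings) can be sketched in a few lines of Python. The user types, aspects, data set names, and star values below are hypothetical, and the sketch leaves out the moderation and registration questions raised later in the discussion.

```python
from collections import defaultdict


def average_ratings(reviews):
    """Average stars for each (dataset, user_type, aspect) combination."""
    sums = defaultdict(lambda: [0, 0])  # key -> [star total, review count]
    for r in reviews:
        key = (r["dataset"], r["user_type"], r["aspect"])
        sums[key][0] += r["stars"]
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


def best_datasets(averages, user_type, aspect):
    """Rank data sets by average stars from one user group on one aspect."""
    matches = [
        (dataset, avg)
        for (dataset, utype, asp), avg in averages.items()
        if utype == user_type and asp == aspect
    ]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical reviews from self-identified users.
reviews = [
    {"dataset": "A", "user_type": "researcher", "aspect": "timeliness", "stars": 2},
    {"dataset": "A", "user_type": "researcher", "aspect": "timeliness", "stars": 4},
    {"dataset": "B", "user_type": "researcher", "aspect": "timeliness", "stars": 5},
    {"dataset": "A", "user_type": "operational", "aspect": "timeliness", "stars": 1},
]
ranking = best_datasets(average_ratings(reviews), "researcher", "timeliness")
```

Keeping ratings separate by user type reflects the point that the same file can be adequate for a researcher and inadequate for operational use.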
MR. DAVENHALL: I am just picking on these that if you want to look at what could be high value databases that the government owns and manages, they are sitting right in front of us. This is like one of them. Can you imagine how many people leverage the master file from CMS of the provider of service file? The only problem I would say with it is there are thousands that do is that everybody has to reinvent a quality QC program to deal with that data. There is no process by which that data gets cleaned up. There is another principle. Does the stewardship have the capacity to take corrections and updates in some sort of process that improves the quality of the data? This is one little file that if you cleaned up this, you could be assured that everybody — there should be only one place where someone wants to know where all the hospitals and doctors are located that we send checks to from the government. There should be one place you go to get that data. Right now, there is a whole host of people you get that data. They all start off with the original equipment manufacturer and then clean it up. They would not be happy, Damon, to have the government fix this and make one file available. It is because they made a business out of cleaning that data up and reselling it to all of us. It is an example of one of the problems you will get into from a business perspective where you are actually going to trample on somebody else’s business idea of taking federal government data.
DR. ROSENTHAL: This happens all day. What do you think the part B and D — basically, puts disease management. We can go through examples. Any time you open something up — something else. I am sorry. I am losing clarity on what we are trying to accomplish here.
MR. DAVENHALL: In that case, Josh, it is kind of interesting because I say the fact that they threw all that data out there, I see a lot of comments from people who have plowed through this data, making comments on blogs about the quality of the data they were finding.
DR. ROSENTHAL: As opposed to their commercial — this is all outside this field of this discussion unless we want to go this way. I am happy to go this way in great detail if we want to, but this is probably outside the scope of what we are after.
DR. MAYS: We are actually also hitting our time where we have our guest is coming in.
DR. BERNSTEIN: I just quickly wanted to respond to what Josh said about eBay and the star ranking system. eBay and the star ranking system is not self-identification. It is everybody is registered with an email address that eBay can find. It has a huge infrastructure for appeals and ombudsmen and so forth. That is a lot for the federal government to take on. What you are talking about maybe is more like Yelp or something, which has other problems like trolls and other stuff that it would also take a lot of management for the federal government to deal with. I am not saying that it is a bad idea. I am saying it is more complicated than just throwing it out there and saying we can self-identify with star systems. There has to be something behind that that the federal government is willing to back up with some resources.
DR. ROSENTHAL: Or to state that they are clearly not doing it. I used eBay because I assume with this crowd it is the most notable. We could use Yelp, but actually use a typical community forum. There are clear instances I can show you and a variety of the verticals where they do throw this out and it works easily.
DR. BERNSTEIN: To the extent that if the federal government is going to make that available, I am not saying we wouldn’t do it without —
DR. ROSENTHAL: That is the core assumption. Top-down or bottom-up.
DR. BERNSTEIN: I am just saying it is possible we would do that, but it is unlikely that the government would throw something out there and say we are going to make this available, but we are not going to be responsible for what happens on it.
DR. NILSEN: We always moderate those things. And what you are talking about is that stewardship again. You cannot just kind of let things run freely.
DR. ROSENTHAL: I understand. There is a range. I understand. You can’t just throw it out. You can’t just let it run freely. What I am proposing is a very different system. I can show you a variety of examples where you can moderate it. You can do it in a way you have been doing it. It seems as if we have heard a lot of complaining and rightful criticism around why things aren’t working as they should be. I am proposing another alternative like completely on the other side of the table where you could put it out there and disclaim. Should you do it? I can show you a variety of examples.
DR. BERNSTEIN: — other agencies. What I hear you saying is that the federal government only works top-down and never does this other thing. There are plenty examples where we could do that. We could work in these kinds of principles that you are talking about into principles.
DR. COHEN: The principle should be there should be a mechanism for data —
DR. MAYS: We are going to put a hold on this. If it is a summary statement, fine, but our person is here.
DR. COHEN: I think the principle we are all talking about is there should be the capacity for users to evaluate government data sets in some kind of ongoing way so that other users can gain from that information. Right now, there is no ubiquitous mechanism that I know for users to provide that kind of input and feedback around those data for most health data sets.
DR. ROSENTHAL: I apologize. I thought I said that.
DR. COHEN: That is essentially the principle. One other thing around and it is a perfect segue to Chris who is going to talk about NHIS. Most states end up using BRFSS rather than NHIS because our focus is on state and sub-state data. When we think about geographies and the kinds of data that are generated by the National Health Interview Survey at the neighborhood and community level, those data just are not accessible. And the two options are — BRFSS and through the county health rankings have generated county-level data and they are pushing it to sub-county levels.
I think the NHIS discussion is great. The question is what is the federal government’s responsibility for sub-state data. There is a huge variation among federal entities and policymakers about the federal requirement or commitment to sub-federal level data. Perhaps regional data because there is a regional federal network. I haven’t seen that consistent commitment to generating data at the state and sub-state level for the federal government data providers. I really think that is an important fact or context we need to keep in mind.
DR. ROSENTHAL: I guess I might not have been clear at all. My apologies. My point is you have users either self-identifying to groups who can do it or you can do a top-down approach and mandate it. Whatever it is my absolute principle I think we should seriously consider some sort of bottom-up mechanism. As with any analogy, there are faults and holes. You use an analogy to illustrate a concept. You can use a variety of other ones if we want to. Yes, you will have to pick a choice as a continuum between ease of operation and resource dedication and compliance and control. There are all sorts of choices to make along that continuum.
DR. MAYS: With that said, we have a principle out of it. We have several good principles out of it, but let’s ask Chris to please join us. Welcome. Thank you for coming because I know that you literally were at a meeting in which you were integral to the meeting and ran over here to join us. We very much appreciate your time.
Agenda Item: National Health Interview Survey (NHIS) Data Release
DR. MORIARITY: I am sorry about that for that reason. I have been out of the building most of the afternoon and I have missed the discussion you have had up to this moment. I may say things that you have already covered or whatever.
Anyway, good afternoon everyone. I work here at NCHS. I have been in the division of Health Interview Statistics since 2005 and that division’s main responsibility is the National Health Interview Survey. I have been asked today to give a main focus in my little talk about what we are doing for data release specifically in the electronic arena.
I think you are mostly familiar with the National Health Interview Survey. I am going to use the acronym NHIS for it. I will say a few things about it very quickly. I then want to go on to a flavor of some of the products that we release and that is evolving over time. I will talk to you about what we are doing now. And just to provide a little bit of background because things like the Internet were not always here, et cetera. I just want to give you a flavor of what were some of the things that used to be done and how we have moved from those beginnings to where we are now. I, of course, cannot predict the future, but I have some idea about some of the initiatives we are looking to do to continue to improve on how we do things.
As you probably are already aware, this survey has been in continuous operation since 1957. It is a large national survey. We have sample in all 50 states, the District of Columbia. On a given year where we don’t have any sample cuts or augmentations, we are interviewing about — we are collecting information on about 90,000 people in about 35,000 households. And the regular content stays pretty constant from year to year. It is a cross sectional survey in a given year. We interview at addresses and we typically try not to go back to those addresses for the remainder of the sample design period.
When there are new technologies, for example, the flu mist vaccine because that was new and we didn’t have anything about it, we add it. Again, we are one of the official federal government surveys, legislation that put us into existence along with what is now the National Health and Nutrition Examination Survey. We are mandated to collect these data and disseminate the results of course in a way that we do not disclose the identity of any of our respondents.
The current questionnaire was put into place in 1997. As I say, it undergoes some incremental change from year to year. We are looking to another redesign a couple of years from now. I will mention that in passing. We have what we call our core. And other than some fairly minor changes, most years it stays pretty constant.
But one thing that typically does change is that we typically have other federal agencies who want to sponsor questions in a particular topic area and we refer to those as supplements. There is typically some variation in that from year to year.
Because of the structure of the current questionnaire, we release a variety of data files. I do want to mention that there is one that is not shown here in the diagram. In addition to producing a family file with one record for each family, we additionally produce another file with one record for each of the people in that family. That one is not shown here explicitly. And the number of supplement files will vary from year to year.
I could talk for a long time about these products in detail. I do not intend to do that. I do want to say a little bit extra about a couple of them though. These come out on an annual basis and we try to be timely. We try to get these data out to the public within six months after the end of the calendar year. Actually, there was a lag a little while ago. We weren’t six years behind, but we were about three. We had to dig out from under that, but we got on the track that we are on now. For about the last eight or nine years we have gotten our public use files out within six months after the end of the calendar year. That release consists of all the files that I showed you on the previous page along with other products. A couple of months later, we have imputed income files and so forth.
DR. MAYS: Can I just ask you to say the format these things are in just so we know like whether or not it is available in what? Like if you have reports, is it paper? Is it Internet? That will be helpful to us.
DR. MORIARITY: The micro-data files, I will come back to those in more detail. Our current main method of dissemination is through the Internet. And the format of the files is just regular ASCII files. It is not in any kind of a proprietary format.
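As an illustration of what working with a plain, non-proprietary ASCII file can look like, here is a minimal sketch in Python. It assumes a fixed-width layout, which the talk does not state, and every field name and column position below is hypothetical. The actual layout would come from the documentation released with each file.

```python
import io

# Hypothetical layout: (field name, start column, end column), 0-based, end exclusive.
# These names and positions are invented for illustration only.
LAYOUT = [
    ("household_id", 0, 6),
    ("person_no", 6, 8),
    ("age", 8, 11),
    ("sex", 11, 12),
]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of stripped string fields."""
    return {name: line[start:end].strip() for name, start, end in layout}


def read_records(handle, layout=LAYOUT):
    """Yield one dict per non-empty line of a fixed-width ASCII file."""
    for line in handle:
        line = line.rstrip("\r\n")
        if line:
            yield parse_record(line, layout)


# Two made-up records standing in for a downloaded file.
sample = io.StringIO("00012301 452\n00012302 071\n")
records = list(read_records(sample))
```

Because the format is plain text, the same few lines work in any language. Nothing beyond the standard library is needed.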
As for the annual reports, since the change of the questionnaire in 1997 we have had three main reports that we release on an annual basis. Historically, they have been paper. We then also started releasing on the Internet, PDF, electronic versions of them. We are currently in the process of moving away from any paper release of these and moving more towards just the content of those reports released as Internet tables.
As they come up, we will have reports on specific topics, for example, alternative and complementary medicine, et cetera. A lot of these are related to the supplements that we run.
There is a CDC weekly report that comes out that can typically contain something called a FastStat. That can come from many different sources. Some of the time, the FastStat comes from National Health Interview Survey data. Something that I am going to come back to on my next slide so I won’t say anything more about it right now is our early release program. And then we have some other things such as our — that says healthy. That is actually a typo. That should say Healthy Stats. Sorry about that.
Let me say a little bit more about this because this started with one thing and over time it has gotten bigger and bigger and bigger. First of all, this has always been electronic. When it first began, it was just the one report that was issued quarterly. It was a short report about selected estimates based upon a subset of a year of data. We do them quarterly. The first one comes out at around September and it is based upon the data for the first three months of the year. And the next report comes out in December and it is based on data from six months of the year. We need six months after the end of data collection to check the quality of the data, work with our colleagues at the Census Bureau who conduct the survey to get a stable data file. Then we need to do our processing, develop our sampling weights, come up with our estimates, write the reports.
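The release timing described above is simple month arithmetic: add the six-month processing lag to the last month of data collection. The sketch below works that through. The function name is hypothetical, and the result for the nine-month period is only what the arithmetic implies, not a release date stated in the talk.

```python
import calendar

LAG_MONTHS = 6  # processing time stated in the talk


def release_month(last_data_month, lag=LAG_MONTHS):
    """Return (month name, years after the data year) for a release that
    follows the last month of data collection (1-12) by `lag` months."""
    index = last_data_month - 1 + lag
    return calendar.month_name[index % 12 + 1], index // 12


# Last data month for the 3-, 6-, 9- and 12-month cumulative periods.
schedule = {m: release_month(m) for m in (3, 6, 9, 12)}
```

Data through March gives September of the same year and data through June gives December, matching the months named for the first two quarterly reports.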
Now in more recent times in addition to the selected estimates report, we also now have a report available quarterly that is a focus on health insurance coverage.
Twice a year, in December and June, there is a report related to wireless substitution. It is based upon the first six months and then the whole year. And then very recently, just a few years ago, we started creating preliminary microdata files for the first three months, the first six months and the first nine months. And we transferred them to our Research Data Center, the RDC, and researchers who wish to go through the normal application process to put in a proposal, have it reviewed, et cetera can have access to these preliminary microdata files in the Research Data Center.
DR. MAYS: Can I ask you a question about the reports? Who is the audience for the reports? Would the report be something that the consumer would want to read and look at? Are the reports given to the press? What we are trying to get at is the level at which these things exist. That would help us to have a sense of whether it is advanced stats or something like that or is it something that is a one-pager that can go to the press?
DR. MORIARITY: These reports have evolved over time since the inception of the program in 2001. The selected estimates report is typically of lengths of 15 to 20 pages that includes a brief technical appendix in the back. It is primarily a presentation of data in the form of tables, graphs, charts with a little bit of accompanying text to provide information so people can understand. We are trying to make this report available to a very wide audience. It starts off with highlights. Someone who just wants the very brief summary gets that right on the front page. That is the way that we are doing the early release reports now. They try to follow that. Probably the smallest thing in terms of our releases is the FastStats, but it is very limited at the same time. What we are trying to do with our early release reports is we have tried to focus on specific estimates that we think are of particular interest to consumers of information from our survey and provide ongoing estimates on a quarterly basis for those particular statistics.
Did I address your question? Thank you.
MR. CROWLEY: Can I ask a question about the release? How does the NHIS data flow from CDC to data.gov and healthdata.gov?
DR. MORIARITY: Honestly, I do not know the answer to that question. That is not something I personally am involved in at all because I am not in the programming part of the division that I work in. I would presume and again I do not know if our agency is an active partner in this or if external sources from the other government websites are just coming to us and downloading it. I am not aware that we have a mechanism in place where we are automatically transferring copies of our data to other government websites. We do publicize to our listserv on our Internet site. There is information released through various press avenues. Again, I do not know about that because that is our public relations office that does the details of that. If I had to hazard a guess, I am assuming that the other entities come to us and obtain it.
MR. DAVIS: Excuse me, Chris. I am sorry. My name is Damon Davis. I work in a CTO’s office on healthdata.gov where NHIS is catalogued I believe. Once again, it is always just a system of cataloguing where you can go find the data. Yes, we come to you.
DR. MORIARITY: When you say it that way, does that mean you don’t actually have a mirror? You are actually directing people through your portal to us. I think that makes sense because we try to limit the number of times that we have to do what we call a re-release where we discover some anomaly that we consider serious enough that we feel that we need to replace the resource that is out there with something else. A hazard arises when someone maintains an independent mirror and they don’t do an ongoing checksum or some other procedure to verify they have the most up-to-date data. Then if we were to do something like this and it wasn’t noticed on the mirror, then they would be out of date at least for that product.
MR. DAVIS: That is exactly why we don’t maintain copies of data necessarily for any data system. We simply catalog where it lives so that the data stewards, the owners, the curators and collectors of the data actually maintain the most current copy and there is none of that needing to reconcile mirrors of the data.
DR. MORIARITY: We are aware of places like the University of Michigan that historically has a separate archive. As long as they wish to continue doing that, we of course have no objection. But anyone using the data that comes from the University of Michigan needs to realize that it is only current if the University of Michigan is doing an ongoing process to make sure they are up to date with us. They probably are most of the time. We actually provide reference checksum values so that people can do these checks themselves if they wish to verify. But we do not have the resources to do anything like that ourselves. We would have to depend on, for example, your resource, if you were to do that, to do it for us.
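The verification step mentioned above, comparing a local copy against a published reference checksum, can be sketched as follows. The transcript does not say which checksum algorithm is published, so SHA-256 here is an assumption, and the function names are hypothetical.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex, algorithm="sha256"):
    """True when the local copy hashes to the published reference value."""
    return file_digest(path, algorithm) == published_hex.strip().lower()


# Demonstration on a throwaway file standing in for a downloaded data file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example microdata bytes\n")
published = hashlib.sha256(b"example microdata bytes\n").hexdigest()
is_current = matches_published(tmp.name, published)
is_stale = not matches_published(tmp.name, "0" * 64)
os.remove(tmp.name)
```

A mirror could run a check like this on a schedule and download again whenever the published value no longer matches its copy.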
Now, of course, we don’t just put the data out there and then go hide in a corner and don’t answer our phone or anything like that. This is, again, something that has evolved. It is continuing to evolve. A lot of change occurred in 1997, in the questionnaire and a lot of other things. Beginning with that year, we started producing what we call the survey description document. This is something that many of us in the division contribute to. It cannot be comprehensive because we could not finish it and it would be too big. But it is typically a rather voluminous document that can be in the range of 75 to 100 pages. It provides information about the various topics that are covered and the various files. It describes the files. It describes recommended procedures for analyzing the data.
A group of my colleagues within the division are the data request team. When questions come in from the outside, they are the initial recipients. If they can answer it themselves, they do. If it is a question — let’s say they have to come to me because I am the one who generally provides the technical stuff related to weighting and various estimations and so forth, they will reach out to me and say can you help us and provide an answer to this question. We do that as part of our support.
For a long time through CDC Atlanta, we have had a listserv, National Health Interview Survey listserv. The subscription is open. It is a moderated listserv. This is one of the avenues that we use to announce to people who are subscribed to the listserv when we release data and when we release reports. We publicize the existence of the listserv and encourage people to join it.
MR. DAVIS: Do you have any idea of the number of people on the listserv?
DR. MORIARITY: Approximately 3000 subscribers. They are primarily in the United States, but the last time that I did a check, which was a little while ago, but I would expect it would still be the case. We had a nontrivial proportion internationally as well.
MS. KLOSS: Have you seen a difference in the data request volume now that that information is also on dotgov? Are those requests coming not so much directly to you, but they are accessing it without you having the ability to keep track of it?
DR. MORIARITY: I would say that we have never had data requests decline. We have had some ebb and flow over time. One of the things we try to do is if we notice we are getting a similar request over and over again, that tells us that perhaps we need to look at our online documentation and determine if something needs to be changed or added to.
We are keeping a better track now of the volume of requests that we are getting through the data request team mechanism. It used to be a little bit more informal. I think that is useful to us. I think that we can do more. But again, like a lot of other things, the people who are on the data request team, that is not their only job.
Coming back to your question, we do get requests on pretty much a daily basis. There is some ebb and flow.
MR. DAVIS: Chris, if I could. This is for the group’s understanding as well. One of the things that I have been interested in doing from a data and communications perspective is to do closer coordination with folks like those in your group. When you have a listserv announcement, I would love to support that with also a healthdata.gov blog. I would also like to go out on Twitter and announce the fact that you have announced something so that there is a much more robust set of communications that support the data. I am using you as an example because it came up in your presentation, but that is the strategy that I am looking to employ across the board. I have been reaching out to a lot of communications elements of the department in order to try to create that suite of communication.
But I would love for you if you could to just take back the idea that I would like to reach out to my communications team and say every time you guys are thinking about some kind of listserv announcement, let’s consider its weight in terms of being worthy of being on healthdata.gov as a major announcement. I know some of the listserv things are probably small updates. Just if you would consider that, I would appreciate it.
DR. MAYS: One of the reasons I was asking what things were like is that we are trying to get a sense, because this is the premier source of health information for the nation. It often makes the news. Then on the news the person says oh my gosh. Statins are bad or something. And then they track down calling you. The way I am hearing it, this is what we do in the statistics community. We are just trying to figure out some other ways where just person Joe X, if they call, could get the information from you. I think Damon kind of got to some of the things that we probably would suggest. But we are just trying to get a sense of the level of that.
Bill, did you have a comment you wanted to make as well? You had raised your hand.
MR. DAVENHALL: I am familiar with Advanced Data. I was just curious as to whether your publication is similar to that. Is it called Health Facts?
DR. MORIARITY: NCHS has stopped releasing Advanced Data. I forget what the successive report is called. I did not have it explicitly on my list. We do release — in the successor to Advanced Data and I am sorry I don’t remember the name of it right now, from time to time, NHIS-related reports are in that series. The list that I gave you several slides back was not comprehensive of everything. It was more of a flavor, but definitely wanted to include what we consider the most important things.
Again, over time, as we have had resources to do so, we have put up extensive online documentation. That includes our survey description document. That is available online. But again, particularly when we have been asked the same question over and over again, we have tried to put up more. For example, we have several areas that we call special topic areas where we have gotten a lot of questions related to the topic smoking. We have gotten a lot of questions related to health insurance, questions related to race ethnicity. We have what we call special topic areas that are a collection of information that focuses specifically in that area.
Another thing that we do as an integral part of our microdata release. We are not looking for people to use any particular kinds of software, but we do have a sense of certain software that people tend to use quite often when they are working with our data. We do create some example programs in a couple of different languages. We make those available publicly. This enables users who wish to analyze our microdata that instead of having to write their software completely up from the ground, hopefully they can at least borrow from our example programs and save themselves a lot of time and trouble.
Because the National Health Interview Survey is a personal visit interview, it is much different from a simple random sample. It has what we call a complex sample design. If you wish to get valid sampling error estimates, you need to use specialized software. Now the variety of software has expanded a lot in more recent years and our support has also expanded accordingly. At one time, there were only a few software packages that could do the analysis correctly. Now there are more and over time, we have expanded our support accordingly. It includes even examples, not complete programs, but say if you want to do this, use these particular key words. Again, it saves people the trouble of having to go get their manual and read it and read all our stuff and try to figure out what is the match. We are happy to help with that because we want to make sure when people are analyzing the data that they are getting valid results.
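To make the point about sampling error concrete, the following is a minimal sketch of one generic textbook approach: a weighted mean whose standard error is computed by Taylor linearization, treating primary sampling units (PSUs) as sampled with replacement within strata. It is not presented as the procedure NCHS recommends. The function name and the data are made up, and real analyses should use dedicated survey software together with the design variables described in the survey documentation.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_with_se(rows):
    """rows: iterable of (stratum, psu, weight, y).

    Returns (weighted mean, standard error), with the standard error based on
    between-PSU variation within each stratum (linearized ratio estimator).
    """
    rows = list(rows)
    total_w = sum(w for _, _, w, _ in rows)
    mean = sum(w * y for _, _, w, y in rows) / total_w

    # Linearized contribution of each record, summed to the PSU level.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in rows:
        psu_totals[stratum][psu] += w * (y - mean) / total_w

    variance = 0.0
    for psus in psu_totals.values():
        n = len(psus)
        if n < 2:
            raise ValueError("each stratum needs at least two PSUs")
        center = sum(psus.values()) / n
        variance += n / (n - 1) * sum((z - center) ** 2 for z in psus.values())
    return mean, sqrt(variance)


# Made-up data: two strata, two PSUs each, answers identical within a PSU.
demo_rows = [
    (1, "A", 1.0, 1), (1, "A", 1.0, 1), (1, "B", 1.0, 0), (1, "B", 1.0, 0),
    (2, "C", 1.0, 1), (2, "C", 1.0, 1), (2, "D", 1.0, 0), (2, "D", 1.0, 0),
]
mean, se = weighted_mean_with_se(demo_rows)
```

On this toy data the design-based standard error is about 0.35, while a formula that treats the eight records as a simple random sample gives about 0.19. That gap is the kind of understatement the specialized software is meant to prevent.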
MR. CROWLEY: Just a quick comment. I think that is a terrific service that your group is providing. I would add just along the social construct. As people take the sample data, make their changes to it, create new insights and new code, we might think about what are ways that we can take that code repository, bring it back to the community and re-share it.
DR. MORIARITY: Now, again I just want to provide a little bit of background in terms of where the survey started with data release and some of the major steps that have occurred over time.
For the first five years of the survey, the data were collected. They were tabulated, analyzed. The printed reports were released, but the microdata were not retained. At this point, for the first five years of the survey, all we have are the printed reports that were done. We do not have the microdata anymore. After that, the microdata files started to be retained. In the beginning, the survey ran on a fiscal year basis, which was July 1 to June 30 of the following year. July of ’57 corresponded to the start of fiscal year 1958.
Now, of course, back in those days, there was no Internet like there is now. Some things were happening with various universities starting a network, but nothing like we have now. There was a time when people could call a phone number or send a mail, send something through USPS mail to order our data, which were usually on mainframe computer tape or cartridge and they usually cost money.
And then back some time and I don’t know the exact date, our agency got a website. And then we were told our agency has a website. Go start putting stuff up related to the National Health Interview Survey. Essentially, we were starting with a blank slate at that time.
We were not given any new staff to populate this new resource. This was done particularly in the beginning as we had the resources and because this was a new medium for us, it was a learning process. One of the things that I think often happens in these situations is people put up content. But the way they structured it, it goes out of date. If they don’t come back and look at it often, now we have inaccurate out of date content online. We have tried to move very hard to not let that happen and for the stuff that must remain dynamic that we update it on a timely basis to make sure that if it is changing, it is changing in the correct way.
In terms of releasing the data, the first year of data that we put online was the 1997 survey. For the first couple of years, we only had the resources to move forward. But then finally after several years, we got resources that allowed us to also go backwards. For the older years where we had the microdata, we started going backwards as well. Again, this took a long time because it wasn’t until 2009 that we finally finished getting all of our existing public use microdata files online.
Looking to the future at least one little bit of it I happen to know, we are continuing a trend that started several years ago and it certainly began in earnest with the early release reports in 2001 where more of what we release is electronic, less of what we release is printed. I guess the day could come, I do not think we are here yet, where there won’t be any more printed reports. But I think that day will come at some point in all likelihood.
Another thing that we have devoted a lot of resources to in the last several years is the development of an online analytic system that would allow people to come to an Internet website and request real time analyses of NHIS data. This is an entirely new thing for us. It has taken a lot of effort to develop the prototype that we are currently testing. We still have to get several important clearances and approvals even after we complete our own testing before we are going to be allowed to put it online.
Again, as we recognize either ourselves or by getting feedback from external users that we need to make additions to our content, make our existing content better, we are doing that.
Something that is always important to us, and we are not going to let this change, is that we are a federal agency. We must maintain the quality of what we do. We also recognize that it is very important to be timely and we want to do as much as we can to be as timely going forward as we have been. I think at this moment, we are about as timely as we can hope to be given the way we do things. If we can continue to find innovative ways that allow us to do things even faster while maintaining quality, we will be happy to move in that direction.
DR. VAUGHAN: May I ask a question? I am so interested to hear that you are working on this online analysis system. Do you anticipate seeking out groups of beta users to perhaps give feedback as the final approvals are evolving?
DR. MORIARITY: I am not in a position to give you a succinct answer to that because in particular we must obtain clearances related to security within the Centers for Disease Control and Prevention. I do not think that they will allow anyone who is not a federal employee within our agency to have any access to the system prior to that. I think certainly once we first get what we would call something that we are actually allowed to release to the public, I think we are going to be very interested in getting feedback from people using the system. We just hope that people understand that it took a lot of time to develop what will be out there. It is probably not something that we can re-tool instantly, but we understand. This is a first step in a process that is going to be going forward.
DR. VAUGHAN: For example, though, there may be folks within universities where there are already the secure facilities where you can actually get those graduate students, get the faculty into using it and maybe give back that next little feedback as you are rolling it out.
DR. MORIARITY: The major barrier to doing anything like what I think I understood you to suggest is that this online analytic system will include some of our internal data that we do not release in our public use files. So far, security people — this is within the Centers for Disease Control and Prevention. They have been adamant that we do not release such things outside without the most — for example, the only way I am aware that we currently allow that to happen at all is that we have a relationship with the network of research data centers that are run by the Census Bureau.
DR. VAUGHAN: For example, the University of California has two of those. You have to be sworn and there is a whole security procedure already. But there are thousands of us who are sworn and use this. Just a suggestion. It would be exciting to hear about it. I am glad it is coming.
DR. MORIARITY: Again, I don’t want to get too technical in a conversation like this. It seems to me that given that we would not want to try to go make a mirror of the entire system somewhere outside, I think the only feasible way that something like this could be done of course with all the necessary approvals is that we would have to have some kind of special encrypted connection. But again, I am very concerned that the security people who still have to give us the final approval, they might say we don’t even want to take a risk on something like that because what if somebody was able to break into it and then start stealing.
DR. MAYS: We have one summary slide and I have a few people I know that are going to catch planes.
DR. MORIARITY: Again, just to give you a flavor of what I have talked about today. We started with just releasing printed reports. We have moved way past that. In terms of how we distribute the microdata, we don’t do the computer tapes anymore because we do not have a mainframe anymore. We stopped doing CDs about eight or nine years ago. It is very clear to me and I think that this is a general trend both within and out of government that the way we get information out is electronic.
Thank you very much. Here is the URL for the NHIS webpage.
DR. MAYS: Thank you. Chris, let me say thank you for doing this because this is the first time that we have done it. You kind of came in blindly in the sense that we didn’t know what questions we wanted to ask. We did not know exactly what there was. We appreciate your patience with us in the sense that we are just starting this process.
One of the things that we are going to try and do is to think through. The charge of the workgroup is really to give feedback to HHS about ways to increase and to think about innovation in terms of both data access and data use. The NHIS, as we know, is a flagship survey. It is the survey that provides the US an incredible amount of information. We are going to try and think about it. As we have gone through, we have asked you questions and given you some suggestions. But we also want to give you some suggestions very intelligently. We are just starting this process. We are going to ask you to bear with us in the sense that we will probably come back to you at another time or come back to NHIS. It may be more questions or it may be some suggestions.
Before I end this, I want to see if people have any other comments, suggestions. They can throw some suggestions out. But we will try and do it in a much more coordinated fashion as we get ourselves together. Charlie kind of let us start with you. Again, it is kind of like the test case.
Any other suggestions, comments? Anything you want to ask us?
DR. MORIARITY: No, I am finished talking. I am happy to answer a few questions.
MS. BRADLEY: Regarding the listserv, I have been curious if there are any examples or have you ever heard it discussed that those would be turned into some sort of more of an open archive — a question we were asked earlier this year. Someone might just find that history rather than if they just joined the listserv having to re-ask the question. I participate on the NHANES listserv, for example. They are fantastic and they get really into the nitty-gritty about questions about variables and changes over time. Do you know of any limitations regarding publishing listservs?
DR. MORIARITY: Do you mean having listservs where subscribers can be posting to the list? Is that the kind of model you mean? Did I misunderstand you and you were talking about accessing archives of things that were previously posted? I did not get your point.
MS. BRADLEY: A bit on both. NHANES actually does say that they will not post it or there is some restriction on privacy. There are Google groups. That is what I am most familiar with where you can just go back and see what has been said before. W3 consortium. The prior dialogues are just available online. Would that be amenable at all?
DR. MORIARITY: I believe it is correct that it is possible to use the listserv query mechanism to query the archives of past posts. This is something I do not think would be specific to the National Health Interview Survey. This would probably have to be an agency policy decision or perhaps even a CDC policy decision before we would be allowed to be an active part of some kind of a forum, an electronic forum. I am not aware of any plans to do that. If anything, over the time since we first received electronic mail, I have seen us constantly being told, when in your official capacity, do not do this, do not do that, in terms of going to public sites and appending your official email address to something that you are putting there, because the attitude is you might be doing something that we haven’t cleared, we haven’t approved. I think that before any direction like that would be taken, there would have to be approvals and clearances.
DR. MAYS: Damon and then Josh.
MR. DAVIS: I was curious and I apologize if I missed it during your presentation. Did you mention the development of APIs at all into the data?
DR. MORIARITY: We have not moved in that direction at all. We have just tried to focus on getting the files out. The release reflects the structure of the questionnaire, where in various places in the questionnaire we are taking from a family only one adult, a sample adult, and only one child, if any children are present, as a sample child. And we are typically releasing these things in separate files, but we are just releasing simple flat ASCII files. Nothing more complex in terms of XML or anything like that. We haven’t moved in that direction at all.
MR. DAVIS: Has there been a conversation about alternative or more advanced file release types out of curiosity?
DR. MORIARITY: I have not been a party to any of those kinds of conversations. We actually at the moment have some homework to do related to things that we have already released. We have always compressed our files so that they are easier to download. But in the early days of us doing that, the contemporary operating systems that were available did not have by default a mechanism built in that allowed people to automatically decompress those files. For a time, we were releasing our microdata using a Microsoft specific self-extracting file. Now, we are realizing that that does not work properly with other contemporary operating systems and actually, it is even causing issues now with some of the more recently released Microsoft systems.
But for us, it is not simply a question of we just go fix that. What we have to do before we do that is we also have to change all of the documentation to reflect that update because it will have a different file name and so forth. Again, we know this is on our to-do list, but because it is in our older data and we are so much absorbed in current production and being timely, the resources to even get to that basic homework — we have known about this for a while and we haven’t gotten to it. That is a long-winded way of saying in terms of something like what you have brought up, I don’t think it is even on the horizon right now.
MR. DAVIS: I appreciate your honesty in that. But I think it is important to underscore for the folks in the room and on the phone that this is a perfect example of a lack of resources to modernize something to what we would very much like to have. This is, as was said, a very important survey that has some incredible data in it. Unfortunately, this is a staff that does not have the resources to allocate to some of the more modern data access types. I did not mean to put you on the spot as much as hopefully highlight that you either yes were thinking in that direction or as you have said, no, don’t have the resources to even fix some of the legacy problems that you have long identified as issues. Thank you.
DR. MORIARITY: Lack of resources has been a challenge we have faced as far back as I have been involved in this survey. We need to make do with the resources we have available and use the taxpayer’s money in the most efficient way we can. We do make a good faith attempt to do just that.
DR. ROSENTHAL: This piggybacks off Damon. Whether it is an API versus flat file or whether it is finding customers and generating demand for usage with limited resources, how do you view the role of someone like an HDC, Health Data Consortium, a public-private entity that is just dedicated to doing that? By doing that, what I mean is helping people find the data and realize the applicability of it. And also a team and network of public-private individuals. There are kids I work with all day long that can turn ASCII files into APIs in about ten seconds. They would gladly do so for HDC. Within limited resources, how do you think about other partners and expanding your employee work hour account through those sorts of resources?
DR. MORIARITY: Well, I did already mention a very close collaboration that we have with the group at the University of Minnesota that has done a number of initiatives with both the decennial census, public use microdata files, and other data systems beyond our own where they have produced a derivative product, the integrated health interview series. When someone else outside of us has the resources to do that and we can establish a collaboration, again, within our resource constraints, we are happy to do that.
DR. ROSENTHAL: It seems like looking at HDC not only for the file release type, but in particular the user demand would be a conversation that would be mutually beneficial I would think.
DR. MAYS: Chris, thank you very much. We have kept you over your time. Your time here with us was much appreciated. It was very helpful.
Is there anybody online? I keep hearing somebody going in and out. I guess they dial in for what they want.
We are very close to the end. Can I push through or do I have to take a break? The staff are saying a five-minute break. Five minutes. Please make it only five minutes. We are so close to the end that we want to make sure we finish up.
Agenda Item: NCVHS Usability and Access Parameters for Various Use Cases
DR. MAYS: Okay folks, we are very near the end. It has been a full day and it has been a full discussion. Part of what I would like to do at this point in time in terms of the working group is talk about what kind of feedback do you think we can start thinking about giving to NHIS. I think some of you were really good about things that you are thinking about and giving it now, but part of what this was is this was the first. I think one of the things we are learning is maybe questions we want to ask beforehand. I think we can do that by email. I do not want to do that now. But what kinds of feedback before we lose the momentum would you suggest for NHIS? Let’s start with Wendy and then Bill.
DR. NILSEN: I think Josh’s comment about HDC, what the data consortium can do. I was thinking of Lily’s comment when she was talking about archived. It is hard to archive in the government or make it easily archivable, but it could be in a public place. They are extending what we can do through others as appropriate, not delegating our congressional authority. Figuring out what links there are and how they can complement seems to be that HDC could help. Because obviously, they are doing a lot with a little.
DR. MAYS: Exactly. They are doing a lot with a little. Good idea. Bill.
MR. DAVENHALL: As you all know, I serve on the advisory board of the Health Data Consortium. I would say that they actually — I am not speaking for the whole committee, but as a member of that, we see ourselves as somewhat of a translational group capable of as objectively as possible, looking out for the interest of the wide group of the ecosystem who could have and entertain that kind of conversation. All the way from open data kinds of issues to specific federally-sponsored data sets.
I would say it is our vision that we will end up helping educate that ecosystem to what is available, how you would use it and start to put into practice all these things we talk about here. The stewardship standards or whatever we want to call those. And then invite in the residential experts, the national centers, the CMS, the NIH, everybody who is actually in a business of producing this data to help translate this body of data and data collection methods into a group of people that have a much richer understanding of what is possible.
DR. MAYS: Let me just get specific because that is what I am trying to get a sense of. Would we give you a recommendation to the HDC and then you would contact them or we recommend them to contact you and say what it is that you could do?
MR. DAVENHALL: You could give us the job of going out into the business community and begin to discover what it is that they would like to see happen in the data environment. You could give us jobs like that to do. It is not to be representative of the whole ecosystem, but to the people who joined the Health Data Consortium as members and there are a lot of people who join that as members. We could use the Datapalooza. The Health Data Consortium owns the Datapalooza event, which every year gets about 3000 or 2500 people. We could organize certain learning groups around that.
DR. MAYS: For example, a letter memo, et cetera, from us to HDC making that request would be something that you would receive and then you would do a little bit of fact finding and then come back and respond to us.
MR. DAVENHALL: Correct.
DR. MAYS: Is there any cost to this?
MR. DAVENHALL: No. First of all, I would say I would put you in touch with a person we just hired to be the liaison with the membership. In other words, this person would get engaged and figure out how we were going to do that.
DR. MAYS: I am a little blown away. Any other — we are expanding on your HDC.
DR. ROSENTHAL: You were talking about the liaison they just hired for this. That is great.
MR. DAVENHALL: Vickie, I think part of that feedback question you asked about for the National Health Interview Survey would be able to — HDC could help sponsor listening events where people like Chris come and other people within the national center could do exactly what he did. We start to ramp up what people really know the national center does and can provide.
MS. KLOSS: Could you also float a question like what principles would be used. What would be useful to — just send some e-surveys to gather data from that broader community before we lock into principles and that sort of thing.
DR. ROSENTHAL: You do a session on Health Datapalooza so users come here and pre-submit the principles you think would be helpful.
DR. NILSEN: We did novel data sources at Health Datapalooza last year. If I had thought about it, we would have captured all those comments. There are opportunities. You can put them up there. You can get exposure. We could have collected data.
MR. DAVENHALL: I would say we could continue to curate those. That would be a good way to put that. You would get a filtering process going on if people would comment about it. Robert Wood Johnson Foundation is still a heavy sponsor of the Health Data Consortium. They are very much interested in the methodologies like that that bring forums together for solidification of what I would call principles.
DR. MAYS: Let me ask two very specific questions just so that I am very clear. If we wanted to do something like what is being suggested in terms of using Datapalooza, when would we need to have this by in order for you to float it? Datapalooza is usually in June. If we wanted some questions, is this something where you are doing it during Datapalooza or something where you just can send emails out if we are like floating a set of principles to get people’s comment on those?
DR. NILSEN: I think we need to talk to them and figure out what would be the best way to do it. Datapalooza may or may not be the best, but they would know.
MR. DAVENHALL: The Datapalooza is just — it is an event that attracts a lot of like-minded people. That would be the place to launch something in a more formal way.
DR. NILSEN: Maybe you could get 15 minutes on the big stage to ask a question.
DR. ROSENTHAL: They will have stuff beyond — they do things beyond the event as well. There are a number of users and — there are things beyond just that event. They would be good to tap into regardless.
MR. DAVENHALL: The working group could actually have a — like Josh said, the working group could have a session and invite people.
PARTICIPANT: A listening session.
MR. DAVENHALL: Right. And even a moderated panel or something like that. But I would say you would have to act pretty soon because things are already in motion for Datapalooza for 2015.
DR. NILSEN: We could weave through strategically too. We could have done that in the research session last year without even thinking about it. You could have done it in your sessions. The tracks could have been — you could have weaved these questions in.
DR. ROSENTHAL: We were given carte blanche to however we wanted to.
DR. SUAREZ: What about the regional Health Data Consortium organizations? There is a whole host of those that are out there. I was looking at the website. They announced a couple more additional consortium members. There are regional groups that can be brought in as well. Some of these principles can be queried and tested with them I would think.
MR. DAVENHALL: And actually, the Health Data Consortium is in contact with, I would say, almost every one of those as either a supporter or a soul mate in their activities.
MS. KLOSS: Is it a federated model?
DR. SUAREZ: They are affiliates. They are considered or called or referenced as affiliates of the Health Data Consortium regional affiliate members.
MR. DAVENHALL: There is one in Chicago. There is one now in California. There is one in Texas. There is one in New York.
DR. ROSENTHAL: They have users and people who engage with them, classified into developers, business, data, communities, research, thousands of them. Around events or organizations or just at large if you want to do it online asynchronously. They would be good to talk to for this sort of thing. That is the mandate or the charge as part of their mandate so they need to be doing it anyway.
DR. SUAREZ: They are closer to the ground ultimately.
DR. NILSEN: Also, it does not have some of the same constraints. If HHS is doing it, there are very different constraints. If they were looking to support the efforts of HHS, it would make life much more flexible, usable, faster —
DR. ROSENTHAL: That was kind of what was behind my question in terms of what was your goal and how do you view this.
DR. NILSEN: If they were looking to help us be able to use these things more usefully then —
MR. DAVENHALL: So most of these events in the states are not really totally funded by the states themselves. They have commercial sponsors and so forth. The ecosystem is even much larger. You get a lot more participation, but that is what makes it go.
MS. JACKSON: Just so you know, you did have a session at the June Datapalooza with Damon and Justine Carr, and a lot of that information is archived. We can look it up and see just where, and use it as a building point, as a point of reference where this group already has an inroad, and then kind of build on it.
DR. MAYS: Okay. Great suggestions. Here is what I am going to suggest in terms of next steps. Since we have decided that is okay to work in between meetings, part of what I need to do is work with Lily and others to capture the richness of what we came up with in terms of what the principles were and then what some of the suggestions are.
And then we should schedule — we will send you that material out. I will also send you — I am assuming if you presented it here, Bill, I am assuming it is okay for me to also share their slides with everyone. We will send you those so that you have them.
And then I think what we need to do is to figure out what it is that we can accomplish in terms of looking at what we have developed in terms of the principles, figuring out if there is feedback to give to NHIS or whether we want to put on hold giving them feedback right away and get ourselves a little bit more organized.
In the future I think what we want to do is to have a set of questions, kind of specific things that we want to know kind of based on some of our guiding principle discussion. We may want to know, for example, tell us about your metadata. Tell us about specific things. I do not want to rush too fast and not be as helpful to NHIS because that is what we first want and we want to do it well enough that other agencies say it is worth it because they are going to chat with each other. We do not want them saying, I spent four hours putting slides together and what they gave me I could have thought about quickly myself. So instead, I want to see once we can pull everything together what we have.
I also want to explore this notion of the HDC. Debbie, I will have to talk with you more. Staff will have a much better sense of when we start going outside kind of the best way to do that. Bill, we definitely will want to talk with you and hope that you will work with us if this looks like a feasible thing to help us generate a letter. That will go to your group.
Are there other things that you want to suggest that we try and put on our plate in terms of next steps?
MS. BRADLEY: How many meetings would you like between now and early December?
DR. MAYS: What I would say is it kind of depends on how quickly all this stuff can get pulled together? You tell me when you can get the other notes and then I can tell you how many. I would love to have at least two.
MS. BRADLEY: We will have one in October and one in November.
DR. MAYS: That makes sense because we are meeting in early December.
MS. BRADLEY: One thing I heard was you would like to schedule them earlier.
DR. MAYS: That sounds good. Anything else?
The next thing I would like to do is several of you have — Bruce is gone. He did his in the beginning. Some of you have some opportunities, things that are coming up. I want Wendy to talk about mHealth. This is a time for you to go around and share if there are particular meetings, something that you think this group should be aware of, something that you are doing that you think is so cool that you want to share it with us so that we know a new skill set that you are bringing to the table. Let’s start with Wendy.
DR. NILSEN: We have talked about it before, but the Wireless Health meeting is coming. That is October 29 through 31 at NIH. It is a co-sponsored meeting between the Wireless Life Science Alliance and NIH. It is all about wireless devices. It is all streaming data. It is contextually rich data. It is a source. NIH will be coming up with a couple of announcements later. I would say it would be this month. I will say by November we will be coming up with some announcements in this area to try to make use of these data and to really start to think about them in a better way. There are a lot of opportunities in this and there is a lot of data being collected. We are talking now about some of that is going to be part of the discussion in there. It is all good. It is a totally rapidly growing area. Obviously, there are so many questions about privacy and security right now. We are working with everybody on how do we think about it.
DR. MAYS: Anybody else have any announcements?
MR. DAVENHALL: I wanted to let you know that they have hired someone to replace me at ESRI. It is Esti Garrity (phonetic) who is a physician from UC Davis. He was the former deputy director at the California Department of Health. I would like to tell you that you would be well served, I think, by her participation. It is not that I will go away really quick, but my goal would be is I would introduce her to this group down the road. She drove that open health data initiative in California. I think she will have a lot of great input that she can bring to the committee.
DR. MAYS: Okay.
MS. KLOSS: I just want the group to know that the Privacy, Confidentiality, and Security Subcommittee of NCVHS will finish up in the next month or six weeks a community data reserve toolkit, which has some guidelines on stewardship that are really written for data use projects in communities or really outside the scope of the traditional HIPAA boundaries. We hope that you will help us widely disseminate these. We think that this will be practical advice on — checklists. You can drop into whatever section you might have some questions about. I think by the time you meet next, you will have the official final version.
DR. MAYS: I am just going to do a thing on behalf of Bruce and ask, because Susan is working on this. We were also trying to think of federal agencies to invite to the October 27, 28 meeting on community data readiness. We were trying to reach those fed groups that often have big community constituencies like SAMHSA and HRSA. We were also trying to think within NIH. I guess I thought of NCI that has patient navigator groups, NIDA, and NESARCs. I do not know if there are any others that you can think of that we might want to reach out to.
DR. NILSEN: I will shoot some emails out and let you know.
DR. MAYS: Any other announcements?
DR. SUAREZ: Very quickly. I am not sure if anyone attended the Healthy Communities Data Summit in June in California.
DR. MAYS: It was when we were here. It was June 11.
DR. SUAREZ: I do not know. It might be helpful to look into that event, which produced some ideas really worth looking at. I heard a lot of very good things.
MR. CROWLEY: If you haven’t already heard, on October 10 and 11, my group is hosting the workshop for health IT and economics. It is supported by AHRQ. It is focused on the latest research around the adoption, impact, and use of information systems in health care. We have about three dozen research organizations from across the country coming to share their research. These are predominantly academics, like for an academic conference-type program.
But we have also mixed in a number of leading policy makers and keynote speakers to have a multi-disciplinary conversation, including Wendy Nilsen, who is in the room, who will be on a panel that we will be hosting on different funding opportunities in health IT. We have some open data related sessions. Dr. Taha Kass-Hout from the FDA, who is the chief health informaticist, will be giving a talk on Friday. Kelly Cronin from the ONC will speak on their health transformation efforts. We have some really good panels. Some wireless health related activity. Looking at some of these issues of ubiquitous care as we have more data collection going on in real time everywhere, including Dr. Kim Patrick (phonetic) from UCSD, who is a real leader in this space. American Telemedicine Association. Just a lot more. It should be a fun two days. It is a great opportunity to really get a deep dive download into the research as well as network with a lot of the folks who are looking at issues in this space. We will have a nice reception on Friday evening at the Carlyle Club in Alexandria, Virginia.
I will send Debbie a note around or Lily along with — I can send you a discount code. But I know for some of our friends where budgets are tight, just send me a note. If you need an admission, you can be my guest. I would be fine. With that, I hope to see some of you there. It is October 10 and 11. It is two full days.
DR. MAYS: Any other announcements?
What I would like to do at this point is just spend probably about ten minutes evaluating the structure and the approach that we use and getting a sense of going forward if we want to do this. I think one of the things that I will ask Damon is kind of to do a follow up. We are going to treat it the way that we treat with Jim where often he will bring an issue. We will either make suggestions or ask questions. And then as we plan the meeting, Lily and I will try and say can you give us an update on what happened on X and Y and what have you. It will need to get more and more robust. Right away, he may not have as much. But over time, that is what the plan is.
I want to get a sense of splitting this meeting into two parts and how you felt about it and what kinds of things might we do a little differently or if we are okay with trying this or that that we have. We are probably not going to have a data user come in and one of the surveys come in next time because we need to take some time and go through this and process this. How much time we need to do that I think that is what we are going to have to find out. But figuring out how to very specifically help HHS and kind of being on the spot responding to HHS are kind of the two ways that we have broken the meeting down. I would like to go around and give everybody an opportunity to talk about the meeting. Please feel free to say if there are things that you would like to see done differently. This is how we got here, which is I listened to you. I would like to listen to you again because I think each iteration we are going to get better and better and actually be more productive. Can we start with you, Mark? We will go this way this time.
MR. SAVAGE: I missed the last meeting. My apologies for that. What I found myself doing in this is both appreciating what we were hearing, what we were learning, but also wondering how this fits into what we were supposed to accomplish this year. How does this connect to what we are going to be giving or what our outputs are in order to improve things? I am not sure how this fits, but I trust you to know how this fits and that is the important thing.
DR. MAYS: I will just answer briefly because I want to make sure we get around, which is one of the things is that our discussion fit in terms of what our charge is. More explicitly, this was a request from HHS for us to do this. Now, my understanding is that most of the committees in the December meeting are actually talking about what is it that we are going to do for the next upcoming year. Am I correct, Debbie? That will be one of the things that will be on our agenda is to really say what do we want to try and accomplish and kind of some timelines. The staff actually have to know what the work output is and how much we are trying to do and how to coordinate that with the others. That is about the best I can answer it right now.
MR. SAVAGE: I will just throw something out. The interoperability plan that was mentioned earlier. The national partnership and the consumer partnership for eHealth, which is the coalition we run did put in some comments. One of the things that we said is to — yes, we need to build interoperability for what we have now and we need to build it for what is coming. We talked about the increased attention to social determinants of health. I lift that up as an example of data sets that we may not be — if we look backwards, we may not be thinking about. I think there is a lot of help that we can provide in getting that into the system.
DR. MAYS: I am going to count on you as the looking forward person. If we are looking back too much, I am going to count on you to do that. Thank you.
MR. CROWLEY: Did I understand what you are asking — what are some of the ideas for programs and projects?
DR. MAYS: No, more evaluating if this is working. If this is going to get us to having better outcomes and better products in terms of meeting what our charge is. Kind of the process that we went through today.
MR. CROWLEY: What are some thoughts around how we might evaluate our success?
DR. MAYS: And whether you like the model that we just used in terms of the meeting of having Damon with us and then working on some specific product or being helpful in a very particular way to HHS.
MR. CROWLEY: I think the format works well. I think being able to bring in those who are closest to the data, closest to the issues and having some feedback back and forth in the group makes sense. Having the data owners who are charged with communicating, disseminating the data and reacting to their needs and thinking about ways that we might be able to influence that makes sense.
I think it would be useful, as we have mentioned before, to come up with what are a few specific projects that we can put some teams on. Everybody does not have to be on the same project. Let’s put some projects together that have some defined goals, some defined timelines, some deliverables and outputs we want to get out of them. I think they naturally should flow from the charter elements and perhaps for the next meeting maybe we can think about having a part of the session where we brainstorm. These might be the five or ten projects and then we could go through a filtering exercise based upon everybody in the room. Maybe these are the top two or three that can have the most immediate value or the longest-term value for the group.
DR. ROSENTHAL: Bruce has said this. Some concrete examples to work through. I am always about output. What are we trying to do essentially? Are we trying to come up with a list of axioms? That is great. I also am very — I get confused easily and lost in conversation easily. I love having a big old slide up there during a meeting saying this is the thing that we want to get done by the end of the meeting. We want five axioms a person or we want to agree on five axioms or whatever it is just so we all know what we are talking about and what we want to get done whether it is a project or whether it is something else because otherwise I get lost in all the big words and back and forth.
MR. CROWLEY: To add on to what Josh said. I think a project Josh and I might enjoy working on together with someone else is this whole bottom-up approach. What would that future state system in which we are engaging with the populace in appropriately technologically current ways look like?
DR. ROSENTHAL: What are the arrays of possibilities? It is always a balance between ease and openness and control rightfully so for accreditation. What are the permutations of it?
DR. SUAREZ: I couldn’t agree more with what has been said. I am very much a deliverable person. I am focused on what is the outcome, what is the output. We deliver six items for this meeting earlier during the last couple of days. I think it is what allows us to focus on activity and results ultimately. And ultimately, until we have something in writing prepared, finished, and published and disseminated, we are doing a lot of discussion and gathering of information. In terms of the process, we are working fine.
I think the one thing I would say is we should have a list of these are the — I like very much the idea of having what are the top projects, but then also what are the deliverables. What kind of reports or letters to the secretary or documents to someone, whatever we are going to produce, we should identify them because at the end that is what reflects on our ability to produce something that is going to be helpful to others.
MS. BRADLEY: I thought that the format was really nice. I do wonder, though. I thought there was a shift away from letters for this working group and that we would be able to be more responsive. I wonder if we could capture some of this. And for sure, it is in the transcript, but if you wanted to just summarize some of these things in blogs. Is there some way for members to take this a little outside of us? As Chris was saying, we are not always encouraged to go and talk to the public. As an agency, perhaps, this is more about the workgroup talking to the public.
One thought I had was just, is there a way to interact via the blog. I think healthdata.gov is always looking for the blog, but just sort of understanding. I am trying to capture some notes here about what are the principles. But I am also trying to envision what is this going to look like. Is this a Wikipedia page? And then help me understand. Vickie was talking to me about setting up a data-sharing site. What technology would be most helpful to you? What universal calendar would be most useful to you? I really did enjoy this. Thank you.
DR. SUAREZ: Let me ask one quick question on that. When I suggested letters, I did not mean the other letter that we do as a national committee that takes some time. I am just wondering as one of our roles and responsibilities of advising agencies and specifically HHS, that advice has to come in some form whether it is a written report, a summary, notes, something. It has to be somehow prepared and presented.
DR. MAYS: I think it is a work group. First of all, we do act and we do it in conjunction with the full committee. But I do not think we have exactly the same constraint. For example, we could communicate with a group, as she is saying, a blog or some other way. Product-wise, yes. In terms of there being something like talking about a letter, for example, going to HDC. That is an example of a letter, but it is not going to the secretary. It is probably going to HDC.
I think part of what we have to figure out and this is where I think we will have to spend a little bit more time with Jim is what are the number of ways that we can do this and still be fine in terms of the territory that we sit on. I do not think that at every moment we go through a full process the way you do. But at the same time, I have to understand a little bit better the process that we do go through. That actually hasn’t been worked out. That is one of the things I think we are still evolving is as a workgroup, how does it work? We know our relationship to the full, but we do not have to stand on the exact same rules that the Full Committee does. But I also do not know how far off on our own we can go.
DR. QUEEN: I like the idea of having practical projects, something that is concrete. For me, it would be useful to have something that is very targeted so that we know that the discussion when we are talking about publicly available data versus all of our data, administrative versus survey, it would be nice to have even some small concrete things that are very clear and targeted that we can demonstrate something.
MS. KLOSS: I am not a member of the working group, but I guess I would probably guard against too much process. I think I would be wanting to push the agility button as far as you can take it because I think that is the spirit in which the working group was formed. I think bite off small chunks and move it and minimize the process. You can always apologize later, but just move.
MS. JACKSON: Along that line, I was just writing in my notes. This is like an incubator think tank, the closest that the national committee has that has the requirements of push outs. This is where things kind of ferment to see what possibilities are out there. This group has a handle and fingers in all kinds of pots of events going on. We are looking forward to continuing and maintaining that and keep it percolating up.
MR. DAVENHALL: I think the format worked very well in my own personal opinion. It kept us a little bit more organized and focused. I would have to say I think the working group was always expected, by virtue of how we were appointed and asked to serve on this, to be bold, innovative and fearless. If we move away from that, you are not getting the value out of some of us around the table. I would say I agree with Linda. I am not scared of this process and this is why I take the liberty of suggesting what high-value data sets we could work on. I would feel like we would make a serious contribution to this country if we were to pick a handful of things, see it through from womb to tomb, think about all the things we have talked about from stewardship to practicality and usefulness and make these examples, not say we are going to do everything, but say this is what we envision as a way forward and throw it up against the wall and see what happens.
We have to listen to our agencies. I would say — Chris, who was here — he needs to come back to us and get value out of his employees who say this is how we could push the envelope and feed that information to us so that we can react to that. It is like a two-way conversation we have to have with the producers.
DR. MAYS: Okay. Bold, innovative, and fierce. Those are the new words for this group.
DR. VAUGHAN: I would make a conclusion based on an N of 1. I am interested to see what happens.
DR. NILSEN: Can I just say ditto? Everybody said it, but ditto.
DR. MAYS: Okay. I think that this group has great marching orders. We need to do some work in between. I need to think about how we can accomplish this, be reasonable, and have very specific products. I am going to be coming back to you.
MR. CROWLEY: I found something online today. HHS is having their next class of this HHS IDEA Lab innovators group. I think October 1 through 31, they will be doing — it is their internal incubator academy group.
DR. NILSEN: It is the HHSignite.
MR. CROWLEY: As this process is beginning, October 1 through 31 is their sign-up period. People are applying to this program. Maybe there are ways that we can see how HHS open data innovation projects could be part of the targeted portfolio of activities that we want these bright minds to attack. Then the workgroup could potentially work as mentors with some of the teams.
DR. MAYS: That is a great idea. I saw that, but didn’t quite know what it was about. I think that is a good idea.
It is that magic hour where planes have your names on them. They have seats held for you and I don’t want you to miss them. I want to thank all of you for trying this out, seeing where we can go. What is it? Bold, innovative, and fearless. The meeting is adjourned. Thank you very much for all of your help.
(Whereupon, at 5:02 p.m., the meeting adjourned.)