[This Transcript is Unedited]
Department of Health and Human Services
National Committee on Vital and Health Statistics
Work Group on Data Access and Use
June 15, 2016
Capital Hilton Hotel
1001 16th Street, NW
Federal A Room
Washington, DC 20036
- Call to Order: Work Group on Data Access and Use – Dr. Vickie Mays
P R O C E E D I N G S
Agenda Item: Call to Order: Work Group on Data Access and Use
DR. MAYS: Good afternoon everyone. This is the afternoon session for the National Committee on Vital and Health Statistics Work Group on Data Access and Use. I am Vickie Mays, University of California Los Angeles. I do not have to do conflicts. I am the chair. I will just start by introducing myself.
What we will do is go around and have everyone that is in the room introduce themselves starting with the members. And then we will check on who is online and have you introduce yourselves and then we will check in with anyone else that is here and then we will get started.
MR. DAVIS: Hi. This is Damon Davis. I am with the HHS IDEA Lab.
MR. LANDEN: I am Rich Landen. I am with QuadraMed and I am a member of the NCVHS.
MS. LOVE: I am Denise Love. I am with the National Association of Health Data Organizations and a member of the NCVHS.
DR. CORNELIUS: I am Lee Cornelius. I am a member of the NCVHS and I am at the University of Georgia Athens.
MS. KLOSS: Hi. Linda Kloss, member of the Full Committee and co-chair of the Privacy, Security, Confidentiality Subcommittee. No conflicts.
DR. COHEN: Bruce Cohen, member of the National Committee, co-chair of the Population Health Subcommittee.
DR. CROWLEY: Good afternoon. Kenyon Crowley, University of Maryland, Center for Health Information Decision Systems, member of the working group.
DR. DORSEY: Good afternoon. Rashida Dorsey, ASPE.
DR. SUAREZ: Walter Suarez. I am with Kaiser Permanente and chair of the National Committee on Vital and Health Statistics.
MS. HINES: Good afternoon. Rebecca Hines. I am the executive secretary of the Full Committee and I am with CDC, NCHS.
DR. RIPPEN: Good afternoon. No conflicts. Helga Rippen. I am on the Full Committee and the Subcommittees for Population and Privacy and on this working group.
DR. MAYS: Can I get the members of the Work Group who are on video or online to also introduce themselves?
DR. NILSEN: This is Wendy Nilsen from the National Science Foundation.
DR. MAYS: Thanks Wendy.
DR. ROSENTHAL: Josh Rosenthal, Roadmap Incorporated. How is everyone doing this afternoon?
DR. MAYS: We are weathering the day. Do we need to do the staff?
MS. JACKSON: Debbie Jackson, National Center for Health Statistics, CDC, committee staff.
MS. JONES: Katherine Jones, CDC, NCHS, and staff to the committee.
MS. MTUI: Jeannine Mtui, contractor.
MS. SQUIRES: Marietta Squires, staff to the committee.
PARTICIPANT: Susan Karins(?), contractor.
DR. SORACE: Jim Sorace, ASPE, staff.
MS. GOSS: Alix Goss, member of the Full Committee.
DR. MAYS: Let’s get started. Welcome and thank you to everyone for participating in the work group. One thing that is a little different today: we typically start earlier, and I know some of you were looking at 1 o’clock, but our start today is 2:30. Given that, I want to try to be a little focused. Some of you, I think, listened in yesterday when Josh and Helga presented an overview of what the data matrix is designed to do.
What we want to do today is to really try and stay focused and work on the data matrix. But before doing that, I want to give us some context. First, part of what this group is now tasked with doing is to come up with – and if Maya were here, she would tell me to not use this word guidance. I am trying to find another word. Apparently, guidance means something in federal parlance. We are trying to come up with something that will help HHS in terms of advice. We are trying to come up with advice to HHS on ways in which they could increase access and use of data. We want to come up with some best practices, some principles. We are the group that has worked on data stewardship. We want to bring these things to bear to think about producing a product that would offer advice to those who work in the space of using HHS data.
We are also thinking that we will want to give that advice to the secretary as well, to see whether, across the many aspects of HHS’ data, we might be able to find some commonalities that would be of benefit regardless of whether you are talking about a survey or about studies that have been produced by HHS. That is the direction we want to take, and we will try to come up with what we think are best practice approaches for doing that.
In doing that, part of what we want to do is to figure out if, for example, you had a data set, what information is it that you really want? What is it that you need? What kinds of things have not been there? In designing a data matrix, we want to start with – and I want to take advantage of the fact that we still have some of the committee members here to ask you the broad questions, which first led to our design of the data matrix, which we will show a little bit later.
I would like to talk about the use cases we have been thinking about and then move us to this set of questions. And then I will check in and see if there are any other things that people want to ask.
In thinking about making data more usable and accessible, we have been thinking about four kinds of use cases. The first is individuals who use HHS data as entrepreneurs or data warehouses: they take that data, repackage it and get it out in the ways they can tell there is a demand for. The second is the researchers, the kind of typical data users who plug in, ask for money to produce data, or use data produced by the federal government.
The third group is the community groups. Here we are coming down to a more local level, to community organizations. This can be the state level, or it can go all the way down to community organizations that serve different, smaller geographic regions. Finally, there is the consumer.
We are well aware that we are not going to come up with something in which every data set meets every group’s needs. But it is by keeping those different use cases in mind that we should be trying to expand the use of HHS data beyond the group it probably currently focuses on.
Part of what we did in terms of the design of a data matrix is to start with the notion of what is it that you look for. There are two questions. Helga, do you want me to say the questions and then you chime in if there is anything else at this point that we want to ask? Looking at a data set, what information do you really need to decide if it is useful to you? Second question. What information about a data set have you looked for, but found hard to find if at all?
Part of what I would like to do particularly while we have people from other subcommittees here is to run the table and ask you those questions because it will help us in trying to put together pieces for our data matrix. I think what I will do is actually bring up the data matrix so that those online can see what it is that we have done so far.
Lee, can we start with you in terms of those two questions. What information do you really need to know to decide if a data set is useful to you? Second question is what information about a data set have you looked for but found hard to find if at all.
DR. CORNELIUS: Usually I am always thinking about the indicators themselves: how they are presented, whether they are measurable, how clear they are, whether they are available at the local level, and whether they are survey data, record data or what have you.
In terms of looking for data, that is a bigger question. I was actually with David Ross when he was on the Pop Health call for the letter, in our conversations about DeKalb County. It is more the practicality. All data is good until you need it. In my role, I am used to dealing with surveys. But the problem is at the local level. People are having a hard time getting the data, whether it is a questionnaire, something that the department is collecting, or something they have to pull together across data entities. There is a disconnect.
MS. KLOSS: I am not a researcher. I am not often looking for data sets. I am concerned about two things as a data policy person. First, where does it come from? Ideally, I would really like to know whose hands it has been through, not just who had it most recently, but its provenance. I tend to judge something by the credibility of where it has come from.
And then secondly, depending on the application, I would really want to know what the judged quality of it is. How much can I rely on it? That may help me decide whether I can use it for this purpose or that purpose.
DR. MAYS: Can I just ask? When you say how much can I rely on it, can you give me a sense of what?
MS. KLOSS: When we talk about data quality, we often realize that a certain amount of error is tolerable for aggregated data used to look at broad trends. It is less tolerable if we are trying to judge a trend in a smaller community, where some level of error could be really distorting.
DR. MAYS: Bruce.
DR. COHEN: I have worn a variety of data hats as a data generator in my work in the state health department and as a data user in community coalitions. There are lots of different perspectives. When I generate data, I want to have it be easily understandable and meaningful to real world issues and serve not as an answer, but an input into policy or processes at the community level.
I have several examples. There is a very wealthy town in Massachusetts that had an enormous infant mortality rate one year. It was because there were three sets of triplets that did not survive. You look at this infant mortality rate, and the local folks went through the roof: what is going on here?
So packaging and presenting data matters on the side of the data collector, the data generator. You need to be really careful to understand what it means, because you are going to be providing the data to folks who do not have the experience and understanding that you do. That is important.
Right now, I serve on a community coalition in my local community that is looking at substance abuse issues in young teenagers 11 to 14 years of age. There are no good data on young teenagers drinking and abusing substances. The most they have is statewide data, and those numbers are very variable, to say the least. How can I help folks find information at the community level that fits whatever problems and priorities they have? It is more about identifying data gaps and trying to figure out whether there are ways we can provide quantitative information that helps people make quality decisions.
MR. DAVIS: Question. Bruce, in your data producer role, you mentioned something along the lines of providing context to the data. I assume that is well before you put it out. In your prior role of data generator, did you find that you could actually be slowing the data production process by doing that pre-analysis? How do you balance the let me make sure that I put some context to it with the expediency of getting it out so it can be usable at a broader scale?
DR. COHEN: Great question, and there is always a balance. One responsibility that data providers have is to look for outliers. I think that is really important, and it is not necessarily a routine process for data generators. You can find anomalies really quickly that you need to investigate before you release the data. There was a geocode for one community that was wrong: last year they had 28 births, and this year they had 1012. If you do not build some kind of basic outlier assessment into your data release process, then I think you do a disservice to the users of your data. That is one level.
But on the other level when I started generating data 30 years ago, I felt my obligation was to clear up every mistake in the data. Over time it went to 99.9 percent, 98 percent. In order to get the data out more quickly, I need to be satisfied with the level of confidence that the data that I am generating is representative and generalizable. At that point, we can create the final data file for the researchers six months from now, but letting communities know that there is a decline in STDs right now is more important than waiting for the last clinic to send you the information. It is an art.
As you pursue your art, it helps if you can come up with criteria a priori so it is not always a judgment call. For example: I am comfortable with the data if I have 95 percent or 98 percent of all reporting units and I have looked at outliers. Whatever criteria you come up with, impose them, and that helps.
MS. LOVE: I do not want you to take on the whole burden for the data generators. As a former data generator, I know we always have this conversation, and, Bruce, you articulated it beautifully. But one of the misconceptions about hospital data was that we want it in real time. It is like saying you want garbage, because you cannot have both timeliness and good data.
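Dr. Cohen’s a priori release criteria (a completeness threshold plus an outlier review) can be sketched as a small check. This is a minimal sketch, not a description of any health department’s actual process: the `UnitCount` record, the 3x year-over-year ratio and the 20-count floor are hypothetical choices; only the 28-to-1012 births example and the 95 percent figure come from the discussion above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UnitCount:
    unit: str               # reporting unit, e.g. a community or clinic (hypothetical)
    previous: int           # count in the prior release period
    current: Optional[int]  # None means the unit has not reported yet


def completeness(units: List[UnitCount]) -> float:
    """Share of reporting units that have submitted data this period."""
    if not units:
        return 0.0
    return sum(u.current is not None for u in units) / len(units)


def flag_outliers(units: List[UnitCount], max_ratio: float = 3.0,
                  min_count: int = 20) -> List[str]:
    """Units whose count changed by more than max_ratio in either direction.

    Both thresholds are hypothetical. Units where even the larger count is
    below min_count are skipped, because small numbers swing naturally.
    """
    flagged = []
    for u in units:
        if u.current is None:
            continue
        low, high = sorted((u.previous, u.current))
        if high < min_count:
            continue
        if low == 0 or high / low > max_ratio:
            flagged.append(u.unit)
    return flagged


def ready_to_release(units: List[UnitCount],
                     min_completeness: float = 0.95) -> Tuple[bool, List[str]]:
    """Release only if enough units reported and no unreviewed outliers remain."""
    outliers = flag_outliers(units)
    return completeness(units) >= min_completeness and not outliers, outliers


units = [
    UnitCount("Community A", previous=28, current=1012),  # the geocode error described above
    UnitCount("Community B", previous=400, current=415),
    UnitCount("Community C", previous=350, current=None),  # late reporter
]
print(ready_to_release(units))  # (False, ['Community A'])
```

In practice a flagged unit would be investigated and either corrected or cleared, so the outlier list is a review queue rather than a permanent block on release.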
Our argument was you want clean, validated, documented data to make these decisions. Real time data can be gotten somewhere else. The timeliness I think sometimes gets in our way. But I have researchers who would say I would rather have three years of stable, valid, retrospective data that I can do predictive analytics on than dirty data pushed out the door with poor documentation. Those are education points. I just wanted to follow up because we have had a lot of those painful conversations around the table. Each data set will have its own time where it needs to be baked and some more than others.
MS. KLOSS: And again it depends on your use because your reliability has different tolerance levels.
MS. LOVE: For communicable disease, it may not matter; you want to start hot spotting. But for some of these larger data sets, you want them clean, valid, and compliant. It was a big issue in my world: resubmissions until it is good, or fines. You have to play that card sometimes.
DR. MAYS: It is interesting, because it really is probably an education point. Communities and consumers often say that they want real time. They want to know what happened very recently.
PARTICIPANT: They do not know what they are asking for.
DR. MAYS: That is it. I think that is something for us to also think about. Kenyon.
DR. CROWLEY: I look at the data from a couple of different vantage points. One is as a research organization. Some of the same things we have heard: What is in the data? Is it clear what is in the data, how complete it is, how clean it is? I want to know if it is exportable and machine readable. One of the challenges we face is that even when things can be exported to Excel or CSV, they are not actually exported in a way that is immediately readable by standard statistical programs like R or SAS. From a research point of view, that is really what you want.
MR. DAVIS: Can you say more about that particular piece? I think at federal agencies, when we move away from just PDFs, we feel really good about putting out an Excel file. I think there may be a little bit of a misunderstanding as to how machine readable it really is. It sounds like there is a gray area you are talking about there that I think would be valuable to hear more about.
DR. CROWLEY: For example, I was looking at some Census data that had come out in some monthly reporting formats. The field structures that you find in the PDF were how it looked in the Excel file. But suppose you are actually going to load a CSV file, which you can save from Excel, into R. Who is familiar with R here? R is an open source statistical programming language, which has become the standard at most universities, for most research organizations, and for most people worldwide who are doing research. There are also scripting languages. Many people in the public health space know SAS, but in any case.
To be able to use that type of data, what you are looking for is clear header rows and then clear columns of data. It is about that simple. When you export data with fields scattered here and there and multiple header rows, and then split rows to put a description of a particular table before it, that prevents you from being able to directly import the data into a statistical software package that researchers may use. The same goes for some of the fancier human-readable elements of these Excel files, like merged cells that describe two columns at once.
DR. CROWLEY: That actually impacts the usability of the data for end users and researchers. I will just add a point there. One thing we are starting to see in some areas: as code is built by different research groups to analyze those data sets, one useful contribution to the broader market would be for that analytical code to be made available to the community as well. If you start with R and SAS and some of the most common packages, that is very helpful.
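The difference Dr. Crowley describes can be shown with Python’s standard `csv` module. Both tables below are invented for illustration: the first mimics a human-formatted export with a title row and a two-row header standing in for merged cells; the second has one header row and one record per line. The `is_tidy` helper is a hypothetical, rough check, not a standard test.

```python
import csv
import io

# Human-formatted export: a title row, then a two-row header where the first
# header row mimics a merged cell spanning the two year columns.
MESSY = """Table 1. Monthly births by county
,Births,
County,2015,2016
Alpha,120,131
Beta,98,102
"""

# Machine-readable layout: exactly one header row, one record per line.
TIDY = """county,year,births
Alpha,2015,120
Alpha,2016,131
Beta,2015,98
Beta,2016,102
"""


def load(text):
    """Read CSV text the way an analysis tool would: first row = column names."""
    return list(csv.DictReader(io.StringIO(text)))


def is_tidy(text):
    """Rough check: non-empty unique header, and every row is as wide as the header."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return False
    header, body = rows[0], rows[1:]
    return (
        all(h.strip() for h in header)
        and len(set(header)) == len(header)
        and all(len(r) == len(header) for r in body)
    )


print(is_tidy(TIDY), is_tidy(MESSY))  # True False
print(load(MESSY)[0])  # {'Table 1. Monthly births by county': '', None: ['Births', '']}
```

A file in the second layout loads in one step with, for example, R’s `read.csv()`; the first needs hand-written cleanup before any analysis can start.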
Also, you want to know how representative the data is of the different – within the data sets. Then there is the degree of granularity of the attributes that are being described. There are a lot of really good federal data sets, but a lot of times the granularity is masked by privacy concerns, cost concerns, or other concerns. Providing data without that granularity weakens the analysis if we are trying to get signals from changes in market dynamics or changes in program design. What is happening at a specific geographic location, which may have other things going on with the environment, with social policies, or otherwise? That degree of granularity is important.
The recency of the data is also very important and how often that is being refreshed.
From a development point of view, we also teach an engineering mobile health app class, for example. In that class, when we look at data sets, we want to know: is the schema of the data set well defined? Does the API that is provided work as expected when we actually build it into an app? Sometimes it does. Sometimes it does not. Part of that too is having a well-defined data dictionary that clearly describes what the individual attributes in that data set are, in plain language. If you are looking at the entrepreneur community, they are not necessarily PhD public health practitioners, yet sometimes the language is written in that type of format. When we are thinking about these other audiences, does the language match the way these audiences think about data, and is it clear to them?
The last perspective I will mention here is on the consumer side. We do sometimes look at data from a consumer perspective as well, because we are doing a lot of research on quality and transparency data and how that impacts cost, health outcomes, and what consumers do. Is it clear, as the data is described, what questions it will answer for a consumer? Many things have high-level descriptions, but is it clear what types of questions they can answer? How findable is it? Again, I think language for consumers is important for those types of applications.
And then are there visualizations associated with the data that are clear and understandable for consumers? I think those are some of the ways that we are evaluating these things.
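One way to make the data dictionary point testable is to compare a data set’s columns against its dictionary before publishing. A minimal sketch, assuming the dictionary is a simple mapping from column name to a plain-language description; the column names and descriptions are hypothetical.

```python
from typing import Dict, List

# Hypothetical data dictionary: column name -> plain-language description.
DATA_DICTIONARY: Dict[str, str] = {
    "county": "County where the birth took place",
    "year": "Calendar year of the birth",
    "births": "Number of babies born alive",
}


def undocumented_columns(header: List[str], dictionary: Dict[str, str]) -> List[str]:
    """Columns that have no entry, or only an empty description, in the dictionary."""
    return [col for col in header if not dictionary.get(col, "").strip()]


def stale_entries(header: List[str], dictionary: Dict[str, str]) -> List[str]:
    """Dictionary entries that describe columns no longer in the data set."""
    return sorted(set(dictionary) - set(header))


header = ["county", "year", "births", "brth_wt_cat"]
print(undocumented_columns(header, DATA_DICTIONARY))  # ['brth_wt_cat']
print(stale_entries(header, DATA_DICTIONARY))         # []
```

A check like this only confirms that every column has a description; whether the wording is plain enough for entrepreneurs or consumers still takes a human reader.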
MS. MTUI: A question from Josh. He is asking whether we can ask Bruce to speak as a data consumer.
DR. COHEN: No.
DR. ROSENTHAL: Bruce, you are – community side. I think that would be very helpful as well.
DR. COHEN: What would you like me to address, Josh?
DR. ROSENTHAL: — as a data consumer – working on behalf of the community and you are looking at a data set, what is important to you?
DR. COHEN: I think Kenyon mentioned some of the attributes. Can I understand what the information means and how does it fit into the other priorities that are not data driven? Is it organized in a way that makes sense for me and my community or whatever the priorities are? Visualization is a really good point. Most communities – they want simple metrics or simple graphs or charts or maps if at all possible, but maps are difficult to understand actually. They look good, but translating them into action is more difficult at some level.
I think it is really ease of use. Sometimes, if the data do not confirm anecdotal impressions, it is actually more helpful to tease out the dissonance between the quantitative information and the impressions and other inputs from community leaders and focus groups. It is a really difficult question, Josh. I will try to think some more about it.
DR. MAYS: Let me ask Kenyon a question first. Would it be useful for you, in your developer role, to actually have some of the back-end algorithms? For example, if you were trying to develop something, would you want to know who visits the site for this data and what they use? Are those things that a developer would want?
DR. CROWLEY: In a sense, the developer wants to understand their end user. If the data set’s users are the same users they are developing for, then perhaps. It really depends on the specific use case.
PARTICIPANT: Or if they want to give competitive information.
DR. CROWLEY: When it comes to answering some of these questions, we can build ways to collect that feedback from end users into the data presentation or data access systems. Somebody goes and accesses the data site: Was this useful to you? Yes or no. Why or why not? Did you find what you were looking for? Yes or no. If not, what were you looking for? With that type of feedback, you might start to see patterns. Maybe our audience is actually looking for something and not finding it, even though we actually have it, just because of the way they are finding it or getting to it. It is not where they would expect it.
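The feedback loop described here reduces to logging a yes/no answer plus free text and counting what people could not find. A minimal sketch; the `Feedback` record and the normalization rule are hypothetical, and the sample requests are invented, echoing the smoking-by-zip-code example raised later in this session.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Feedback:
    found: bool            # "Did you find what you were looking for?"
    looking_for: str = ""  # "If not, what were you looking for?" (free text)


def success_rate(responses: List[Feedback]) -> float:
    """Share of visitors who found what they were looking for."""
    if not responses:
        return 0.0
    return sum(r.found for r in responses) / len(responses)


def unmet_demand(responses: List[Feedback], top: int = 3) -> List[Tuple[str, int]]:
    """Most frequent requests from visitors who did not find what they wanted.

    Normalizes only case and surrounding whitespace; real free text would
    need more grouping than this.
    """
    counts = Counter(
        r.looking_for.strip().lower()
        for r in responses
        if not r.found and r.looking_for.strip()
    )
    return counts.most_common(top)


responses = [
    Feedback(found=True),
    Feedback(found=False, looking_for="smoking by zip code"),
    Feedback(found=False, looking_for="Smoking by ZIP code "),
    Feedback(found=False, looking_for="tobacco tax by state"),
]
print(success_rate(responses))  # 0.25
print(unmet_demand(responses))  # [('smoking by zip code', 2), ('tobacco tax by state', 1)]
```

Requests that keep appearing for data the site already holds point to a findability problem; requests for data it does not hold point to a gap.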
DR. MAYS: I am really hoping we can be big on making that happen because then I think people can understand who is breaking away, who is not using the data because they are having difficulty and then try and do something to increase that.
Lee, you had a question?
DR. CORNELIUS: On the consumer piece, two quick comments. I think the essence is to think of it as ready to use. One of the issues is: do I have to understand code to actually touch this data? Then on the other side, think about the things I most likely want to see out of the data: frequencies, descriptors, cross tabs, graphs. How can I get that without doing all the programming work that we are all talking about? That is just the challenge that happens on the ground. People want this quickly, without all the processing we are talking about.
DR. SORACE: I was just really curious, and I would like to hear from the developers and the data consumers especially on this. I think everything you have said about what you want to know once you have found the data set is right on. But are we adequately supporting search so that people can find data sets to begin with? Or are people more or less having to hire domain experts, ex-HHS employees or whoever, who already know about these surveys in their heads, to join start-up companies? For example: I want to find data on cigarette smoking by zip code, and I want to know what the state law is on tobacco taxes.
MR. DAVIS: If I may, I would love to tie back to something that either Kenyon or Bruce said and what Jim just said. One of the things we were working on at healthdata.gov was the idea of demand-driven open data: helping people understand that they could request data if they have not found what they were looking for. If you go to healthdata.gov, look for something, and do not find it, one of the things we would love to eventually implement is the opportunity for somebody to submit a demand-driven open data request that says: I was looking for this. I did not find it. I would love for you to track down the office that has this information and give it to me in the following format. Obviously, people are not going to get every single thing they want, but just creating that feedback loop goes to something you were saying, Vickie: how are people using the data, and if they are not finding what they were looking for, do we lose them? Do they fall off the radar? Or do we give them an opportunity to continue to engage and actually request the data? Hopefully at some time in the future, we will be able to offer that as an additional service for locating the data that are present within the department.
MS. KLOSS: I just really had a quick question, or maybe it is not quick, but a challenge to the group. I am not sure what we mean by consumer. I could be the data subject in some of the applications that you have described, certainly at the community level. Then I want to know different things than if I am just a generic person who is getting a data set to do something with it. Is that what we mean by consumer, or do we mean health consumer, or data subject, or all of the above? It seems to me to make a big difference.
DR. MAYS: I will take a stab at it, but it probably is something that we as a group can decide exactly. If you look at the push of the secretary, it has been that, say – I think I have read this stuff. A person suffers some kind of health problem and they want to know something about what their odds are with treatment. They want to know something about how many people have this disease or disorder. This is where Josh probably will come in. It is like how Google and others will sometimes tell you something about that.
MS. KLOSS: I want to know something about other people like me. Are we now talking about patient access, where I want my specific information, versus wanting aggregate information, or being a researcher who wants to be a data user?
MS. GOSS: This has been part of my confusion. I do not usually sit in with the group. I am happy to be here today. Who is the consumer? I, as an end user person – I have a diagnosis. I want to go find research. That is not who you are targeting. You are really looking at supporting the innovators, the entrepreneurs and businesses, and researchers more so; that is what I have been hearing in the conversation. My big comment is: who is your end audience for this? It is not Suzy Q Public.
DR. MAYS: But it is one of the audiences and many of the data sets –
MS. GOSS: Directly or downstream from the primary users. That is my confusion.
DR. MAYS: Many of the data sets are going to support the first two use cases, which would be the entrepreneurs and those individuals who are researchers. What we have been trying –
PARTICIPANT: And the communities.
DR. MAYS: And some of the communities. And I think what we have been trying to get down to, which is what Pop is doing is how can we get down a little further in terms of the community and can we be useful. That is the feedback loop I think more so. I do not think we are going to start with the consumer being able to say Suzy Q, you just have diabetes. If you just go to the HHS website, here is everything you wanted to know. We think the data entrepreneurs will do that better. We think the affinity groups will pull that data in and help people.
If the consumer is going to the National Health Interview Survey and breaking off, they should know that. What happens is that if they can begin to put out infographics or other things, they can.
Josh and then Wendy.
DR. ROSENTHAL: As we are defining it, there are different data consumers: entrepreneurs, researchers, communities, and then consumers. Those are different – work with other folks on the research side. My main points of interest are, first, metadata, including an ERD (entity relationship diagram), which is free and easy and has been developed somewhere, but never shared. That is usually the first thing somebody needs in order to use the data; as an entrepreneur or a researcher, I would otherwise have to recreate it. Then geography. Literally, if it is only national or state, it is useless. I need to look at geography. And then finally plan continuation. That might be represented by funding or it might be represented by priorities, meaning if I am going to tell kids to develop something or if I am going to think about developing something, I want to know that the source is going to continue.
Finally, I guess I should say just one other little point: cost as well. What is the cost of this, and if there is a cost driver, is the cost associated with something meaningful or just with a technical decision? Someone shows you – in an enclave so they are going to pass on 40 or 100 – to a bunch of kids rather than using R, which is free and easy.
I guess finally I would say it is probably worth having in our heads the idea of stacks and layers. When we talk about search, there are different layers. There is the data itself. There is the information architecture. There is the UI and UX. And then finally there is distribution. To answer the question of whether people know where these sets are: the answer is usually no. Most people starting up, and even experts, are either hiring people out of HHS or hiring consultants to do it. Part of that is because it is in a historic pull model, meaning I require people to go to my site, my enclave, my destination and figure it out, rather than a push model where I have basic elements and I push them out, which allows browsers and other people to pick them up. As long as you are in a pull model, you are going to have a barrier to access.
The other thing I usually think about when looking at the data, which someone mentioned earlier, is how to remove the technical need to get inside it. We talk about tech and code and building that sort of stuff. But once you get it in a browser – distribution mechanisms, you do not have to have those technical skills. That opens it up to a much wider audience.
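Dr. Rosenthal’s push model depends on publishing descriptive metadata in a machine-readable form that other sites can harvest, rather than expecting users to come to the source. A minimal sketch of one such record, assuming a JSON catalog whose field names loosely follow the Project Open Data metadata schema used for federal data.json catalogs; check those names against the schema before relying on them, since a real catalog requires more fields than these. The data set, URLs and values are hypothetical. The record carries elements raised above: geography, refresh schedule, and a pointer to the data dictionary.

```python
import json
from typing import Dict, List


def catalog_entry(title: str, description: str, keywords: List[str], spatial: str,
                  periodicity: str, download_url: str, dictionary_url: str) -> Dict:
    """Build one machine-readable dataset record that aggregators can harvest."""
    return {
        "@type": "dcat:Dataset",
        "title": title,
        "description": description,
        "keyword": keywords,
        "spatial": spatial,                 # geography the data covers
        "accrualPeriodicity": periodicity,  # refresh schedule, ISO 8601 repeating duration
        "describedBy": dictionary_url,      # where the data dictionary lives
        "distribution": [{
            "@type": "dcat:Distribution",
            "downloadURL": download_url,
            "mediaType": "text/csv",
        }],
    }


entry = catalog_entry(
    title="Births by county and year (hypothetical)",
    description="Annual count of live births for each county.",
    keywords=["births", "county"],
    spatial="United States, by county",
    periodicity="R/P1Y",  # updated annually
    download_url="https://example.org/births.csv",
    dictionary_url="https://example.org/births-dictionary.json",
)
print(json.dumps({"dataset": [entry]}, indent=2))
```

Because the record is plain JSON at a predictable location, a search engine or data aggregator can pick it up without anyone visiting the source site first, which is the point of the push model.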
DR. MAYS: Let me just let Wendy answer the question because she is going to have to jump off soon. Wendy, can you give us your input in terms of the two questions?
DR. NILSEN: I was having trouble with Josh’s echo, but I think it was a really good point. I really like these questions about who the consumer is. What I ended up thinking about was the framework of NOAA, the National Oceanic and Atmospheric Administration. The idea of pushing your data is important. Consumers get to use NOAA’s data every day. Everybody uses it all the time. It is in an accessible framework, with the right metadata set up in the right way, so that people can use it and really use it effectively. I think it is one of the prime examples of a database that has gotten huge – because it is set up correctly. Obviously starting that is not easy.
There are a lot of efforts in other areas. I work at NSF. A lot of our efforts in big data are about how we bring these data together and how we think about this. I know that the EU is doing a lot of work in this area right now too. How do we keep it private? Somebody was talking about this before: when three sets of triplets die, not only does that mess up your numbers, it messes up your privacy, because it is pretty obvious what happened in a small community. We saw that happen with ALL and their part with the preemie babies. Does that answer the question?
DR. MAYS: That is helpful. Let me just take a couple of other questions and then continue to run the table. Bruce, did you have a comment?
DR. COHEN: I really appreciated Linda’s question. Who is the user? I guess for one data set, you can have multiple levels of users. Look at the BRFSS data. You can look at it for policy planners. It can be used for policy. It can be used for community development. It can be used to answer individual questions too. It is all how you package the data. I guess there is an interaction between the use and how you package it.
My favorite way to package data in general is clearly not one size fits all. I think web-based data query systems eliminate some of the problems that Lee was talking about around needing to be a technical statistical expert, and, depending on how detailed the system is, you can generate information down to very low levels, though not personal levels, to answer the question. For example, you can say, I am a 65-year-old white male living in urban Massachusetts; what is the prevalence of high PSAs for somebody like me? You can answer that question. I can look at changing trends in smoking prevalence over time for black females in rural areas. I can drive policy. I can answer individual questions. I can maybe decide on priorities in my community, whether we should be focused on teen violence or chronic heart disease, depending upon how I package the data.
DR. MAYS: In California, for the California Health Interview Survey, as a consumer (or I could have been a participant in the study), I can actually go and use the data query system and try to get a sense of things about people like me. But really, with data query systems, the cost is often the issue.
Kenyon, you had a comment.
MR. CROWLEY: Since Bruce mentioned packaging, I will mention that another of the four P’s of marketing is placement. When we are talking about reaching these audiences, there are also some places that are becoming very popular among entrepreneurs and data scientists, places that have built-in capabilities to do many of the other things that have been discussed here as well. Kaggle is one of the biggest data repositories. They are hosting a lot of public data sets now, and they provide the ability to share scripts. They provide forums to interact on those data sets. They provide user profiles showing who is using the data. If you want to find out what people like you are also looking at and finding value from, it provides that. GitHub is another example. I know HHS has used GitHub in a number of ways, which is great. But just to make that point.
In terms of the push model, we have it in HHS, but it also means pushing it into those places where our audience lives and goes online.
The last point: I think the ability to request data on healthdata.gov is great. We might also consider making those requests transparent.
MR. DAVIS: One of the things we try to do is make the requests transparent and allow people to actually vote them up and down so you can get a little bit more of a signal as to who was feeling that was something important to them. That functionality unfortunately is not working that well right now, but that is the idea. How do we get signal from a broader piece of community to help prioritize some of the data requests that are coming in?
MR. CROWLEY: I think HHS is experimenting somewhat with Reddit to do some of that voting up and down. I thought I saw that somewhere. I do not know if it was an actual agency letter or just an individual effort. These different platforms allow for the functionalities we seek: to empower the data, to make it more accessible and useful, and to be where the audience lives. They belong in the marketing plan, the distribution plan, for many of these data sets.
MS. HINES: I am pretty simplistic. I am always looking at the geographic level. Because of my population health orientation, I want to know how far down does it go. That is really the main thing and of course looking for indicator data, outcomes or intermediary outcomes.
DR. RIPPEN: My bias is already embedded. I do have a question. I know we had some conversations about the target. Again, for us, one component of the overall work, a very small part, is what the important things are to let people know about so they can make decisions about data. But as I was hearing people talk, I was thinking that, if it is possible, maybe across the full spectrum there could be a graphical summary of the data that people could just look at before getting to the rest, so that at least the barriers to understanding what it might be showing are reduced without the fancy application. Based on the wisdom of the group, it is maybe a compromise.
MR. LANDEN: I am really a pretty rudimentary data user. I am not a researcher. I am certainly not an entrepreneur. I think that puts me somewhere between entry level sophisticate and complete moron. Searches are important to me. I seldom know where I need to go to find the data. When I need to look for Medicare or Medicaid data, I usually start with HHS.gov or CMS.HHS.gov and then plunk some term into the search engine. I know from experience that there is a 0.9 probability that I will get totally useless returns. I do need help finding data.
PARTICIPANT: Go to healthdata.gov.
MR. LANDEN: Why doesn’t my search take me there? I have been there by the way. The point is why do I need to go where – if I knew where to go, I wouldn’t have to ask the question.
Again, as a non-professional with data, I assume data is complete until it proves to me that it is not. I assume that it is clean. I know better. But I have no way to judge. That kind of begs the question. Is there something that can be done to tag data for people like me?
Pragmatically, the last time I really needed to do a data search for business purposes was when meaningful use was first passed and CMS published the formula by which the incentives for hospitals were calculated. For several hospitals across the country, I needed to go in and find basic utilization data, including, obviously, hospital discharges by payer and things like that. Finding that was not always easy because payers were not always broken out in the same way. I could not tell what was traditional Medicare versus Medicare Advantage. Sometimes it was Medicare and Medicaid and others. Sometimes self-pay was in there. Sometimes the names of the third-party payers were broken out. It is not always comparable. It was not always clear what the time range of the data was. Was it an annual set of data? Was it something different?
Sometimes I needed facility-specific data. For example, St. Mary’s Hospital might have three campuses, and I could not always tell if I was getting one individual campus or all three collectively. That was one of my pragmatic use cases.
The other is less specific: when I am reading the literature and there is a study whose methodology or conclusions I want to validate, I need to look for data to replicate it, to come up with incidence or prevalence or the percentage of ED visits that are associated with a particular demographic group or the percentage of ED visits that somehow involve gunshots or blunt trauma. Those are the types of uses I would make of data.
MS. LOVE: I do not think I have anything more intelligent to add. I do not know if it is because I am brain dead or I just cannot think. One thing: when we release large-scale state data sets, and I guess I am curious about the federal side, I also want to be clearer about what the restrictions are on the data set I am using, because if I download a public use file and use it for the State of Utah or whatever state, the re-release restrictions are pretty hefty. You can package it and use it. It is not regulated, but I cannot resell it or repackage it off to someone else. It is good to know whether there are those restrictions, if you are an entrepreneur or working on a community assessment, and whether you have a priori permission to repackage these hospital comparisons and push them out to third, fourth, fifth end users. It is probably okay, but there are some data sets where it is –
DR. SORACE: I actually think one of the consumers for HHS data should be HHS. We have a lot of people who develop stuff, but we do not necessarily – we should be users of these search tools. We have new personnel coming in constantly. We have reasons to want to search our own data. Not everybody is able to really – many people at this table know a lot of things about certain data sets, but not others. There is an expression among developers. I believe it is “eat your own dog food.” Fundamentally, we should do that.
And the second thing – Josh – I would love to hear his opinions on this. One thing I think you mentioned was ER diagrams in your earlier list about 15 to 20 minutes ago. Am I correct?
DR. ROSENTHAL: Yes sir.
DR. SORACE: One thing we really want to do is re-share ER diagrams from our own databases so that we can start to drive down development costs and also really start to facilitate linking. Those are my two thoughts.
DR. ROSENTHAL: We talked about metadata and ER diagrams. A lot of the stuff is done in verticals. It is sharing that, meeting people where they are through various sources. You do not have to build the stuff all internally, but you do need little tiny pieces that make it meaningful. I could not agree more. That is fantastic.
DR. DORSEY: There are just two points I want to make that will hopefully fit into this discussion. The first is, when you are thinking about data sets, to also include NIH-funded studies, and not just clinical trials, but surveys and data that are PI investigated. If there is some way this group could think, in your guidance, about how to talk to researchers about innovative ways to use some of these data sets. I often go to conferences where I hear presenters who use a data set for something completely different from its intended purpose or the reason the data were collected. They were able to do an investigation on discrimination because these variables happened to be included. We might not get that in some of the other – the main HHS-sponsored data sets – when I say sponsored, I mean the ones that are required to go through OMB. But I think that is still very important data that could be useful. In some ways, at HHS, we pay for it. It might be through grants. We fund it.
And then when you are thinking about users, that researcher tier is certainly important. But when you start to go down to the community level, think about even communities who actually want to use data: what resources they have and what they think they want when they say they want data. Data can mean different things. Are they looking for tables? Are they looking for actual raw data at a level that they can actually analyze to answer certain specific questions? That might be the next step.
Several years ago, the Office of Minority Health hosted a workshop called Data Makes a Difference. They brought together many different community-based organizations who were interested in data. I am sure I have some of the information from that, some of the findings and some of the questions. One of the things that came out of that meeting was that at the community level there is just a diversity of skills in terms of what people are able to do with data and what they think about when you say data. When you are talking about access and use and helping HHS to do that, once you get beyond the researcher level, there is a great diversity in who your audience is, so just keep that in mind as you move forward.
DR. MAYS: Paul, are you online?
DR. TANG: Yes, I am.
DR. MAYS: In terms of responses to the two questions, your thoughts.
DR. TANG: What do you need, and what information do you look for and find hard to find? The first one is I am looking for social determinants of health for people like you. I would like to know who are like you and what works for people like you. What do I have to do to offer the right resources in the right way? For example, there may be a special counselor at an elementary school or a high school who can address some of the social needs that you have. It would really be nice to know about that because I think what is available is just unevenly distributed. The corollary is that help is available if you just know where the needs are.
I guess I am looking at the nontraditional health information because I think that may be one of the rate-limiting steps in taking our country a big step up.
DR. MAYS: — in terms of nontraditional –
DR. TANG: It is community data. It is community needs and community service. The issue is that the services are there. They are even funded in many cases. It is just unevenly distributed, but most importantly, it is not connected. The data that we need are how to match the resources that in many cases are already there with the needs that are not properly articulated.
DR. RIPPEN: But let’s say that you are looking for data and there are sources of data within the local community because the public health department got a little bit of money or the local school system has money or at least data and the criminal justice system has data. Let’s say that you are interested in actually trying to combine them or use them. What would be important for you to know about each of those sets of data so that you can make a determination of whether they could meet your needs?
DR. TANG: I actually think most of the needs are in the social space, the “social determinants space.” We are capable, and probably over-capable, of dealing with the medical side, and we probably over-medicalize health issues. I am essentially looking for a complement. It is not just for people with low economic means. I think of kids and suicides or vets and suicides. Huge unmet need. I think basically what we need to do is find how to reach these folks. Through our analytics, I think we can do a better job than we do now, which is to train humans to apply the same thing to everybody. That probably works for 15 percent.
DR. MAYS: Josh, any questions? Before I turn to the next thing, which is going to be about use case, I was asking if you had any other questions or comments.
DR. ROSENTHAL: – it may be worth just looking at those layers, which keeps us from bleeding these conversations together. What is search? What does it mean to communities? The point is, if you have a pull model, you have to solve all this stuff. It is really difficult, and it tends not to work, since the demise of the Health Indicators Warehouse.
If you have a push model, then you do not have to solve for different communities and different definitions and different users. If you have a push model with the basics of the data around there, it goes out through the browsers, or through a web page query system, and those are free now, or it goes out to the distributors who take it to the consumers downstream. If you look at these sets, a couple of them are now reaching hundreds of millions of consumers. But you are never going to see that from the analytics when you look at it internally. In my mind, the idea is really to keep these levels in mind when we talk and make a decision around push versus pull. Even if you do not want to make a decision around it, what succeeds in push will also help in pull, meaning putting these attributes in there.
DR. RIPPEN: – excellent point with regard to how we can ensure broader use and leverage of the information. I think the nuance we also have to remember is that people will still need to be informed about the nuances of the information itself. As people make policy decisions, resource decisions or even community activities, understanding how the information can be used is just as important as giving it to them. How would you then differentiate between the information you would need to share with users in a push model versus a pull model, or is it the same?
DR. ROSENTHAL: I think they are pretty close to the same. That is why I am actually really excited about this data matrix. The data matrix is the definition of what needs to be there to allow it to work in a push model, but also, to Jim’s point earlier, it allows it to work in a pull model. Why are certain data sets picked up outside the enclave, outside data.gov and outside things we know about, and why do they meet people where they are, where they are searching? They have these basic elements around them. I think it overlaps.
And the – we did yesterday, we walked through and said if you get these basic pieces around it, it will not only help if you want to continue on in a pull model, but it will definitely help in a push model. The beauty of a push model is then you do not have the resource drain. You let different people come up with different solutions.
It is very much in line with the open data model on the business side. Put the data out there, send it to the wind and allow people to come up with uses that we cannot envision. It is the same thing on the access and use model. Rather than trying to search it and tag it and do this and that, you have to do some of that, which is defined in the matrix, but put it out through the different distribution systems and it will reach different audiences in ways that are interpretable and meaningful for them. I think the data access and use matrix is literally the basic definitions that would super turbo charge it. You can do it with less than that. Healthdata.gov has a couple of pieces up there and that is fantastic – have pieces up front and that is why they get picked up.
To your point on unintended use, whoever said that, that is absolutely right. BRFSS is a great example of that. BRFSS is now being used in half the states by half the payers to project costs for building health plans on the exchanges and the market – it was never intended to do that. That is massively valuable. While payers are going to CMS and asking for a hundred million dollars – all of a sudden they are actually using HHS data to solve their own problems. And HHS is not aware of that – use in those data elements would be hugely helpful I think.
DR. RIPPEN: – might also be an opportunity, then, if we believe that some of the characteristics we need to include to describe the nature of the data would also be portable and presentable in the push model too, for example, some of the earlier points about data quality: either a warning or at least an icon. That goes back to some of the very wise things people have talked about throughout a few meetings we have been on.
The other thing is the challenge of real-time “not clean” data versus data suitable for a true analytic policy evaluation. Again, I think those are things we can think about as we start capturing these characteristics. It becomes really important, then, to understand what we have to actually capture and define that will reduce unintended consequences but also facilitate the decision-making process of people who want to consume the data.
DR. ROSENTHAL: The way I think about it, the data matrix, when we are done with it, is everything you need to take all the fantastic work (and you do not want to make it onerous for the producers) and get hundreds of millions of Americans using it, like there are for certain sets, rather than just a couple thousand – 80 percent of the work. You needed a pull model if you want to say that. I do not mean to complicate it with pull versus push. But it is interesting. The push model takes – viral. It all of a sudden takes on a life of its own and massively minimizes the resources. Then you can say what do we need. You get out of all of these debates in the 80/20 – pretty quickly.
PARTICIPANT: Bruce and then I am going to move us to talk about use case.
DR. COHEN: This is sort of a combination comment about the matrix and the observation that Rashida made, which I think is right on. There is enormous variability in communities’ ability to use data. It would be wonderful if, as part of the matrix or a tool, we could develop a self-assessment for communities that gauges their level of sophistication so that we can link them with the right level of data to maximize their efficient use. For some, it will be simple bar charts. For others, it will be individual-level data so they can do their modeling, and everywhere in between.
There are many real issues. One important issue around providing communities’ data is linking their level of ability to use the data with the right data set or data source. If the matrix or a tool could do that, it would really increase the ability to use HHS data and health data in general.
MS. KLOSS: Could I put in a plug? We have the toolkit. There are a lot of data characteristics in that because it is based on the fair information principles. It deals with data quality and de-identification and security and a lot of characteristics of data set, intended use, et cetera.
DR. MAYS: The committee did it. We have it. Let me just ask before we leave this because I want to understand the vision of what you see as to how we would do it. Let me just do a description. Let’s just say that what we have is your data framework that came from Pop Health. What that is trying to do is give people a sense of domains and different indicators and things they can do. And then we have something that says to people, here is how to get to this, because we have now asked all the data sources to answer a lot of the things that you all have talked about. There is a relationship between those things. And then tell me how your toolkit fits in.
MS. KLOSS: I am not saying in its current format it fits in, but it includes checklists for many of the characteristics of the principles of fair information practices, including quality: the characteristics of the data that need to be asked and answered by communities using data.
I think there are a couple of areas where it could be embellished. In one of the earlier versions of the framework, we worked on embedding the principles from the toolkit, so there is a whole set of data set characteristics related to the fair information principles that can be overlaid on the framework and at least serve as some kind of checklist or set of questions and queries that communities can ask. I think that work could be easily adapted.
DR. RIPPEN: I guess I have a question, maybe more on the Pop Health side, or about trying to think through unintended consequences, because they are always interesting to think about, especially when we start comparing a lot of very sensitive information about geographical areas, especially where you cannot intervene. I think that actually adds another level, which is really important. Again, it goes back to some of the principles in preventive medicine: just because you can test and find does not mean you should if you cannot do anything, going back to some ethical principles. That component becomes really important, so that is great.
PARTICIPANT: We have Leslie, so that is going to help us. She could not be on today because she had to end up going to – she had to go to a funeral. I was thinking right away that she is going to be so happy when she hears this, because part of what we thought about, when we talked about this guidance notion, is that we should start off with the notion of data stewardship, telling data owners why they should be doing this and what good principles are, which we have already sent to the Secretary as well in terms of thoughts.
I can get a better sense of the wrapping of the toolkit for – it is not necessarily that it is the data owner. It is building the trust that the Robert Wood Johnson Foundation is talking about, as well as what you talk about, so that it is there for people to look at that checklist and understand the dimensions that would make them say, this is okay, I can do this.
DR. COHEN: You asked a question about vision of what it would look like. I guess I would like — if my community wanted smoking data and went to use the data framework that we are developing in Pop Health and looked at health data and were interested in smoking, on the screen of their computer, it would pop up a three-question quiz. Which data is best for you? There would be a bar chart that had my community, another community. There would be a point estimate with confidence intervals. There would be a URL that linked to the data set where they could generate community-specific prevalence estimates by age, race, and sex. There would be some way for users of the data to evaluate the level of data they want and then in the best of all possible worlds, the responses to their data request would fit the needs of their level of ability to use data.
DR. RIPPEN: It is just a visual representation that is something that usually is provided in text form, which is more powerful.
DR. COHEN: Some people have different capacities to use a variety of levels of sophistication of data presentation or the actual data themselves. Making that assessment will link the communities to the most efficient data source or representations that meet their needs.
DR. RIPPEN: I think that Josh would say that that would be the push approach, where basically, if we have a common approach to summarizing information, regardless of whether it is a public sector or private sector data set, you can do a query that would bring it together and present it. Again, more on the push side. But that implies a higher level of capability as we start thinking about this matrix, in terms of how one would standardize it so that it could support that kind of model, which is not always as hard as it sounds, but sometimes it is.
DR. MAYS: One of the things, though, that I want to make sure we do is learn from the data warehouse in terms of the cost. Damon, we are going to also be putting that on you. As something is designed, are there really the resources for people to keep doing it in order to make it useful? As we go through and develop the matrix, I think that is also going to be important, because you have a sense of what the burden and funding are, et cetera. In some instances, it may be that the recommendation needs to be that resources be set aside to do this. In others, it may be that there is some place that does it for the data sets, or it may be that we have to stand down a little bit.
MR. DAVIS: I could not help but think, as we have been talking here today, that every one of these ideas could be awesome and collectively very challenging to deliver on. In some instances, the data can be valuable to a wide array of audiences, as I think Rashida was saying. The challenge then becomes how you prioritize spending the resources to accumulate it in specifically structured formats that are machine readable and documented extremely well, such that the user, whoever they are at whatever user level, understands the data’s limitations, its capabilities, what could be found in it and what might not. And should we also be in a position of then creating the data visualization and all of the other things that other people are probably better at or are already doing, because they have some other activity that falls specifically in their bailiwick, that maybe the federal government should actually be in the process of creating that specific database or what have you. I think there is a lot of really gray area there about how much you want the government to actually be engaged in tool production.
I think, Vickie, one of the things that you said was that it can be expensive to create the query tool that allows the general public to delve into a data set. However, it may be very valuable, as Bruce said, to have public policymakers be able to do a specific query right to their geographic area and the constituents that they serve, in order to understand what is happening there and inform policy. There are a lot of fluctuations here, even in just these four groups of users that we have determined, in figuring out which data sets should have which level of additional development or documentation or tool development or whatever done on them. That is a really big universe of stuff.
DR. ROSENTHAL: I actually see this as almost the opposite: it saves you time – if you look at that slide, or if you all pull up that slide that basically looks at these four layers: data, then IA, then UI and UX, and then distribution. You are already doing all this work. You are already spending all this money. Rather than doing all the work all over the place, spending all the money and not adding just a couple of little tiny elements that would allow it to take off like a rocket, why don’t you trim your effort in what you are producing, add these little tiny elements and put it into a push model where you do not have to have staff running it? You do not have to build the tools. You do not have to do the things. I think it would massively reduce your effort and your staffing just by doing different little things to it, and it would massively increase demand by being able to share a little bit of that. The BRFSS example is great.
Why do a couple of the sets get picked up in Google Public Data Explorer? They are used by lots of people. Why do a couple of the sets get picked up by these consumer downstream pieces? Because they have little tiny things on them like geography. Is it adjacent? By spending a little bit of time and effort on a couple of those, I think you could massively get out of the tool-creation business, if that makes sense, which I think I hear you saying. It should massively reduce prime effort and budget for you.
MR. DAVIS: I think we are saying the same thing from two different angles. Basically, how is it that the government can invest a little bit in order to enable the public a lot?
DR. RIPPEN: – as an example of best practices. Suppose you have to do a thousand different interfaces and everyone is summarizing data in 20 different ways. We always think of standards more as the data standard or transaction standard. I think if we can at least have a best practice, a consistent way of summarizing information that covers what is important 80 percent of the time as opposed to 100 percent, and everybody does it, you address some of the points that Josh highlighted. Then what you are doing is enabling people to leverage it in a lot of different ways. You are going to be everything for everybody.
MR. DAVIS: That makes sense. I know one of the things that Josh is constantly after us for is these entity relationship models that he has already brought up, and that seems like the kind of thing that just exposes what is in a particular data set and how you can start to create matches and linkages across other data sets. That is the enablement that I think we are talking about here.
DR. RIPPEN: Even the summary of the nuance is this matrix because, again, it becomes really critical for people to make decisions about how they may or may not be able to use the data – how do you set it free, which is always nice, but then what the heck is it and the implications of use. You cannot just do one without the other.
DR. ROSENTHAL: Damon, we are trying to get you out of spending money and time and effort. We are trying to make it easier and cheaper for you.
DR. MAYS: Rebecca.
MS. HINES: From the perspective of indicators, what has ended up happening is there has actually been a tremendous duplication of effort: Healthy People, the Health Indicators Warehouse, and quick data online. There are numerous entities around HHS who have basically recreated the same tool. HIW ended up being a mixed model of staff time and contractor, and on both counts the resources were too demanding. Healthy People is straining under the weight. They are going to put the gauntlet down and say for 2030 it has to be smaller because we cannot support this. In fact, most of the Health Indicators Warehouse was fed by Healthy People. It was exported into the Health Indicators Warehouse.
I would argue that – it would require incredible management skills and arrangement, but there should be one place where HHS has all of this indicator data. And then, as Josh was saying, we would just put the resources into that and have these different interfaces. Again, I do not know whether the government should be in the business of that, but Healthy People is obviously a tool of HHS. You could argue it should, I suppose.
But what I have seen over the decades is that there is a need for this indicator data at whatever level it can be made available, and that the contracts are duplicative. The efforts are duplicative. I just do not know how you would get all that in one place, but it is very clear to me that there are a lot of folks duplicating efforts and data from different sources. I am not necessarily saying they are duplicating data, but they are duplicating the idea. Could there be one place where all that lives, and then take something like what Josh is talking about, so that we do not spend so much government resource on the tools and on individualizing the interface with the public, and somehow that will take care of itself? I do not understand how. The government I think does need to do the back end because we have it. For Healthy People, we have all that data. It is provided as part of that process.
DR. MAYS: It is interesting because when it comes to the indicators data, there has often been a request to IOM to help you sort things out. I was on one of the IOM committees for Healthy People where what you needed, because you had so many, was the ability to prioritize both the – was it the objectives and the goals – so that they could be put into buckets per month and all of that.
MS. HINES: That was the leading health indicators, which is basically a subset of Healthy People. There really is a demand for indicator data at whatever level it can be made available. Bruce, you can attest to this. HHS is humongous, but you would think somehow we could crack that nut, like with the HHS HIW, and all of the indicator data would live there and then the different programs could have their own interface. And then you would pay one contractor rather than six or however many that are each getting half a million dollars a year or whatever it is. You could just have it all in one place. It would require incredible coordination across agencies in the department. I do not know whether it could be pulled off. It is incredible from my perch, as Bill would say, to see all of the efforts that are basically the same thing.
DR. MAYS: I think it is something to think about. I think crossing the boundaries is hard because sometimes data are collected by different groups with different privacy and confidentiality rules.
MS. HINES: Not at the indicator level. Once it all comes in, it is good. It is all good. That is how it works. The – surveillance system sends their data to Healthy People. The whatever system sends their data to the public. The laboratory sends their data. They all send it in and it all gets put into what you are trying – one of the things you are looking at – how do we set this up so that anyone can look at it and do whatever they want with it.
DR. MAYS: We should think about that.
DR. CORNELIUS: As I listen to all this, I am trying to balance a couple of my multiple personalities. One side is the stat jockey: I can think about all this data and want to massage it forever and decentralize and so on. Yet on the other side, I really like the comments about NOAA. When I speak about NOAA, I hear what Josh is saying, that either we do this or Google will do this.
There is one piece that we probably want to grab right away. How do we think of a way to help people understand those data elements that are invisible, like when we click and see what the weather is? We have it. That is something we need to grab.
Honestly, the piece about data and indicators – I know a lot of us have been around a long time. That is a mushroom cloud. Is it a fog? Is there a bunch of people in the room that have baseball bats? We are really trying to work through those. But I liked it. I really love the NOAA example.
DR. DORSEY: For the NOAA example, we hear that a lot, but NOAA has very different data than HHS. I think when you say the Weather Channel is able to take NOAA data and make something of it, just keep in mind the kinds of data that we have in HHS. Are you referring to some of the aggregate data? In some ways, it simplifies some of the data that we have.
DR. CORNELIUS: Let me be concrete. Forget that I happen to know that NOAA is this agency. I am really thinking as a consumer. The majority of consumers do not know NOAA exists. What they do is they want to know what the weather is. The question is from that side. Are there these kinds of things that relate to our lives in health care? What are those things? I do not know if there are five, six, or ten things that we look around the room and say these are the things we need to know about public health quality of life or something. How do we present that? How do we make that come alive in a very easy way that can work across this planet that we are on? Either we are going to do it or the private sector is going to do it.
DR. MAYS: We are okay about letting the private sector do it as long as the data – that is actually part of our job, to get the data out in a way in which we can really get the private sector to do more of this. When I was talking to Damon about worrying about cost, I realized as we discussed – no, let’s worry about putting the money in and getting it there, and the entrepreneurs will be selling it back to us. You will have participated in something and then you are paying to some extent to get it, but you will get it in a format that will be a lot easier than setting it up on a website. I think it is good if we can get it out to the private sector. That is the partnership. Earlier, Rebecca was talking about a public/private partnership. That is what we want to go for.
DR. ROSENTHAL: The slides have such a good example. People in this group have not seen them. Take a look. You literally see – rewrapping CMS data to make it usable and then using it with Yelp, and even in our little day job. If you go on US News and World Report and look for a doctor, you are using HHS data.
MR. DAVIS: That is exactly right. But Rashida is right. When we talk about NOAA data, you do not have clouds that are pediatric, teenagers, 18 to 35, and surveyed differently based on what it is that you are trying to do. Do you know what I am saying? The HHS data, while we would love for it to be as direct as NOAA data, is so challenging because we take in surveys with different age ranges and completely different sets of stuff.
PARTICIPANT: I do not want to get caught – it is just an idea. You all got it and I am good.
DR. MAYS: Let me get these last comments because I have to deal with the use case and I do have a question specifically for standards.
MR. CROWLEY: The point I want to make that ties into this is that as we are redesigning these systems and trying to manage resources, I think it is worth thinking about the human resources/human capital component within HHS needed to support that. The capabilities we want to use are within the private sector and the stock exchanges – that can be done, but there is a middle ground. Maybe somebody within HHS becomes that data set moderator, the one who is watching how that data set is trending on these different social media platforms, who can provide the trusted voice around how it is used and can interact with people when there are questions. I just want to bring up the point that the human element within HHS to support these outreach pieces will be part of it. Maybe the human capital resources now that are – and their job descriptions doing different things with preparing data or sharing data – might consider new job descriptions or new types of positions, which are specifically related to social media outreach, engagement with audiences on these platforms, monitoring, managing, that sort of thing. That was the only point I wanted to make there.
DR. SUAREZ: I wanted to explore – I know the topic I guess. This is about – we all heard, and the national meeting had a discussion earlier about, something called APIs. APIs of course are being seen as a mechanism to allow – filter access to information basically. One of the concepts it has been applied to in meaningful use is that consumers could bring their own app and try to access their personal health information from EHRs and have the app actually interact with the EHR and extract data. There are a lot of issues around that and API recommendations and all that.
I wanted to see what role the data access work group can take on API-related developments around access to federal data. Is there something already done in that area that has been defined? A lot of the datapalooza work, for example, a lot of the entrepreneurial work, a lot of the newer developments and capabilities, technologies to access data, follow that type of technology: standard API interfaces. I wonder about that. I do not know if anybody has any sense of what kind of role the work group itself, in its defining of best practices, for example, and guiding principles and things like that, can undertake that relates specifically to the use of API interfaces for accessing federal data.
DR. MAYS: I think it is a very good space to be in right now because it is a big space to some extent.
One of the things that Josh and Mo were talking about is that they are going to go back through the datapalooza activities and look at what it is that is being done by these entrepreneurs with federal data. I think just the cataloging of that to some extent will give us some ideas. I would say it is an add-on that we should be able to do. I think we have members who are in that space. But I think right now in terms of the data matrix, we may want to try and get that work done, push it along a little further before we do the other. But I think that they are kind of getting ready to do that in terms of looking at the datapalooza.
DR. ROSENTHAL: I was going to say that I think that complements really nicely also, Walter. That is what RWJ is doing. They are approaching this and they just got out of one of the medians where they went through this – release. They are basically saying what is the best practice to get data out through an API through a – format and that is all the work they did with ONC. They are basically saying to increase meaning and use. They think it is best. They are tackling it from a technical side. Here is the technical format. Here is the API impact. Here are the API standards. Here is also how we want to increase usage by creating 100,000 coders who can all use that. That is very nice, but that is definitely complementary to saying what if you could also put these little elements in there. Outside of the technical solution, it is on the data creation and demand side. I think it fits really well and nicely hand in glove with what RWJ is doing. Both the folks at RWJ, Michael and other folks want to hear more about that specifically.
They have this mandate that they are trying to create better health and they are doing it through a technical solution around APIs and JSON codings.
DR. SUAREZ: — much in line with the main purpose of the work group, which is data access, improving data access, facilitating, enhancing, advancing data access to data resources. I was just thinking here we are working so much. I do not know how much ONC is working on the API side of federal data resources. ONC seems to be focusing a lot on the API side of the use of apps by consumers – health information, but I am not sure about data users accessing federal resources. In any case, it is a special opportunity I think that we have to get some footprint on that.
DR. MAYS: One of the reasons I wanted us to have the connection with RWJ is because they are also working with CDC, ONC, and Karen DeSalvo. They are moving ahead on this. There are certain things that I think in terms of our space that we can do with them. I think there are certain things we want to turn over to them to be the implementers of. When Josh and I were talking about RWJ, he said he has talked about some of these issues with a couple of the people there. He is on some activity there. I think our plan is to bring them into the fold next. When we talked with Jim, there is no conflict there. I want to talk to Dave first since Dave actually co-chairs that committee. We do have a way in. But bringing them to the table, given that they have done some of this, I think would be very useful.
Let me get these two comments and then we have to turn to the use case. I want to hear from standards.
MR. DAVIS: I will try to be quick here. I want to go back to – let me start with the API thing first. Has the work group made any recommendations about APIs in previous letters to the secretary or advice to the department? Just as a quick thing on that, Walter, I think you do have an opportunity if you are looking to provide advice to the department. It could be interesting to suggest that with the re-compete of any contracts or the letting of new contracts, APIs be considered as part of the data products that come out, putting that at the front end of anything existing or new. The idea is that APIs should be strongly considered. I do not know if we would receive it well or if you even want to do this, but you could suggest that there be a justification for why an API or some form of advanced machine readability was not a product of a contract, just so that it is documented that it was considered and here is the reason why. It may just be that it is budgetarily not feasible to produce an API. If that is along the lines of the advice, it might be interesting to explore how you recommend to the department that APIs be part of all re-competes.
I want to go back then also to Kenyon’s comment about human resources. I think it is important to note. We would not be talking about the availability of CMS data had they not invested the human resources in making those data alternatively available. They always had a data enterprise, but now they have a data shop specifically dedicated to the production and availability of data. That is something that I think could be replicated across many of our offices.
I get the impression that many of our offices are producing data already as a byproduct of what they already do, but that the people who are producing that data are not necessarily specifically dedicated to that in some cases. It is an addition to the job that they already do, versus CMS having an actual data shop that is in the business of producing data from that agency. There is a significant difference there. In the area of advice, it might be beneficial to suggest that people be specifically in a data production role in an office at HHS in each operating division.
PARTICIPANT: — with market engagement as part of that function.
MR. DAVIS: That is a good point because there is a lot of signal to be taken from the market.
DR. MAYS: It would make our lives so much easier if we knew where all those people were because we would have them involved with us. There is such a lack of them that it has been difficult.
I want to deal with the use case. I want to go back to that. I do not want to leave that because I want to make sure that we are working at a level where we have some agreement. I think you have raised the question of exactly who we are talking about. We do not want to go off and have a group and then come back when we are in the full meeting and it is like that was not it.
I think we are very agreed upon the first three. I think we are fine about the community one because Pop is in there. I think we are fine about the other two because that is our usual space. But I want to hear about the consumer. Is there a space at all that we are working in around that? I am taking the lead from the kinds of statements and things that have been made by the secretary on wanting average Suzy Q to be able to access data in some kind of way. But I think we also need to think about what is feasible. I think when we are throwing it out there, we are not saying that every data set or every data entity is going to be able to respond to this person, but that there is a goal for you to track whether or not you can. If you do not, can you make any changes to be able to make your data more available?
It may be that it is the public/private partnership and there is some kind of way somebody else does it. It would help us to hear where you are on that.
DR. CORNELIUS: Do you think that would be a good item to send to the Pop committee as we plan for the September hearing, tying into the consumer piece?
MS. GOSS: Isn’t that session really more about getting the subdomains and the county-level usage? I think it would be very distracting, at least from my review of the advance materials.
MS. HINES: The target audience, as I understand it, is the county supervisor, the city supervisor.
DR. CORNELIUS: I remember from our phone calls that some of the people we are asking to come are people like the Baltimore Neighborhood Indicators Alliance. We are also targeting community-based organizations on those charts. Am I missing something?
MS. HINES: Not at all. But I guess when I hear consumer, I am hearing farther down a level.
DR. MAYS: We are going down a level. I am assuming that our third level, which is community whatever works, but I am actually talking about a person.
DR. CORNELIUS: You mean someone out in the hallway person. It actually goes back to Linda’s comment.
MS. KLOSS: I think about the unique interest of affinity groups. That is where a lot of the patient access advocacy has come from rightly so. It is not Linda asking for it. It is somebody on my behalf that is going to then take that and make it more useful.
MS. HINES: I have to just jump in. At the hearing in May, the fellow from Census had his own personal experience where he was diagnosed with a certain kind of issue. He went to his doctor and he said you are suggesting I can do A, B, or C. What are the relative outcomes of each of those procedures? No one could tell him the answer because they do not have access either to their own hospital systems data much less the hospital systems in the area because of all the privacy stuff. He wants to know why would I choose procedure A over B over C. He cannot get access to that. He is part of an affinity group of people who have had this health event and he does not have the data that he wants in order to inform what to do next.
MS. KLOSS: And generally, unless you work with the Department of the Census and are skilled as he is, you would be stopped in your tracks or you would find a book or you would do a Google search and you would do the best you could.
DR. CORNELIUS: You are thinking about a group like Patients Like Me.
MS. GOSS: Building on this conversation, it seems to me that there are a lot of complexities related to health literacy, general consumer, type of support, and technical data extracts, those kinds of things. If it is Suzy Q Public, it becomes a much bigger issue to do well and be a good resource. However, there may be – we are not far along in precision medicine and that methodology at this point. However, there are key diseases and key affinity groups, more prevalent communities in the affinity groups, to use Linda’s term, and maybe there is some alignment around the top five: obesity, smoking, diabetes, CHF, hypertension, take your top five. Maybe there is some way to focus not around a generic person use case, but more around a disease state or a data-based use case where we know that there is an interest because the affinity groups and the associations may be the ones who want to data mine that. Maybe it is not consumer specific, but disease state or data set specific.
MS. KLOSS: In the interest of moving along, I think we do not want to get hung up on it. You have three clear groups. The fourth could be a community that is not geographic. Or you could do a geographic and a non-geographic community. And then just set aside the individual for now.
MR. DAVIS: You are going to get more benefit by engaging the group that is engaging the individual than by trying to be this large entity that is trying to engage individuals. There is no way we can do that. That is why they are there.
DR. RIPPEN: I would say think about federal data sets and who they serve, because again the reason why they collect data is because it is such a joy to collect it. It is for a specific purpose. If you look at CDC, they do have information presented from the consumer perspective because part of their thing is to prevent transmission of disease, for example, or to understand where the avian flu is going and that kind of thing. If we do step away from the consumer, it is with the assumption that the consumer is ultimately the recipient of the benefits of sharing the information, and that it is the role of the organizations that are collecting the data to make it available and actually convert it to useful information if that is part of their charter as a government agency or their business case, or of the private sector, one of the categories that actually knows how to harvest it and target consumers because you are combining the data in different ways to package it as a business case. Again, I think it is wise to flag that consumers are critical and, we hope, the end beneficiaries, but not necessarily the user group of the data sets that might be available.
MS. GOSS: I have heard you now mention the word standard several times. I have a 4:30 meeting and – leave without letting you know that.
DR. MAYS: As we have had a discussion about this, the question is are there data standards that we need to be thinking about in terms of when we are asking people to put the data out there.
MS. KLOSS: One of the things that struck me when I was reading some of the advance materials in the E-book was the familiarity with USHIK. Who here knows about USHIK? AHRQ is not supporting or updating it anymore. There was a lot of work done by the Health Informatics Standards Board years ago that helped get USHIK initially populated, and then eventually – I cannot remember exactly. But that was a great central spot at one point, maintained for data standards, that might have been a useful tool for you guys to take. It is still out there. I googled it earlier this week and found it. It may not be maintained.
But from a standards perspective, if you are looking at clinical data sets, some of the efforts under meaningful use and the interoperability roadmap would be things to think about. If you are going to be looking at anything that is adopted under HIPAA or HITECH, there are very definitive standards for data structure, especially when you start to look at HITECH in making – maximal schema around the HIPAA transactions, which is one of the recommendations from NCVHS that we are waiting for HHS to act upon. The structure of the data, and being able to systematically use it and parse it, would be things that I would start to look at just off the top of my head. I am seeing which – I can see the wheels turning. You might have some other commentary from a standards perspective.
MR. LANDEN: I think you described the landscape perfectly. There is a lot to be aware of and most of them are HIPAA or HITECH related. USHIK is a great spot although it is not current anymore.
MS. GOSS: It is probably a resource issue, but it was hard to find a permanent home for it.
MR. LANDEN: There has been a lot of development in the last couple of years that will not be reflected in USHIK, but the basics are there. The updates are not.
DR. RIPPEN: Maybe from a data characterization perspective, an approach would be to allow people to specify what standards they followed in their data. For example, if they use SNOMED, you are saying it is SNOMED. If you say that it is a LOINC code, then you can put LOINC. I am making up at least some of the standards.
MS. GOSS: We should think about it from two perspectives. We should think about the payload, the data content that is in it, the code set versus the transport mechanism. I should not even say transport. Data structure syntax.
DR. RIPPEN: Because some are a little bit more convoluted. Survey results – even though LOINC is now in the question business too – surveys that the federal government is using. Did they actually create LOINC codes for each question and answer? It is an interesting question. Again, it is going back to basics and being able to then link other data sets to say that the data mean the same thing or do not.
DR. ROSENTHAL: Or even cross sectional analysis to look at —
DR. MAYS: I have one more question. This may be an NCVHS question. Are there data standards that follow particular things? Most of these are clinical as I understand it. For example, would there be particular data structure issues relative to surveys versus something else?
MS. GOSS: That would not typically be within the realm of what the Standards Subcommittee or Review Committee looks at. We are very focused on HIPAA, HITECH, that type of stuff. The surveys – I know that there has been – CDC and NCHS could probably – related to their structure. They have had to have some kind of architectural model around that data content because of the interaction with the states and extracts from the providers. And especially when you are starting to think about the ability to meet meaningful use and ultimately maybe a MIPS criterion that might allow you to submit to a registry or survey at CDC as one of your check boxes. There has to be something there. It is just not my specialty.
DR. ROSENTHAL: That is actually where the ERD, the entity relationship diagram, is even more important. There is an ERD for that somewhere. You just need to share it.
DR. SUAREZ: Are we talking about reporting EHR data to national surveys? Is that what we are referring to?
MS. GOSS: Figure out how we have EHR and claims type standards. Is there something compatible – parallel from more the survey environment?
DR. RIPPEN: Or others. For example, I know that in geocoding, there is the zip code and the address, but then there are also the coordinates. How are the data defined? We do not know that much about NOAA data. I do not know anything about NOAA data except they do interesting studies on particulate matter. How do you define particulate matter? Where are the standards that actually describe what it is or is not? Because if there are other groups that use similar methodologies or similar standards, you can link the data, because then you are saying the same thing versus an apple and an orange. You can say that they are both fruit, but you cannot say they are the same. It is just making sure that if there is a standard, we would actually tie it to a summary statement so that when people look at the data, they can understand how they may be able to use it.
DR. MAYS: What we are trying to do is put that in the data matrix that asks the data designer to be able to tell us. I wanted to get a sense of what the universe is like and what we have to think about, because it is exactly what Helga is saying. If you are specifying your data structure, then the likelihood that there could be a linkage with something else is probably much greater than if you have done this totally differently. I think NCHS will know.
DR. SUAREZ: In surveys, NCHS has done a lot of work around standardization of the data elements, of the questions themselves.
DR. CORNELIUS: One place to check in terms of surveys would be American Association for Public Opinion Research, which is basically the central place that survey researchers go to have a love fest about surveys. Check on their website. They have all kinds of discussions about standards.
DR. MAYS: And the other person to ask is probably Bob Groves, who was at Michigan first and then at the Census. He would be a good person.
DR. ROSENTHAL: – universe and knowledge literally is the entity relationship diagram. Even if it is not language, just sharing that will save so much time and effort, regardless of whether there is a standard or not.
DR. MAYS: Let’s use the time that we have left to move forward and say —
MR. LANDEN: One more comment. In addition to USHIK, another reference would be the list of standards for the clinical side that ONC publishes annually.
DR. SORACE: Just two real quick thoughts. I have had this idea for a while. I think it might actually be very interesting to see if we could pull together a few different databases from HHS that are representative, like survey and claims. And then ask: do we have the ER diagrams or data dictionaries? Are they just delivered to us?
And then the next step would be in consultation with a few computer scientists. How do we merge this stuff? By the way, I think we could do this without actually really ever looking at the data in many cases. I do not think there would be any privacy issues.
MS. HINES: Jim, actually when Vickie and Bill talked to the HHS data leads that actually was – the follow-up step to that was possibly taking a couple of HHS data systems and applying that framework. I do not know if that is something we actually follow up on, but that was one of the possible follow ups to that meeting. I do not know that everyone here has seen that presentation in the work group. It seems to me like maybe – I feel like that is a missing piece of the puzzle that maybe we could have Bill get on the phone and present that to everybody. I think it really plugs right into what you are talking about.
DR. SORACE: I also suspect we are going to have to do this as a department anyway.
MS. HINES: Here is a framework that has already been discussed. The main barriers are legislative language saying you cannot share.
DR. SORACE: That is data, but not necessarily data models.
DR. MAYS: I think one of the things, Rebecca, that we were going to do after we fleshed out the data matrix was exactly that: to take some of the surveys and see if they would help us do it. I agree with you. We do not need the data. What we need to do is to see whether you can answer these questions. How onerous is it for you to actually answer the questions and do it?
MS. HINES: He thinks he could actually —
DR. MAYS: But that is different. That is the framework. We are talking a little different here.
DR. RIPPEN: We just have to make sure that we have the crosswalk. I think the framework component is more the first question that is part of the matrix, which is why we are trying to make sure we use different words so we do not trip. We have that in the beginning – categories. You sent the PowerPoint that had the more detailed things that we are going to take a look at. And then we have the fabulous California Health Care Foundation work that actually specified some things, and then Linda’s great comment about the principles and best practices. Doing the crosswalk, that way we can at least get –
PARTICIPANT: Have you just volunteered to do it?
DR. RIPPEN: I guess so.
DR. MAYS: I was just about to say what our next work plan is. This is exactly it. What we have gotten at this meeting I think has been very helpful in terms of having the California Health Care with its domain and how it is set up and it is even possible in terms of talking with them as well as the other pieces that we have.
I think what we should be able to accomplish between now and potentially the next meeting is to pull that material together and to come up with what it is going to look like. I think what we have to do is exactly what you are saying. I see Bill’s framework as this is the front end. This is the next piece, which is what we are talking about because they are actually talking a little different than we are.
PARTICIPANT: (off mic)
DR. MAYS: Let us bring it to you and then we will have the discussion. There are some things that I think when it comes to the actual –
DR. MAYS: Right, but it is kind of not what is necessarily the primary focus for us. We are trying to push it out the door. We are not trying to do I think exactly the same thing the framework is. We may end up just batting our head against the wall and being crazy when we come in. But let us actually try and see our sense of how these things map together to make a big picture.
And then I think what we want to do is have the committee to then discuss that with us. It would be good, I am going to say this upfront, if we can be on a second day so that we can get late in the day and we can get the members to come a little bit earlier and actually have the discussion about it with us.
DR. SUAREZ: Let me understand. We have the framework that the population health has developed. We have it. That is moving forward. We are going to do something with it.
MS. HINES: Bill is talking about giving it to this group because it does not really – anyway – what is under discussion is whether it actually makes more sense for that framework to be woven into the matrix or some aspect of the work here because it is really more relevant to this work than to population health work at this point.
DR. RIPPEN: I think the nuance perhaps — let me try to articulate what I think the nuance is. The population health group is defining what the categories are and the subcategories and getting into the indicators and all the rest of the stuff. The matrix and the other thing is what you are saying, which is different, which is what are the attributes that we are talking about, which is quality, timeliness, the summary of what is in the information. The level. How easy is it to consume? Going back to Josh’s point.
DR. MAYS: There are two questions. It is what to do, which is what the framework talks about and it tells all about the indicators and how – if you are interested in data, the kinds of things you can do. I think we are not saying what to do, but we are saying how to use and make accessible. They are not 100 percent overlap.
MS. HINES: I was talking about health data framework. There are two frameworks. That is the problem.
DR. ROSENTHAL: I see the matrix as being geared toward access and use.
DR. MAYS: I think we are on the same page.
DR. SUAREZ: I was trying to understand the elements of the work that we are going to do at the work group. We have the framework. I thought Bill had talked about having some additional steps before it gets finalized, steps six, seven, eight or nine according —
MS. HINES: That is the measurement framework. The framework presented to Damon and – data leads are a different framework that discusses the different attributes of data sets and possibly mashing them up. If I am a community with a high prevalence of obesity, I can take six different data sets and mash them together to figure out matching resources and outcomes and various measures. That is the framework I am referring to. Nothing to do with this other thing.
DR. MAYS: That is why I think it is getting confusing. That is why we decided not to use the word framework. Even now not using it there are still two. I am not even sure I understand which of the two, but we will sort that out. What we are saying is we are taking something, I do not know which one it is, and we are going to utilize that as a backdrop for us.
We are going to then – the next step is to make these crosswalks. And the crosswalks are here is a framework. Here is the access and use. Here is the data stewardship. We are going to try and put out – the best way I can describe it is a plan for how it is for us we can increase access and use. And then we want to bring it to the full committee. We will let you know by the time we have the executive committee meeting if we have not been able to do all this. Remember, we do not have staff. We are little. We are mighty, but few in numbers.
The goal would be if we can actually present what those crosswalks and what our thinking is at the full committee meeting and then we can go on and finish our work during the work group time. I think that is what would be useful. That is what I am thinking as the next step.
DR. SUAREZ: Part of this whole concept of how to again improve data access is going to be not just the data – the division of the data itself that is in the federal data resource, but it is two things. One is metadata and API. Those two things – if we have really clear metadata of a federal data resource and we have an API, those are the two main things, plus the data elements that are described in the federal data resource.
MS. HINES: That is what the white paper health data framework, not the one we talked about today is. It is the attributes, the metadata.
DR. SUAREZ: I like the concept of – was mentioning that – we, as a committee, could do, as a work group, could do frame some recommendations —
DR. RIPPEN: Building on that, what I think we are hoping to do is actually develop best practices for everyone to use in a consistent way. Everybody who owns data or is a data provider actually fills in the matrix in a consistent way with the API so that way we can move it forward and then also that we can reinforce the Robert Wood Johnson and for others. Now, what we are trying to do is bring it all together in a consistent way so then the data users whoever they are can actually then not have to worry about how do I look at summaries of information to know whether it makes sense for me. It is all different even in data.gov.
DR. MAYS: I think in terms of talking about recommendations, we want to make sure that we are standing on firm ground in terms of doing that. I think that that is the next step. I like the things that Damon asked because I think they will be helpful for you all. But we also want to make sure because the committee has to make sure it has done it because it has heard it. It has done it because it is a best practice or in some kind of way. I think surveying some of the data owners. I think having some consultation potentially with different individuals so that when we present it, we present it with here are the experts or here is the best practices as opposed to we just happen to have a group that got together and did this. Short of a full hearing, there does need to be some input that will help guide us. I am mindful of reaching out to – or Bob Groves or others that can serve as those experts.
The other thing we are working on is increasing our staff from me as the lead zero to some additional ones. That is the other thing that is part of the work plan.
Josh, any last comments?
DR. ROSENTHAL: Go team.
DR. MAYS: Thanks. Any other comments, questions? I think we are landing at a good place to stop. I am not a person who needs to run the clock out or anything. Hearing no others, I say the meeting is adjourned.
(Whereupon, at 4:50 p.m., the meeting adjourned.)