[This Transcript is Unedited]
Department of Health and Human Services
National Committee on Vital and Health Statistics
Working Group on Data Access and Use
February 21, 2014
Hubert H. Humphrey Building
200 Independence Avenue, SW
Washington, D.C. 20024
Proceedings by:
CASET Associates, Ltd.
caset@caset.net
P R O C E E D I N G S (1:05 p.m.)
Agenda Item: Review Agenda and Introductions
DR. CARR: Welcome everyone. I think we have a couple of folks that will be here shortly. They just stepped out. We have a couple of new folks here today. We begin the meeting by saying who you are and where you are from. I am Justine Carr. I am from Steward Health Care in Boston. I am chair of the Workgroup on HHS Data Access and Use.
MS. BRADLEY: Lily Bradley with the Assistant Secretary for Planning and Evaluation, ASPE, and I am staff to the committee.
DR. SUAREZ: Good afternoon, this is Walter Suarez. I am with Kaiser Permanente. I am a member of the National Committee on Vital and Health Statistics.
DR. FRANCIS: Leslie Francis. I am a member of the Full Committee, and a member of the Work Group. I am at the University of Utah and I don’t have any conflicts.
MR. SAVAGE: Good afternoon, Mark Savage with the National Partnership for Women and Families. I am the director of Health IT Policy and Programs.
DR. FULCHER: Chris Fulcher, University of Missouri.
DR. KAUSHAL: Mo Kaushal, member of the Work Group.
DR. BLEWETT: Lynn Blewett, University of Minnesota, State Health Access Data Systems Center.
DR. VAUGHAN: Leah Vaughan, member of the working group.
DR. MAYS: Vickie Mays, University of California, Los Angeles. Member of the working group and member of the Full Committee of NCVHS.
DR. COHEN: Bruce Cohen, Massachusetts Department of Public Health, member of the Data Work Group and member of the Full Committee, co-chair of Population and Health Subcommittee.
MS. DEUTSCH: Terri Deutsch, CMS.
MS. JACKSON: Debbie Jackson, National Center for Health Statistics, CDC, committee staff.
DR. CARR: Welcome everyone. Our agenda for the day is the following. Damon Davis is going to speak to us about updates on the health data initiative core mission for this group. Then afterwards, who is joining us?
MS. BRADLEY: Adam Dole. He is the presidential innovation fellow working on Blue Button Plus. He will be talking about new innovative encouraging use.
DR. CARR: These are in keeping with sort of keeping us up to date with what is going on. We then have some important work that we have to get done today. As you know, we have begun a draft of the letter of findings from this workgroup. The way it works is that we create the letter. It then goes through the full NCVHS committee, and then onto the secretary. We had some very rich discussion over the last two days about the letter.
On the other hand, Larry Green, the chair of the full committee, said stop, we don’t want to take the voice of the Workgroup away. It is back to us, and we will take a look at it today and walk through that. Then, as time permits, we will do a little bit of thinking about next steps.
MR. DAVIS: Thank you. Thank you guys very much for inviting me to present today. I appreciate it. This is actually kind of cool for me because the strategy that I am going to go through was developed in conjunction with many of you here in the room and on the phone, or what have you. I appreciate, first of all, your input into this process in creating the strategy. It was a fun process for me to think through all of the different perspectives and get your various opinions. Let me open by saying thank you to the workgroup for your input into the strategy.
Without further ado, my name is Damon Davis. I am in the Office of the Chief Technology Officer here at HHS. I work as the director of the health data initiative, where most of my job is looking towards creating direction for our data initiative, trying to draft a strategy and execute on that strategy, so that the department could be a little bit more strategic in its data liberation approach, as well as looking for different ways that we can communicate about the availability and uses of the data. Finally, sort of inviting folks to utilize the data in the most full and robust ways possible.
I have a slide deck here that I would use for basically any audience. Some of the statistics here, you are going to already be familiar with. Obviously, one of the things that is important in health care today is the fact that it is a multi-trillion dollar industry. It is something that we have a lot of work in addressing the different inefficiencies and lack of effectiveness, and sometimes poor quality across the delivery of health care. Obviously, we have got an opportunity to utilize data to drive some of those changes across the health care system.
Data is clearly changing things in multiple different ways. Obviously, there are payment reforms on the horizon that are going to be changing the way that we utilize data and information. There are individuals now interacting with their health, health care and their own health information in various and robust ways. Finally, there is an increase, an influx, in the utilization of data at the point of clinical care. We have got significant opportunities to make a real difference by making the availability of the data a little bit broader.
I would typically go into sort of the beginning of our health data initiative here. I think many of you probably are familiar with it, so I am going to skip over that because that tends to be a general audience piece. Here, we have the mission statement that was developed for the health data initiative, to improve the health and health care and delivery of human services by harnessing the power of data and fostering a culture of innovative uses of data in public and private sector institutions, communities, research groups and policymaking arenas. We really are looking at a broad swath of individuals and organizations who we want to be aware of the data, utilize the data, provide data back to the health data initiative, and other kinds of interactions along those lines.
One of the things that prompted the creation of the strategy was that we had, as a department, somewhat of a catch as catch can approach to data liberation. It was whenever we heard that data was going to be available, we went after it. You can imagine that that doesn’t necessarily create a strategic approach to the way that we are going after the data that we are looking for.
However, in large part, our data liberation has changed the way that the department is thinking from data being a closed, unavailable asset to being very much open. That is something that we want to continue to foster. The mentality is changing, and we are going to continue to change it with some of our communications and different efforts, both inside the department and outside the department. I will go into that a little bit here.
The strategy is comprised of five main goals. Those main goals are advancing HealthData.gov, the platform where the data is available, to make it a more user-friendly and just generally better website for search, for utilization of the data for accessing the machine-readable formats of the data. We are going to be highlighting departmental assets that support achieving HHS strategic initiatives. I will go into the meaning of that later. Simply what that means is taking a more strategic approach to finding data, making sure that the data we are liberating is actually directly related to the department’s own stated strategic goals.
We want to educate new and existing internal and external participants about the availability and uses of the data. We want to enable and incentivize the marketplace to use the data in creative and novel ways. We want to implement administration policies, both from the White House sort of administration level, as well as from the department’s own policy development perspectives. We want to create and foster openness from a policy perspective, as well.
Now, we will go into each of the various goals. The first goal here is advancing HealthData.gov to basically a better website. There are a couple of different ways that this would be possible. We want to improve the metadata quality for better usability of the data. We want an area of the platform for non-HHS data. We are widely heralded as one of the major purveyors of health data nationwide. However, the simple fact is, there are many other federal entities that also have health data. Think VA, DOD, Department of Education and others who have elements of health data that would also be valuable to utilize HealthData.gov as the platform for discovering those other data resources.
Another way to think of it, too, is from the state-based perspective. There are multiple states out there that are on their own open data journey. One of the things that we want to do is embrace the fact that states also have very strong data assets available for public use. What we are striving to do is create HealthData.gov as a discovery zone for health data across the board, not just our own HHS data, but other federal entities, state-based entities and localities.
We want to make sure that we are funding the platform’s existence into the future. I think that one is a relatively obvious bullet, but it deserves to be stated. We want to create some feedback loops for better usability of the platform. Basically, we have created the ideas tab of the platform. We have created the questions platform. It is an opportunity for some interaction. We feel like that user feedback loop could be a little more robust. We are going to try to look for some strategies and ways to do that.
We are going to be developing and publicizing the availability to store and host HHS data. Many of our colleagues across the department have multiple different IT implementations where they are storing data. We have the availability of our own data store. We could potentially reduce some costs, create some efficiencies, create some machine-readable formats of data that is not currently machine-readable by virtue of ingesting those external data assets into our own data store. I say external meaning outside of the CTO’s office.
Then finally, certain things that are going to be important on the platform are improvements in sort and search, and other kinds of identifying abilities for the data. There is a question?
DR. FRANCIS: This is just an observation. It would be enormously helpful, I think, for us to know as much as we can about what is going on, on some of those fronts. The one that prompts me to say that is the metadata one because we are in the process, this working group, of developing a letter that makes some suggestions about metadata. It would be really nice to have a little dialogue.
DR. CARR: We will get to that maybe once we get through the whole thing, and then we will update you a bit.
MR. DAVIS: Thank you for raising the point because that is one of the things that I am actually looking forward to hearing back from this group about.
DR. KAUSHAL: You say obtain future funding. Do you envision getting external funding, or is it just all internal?
MR. DAVIS: It is going to be internal. It is a federal platform. That can get to be kind of murky. Another element of this in the next goal is highlighting departmental assets that support achieving HHS strategic goals. What we need to do is first identify what we think are strategically relevant data assets. In the vein of creating acronyms in government, SRDA is now the new acronym for what we are thinking of as our strategically relevant data assets.
The idea is that we want to, as I have said, align our data liberation efforts with the fact that the department has already openly stated its strategic goals. The strat plan is going to be coming back from OMB, as I understand it, very soon. We are going to be releasing the 2014 to 2018 strat plan very soon in the spring. We want to align our data efforts going forward with those strategic goals.
We want to further drive the department’s culture change toward data openness. I think that many people across the department understand data openness, but may not necessarily have a robust understanding of what we are doing as a department for open data. There are so many people on the line that are working on data-related projects, grants, research, et cetera, and that are not necessarily probably thinking about the end product of their data work being something that could be openly available, seen through a different lens by an external party, and utilized for a secondary purpose.
We want to work with each HHS division on external outreach, so that we can make sure that we are supporting the data efforts of each of our various individual divisions. There is a large ecosystem out there for public data usage. We want to as much harmonize and coordinate some of our various communications as much as we do just generally create and make suggestions for ways that we can communicate together.
An example of this might be, sometimes a division will post some data to HealthData.gov and then that is it. If nobody says anything about the fact that the data is now available, it is not going to generate too much traffic. I want to help the various divisions think through the fact that you now have an opportunity to blog about the availability of this new data set. We can then tweet about it through HHS idea lab. You can cross promote the blog. When you go out now, and you are talking about your various data assets, you can add this as a bullet point in some of your newly created assets, so helping folks to think through the communications opportunities for the data.
Finally, we want to increase traffic to HealthData.gov as the discovery zone. I have kind of alluded to that a little bit. I just said it in some points, thinking through social media and various opportunities to really amplify the message that there are resources there and openly available for folks to use.
On another goal, we want to educate new and existing internal and external participants. We were very careful to word this one correctly because we feel like there are definitely distinct individuals and organizations. There are newbies to HealthData.gov. There are those who are already utilizing data sources from HealthData.gov. Many of those users are external parties, but some of them are, in fact, internal parties. We want to make sure that we think through the breadth of individuals who are going to be engaged with HealthData.gov.
We want to spotlight the value of openly available health data to health care transformations. When you think through payment reform and quality initiatives, and all kinds of things along those lines, we want to make sure that those on the outside are thinking through the availability of the data that we have as potential supports for the things that they are trying to implement.
We want to increase the percentage of machine-readable data assets. We have seen through the Health Datapalooza and so many external events, that there is a vast ecosystem of individuals and organizations who are looking toward the data as an opportunity to fuel the various projects they are using. We want to make sure that we are not setting them up for having to constantly manually come back to the platform to look for updates, but that they can set their software systems, applications and tools to quickly interface with HealthData.gov, so that they can move about their business of doing business and improving health care.
We want to continue to expand external outreach. I have talked about that a little bit with regard to supporting some of our divisions, but just generally thinking through going to various conferences and different opportunities to address existing organizations who are talking about various health care transformations, but also thinking through some new audiences that we have not been in front of necessarily before, as new audiences that would be very interested in the opportunity presented here.
We want to develop use cases and an internal marketing approach to enhance workforce engagement and continue the culture shift towards understanding an appreciation of open data. Quite literally, just figure out what are the examples that exist that we want to point out that are going to say, whatever your idea is, I can point over here and say, here is a similar sort of thing that you can feel comfortable. This can be successful for you. We want to build out and sort of uncover the use cases in these regards.
Then finally, we want to insert the tenets of the HDI into Idea Lab programs. Just as a quick sidebar, the CTO’s office has rebranded itself as the Idea Lab. The chief technology officer’s office is not somewhere that you can call and get tech support. We are a place that is utilized for a lot of what you might view as sort of consultative services across the department in many different ways. They are policy, and in many ways, IT.
In this instance, Idea Lab stands for innovation, development, entrepreneurship and action, IDEA. What we are trying to do is foster the availability of the various ideas that folks across the department have toward making those ideas action-oriented and actually providing resources that will allow people to bring these ideas to fruition.
Many of us, in our various jobs, have seen the way we do business. We wish that something could be different, but we don’t necessarily know the way that you can actually bring that idea to someone to make it happen. The IDEA Lab is now rebranding itself as a place where you can start to look for various resources if you are inside the department, to try to make those changes.
The reason that this bullet is on the goal is because, as these different projects come up, they are going to have data-related elements to them. If we are creating internal innovations inside the department, we also want to make those available as data assets to external parties.
DR. MAYS: Can you talk a little bit about what kind of groups you want to do outreach to? Is it non-health or is it other health groups? Then, can you say any more about what you are thinking about in terms of use cases?
MR. DAVIS: In terms of groups that we are thinking about reaching out to, there are some of the classic things that just generally come with the health information technology space. You might think Health 2.0s and HIMSS and those kinds of places where people are already thinking about how information technology can be utilized in the health care space.
Another place that you might think about it is at the university level, where students are embarking on health-related projects, and may not necessarily realize that the federal government or their state government have a robust set of resources available for them to do some deeper analytics into, say, a research project or what have you. I was just having a conversation with Leah Vaughan, to my right, about the Black Girls Code Group, and the fact that they were teaching young African-American females and women of color to code. That is an opportunity to create an outlet for the data to be utilized by an audience that we have never addressed before.
Those are sort of the creative kinds of things that I would look to the group to bring back here, and to the CTO’s office or IDEA lab, and say, I would love for you guys to provide data resources or do a data talk to, and name the group. It is very wide. It is not necessarily all health-related.
I just got an email about trying to find ways that transportation data can be utilized as a match-up with health data. There is a lot of room for creativity across the space. I think it is going to be really interesting to see.
I think the second part of your question was with regard to use cases. I think that those are also varied and very broad, too. I have a slide here where we will see a couple of examples of uses of data. One of them that comes to mind that I really am enjoying talking about right now is an organization called Purple Binder. This organization is utilizing data from HHS’ Administration for Children and Families to provide social services data to social service workers, so that when they are addressing an individual in the field, that person is able to understand what services they are eligible for in their local area. That is one of the first social services examples that I have seen of uses of public data to provide that kind of feedback.
In presenting the examples of the entrepreneurs and innovators that are out there utilizing the data, and heralding them so that others can take that ideation and say, well, if they can do that, maybe I can do X. That is the kind of use case thing that I am thinking through.
DR. BLEWETT: Is there a list of those somewhere? Do you post them?
MR. DAVIS: That is a really good question. I don’t have a list. What we tend to do is gather a list at palooza time. We have a list of folks who will apply to present at Health Datapalooza. That tends to be a great opportunity for us to start to learn who is using the data, how they are using it, et cetera. That is an interesting bullet that goes back to the feedback loops.
The data that we have available on HealthData.gov is free, openly available, and there is no requirement for you to log in and download it. You don’t have an ID for the platform. Therefore, we don’t necessarily know who is using it on a regular basis.
While that is awesome in terms of openness and transparency, it is challenging because you don’t know who your customers are, who you are serving and what they might need back from you. There is a real opportunity for us to make the catalog better by having that engagement with a broader audience, and currently we don’t have that. I think that is an opportunity that we would like to explore further.
What we do do, though, is as we travel and we are having these talks and phone calls and things with these entrepreneurial endeavors, we ask them. Do you mind if I put you in my slides? Or would you like to blog on HealthData.gov? We are trying to create sort of a larger groundswell of information about organizations that are using the data, so that hopefully more people will raise their hand and say, would you like to talk about me, and we can increase that communication.
Moving onto the next goal, enabling and incentivizing the health data ecosystem. We want to continue to publicize the availability of the data, as I have said, drawing the attention to its uses, basically through the use case. But we want to point out the use of the data in the transformation of health and health care, and the delivery of human services. We really want to focus on some of the strategic approaches that the department is taking to manage and make a transformation in health care. We want to point out utilization of data in improvements and quality, and changes to payment models and things like that, and herald those examples so that others can emulate them.
We want to seek new ways to engage entrepreneurs who may use the data as fuel for their businesses. One of our main flagship things is the Health Datapalooza. Obviously, that is a real opportunity where we work with the Health Data Consortium to engage an entrepreneurial entourage of people that is now in the numbers of 2000 plus. We need to find some new ways to engage the entrepreneurial community. I would look to the folks in this room to assist in helping us figure out the different ways that we can engage these various audiences.
We want to also develop relationships and support the needs of federal and non-federal data projects, data enclaves, repositories, et cetera. What we are talking about here is obviously, as I have said, we are not the only big health data game around. There are massive repositories of data out there. They are trying to do some really cool analytics and make some changes in health care. We would like to be helpful and supportive of that process.
One of the challenges that we have in that regard, though, can be policy related. For example, it has long been said that CMS data is very valuable from a claims perspective in understanding what is happening in health care. However, you can imagine that it can be challenging to get CMS data in various forms that you might need.
What we are trying to figure out is what are the policy levers that we can adjust that will allow us, as a department, to make CMS data available, for example, to a large data enclave, so that they can take that claims data and make it part of the analytics and knowledge generation that they are doing in a large enclave. That is one example.
The idea is let’s examine the policy lever on that instance and leverage that policy change across multiple instances of open data, so that we can now scale that availability of various sources of data to other external repositories. We want to support knowledge generation outside of the department. There are a lot of people that are doing some heavy lifting to do that.
The final goal here is implementing administration and departmental policies that foster openness. One of the things that I have been asked a couple of times when I am on the road is what happens when the administration changes. Is all this open data stuff going to go away?
Thankfully, no, and that is for two reasons. One, that horse has run out of the barn, and you are not getting it back. The other thing is that we really do want to institutionalize the idea of open data as part of the way that the department does business. The first element of doing so is creating a charter that is going to allow the health data leads organization, the group of liaisons from across the departmental divisions, to be chartered as an entity that continues the ongoing practice of open data across the department. That charter is going to go into clearance next week.
The other thing that we want to make sure that we are focused on is the implementation of the open data policy, M-13-13, on managing information as an asset. I am sure you are all aware of the White House’s policy to make sure that all federal entities are managing data as an asset across its life cycle. We want to continue to be a leading edge agency in the open data space, but also obviously, meet all of the goals and tenets of M-13-13.
We want to draft plans to address the impacts of various policies across the board. Open data policy is one thing, but then you have the Holdren memo on increasing access to the results of federally-funded scientific data. That is a specific scientific data-related memo. However, how does that fold into the overarching open-data strategy that the department has? Let’s make sure that the one hand knows what the other hand is doing, so we are not duplicating efforts in multiple places. That goes to the last bullet, which is clarifying the relationship between the various policies, so that we make sure that
Here, I would normally go into some of the various data assets to familiarize an audience with what we make available. I think you all are very much familiar with those. I will skip that. Moving on, this goes back to the slide of some examples of the entrepreneurs that are out there, working on some health data initiatives in a business perspective.
Purple Binder was the one that I spoke about before that utilizes social services data to deliver social service awareness to the staff in the field. Another one that you may have heard our CTO, Bryan Sivak, talk about is Aidin. Aidin is using nursing home quality data, and I believe it is hospital duty, for an evaluation in post-acute care. Basically, they are providing data to the post-acute care coordinator at the hospital, to allow that individual to provide a patient or the family with information that actually has quality providers on a list of post-acute providers.
This changes the model from, here is a sheet with 25 names on it, you should go pick a post-acute provider for your follow-up. What you are going to do is probably pick a name that is familiar to you, a Zip Code that is close to you, or a phone number that is within your own area code, versus actually making a decision based on quality, which Aidin is trying to provide. The new sheet will now have people that may not necessarily be quite as close to you, but are going to provide the best quality care possible. That then is going to lead to all kinds of efficiencies and cost reductions in the system.
I can give you more information. If you have a copy of the slides, either in hard copy or soft, they are in the notes of this slide. There is a little bit of an explanation of each one of these companies. Normally, what I would typically close with is sort of a request list for the audience. Tell us what data you want to use, and let us use your input as an opportunity to go and track down that data, and make it openly available. Help us define what SRDA really is. What is strategically relevant in your eyes and contribute to that conversation. Share your data with us.
There was a project out of Pittsburgh called Tycho, where they had taken decades of CDC epidemiological data and made it available through their own project, which unfortunately was funded by the government. We funded a project to make government data openly available. Now, we have federated it from this university. The idea that we want to utilize these data stores for a broader knowledge generation is what is meant here.
Finally, tell us how you are using the data. I have already said we don’t know enough about all of the various ways that the data is being utilized. We really would like some more robust feedback on that. That basically concludes what the strategy looks like. I would love to take more questions and have a discussion about opportunities or challenges that you might see.
DR. CARR: I am going to ask you to be a standing opening item for this meeting. We finally have been able to move back to HHS. We have been in this curious situation of being a reactor panel, but not having anybody to react necessarily to. I think being here and having a chance to really explore all of these things that we have spoken about, but without having benefit of sort of the direction that you are going in. This would be very good.
I would say the other thing, and Leslie alluded to it, is that the workgroup is seated within NCVHS, which has a 60 plus year history of advising the Secretary and so on. Yet, we know that the workgroup is meant to really be available to you as a reactor, as well as if there are things to put forward to go through the committee structure. Just as Leslie said, we have got a letter in the works that talks about a lot of the things that you are already saying. I think it is also important that we not be a day behind what you are already doing. We will talk more about that today.
I would say that this group can really play that role as reactor if you come, throw the ideas out. You can pick one or two, and we can really work on it. I think that is the most nimble way for you to take away from this group what should we be thinking about. While a letter will codify this is what we heard and thought, it is not nimble in the timeframe that we have.
Just one other question for me. As we think about HealthData.gov, as you think about it, we all think about it, who really is the customer? We have spent time talking about the two different kinds of customers, developers, innovators, technological folks who have certain needs about the metadata knowing things. We have also talked about communities, who would want to access this data, and pointed out that it is hard. It is not really easy.
As we have thrown ideas around, one of the things has been in the ideal for a customer that is not technologically savvy, having an Amazon-like thing. Here is this. Customers who like this also use that. It can be used here. Here is a little description of what it is. That would be welcome by certain customers.
I think it is important to know, do you see this as a repository that people who specialize in this then customize the data toward a certain audience? Or is it your intention that HealthData.gov will achieve that level of customer friendliness for the less technologically savvy user?
MR. DAVIS: I think that it is probably going to be best for at least HealthData.gov to remain at a sort of, not layperson, but less toward the technical side. I think that there is still a large audience of people who have the opportunity to come to the platform. I think that many of them are going to need a less techy version of what the data is.
However, there is also Hub.HealthData.gov, which is the more technologically advanced side of the platform. I think it is focused more towards those who just want to jump straight to the data, download it, start to mash it up, et cetera. I was not present when both platforms were developed. It is my understanding that that is the intent, that the HealthData.gov platform reach a broader audience, have both the ability to provide the download and sort of technical aspects of what someone might come to the platform for. Also provide the nice robust description of what the data set is, what is its periodicity?
Some of the other things that someone might be interested in, just generally knowing what is there. In which case, the Amazon.com sort of example of what other customers have come to this platform to view is also something that is going to be valuable there. I think we have to sort of make sure that we have balance. I am not positive that we could actually say that we have an answer to what does the customer look like because it is so broad.
DR. TANG: A lot of the initial work in open government and open data has been premised on field of dreams. If you expose it, then they will come. You are working more and more to make it appetizing and digestible by folks probably more, we called it data intermediaries, the technical folks.
One of the things that occurred to us, and we had some hearings with communities, some of your target end-users. One of the things I am not sure is do they all understand what are the problems that could be helped by having better data. In a sense, do we need to do some of that up front work? Instead of the field of dreams, be more the traveling promotoras in the field. What are the problems that you face in your community?
Let us with the data background figure out, here are some things, knowing both the data and the methods to use data, here are some problems that are addressable by having better data about it. Do you see what I am saying? It is almost like an REC effort. You have got to deploy to the field because you have to understand their problem, which we have found is so unique to each community. It is not at the state level. It is not even at the county level. It is at the community level because those are the effectors, the action folks.
Have you talked about that? Where do you think HHS would play a role in that promotoras kind of activity?
MR. DAVIS: That is a really interesting question. I think it goes back to the feedback loops piece. Part of the challenge is, quite frankly, resource constraint. An effort to build a large sales module that would be able to sort of reach the boots on the ground effectors of change is going to be really challenging. I think at the very least, in an effort to prevent making perfect be the enemy of the good, I think what we would love to do is at least start the conversation around the feedback loop piece, and invite that conversation. Tell us what you think you need.
It may be that folks don’t necessarily know what problems they could solve. At least if we have the ability to begin the conversation, I think we could probably begin to uncover those things and a model might emerge from that. As of right now, I can’t see how we could do a whole lot more than what we have.
That also says that we need to rely on our various divisions quite a bit. You have got the CTO’s office, who is sort of the purveyor of the catalog of HealthData.gov. You have also got hundreds of people across multiple, very large agencies, who are going out and talking about their various projects and data assets and things along those lines. If we provide them with the talking points about what HealthData.gov is, what it can do, what is available there from their division’s perspective, as well as across the department, I think you at least begin to address that desire to reach the foot soldier on a more direct basis.
DR. FRANCIS: I was just hoping you could give a concrete illustration of maybe a success case of improving metadata so we would have some sense of what that is looking like where you are now, that might inform some recommendations.
MR. DAVIS: Unfortunately, I don’t have many, and that is partially because in developing the strategy, we had to set out a timeline for how we were going to go about some of these different activities. I have not had an opportunity to start that piece of the project yet. What I have been trying to think through is a lightweight approach to sort of test it out. Where can I go? What division or organization could I turn to and say, I would like to sort of embark on a metadata project with you and improve some things. That would require obviously some level of analysis up at the front end to say here are the things that we think we could change.
Then, turning to an agency, keeping in mind all of these individuals have a day job, and managing the data is not necessarily the thing they are all doing on a regular basis. Finding somebody that could potentially be effective, and then scale the lesson learned from that engagement to others. It is just not something that I have actually had an opportunity to do yet.
Now, I would love to hear back, though, and I pose this to the entire group, if you have had engagement in making your metadata better from your various organizations, I would love to hear lessons learned from you as to what you would at least suggest that we start off doing or not doing, so that I could get this off the ground in an effective way. I would welcome that input from the group. What do you suggest, at least at the beginning or at a high level, as the first two steps in improving metadata with one organization, and then scaling that across the departments? Recommendations are welcome.
DR. COHEN: It was funny when you mentioned that you are not a technical support group. In our conversations with communities, we are actually finding that might really be what communities need to understand and use the data. Paul began to mention it. The model that we have been talking about is the agricultural extension service.
If we had a health data extension service that could easily connect community users, so that they understand the usefulness of the information and the usability of the information, and orient them towards all the data that are available, that might be the missing link. We haven’t really figured out, at least I haven’t really figured out, whether there are enough data out there. The issue is connecting the data to potential users who can incorporate it into decision-making. I know it is a question of resources, but I would strongly encourage HealthData.gov to think about human resources to provide service at the community level.
MR. DAVIS: Can you say a little bit more about the signals you are getting from the communities about what support they would need? I am interested to hear.
DR. COHEN: In our community conversations, it has been amazing to me how pretty sophisticated communities aren’t aware of what data actually do exist. I guess one of the issues we are dealing with is trying to assess community readiness to use data because there is an enormous variation in communities’ ability to integrate data use into their decision-making process. We are trying to explore that interface, as well.
The other issue is federal data hubs like HealthData.gov repackage the data that exists. Most of the data that exists aren’t at the community level, the smallest level is county, for actionable kinds of projects that communities engage in. They need smaller level data, more granular data. Whether it is granular with respect to geography or special population groups that they are trying to target with their interventions. What is the role in the federal government data liberation strategies to support those kinds of activities?
Those are the kinds of issues that I see, where the federal government can make huge inroads. It is almost like I feel we need to pause because our initial effort was to try to get all the data out there. We haven’t put the same level of effort into understanding the end user’s ability to absorb that information and make sense of it, and make it actionable.
MR. DAVIS: That is very interesting. I couldn’t help but think you build a gas pipeline right to a bunch of gas stations, but nobody has cars or the ability to utilize the resource. That is a very interesting problem to have. You have got an abundance of something that nobody really realizes that they need.
DR. CARR: It is funny. We got back to what we initially had in the letter. A lot of times, the use cases are potentially very useful. The other thing is we talked about extension centers, but we also talked today about online learning. We don’t have to go that far, but just targeting the online learning to different types of communities. Show them this is what you can do.
That is what I think people love about datapalooza. You say, wow, look at all the different things. Even just putting that in front of people, and again, going back to the Amazon thing. I know I am consistent, if nothing else, but here is what someone else did. They use this data in this way.
DR. COHEN: Datapalooza targets one segment of the population. If we had a community datapalooza, the whole structure of that, of innovation, for community uses, not targeting the particular audience that datapalooza does now, that would be another kind of opportunity and structure to connect the data to ultimately the targets who we are focusing on.
MR. DAVIS: I have often struggled with standing in some of those massive conferences with all of these great techy people and all these awesome ideas. You could literally probably walk out of that convention center, go 100 yards in either direction, and find a community health center that has no idea that any of those tools and resources have been developed, that they are standing right down the street, and that is ultimately where those things need to go. I think you are right, that we do have a significant challenge, but a real opportunity to get some of that information down at the right granular level where it could be used.
I couldn’t help but think as you were talking, the health data consortium, which I am sure you all are familiar with, has an affiliates program where it has tried to localize the data liberation efforts in multiple states. I wondered if maybe there wasn’t an opportunity for at least the amplification of the message to go through the consortium by way of affiliates perhaps, just to be able to start somewhere.
Basically, you have got an infrastructure in place of people who are paying attention now. Basically, you almost amplify the message to the same people who are coming to the palooza. At least you have more localized the conversation, I think. That may be one of probably many opportunities. We could explore universities or other relationships perhaps.
DR. COHEN: I think we are searching for these kinds of channels. I think this will be a really productive interaction between us, as we spread out. I look forward to it.
DR. CARR: We are going to keep going around the room.
DR. MAYS: I think we are all kind of on the same theme. I think when you made your comment about not the IT consult, that it got us thinking. The example that I was going to bring up is, Census really experienced the same thing that I think HHS did, which it really wanted its data to be used by communities. What Census has done is it has established around the country these community data centers.
We have one at UCLA that is specific to the Native Hawaiian and other Asian-Pacific Islanders. What they do is they take the data, and they break down the Census data. They make charts, tables, they do reports. They become kind of the expert for the community in terms of getting the data out. What happens is that they come to an annual meeting at Census, they talk about what the needs are. Census begins to get also a better sense of how to package it.
I think the other solution that I would offer is that you probably need something like a partnership with NIH. What happens is that we who get funded, particularly at the centers, are supposed to do exactly what you are talking about. My center runs data education workshops for the community. That is your federal funding that is doing it.
You really would have the ability to take the federal data that we have. We are supposed to now, if we get over a half million a year, put that and make it available. What is amazing to all of us is that we have nowhere to put it. We all run to ICPSR in Michigan and give it to them. The federal government has not established a place.
Then, it would be at a different level. It would be at a state level or a community, et cetera, level, with the right kinds of protections. You actually have that. You have actually funded it. I think we need you all to think about, as Bruce is saying, that is what people need is for there to be some way to actually do training. I think having these data centers the way the Census does would be perfect because you would have a feedback both ways.
MR. DAVIS: I appreciate that comment, and I would love to talk with you more about what the possibility really is there. I am interested to know about MOOCs, sort of massive online sort of education resources. Does the group think that is a valuable sort of online digital footprint that could be continually referenced? Or is it not necessarily a resource that we think is valuable?
That is one approach that we have considered is if you start to have FDA, NIH, CDC develop some three to five minute videos that discuss what the data was curated for, the fact that it is available, and some of the ways that they have seen it used, and created a short suite of maybe three different use cases that different entrepreneurs have utilized the data for, just to show an example, is that kind of digital footprint something that is going to be valuable in the conversation? Or are we talking about something different?
DR. TANG: I think it is different. For example, if the counterpart in the lean would be to say let’s show a video of lean. It is not saying, how do you identify and tease out the problem to solve. Then recognize which ones could be aided by data. That is a pretty heavy lift.
I was going to go back to what Bruce said about localizing. Your affiliate program was a really great example. You talked about the lack of resources, but what if your resources were basically providing the travel expense for the communities that are doing something, to put them on the roadshow that goes around. One, she is like me, that is point one, and two, then what did she do? That is a real education.
Then, it is not actually your resources because it is community teaching community. What they don’t have is this body contact, and that is what the regional program would do. Then, to save on your resources instead of having you have a staff that goes around, you just ship in these other communities, and let them talk to each other, but on their turf. The whole bringing people here is tough, right?
Anyway, that is a piggyback on what Bruce was saying. Also, the main comment is I don’t know whether it is telling them more about what we have or you have. It is helping them figure out what their problems are.
DR. VAUGHAN: Some of the other examples that come to mind, Stanford and Coursera, to a certain extent, have put forward MOOCs. What has been evolving, at least in the Bay Area around women and code, is that is the point of contact for coming together to learn together. It basically just becomes that is the lecture, and basically this is the lab. There is a shared point of view. There are usually participants who are training each other, who have some familiarity with the subject matter already. It has been useful. It is very interesting, and people have been able to advance because of that type of situation.
To a certain extent, Hopkins is going through some of that. They are in the process of doing evaluation of exactly who is using the MOOCs at the School of Medicine.
DR. MAYS: It is almost like what Paul is saying. There are some community groups, very large, that tend to have their own data speak. I want to go back to this notion because it is like one of the workforce development that is in our grant. We are training promotoras to work in research settings.
You want the community to actually do it, not you, not us. Really the push-out is there because what happens is that the knowledge stays much longer. No dollars, no nothing, the community stays with it. They may get a little outdated if you don’t have booster training for them.
What is important is because of the reluctance to use some of the federal data, and not understanding kind of about identity and protection, et cetera, you really, for this one, need the community to do this, either the data center or promotoras training. Again, one of the groups that I turn to is the community campus partnership. They are big on bringing the community together, about research and teaching them how to participate and use data for what they need, and then to protect themselves.
DR. VAUGHAN: Another one along that vein is the Community Data Research Center in New Orleans. It is just the same thing, largely around census data, but community-based engagement. I wanted to loop back to your metadata observations and comments. I am pretty sure it was The Open Knowledge Foundation meeting in London middle of last year, the question and issue of how do you tag many things with metadata, came up.
To just gently introduce the idea that not everything on HealthData.gov is the data itself. Sometimes it is a link to the data. Sometimes it is a link to a document that has data in it. Their notion was that all the PDFs should have their own metadata. It is very low-cost. It makes it eminently more findable, that there isn’t time and resources to actually transform everything into a machine-readable format. That gives a systematic way of cataloging rich assets, including the legacy assets.
MR. DAVIS: That would actually support the feedback loops about what the demand is. If the metadata is supportive of finding the document, and multiple people have the option for voting on making that an open data resource versus remaining in PDF, in an area of shortage of resources, you could then begin to allocate resources in specific places in order to address the public need. I think that is a really cool idea.
DR. VAUGHAN: There is also an ecosystem of people making businesses around transforming PDFs. Our team was fortunate to work on a humanitarian piece with Sunlight and Human Rights Watch. It is out there.
The other point I want to touch on, in terms of who has done it before, is there is a really rich long heritage at Federal Geographic Data Committee. I would actively encourage you to reach out to them. It is well-vetted, very mature. Lots of stakeholders reaching consensus about what the data element should be, why and how. I think there is a lot of richness there, and I am happy to make the connection for you.
DR. BLEWETT: I would like to make a couple of points. One is just to reaffirm this distributive model as opposed to here it all is. There might be some ways to partner and use existing, whether it is the campus portals or the campus data centers, the community partners or data centers. NIH sponsors population health centers all across the country.
Your last comments about there was an epidemiological data set that you brought in-house. I am funded by NICHD to harmonize 30 years of the National Health Interview Survey at the University of Minnesota. It is online and successful. Our stuff is for researchers specifically.
It is cohoused with the IPUMS. Have you heard of IPUMS? That is a Census. That is this geographic historian, I call them a quantitative historian, Steve Ruggles, who has basically taken all of the Census data. Now he does like the slave data. He goes and finds microfiche in vaults and transfers it into data. He is just a crazy guy. This whole thing is to release all the Census data and make it available. It is all public. It is all at the University of Minnesota. It is just like an amazing thing.
I don’t know that the federal government could ever do that. That is funded again by NICHD. I guess I would encourage you to leverage some of that. I think that is what you were saying, too, Vickie, is to leverage that. They are funding data infrastructure projects. They are really well-funded. He has got a bunch of other initiatives.
It is researchers mostly who use that resource. In all of that stuff, he does have people sign in for an account. You have to just say, I am not going to release. It is all public data anyway, but there is some kind of general statement that you have to sign. It is on the University of Minnesota.
MR. DAVIS: I think you would get into PRA if you end up asking people to sign in on HealthData.gov.
DR. BLEWETT: That is another reason sort of to distribute it out.
MR. DAVIS: The Paperwork Reduction Act is a misnomer. Basically, it was intended to prevent the government from sort of unduly surveying people. There are all these rules that are arguably out of date, but the bottom line is I don’t know that we could necessarily get into that.
DR. BLEWETT: That is again another reason to sort of distribute it out where people can do that. He can keep track of everybody. Then, he has a citation, and he wants people to use the citation. Then, that goes back to the government. This is what we have done to use your data. All the thousands of people have used it.
MR. DAVIS: We thought about that, too, just sort of the idea.
DR. BLEWETT: It is like translating the data you have got into something that is usable. In my world, it is for researchers. I think here, we are talking about community people. That is probably not going to be you who are going to do that. Probably other parts of the government would fund that to get that out, so a distributive kind of model. You could never meet all the needs in the world. There might be another funding agency that has got a priority, that would like, in my world, that is IPUMS, the Census. We are doing the NHIS. It is pretty cool stuff.
DR. QUEEN: Damon, if the person is signing on and just literally certifying, it is not considered information collection. It is information collection if the certification is, in effect, in lieu of a data collection. If you are literally just saying I will use the data for these purposes, just like the ICPSR. When you sign on to use any of their public, it is just a checkbox.
DR. BLEWETT: That is another group that has done a lot of that harmonizing data release.
DR. FULCHER: Last time we saw each other was last year at the Administration for Children and Families. We followed up and worked with them on the locations of all Head Start locations and buildings. We added that layer to Community Commons because of all the other data layers in there. You can look at poverty, education attainment and those social determinants related to Head Start locations and access and all that. I just wanted to follow up, to let you know that that happened.
A couple of things, a really good point about the FGDC. I think that is something the committee should be thinking about when we talk about data. All of the data that we are talking about, health data, is inherently spatial. It is all geographically based. Providing a consistent metadata framework, FGDC would be a really good place to look.
Second, you are talking about you had the portal open with no user name and password. About 15 years ago when we started making all this public data publicly-accessible for communities through our website, we all thought, okay, we don’t want any user name or password. Just let it be free, and we don’t know who they are. They can track it through some basic Google analytics.
Over the past year, with Community Commons evolving, and understanding that you cannot have an Amazon.com approach because there are so many different kinds of customers, having the one type, you are not able to meet their needs. They look at data very differently. We are working with RWJF around four archetypes: a strategic planner, an action taker, a researcher or a community organizer. That is really from a community lens.
Our theory of action is really what is their entry point. It is not oh, I need data. This is the issue I have, or this is what I am tackling, or the story that I am telling. It is really interesting to see how the curriculum that we are building around these different archetypes is really helping us understand the needs and their pain points around access to data.
Datapalooza, I think I mentioned something on a call last time, that I was really concerned that it was not really a forum for the public sector. There were a lot of vendors there, and it became much more commoditized. I felt that, in terms of all these communities that have little or no resources, datapalooza is a great opportunity to have more of a public good presence, in terms of helping communities understand the data that is available to them, rather than vendors basically creating products that many of these communities cannot engage effectively.
The last point is, as I am listening to the work, you said disruptor up there, so I am just going to just do this. We have all been talking about data. I have lived my career in data. Then, I am thinking about an analogy. For example, we are all experts on carburetors. When was the last time anyone opened the hood of their car to look at the carburetor and all the other parts?
We are all around, talking about carburetors, and we are going to liberate carburetors, and everyone is going to be good to go. The thing is nobody wants just the carburetor. They want the car. They want a dealership. They want to have choices.
In some ways, I feel more and more, as I work with communities, that we are not speaking the same language in terms of where we are meeting them. We are trying to get them all carburetors or pieces of an engine, and they want something that drives down the road.
MR. SAVAGE: Your question about what communities need and the conversation around the table provoked a memory for me. I suspect if you went around and asked different communities what it is that they need, you would hear a bunch of different answers, depending upon where you go. Some work that I did in California, we faced exactly that. We were trying to use $50 million to help bridge the digital divide. This was about a decade ago.
Rather than starting to do a grant-making program, the first thing we did was we actually seeded some focus groups in different regions of California, and asked the communities what they thought technology meant to them, what they thought they needed from it. There were many different answers.
We used all of that body of knowledge to help frame the grant-making at the beginning. We took it, and we applied what we knew about grant-making to say, well, some of that, we just really can’t do anything with. Some of these other things, yes, there are ways we can meet those needs. They might not have known that we could meet those needs, but we can figure this out.
I suggest that, if it is not already done, something like a focus group approach that actually goes out and asks in different parts of the country, what is it that you need? The way we did it, to make it simple and easy, is we actually found non-profits in different communities who, in turn, convened people around them that they knew, and they knew would be good resources.
We just asked the question. It was inexpensive, and it cost us about $300,000. It was the basis then for getting things out that really did meet their needs and respected the fact that there was a diversity of answers to your question.
MR. DAVIS: Thank you for that. That is an interesting example of how we might go about this. I appreciate it.
DR. CARR: This really is great. This is the way we thought we work best. Having you here is really good. We will return our attention to a letter that would go officially out. I think it is very much informed. We can kind of retool it to align with sort of awareness of your strategic plan to help amplify what you are already trying to do. If you want to come back later, Adam Dole is going to speak to us shortly. We will return to the letter. You are welcome to come back and join us. Otherwise, this has been most informative. I appreciate it very much.
MR. DAVIS: Thank you for the invitation, and I appreciate the conversation.
DR. CARR: Do you want to just tell us a little bit about what you are doing?
MR. DOLE: Thank you for the kind invitation to be here today and speak to everybody. Damon is always a tough one to follow. I find myself in that position, so I am getting used to it.
My name is Adam Dole. I am the current presidential innovation fellow here at the White House and working with Health and Human Services this year on the Blue Button Initiative. I am about seven months in of just a 12-month term right now. Prior to coming to D.C., I was based in San Francisco and was an entrepreneur in residence at the Mayo Clinic, and was responsible for helping to spin up new businesses and start-ups on their behalf, and manage some new venture activity on the commercial side of Mayo’s business.
When I was asked to present today specifically about Blue Button, it was really a focus on how we are engaging the community using different tools and mechanisms and approaches. Mostly convening opportunities that we have here in the government, but also some open source platforms and social media, which might provide some insight into how we could leverage these tools for your programs and projects, where they are.
I definitely want to set the context that I am not a social media expert. I am not a developer. My background is in research and strategy, helping organizations figure out where they should be going next, and identifying opportunities for growth. Doing that from consumer unmet need standpoints, so really focusing on what the needs are, rather than what the solutions are, and retrofitting back to a need. Figure out what the needs are and design sustainable systems that actually can have long-term growth and opportunity to help people’s lives, especially in the case of Blue Button.
With that, I will kick it off. Feel free to interrupt at any point with questions. I definitely will leave room at the end for plenty of questions, as well. If I am saying something that doesn’t make sense or you want additional clarification, by all means, feel free to interrupt.
I broke the presentation up into a couple of sections. I am going to spend just a couple of seconds giving everybody a quick background on Blue Button so that we have some level setting in terms of what Blue Button is, where it is going. Then, really talking about the three chunks of the presentation being community-based collaboration, leveraging open source platforms and social media. Then, we will have time at the end for questions.
The purpose of the Blue Button Initiative is to empower everybody in our country with access to their health records in an electronic format. This really started out of a side project at the V.A., that I am sure many of you are familiar with, probably much more so than I am. About six or seven years ago, and it is quite insightful in hindsight because we struggle to get a lot of health care organizations to recognize the importance of letting their patients take their data outside of their system, but the V.A. recognized the importance of vets being able to take their records outside their system because a lot of the vets were going outside of V.A. health systems for their care.
Whether they were living in rural areas and didn’t have access to a V.A. clinic or just more broadly had specialists that were outside. I think it was like 60 or 70 percent of vets were doing this. Good on the V.A. to recognize that instead of trying to hold their records as if it was their own intellectual property, let’s give them their records in a format that would actually allow them to share them.
Six or seven years ago, it was an appropriate assumption to think that they could print out records and bring them with them, which is what they did. They actually created a Blue Button on their portal that allowed a vet to download their records and print them out, and bring them. It has been a wild success. Since then, Blue Button has not only become a part of HHS and ONC, but is also a part of the broader My Data initiative at the White House, which is a broader initiative to empower all Americans with access to not only their health records, but their energy consumption data and their financial aid records from student loans.
Blue Button has really taken off. What started as a federal initiative is now a public-private partnership with the health care industry. The vision of empowering people with their electronic health records, getting people the right information at the right time, is a big vision that probably couldn’t have been possible even a couple of years ago. Literally today, and the coming months and years, I think it is now possible. We are in this perfect storm where people are walking around, almost everybody is walking around, with a super computer in their pockets. Every experience we have could be a connected experience because we have cloud-based technologies and the cost of sensors is so low.
We have also got policy working on our side with things like Meaningful Use Stage 2, and the CLIA regs that just came out that are empowering consumers and patients to really not only just be able to demand access to it, but have records flowing more freely than ever before. This is a really exciting time. What is possible today literally couldn’t have been realized even a year ago. I think we are still at the early stages of this perfect storm. Over the next 6, 12, 18 months, we are going to see a lot more data liquidity than we have today. I am very hopeful that data is going to be liberated and flowing through consumers.
The data that we talk about when we reference Blue Button is a growing set of data coming from different sources. Obviously, the Meaningful Use Stage 2 providers, or eligible providers out there, are required to do things like allow their patients to view, download and transmit. The data that is in those records is just one slice of the overarching health record of an individual. There is a lot of data that you see here represented on this slide that isn’t necessarily in the structured document that comes from an EMR.
You could argue that there is a lot of data not on the slide, like patient-generated data, when we are not in the doctor’s office or we are not getting lab tests. All of the data that we are collecting about ourselves, quite frankly right now, is not part of the health conversation in a meaningful way. I think over time, we are going to start seeing more of this data become structured and standardized. We are working as part of Blue Button to work through all these different data sources, and actually have them be part of Blue Button portfolio, and actually get this type of data to flow through consumers’ hands, through portals and different interfaces, which we can talk a lot more about.
MS. BRADLEY: Adam, can you tell them about the pharmacy?
MR. DOLE: We just had some pretty big news. About two weeks ago, the White House and HHS announced that the retail pharmacy chains, like Walgreens, CVS, Rite-Aid, Kroger and Safeway specifically, are now not only supporting Blue Button publicly, but are going to be structuring consumers’ and patients’ pharmacy history records in a standardized structured format known as Blue Button Plus, which is something that we can get into in just a few slides.
What that actually means for people is that people will be able to not only access the record, but move their records into apps that can help with things like medication adherence, dosage management, drug interactions, things that cost the health care system an exorbitant amount of money that are actually preventable, and really have a major impact on quality of life for people who are taking a lot of meds. The CLIA regs are focused around the diagnostic medical labs.
I am sure many of you know about this, but labs were not able immediately to deliver the results directly to patients. With the final ruling that just came out from HHS a couple of weeks ago, it is now not only legal, but they are encouraged to do so. We are going to be working with them kind of at a data source with a specific focus on that vertical to empower them with technology and help them understand what this rule means for their business, their bottom line, and ultimately their patients, and help them come up with solutions that empower their patients.
However, we all know that the health care system is siloed. A lot of these portals that provide people their access are done in a very proprietary way. While a lot of organizations, big hospital systems and small providers, are doing great work in providing this portal, it doesn’t always allow patients to move their record out of that portal. That is for a variety of different reasons.
Sometimes it is a technology challenge. Sometimes it is a resource challenge. Sometimes it is just a political and philosophical challenge that provider still knows best and provider still owns your record. No matter how much access we give to you, we are not going to actually give you that much access. That is kind of the message that a lot of the current proprietary portals out there send to the market, whether they intend to or not.
With Blue Button, we talk about this as a broad movement. We recognize that printing out records, or just having that access in a proprietary way, is not good enough. With what we have access to today with the technology, it is pretty much only acceptable to imagine patients being able to share their records across the system with providers that they choose, third-party apps that they deem important themselves, caregivers. The use cases go on and on. I think nobody would argue that is an important thing to enable patients to be an equal member of their care team through doing that.
We created Blue Button Plus. Blue Button Plus is really about the portfolio of technologies and standards that enable that structured transmission of the data. When we talk about VDT, Viewing, Downloading and Transmitting, Blue Button Plus is really addressing the T part of that. We have got a couple of technologies that transport data, whether it is the Direct protocol, which operates a lot like email does. The sender has a Direct email address, and the receiver has an email address, and they can send health data. Then there are APIs, kind of the future of where we would like to see the industry go. It is how the rest of the world operates from a technology perspective in terms of data exchange, not necessarily in health care as widely today. When you want to connect an app to Facebook, it is kind of a click of a button. You don’t need to worry about email address and those sorts of things.
We have got new standards in place for plans in EOB information, as well, that we would like to see the payer community adopt. This is a very important part of the Blue Button story. It is where we spend a lot of our time.
We do a lot with the community. The team working on Blue Button internally are just a few people really. For this to work, we are talking about the entire health care industry, we need the buy-in. We need the support. We need the horsepower from the community.
One way that we do this, and I am sure many of you have heard of it, is the Standards and Interoperability Framework, which is an initiative that we host out of ONC to convene the community around what these standards and approaches should be, so that they can actually go through a formal standards body, like WEDI, to get things more normalized. That way, we don’t have a lot of one-off interpretations and proprietary ways of doing things.
This is a very important group. In fact, this is the group that works on a lot of the Blue Button standards, like the Direct protocol standard, the REST API standard that you saw on the previous slide, and the claims information. We look to this group to bring the community together, hear what use cases the community needs to be able to make available, and what challenges they see with making those available at a technology level, and then going through the WEDI process, which is a much more involved process that takes some time. In the end, it is a consensus-driven process which is really important.
Just as one example, I have a lot of examples of things in here. Most of the slides from here on out are just examples of things that we are doing to support some of these activities. The Blue Button Plus standard came out of the S&I Framework. We had close to 70 organizations that actually contributed to that standard. You can see some payers, some providers, some vendors, a lot of the different stakeholders.
The Blue Button Plus standards then got translated into an implementation guide. This is open source. I am going to address open source in a separate section. This is all open source, so developers can go and contribute to it. Essentially, all that body of work that went on in that S&I Framework group is now represented in a roadmap basically for organizations that want to be Blue Button enabled, meaning they have data that they want to structure in the proper way. Then, if you are a data receiver, you are an app, you want to receive that information, then there is a set of instructions, step-by-step, exactly how to do that. We call it the Blue Button Plus Implementation Guide.
We also have a website called SITE, which is the Standards Implementation and Testing Environment, which is another open source sandbox, if you will. This allows developers and folks that are trying to implement this stuff a way to see if they are implementing it correctly. There are some tools in here that basically allow them to see if they are doing it right.
One example of those tools is the CCDA scorecard. The CCDA is a structured document for health information that is required to be Blue Button Plus enabled. Not all people implement it correctly. You can actually test it out to see how correct it is or where the mistakes are. Developers find this really valuable because if you get a score of, say, 75, it will show you exactly what is wrong with it and how to actually improve it. If it is obviously not deemed good enough, it won’t be interoperable, and it actually will kind of break the system and it won’t be effective. This is just one example of many tools.
The government did not actually build this tool. Somebody in the community did. He happens to be working on a contract with Blue Button. He saw a need for this and just did it. This wasn’t somebody who was sitting in the government walls, being asked to do this.
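As a rough illustration of the kind of scoring described here, the following is a minimal sketch only, not the actual scorecard tool. It assumes a document in the HL7 CDA namespace and a hypothetical checklist of four expected sections; the function name `score_document`, the checklist, and the sample document are all made up for illustration, and a real scorecard would check far more than section presence.

```python
import xml.etree.ElementTree as ET

# HL7 CDA documents use this XML namespace.
NS = {"hl7": "urn:hl7-org:v3"}

# Hypothetical checklist for this sketch: section codes we expect to find.
EXPECTED_SECTIONS = {
    "10160-0": "Medications",
    "48765-2": "Allergies",
    "11450-4": "Problems",
    "30954-2": "Results",
}

def score_document(xml_text):
    """Return (score out of 100, list of missing section names)."""
    root = ET.fromstring(xml_text)
    # Collect the code attribute of every section's <code> element.
    found = {
        code.get("code")
        for code in root.findall(".//hl7:section/hl7:code", NS)
    }
    missing = [name for code, name in EXPECTED_SECTIONS.items() if code not in found]
    present = len(EXPECTED_SECTIONS) - len(missing)
    score = round(100 * present / len(EXPECTED_SECTIONS))
    return score, missing

# Made-up sample document with three of the four expected sections.
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component><structuredBody>
    <component><section><code code="10160-0"/><title>Medications</title></section></component>
    <component><section><code code="48765-2"/><title>Allergies</title></section></component>
    <component><section><code code="11450-4"/><title>Problems</title></section></component>
  </structuredBody></component>
</ClinicalDocument>"""

score, missing = score_document(SAMPLE)
print(score, missing)
```

In this sketch the sample document scores 75 because the Results section is absent, and the list of missing sections tells the implementer what to fix.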
Another tool is something that takes the structured machine-readable data and actually makes it human readable. This is quite interesting. This is kind of what the code looks like for structured health information. Through another open source tool that somebody developed, they changed that. It is some HTML. It is all code that anyone can borrow.
You can basically pull this code in, and run your health information that looked like the machine-readable stuff through it. Now, all of a sudden, you have got something that is not only human-readable, but something that organizations can use as a starting place to think about how we should be designing these interfaces to engage people at a higher level.
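To make the machine-readable to human-readable step concrete, here is a minimal sketch, assuming a CDA-style XML document whose sections carry a title and a simple narrative list. It is not the open source tool mentioned in the talk; the function name `render_html` and the sample record are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# HL7 CDA documents use this XML namespace.
NS = {"hl7": "urn:hl7-org:v3"}

# Made-up sample: one section with a title and a narrative list.
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component><structuredBody>
    <component><section>
      <title>Medications</title>
      <text><list><item>Lisinopril 10 mg daily</item></list></text>
    </section></component>
  </structuredBody></component>
</ClinicalDocument>"""

def render_html(xml_text):
    """Turn each section into an HTML heading plus a bulleted list."""
    root = ET.fromstring(xml_text)
    parts = []
    for section in root.findall(".//hl7:section", NS):
        title = section.findtext("hl7:title", default="Untitled section", namespaces=NS)
        parts.append(f"<h2>{html.escape(title)}</h2>")
        items = section.findall(".//hl7:item", NS)
        if items:
            rows = "".join(f"<li>{html.escape(item.text or '')}</li>" for item in items)
            parts.append(f"<ul>{rows}</ul>")
    return "\n".join(parts)

page = render_html(SAMPLE)
print(page)
```

The output is plain HTML that a designer could restyle, which is the sense in which such a conversion gives organizations a starting place for a patient-facing interface.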
We have also got a growing community of organizations that are part of the Blue Button Pledge program. The pledge program is a voluntary program that signifies your support, whether you are a provider, you are a data holder, you are collecting information and you believe that your patients should have access to it. Or you could be a data receiver, you are an app, and you believe that this is the right thing to do. Or you could be an advocacy group, like AARP or a number of the others, that have pledged, that really just want to get behind this in promoting. We have got over 500 organizations. This is a growing movement to support the work that we are doing. They basically get on our radar, too. By signing the pledge, you are now on our radar. We bring you into a lot of the conversations. It is a way for us to convene a larger group, as well. We have got a bunch of other tools that we can provide them, so that they can go out to their audience and promote this.
PARTICIPANT: Adam, how do you keep track of all those different tools that people build?
MR. DOLE: Those tools are built and managed on certain sites. We are going to get into some of what those sites actually are that help us manage them. Sometimes it is challenging because there are new ways to build these tools all the time. They might not fit nicely into one of those sites that help us manage it. If they are open source, anyone has access to it. If you don’t know about it, it is really difficult to go have access to it.
Oftentimes, there is a little bit of inside baseball language associated with getting access to it that only developers might know, or only people who are implementing this stuff might know. We have to, I think, do a much better job about not only articulating what tools are available, but how to actually use them if you are not a developer or if you are not part of the group that designed them.
DR. SUAREZ: What are these tools for that you are talking about? Are these tools for consumers to be able to use the data that they download? What are these tools?
MR. DOLE: One example of the tool is to be able to test whether you are doing things correctly, if you are implementing Blue Button Plus and those sorts of things. I would say there is a whole set of developer tools. Then, we have also got things for providers to offer to their patients to communicate the value of having your information and other sorts of things that could be valuable to other stakeholders, not just the developer communities.
I will talk about one tool that we are developing called the connector, which is going to connect patients, or anybody in the country, to all the organizations that are making health data available to them. Think of it like an online directory. This is something that we are developing in-house, but open source. For this, we are taking advantage of a product called GitHub, which is a way for developers to develop in a collaborative way. In the past, software developers would sit in their siloed organizations, develop proprietary code, and generally not share in the open source.
As open source became more popular, we recognized that the value oftentimes wasn’t in the code itself, but in how the code is being implemented and being shared, and being built upon. GitHub is an open collaboration tool for developers that can actually borrow code from one another and collaborate on each other’s projects. The connector is a project that we are developing on GitHub right now. You can see here an example of how somebody might interact with the connector.
They would search for the data source they want, to see if their organization is participating in and providing data. You can drop down to your provider. You click on it, and each provider organization has their own profile page. It articulates what data is being made available and how individuals can access it at a consumer patient level. This is a tool that we are creating that is not geared for the developer community on the front end, but really a way to educate consumers about what health data is being made available to them.
However, the data that we are collecting to populate those profile pages is another interesting example of how we are using open source. Let’s use Kaiser. We are collecting what information Kaiser is making available to their patients, so what data fields. What standards, what format, what transport mechanisms, all the things that developers need to know in order to build products for that data.
It is oftentimes one-off conversations that developers have to have, in order to understand that information. That information is not proprietary. It is just that no one has collected it in any single place. Because we are collecting it and making it available through an open API in the backend, developers can actually consume this data, and do whatever it is they need to do with it. They could sort through it and say, how many providers are actually providing the CCDA? How many of them are using Direct? Then, they could start targeting what products they should be building to support the market that they have in front of them, which has not been a transparent one until now.
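The kind of sorting described here can be sketched as follows. This is an illustration under stated assumptions only: the record layout, field names, and organization names are hypothetical, and the data is held in memory rather than fetched, since the transcript does not describe the connector's actual API.

```python
from collections import Counter

# Hypothetical directory records; the real connector data may look different.
PROVIDERS = [
    {"name": "Example Health System", "formats": ["C-CDA"], "transports": ["Direct"]},
    {"name": "Example Clinic", "formats": ["PDF"], "transports": []},
    {"name": "Example Pharmacy", "formats": ["C-CDA", "PDF"], "transports": ["Direct", "REST API"]},
]

def count_by(records, field):
    """Count how many organizations list each value under the given field."""
    counts = Counter()
    for record in records:
        # Use a set so an organization is counted once per value.
        counts.update(set(record.get(field, [])))
    return counts

format_counts = count_by(PROVIDERS, "formats")
transport_counts = count_by(PROVIDERS, "transports")

print("Providing C-CDA:", format_counts["C-CDA"])
print("Using Direct:", transport_counts["Direct"])
```

With counts like these in hand, a developer could see at a glance which formats and transport mechanisms are common enough to be worth building for.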
GitHub is one place that we use to put a lot of our tools, Lily, to your question. This is just showing you two of the projects that we have on GitHub right now. There are probably about 10 projects total. This is an older screenshot that says we only have 10 members. Anybody in the world could become a member of our GitHub community and contribute to the code, and take the code and build products that are better than anything we could do, and make the entire experience more collaborative around software development.
It is a fairly disruptive thing for the software industry. I don’t like to use the word disruptive a whole lot, but I think that GitHub is probably one of those examples in recent years that has changed how business is being done in the private sector. Now, it is changing the way that we are actually doing business from a government perspective, at least with Blue Button.
MS. BRADLEY: Anyone can become a member? Then, who owns the repos? When you suggest changes and accept them, who is in charge of that?
MR. DOLE: There is an owner of a project, and that owner can assign other owners to projects. The owner of the project is generally the one probably creating the most code, putting things out there. They can request that people contribute to certain things. Some people are surprised how many people want to contribute, and other people haven’t necessarily pitched their story well enough to get a lot of collaboration.
When anybody collaborates, if they are not an owner, they have to basically submit a change, for example. Then the owner accepts it. It is not like anybody can just go in and start messing with your code. The next thing you know, your website has changed. It goes through a review process through the owner essentially.
MS. BRADLEY: Do you have any liability?
MR. DOLE: I think the general philosophy is that things that are developed in the open source, I don’t know from a legal standpoint, but I don’t think that you are really covered under any sort of legal framework there. It is pretty much up to the community to use it under kind of best practices and not do harm with a lot of this code. Again, for our projects, we have an owner associated with these projects. The general community is not just going to contribute and have access to patients, for example.
Another way that we use GitHub is we have sample patient data on there. When developers want to actually see what happens when their app consumes data, we have got 40,000 patient records that anybody can tap into. These are machine-readable.
This is a huge resource because I can tell you, coming from the Mayo Clinic, how many entrepreneurs wanted to get access to our patient records just so they could test, and not even touch patients. They just wanted the records. Now, they can actually have access to synthetic clinical data, which is really important.
Another way that we use open source is probably more of an open innovation process. We do a lot of challenges to the community. There is a lot of crowdsourcing that goes on in Blue Button because we are not here to design things for ourselves. We want the community to design things for the community.
One of the challenges that we put out about a year ago now is called the Co-Design Challenge. This was an effort to have patients collaborate with developers. The patients would provide the unmet need and talk about some of the pain points that they have. The developers would understand that unmet need, and try and develop solutions to fit.
What was really fascinating was we had over 3000 engaged patients that contributed. Anybody could contribute an idea, an insight, an inspiration that would lead to actual ideas. We had 80 patient-generated ideas that came out of that. Thirteen apps were actually created as a result. We had a lot of votes, and that was pretty exciting. Here are the three winners. It is just fascinating. I won’t go into exactly what the apps were, but here they are. One of the first examples of how patients have co-created solutions that worked for them.
DR. MAYS: When you are on and this gets designed, say like for instance, do the patients then get this? Are the people who created it able to go off and be entrepreneurs, and make money on it? What is the community side of it like? Do they get it forever, for free?
MR. DOLE: In some cases, the patients just like to contribute. They want to have their voices heard. In other cases, they are brought into that entrepreneurial team as a cofounder and somebody who has a lot of insight into where this product should go. Now, whether they get equity in the product, that is personal, kind of a proprietary decision made for that team. They might have access to the product for free, for example, if they were even charging for their product. A lot of these apps are for free anyway, that are being made available to just about anybody that has a device machine that can use it.
I think it is a bit of a self-selecting model there, people who are really interested in this. We have seen new companies get created as a result, which is really exciting. Obviously, the more data that is flowing, the more entrepreneurs are going to come into this space. Right now, the best entrepreneurs do not work in health care. We would like to change that because I firmly believe that some of the best solutions are going to be designed by people outside of the health care industry, with collaboration from the health care industry.
DR. FRANCIS: I was going to hold this until you talk about social media, but I thought I would just raise it right now. The tiger team to the ONC policy committee right now is looking at view, download and transmit with respect to the question of identity-proofing and authentication. Not only for patients, but for patients’ personal representatives who have the same rights under state law, under HIPAA, that the patient would have.
Actually, the whole question of identity-proofing and authentication of some sort, whether in person or not, are huge issues about Blue Button and view, download and transmit. It is super important to make sure that the person who gets into the record is either the person or the person that the person has appropriately authorized to do that. There are going to be some recommendations to the ONC policy committee about dealing with identity-proofing and authentication.
It is a whole new ball of wax if there are multiple ways of getting into the same space. The kind of in-person or other sort of authentication, and prior to that, identity proofing, that Medicare uses for Blue Button is totally different than Facebook. I can set up a Facebook account in your name. It is actually this linkage stuff, as well as transmit stuff.
There is a huge set of comments, including from industry, on the tiger team website about the new dimension added by transmit. That group isn’t even talking yet, but probably should be about that it is not only that I take my data, and I say it is okay to transmit it there, but that I create a whole new form of storage repository with a whole lot of change potentially coming into it. I don’t know where any of that is going to go, but I can tell you that is currently on the plate of HITSP.
MS. BRADLEY: Do developers talk through these kinds of issues in these communities? Is that part of the game?
MR. DOLE: Internally with Blue Button, it is one of the biggest topics. Whether we are talking about developers or policymakers that are contributing to Blue Button, it absolutely is. I would say the entrepreneurial community has not recognized a lot of the same issues because they haven’t partnered with the health care system yet. They are kind of off on their own, creating something that is engaging to somebody who could be using it, who might be a patient one day.
I think the big elephant in the room is for us to actually get the data flowing to those apps that can really help people engage in their health care and coordinate health care better through access to their data. We just haven’t run into the roadblocks yet. As you mentioned, this is probably the next biggest thing that we have to understand.
We can create all the great apps in the world, and if data is not flowing to them, and if trust hasn’t been established with the people who are setting up the accounts, none of this works. It can be quite dangerous. This is a huge issue, and something that we are really excited to start tackling. This is kind of where technology meets policy head-on. It will be really exciting.
DR. FRANCIS: All I am saying is that it is the elephant in the room. I don’t like to think of it as a roadblock. I think these are all enormously important tools. They need to be designed. FTC used the term, privacy by design, a couple of years ago for various kinds of internet and so on. They need to be designed with the kinds of concerns that the tiger team is talking about right now on the ground just as much as the wonderful excitement of them.
MR. DOLE: With a lot of the advancements in technology, we can actually get much more granular control over who has access to what data. This whole movement is powered by a lot of advancements that are going to hopefully increase security, and make the privacy more transparent for people. We are excited to put more of that in the hands of patients as they have more access to their record.
DR. SUAREZ: Certainly, there are unique elements of this authentication and identity verification with respect to the three elements. We really wanted to separate the three elements. The element of view, the element of download and the element of transmit, I think the three need to be separated and separately discussed. Certainly, there is a lot of important differences between someone that is simply accessing the record on behalf of a patient to view it, and then someone that actually has the ability to download. Then, there is also the question about view in terms of a view tool. You can actually do screen captures and things like that, that then create a permanent copy of that extract. Even though it was a view, it was not a download. There are all sorts of interesting technology issues around that.
My question really actually was about this point. In some place here, we invite designers and developers to redesign the patient health record. My question was about the concept around this. We use, of course, electronic health record, electronic medical record, and then we use something called personal health record. Is this getting to the personal health record? Is it getting to actually redefining the electronic health record, or both?
MR. DOLE: I would say it could be both because one of the requirements that went out for this was that they redesigned it for the CCDA. Assuming that an organization is structuring their information with a CCDA-structured document, it could be used for that. If a PHR wanted to consume that and represent that on the front end, that could be used for this, as well. We actually are working, and it hasn’t actually happened yet, but the winner was going to get piloted with the V.A. That is where it would actually be the patient-facing interface for the MyHealtheVet patient portal.
Now, it hasn’t been implemented. We don’t know if it actually will. That just kind of gives you an example. It has the potential because it was designed in a way that could consume the information that is being generated inside of the hospital, not just on the consumer side of things.
This was a really interesting one that happened last year. We had over 230 submissions. Some of the submissions came from some of the best design firms in the country that would charge clients over $1 million to do a project with. Whether it was method design or Hot Studio in San Francisco, these are some of the best interactive design studios in the country. We had 230. Not all of them were as good, but it was an incredible showing.
This was supported by a federal partner, the V.A., along with Rock Health and the California Health Care Foundation. This was one of the bigger challenges that we actually had, and this was last year. This is just an example of the winner. Again, you can see, one of the criteria is that it was all open source. Anyone could contribute or borrow this design for their organization.
Lastly, social media, and this is the briefest section that I have because I don’t think that we actually use social media as well as we probably could. We use Twitter fairly well for a government agency, specifically with Blue Button. It is interesting. We have a couple of different handles. This is part of the problem. We don’t have a cohesive social media strategy. I am by no means the social media expert here. You have different projects starting to develop their own Twitter handles, Facebook pages. It creates a lot of fragmentation, and it is not always great.
I do think that Blue Button has started to use the Blue Button hashtag fairly well. I just pulled this data last night for the last 30 days. We have had over 3 million impressions with the Blue Button hashtag. That is actually kind of cool when you think about most people don’t know that Blue Button even exists, and yet, over 3 million impressions.
It is interesting. I am fairly active. You can see here, in the last 30 days, I tweeted 10 times, was mentioned in 91 tweets. For me, it is interesting to see because it keeps me coming back to Twitter to see what people are talking about. I would say usually about once a week, there are a couple of tweets in there that have nothing to do with it. It is just like people use Blue Button to mean something else.
When you go, it is really interesting to watch the stream. It is a lot of the community that we know. In fact, you can see here. A lot of these folks are internal to government. What is exciting is that a lot of them, like I would say half, are not internal to government. They are part of the community. I think as this community grows, we are going to see things like 3 million impressions is low. We are excited that this project has been able to kind of garner that level of attention. It is just really easy to have brief communication with a broad community on Twitter.
I don’t know how many of you are actually on Twitter personally or with your work, or with your projects. It is a great way to begin a conversation. Somebody described to me that Twitter is kind of like picking up a phone that has got a million different conversations on it. You can basically dial into any one of those conversations, listen, contribute, participate. Those are conversations that you would have never known existed.
I don’t have to sell Twitter to you guys. I think they do their own job about marketing their own value proposition because it is fairly useful and very lightweight for projects like Blue Button.
DR. SUAREZ: What was going on February 7th?
MR. DOLE: That was when we actually announced the partnership with all of the pharmacies. I think 3 million might be obviously more as a result. That also, I think, helps. When our social media folks at ONC see this, they are like, oh, when is the next announcement? When can we do more? Then, you start getting internal buy-in for things that might otherwise have been a difficult conversation to have. People that don’t typically talk about social media are now talking about social media, which is kind of neat.
I think this is the last example that I have. We did about a year ago, I believe, a YouTube competition for people to submit their videos about how Blue Button or access to their records impact their lives. I don’t actually have the video clips here. Some of them are not good at all, amateur video, people just talking, not good quality video, but good messages.
I think the point was to start surfacing people’s real life experiences with having access to the records, how it has changed their lives, how it has maybe saved a loved one’s life. We have since been able to catalog some of those stories, and we highlight them at different events. We bring them to share their story among others.
Those were some of the things that we are doing. Obviously, we are doing a lot more, but I wanted to highlight the things that were really about kind of open collaboration, open source, social media collaborating with the community using different types of tools. I am happy to have more questions. Feel free to take down my information.
DR. SUAREZ: How much have you looked into what other countries are doing with respect to this? I know there was an interest from the U.K. to do some of these. They actually started a project to work on it, but apparently there was something about the project that they stopped or whatever it was. That is one question.
The other interesting one is, here we are so happy about the CLIA announcement recently to allow laboratories to report directly or share directly the results with the patients. It is interesting that many other countries have done this for years. In fact, when I go back to my own country of Colombia, and I have an x-ray, before I leave the x-ray lab, they give me a DVD with a playable presentation of the actual images that I can share with any of my providers.
In fact, in a country like this, it is actually the responsibility of the patient to go and pick up their results from labs, in order to take them to the provider, not the lab sending it directly to the provider. There is an interesting kind of perspective when I came to this country and started to learn about CLIA restrictions. Anyway, I just wanted to hear your perspective on the international arena around Blue Button.
MR. DOLE: We did just announce actually at the ONC annual meeting, that I am sure many of you either had heard about or viewed online, or attended in person, we had a signing of an MOU between Secretary Sebelius and the U.K., that they are going to basically not just adopt Blue Button because I think that is just kind of too broad of a statement, but start thinking about how they can enable structured machine-readable data in the standardized format to be able to flow into third-party apps, and be shared more broadly across systems.
Obviously, they have more of a centralized system. There is probably less fragmentation internally. That doesn’t necessarily make it easier for people to share their records with loved ones securely or with third-party apps. We are really excited about that.
I think generally we, as in the United States, don’t do a good enough job learning and borrowing what is working in other countries. I think we could always do better there.
I think there is a lot of insight that can happen when you look at even the developing countries, and how they have created solutions because they don’t have some of the luxuries that we have had. If we could bring some of those solutions in, and figure out how they could be applied to our under-resourced communities, we could have tremendous value to be getting there. I don’t think we do enough of that. It is great when we do it, and I think we could always be doing more.
Yes, I think that just looking at some of the other countries. When you look at the rankings of health care outcomes and value, I think it speaks for itself that we have examples to be able to learn from. I know there are a lot of teams internally that are working internationally. I haven’t been part of a lot of those teams. I can’t speak to a lot of their actual projects outside the U.K. one. I know there is a big effort to try and do more cross-collaboration and learning.
DR. CARR: This has been terrific and very stimulating. We spent some time talking about social media and sort of bridging the gap from the traditional, and taking it to the next level. I think it has been hard to find the examples we thought we would find. Obviously, Blue Button is really leading the way.
I think what we are going to do now is break, I would say until 3:15. Then, we are going to come back. We have some hard work to do to finish up on this letter. I have to say the two presentations this morning have been very broadening in our perspective. Thank you so much, and we look forward to hearing updates as you continue your journey.
MR. DOLE: It was a pleasure being able to share some of this. I am looking forward to continuing the dialogue. Feel free to email and reach out with any questions.
(Brief recess.)
Agenda Item: Continue Discussion
DR. CARR: Just framing a little bit of where we are, what we are doing, what we are going to do next. Yesterday, I put up this slide to try to speak to the full committee about where we are, trying to think about what is it that we have been doing for 18 months. If you look on the left and the right circles, on the left side is the data supply side. We have done a lot of thinking about HealthData.gov and the usability of the data. On the right, we have been thinking about the data demand, the users of the data, the communities, the developers, providers, what information they need.
As we have been doing that, we have looked at the data sources that we have and the traditional applications that are available. We also began the discussion about the innovative applications, and the codeathons, hackathons, datapalooza, the EMT initiative. We heard about the health data consortium, Optum Labs, Blue Button.
Even though we are sort of coming up on our two-year anniversary, it has been a bit of a discovery in terms of how do we best fit in. I would say today, having Damon here was the superb example of the role that I think we can play best. While we will continue to do discovery, and I think having the Blue Button here was probably a better use of our time to react to something innovative that folks are doing in bringing up the issues, as Leslie pointed to, the major issues around protection, that may be the best use of our time. Trying to work offline over the phone is hard, and folks have other responsibilities. Going forward, I think having Damon or someone from health data initiative here will be important for them, as well as for us.
Secondly, I think continuing to have innovative uses coming forward will be valuable. Thirdly is the issue about having a work product. We have talked about a number of different things. We were going to do the framework around social media. What we found was that there were not enough examples to draw from, so it is a work in progress. It is probably something that we need to be thinking of. I don’t think that we have at our disposal the resources or the kind of in-depth analysis that would be required to develop the framework as we were talking about.
Where we are right now is a letter. We talked at our last meeting about putting together the observations and moving it forward to the secretary. Because we are seated in the NCVHS, anything that comes from the workgroup needs to go through and be signed by the chair of NCVHS, which is Larry Green.
We have gone through that exercise. We have been working on, as you think back, we had our initial letter that compiled everything that we have ever thought about. We had some discussions where we got to a point of saying, let’s pick three things. Pat Remington’s suggestion, I think, was wise. Among those things, pick the easy things that we can talk about. The three that we decided on were the timeliness of data, the metadata and granularity. That was of particular interest of many of us, to get to data that is below the county level.
As we put that letter together, when it came to the full committee, the discussion evolved around, as you get more granular, there are more obligations that come with the protection of the data, and to talk about going to more granular data without talking about those protections. People did not favor having one without the other.
With that, we had a couple of different options. One is to continue to deliberate and incorporate that into the letter. The second was to trim the letter, to say two things instead of three things, and at least move that forward.
What was very valuable about the full committee was that a lot of things came forward like 1313, and we heard about today from Damon about what data should look like going forward. I think one of the things that has come to our attention is that, while we can be very free-thinking, we also need folks who are grounded in what is going on, so that we are not out of step. We talked to Jim about that, in terms of how we get that kind of input.
What you have in front of you, I think, is the draft that was being worked on this morning. Then actually, Larry Green thought it best to just stop having the full committee micromanage, that we were kind of using the voice of the workgroup. The workgroup is supposed to be a little bit different from the full committee. We need to have the voice of the users and move forward.
Hopefully in the next hour, we could put this together in a way that will add value. I will say that, as I listen to Damon, I think some of the things that we were saying they have already thought about. I think there were a few things that came up that we actually could put in the letter, that were new to him and would be valuable.
I think we can do a quick glance at the letter. Then, I think our focus needs to be on what are the things that will add value for the health data initiative. Then, how do we say them. Once we do that, we will put that together, it will go to the executive subcommittee and the full committee, and it will get sent out.
As you can see, the two avenues are very different. If we can just have a sit-down conversation, it is a great opportunity to share ideas and build on ideas. When we are going to put anything in writing, it has to go to the secretary, and there is a long process that goes with it. This is how we do things.
Before we jump into the letter, any thoughts or ideas that came from what we heard this morning, Lynn?
DR. BLEWETT: It seems like there are some things that are federal rules and regulations that Susan or Debbie could make sure that we are meeting what we are supposed to, and not try to figure what those are, just that you will do them. I don’t know what they are, but I am sure there are many. We don’t spend time on those pieces.
DR. MAYS: Was your invitation for him to do what Jim does at each meeting? Like to come in and give us an update?
DR. CARR: Yes, that was what we asked. Why do you ask? Are you talking about Damon? Yes. We asked Damon to come in and give us an update.
DR. MAYS: Can he actually stay for the meeting?
DR. CARR: You mean like now?
DR. MAYS: No, I mean like in general. It is almost like he is our den keeper. I am trying to see if what the request could be is, other than just an update, whether or not this could go on his calendar. It is only four times a year, and he is there for the whole time.
DR. CARR: I think that is right. I think that we will be much more aligned. Actually, this might be a good time for me to mention that in June, I will be stepping down as chair of this workgroup. My esteemed colleague, Dr. Vickie Mays, will be taking the reins. We are going to overlap and kind of transition over the next few months. As we develop this, it is your show. You are right, I think that is right. Who will take that on to get that on Damon’s calendar?
DR. QUEEN: Actually, Jim mentioned to me just a few minutes ago that he was asking Damon to be a liaison to the workgroup. We will make sure that it gets on his calendar.
DR. CARR: I think today, he could see the value of it. I mentioned to him, if we need you, can you come back down, and I think he said yes. He is in this building, which has been a stumbling block. Going to Hyattsville was not something that folks could do. Other comments or thoughts?
DR. COHEN: NCHS is space planning, and they will probably lose their conference room at some point.
DR. CARR: I don’t know who had that influence, but I thank them. I am just kidding. It is difficult. I think for NCHS, we are now putting the hardship on them to come up here. We appreciate your efforts.
The next step, if we pull up the copy of the letter.
DR. FULCHER: I wanted to make sure about process. I was mentioning to Justine that I really wasn’t able to engage very well on the phone in terms of offering observations around the three areas, metadata, aggregation and timeliness, or granularity and timeliness. I didn’t know if that is appropriate to bring up now? Are we past that process?
DR. CARR: Well, I think that we made the decision not to try to tackle granularity because we would not be able to get a work product out prior to June.
DR. FULCHER: Thank you for clarifying that earlier. I am just wondering about this metadata question. There is so much going on around metadata and how we are improving on that. Damon mentioned it, that the FGDC process around geospatial data. All our data can be classified under geospatial. I think there are some very good processes already in place.
I think the thing that was missing was, and I am speaking from a community perspective, just the plain English description to Walter’s point earlier. What is this data, who created it, and when was it created? So they just have some very basic front end understanding of can I use this 2006 data with 2014 data? Well, no, maybe not. That was my sense of the metadata.
Other than that, the metadata, really there are a lot of good things going on already. The timeliness issue is inherent in the metadata fields. The timeliness and being able to have that when it was created, what is the shelf life of the data, those are all metadata attributes.
DR. CARR: You are right, whether it is really just one topic.
DR. SUAREZ: I agree there are a lot of good directional activities around metadata. If you take the 998 databases listed and even more so, and you actually evaluate how many of them do have metadata in the way that it is consistent with the core metadata schema, zero or less. Really seriously, this is an activity that we should push hard to make sure that every one of the agencies within HHS that maintain or create a data set, create this data documentation and metadata, those two parts.
If there is one thing that this letter can do, that will improve usability and usefulness of the data, is to make sure that all that data has good data documentation and metadata. Everything else, timeliness, of course the data source is going to determine how often and all those elements. The metadata is a requirement now. That is my perspective.
DR. FULCHER: My experience with metadata has been EPA data or transportation data. Just one data layer is 300 plus pages long just on metadata. I wasn’t aware of, from a health and human services side, where you are at with that metadata process.
DR. CARR: I think there are a couple of things that come to my mind that we want to do. If we maybe just work on pieces of this, and then pull it back together, and just returning to the metadata. What we heard yesterday is this, if you look at your observations, there are two paragraphs here. I am not sure. The first paragraph was part of the original letter. The purpose of that first paragraph under the metadata observation was data sets do not often provide enough information about the data for potential users to understand and use the data appropriately.
Then, it goes on to give an illustrative example that shows that field names are not always intuitive, descriptions of key variables are not consistently provided or do not contain sufficient information. Descriptions may provide how data was collected, but not why they were collected. Provider types are not listed. The key variables are not generally or specifically described. Data documentation is not easy to navigate and is often provided on several webpages and a few separate files.
Then, it goes on, one of the downloaded files, zipped documentation files, is POS, record layouts for CLIA and other. I don’t know if we need that sentence, but in non-machine readable format. The text file defines for each provider type, variables are captured, et cetera. It goes on, and it kind of gives, here are all the things that we find wrong or problematic.
I guess is that a statement of what prompted us to make metadata a priority? Have we said that in a way that is valuable?
DR. COHEN: I don’t think so. I agree with Chris’ point of view. There are two issues around metadata. This is certainly one issue. The metadata that is required is not available and uniformly complete. The second issue is more substantive. Since the title of this letter is usefulness and usability, to make this data useful and usable for communities, there needs to be very simple descriptive language that describes the contents and how to use the data. Those are the two metadata issues. I think that should be in the first observation in this section.
DR. CARR: When you say the required metadata is not available, the 1313 memo says going forward, this is what you have to have.
DR. COHEN: However Walter feels comfortable making that point about the metadata being available for HHS data sets. That is one thought. The second one is simple plain language. If I am a community group, and I want to know what is in the National Health and Nutrition Examination Survey, somewhere it needs to say this is an annual ongoing survey of 2000 people. They take biological measurements, and here is how you can use the data.
DR. SUAREZ: I totally agree with the two components. That is why I recommended to insert data documentation separate from metadata. Indeed, metadata is the actual data structure and coded elements. Data documentation has two components. One is the vernacular description of the data in simple terms of the data set and data elements, and a whole bunch of things about the data, in a way that a user can understand it. That is one of the data documentation components, a vernacular description of a data set.
The second component is really the data dictionary, which is a much more technical refined description and definition of each of the data elements contained in the data set. Some people think of it as part of metadata. In the conceptual terms, that data dictionary itself is a separate document that describes truly the definition of the data element. Those are two separate elements of one component of data documentation.
DR. CARR: With data documentation then, the issue is —
DR. SUAREZ: Again, this issue has two major elements, data documentation and metadata. Data documentation itself has two elements, data documentation regarding the vernacular simple description of the data set, that is element one under data documentation. Element two under data documentation is the data dictionary itself.
Then, a separate issue is really the metadata. I think the recommendation is going to be organized really in a way that points to these two major categories and then the two elements within the data documentation category.
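(Illustration, not part of the spoken record. The organization Dr. Suarez outlines above, data documentation with its two elements plus metadata as a separate category, could be sketched in Python roughly as follows. Every class name, field name, and the `missing_parts` helper is hypothetical and chosen only for illustration; none of it reflects an HHS or HealthData.gov schema.)

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataElement:
    """One data dictionary entry: a technical definition of a single variable."""
    name: str
    definition: str


@dataclass
class DataDocumentation:
    """Element one: plain-language description. Element two: data dictionary."""
    plain_description: str = ""
    data_dictionary: List[DataElement] = field(default_factory=list)


@dataclass
class DatasetRecord:
    """A data set with its documentation and, kept separately, its metadata."""
    title: str
    documentation: DataDocumentation = field(default_factory=DataDocumentation)
    metadata: Dict[str, str] = field(default_factory=dict)


def missing_parts(record: DatasetRecord) -> List[str]:
    """List which of the three parts are absent from a record."""
    missing = []
    if not record.documentation.plain_description.strip():
        missing.append("plain-language description")
    if not record.documentation.data_dictionary:
        missing.append("data dictionary")
    if not record.metadata:
        missing.append("metadata")
    return missing


# Hypothetical record that has a plain-language description but nothing else.
example = DatasetRecord(
    title="Example survey file",
    documentation=DataDocumentation(plain_description="A yearly survey of households."),
)
print(missing_parts(example))
```

(A check of this kind would only show which parts are present, not whether they are well written.)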
DR. QUEEN: For documentation, where would you put descriptions of the methodology, if it is a survey for the sample design, and then on the other end, where would you put information about creation of new variables or the creation of the weights, or how you use all these things when you are going to do the analysis of the data? Where would you put those in this realm?
DR. FULCHER: That is a great question. I think I heard you, where would you put that, is that right? It would be ideal if it was actually part of the metadata that you have at the very top, a data summary which is plain English. Then, you have the deeper metadata for people who want to delve in deeper, it is in the same place.
The community groups that just want to look at what is this, the year it was produced and who produced it, that brief area. That could be as part of the metadata. Metadata is data about data. I think that is good, rather than having it something else, it should be.
DR. QUEEN: Right now, usually like NCHS, they have put in their data documentation the whole description, the sample design, how the data were collected. Then, you have all the questionnaires and variables and codings. Then, you also have sort of analytic guidelines or technical notes.
DR. CARR: I am thinking about membership of this workgroup. Bill Davenhall gave us a very elaborate example that we initially had as an appendix. We took it out, maybe we need to put it back in, of what he would like to see, a data description at the top, sort of compare and contrast. Maybe can you pull that up, Lily?
DR. FULCHER: I sent Lily some examples of how we had the data in plain English summary. You had to click a hyperlink to the deeper metadata.
DR. CARR: You brought that up about how are you, as the user. Is that what we want to say? Do we want to say there is an array of things that need to be available and maybe accessible, according to who you are? Maybe that is getting too detailed.
DR. MAYS: Can I just offer? On this, what you call the FGDC, anyway, they have a very interesting definition. It says what are metadata, and I like the description. Metadata records include core library catalog elements, such as the title, abstract and publication data. Geographic elements such as geographic extent and projection information, and database elements such as attribute label definition and attribute domain values.
Then, there is a statement before that, that just explains what it is. It represents the who, what, when, where, why, and how of the resource. I mean, that to me is kind of plain talk, something like that. I don’t know if we can just lift it.
DR. CARR: I think that is right. Damon was not familiar with that, as I recall. Again, as I am thinking about how do we add value to something that is not already in the repertoire, I agree with you. I think that description is good, and that it is from the FGDC is also good.
DR. FULCHER: It is not in the repertoire. I think so often we talk about health data. It is not in our repertoire to think about it spatially. All data is spatial. If we think from the lens of all data is spatial, then you have a backbone to build on to FGDC in terms of their metadata processes. It goes down to community, yes. In the GIS world, in the spatial data world, a lot of people are familiar with metadata, even though it is very dense. Then, you just want to elevate the English part of it.
DR. COHEN: I guess in this concept of metadata, Susan mentioned a couple of things about how you use the data that might seem relevant. I think they are relevant, but they are not part of the classical definition here of metadata. I don’t know whether you have developed a way to provide guidance for use to incorporate into the data or whether that is a separate issue.
DR. FULCHER: What we did simply was went into the raw metadata and extracted just the fields that are basic, and provided a very simple summary for communities, which is what they used. Then, for the folks who are more technically savvy, drilled down to the actual metadata.
What we did is we just extracted and elevated it. We may be doing that right here. It is just talking about metadata in its core sense, FGDC, spatial. Then, however people want to go ahead and elevate that, they can do that.
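(Illustration, not part of the spoken record. The approach Dr. Fulcher describes, pulling a few basic fields out of the full metadata and presenting them as a simple summary while leaving the full record available for technical users, might look something like this minimal Python sketch. The field names and sample values are assumptions made up for the example; they are not FGDC element names or real data.)

```python
from typing import Dict

# Hypothetical field names; a real catalog would use its own schema.
SUMMARY_FIELDS = ("title", "creator", "date_created", "abstract")


def plain_summary(full_metadata: Dict[str, str]) -> Dict[str, str]:
    """Keep only the basic who/what/when fields from a larger metadata record."""
    return {k: full_metadata[k] for k in SUMMARY_FIELDS if k in full_metadata}


def render(summary: Dict[str, str]) -> str:
    """Format the summary as short labelled lines for a non-technical reader."""
    return "\n".join(
        f"{k.replace('_', ' ').title()}: {v}" for k, v in summary.items()
    )


# Made-up record: the last two entries stand in for dense technical detail.
full_record = {
    "title": "Example county health file",
    "creator": "Example agency",
    "date_created": "2006",
    "abstract": "Counts of clinic visits by county.",
    "projection": "NAD83",
    "attribute_domain": "0-99999",
}
print(render(plain_summary(full_record)))
```

(The full record is left untouched, so a technically savvy user could still drill down to it.)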
DR. MAYS: There is also on the website, a whole training to train people what is metadata and how to use it. You may want to also look at that. They have online training initiative, useable materials, available online lessons. Then, they ask you to provide feedback. This is FGDC.
DR. CARR: Give me the problem statement of our observation. The data elements are difficult to understand or something like that. What is the problem statement?
DR. COHEN: The problem is we need to provide all different levels of users with an easy way to understand what the data contains.
DR. BLEWETT: I think your first statement there, under data sets often do not provide enough information. That sentence seems to be what we are talking about.
DR. CARR: Data sets often do not provide enough information about the metadata for potential users to understand and use the data appropriately. Stop there. Do we want to use this illustrative example or no? It just seems to get in detail. We take that out, and our opening statement is data sets often do not provide enough information. Do we want to then say different types of users need different types of information?
MR. SAVAGE: It is helpful to me, at least, to understand the categories of data sets we are talking about and the ones that we are not talking about. If I am understanding correctly, we are sort of talking about aggregated databases that HHS is making available.
My mind kept going back, for example, to the discussion of metadata in the PCAST report. I was thinking, so, you have got a letter like this that plops down in the middle of a broader conversation. Is there a connection? Is there not a connection? I am sensing that there is not. That is what brought me back to thinking it may be helpful to at least say, here is what we are talking about and here is what we are not talking about.
DR. CARR: There are a couple of things. One is that there is the old data that has been liberated. I think a key, very interesting observation was those PDFs could have metadata, as well, to just say what they are. We are talking about the data on HealthData.gov. It is old data; it goes back many years, a decade in some cases.
MS. BRADLEY: Could I clarify that that is the case, that there are PDF reports that contain data that have the same metadata as data sets on HealthData.gov?
DR. QUEEN: What I was interpreting is that you could have reports. The report has a title, when it was generated, by whom, what it is about and abstract. It may not have data within it.
MS. BRADLEY: I mean like ACF wise reports to Congress are on Data.gov.
DR. CARR: I don’t understand what you are saying. If there are PDFs, which I think there are, there are PDFs in the HealthData.gov.
MS. BRADLEY: They have to have metadata to be put up there.
DR. CARR: Was it Leah made the point that came out in a meeting last year, that even adding descriptions to things that are PDFs make it useful to an individual who has the desire and is willing to comb through it. Similarly, at HHS, I have had 1000 people ask for this PDF. This might be the one that we want to break down.
DR. COHEN: I want to get back to Mark’s observation, which I think is a really good one. If we are going to change the title of this to Steps to Improve the Usefulness and Usability of HHS online data, there is a lot of data online that might be aggregated. It might be individual level deidentified, and it might be individual level not deidentified, identifiable, that you can access ultimately through research data center or a variety of processes.
I think this letter needs to address the entire universe of HHS online data. Here are the things that all online data, regardless of how it looks, these are properties that it needs to have.
DR. CARR: Online data on HealthData.gov?
DR. COHEN: No, just online data. I think Data.gov is actually, in some ways, too narrow a focus for me.
DR. CARR: I think what we heard Jim say is that they have liberated, they have created this HealthData.gov, and they want more uptake. They want this group to help with that. Now, there may be other things that we comment on, but the compelling need is having sunk resources into this. Are we getting what we need?
DR. COHEN: Are all the data in the community indicators warehouse and HealthData.gov?
DR. QUEEN: I don’t think all the data sources.
DR. BLEWETT: They are supposed to be.
PARTICIPANT: What was your hearing about? That is what this letter is about, what you heard, right, when you went out to the communities and did your surveys?
DR. CARR: I think again, this is a very non-traditional group within HHS. Customarily, the subcommittees have a problem statement. They plan a hearing. They invite designated industry leaders. Those leaders answer those questions. They sit down, deliberate and compile. This group was not intended to do that. I think each of our meetings has had robust conversations. The unfortunate thing is we have not had health data HDI folks here.
It is hard. The hearing from the communities was piggybacked onto work that the populations and privacy group did hold this roundtable, and we were listening in. We have been capturing sound bites.
We have consistently heard, and it goes back to the first meeting that we had, where Josh Rosenthal, who couldn’t be here today, talked about the fact that as a developer, when he looked at data, he could not find entity relationship diagrams. He could not find data definitions, all of that. That is his experience. I think that is what we were trying to put this in.
DR. MAYS: Can I respond to Bruce? I am a little nervous to make it everything right now. I agree in principle, but I think that maybe the thing to do is a compromise, to talk about Data.gov, and then have an aspirational sentence to say we would hope that in the future that HHS data in general that is online could also meet these same criteria.
DR. COHEN: I think the last sentence of the first paragraph states, if you pull up the letter, HHS and other resources continue to grow on our various websites, including HealthData.gov and indicators warehouse among others. I think we can make it general, and then in the language of the letter say these are specific examples that we are concerned about. There is no reason for me, I guess, not to be more broad about HHS data online.
DR. MAYS: What about if we think of something like NHIS? Those are online, and I think there are some things that we will probably need a different blog.
MS. BRADLEY: It is a catalog. HealthData.gov is just a catalog. It is like NLM, PubMed, when you go to search for an article. Then you link out, and you actually go get the article where it is.
DR. MAYS: So we are only talking about the cataloging part? We are not also then talking the data?
DR. COHEN: We are talking about any online data set that is available from HHS should have these properties.
MS. BRADLEY: Exactly, but all of them are supposed to be on HealthData.gov.
DR. COHEN: They are not.
MS. BRADLEY: You should read through the implementation guidance because this first step required by OMB is just creating an enterprise data inventory. If you just want to know what we were asked to do, it is clearly laid out. There is a document.
DR. FULCHER: It is hard. We are kind of bouncing around. In some ways, I don’t know where we are at from an HHS standpoint on metadata and how it is being collected, how it is being put into a database. It is hard for me. Where do you go if we don’t know?
DR. CARR: Well, I think that part of this is that early on, we had a couple of phone webinars which were of minimal variable value. We had presentations also at the meeting where we saw things. I think we were seeing it for the first time. As we look at it now, when you go back and you go to these sites and work with them, and see what you can and can’t do, it becomes clearer. I think we are missing a bit of that because we probably need a deeper dive, which we will fix going forward.
I think what your observation is that we are responding to something that we don’t all have the uniform experience of. I think that makes it difficult.
DR. QUEEN: I think one issue is that the data-producing surveys in HHS have been putting data out for a long time, and putting documentation out and data dictionaries out, they have been doing this, and they know how to do it. A lot of the more recent, I think, things that have been pushed out to HealthData.gov, they are putting the data out. In some cases, there is nothing. It is a real mix.
DR. CARR: Paul Tang had to leave to get to the airport, but he sent three bullets that are his observations. One, add plain language descriptions of data sets and precise data definitions. We all agree on that.
Number two, continue to look for ways to streamline data capture, processing and publishing of data sets to achieve turnaround time that meets each data set’s timeliness value. That gets into timeliness. We will get to that.
Three, explore facilitated ways of communities, helping communities identify their own pressing problems related to health, and then identify which problems would benefit from the available data. That is a new concept.
DR. KAUSHAL: We have heard from multiple people on just the usability piece. We have just focused on let’s release it and people will come. Is it too late to the game to add a section around how we could help solve that problem?
DR. CARR: I think we are stuck a little bit because we went down one direction. We heard some updated information today, and we have a better sense of what is responsive to what HHS needs.
DR. KAUSHAL: I do agree with the fact that Datapalooza is very entrepreneurial and industry-focused. Again, I have heard a lot from this audience and now from others that there are many more uses of this data. Those stakeholders just don’t feel it is appropriate, so I think this would be a great place to put that forward.
DR. CARR: Can you just say that a little bit differently? The usability of it?
DR. KAUSHAL: Yes, the usability. Improving usability for other audiences, other data. I have heard that theme now, I think, three or four different times in different context, and more in the public health and the community focus versus the private sector.
DR. CARR: I agree. In our original letter, that is what we talked about. I think that is what we heard. I think that maybe we had too much in the letter, but I agree that usability is something that we have heard both from content experts who can’t navigate technically, and technical experts who can’t navigate the content. We heard that loud and clear at the first meeting.
DR. KAUSHAL: I think we referred to a little bit around the metadata section, but we can pull that out, as well, and write a whole different section.
MR. SAVAGE: Are we distinguishing the metadata of the data set from the metadata of the data within the data set? The usability question sort of lifts that up.
DR. FULCHER: The metadata is the data about the data, so it is not the data in the data set. It is about how it was all constructed.
MR. SAVAGE: Data within a data set could have metadata itself. Coming back to the PCAST report, you would have individual health information that might have metadata about it. Then, you aggregate that data, perhaps in another data set.
DR. CARR: Are you talking about the letter we wrote?
MR. SAVAGE: The usability question actually brought that to mind because I suspect usability needs to be thinking about the individual pieces of data within a data set, as well as just understanding what the data set is as a global thing.
DR. KAUSHAL: I don’t think we went deep into this whole notion of meta tagging to allow the amalgamation of all of these individual pieces. I don’t know if we want to go that deep.
MR. SAVAGE: Just trying again to sift out what we don’t need to get into.
DR. CARR: We don’t need to get into that. You are right. That PCAST report, as I recall, was about electronic health record data, was it? That it was, did you get the race and ethnicity from the clerk, from the geneticist, or who did you get it from, that kind of thing. We are not going there. You are right. We are just trying to get a description of these data, what is the data set. Maybe let’s go to the timeliness piece to see if there is anything we can salvage out of this today.
DR. COHEN: I agree with Paul’s third comment. I think our learning of the last couple of days is if the title is usefulness and usability, I think it is a good summary to talk about usefulness and encouraging, whether it is HealthData.gov or all HHS data folks to focus on making the data useful to a broader community of users. That is a new learning for HHS.
DR. MAYS: We want the letter finished, right, by the time we walk out of here, or we don’t?
DR. CARR: I mean, we do, but the challenge is that it is evolving. I would say the letter as written has some pieces that are maybe misaligned, information that they already have maybe, maybe not. Perhaps I am too close to this.
DR. MAYS: I was going to say, I think you gave up. I didn’t want you to give up. I think if we go back, let’s actually try and write it. Then, if we have to give up, we can. The timeliness, we are going to get through a little faster.
DR. CARR: Should we start with that, so we will feel better? Take a moment and read silently, and tell me if you have any changes for timeliness.
DR. FRANCIS: Just a quick question about the first bullet point. Is there a unique such thing or is it really just a tradeoff?
DR. CARR: Could you read the bullet point?
DR. FRANCIS: For each data set, identify the age-tipping point after which the utility of the data declines. I am not so clear there is a unique point.
DR. SUAREZ: I was actually going to comment on that, too, because some data, the data stays and persists. You don’t want to change it. You just want the new data about the same variables, so that you can actually do trending. The data from 10 years ago, you want to keep it. It is valuable.
DR. CARR: I think this came out of a recommendation, I think, from Bill Stead. I think the point was that some data is going to be valuable when it is timely, when it is very recent. If you wait a year, it is going to lose some of its value.
DR. BLEWETT: I don’t think you can say that overall, questions over data. I think if they have how often the data is updated, it should be sufficient. If it is an annual update, and you only have to 2005, you know that there should be more data coming, or it should be somewhere. I think it is more back to the metadata. I don’t know if that is a bullet point that just says identify how often the data is updated. I don’t think we could tell whether it is useful or not useful.
DR. CARR: Maybe we should stay operational. I think the overarching thing is there is some data whose utility is maximal early on, and look for ways to accelerate the availability of that data or to make data available that is incomplete, but directionally correct.
DR. FULCHER: An important distinction there is timeliness of data, 2000 data, 1990 data, I have data back to 1880. It is really powerful, looking at trends over time. That data still has currency from a context.
I think what you are saying is about some data has a very short shelf life. How do we improve the availability of that data? That is a different kind of a title than timeliness.
DR. COHEN: Mortality data is fascinating because having rapid data on flu mortality helps us track influenza epidemics. The heart disease rate hasn’t changed in 15 years, so I don’t need the 2014 data. The notion of data to evaluate the purpose of the data, it is sort of the coordination of the purpose and the timeliness. Some data you need immediately.
DR. CARR: So could you give me a different heading?
DR. SUAREZ: The same data has different values at different times. The same data for an immediacy of response has a very specific value and a very short shelf kind of value. From that point on, the value changes in terms of being able to be trend-type data. It is still valuable, actually very valuable.
DR. BLEWETT: Why can’t we just delete that first bullet and use the next one?
DR. CARR: Let’s think about how we want to say this. It is really timely availability of time-sensitive data.
DR. FULCHER: I think Bruce brought it up earlier this morning. At the federal level, basically you are compilers of what is being brought up from the state. Is that correct? I think as I work with communities, they are often frustrated with the national data sets because of the lag time from it going from the state to the national level, and being made available in that format.
The communities are saying, let’s just go to the state. Let’s do our granularity reaggregation at the state level. It is bypassing the utility of the federal levels.
MR. SAVAGE: For whoever suggested deleting the first bullet point, I am sensing that the third bullet point, this notion of the life cycle for data, we sort of evaluate whether there should be something more recent, based on what a life cycle is like.
If you are measuring something annually, and two years have passed, you think, well, I want the trend. Can I get another year’s worth of data? If it is a much longer duration between the points that you collect, you wouldn’t necessarily be asking that question. I just wonder if that third bullet point has a kernel that we are looking for.
DR. FULCHER: There is so much, in working with communities, to say what is the best information that we have available to us. Then, what they invariably do, they get together as a group and they say, hey, Joe, what do we know about this issue and this area about obesity? It kind of moves away from data to other aspects of their tacit knowledge of community. They use the best available, maybe two years old, three years old. Then, they use other ways to try and get it. We don’t have the ability to update data even more quickly.
DR. FRANCIS: My understanding is that there is also a question about so you have got, let’s say, 80 percent of the data in. Do you release it when you have got 80 percent of it, and then assume that the rest of it is going to populate it when it comes in. It seems to me there are at least three issues that anybody has got to think about that.
I go back to election returns on this one. You have got to think about whether it is misleading to only release an incomplete set, which it certainly is if you have got the urban precincts, but not the rural ones without indicating that. You have got to think about the costs of releasing and then updating, and you have got to think about the importance of having whatever you have got, which is your point, however incomplete. That is a complicated set of judgment calls that is going to look different, as Bruce suggested, for one kind of data than for another.
DR. CARR: It is being done, and we are drawing attention to the fact that it can be done, has been done, and in the right settings, it is a mechanism already being utilized to accelerate data. The questions that you raise are all very important, but perhaps go beyond the scope of what we are trying to do here.
DR. FRANCIS: I was just trying to be clear. I think what we are doing is recommending where appropriate early release. If you put in a qualifier like where appropriate or something like that, then there is at least something that says let’s think about what the problems might be associated with.
DR. CARR: I thought that that was covered in the sub bullet, releasing data when not complete, but marked as provisional, HHS should provide guidance on how data users should assess and interpret data quality. HHS should communicate clearly the meaningful tradeoff between timeliness and accuracy, and set expectations accordingly. Were you thinking beyond that? HHS should do so in dialogue with data producers and data users to understand data quality needs for timely decision-making. They should strive to use consistent language.
DR. FRANCIS: I just think there is a prior step to that, which is that there are points at which you don’t even want to release provisional data.
DR. COHEN: I think you do raise an interesting point. Up to that bullet, we are talking about final data sets. Then, we talk about provisional data sets. The previous issues around timeliness really refer to complete final data. Then, we are saying an additional strategy is, if you can’t get the final data set produced in a timely way, you should consider using provisional data if it is not biased and has certain attributes.
DR. FRANCIS: There actually are three issues here. One is the frequency with which you collect it. Do you collect it on an annual basis? Do you collect it in real time, whatever? The second is the one that you talked about, how soon you release it after you have collected it. Then, the third is, what do you do about provisional? Those are all different.
DR. CARR: The audience is HHS. The audience is Damon Davis. It is not in his span of control to tell cities and towns to capture data electronically. Should we even be putting that in there? It is theoretically within his span of control to say we are not going to wait two years to get 99.99 percent complete. We are going to do that, so at least it is sooner.
My question is, if we think this is a letter that will ultimately go to Damon Davis, do we even want to go there with the ways that people have accelerated data collection? I think it is, while interesting, not relevant to him.
DR. COHEN: This is a letter to the secretary, I think, not to Damon Davis. My point is, the reason why timeliness became such a huge issue for NCHS is because Tom Frieden pushed it because he came from a community that was able to provide data more rapidly.
DR. BLEWETT: Can we say something like encourage and support data collection by capturing data? Collection electronically? We are kind of working backwards. It is a role Damon could play and the secretary.
DR. COHEN: And evaluating the life cycle. I guess it is not only request, but support in the evaluation of the life cycle. States would love to be able to do that, if they had the resources to do it, I think, in some cases.
DR. CARR: Okay, this is good. Other comments about timeliness?
DR. BLEWETT: I guess on the last bullet, number three, how often is the data refreshed? Just to my earlier point, I would say how often is the data updated and posted.
DR. CARR: If Larry were here, he would tell you to use the plural verb.
DR. COHEN: For all those bullet points, I have the wrong verb. I have a singular verb. It is highlighted red in my copy.
DR. SUAREZ: Could you somehow summarize where we came down, in terms of the bullets? We are saying bullet number one?
DR. CARR: The overarching recommendation is that HHS should find ways, or support or encourage ways, find ways to accelerate the release of selected data sets whose relevance is sensitive to the aging of the data. Okay, do we want to say that?
NCVHS recommends that HHS consider the following approach to identify where earlier release is valuable and identify ways to expedite the release. I think we want to amend that. Actually, we have decided that we are not going to say figure out what is the tipping point.
Maybe there is only one approach. NCVHS recommends that HHS consider the following approaches to identify where earlier release is valuable and identify ways to expedite the release. Are we still asking them to identify where earlier release is valuable? Hold that thought.
The next bullet then begins, in collaboration with key stakeholders, providers and communities, identify high demand, high value data sets whose timely release is critical for urgent intervention, set a goal of a designated percentage improvement in turnaround time for the data set.
DR. COHEN: I like the first sentence, but I am not sure about setting a goal. A designated percentage of improvement in turnaround time?
DR. BLEWETT: Yes, so somebody is not getting their data in for three years.
DR. CARR: Take out that sentence? What do you think, Susan?
DR. QUEEN: That means that somebody has to come up with the percentage. How exactly are you going to, and who is going to determine it?
DR. CARR: Next bullet, support evaluation of the life cycle. Request that data owners map out and evaluate the entire data life cycle from data collection, evaluation, release, consumption and identify bottlenecks and strategies for addressing them. In this way, efforts can focus on the steps that are most likely to yield improvements.
DR. BLEWETT: I personally do not like this one just because it seems like a lot of work for federal agency people. They are already stretched for resources. What is a map, and how would it look? Then, you have to define it, and then require people to do it. I don’t know.
DR. CARR: How about this? The first bullet is identify the high-value data sets that need timely release. Then, among those data sets, do the life cycle.
DR. COHEN: The mortality example, for some purposes, the life cycle is really short. For other purposes, it is not. All the data aren’t necessarily federal. I would support data owners reviewing their data to see if they can improve the processes. I think it is reasonable, in this day and age, where getting data rapidly might have advantages for surveillance, for instance.
DR. BLEWETT: Does this mean that you are going to ask them to do it before they post it on HealthData.gov?
DR. CARR: No. This is an extracurricular activity for evenings and weekends. When we are working on this letter, they can be working on this.
DR. BLEWETT: As long as that is clear.
DR. CARR: If we say support data owners reviewing their data to see if they can improve.
DR. COHEN: Chris, you should just yell out.
DR. FULCHER: I am just wondering where you were before, around in collaboration with key stakeholders, identify high demand, high value data sets. I know Damon was talking about having a completely wide open, no user name, no password. I am trying to understand the rationale for that because you can get so much more from a user-centric approach of having that login, so you can find out who is accessing what type of data sets.
You don’t have to go and ask people. It is already rising to the top in terms of the data that is being accessed, and what type of user are they? Are they a policymaker, a researcher? You are already able to garner all of that information without having to go out. My question is, does anyone know from an HHS standpoint that you cannot have username passwords, or is that possible?
DR. QUEEN: Even just to do a couple of questions.
DR. CARR: Susan doesn’t think you can ask a couple of questions.
DR. QUEEN: If you are asking questions —
DR. FULCHER: There is a lot around this theory of action approach, where you are able to get much more the needs of the end user if you have some data in the very basic profile about what they are interested in and who they are. I am just wondering, is it a limitation?
DR. QUEEN: For what it is worth, if it were something that required PRA, it probably would be considered what is called a customer satisfaction type of thing. It would be a value to have it.
DR. CARR: I think we heard from Damon that there was consideration of the pros and cons of having people identify themselves.
DR. COHEN: I would make it optional at best because we have discovered there is a chilling effect when people have to go through that gate-keeping process. You want to be anonymous.
DR. FULCHER: Believe me, I completely hear you because with our center for the past 15 years, we have gone back and forth. We have vacillated between no login, login, no login. Now the foundations, they want metrics, they all want logins. From a user-centric design standpoint, they need to have those metrics. I know what you are saying.
DR. COHEN: The optional approach where people are committed and want to provide feedback, it is a segment of probably power users, which is probably what you want to some extent. If people want additional benefits, like being able to provide feedback in a certain way, or getting certain information, then having the optional login is the way.
DR. BLEWETT: Census does that all the time, where that little bubble comes up when you are accessing Census data.
DR. FULCHER: On one hand, we want to understand our users better. We want to know their needs. On the other hand, we feel that don’t want to know.
DR. BLEWETT: Innovative ways to get more information about your users or something like that.
DR. CARR: We need to shut down the equipment here at 10 minutes of 5:00. We have got about 25 minutes to make some headway on this. I really do appreciate all of the insights. I think we are making headway. It is just hard work. Do we want to say everything or nothing? I like what you said, consider ways of doing that, of getting the information so that you could do that.
Back to the life cycle, are we saying what we had was support data owners reviewing their data life cycle to see if they can improve? Do we want to make that recommendation? Yes or no?
MR. SAVAGE: It seems to me, reading that, it is useful to understanding what is timely and what is not, just having a sense of how it varies. It is going to be different for different data sets. That is useful information.
DR. CARR: The data life cycle, but we said that is a lot of work. Do we want to say review their data life cycle on high demand? Right now, can you tell what a high demand is?
DR. QUEEN: You are also not being real descriptive in telling them how often to do it or how to do it, or to what data set. I think the recommendation right now does allow a certain amount of flexibility.
DR. CARR: Let’s say support data owners reviewing the data life cycle on high demand.
DR. COHEN: I wouldn’t even put high demand.
DR. CARR: Support data owners reviewing data life cycle on high impact data sets or selected data sets. We are not going to do everything. I am responding to the comment that there is 998.
DR. QUEEN: The way it was worded before, it didn’t say on every data set. Now, when you say high impact, the agency does have to know which ones are. There is a sort of flexibility.
DR. CARR: Support data owners reviewing the data life cycle to see if they can improve. We won’t say request that. We can just say including map out and evaluate or something like that, or e.g. or i.e. or something like that. Map out and evaluate, okay, so then that is good. We have got two, this is good. This is very good. Can we go for three?
Borrow from successful examples outlined above. We are telling the secretary to borrow from an example. Those examples are, one, releasing data sets when not complete, but marked as provisional with all the caveats. Do we want to leave that in, yes or no?
DR. BLEWETT: I think we were saying when methodologically sound or something.
DR. SUAREZ: We don’t want to just limit the successful examples to the ones outlined above, first of all. Borrow from successful examples, such as those described above.
DR. CARR: Such as, good idea. Thank you, Walter. Those outlined above, and then if appropriate, okay, good, release data sets, okay. That is the first bullet. The second bullet, encourage and support more timely data collection by capturing data electronically to decrease processing delays. Encourage and support, it is okay? Okay.
Next, clarity, so here we go to the metadata and to Chris’ point, there is overlap. Metadata about time periods, do we bring it up here? Is this appropriate for timeliness?
DR. BLEWETT: I think clarity in the metadata around timeliness.
DR. CARR: E.g., what the time period is, when were they published, how often are they updated?
PARTICIPANT: Do you want to say provide clarity?
DR. CARR: Provide clarity, good. Okay, any objection on that? We might have those done. I think we have one section done. Let’s say the introduction is pretty much done.
DR. BLEWETT: I just have one quick comment on page three, the second from the bottom paragraph. Incentivize jurisdictions using financial, and I just said and other incentives.
DR. FULCHER: I was just looking at something. In drawing off our system, we have about 23,000 GIS data layers nationwide. Our top data layers because we are able to pull that, number one is hospitals, and then FQHCs, then public schools, vulnerable populations, then community health care centers, modified retail food environment.
You can just grab this from the system itself to really find out what rises to the top. Then, see how we can get more current data. Like POS data, right, you were talking in here that it was not very good in terms of the metadata, is that what I read? Then, really focus on improving that because it is being hit so much.
DR. CARR: Are you making a change to this part?
DR. FULCHER: I am just really trying to get it from the user standpoint, to elevate that data that people are actually grabbing, and do a better job with that first.
DR. CARR: I think we need to get to our concepts. We know we have work to do on metadata. We have talked about timeliness. We wanted to talk about usability, Paul’s point. The metadata, streamline data capture, explore facilitated ways of communities helping communities identify their pressing problems related to health, and then identify which problems would benefit from available data. Is that in scope or out of scope of this letter?
While this is something we talked about, I think this is going to come more out of the population.
DR. COHEN: I think the problem definition piece is too different from what this is. I think the notion that Mo suggested, and what we discussed with Damon, about making the data more useful for communities as part of the responsibility for HHS is an important statement. I almost saw a light bulb go off in Damon’s eyes because the focus has been pushing the data out. We really haven’t paid the same level of attention to how useful the data are to the users.
DR. CARR: Then I think that we could go back to that. We did have that in earlier letters to engage stakeholders in discussions about what they need and how they need it. We can capture that from our earlier versions. It is the data documentation metadata, timeliness, and third is user needs or usefulness for community.
DR. FRANCIS: Justine, I would suggest that you use that as part of framing, rather than as a particular recommendation because it is going to be hard to know what to put as bullets under it. I agree very much with Bruce that one of the really important things that came out earlier is that a lot of the whole Datapalooza has been aimed at commercial users.
DR. BLEWETT: I love that suggestion of a public health Datapalooza or something that public community, public population health.
DR. COHEN: I think we do have actionable recommendations. That is actionable. I think the point that Chris was making about evaluating the data, the most heavily demanded data, to focus on that. I understand putting it in the frame, but I think there are a lot of points that we want to make about how HHS could focus on the issues about making data more useful.
MR. SAVAGE: I had a suggestion for a catch phrase because we like to capture people’s attention. We need to have use cases, and we need to have user cases.
DR. MAYS: I was going to suggest putting this notion at the end as a next step, that this is what we see as our next step is to hear from them, and then develop. That would be a nice thing to develop as well as the user case.
DR. CARR: Let’s hear a word about metadata.
DR. MAYS: I can tell you the little bit that I put in is to go to the section where it says observation. Let’s start there. I did an opening sentence. The workgroup heard from members and communities, difficulty in using data sets because they experience the data set as not providing clear and sufficient information about the contents of the data set. Then, it goes on with an illustrative example, blah blah. Are we okay with that? We can clean it up.
DR. CARR: Damon, let me just catch you up. We have been reflecting in this letter to try to highlight the things that align with what your interests are and to kind of delete some of the things that don’t align as much to what your needs are. We talked about timeliness, and I am not going to go into that now. We have added a part about the user community and the usability. We will put something in on that. We are just onto metadata, which we struggled on, and now we are going back.
DR. MAYS: The introductory sentence is the workgroup heard from members and communities, difficulty in using data sets because they experienced the data set as not providing clear and sufficient information about the contents of the data set. Again, we can clean it up as this is quick writing.
I didn’t touch anything else until the executive order. The executive order of May 9, 2013, so I am skipping down to where it starts with effort. Right after that, I say metadata helps individuals find the data they need and determine how to best use it. Metadata supports producers in locating and using their own data resources, and data consumers in locating and using data resources produced by others.
A metadata record is a file of information that captures the basic characteristics of a data or information resource. This comes from the FGDC people. Sorry, I just sent it too far.
DR. SUAREZ: From where?
DR. CARR: Federal Geographic Data Committee.
DR. MAYS: It represents the who, what, when, where, why and how of the resource. Metadata records include core library catalog elements such as title, abstract, publication data, description, keyword tags, modification dates, publisher, contact name, contact email, unique identifier, and public access level and database elements such as variable label definition and variable values.
DR. CARR: That is the same as what is in the 1313.
DR. SUAREZ: That is already covered there.
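[Editor's illustration, not part of the proceedings: the metadata elements Dr. Mays lists above can be pictured as a simple record. The following is a minimal sketch in Python, assuming hypothetical field names and a hypothetical helper, missing_elements, chosen only to mirror the elements named in the discussion. It does not represent any HHS or OMB schema.]

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class VariableInfo:
    """Database-level element: one variable's label, definition and coded values."""
    label: str
    definition: str
    values: Dict[str, str] = field(default_factory=dict)


@dataclass
class MetadataRecord:
    """Catalog-level elements named in the discussion (field names are assumptions)."""
    title: str = ""
    description: str = ""
    keywords: List[str] = field(default_factory=list)
    modified: str = ""
    publisher: str = ""
    contact_name: str = ""
    contact_email: str = ""
    identifier: str = ""
    access_level: str = ""
    variables: Dict[str, VariableInfo] = field(default_factory=dict)


def missing_elements(record: MetadataRecord) -> List[str]:
    """Return the names of elements that are still empty on this record."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Hypothetical record with only a few elements filled in.
example = MetadataRecord(
    title="Example survey file",
    publisher="Example agency",
    keywords=["health"],
)
print(missing_elements(example))
```

[A check of this kind is one way a catalog could flag data sets that were posted without a description, contact, or variable definitions, which is the gap the workgroup describes.]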
DR. MAYS: I made a couple of edits to the recommendations. HHS can improve the usefulness and usability of the online HHS data, including the data by providing more information about data sets in a clear and easily understood format, useful to a range of audiences.
The second bullet I said data publishers should present this information as if the audience were not familiar with the data set, the data system or the data collection in order to achieve usefulness to a wide variety of audiences. We keep bouncing between all of these audiences. That is it. I don’t have anything else.
MR. DAVIS: I didn’t have anything. I was trying to read it. I didn’t hear anything out of order or anything.
DR. CARR: Hopefully it aligns and reinforces what you already have set out.
MR. DAVIS: It sounded like it was right down the middle, as far as I could tell.
DR. CARR: Is there value in adding, Leah brought up the point about metadata for PDFs? Is that a new concept?
MR. DAVIS: It is new and therefore potentially valuable, I would imagine, in this letter because it is not in the strategy.
DR. CARR: I think then we ought to add that, as well. We can clarify our details. We have struggled a bit with the discussion we had about whether to ask any question of who you are, to get any indication of who is using what data set. It sounds like that is something that you thought about a lot and ended up not asking for.
MR. DAVIS: Well, in developing the strategy, I think we didn’t necessarily go through an evaluation of whether or not we should. I think we recognize the utility in doing so. We didn’t necessarily think through whether we wanted to make a change in that regard. Obviously, we said that we think that there is value in doing so. Therefore, I could see how the group would be interested in then making that recommendation. Then, it would be up to us to figure out how best to implement that.
DR. CARR: I think it is iterative over time. Right now, you don’t want to scare people off in the beginning. We just had some conversation about that. What is scary in the beginning, Chris has the experience obviously in your group that there is enough knowledge about the data sets that signing up isn’t a disincentive.
DR. FULCHER: I think we have vacillated for many years, going back and forth, to no user name and password to having it. It also depends on the funders. The foundations are really wanting some clear metrics. As we move to a more user-centric model with this theory of action process, so depending on who the user is, you can better help them look at similar types of data.
In our case, they are maps or reports, depending on if they are policymaker versus a researcher because they have a profile. They created a profile with the user name and password. It is free, but it helps them hone in on what they really need potentially. It helps us better identify where we need to do a better job in making sure the currency of the data is there, as it is made available.
Damon, it could be a longer discussion because we have spent hours and hours going back and forth with this because of some of the things that were brought up here. We landed on the user name and password. If you look at all social media sites, they all require user names and passwords.
MR. DAVIS: She suggested doing a pilot of the user name and password.
DR. CARR: Pilot it for a month and see what happens.
DR. QUEEN: I think the implication is, given the president’s policy, open data initiative.
DR. CARR: Is there anything else that we can help you with to advocate for, either that is in your strategy that needs reinforcement, or that isn’t in the strategy that ought to be elevated for consideration?
MR. DAVIS: It is a little bit challenging for me to think of that off the top of my head, without having read sort of what you have currently put in it. What is the timeline for when you are going to send this?
DR. CARR: We are going to send it when it is ready. We came away from the meeting with one timeline, but just your presentation, that whole discussion, has really made us think a little differently.
Here is what I would propose, that we pull together the notes that you have, I have, Lily has. Get a draft out and get it to the workgroup, and you as well. Let’s hear back in a week’s time, or whatever is the appropriate amount of time, to say, all right, these are the issues that we have landed on. Are they said correctly? Are there other issues?
MR. DAVIS: I would appreciate that. Thank you.
DR. CARR: That would be great for us, too. We want to add value. Does that make sense, that we just take our time? There is no sense in rushing through this. I think we made headway.
Sometimes, when something feels hard, it is because it is hard. This is probably hard, harder than we would have thought. I very much appreciate everybody’s insights. Are there any closing comments? Vickie, did you have anything further to say?
DR. MAYS: I would just ask whether or not, Bruce and Walter were making such distinctions about how they wanted the metadata. I didn’t think it was what we have there is okay. Maybe if you all could like send us an email about this distinction, so that it is captured.
DR. CARR: I have some of it here. Walter’s point was there is the data documentation and the data dictionary.
DR. SUAREZ: There are data documentation and metadata. Now, everything can be rolled under sort of metadata because it is data about data. It is helpful to distinguish data documentation as in two things. I think we talked about maybe a third thing.
The description of the data in a vernacular way and the simple way, the data dictionary, and then some additional description of methodologies and things like that. Those three things were part of what I called data documentation. Separate from metadata from a technical level perspective that is more electronic tags associated with the specific data to identify specific fields.
DR. CARR: I got that. That is very helpful, and we will incorporate that. I think that does help us. I think that is where we were struggling. The vernacular for the everyday user and the other is the tags, which are of these core elements. I don’t know, is there value in having this attachment?
DR. KAUSHAL: I have one last comment, as well. I know both you and Lily, and others, have been spending a lot of time, evenings, weekends and thank you.
DR. CARR: Thank you. We are all learning together. It is a little bit different from what we usually do. We definitely want to be of value. We will be more nimble the next time through. Thank you very much.
With that, the workgroup will adjourn. Thank you. Safe travels.
(Whereupon, at 4:45 p.m., the workgroup was adjourned.)