Department of Health and Human Services
National Committee on Vital and Health Statistics
Working Group on HHS Data Access and Use
March 1, 2013
Hubert Humphrey Building, Washington, DC
The Working Group on HHS Data Access and Use was convened on the afternoon of March 1, 2013, at the Hubert H. Humphrey Building in Washington, DC. The meeting was open to the public. Present:
Working Group members
- Justine M. Carr, M.D., Chair
- P. Kenyon Crowley, MBA, MS
- Bruce Cohen, Ph.D.
- Bill Davenhall, ESRI
- Leslie Pickering Francis, J.D., Ph.D.
- M. Chris Gibbons, MD
- Mohit Kaushal, MD
- Joshua Rosenthal, Ph.D.
- Walter Suarez, MD
- Paul Tang, M.D., MPH
- Leah Vaughan, MD
- Pete Hudson, MD
- Patrick Remington, MD
- Kalahn Taylor-Clark, Ph.D.
Lead Staff and Liaisons
- Marjorie Greenberg, NCHS, Exec. Secretary
- James Scanlon, ASPE, Exec. Staff Director
- Ed Sondik, NCHS Director
- Susan Queen, ASPE
- Lily Bradley, ASPE
Others (not including presenters)
- Debbie Jackson, NCHS
- Katherine Jones, NCHS
- Marietta Squire, NCHS
- Lynn A. Blewett, Ph.D., NCVHS member
- Vickie Mays, Ph.D., MSPH, NCVHS member
- Susan Baird Kanaan, consultant writer
Note: The transcript of this meeting is posted on the NCVHS Web site, http://ncvhs.roseliassociates.com. Use the meeting date and agenda to locate it.
In this half-day session, the participants heard and discussed several presentations focusing primarily on the usage and users of federal health data. They also started to plan a session at the June HHS Health Data Initiative Datapalooza.
Ms. Bradley, the new lead staff to the Working Group, presented her preliminary analysis of usage of healthdata.gov. Dr. Rosenthal presented a tool he has developed that Dr. Kaushal suggested might be showcased at the Datapalooza. The group agreed that the session should feature two applications, one sophisticated and one less so, to show the range of things that can be accomplished with federal data. Mr. Davenhall presented a provisional five-star “data grade” system that could be used to evaluate the physical quality and technical robustness (i.e., not the content) of federal data, based on user feedback. Dr. Vaughan reported on illustrative uses of open federal data by public health, non-profit, and philanthropy sectors, and she urged that these communities be included in outreach efforts and enlisted as partners. Dr. Gibbons discussed the central importance of starting with a use case and suggested that the Working Group help collect and prioritize use cases. The group agreed to host a Datapalooza session along the lines discussed. Ms. Bradley suggested deciding on the desired target audience and tailoring the session accordingly.
Following introductions, Dr. Carr noted that Working Group members are encouraged to participate in an NCVHS workshop on community data needs on April 30-May 2. She then reviewed the Working Group’s charge and its practice of identifying distinct supply- and demand-side issues and devising strategies to bridge the gaps between the two. She said the day’s meeting would include presentations about the demand side. For clarification, Dr. Cohen noted that communities are end users and developers are intermediaries.
Understanding Use of healthdata.gov – Lily Bradley, ASPE
Ms. Bradley, the new lead staff to the Working Group, presented slides and a preliminary analysis of usage of healthdata.gov, based on Google Analytics data. The purpose of such analyses is to inform resource allocation and data product development. The site has had 43,000 unique visitors since its launch in June 2012. She described details about usage, noting that understanding referral patterns could help the Department improve promotion. Over time, end-page analytics and data on which sites users visit after they leave will increase understanding of how users navigate the website. Significantly more of the uses relate to public health than to commercial purposes. She said the analysts would look at metatags to generate additional information on usage.
Dr. Rosenthal suggested having user-generated (community) tags to accelerate those dynamics. Mr. Crowley wondered about querying users as to whether they found what they were looking for. Ms. Bradley also noted limitations associated with using the Internet Explorer browser.
Plans for a Datapalooza session – Dr. Kaushal and Dr. Rosenthal
Dr. Kaushal, among others, remarked that “there is huge room for improvement” in the total number of healthdata.gov users. He then presented ideas for a proposed Working Group session at the June 2013 Health Data Initiative (HDI) Datapalooza, noting that this event has created a valuable community and ecosystem for pushing out data. The major topics of the proposed NCVHS Working Group session include describing the purpose and activities of the Working Group and demonstrating what can be done with federal health data.
One possibility for the demo, he said, is to showcase a tool created by Dr. Rosenthal. Dr. Rosenthal said he used games at the 2012 HDI event (speed dating and MadLibs) to demonstrate how to take government data and “turn it into something meaningful.” He then described and demonstrated new software he has developed for health plans and providers, which he said illustrates the process of first identifying a business problem or question and then finding and deploying data to help solve it. In this case, the tool uses public data to analyze variables and determine how to allocate resources and target spending to improve market share and performance. He reiterated that the key question is, “What problems are we trying to solve?”
Later in the meeting, Dr. Rosenthal explained more about his 2012 session, as an example of what might be done in 2013. His process began by eliciting statements about what problems the participants wanted to solve, thus creating a use case; then participants identified what types of data (hypothetically) would help solve it. The next step would be to see if such data existed.
In the group discussion of these topics, Dr. Carr noted the combination of sophisticated knowledge management and innovation evident in the demo. Dr. Mays pointed out that social media and politics use the same techniques “to get people to move to action.” Several members agreed with Dr. Cohen’s suggestion that the Datapalooza session feature two applications, one sophisticated and one less so, to show the range of things that can be done. Dr. Carr observed that the idea is to show how combining data can create added value. She also noted the importance of including the middleware that explains the sources and other attributes of the data. Dr. Blewett suggested showcasing both a business application and a public health application. Mr. Scanlon urged that the session allow time for discussion of what people think HHS should be doing.
Data Grading System – Mr. Davenhall
Mr. Davenhall noted the variations in data quality, sources, and other attributes. He identified a group of frequent users of federal data and obtained their agreement to comment on the quality and usefulness of the data. He devised a matrix and said he is looking for more people to give this kind of feedback about HHS data. To help prioritize what data are most useful and “leveragable” across the U.S., he created a provisional five-star “data grade” system for evaluating the physical quality and technical robustness (i.e., not the content) of federal data. The idea behind it is “to be more intentional about the datasets we need to work on,” so that users don’t have to “clean the data.” He suggested setting up a webpage where people who touch the data could comment on it. Dr. Rosenthal suggested having users rank the data.
The Demand Side – Dr. Vaughan and Dr. Gibbons
Dr. Vaughan reported on her preliminary survey of users and uses of open federal data, with a focus on public health, non-profit, and philanthropy sectors. She showed usage data for data.gov by geography and the top 10 data sets, and then described the types of health challenges on Challenge.Data.gov and illustrative codeathon and hackathon activities. She noted the need for a HIPAA-compliant mechanism to help enroll the many unenrolled families in California’s Covering California Families portal, and said that this problem illustrates the digital divide. Among other examples, she also described an event at Stanford University that reached out to the medical school community for ideas on data use; EcoHack, a project to understand the impact of environmental pollution on cancer disparities; and sites developed by Healthy Communities Institute (HCI). Working Group members wondered how HCI websites are used to communicate, define problems, and inform decision-making. Dr. Vaughan concluded that her informal conversations with informants illustrate how such users define success, where their pain points are, and what information resources would be helpful to them. She urged efforts to invite users from these communities into the process. She also noted the need for “developer-evangelists” who can reach out and engage new communities of data users from backgrounds other than “data geeks and developers,” because “there are more opportunities for partnership than we realize.”
Dr. Gibbons suggested looking at who is not using these data sources as well as who is, and thinking about why and what can be done about it. He stressed that “it all starts with a use case.” For this reason, subject experts need to clarify the problems to be solved before involving developers and coders. He noted that Dr. Rosenthal’s application is essentially a creative response to a use case. He suggested that the Working Group help collect and prioritize use cases and then make the connections with developers.
Dr. Francis wondered about the degree of interest in identifiable data in these scenarios, and Dr. Carr noted the centrality of trust. Dr. Gibbons observed that a surprising amount of information can be distilled from public data sets without identifying individuals.
On the idea of connecting use cases and developers, Dr. Mays suggested adding a day onto the NCHS data users’ meeting to facilitate these connections. Dr. Cohen suggested government investment in training opportunities.
Finally, the group agreed to host a Datapalooza session along the lines discussed above. Ms. Bradley asked about the target audience and suggested that the Working Group decide what audience it wants and tailor the session accordingly.
Mr. Crowley raised the question of how to stay engaged with people who have already expressed an interest and “shown up,” to make them part of the learning community. Dr. Gibbons suggested thinking about mentoring. Ms. Greenberg suggested that members start to collect use cases and to think about innovative forms of dissemination.
Dr. Carr then adjourned the meeting.
I hereby certify that, to the best of my knowledge, the foregoing summary of minutes is accurate and complete.
/s/ June 20, 2013