Department of Health and Human Services
National Committee on Vital and Health Statistics
Working Group on HHS Data Access and Use
February 25, 2015
National Center for Health Statistics, Hyattsville, MD
The Working Group on HHS Data Access and Use was convened on the afternoon of February 25, 2015 at the National Center for Health Statistics in Hyattsville, MD. The meeting was open to the public. Present:
Working Group members
- Bruce Cohen, Ph.D.
- Vickie Mays, Ph.D., Chair
- Kenyon Crowley, MBA, MS (phone)
- Leslie Pickering Francis, J.D., Ph.D.
- Mohit Kaushal, MD
- Joshua Rosenthal, Ph.D.
- Mark Savage
- Walter Suarez, MD
- Paul Tang, MD (phone)
- Leah Vaughan, MD
- Christopher Fulcher, Ph.D.
- Bill Davenhall
- Chris Gibbons, MD
- Pete Hudson, MD
- Patrick Remington, MD
Lead Staff and Liaisons
- Damon Davis, HHS IDEA Lab
- Debbie Jackson, NCHS, Acting Exec. Secretary
- Katherine Jones, NCHS
- Maya Bernstein, ASPE
- Marietta Squire, NCHS
- Linda Kloss, NCVHS member
- Mr. Soonthornsima, NCVHS member
- Susan Kanaan, NCVHS writer
- Rachel Hornstein, ASPE
Note: The transcript of this meeting is linked to the meeting agenda on ncvhs.hhs.gov. Use the meeting date to access the agenda.
In this half-day meeting, the Working Group examined, from several perspectives, the evolving set of guiding principles it is developing:
- Open data as a process, with eight key concepts (Mr. Crowley)
- Data communities and best practices (Dr. Cohen, Dr. Rosenthal)
- Bottom-up strategies (Dr. Tang)
Mr. Davis commented on these themes and gave an update on the Idea Lab’s work and plans. HHS Entrepreneur-in-residence David Portonoy briefly described his initiative to develop demand-driven mechanisms for HHS open data. Finally, Dr. Francis commented on data stewardship issues raised by open data and data linkages, and Dr. Mays outlined the Working Group’s next steps.
These presentations and discussions are briefly summarized below.
As an introduction to Mr. Crowley’s presentation, Dr. Mays noted that the Working Group is looking at impediments and barriers as part of developing guidance for HHS on data access and use.
Open Data as a Process―Kenyon Crowley
This presentation followed up on earlier discussions about developing guiding principles. Mr. Crowley pointed out that open data should be seen as an ongoing process, not just a product. This process encompasses collection, publication, description, and use/reuse, all of which are constantly evolving. By the same token, open data policy and resources should be continually assessed and improved. He presented and commented on eight “challenge areas” that represent key concepts and potential solutions that could form the structure for guidelines:
- Availability and access
- Linking and combining data
- Data provider support
- Community-building and learning
The presentation stimulated comment and discussion from Working Group members, with these key points:
- Add usefulness to usability (AKA applicability, meaning).
- Develop examples of each attribute.
- To enhance usefulness, flag the “all-star” datasets with the greatest value/meaning for specific stated purposes.
- Maybe ask users to rate datasets as to usefulness and describe uses.
- Transparency is an important dimension of findability and usability.
- Distinguish primary purpose of data collection and purpose of secondary use(s).
- Add timeliness; perhaps tag datasets as to timeliness variables, including shelf life (relevant to secondary use).
- Use non-technical terms that everyone can understand and/or add descriptions (e.g., findability = Can I find what I need?).
- Just exposing the metadata will add a lot of value.
- The biggest challenges and risks come from linking and combining data; just knowing what mash-ups are happening would help.
Mr. Davis commented on a number of these variables from the perspective of Idea Lab initiatives and expressed particular interest in “the [user-]community-building piece.” He voiced some concern about the potential burden stimulated by inviting user questions (attribute #7). Mr. Crowley recommended facilitating “community-to-community relationships” for mutual aid and answering questions. Other suggestions were to include FAQs from the data provider and ask users to indicate where HHS can be most helpful to the user community.
Data Communities and Best Practices―Bruce Cohen, Josh Rosenthal
Dr. Cohen commented on the perspectives of local (physical, geographic) communities on the eight areas listed above, based on his experience working with community members and leaders. He stressed the need for useful summary measures, granular data on small populations, and both qualitative and quantitative data, and he noted communities’ customary focus on aggregate data. The challenge that communities need help with is transforming data into information that points them toward solutions. Thus data providers should make data useful in community decision-making and provide training and tools so that community members can use the data. One model is to use federal regional centers as a network for providing technical assistance. Data can be a powerful tool in building community by drawing attention to priority issues.
Dr. Rosenthal commented on the foregoing presentations and discussion, raising the question of what will create a sense of community within data-user communities. Dr. Mays noted that this is the key question for the Working Group. Dr. Rosenthal described the outlines of a meaningful learning center and said the three things that would be most helpful are perceived usage level, perceived community identification, and free tagging. This connects to the idea of a release package from the data provider and the idea of “Mad Libs”-type queries to prompt answers from users about specific intended data uses.
Mr. Davis noted the varied uses of the term “community” in the foregoing discussions―physical community versus communities of data-users―and added that his major focus is on the latter use of the term. He noted that ultimately, localized communities will benefit from the growth of data-user communities, which the HHS Idea Lab intends to support.
Other points made in this portion of the meeting:
- Users also can be categorized by level of expertise.
- Physical communities vary widely in their data sophistication and what they want and need.
- The “prize” for the federal government is improving the quality of community life.
- An application could be developed that helps potential data users identify what data they need and for what purpose.
- Rather than a single definition of community, what is needed is a taxonomy that reflects the range and diversity of types of communities.
Bottom-Up Strategies, and Starting with Problems―Paul Tang
Dr. Mays introduced this segment of the agenda with the observation that the Working Group needs to consider what mechanisms will make it possible to institutionalize the voices of communities (however defined) so they can provide feedback to HHS.
Dr. Tang offered thoughts on the bottom-up perspective, building on Dr. Cohen’s comments. He stressed the importance of “making data useful in an obvious way,” adding that data can only be made useful when the provider understands what problem needs to be solved. Beyond that, there must be exemplars that show how data can be relevant. The next step is to make the data accessible, which is best accomplished by incorporating the data into the workflow.
Dr. Mays observed that it is natural for HHS to do things the top-down way; to stimulate improvements, the Working Group wants to figure out more ways to “hear from the everyday person…in the way they are in their lives.”
Update from HHS Idea Lab―Damon Davis
Mr. Davis said the Idea Lab is working on revitalizing healthdata.gov, and the Working Group’s valuable input would help to inform that process. Attention will be paid to improving usability, adding new functionality, and making the site more flexible and dynamic. Community-building will be a top priority going forward. Administratively, the Department is building up its data science dimension with chief data officers, for example at CMS. DJ Patil is the new Chief Data Scientist at the White House. These developments reflect a new phase of data utilization in the federal government. He encouraged the Working Group to help HHS envision a data usability scorecard that could be used with data-owning agencies to guide specific improvements. He asked the Working Group whether such a scorecard had potential benefit, prompting a brief discussion of its pros and cons and of different ways to use a scoring system.
Asked about the agency’s priorities, given that it cannot be all things to all people, Mr. Davis returned to the priority of community-building and the intention to focus on two or three top sets of communities whose engagement will lead to the engagement of others. Dr. Kaushal suggested focusing on the problems entrepreneurs are having. Mr. Davis invited Working Group members to send him suggestions about what communities could be instrumental in supporting the learning health care system and improvements in health and equity. Dr. Mays suggested taking advantage of existing efforts, including those of federal agencies now providing local outreach (e.g., NIH’s CTSIs, SAMHSA, HRSA). After asking for inputs for his Datapalooza session, Mr. Davis introduced the next speaker.
Demand-Driven Open Data―David Portonoy, HHS Entrepreneur-in-Residence
In the Entrepreneur-in-Residence program, HHS invites entrepreneurs to apply their expertise to helping solve problems identified internally. Mr. Portonoy’s focal issue is the Department’s inability to gain meaningful feedback from data users about what HHS should liberate, and how. He described his past experience using HHS data and how various “minor improvements” in the data could have made them more useful. The idea of demand-driven open data is to create a systematic, ongoing, and transparent mechanism to enable data-users in industry, research, and the general public to indicate which uses and linkages would be most useful to them, i.e., a built-in feedback loop.
After this introduction, Working Group members commented on the need for a contact person (or two) for each database and for a data governance structure. Dr. Mays said there would be an opportunity at another meeting to talk further with Mr. Portonoy after reviewing the website.
Stewardship Issues―Leslie Francis
Dr. Francis commented on some of the health data stewardship issues associated with open data and data linkage. She noted that the new NCVHS stewardship toolkit was influenced by fair information practices, which emphasize doing things responsibly and ensuring openness and transparency. Thus, as healthdata.gov releases data, people need to understand that this is happening and what data are being released. Purpose specification is another key principle, with the awareness that some purposes are more publicly beneficial than others. The first principle is to start with transparency about what is being released and what people are doing with it.
Members agreed that with spontaneous mash-ups, it is often not possible to know the purpose until after the fact. Dr. Stead articulated several related principles including the purpose for which the data were originally collected and the purpose of use, adding that it must be assumed that all data are identifiable. Mr. Savage proposed that the Working Group consider making a recommendation that re-identification of data be prohibited, as his organization has done.
Dr. Mays said the next major priority for the Working Group is to secure new staff support. Meanwhile, the guiding principles will be fleshed out further based on the foregoing discussion and brought back to the next meeting in May. She then adjourned the meeting.