Department of Health and Human Services
National Committee on Vital and Health Statistics
Work Group on HHS Data Access and Use
June 15, 2016
Capitol Hilton Hotel, Washington, DC
The Work Group on HHS Data Access and Use convened on the afternoon of June 15, 2016, at the Capitol Hilton Hotel in Washington, DC. The meeting was open to the public. Present:
Work Group members
- Bruce Cohen, Ph.D.
- Vickie Mays, Ph.D., Chair
- Kenyon Crowley, MBA, MS
- Helga Rippen, Ph.D.
- Joshua Rosenthal, Ph.D.
- Walter Suarez, MD
- Paul Tang, MD
- Chris Boone, MD
- Leslie Francis, J.D., Ph.D.
- Chris Fulcher, Ph.D.
- Tessie Guillermo, JD
- Mark Savage, MD
Lead Staff and Liaisons
- Rebecca Hines, NCHS, Executive Secretary
- Debbie Jackson, NCHS
- Damon Davis, HHS IDEA Lab
- Lee Cornelius, Ph.D.
- Alexandra Goss
- Linda Kloss, MA
- Rich Landen, MPH, MBA
- Denise Love
Others (not including presenters)
- Marietta Squire, NCHS
- Katherine Jones, NCHS
- Rashida Dorsey, ASPE
- Jim Sorace, ASPE
- Wendy Nilsen, National Science Foundation
Note: The transcript of this meeting is linked to the meeting agenda on ncvhs.hhs.gov. Use the meeting date to access the agenda.
Call to Order―Dr. Mays
Dr. Mays welcomed Work Group members, NCVHS members, and others in attendance. She noted that on June 14 Dr. Rosenthal and Dr. Rippen had presented to NCVHS an overview of the data matrix the Work Group is developing. Most of the present meeting would focus on the matrix.
As background, Dr. Mays said the Work Group has wanted to develop advice for the Secretary and HHS data providers/owners about how to increase access to and use of HHS data. It also wants to identify best practices and principles in this area. The idea is to help data users/seekers in four user groups (entrepreneurs and data warehouses, researchers, community organizations, and consumers) identify what information they want and need. She asked those at the table to respond to these questions: 1) What information do you need to know to decide if a data set is useful to you? 2) What information about a data set have you looked for but found hard or impossible to find? The ensuing wide-ranging discussion revolved around the following themes (which did not precisely align with the two questions):
- What people want to know about a given data set:
- Source of the data and its credibility
- Reliability, validity, and quality of the data
- Information about the indicators themselves
- Geographic parameters, level(s), granularity
- Are the data exportable and machine-readable?
- How representative are the data?
- Are the data usable as is, without needing programming? How easy are the data to use?
- How searchable is the data set?
- How do I get information about people like me?
- Metadata, including the entity relationship diagram (ERD), continuation plans, and cost
- How findable are the data?
- Are visualizations associated?
- How do the data fit into non-data-driven priorities?
- How the information can be used
- Related topics and ideas:
- Web-based data query systems
- Data packaging
- Making requests transparent
- Graphic summaries of data attributes
- Good search capabilities, which depend on data tags
- Standardization of data content across sources
- Clear information on the restrictions on data use and re-release
- Perspectives of consumers and community organization data users:
- Communities lack access to granular, neighborhood/sub-county-level data.
- Communities vary widely in terms of their capacities to use data.
- Communities need help clarifying their questions, what they want to learn from data, and what questions the data will answer.
- How can communities know which data are best for them?
- Which data, at what level, best fit the community’s ability to use data?
- A local physician seeks complementary, non-health information, such as information on community services, social determinants, and local resources.
- Strengthening the public interface, considering public/private partnerships
- One data producer/provider/owner aims to provide quantitative information that is easily understandable and meaningful to real-world issues, to help with qualitative decisions. It helps to know who the end user is, although many data sets have multiple user groups.
- Data providers/owners need information on end users and feedback from end users.
- The IDEA Lab wants to implement demand-driven open data, with a feedback loop that enables follow-up with data-seekers who couldn’t find what they were looking for. Also, it is trying to make healthdata.gov requests transparent and allow users to vote them up and down to help prioritize data requests.
- The example of how the National Oceanic and Atmospheric Administration (NOAA) framework pushes weather data to consumers was of special interest, with acknowledgement that health data are more complex and involve privacy/confidentiality issues. One broad point: the original source of the data does not have to be visible to consumers of the informational end-product.
- Limitations, constraints, and challenges for the federal data enterprise: duplication of effort, and the need to consolidate indicator data in a single location. Consolidation might make it more feasible to invest in a more effective interface with the public.
- Opportunities for public-private partnership―especially to develop the “push” model of data distribution to help get data into more usable and accessible formats.
- Where do APIs fit in this vision? The group agreed that they have a role in widening access to data. Mr. Davis suggested that NCVHS might advise the Department to consider including APIs in re-competed contracts.
- A recurring theme was comparisons between “push” and “pull” models of distribution, and especially the advantages and potential efficiencies of the push model and how it could save the federal government time and money. The idea, in Mr. Davis’s words, is for the government to “invest a little bit to enable the public a lot.”
- In a world with the envisioned information systems, perhaps HHS should create a staff position such as “data set moderator” with responsibility for supporting the outreach effort and ensuring that HHS data are being used appropriately.
- Broader, framing questions:
- What is meant by “consumer”?
- What kind of product does the Work Group contemplate, and for what primary audience(s)? Who are the chief data users envisioned?
- What data access and use resources are feasible for the federal government to create and sustain?
- Other observations:
- Timeliness may not be as critical as people sometimes think, and reliability has different tolerance levels for different types of use.
- Need to balance rapid turnaround and expediency with optimal data quality and usability.
- Remember NIH-funded studies as another data source. Think about talking to researchers about innovative ways to use their data sets.
Four NCVHS schemas
Finally, participants discussed the following NCVHS schemas (some of them embryonic) and how they can be made into a complementary set of tools: 1) the data and methods framework, drafted by the Framework Work Group; 2) the community data users toolkit, developed by the Subcommittee on Privacy, Confidentiality and Security; 3) the community health and well-being measurement framework, now being developed by the Subcommittee on Population Health; and 4) the data matrix, now under development by the Work Group on HHS Data Access and Use. They agreed to keep working on an overall conception, with crosswalks, for such a tool set.
Returning to the topic of the user groups envisioned for the data matrix project, participants explored the appropriate level-set for the Work Group’s work. They agreed that community groups are more likely to seek and use data than individual lay persons (consumers). People with health conditions do seek information and data on their conditions and the outcomes for “patients like me,” in which case they are part of an affinity group. Work Group members favored setting aside “the individual” as a key end-user of its products.
Dr. Mays asked whether there are relevant data standards the Work Group needs to be aware of when it asks people to make their data available. Ms. Goss pointed to USHIK, the interoperability roadmap, Meaningful Use, and HIPAA. She suggested thinking in terms of the payload, the data content, the code set, and the data structure syntax. Dr. Rosenthal mentioned the importance of the ERD; Dr. Rippen mentioned the requirements for geocoding. Dr. Suarez noted past NCVHS work around standardization of questions and data elements. Dr. Hines pointed to the relevance of the data and methods framework, which Dr. Stead has presented to HHS data leads. That meeting generated the idea of applying the framework to an HHS data system or two as an exercise.
Next Steps/Work Plan
Dr. Mays noted the need to seek input from HHS staff on the potential utility and value of the data matrix. Also, the Work Group needs staffing. She then adjourned the meeting.
I hereby certify that, to the best of my knowledge, the foregoing summary of minutes is accurate and complete.
September 19, 2016