Department of Health and Human Services

National Committee on Vital and Health Statistics

Working Group on HHS Data Access and Use

September 23, 2014

National Center for Health Statistics, Hyattsville, MD

– Meeting Minutes –

The Working Group on HHS Data Access and Use was convened on the afternoon of September 23, 2014, in Hyattsville, MD. The meeting was open to the public. Present:

Working Group members

  • Vickie Mays, Ph.D., Chair
  • Bill Davenhall
  • Bruce Cohen, Ph.D.
  • P. Kenyon Crowley, MBA, MS
  • Leslie Pickering Francis, J.D., Ph.D.
  • Mohit Kaushal, MD
  • Joshua Rosenthal, Ph.D.
  • Mark Savage
  • Walter Suarez, MD
  • Leah Vaughan, MD

Absent:

  • Christopher Fulcher, Ph.D.
  • M. Chris Gibbons, MD
  • Pete Hudson, MD
  • Patrick Remington, MD
  • Paul Tang, MD

Lead Staff and Liaisons

  • Debbie Jackson, NCHS, Acting Exec. Secretary
  • Jim Scanlon, ASPE
  • Damon Davis, HHS IDEA Lab
  • Lily Bradley, ASPE

Others

  • Katherine Jones, NCHS
  • Marietta Squire, NCHS
  • Larry Green, NCVHS chair
  • Linda Kloss, NCVHS member
  • Wendy Nilsen, NIH/NSF
  • Susan Kanaan, NCVHS writer
  • May Wendt, OMH
  • Tammara Jean Paul, NCHS
  • Nicole Cooper, NCHS

Note: The transcript of this meeting and presenters’ slides are linked to the meeting agenda on http://ncvhs.hhs.gov. Use the meeting date to access the agenda.


EXECUTIVE SUMMARY

Opening Remarks and Introductions – Dr. Mays

The meeting began with a brief discussion of the Working Group’s charge.

Update on the HHS IDEA Lab and Health Data Initiative – Damon Davis

Mr. Davis gave an update on healthdata.gov, which has grown to more than 1600 data sets. Federal data now have a 97 percent metadata quality score. He described his regular meetings with HHS health data leads and CIOs. He wants to organize an open data roundtable for HHS, and would like Working Group members to participate. Among other things, Working Group members suggested improving the access of qualified researchers to research data, encouraging consistent metadata standards across states, and enhancing consistency and continuity in HHS data releases. The group also discussed the need for a national infrastructure that facilitates data access and use.

Building a Framework for Guiding Principles for Data Access and Use (slides)

Members offered suggestions and comments on possible guiding principles for data access and use. Dr. Mays identified three types of HHS data users: 1) data warehouses, data builders; 2) professionals who use data and have some level of training and skills; and 3) consumers and community groups with basic-to-no skills in accessing and using data. The goal over time is to become clear about who falls in which “bucket.”

Mr. Davenhall presented a set of slides and handouts on data access and use, stressing the huge volume and complexity of federal data holdings and focusing on “high-value data.” He recommended defining what a high-value data asset is. These topics and others (e.g., timeliness) prompted considerable discussion.

National Health Interview Survey Data Release – Dr. Chris Moriarty, NCHS (slides)

Dr. Moriarty works on the National Health Interview Survey (NHIS) in the Division of Health Interview Statistics at NCHS. After giving a brief overview of the survey, he described NHIS products, data release history, and future directions for data release. The focus of future data release will be primarily electronic. Mr. Davis said he would like to work toward closer coordination with the NHIS group, for example by using the healthdata.gov blog to support NHIS listserv announcements. Working Group members offered several suggestions for improving access to and use of NHIS data.

Usability and Access Parameters for Various Use Cases

At Dr. Mays’ request, the group discussed what kinds of feedback the Working Group might provide to NHIS and other HHS data suppliers. They also returned to major themes and ideas from the day’s discussions. There was considerable interest in involving the Health Data Consortium in understanding prospective users’ needs and improving HHS data access and use. After members made announcements of forthcoming events in their spheres, they commented on the process and structure for this meeting and offered suggestions for future meetings and activities.


DETAILED SUMMARY

Opening Remarks and Introductions

Dr. Mays reviewed the meeting agenda and stressed the importance of keeping in mind the Working Group’s charge, which covers seven areas that she enumerated. Commenting on the charge, Mr. Davis noted the opportunity to assist in the value proposition for open data, as well as the budgetary constraints on carrying out that intention. Dr. Cohen suggested documenting the secondary and tertiary uses of HHS data. Mr. Savage observed that much of the Working Group’s charge is backward-looking, and he wondered how to position the group to be more forward-looking and creative as it shifts to more of a project focus. Dr. Mays said members would have a chance to return to this general topic later in the meeting, and the group should think about how to use these insights to structure future work.

During participant introductions, Dr. Cohen described the October NCVHS Roundtable on Supporting Community Data Engagement and encouraged members to attend.

Update on the HHS IDEA Lab and Health Data Initiative – Damon Davis (slides)

Healthdata.gov has grown to over 1600 datasets, from 300 last year. This includes data from HHS, USDA, the Department of Education, and other federal agencies, and there is an initiative to give access to state and local data. A related activity is improving the metadata across the Department, in accordance with OMB and Office of Science and Technology Policy (OSTP) policy. Federal data now have a 97 percent metadata quality score.

Every HHS operating division has a health data lead. The leads have been holding quarterly meetings, and Mr. Davis has been meeting with the operating divisions individually. They are exploring forming a working group to tie together CIOs and health data leads from across the op-divs. Another focus is a public access memo for research data. OMB requires each division to submit a public access plan; he recommended a public access memo update at a future Working Group meeting.

Mr. Davis noted the Working Group charge related to exploring the value of the data and looking at how data can be made available going forward. He is working with health data leads to map the HHS strategic plan to the open data available at the Department, to determine where supportive data are available and where there is an opportunity to provide access to high-value data. The related idea of proving the value of the data ties in here. Open Data 500 is an attempt to identify 500 companies that use government open data, and there have been two roundtables, which he described. He added that he wants to organize an open data roundtable for HHS, and he would like Working Group members to participate.

There is also an effort to understand the dissemination process for given HHS datasets and data tools, as well as the best ways to get information about the data out and alternative forms of dissemination. There is interest in looking at the flagship surveys in this light.

Discussion

Dr. Vaughan suggested finding ways to improve the access of qualified researchers to research data, to expand the pool of qualified users. Mr. Davis said the main challenge is opening access retroactively; he also noted the issues around maintenance of legacy data. Dr. Nilsen said she would bring this topic back to her NIH colleagues. Ms. Kloss later commented that data stewardship should take a “lifecycle approach” that includes attention to when it is time to retire data sets. Others commented on alternate possibilities such as archiving.

Dr. Cohen suggested that the Department or Working Group work on how to encourage and support consistent metadata standards across states; Mr. Davis said he has been pushing that idea.

Dr. Rosenthal commented on the need for continuity, visibility, and regular updating in the data HHS is releasing, to build public and business confidence in using the data. He also noted the need for ERDs.

Mr. Savage noted that coordinating with the National Coordinator before release of the HIT Strategic Plan would offer an opportunity to influence data practices.

Dr. Green prompted considerable discussion by pointing out the recurring theme of a missing national infrastructure to coordinate disparate activities around data development, release, stewardship, and use across public and private sectors to support knowledge development. Dr. Queen and Ms. Bradley reviewed the relevant federal policies and practices in these areas.

Mr. Davenhall cited the Healthy People program as a model for linking measurement to planning. He also noted the Health Data Consortium as a potential contributor to the envisioned infrastructure.

Building a Framework for Guiding Principles for Data Access and Use – Dr. Mays (slides)

Dr. Mays introduced a discussion of guiding principles for data dissemination and access. These principles could inform the Working Group’s interactions with people from HHS agencies that provide access to HHS data, such as Dr. Moriarty regarding NHIS data. She outlined three types of HHS data users to keep in mind: 1) data warehouses, data builders; 2) professionals who use data and have some level of training and skills; and 3) consumers and community groups with basic-to-no skills in accessing and using data. The goal over time is to become clear about who falls in which “bucket” so the Working Group can advise HHS about where the challenges lie with respect to these user groups.

She invited suggestions of possible principles, generating these ideas and topics:

  • If you list something somewhere, there should be a definition for that thing (a glossary).
  • The article “Socio-technical impediments of open data” lists ten areas in which impediments to data use occur: availability, findability, usability, understandability, quality, linking and combining, comparability and compatibility, metadata, interaction with the data providers, and opening and uploading of the data. (Mr. Crowley will forward the article to members.)
  • To create usability, findability and completeness, both ‘top-down’ and ‘bottom-up’ mechanisms should be in place.
  • Decentralization, commoditization, and transparency will create value.
  • Release interpreted data in addition to regular data.
  • Release data in ways that allow them to be used at multiple levels.
  • Monitor, evaluate, and reassess all principles and practices in order to update and revise them as needed.
  • Allow users to self-identify as to purpose of use, and to rate each aspect of the data.
  • There should be a ubiquitous mechanism for users to evaluate government data sets in an ongoing way for the benefit of other users.

Some of the major discussion themes and points:

Dr. Francis and Ms. Kloss observed that the principles in question are distinct from other important framings such as data stewardship that involve their own sets of principles. There are many issues around data repurposing.

Mr. Davenhall presented a set of slides and handouts on data access and use, stressing the huge volume and complexity of federal data holdings and focusing on “high-value data” and the most useful DHHS data files offering “the greatest stewardship opportunities.” (See transcript for details.) He also shared descriptive documents on the California public health open data portal and the OSTP data access plan. He commented that stewardship principles have to be in play to prevent the release of inaccurate data, and he recommended defining what a high-value data asset is.

Dr. Vaughan appealed for open-mindedness about how data are published “in the rush to publish as an API,” since PDFs may be just what consumers want and need. She also called attention to the work of the Federal Geographic Data Committee, which should be part of the conversation.

On the question of timeliness, Dr. Cohen commented on the evolving thinking about releasing provisional and preliminary data that are adequate for some purposes though not for others. He noted the many impediments to doing so, including law, resources, and politics.

Ms. Kloss proposed that Mr. Davenhall’s concept of illustrating high value could be converted into a principle related to the notion of “fit for use.” Other principles relate to the source of the data, confidentiality, applications, small numbers issues, metadata, and the quality/timeliness trade-off. Mr. Davenhall commented on the concept of operational usefulness.

Dr. Kaushal generated discussion with the observation that “the intellectual capital will always be greater outside the committee rooms,” so work in these areas should take advantage of the intellectual capital available from “the millions of people outside.” Further, it is better to think in terms of use cases than an abstract hierarchy of data value, and those millions of people can provide the use cases. This relates to the principle (above) of decentralization, commoditization, and transparency. On the federal government’s standards with respect to data quality and reliability, there were caveats about inaccuracy and counter-arguments about “over-precision.”

Dr. Cohen commented on the limited usefulness of a lot of federally collected data for community-level decisions and action, and posed the question of what the federal government’s responsibility is for sub-state data.

National Health Interview Survey Data Release – Dr. Chris Moriarty, NCHS (slides)

Dr. Moriarty works on the National Health Interview Survey (NHIS) in the Division of Health Interview Statistics at NCHS. After giving a brief overview of the NHIS, he described NHIS products, data release history, and future directions for NHIS data release. His particular focus was the division’s evolving approach to data release since the survey began in 1957. The current questionnaire was instituted in 1997, and there will be a redesign in a couple of years. The core content remains stable, with slight variations. In addition, federal agencies can sponsor questions that appear in supplements that change every year. Information is collected on 90,000 people in 35,000 households annually.

NHIS products include annual microdata files, annual reports (population, children, and adults), reports on specific topics (e.g., complementary medicine), FastStats (a weekly report), Early Release, and Health-eStats. Selected estimates and a special report on health insurance coverage are released on a quarterly basis. While many are long and technical, the reports are intended for a “very wide audience.” All start with a highlights section. The main dissemination method is through the Internet. Microdata distribution is available free of charge on CD-ROMs and/or the Internet or, for a cost, on mainframe computer tapes. The focus of future data release will be primarily electronic. Although staffing for the website is limited, they have worked hard to keep the website and data releases accurate and up to date. The division has developed an online analytic system through which people can request real-time analyses of NHIS data, but it has not yet been launched.

Asked about the link between NHIS data and healthdata.gov, Mr. Davis said the latter serves as a system of cataloguing “where you can go to find the data”; it is not a “mirror” or copy of NHIS data.

The resources to facilitate usage and provide documentation include a survey description document, recommended procedures for analyzing the data, and example programs. There is also a data request team to field questions. A moderated NHIS listserv is available through CDC; about 3,000 people now subscribe. Mr. Davis said he would like to work toward closer coordination with the NHIS group, for example by using the healthdata.gov blog to support NHIS listserv announcements.

Mr. Crowley suggested providing a mechanism so that when people take sample data, change them and add code, there are ways to bring those innovations and uses back to the community to re-share them.

Dr. Vaughan suggested creating groups of beta users to give feedback on the new online analysis system – in particular, people within universities who have been sworn into the security procedure.

Dr. Mays explained to Dr. Moriarty that the preceding exchanges are a Working Group experiment in giving feedback to HHS about ways to be innovative in improving data access and use. She thanked Dr. Moriarty for participating in this experiment.

Ms. Bradley suggested archiving questions and answers on the NHIS site (and/or listserv) in a queryable manner so people can search for pre-existing answers to their questions.

In response to a question, Dr. Moriarty said NCHS does not have the resources to improve data access by developing APIs, though it would like to.

Dr. Rosenthal suggested collaborating with an entity such as the Health Data Consortium or other public/private partnership to improve data access.

NCVHS Usability and Access Parameters for Various Use Cases

Dr. Mays asked the group to discuss what kinds of feedback the Working Group might provide to NHIS and other HHS data suppliers.

One idea discussed concerned involving the Health Data Consortium in finding out what prospective data users would like to see happen and considering how the HDC could help (e.g., by hosting listening events).

Returning to the discussion of principles (see above), Ms. Kloss suggested sending e-surveys to gather data from the broader community on principles. Dr. Rosenthal suggested doing a session at the Health Datapalooza on this topic. The group explored the idea of doing a Datapalooza session and/or weaving this theme into other Datapalooza sessions. It was noted that RWJF is interested in convening people to develop principles.

Wrap-Up – Dr. Mays

  • Ms. Bradley will summarize the ideas on principles and other suggestions from this session.
  • Follow-up calls will be scheduled for October and November.
  • In the future, presenters will be provided with a set of questions based on the guiding principles – for example, a question on metadata.
  • The group will explore the ideas about collaborating with the HDC.

At Dr. Mays’ invitation, members then made announcements of forthcoming events in their spheres. Members then commented on the process and structure for this meeting. Mr. Savage wondered how the meeting process fits into the Working Group’s charge, and suggested that the Working Group help integrate data relevant to the social determinants of health into the data system and the interoperability plan. Dr. Rosenthal suggested having a slide in front of members at meetings that states the goal(s) and deliverable(s) for the meeting. He and Mr. Crowley reiterated their interest in promoting the “bottom-up approach” by thinking through what it would look like and what technology would facilitate it. Dr. Suarez stressed the importance of identifying the desired deliverables. Ms. Bradley suggested creating a communication mechanism such as a blog or Wikipedia page where Working Group members could interact with each other and the public. Ms. Kloss urged “pushing the agility button as far as you can take it” and not being “over-concerned with process.” Ms. Jackson affirmed that “this is an incubator think tank.”

Dr. Mays then adjourned the meeting.