Department of Health and Human Services

National Committee on Vital and Health Statistics

Working Group on HHS Data Access and Use

November 14, 2012

National Center for Health Statistics, Hyattsville, MD

Meeting Minutes


The Working Group on HHS Data Access and Use was convened on November 14, 2012, at the National Center for Health Statistics in Hyattsville, MD. The meeting was open to the public. Present:

Working Group members

  • Justine M. Carr, MD, Chair
  • P. Kenyon Crowley, MBA, MS
  • Bruce Cohen, PhD
  • Leslie Francis, JD, PhD
  • M. Chris Gibbons, MD
  • Joshua Rosenthal, PhD
  • Walter Suarez, MD
  • Leah Vaughan, MD

Absent:

  • Bill Davenhall
  • Pete Hudson, MD
  • Mohit Kaushal, MD
  • Patrick Remington, MD
  • Paul Tang, MD
  • Kalahn Taylor-Clark, PhD

Lead Staff and Liaisons

  • Marjorie Greenberg, NCHS, Exec. Secretary
  • James Scanlon, ASPE, Exec. Staff Director
  • Susan Queen, ASPE, Lead Staff
  • Ed Sondik, NCHS

Others

  • Debbie Jackson, NCHS
  • Katherine Jones, NCHS
  • Marietta Squire, NCHS
  • Matt Quinn, NIST
  • Susan Baird Kanaan, consultant writer

Note: The transcript of this meeting and speakers’ slides (final part of the full Committee meeting) are posted on the NCVHS Web site, http://ncvhs.hhs.gov. Use the meeting date to locate them.


EXECUTIVE SUMMARY

The Working Group held a three-hour meeting on Wednesday, November 14, at the National Center for Health Statistics. After introductory remarks by the Chair, Dr. Justine Carr, Dr. Rosenthal gave a presentation about “real-life resources” for increasing access to and usability of public HHS data, focusing on Google Public Data Explorer and a challenge sponsored by Tableau and ReadWriteWeb. His examples illustrate ways to reach new audiences with a minimal investment, working with partners and subject matter experts. He noted that browsers are less expensive and less technically difficult to build than applications and portals.

Working Group members considered experimenting with posting some HHS public data on Google Public Data Explorer. It was noted that some HHS data are already accessible through Google; NIH, ONC and AHRQ sources were mentioned. The group discussed different ways HHS might partner with Google (or another platform), representing varying levels of control and varying levels of effort to engage data users. Some participants favored experimenting with Google posting to gain experience, while others wanted to explore additional platforms and tools and to consider the options further. The group also talked about taxonomies and standardization, privacy and security, control over data uses, and whether or how analysis happens when data are pushed out to new users in the ways illustrated here.

It was agreed that before proceeding, the Working Group needs to know what HHS is already doing, or has considered doing, along these lines (i.e., using a Google browser and/or similar vehicles). The HHS data leads will be asked to talk with the Working Group about these questions, either at the next face-to-face meeting or in a webinar. There were several references to the overlaps and synergies between the Working Group’s interests and the NCVHS project on empowering communities to use data to improve local health, especially with regard to the missing technical infrastructure to support the work.

Members identified the following issues and tasks related to HHS data:

  • Identifying what data are to be released, in what format;
  • Protecting the data;
  • Targeting new and expanded audiences and assessing their data-related needs;
  • Letting people know HHS data are available;
  • Making the data easier to access and use; and
  • Providing tools and infrastructure to support data uses.

The Working Group also was briefed on CDC surveillance systems by Jim Bueller, Director of the Center’s Public Health Surveillance Informatics Program.

Next steps for the Working Group:

  1. Ask HHS data leads to talk with the Working Group about what HHS has considered, and either done or rejected, with respect to data release and posting on Google or other Web platforms.
  2. Further align Working Group and NCVHS missions and work plans.
  3. Explore additional Web access technologies, and possibly vehicles other than the Web.
  4. Hold conference call/webinar in January.
  5. Next meeting: March 1, 2013

DETAILED SUMMARY

Introductions

Dr. Carr commented on the various types of expertise represented on the Working Group and the elements of its charge. She reviewed its activities to date, which have centered on briefings on HHS data holdings. The Working Group’s tasks relate to both supply-oriented and demand-oriented challenges. She asked Dr. Rosenthal to talk about the supply side (below).

Mr. Scanlon noted that this unique group was hand-picked to bring knowledge of both HHS data and technology. Its charge is to help HHS extend access to its data holdings beyond its traditional public health, research, and health care audiences and to suggest new formats, platforms, and technologies for that purpose.

Best Practices and Practical Suggestions for Release of “Open” HHS Data – Josh Rosenthal, PhD (SLIDES)

Using slides, Dr. Rosenthal talked about “real-life resources” that could be used to engage people in using HHS data. After referencing his previous remarks about taxonomy, learning centers, business value, and so on, he discussed the use of browsers for data access and the use of challenges and contests to stimulate creative data uses and applications. (See transcript and slides for details.)

The options for HHS data users include using data and portals directly, building apps (the focus of HHS data liberation efforts thus far), and using browsers. The first two options require specialized expertise. After noting the relatively low volume of use of data.gov and suggesting why market value “hasn’t really happened” for the site, Dr. Rosenthal focused on the third option, the data browser, using Google Public Data Explorer as his chief example. It enables data uses and mash-ups for multiple purposes via drag-and-drop. In this scenario, which is popular in the tech world, data are defined and labeled with metadata and taxonomy. The Google site is populated by public data from many sources, including the World Bank, the World Health Organization, and some US government agencies, among them HHS. Either the source can push the data or Google can “scrape” it from public sources.

Browsers like this allow the user to pull data and create things. As an example of what can be done, he showed the results of a public challenge by ReadWriteWeb and Tableau Software in which the winning contestant mixed various types of public data to identify factors associated with obesity and diabetes in the US. She also created informative visuals. Dr. Rosenthal noted that this was all accomplished with a very limited investment, leveraging partnerships.

The presentation stimulated wide-ranging discussion among Working Group members about relevant activities that HHS might undertake, their appropriateness and feasibility, and where the Working Group should go from here.

Dr. Francis raised several questions about privacy. Dr. Rosenthal explained that both the source of the data and who uses the data are transparent; data are not hidden from the public but rather submitted at the (geographic) level deemed safe for sharing. He pointed out that for some purposes, a synthetic data set could be used instead, and he also stressed the potential for “compounding learning” from activities such as these. Ms. Queen added that any federal agency posting data on such a browser would already have determined the lowest acceptable level of detail; further, restricted data will never be made available publicly. The theme of privacy continued as a thread through the ensuing discussions.

Mr. Quinn talked with Dr. Rosenthal about taxonomies and standardization, a topic that also wove through the discussion.

Mr. Scanlon observed that rather than using a peer-review process, this approach relies on the community of users to judge the quality, accuracy, and usefulness of the information. Dr. Rosenthal said Google is a closed system in which its experts review the content, while Tableau opens it up. Noting how “disruptive it is for our traditional ways,” Dr. Carr characterized these activities as hypothesis generation. Dr. Rosenthal commented on the synergies with new forms of education such as online courses and added that the technologies he has demonstrated can be used to help solve business needs, including public health and social needs. He said he envisions HHS data-based product development as a “layer” in a learning community or learning center environment in which there is work with the data and with information as well as development of applications for public good and/or business value.

Dr. Cohen posited that these technologies are evolutionary, following Web-based query systems as the next novel way to disseminate information. He and others speculated about “put[ting] a data set out there and see[ing] what folks do with it” as a first step for HHS. Dr. Rosenthal said that in addition to actively encouraging and participating in the use of HHS data (rather than simply “letting Google use” the data), doing something like a prize to draw attention to the data can increase awareness and enhance the impact. He suggested considering a working partnership, talking with a subject matter expert and/or with Google directly, and doing a feasibility analysis of alternate approaches.

Dr. Cohen asserted that the business case and goal of data liberation is to proactively liberate the data in order to expand its use, and that approaches like those under discussion will encourage agencies to think more creatively about data dissemination. The group discussed ways in which data analysis is likely to change in these new scenarios. Dr. Cohen proposed that the data are being moved upstream into the hands of decision makers, who then become responsible for analysis. The group noted the implications of these changes both for data stewardship and for the locus of responsibility for conclusions drawn from the data.

Another theme of discussion was how best to work with NCVHS, given that HHS views the Working Group as a “reactor panel” while any Working Group recommendations must go through the National Committee. Ms. Greenberg suggested that HHS is likely to be more receptive to Working Group comments and recommendations that are based on careful deliberation.

Dr. Vaughan urged the group to be aware of the full range of tools and platforms before moving forward and to evaluate options in terms of their policy appropriateness. She encouraged fellow members to gain direct personal experience with the technologies, and pointed out the strong open access movement present in the U.S. and the U.K.

The session concluded with a roundtable in which members expressed their thoughts on these subjects. The points raised, several of which suggest next steps, included:

  • Merits of creating an infrastructure that allows questions to be moderated by a community and that facilitates feedback of learnings to the wider population.
  • Synergies with the NCVHS project on helping communities use local data to improve local health, especially the needed technical infrastructure.
  • Having a better sense of who could and would use HHS data would help the Working Group target its efforts.
  • Underlying questions: what data to release, in what format and through what channel(s), and to whom.
  • Other issues: reliability, validity and completeness; data limitations; barriers to access; tools to improve usability; ability to aggregate the data; and improving analytic capabilities.
  • Start with the data the Department has already decided to release on healthdata.gov.
  • Move the data to where people are, rather than vice versa.
  • Goals include: expand the user base; bring a finer grain of data; make it easier to visualize. Process suggestion: ask data analysts how they want people to use their data.
  • Continue the search for a common language in the Working Group. Pay attention to workforce issues.

CDC Data – Jim Bueller, Director, Public Health Surveillance Informatics Program

Mr. Bueller’s office runs several large CDC surveillance systems based on state reporting of reportable conditions. They include BioSense 2.0, a large syndromic system that tracks patterns of disease seen primarily in hospital emergency departments. The office also funds the Behavioral Risk Factor Surveillance System (BRFSS), a telephone survey conducted by states whose results CDC aggregates for the nation, and it provides various informatics services that support infrastructure and public health surveillance. The three surveillance systems his office runs are a small fraction of the more than 100 CDC-managed surveillance systems run by different programs, including infectious disease, injuries, chronic disease, and maternal and child health. Most CDC public health programs have a surveillance dimension.

The National System for Notifiable Disease focuses on infectious diseases, which are reported weekly by states. Other systems such as PulseNet complement that; Mr. Bueller described PulseNet’s activities, which are aimed at identifying outbreaks associated with food and other products sold in multiple states. Another system is the Emerging Infections Program, involving detailed record abstraction. For any given condition there may be a mosaic approach, including routine data on a broad national level plus supplemental approaches on a subset of cases. Increasingly, instead of involving primary data collection, the systems draw on data collected by others, particularly National Center for Health Statistics surveys and information systems and vital records. The different surveillance systems and approaches operate on different timeframes and at different levels of detail and geographic coverage.

In response to questions, Mr. Bueller said some surveillance data are made available in public access databases, and in some instances, CDC provides de-identified individual-level data for research or public use. CDC has an inventory of its surveillance activities that it is working to make available. Its Wide-ranging Online Data for Epidemiological Research, or WONDER, has a lot of information, as does CDC.gov/BRFSS. He discussed the use of a distributed query system with Dr. Francis, adding that ONC is keen on its Query Health project and CDC staff have been working with ONC on it.

Wrap-Up and Next Steps

Dr. Carr reviewed the findings from this meeting, observing that it is good that HHS has liberated its data; that the Working Group has seen interesting examples and learned that usage could be higher; and that its work intersects with past and forthcoming NCVHS work on empowering communities to use data to improve local health. She outlined the Committee’s recommendations for a possible federal role in its report “The Community as a Learning System for Health,” and invited further comments and suggestions from Working Group members.

Next steps:

  • Find out what HHS data are already on platforms like Google Public Data Explorer.
  • Talk with HHS data leads about what HHS has already considered or attempted in these areas, perhaps including NIH on its public/private partnership on Big Data.
  • Continue to align the Working Group’s efforts with those of NCVHS.
  • Explore additional web access technologies and (possibly) non-Web-based alternatives.
  • Hold a conference call and/or webinar in January.

I hereby certify that, to the best of my knowledge, the foregoing summary of minutes is accurate and complete.

/s/

Chair Date