Department of Health and Human Services


Subcommittee on Population Health


Modeling Health Insurance Data: Practice and Perspectives
—Coverage, Access, Utilization, Quality, and Cost of Care—

February 27, 2009

Hubert H. Humphrey Building
Washington, D.C.

Meeting Minutes

The Subcommittee on Population Health of the National Committee on Vital and Health Statistics (NCVHS) was convened on February 27, 2009, at the Hubert H. Humphrey Building in Washington, D.C. The meeting was open to the public. Present:

Subcommittee members

  • William J. Scanlon, Ph.D., Co-Chair
  • Donald M. Steinwachs, Ph.D., Co-Chair
  • Larry Green, M.D.
  • Mark Hornbrook, Ph.D.
  • Garland Land, M.P.H.
  • Walter Suarez, M.D.

Staff and liaisons

  • Marjorie Greenberg, NCHS/CDC, Executive Secretary
  • James Scanlon, ASPE, Executive Staff Director
  • J. Michael Fitzmaurice, AHRQ liaison
  • Debbie Jackson, NCHS
  • Nancy Breen, NCI
  • Dale Hitchcock, ASPE
  • Linda Bilheimer, NCHS

Others

  • Emily Shortridge, NORC at the Univ. of Chicago
  • James Reschovsky, Health System Change
  • Brian O’Hara, Census Bureau
  • Monique Harold, GE Healthcare
  • Dmitriy Goryachev, Econometrica, Inc.
  • Carol Bickford, Amer. Nurses Assn.
  • (See list of panelists, p. 2)

Note: The transcript of this meeting and speakers’ slides are posted on the NCVHS Web site. Use the meeting date to locate them.


This was the second of two hearings hosted by the NCVHS Subcommittee on Population Health on data availability for policy analysis. The focus of this hearing was modeling health insurance data on coverage, access, utilization, quality, and cost of care. The hearing covered two themes: access to existing data, and data that do not yet exist but are needed to support modeling. The Subcommittee was interested in the hurdles the panelists have encountered and how they have overcome them, the capacities they have developed, and their suggestions for recommendations to the Department to strengthen capacities. The panelists (in order of appearance) and the models, initiatives, and/or data sources they discussed are as follows:


  • Beth McGlynn, Ph.D., RAND: COMPARE—Comprehensive Assessment of Reform Efforts
  • Jim Baumgardner, Ph.D., Congressional Budget Office: HISim—Health Insurance Simulation Model
  • Jonathan Gruber, MIT Department of Economics (by phone): microsimulation model (not named)
  • Bowen Garrett, Ph.D., Urban Institute: HIPSM—Health Insurance Policy Simulation Model
  • Jessica Banthin, Ph.D., and Thomas Selden, Ph.D., AHRQ: MEPS public use files, augmented data, and AHRQ modeling activities; forthcoming blending of MEPS HC and IC
  • John Sheils, Lewin Group: HBSM—Health Benefits Simulation Model
  • Joan Turek, ASPE, and Linda Giannarelli, Urban Institute: TRIM3—Transfer Income Model, Version 3

The panelists described their models, initiatives and data sources and highlighted the most significant data gaps and limitations. Several panelists endorsed the list of limitations offered by Dr. McGlynn—including a lack of timeliness (a major issue), inadequate sample size, and limited access—along with her list of “fantasy data sets.” Several added their own wish lists. The panelists thanked the Subcommittee for convening this hearing, saying it is helpful for analysts to meet together in this way. In their final discussion, the group not only reiterated the major data gaps and limitations, but agreed on the need for fundamentally new questions and thinking in this area, in view of the current economic crisis.


Welcome, Review of Agenda—Dr. Steinwachs, Dr. Scanlon

This hearing on modeling was the second of two hearings, hosted by the NCVHS Subcommittee on Population Health and requested by ASPE, on data availability for policy analysis. The hearings covered two themes: access to existing data, and data that do not exist but should. The first hearing focused on health insurance data capabilities. In addition, on February 26 (the day before this hearing), the Committee approved a letter to the Secretary calling for the investment of more resources in data collection under the ARRA.

Dr. Steinwachs noted that at present, data are not adequate for understanding the issues the country is facing, nor for supporting and evaluating new investments as part of health care reform. He and Dr. Scanlon thanked Mr. Hitchcock, Dr. Dorsey and Dr. Bilheimer for helping organize this hearing. Its panelists, Dr. Scanlon said, are “the heavy lifters” who are trying to model aspects of the health care sector. He noted that health care reform will be a process over time, and the question is how information and analysis can be enhanced to support it. Today’s focus with respect to modeling is the hurdles and how the panelists have overcome them, the capacities they have developed, and their suggestions for recommendations to the Department to strengthen capacities. (Please refer to the transcript and slides for the details of the presentations.)

Panel 1

Beth McGlynn, Ph.D., RAND

Dr. McGlynn described RAND’s microsimulation modeling, and acknowledged the contributions of her colleague, Federico Girosi. The RAND initiative COMPARE—Comprehensive Assessment of Reform Efforts—began three years ago with two goals: to provide a factual foundation for a national dialog on health reform options, and to develop tools to help with the development of Federal and state legislation and policy as well as that of large private companies. It evaluates the effects of policy changes on multiple dimensions of system performance in the areas of cost, quality, and access, as well as looking at operational feasibility. These are both aggregate and distributional results. COMPARE is an agent-based microsimulation model that provides a way to look both at the personal level and at other units such as families, health insurance eligibility units, and tax units. It estimates premiums and insurance status endogenously.

RAND uses a variety of data sources (eight are listed). The base population is built from the Survey of Income and Program Participation (SIPP) and linked to the Medical Expenditure Panel Survey Household Component (MEPS-HC); Kaiser Family Foundation/Health Research and Educational Trust Employer Survey (Kaiser/HRET) data are used to describe employer characteristics and benefits. Dr. McGlynn described the modeling of the behavior of the agents in the model. RAND is converting to a utility maximization framework, and the behavior of firms is mainly modeled using cost-benefit analysis. The results are intended to describe “the new state of the world at equilibrium following a policy change” in a two-to-three-year time frame. Employment variables are static, and analysis is performed at the national level. RAND also has a method for doing state-level analysis, but data for this are limited. It has modeled six policy options to date, including an individual mandate with a national insurance exchange and subsidies, Medicaid/SCHIP expansion, and Medicare buy-in. It also has implemented a “combo package” proposed by Senator Baucus. It is developing a provider module to focus on behavioral responses to changes in payment policy and “health services delivery interventions.” A white paper on the microsimulation methodology is posted on the RAND Website, which lets users interact with modeling results.

Dr. McGlynn outlined the strengths and limitations of the data sets. The limitations, which vary by data set, include a lack of timeliness (a major issue), inadequate sample size, and limited access. She also described the difficulties RAND has had with MEPS HC-IC (Household and Insurance Components) and MEPS IC, both of which have restricted and inconvenient access.

Finally, she described five “fantasy data sets” RAND wishes existed. They include a large cross-sectional population survey that assesses both the insurance choice and the choices offered at the level of insurance eligibility units; a long-term, longitudinal, annual survey that follows people from birth to death; and data to understand the growing non-group market. She added that she worries that policy options are being modeled strictly on the Medicare population, which is quite different from the commercial market. She identified these priorities for Federal data:

  • Easy access to data sets that already exist, and better documentation, to enhance utility for modeling;
  • Timely availability of data (no older than 2 years); and
  • New data sets that improve the ability to evaluate options beyond coverage expansions.

Jim Baumgardner, Ph.D., Congressional Budget Office (CBO)

CBO’s central model (since 2004) for examining proposals to expand health insurance coverage is the Health Insurance Simulation Model (HISim), a microsimulation model. It is used for proposals affecting non-elderly health insurance coverage and the Federal budget, and was used for 10-15 of the 115 proposals published in CBO’s December 2008 Budget Options, Volume I. CBO is also interested in distributional effects. A December 2007 paper posted on the CBO Website has more detail on the contents of today’s presentation.

HISim has an exogenous baseline, and has SIPP as its platform. CBO also uses other data sets (e.g., MEPS and Bureau of Labor Statistics data) and imputes people from other data sets into SIPP. The model allows for the multiple coverages some people have. These components are converted into health insurance units. The model is elasticity-based and uses elasticities from the literature to develop behavioral responses. Dr. Baumgardner showed a rough schematic of how the model is set up and the changes in it. The outputs are coverage changes, distributional analyses, and Federal budgetary effects. He discussed four types of proposals the model has been used for, the way the model is used in these instances, and illustrative conclusions: a regulatory change, subsidies for nongroup coverage, subsidy for small-group employer-sponsored insurance (ESI) and nongroup via subsidized reinsurance, and employer mandate (pay or play).
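Illustrative note (not part of the presentation): the idea of an elasticity-based behavioral response can be sketched in a few lines. The function name, the linear point-elasticity approximation, and all numeric values below are assumptions chosen only for illustration; they are not HISim’s parameters or CBO’s method.

```python
def simulate_takeup(baseline_rate, price_change_pct, elasticity):
    """Return the take-up rate after a fractional change in the net premium.

    Uses a linear point-elasticity approximation (an assumption):
    fractional change in take-up = elasticity * fractional change in price.
    """
    new_rate = baseline_rate * (1.0 + elasticity * price_change_pct)
    # Keep the result a valid share of the population.
    return min(max(new_rate, 0.0), 1.0)


# Hypothetical inputs, not taken from any model discussed at the hearing.
baseline = 0.60                 # 60% of the eligible group buys coverage today
subsidy_price_change = -0.25    # a subsidy lowers the net premium by 25%
elasticity = -0.4               # assumed price elasticity of take-up

new_rate = simulate_takeup(baseline, subsidy_price_change, elasticity)
print(f"Take-up moves from {baseline:.2f} to {new_rate:.2f}")
```

In a full microsimulation the same kind of calculation would be applied record by record, with elasticities drawn from the literature for each population group.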

Going forward, CBO will be “in the heart of big proposals to change the health insurance system,” and looking at more options. Dr. Baumgardner agreed with Dr. McGlynn’s description of the data issues and said, “We certainly would like everything” — including a “huge sample” of people, their employers and fellow workers and details on insurance coverage, all in one big sample, with people’s health conditions and history of health spending.

Jonathan Gruber, Ph.D., MIT

Dr. Gruber was one of many panelists to thank the Subcommittee for convening this meeting. He noted that “even very smart, well-meaning people…put a number on something, they are going to differ.” The challenge is how to work through that and figure out which numbers to rely on; and for that, meetings such as this are useful.

He described his microsimulation model, which he said is “not so different” from others. The two inputs are data sources and policy parameters. The “black box” converts everything to price responses and makes behavioral assumptions. The output is population flows and dollars.

He referred to Dr. McGlynn’s presentation as a good overview of alternative data sources. His model uses the Current Population Survey (CPS) to get coverage and employer offerings. The CPS is more recent and has larger sample sizes, but lacks certain dynamics that other sources have. State analyses can sometimes use state surveys to recalibrate the numbers. Employer premiums and nongroup premiums come from MEPS-IC, and tax rates are imputed using current tax law from TAXSIM. The model is divided into four insurance groups (ESI, non-group insured, publicly insured, and uninsured), in a 3×4 matrix with three avenues to buy insurance: non-group, ESI, and public. He described aspects of his methodology, and gave examples of questions and variables involved in predicting consumer behavior.

He stressed the necessity of fully integrated policy analysis, which characterizes recent proposals, in order to consider all factors and variables at once. Otherwise, “path dependence” can lead to very different answers depending on “which order you stack them in.” As part of this approach, Dr. Gruber built his model to minimize knife-edge distinctions. He described his work modeling firm behavior, saying it is similar to what others are doing.
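Illustrative note (not part of the presentation): a toy calculation shows why the stacking order matters. It assumes two hypothetical provisions that each cover a fixed share of whoever is still uninsured when they are applied; the provision names and shares are invented for illustration and do not come from Dr. Gruber’s model.

```python
def stacked_effects(baseline_uninsured, reductions, order):
    """Apply provisions one after another and attribute coverage gains.

    Each provision is assumed to cover a fixed share of the people who are
    still uninsured when it is applied, so the gain credited to a provision
    depends on its position in the stack.
    """
    remaining = baseline_uninsured
    attributed = {}
    for name in order:
        newly_covered = remaining * reductions[name]
        attributed[name] = newly_covered
        remaining -= newly_covered
    return attributed, remaining


# Hypothetical provisions and shares.
reductions = {"subsidy": 0.30, "mandate": 0.20}

subsidy_first, left_a = stacked_effects(100.0, reductions, ["subsidy", "mandate"])
mandate_first, left_b = stacked_effects(100.0, reductions, ["mandate", "subsidy"])

print("subsidy first:", subsidy_first, "still uninsured:", left_a)
print("mandate first:", mandate_first, "still uninsured:", left_b)
```

In this toy case the combined result is the same either way, but the effect credited to each provision changes with the order, which is one reason to evaluate a package as a whole rather than piece by piece.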

Finally, he offered these caveats: First, this is a “garbage in-garbage out process”; the better the assumptions and evidence, the better the answer. Where no direct evidence is available, he recommends tying as much as possible to related evidence. Second, precision varies with the magnitude of change; i.e., the bigger the change, the harder to predict the impact, and “we need to recognize that uncertainty.” Finally (and key, he said), modeling and process, including key assumptions, should be as transparent as possible.

Bowen Garrett, Urban Institute

Dr. Garrett discussed the Health Insurance Policy Simulation Model (HIPSM), a microsimulation model of individuals, families (health insurance units/HIUs), and employers making coverage decisions. He noted that NCVHS members are “hearing a lot of variations on a theme now.” HIPSM builds on the Health Policy Center’s experience with the Health Insurance Reform Simulation Model, used to model reforms in Massachusetts. HIPSM is designed to cover a wide range of policies and to be expandable to ones not yet developed. Its output is detailed tables of the estimated effects of the reforms, including people’s coverage status. It covers four categories: uninsured, Medicaid/public, nongroup, and ESI. Among other things, the tables show the cost of reforms for different groups and changes in premiums. Output tables can be modified and extended for specific needs.

The model uses an individual-level data set that resembles the health insurance situation in the U.S. in a given year, for which the Institute uses as its core file the 2005 CPS, annual demographic file, matched and combined with data from other sources. Dr. Garrett stressed that they have to “bring all these variables together from different data sets,” and he speculated that “things are lost” in doing so. The model organizes workers into synthetic firms, and the data are adjusted to match various benchmarks such as income. Premiums are endogenous. Dr. Garrett explained more details of the modeling to simulate reforms, using a utility-based approach to behavioral effects. The utility functions are developed based on economic theory and are similar to those in the RAND model. He described the flow in the model and illustrated it with an example about Medicaid/SCHIP expansion. The steps are the changes brought about by reforms, their effect on workers’ willingness to pay, firms’ reactions, decisions by individuals/families, premium adjustments, and overall coverage. The model iterates until coverage is stable. Alternative assumptions, such as potential impacts of cost containment strategies and supply constraints, are incorporated into the results. The Urban Institute is conducting several studies using HIPSM, both national and state-specific, including looking at what would happen if there is no reform.
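Illustrative note (not part of the presentation): the idea of iterating until coverage is stable can be sketched as a simple loop in which the premium is reset to the average cost of current enrollees and each person then decides whether to stay covered. The decision rule, the function name, and the numbers are assumptions for illustration only; HIPSM’s actual utility functions and firm behavior are far richer.

```python
def iterate_to_stable_coverage(costs, wtp, max_iter=100):
    """Alternate between pricing the pool and re-deciding enrollment.

    costs: expected annual cost of each person (hypothetical values)
    wtp:   each person's willingness to pay for coverage (hypothetical values)
    Returns (premium, enrolled flags) once enrollment stops changing.
    """
    enrolled = [True] * len(costs)  # start with everyone in the pool
    for _ in range(max_iter):
        pool = [c for c, e in zip(costs, enrolled) if e]
        if not pool:
            return None, enrolled   # the pool emptied out; no premium exists
        premium = sum(pool) / len(pool)          # premium = average pool cost
        new_enrolled = [w >= premium for w in wtp]
        if new_enrolled == enrolled:             # coverage is stable
            return premium, enrolled
        enrolled = new_enrolled
    raise RuntimeError("coverage did not stabilize")


# Hypothetical five-person population.
costs = [1000, 2000, 3000, 4000, 6000]
wtp = [2500, 3500, 4500, 5500, 7500]

premium, enrolled = iterate_to_stable_coverage(costs, wtp)
print(f"stable premium: {premium:.2f}, enrolled: {sum(enrolled)} of {len(costs)}")
```

The loop makes the endogenous premium visible: each round of enrollment decisions changes the risk pool, which changes the premium, until neither moves.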

Regarding data needs, Dr. Garrett agreed with Dr. McGlynn’s list and pointed in particular to the need for 1) employer-employee linked data and 2) state-specific data combining demographics, coverage, health care expenses, and premiums, with large sample sizes. He added that analysts “do a reasonable job of compensating for the data we don’t have.”


NCVHS members raised the following topics for discussion: the use of elasticities and utilities and how the latter are estimated; whether and how the models are evaluated (“autopsies”); including health effects in the modeling; health disparities and minority group factors as variables; the need for more funding; modeling employer behavior in the current economy; and more on data wish lists. On the last, Dr. Breen asked what identifying information the panelists need to do matches, and what local/regional data would enable a more meaningful look at health disparities.

Panel 2

Jessica Banthin and Tom Selden, AHRQ

Dr. Banthin noted that she and her AHRQ colleagues “wear many hats,” in that they collect data and put out public use files, serve as a resource for other microsimulation modelers and researchers, and develop flexible microsimulation models and simulations themselves. Their statistical tools are available to others to help fill gaps. She talked first about MEPS Household Component (HC) data and household-based models and tools, and then about the MEPS Insurance Component (IC). These are public use files and products that are inputs to many of the other models described.

The MEPS household survey covers 35,000 persons in 13,000 households and is a one-stop data source for many key elements in the models. People are followed for two years. In addition to insurance coverage and employment, it includes detailed measures of self-reported health status and conditions. It includes non-group premiums and employee contributions to premiums, but the MEPS lacks the employer portion of the premium. Except for one year, it also lacks information on the adequacy of people’s coverage, which she noted is an issue of growing importance. She described several forms of augmented data that AHRQ has developed to make the MEPS more useful for modeling and simulation. Importantly, they align their data and periodically reconcile the MEPS expenditure estimates with the National Health Expenditure Accounts. This file is used to project forward to 2016. The failure to align benchmarks in the past accounted for “wildly different results” in earlier health care reform projections, based on different starting places. MEPS also has a detailed income and assets section that makes it possible to look at the tax effects of reforms.

Dr. Banthin referenced a paper by Selden and Sing, posted on the Web, and other papers based on MEPS-HC basic research to inform simulations. A key output regards the financial burden of health care as a function of family income, which can be estimated for specific subgroups.

Dr. Selden reiterated that AHRQ analysts are developing and enhancing data products and also doing their own modeling efforts. A key area in which they have been providing estimates is through the KIDSIM model, through which insurance take-up and the number of eligible uninsured children are estimated. It can also be used to track progress on public programs and to think about alternative scenarios, some of which he described. KIDSIM has been expanded to include program eligibility for non-elderly childless adults (PUBSIM).

Turning to MEPS-IC, Dr. Selden said a basic problem with the MEPS-HC is that it is based on a household survey; while it shows the worker and the spouse, it does not show the other coworkers in the worker’s firm. The reverse is true for data based on employer surveys such as the MEPS-IC. Both pieces are needed together, and that is difficult to do. Thus AHRQ has developed a methodology for statistical matching that brings together the HC and the IC. The process yields a very large sample with a good response rate. The augmented data are being used for microsimulation efforts, and more can be done with the model. He described some of the tools and their uses, and plans for further development. He invited the analysts to give him wish lists for when the two data sets are merged.
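Illustrative note (not part of the presentation): statistical matching pairs records from two files that share no common identifier by using variables present in both. The sketch below assumes a very simple rule (same industry, closest firm size) and invented field names and records; it is not AHRQ’s methodology, which the minutes do not describe in detail.

```python
def match_workers_to_employers(workers, employers):
    """Pair each household-survey worker with a similar employer record.

    Assumed rule: restrict to employers in the same industry, then take the
    one whose firm size is closest to the size the worker reported.
    """
    matches = {}
    for w in workers:
        candidates = [e for e in employers if e["industry"] == w["industry"]]
        if not candidates:
            matches[w["id"]] = None   # no donor record available
            continue
        best = min(candidates, key=lambda e: abs(e["firm_size"] - w["firm_size"]))
        matches[w["id"]] = best["id"]
    return matches


# Hypothetical records standing in for household and employer survey files.
workers = [
    {"id": "w1", "industry": "retail", "firm_size": 40},
    {"id": "w2", "industry": "manufacturing", "firm_size": 900},
    {"id": "w3", "industry": "mining", "firm_size": 10},
]
employers = [
    {"id": "e1", "industry": "retail", "firm_size": 25},
    {"id": "e2", "industry": "retail", "firm_size": 500},
    {"id": "e3", "industry": "manufacturing", "firm_size": 1000},
]

matches = match_workers_to_employers(workers, employers)
print(matches)
```

Once a worker is matched, employer-side items that the household survey lacks (for example, the employer share of the premium) can be attached to the worker’s record.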

John Sheils, The Lewin Group

Echoing other panelists, Mr. Sheils said this kind of get-together is “good therapy” for modelers. The Lewin Group has had its model, the Health Benefits Simulation Model (HBSM), for 22 years and has analyzed several different plans. It recently did a comparison of the impacts of the proposals of President Obama and Senator McCain, and with the Commonwealth Fund modeled ten different Congressional plans.

The model is set up to model Medicaid and SCHIP expansions, premium subsidy programs, pay or play employer contributions, insurance market regulations, and changes in consumer price incentives. It uses MEPS data as the baseline database because it depicts the characteristics of the population, both individuals and employers. (He commented on why they favor MEPS, partly because it is conducive to a month-to-month simulation methodology like theirs. He noted that the month-to-month simulation feature is complicated but useful for policy.) The model also uses CPS updates on coverage and population and SIPP data on Medicaid enrollment. He said they “rely heavily” on data from the CMS Office of the Actuary, and they use CBO projections whenever possible.

He discussed the thinking behind the modeling process, how the model is developed and used, and how they bring in data from multiple sources. The model has a synthetic firm portion. (He noted that Lewin uses the word “synthetic” to warn people that they are “concocting something from the available data,” likening this to the way “certain poisonous frogs are a bright color, to warn people not to go near them.”)

He then presented illustrative data from Lewin’s modeling, commenting on some of the contradictions among data sources that make the modeling difficult and result in differing results. For example, there is underreporting of Medicaid coverage in the CPS. He concluded with his own data wish list: adding state identifiers to MEPS; stabilizing SIPP longitudinal coverage data; program cost data by eligibility group to match proposed expansions; and MEPS employer micro-data.

Joan Turek, HHS/ASPE and Linda Giannarelli, Urban Institute

Ms. Turek, the government project officer, and Ms. Giannarelli, the Urban Institute project director, discussed TRIM3—the Transfer Income Model, Version 3. TRIM is a modeling system that models all the major Federal tax and transfer programs affecting U.S. households. It has been in use for more than 30 years. It applies program rules to person records from survey data, using as primary data the CPS Annual Social and Economic Supplement (CPS-ASEC), augmented by other data sources. (CPS is adding point-in-time health insurance questions.) There is a web-based interface with a public use version. The programs TRIM models are health (Medicaid, SCHIP and ESI), cash income, non-cash benefits, and taxes (including medical tax credits). Its purposes are to impute eligibility for means-based programs, to correct for underreporting by imputing participation, and to compare annual baselines with “what if” simulations of alternative policies. Ms. Turek also outlined the quality review arrangements for TRIM.

Ms. Giannarelli provided further details of TRIM’s Medicaid and SCHIP modeling and ESI modeling. The modeling of eligibility is done on a month-by-month basis, capturing state-specific rules for each type. She showed some of the resulting data, such as eligibility by main reason for eligibility and by user group. TRIM is also used to model enrollment, correcting for underreporting in the CPS data for all social programs. She showed examples of Medicaid and SCHIP alternative policy simulations, and described uses of the model with respect to employer-sponsored insurance.

Work planned for the future includes examining CPS-MSIS match data to compare simulated eligibility and enrollment with actual enrollment; completing 2006 baselines; improving Medicaid spend-down modeling; and developing a long-term care module. Finally, Ms. Turek showed the steps TRIM goes through to do a simulation.


To a question about comparisons of the models, Dr. Bilheimer described such an analysis by Dr. Sherry Glied of Columbia comparing three national models, funded by RWJF. The major lesson learned was “that there is an enormous amount to be learned from having multiple modeling groups modeling.”

Dr. Green observed that for him, the take-home message of this meeting is “how unnecessarily complex the system is.” He raised questions about geocoding and privacy considerations.

Dr. Scanlon referred to a recent IOM report on HIPAA privacy rules and the impact on research. He wondered how to convince the public that the rules for disclosure should be changed. Panelists offered ideas on how to increase protections and public confidence, such as with a certification program. The need for greater centralization of data collection was noted, and panelists described steps their programs are taking to protect privacy. Overall, Dr. Green said, the issue is to find ways to reap the potential benefits of research for population health while minimizing the risk of harm and allaying fears. He welcomed suggestions from the panelists on how to do this.

Dr. Steinwachs asked if data are available to project the potential health and related benefits from universal health insurance. Dr. Selden said the financial dimension of full coverage for families would be easy to tabulate. Dr. Banthin noted that even countries with universal coverage have health disparities. Mr. Sheils observed that quality of care remains the more important determinant of health outcomes.

Dr. Scanlon asked the extent to which the characteristics of policies are built into the models, and raised the issue of insurer behavior. Mr. Sheils said that selection effects are difficult to model, and other panelists also had comments. Some plans model approaches such as medical savings accounts.

Wrap-up and Final Comments

Dr. Scanlon reiterated that the challenge for NCVHS is how to think about and communicate to policy makers the reasonable balance between risks and benefits. Dr. Hornbrook pointed out the “really bad” data lag for this work, and noted the terrible current state of the economy and the anticipated impacts on Medicaid enrollment and state budgets. This “strange situation,” he said, “is way beyond some of the dynamics in the models we have now.” Dr. Green pointed to the need for new thinking and a new framing of the question in view of the “desperate emerging need.” Dr. McGlynn agreed, noting that the data infrastructure is “unbelievably inadequate” for what needs to be done to inform day-to-day decision-making; she asserted that this “needs to be the focus.”

Dr. Suarez offered a recap of the major points and themes raised during the hearing. First, he noted these categories of needed data and analysis:

  • Increase the size of population surveys
  • Have better longitudinal data sets
  • State specific data
  • Better understanding of the sub-national level
  • Linking employer and employee data sets
  • Better employer micro dataset
  • Better non-group market data
  • Look at subsets of the population such as minority health

He then identified these themes and ideas that emerged in the discussion:

  • Privacy issues and constraints on research due to privacy protections; the idea of a certification program for data users; and
  • The question of whether “a different approach altogether” is in order.

In conclusion, Dr. Steinwachs said the Subcommittee and Committee would continue to examine this area of data needs—a part of the vision for the NHIN that is moving “all too slowly.” With that, he adjourned the meeting.

I hereby certify that, to the best of my knowledge, the foregoing summary of minutes is accurate and complete.


Chair Date