[This Transcript is Unedited]
Department of Health and Human Services
National Committee on Vital and Health Statistics
Hearing on Minimum Data Standards for the
Measurement of Socioeconomic Status in Federal Health Surveys
March 9, 2012
National Center for Health Statistics
3311 Toledo Road
Hyattsville, MD 20782
CASET Associates, Ltd.
Fairfax, Virginia 22030
TABLE OF CONTENTS
- Call to Order and Welcome/Introductions – Larry Green, M.D., Co-Chair
- Summary of Previous Day – Vickie M. Mays, Ph.D.
- Panel: Data Linkages
- Jennifer Madans, Ph.D., NCHS
- Jennifer Parker
- Fritz Scheuren, Ph.D., NORC
- Panel: Methodology
- John Czajka, Ph.D., Mathematica
- Linda Giannarelli, M.A., Urban Institute
- Susan Queen, Ph.D., ASPE
- Committee Discussion
P R O C E E D I N G S (8:46 a.m.)
DR. GREEN: Good morning. Welcome back to the National Subcommittee meeting
on Population Health concerning the minimum data standards for the measurement
of socioeconomic status in federal health surveys. This is day two of a very
interesting meeting. We’ll commence by introducing ourselves around the table
first, then we’ll go to people on the phone, and then audience. I’m Larry
Green, a member of the full committee, co-chair of the Population Subcommittee.
I have no conflicts.
DR. MAYS: I’m Vickie Mays, University of California Los Angeles. I’m a
member of the full committee, and I’m chairing the hearing. I have no conflicts.
DR. QUEEN: Susan Queen from the Assistant Secretary for Planning and Evaluation.
DR. BREEN: Nancy Breen, economist at the National Cancer Institute.
MS. JACKSON: Debbie Jackson, National Center for Health Statistics,
MS. GREENBERG: Marjorie Greenberg from the National Center for Health
Statistics, CDC, and Executive Secretary to the committee.
DR. SUAREZ: Walter Suarez with Kaiser Permanente, a member of the committee
and the co-chair of the Standards Committee and a member of the Population Subcommittee.
DR. COOPER: Dr. Leslie Cooper, NIH. I have no conflicts.
DR. CORNELIUS: Dr. Llewellyn Cornelius, chair of the NCHS Board of
Scientific Counselors and faculty member, University of Maryland. I have no conflicts.
MS. O’HARA: Amy O’Hara from the Census Bureau’s Center for Administrative
Records, Research, and Applications. No conflicts.
MS. PARKER: Jennifer Parker from NCHS’ Office of Analysis and Epidemiology.
MS. GREENBERG: Let me spare you all, because only the members can have
conflicts. The rest of us have conflicts also, but not for this purpose.
DR. BARON: Sherry Baron from the National Institute for Occupational Safety
and Health, CDC.
MS. JONES: Katherine Jones, CDC National Center for Health Statistics and
staff to the committee.
MS. COOPER: Nicole Cooper, staff to the committee.
MS. HOLMES: Julia Holmes, Division of Biostatistics, National Center for Health Statistics.
DR. MADANS: Jennifer Madans, National Center for Health Statistics.
Agenda Item: Summary of Previous Day
DR. GREEN: The first item up this morning is for us to remind ourselves
about the purpose of our meeting. Dr. Mays is going to help us review the
results of yesterday and where we’re arriving this morning. Vickie and I
thought that we should start off restating our purposes. Mr. Jim Scanlon did a
nice job summarizing this for us at the beginning of the hearing yesterday
morning. There are really three foci for our work here.
The first question we’re really focused on is what is the state of the art
and the standards for collecting data to measure SES in federal surveys today.
Jim also reminded us to make a distinction between possibilities and standards,
with standards being a higher bar. The second issue is what are the variables
that are being collected? The third is what opportunities exist to standardize
these variables across federal surveys? With that, I’m going to turn this over
DR. MAYS: What I’m going to do is spend a bit of time talking about
yesterday just to try to lead us to today. We were really very fortunate in the
kind of stellar presentations that we had yesterday. Where we find ourselves is
to talk about data linkages and methodology today.
As background to yesterday, we started off talking about trying to get a
good sense of the definition of socioeconomic status. In thinking about
socioeconomic status, what our presenters were helping us to really focus on
is the components.
As all of you know who were with us yesterday (and if you’re just joining
us, I want to kind of recap just a bit), socioeconomic status is an issue
that has been looked at. There is a significant amount of work, and there are
many ways in which to look at it. The discussion that we had yesterday was
looking at the issues of income, education, and occupation.
One of the issues that was raised in terms of thinking about our current
focus on socioeconomic status, is whether or not we wanted to include the
issue of class. The issue of social class did come up. We worked a bit back and
forth about the American perspective on social class versus looking at social
class in other countries. We decided to discard social class as an actual area
that we’re probably going to focus on.
But instead, we will make sure that we pay attention to things like social
status, prestige. Going forward, that will actually be kind of one of the
litmus tests as we think about the issues of education, income, and occupation,
because in the literature on socioeconomic status as well as socioeconomic
position, what distinguishes it is really the issue of where it is that you sit
in society, and sitting in that place in society, its implications for health.
I think that we had some great presentations yesterday to help us make some
distinctions about that.
Moving forward, what we began to do was to look at the various components.
The first panel that we had, which we had stellar presenters on, Dr. Bauman and
Dr. Wong — what we heard was the ways in which education can stand as a proxy
for other things. It was like struggling between whether the focus is
educational achievement or educational attainment.
We were faced with listening to some of the issues around the quality of
education and how, for example, in a federal assessment when you have data
collection that says one has a certain grade level, whether that grade level is
really equivalent. Are there methods for being able to determine equivalency? I
think what we heard in the education panel is not only to be careful about our
definitions, but to be careful about what we pick. Should we pick and make a
suggestion about a standard? We should be very careful about making sure that
it’s something that’s going to serve us well.
We also heard kind of steps to move forward in terms of thinking about ways
in which to look at education that may add quality to the measurement of it so
that it is within the context of socioeconomic status. Again, each time what
I want to do is make sure I bring us back to we’re not just measuring education
for the sake of education. We’re talking about these items as a part of a
construct, which is socioeconomic status.
The next panel that we had, which was equally exciting and equally
challenging, was the panel on income. In thinking about income
relative to socioeconomic status, it’s very complex. I think it’s probably
for us going to be something that we’re really going to struggle with.
We heard some very clear messages about continuous variables, about making
sure that the measurement that is collected is one that the user can actually
modify in ways in which it fits their particular needs. This was a panel, as
well, that raised issues about there are many different ways that income is
collected in the surveys. There are many reasons why, because the surveys vary
in what they’re using the income variable for. We also heard that we should
consider the poverty measure, the new approach to the measurement of poverty,
and look at that for the value of that particular approach.
Finally, the thing that I was also heartened about is that there were some
specific directions given to us in terms of people’s learned opinions about how
it is we should measure this in terms of recommending it, if we do, as a
standard, what the minimum would be.
Finally, our last panel was the occupation panel. I think we were very
excited also by the occupation panel in terms of understanding lots of changes
that have taken place. We heard a report about O*NET. We heard a report about
I think that going forth in terms of occupation, some of us thought that
this was an area in which we would really be struggling with what to do. But I
think, as we heard in the presentations, there are a lot of very exciting
things going on in terms of the way data is collected, the way data is being
coded to make the usage a bit better going forth.
I will stop here and invite my colleagues to also give any input in terms of
any additional comments that they want to make. I was trying to do just top
highlights as opposed to our next stage, which would be the deliberations.
DR. QUEEN: Susan Queen from ASPE. I just want to express my thanks to the
committee for taking this on. Keep in mind that the challenge for our surveys
in trying to implement any new standards, what we’re facing now with the
standards that were just adopted, which are perhaps much more straightforward
than anything related to socioeconomic status variables. Even implementing such
changes of those standards is definitely going to be a challenge.
It’s those kinds of things that we have to keep under consideration as we’re
moving forward with this, the survey purposes, survey costs, survey time, et
cetera. Just want to keep that in mind. I appreciate your taking this under
DR. BREEN: I want to thank the committee chairs and the chair of the hearing
for taking this on and doing such a great job and finding such great speakers.
I know we’re here to talk about standardizing SES, but I think some of the good
news that we found yesterday was that there’s already a lot of standardization
There’s already a lot of good thinking about relatively minor tweaks, like
in the poverty measure there are ways to improve that that have now been
explored for about two decades. We may want to consider improving the poverty
measure, for example, and to urge the department either through the Data
Council or through some other mechanism to have a more ongoing dialogue among
the different surveys. Maybe that would be ASPE.
I’m not sure where that would come out of, but that we consider that
possibility, because I think there’s an interest on the part of all the surveys
to be more consistent in the way the questions are asked. As we found, it
doesn’t look like the task is really too difficult. I would urge us to consider
initiating a dialogue like that or conversation like that that we can do
regularly in order to make and keep the surveys fairly well harmonized.
Also, I think what we learned was that there’s an enormous amount of
innovation that’s going on in these surveys all the time. Of course, we want to
encourage that and keep that going and keep the dialogue going there so that as
good ideas are brought up and implemented in one survey, they can be used in
DR. GREEN: Walter, are you there? I know that you were on the line
throughout part of the day yesterday. Do you have any comments that you’d like
DR. SUAREZ: Absolutely. The comments of the presenters were superb. I add
my thanks to everyone. The comments I want to make actually come from my
perspective on the Standards Subcommittee. Generally speaking, we know that we
have sort of three different major sources of information about healthcare.
One is in the surveys. Of course, I understand that the focus of our hearing
isn’t that. The other two major sources are the medical record of patients and
the administrative data that is captured and maintained, for consumers,
enrollees, members, and patients in the healthcare system.
I just wanted to bring the perspective that I think it’s going to be
important and useful to create a mapping of the way we would be looking at
recommending standards for capturing this type of information, socioeconomic
status information, in population-based surveys.
Then map that with the standards that have been and are being adopted for
how to capture this same information in electronic health records, for example,
and in administrative data. It will be very useful to try to ensure that there
is some level of mapping and harmonization, if possible, across and between
those three major sources.
We all know with Meaningful Use, the Meaningful Use standards are adopted
for data that needs to be captured or that electronic health records might be
capable of capturing. It includes some of the elements that we are talking
about with socioeconomic status.
Then on the administrative side in the administrative world there are
standards that are being used to capture in enrollment forms and in enrollee
data and in the claims data and in the reporting of encounters. There are
standards that have been and are being adopted to capture socioeconomic status.
This, of course, doesn’t have to be done during the hearing, but in the
background after we complete the hearings and begin to work on the information
that we receive in testimonies, we can look at mapping this to the data and the
standards that we use to capture this information in electronic health records
and in administrative datasets.
DR. GREEN: Thank you, Walter. Dr. Cooper, I know you were on the line quite
a bit yesterday also. Do you have any comments?
DR. COOPER: No. I just want to applaud the committee in setting up this
panel and selecting the speakers that shared information with us. This is
unbelievable information and very enlightening. I’m very interested in terms of
how do we actually look at the issue in terms of education, given the fact that
requirements for completing a high school degree vary within states, as well as
across states, so how do we adjust for that as some type of a proxy measure for
moving forward with SES?
I definitely would like to thank Walter for bringing out the point of the
medical records, because if we move closer and closer to having increased
utilization of electronic medical records, that’s an unbelievable data source,
and we need to think now in terms of how to capture SES.
MS. JACKSON: From the staff perspective, I want to follow up with what Nancy
mentioned. That is the value of this kind of interchange, because that came up
when Larry asked last night, where does this kind of communication occur.
Of course, the Data Council is there, but that’s at a different level than
what’s really kind of going on here where the kind of details from various types
of groups that generally would not talk to each other otherwise, at this level
of machination of looking at the elements for data collection and really
getting to the heart and soul, the undercurrent, that came through in the
income, for me, where the code is one thing, code for a position, but what is
actually happening when you change location from an assistant in the White
House to assistant to somewhere in the neighborhood for a family? That kind of
communication and interchange really kind of came through. I’m looking forward
DR. QUEEN: Can I follow up on Nancy’s comment? When the recently adopted
standards were developed or agreed upon, there was a lot of collaboration.
Under the auspices of the Data Council, there was a work group formed. It
involved representatives across HHS, OPDIVS, OMB, and Census Bureau. I would
expect that kind of continuing collaborative process.
DR. KAPLAN: Just one quick comment on Walter’s comment about electronic
medical records. There is a government effort to harmonize the psychosocial
components of the electronic medical record, largely by Russ Glasgow at the
National Cancer Institute. I think it would be nice to try to coordinate these
MS. GREENBERG: In that regard, this summer I was at what they called, summer
camp, but it wasn’t exactly the way I remembered summer camp. It was from the
HIT Policy and Standards Committee, that were part of the HITECH Act out of
ONC. They were vocabulary and clinical quality measures work groups that I was
We had weekly teleconferences to try to identify all the different standard
vocabularies or code sets for a variety of measures that are being required in
Meaningful Use. They, again, looked to what the Data Council had done in these
first areas of primary language and race, ethnicity, et cetera.
Then when SES came up, they specifically deferred to what was going to come
out of the next round of investigation by the Data Council, which now the
subcommittee is helping to collaborate with. I think there is very much of an
awareness of where this is taking place, where these discussions are taking
place. Once this process is worked through, then they’ll pick that up again.
DR. SUAREZ: One specific example of real opportunities to not just try to
create a consistent harmonization of these standards, but begin to even expect
that some of this information is added to the capturing and the maintenance in
systems, on the occupational information side, recording of occupations.
I wanted to just enter into the record the report that was published back in
September of last year not too long ago by the Institute of Medicine that
focuses specifically on (this is the title of the report) Incorporating
Occupational Information in Electronic Health Records.
There was a letter report format issued by the Institute of Medicine, and
the conclusion basically was there are basically at this point three important
data elements that are mature enough to be incorporated into EHR capabilities
requirements. Those were occupation code, industry code, and work relatedness.
I think we would have an opportunity here out of this hearing and in light
of, for example, the current review of Meaningful Use requirements for states
to consider a recommendation to the Secretary and to CMS and ONC regarding the
incorporation of this type of information into the EHR capabilities. So we move
the bar a notch forward with respect to capabilities for capturing SES in
electronic health records.
I just wanted to enter that on the record. I’d be happy to share the URL, if
people would like to see it, where you can see a copy of this report.
DR. MAYS: Walter, there is going to be a response to you, but I would
suggest that you say what the URL is so the people online who are listening are
able also to access it. Dr. Baron from NIOSH is going to comment.
DR. BARON: Thank you very much for that comment. Actually, NIOSH was the one
that commissioned that report for the IOM. We have a very active work group on
electronic health records and would be very happy to collaborate with you.
There’s an effort now to try and get things ready for phase three of the ONC.
We would be very happy to work with you on that and have quite an active team
that’s been involved in this activity.
DR. SUAREZ: That is terrific. I won’t be able to repeat the URL because it’s
so long that it would take me 15 minutes.
DR. GREEN: I think we are ready to proceed to our next panel. We have an
hour and a half. Yesterday some of the most productive and useful parts of the
meeting were when our presenters started asking each other questions and
talking to each other. It was in the interaction. We are anticipating that
we’ll continue that model with these panels. We have an hour and a half for
presentations and also then for interaction and reactions to them. Vickie’s
going to lead us through these next few panels.
DR. MAYS: Thank you for being here. What we’re going to do now is move to a
panel in which we’re going to talk about data linkages. I think today is a day
that’s very important to us. Starting the day will be Jennifer Madans, who is
the Associate Director for Science here at the National Center for Health Statistics.
Agenda Item: Panel: Data Linkages
DR. MADANS: Thank the committee for inviting us. I’m going to make some
brief opening remarks, and then I’m going to turn it over to my colleague
Jennifer Parker who leads the linkage program at NCHS. I think I bring the
historical perspective, but I realize that I probably do on everything because
I think I’ve been here now longer than most other people. I also wanted to
start with some reporting on behalf almost of OMB and the chief statistician
because some of this activity is coming out of her office.
But before I do that, I just wanted to comment on some of the discussion
before this on electronic medical records because I think it does have
something to do with data linkage. I think many of you know at NCHS we have a
long history of using health records as a source for our provider surveys.
One thing we have learned from that, as well as from actually the Vital
Statistics System, is that when you don’t have control over the source of the
information, you sometimes are not happy with what you get. On our surveys, as
we discussed yesterday about education and income and occupation, there are a
lot of challenges in collecting some of that information. There is a lot
missing on income. If people don’t describe what they do in their job in
sufficient detail, you can’t code it. Education’s a little bit easier, even
just talking about attainment.
But we have control over that. We write the questionnaires. We train the
interviewers. We have all kinds of fancy things on the computers that we use to
collect the information to probe and to get the information that we need.
When we don’t have that control, when someone else is the primary data
collector, you have much less control over the quality of that information.
When you’re talking about items that are perhaps not yes, no, check something,
or something that people may not be all that interested in providing in that
context, you have to be very concerned about the quality of that information.
I’m not saying that one shouldn’t pursue and that this isn’t a worthwhile
endeavor, but I feel duty-bound to say that one has to also be very concerned
about quality of that information. If we get occupation on the death
certificate, we’re not exactly sure how good it is. If we get educational
attainment on a birth certificate of a mother, we really have no way of really
showing the quality of that.
Other kinds of evaluations suggest that that information is not of the same
quality that you would get on a survey. We do have control over the data
collection process. Actually, it’s through linkage that we try to fix that,
that we have ways of combining data so that we can address some data quality
issues. Jennifer will talk about that a little bit, I believe.
Let me go back to OMB. This also kind of relates to other things that have
come up about just how much coordination is there in the statistical system.
What do we talk about? I don’t quite agree with Debbie. I think there’s a lot
of this conversation that goes on all the time, particularly in the area of
I think there’s been an acceptance for a very long time that data linkage
has a lot of advantages for the statistical system both from a point of view of
data quality, but also in terms of cost. This is not to say that data linkage
is free. It is not free. It has extensive costs, but relative to primary data
collection, they are less. I think the statistical system in its individual
parts and across the system would like to make its best use of all sources of
data as possible.
Many of the individual agencies, NCHS in particular, have been doing data
linkage for a long time. I think we started in the ’80s, at least linking to
mortality data. It has not been an easy process. I think that one of the first
agreements we tried to get — we weren’t even linking it, we just had to get an
agreement with the other agency — took us four years to do that, to just come
to an agreement.
At the statistical system level I think this has been recognized in that by
working together, we might be able to address some of those joint issues
better. There are committees. There is the Federal Committee on Statistical
Methodology — I mentioned that yesterday — that is specifically dealing with
administrative records and how to best use them.
Also, there is the Interagency Council on Statistical Policy, which is all
of the statistical agency heads get together. It’s chaired by Kathy Wallman,
the chief statistician. There are major activities involving the heads of the
agencies to try to develop practices and guidelines so that it is easier to get
access to the administrative data to bring the statistical community in itself
to other parts of the government, bring us together so it’s easier for us to
But then once we can share it, how do we do the linkages? It’s not exactly
straightforward how to do that. What do we do when we don’t have linkages? How
do we analyze the data? How do we make the data accessible to other users? It’s
no longer confidential when you do all this data.
There are a lot of things going on now that I think will improve data
linkage across the federal system. There will be a lot more consistency in how
we do it. We are happy about moving forward on that. NCHS is happy to be one of
the key agencies that’s been involved.
Coming back to the specific topic of this hearing in terms of the standards
and what one might do, there’s a lot you can do with linkage, but you have to
have something to link to. We like things to link to that are national
databases, that are consistent across geography, and that contain the whole
population that we’re interested in.
Mortality is a good one for us to link to. We actually have all of the
deaths, and we can link to them. But in terms of linking to a file that has SES
data on it that we could put on our surveys rather than ask about it, there’s
not a whole lot. There aren’t nice little inventories of everyone’s occupation
or everyone’s education.
There is information on income. We do have tax records. We have earnings
records. Those are very hard for us to get access to. There are a lot more
hoops to jump through to get that kind of information. One has to weigh the
cost of doing that with what you’re going to get.
I guess it’s been our opinion so far that that’s not worth it. First of all,
it takes a lot of time. We have to ask it anyway. I think we’ve looked more to
those kinds of linkages, if we could do them, as expanding on the information
that we would have on SES rather than using it instead of asking on the survey.
That may change in time, but right now for our data collection systems, the
population data collection systems, we’re probably going to have to ask about
these basic SES variables.
We did do a project, but I can’t remember how long ago it was. I tend to
tell people everything happened last year, but it must have been maybe five or
six years ago. We did a statistical link to CPS because CPS does get a lot more
detail in terms of income than HIS does. The idea was if we could do a
statistical match, not a real match, not an in-person match, that we could pull
that information to augment the survey. It was a complicated process, took a
fair amount of time. I think we have not done it again because it was so
time-intensive. But it is another possibility should the linkage become easier.
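[Editor's illustration, not part of the testimony: the statistical match described above can be sketched as a nearest-neighbor "hot deck," in which each health survey record borrows detailed income from the most similar record in an income survey, judged on variables both surveys collect. The field names, the distance function, and the toy records below are assumptions made for illustration only; the testimony does not describe the method NCHS actually used.]

```python
# Hypothetical sketch of a statistical (not person-level) match.
# All field names and records are invented for illustration.

def distance(a, b, keys):
    """Sum of absolute differences over the shared matching variables."""
    return sum(abs(a[k] - b[k]) for k in keys)


def statistical_match(recipients, donors, keys, borrow):
    """Copy the `borrow` fields from the nearest donor onto each recipient.

    The donor is a similar record, not the same person, so the borrowed
    values are plausible stand-ins rather than true linked values.
    """
    matched = []
    for r in recipients:
        donor = min(donors, key=lambda d: distance(r, d, keys))
        out = dict(r)
        for field in borrow:
            out[field] = donor[field]
        matched.append(out)
    return matched


health_survey = [
    {"id": "H1", "age": 34, "educ_years": 12, "family_size": 3},
    {"id": "H2", "age": 61, "educ_years": 16, "family_size": 2},
]
income_survey = [
    {"age": 33, "educ_years": 12, "family_size": 3,
     "wage_income": 41000, "transfer_income": 0},
    {"age": 60, "educ_years": 16, "family_size": 2,
     "wage_income": 78000, "transfer_income": 1200},
    {"age": 45, "educ_years": 10, "family_size": 5,
     "wage_income": 29000, "transfer_income": 3500},
]

augmented = statistical_match(
    health_survey,
    income_survey,
    keys=["age", "educ_years", "family_size"],
    borrow=["wage_income", "transfer_income"],
)
for row in augmented:
    print(row["id"], row["wage_income"], row["transfer_income"])
```

[In practice the matching variables would need to be scaled and the survey weights respected, which is part of why such a match is time-intensive.]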
To get SES information about the individual people in our surveys, I don’t
see that happening in the near future. There are other kinds of linkages that
we can do that would add to that. We can do a lot of other kinds of linkages
through other data systems. I’m going to turn it over to Jennifer Parker to
talk about that a little bit.
DR. PARKER: I hope I don’t repeat too much of what you’re saying, but I just
want to emphasize that I’ve just got into the data production world, but I’ve
been a longtime data user. Typically we do rely heavily on the information
collected in the survey — it’s pretty high quality — to augment what we get
with the linked files.
Not only do we have more control over it, as Jennifer said, but if it’s
missing, we have developed wonderful imputation models for the Health Interview
Survey, so we can use all sorts of information collected on the survey and by
the design frame to even make our data better. Again, we have control over
When we link to mortality, for example, many people have used the SES
information to look at disparities in mortality outcomes. It’s not just the SES
information; it’s the race and ethnicity information, which isn’t the topic
here, but because all these things interplay with each other, being able to
simultaneously look at both of these things in terms of mortality outcomes with
other admin records like the CMS data, Medicare utilization, and experiences
with Medicare program. Most of our data users who come to the Research Data
Center to use these linked files are very interested in the socioeconomic
status information on the surveys.
When we do these linkages, as Jennifer said, we like to have the entire
world, or at least the whole country. We need the whole universe of people who
are eligible. Deaths is a good example. We also have program participation. We
have the Medicaid population.
Not everybody is in Medicaid, but when we link to these sorts of sources, it
allows us to examine things, like using our information about SES, how these
things factor into program participation. The fact that program participation
changes over time and we have a single measure of income provides some analytic
challenges, but I think that’s part of the strength of our surveys, to be able
to uncover some of these things.
The other thing that linking to programs has allowed us to do in pilot
studies — we have a pilot study with Texas where we’re linking up to food
stamp information. We’re able to actually use that linkage to make some of our
questions to determine program participation better. I think that some of these
linkages aren’t going to be directly related to asking SES on the surveys, but
are able to give us a better picture of the socioeconomic status of the program
I’ll just say we do have another pilot project linking to a single state,
the Florida Cancer Registry data. We took one year of our Health Interview
Survey and linked to multiple years of the Florida Cancer Registry, not related
to SES. But I’d like just to point out that about a third of the people linked
in the Florida Cancer Registry were not in Florida during the survey. They were
in another state when we conducted the survey.
When you link to subsets of the country, for example, if you could get all
high school records from California, you’re not necessarily going to get the
people who were in the survey in California. You might get people who moved
there from Montana. That provides some analytic challenges, but it means that
what you’re getting isn’t necessarily what you think you’re getting. We’re
working on some of these analytic issues.
Finally, we haven’t really talked about the contextual information. We are
doing a little bit more of that in house, but I know that one of the biggest
uses of the Research Data Center is to attach contextual information to our
We are in the process of coordinating some of the Geocodes in house. It’s
fairly straightforward. If we can use the census data at the census tract, at
the census block, at the county level, we can just merge to outside information
on the median income, that provides a way of augmenting the socioeconomic
status data in our surveys. There’s a huge literature on this, as most of you
are probably aware of.
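[Editor's illustration, not part of the testimony: the geographic merge described above amounts to a lookup from each respondent's geocode to an area-level table. The tract codes, income values, and field names below are invented, and the function is a minimal sketch under those assumptions rather than the NCHS procedure.]

```python
# Hypothetical sketch: attach tract-level median income to survey records.
# Tract codes, dollar values, and field names are invented for illustration.

tract_median_income = {
    "24033800100": 58000,
    "24033800200": 91000,
}

survey_records = [
    {"id": "R1", "tract": "24033800100", "family_income": 40000},
    {"id": "R2", "tract": "24033800200", "family_income": 40000},
    {"id": "R3", "tract": None, "family_income": 52000},  # geocode missing
]


def attach_context(records, area_table, key="tract",
                   new_field="tract_median_income"):
    """Return copies of the records with the area-level value added.

    Records whose geocode is missing or absent from the table get None,
    so the individual-level data are kept and the gap stays visible.
    """
    merged = []
    for rec in records:
        out = dict(rec)
        out[new_field] = area_table.get(rec[key])
        merged.append(out)
    return merged


with_context = attach_context(survey_records, tract_median_income)
for row in with_context:
    print(row["id"], row["family_income"], row["tract_median_income"])
```

[R1 and R2 report the same family income but live in tracts with different median incomes, which is the kind of added contrast the contextual data provide; the individual-level income is kept alongside it.]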
From a statistical purpose from our agency, there are a lot of challenges in
deciding what the right unit of analysis would be, and still we have our
individual level of data. We know that there’s a lot of variation even within
these units. We would always defer back to our individual-level data, even if
we are augmenting with external population or area-level information. We’re
doing a lot of research in this area too.
In short, I think our surveys have good information on the socioeconomic
status that we use for the record linkages and for the geographic linkages. I
don’t see us getting better data from the administrative records.
DR. MAYS: Tell us where you’re from and your position there.
DR. O’HARA: I’m from the US Census Bureau. I’m in charge of the
administrative records research section in the Center for Administrative
Records Research and Applications in the Research and Methodology
Directorate. The center I work in is abbreviated CARRA, which is handy because
it rhymes with my last name.
Within the center and during my time at Census, like Jennifer Parker, I’ve
been a data user. I’m going to give you some information about the record
linkage capabilities we have at Census and also echo the information that both
Jennifers have presented, that record linkage works well when you have data to
But that said, in my time at Census I’ve conducted record linkage with the
American Housing Survey, the American Community Survey, the CPS ASEC and the
SIPP. Looking at those surveys, each one of those has offered unique challenges
whenever you need to validate the records and get them ready for record
I know that my colleagues at NCHS have experienced this. You don’t always
have consent to link all records in a survey, and you don’t have the ability to
link all records in the survey. So there are data quality issues with the
endpoint data. For the topic of your hearing this is very challenging because
the best information that you’re likely to get on income, education, and
occupation will be attached to those survey data. You have to understand how
you’re going to address incomplete linkages that often result from missing
One of the large projects that we’ve been conducting in the center at Census
is a match of administrative data to the 2010 Decennial Census results. To give
you an idea of the scale of missing data, 10 million of the census records
lacked name or date of birth information, which are our key identifiers that we
need to conduct record linkage. If you’re looking to conduct a person-level
match, those records are off the table right away.
But similar to what Jennifer has said, if you don’t need to do a person
match, if you can do an address match, whether it’s to the actual apartment
unit or to the building itself or to a broader area unit such as block or tract
or county, there are many opportunities to take data that can be assembled from
administrative sources that could be very useful for your purposes.
Back to the broader array of what we have at Census, we’ve conducted record
linkage projects to evaluate data quality. We do collect the income information
on all of those surveys, but we want to benchmark it against another source of
income data. We’ve matched to both the W-2s and the IRS 1040s. I believe Fritz
is going to be talking about this in a few minutes, so I won’t labor on that
We have also matched the surveys to the Medicaid data, to assisted housing
data, to FHA loan data, to understand whether the information that we’ve
collected in the survey seems to be capturing the information as indicated in
the administrative records data. The sort of match that I’m referring to is a
direct match. We’ve looked to find the same person in both files.
Other programs at the Census Bureau have used administrative records in
indirect applications. A good example of that is our small area estimates
branch. They conduct the SAIPE program and the SAHIE program. These programs
take the survey data that we’ve assembled and augment it with administrative
records as predictors for the estimate of interest.
The direct versus indirect is key there to understand where the sources are
available. I believe for the SES you would be looking for more indirect and
relying on the survey data points that my other colleagues at Census, I’m sure,
What Census can do, because I believe I was invited here to discuss what
Census could do in terms of data linkages, is we have some data at Census —
and as I mentioned, Fritz will discuss these linkages to the IRS data. Title
26, the IRS tax law, states that they are to provide extractive data to Census
for Title 13 benefits.
That means that Census can get tax data and use it in analyses, provided
they have a census benefit. That’s going to get pretty squirrelly when you try
to match it to an NCHS survey. It’s not going to meet that standard. Title 13
is the Census Act. It’s the Census Bureau laws that describe what we will do
and how we will do it.
But Census does have the capability to conduct linkages for incoming survey
data with various forms of identifiers. Ideally, we like to see Social Security
numbers and complete name and complete date of birth and complete address, but
in the past two years we’ve started acquiring commercial data, which offer
record linkage challenges because the quality of data is not the same as the
information collected in federal administrative records or the federal surveys.
Again, I can’t state enough the challenges that come from record linkage,
particularly involving data quality.
But should someone come to the bureau with two lists, one of them being an
HHS survey and another being some magical list of occupational data with person
identifiers, we could conduct that linkage. Through the Census Bureau’s
Research Data Center network, it is feasible that we could host access for
individuals to come and use those data. We have the capabilities, but I just
wanted to restate the lack of obvious dataset candidates to link to the various
surveys. There’s a lot of promise in these datasets. You just have to have the
DR. MAYS: Thank you. Fritz Scheuren is Vice President of Statistics and
Methodology at NORC.
DR. MADANS: I know you all know this, but sometimes we forget. The US is
unlike other statistical systems. In Statistics Canada they’re all together,
and they have a statistics act that basically says everyone has to give them
information. That’s in the law. The US is not like that. Every agency has their
own legislation, and there’s very little of this you must give them these
things because it’s important for the statistical system.
The Census Bureau’s legislation Title 13 and its relationship to the IRS
data, does not include NCHS. We do not have that relationship with IRS. That’s
why I said we cannot get that data. If we get certain kinds of approvals and we
work things out, it would take a long time. But our authorizing legislations
are very different in this case in what they allow us to do and also what it
makes other people do.
One of the things that the ICSP is looking at is ways to work within the
system to maximize. We were talking yesterday about we ask a lot of health
questions and a lot of behavior questions on our surveys, and people answer
them, but ask them income and forget it. IRS data is the most sensitive data.
To get access to that, that is tightly controlled. I think that’s because the
population feels that way.
There is a lot of work being done now on trying to understand better what
the population thinks we’re doing, what they want us to do, through some trust
surveys. But I think that you need to kind of keep that in mind about the
highly sensitive nature of tax data and the fact that our legislation does not
kind of make it easy for us to get them.
DR. O’HARA: The Title 13, our statute does allow us to ask because it is
written in our statute that we are to improve data quality and reduce
respondent burden through the use of administrative records data. We have the
ability to seek them from federal, from state, from county, and from private
entities, including individuals. We can ask. That doesn’t mean people have to
agree and share their data with us.
DR. SCHEUREN: I used to be head of statistics at the IRS before I went to
work for Amy, which is one of the people I work for, by the way. I used to be
at the center. They had a scientific community thing. It was really nice. It
was in this room. There’s a lot about linkage that I know, and I’ll tell all of
it to you, but you’re not giving me too much time. I do know a little more than
I have some ideas that I’ve organized. You have the handout. I’m going to
use it line by line. This is a huge topic. You’ve already heard from three
people, and you must have a sense that you’ve looked at only one part of the
Moon, the only part that you can see. The rest of the Moon you haven’t seen. If
I believe that’s right, it’s a little over a third of the Moon that you can see
from the Earth. Until 1968, we hadn’t seen the other side.
I’m going to guess that the main link to you is something I say in the
second or third slide about no free lunch, this hugely difficult thing. Can you
earmark a specific focus activity and get something achieved? Absolutely. There
are some pets I will bring out.
One I might particularly mention now that I will come back and echo is I
work on Native American issues, Aborigines, as they are called in Australia. I
was just in Australia two weeks ago. If you have a very small population — and
the Aboriginal population in the United States varies from 2-4 million — and
you want to look at health issues, if we have numerator and denominator
problems and we have misclassification problems both in numerator and
denominator and misclassifications are not correlated, which they’re not, then
you have a serious problem.
This actually extends to other minority groups. You just heard about the
geographic problem earlier here about how when you go to match in one place and
you find people are not in that place — that’s really an issue to come back
to. I’ve been working on Native American issues for 11 years in the US. I have
a book coming out on it. I know nothing compared to what I need to know. You
heard the business about the Moon. The percent I’ve seen is a lot smaller than
that, even though I’m part Seneca, by the way, which is one of the Iroquois
Nations and worked with George Washington. First they were allies, and then
they were enemies, but they were always friends.
I’m going to talk a little bit about context weaving and a little about
paradigm. Context is the context we’re doing the linkage in and paradigm is how
you do linkage. It’s very cartoony, very sketchy, but it’s a way to open us up.
Then I’m going to give an illustration.
Amy has already mentioned that I’m going to be looking at matches to the CPS
that Social Security requested. They browbeat the IRS. You have to do that. I
used to be in Social Security too. Your earnings data is shared jointly between
the IRS and Social Security, and so Social Security purposes also bear here.
These are Social-Security-driven purposes because they’re part of HHS.
That’s an access window for you, but be careful. Amy has already warned you. On
the other hand, the HHS people at Social Security are part of your team. You
should avail yourselves of them. They really are good. They were even good when
I was there.
I’ll talk about the purpose of linkages and administrative data. That has
already been done. There’s a nuancing that you need to do with administrative
data, particularly in the linkage world. If the administrative data is being
used in linkage, like the W-2s are matched to the 1040s to check whether the
earnings data on the 1040s, not on the CPS, is okay, then they pay a lot of
attention to get that linkage right. But if you’re interested in a linkage just
to CPS or NCHS, you’re going to have a different problem.
You really need to move from a focus on data to a focus on information.
There is a big difference because you can get the information from weak linkage
and get it pretty well in the sense you get a decent point estimate and you get
a variance that you can measure.
You’d like to have an even better point estimate in a very narrow variance.
That’s not going to happen. You can’t afford it. But you can do the other if
you’re careful. Some work that I’ve done with Bill Winkler and others, and
some work that you’ll hear about in a few minutes illustrates that.
There’s this content versus coverage. Amy’s goal is coverage. She wants to
be able to find out who is in the census that’s in the administrative records
and vice versa and who’s in now commercial records and vice versa. That’s
because the goal of the decennial and the goal of the census is to count
everybody. A lot of other goals — and probably not your goal — are more on
the content side, not the coverage side. Both are needed.
For an example I’m going to use a project that Amy had. Amy worked out an
arrangement with the Treasury Department to look at EITC, earned income tax
credit, and to see to what extent it was being employed by everybody. That was
a joint project, but for a specific treasury purpose very different from the
decennial purpose. She knows about that. She can talk about that if you want to
I used to work on EITC when I was at Treasury, and we had a wonderful
response rate. We had interviewers that were packing, if you know what that
means. They had something right here. We had a very high response rate. I have
never had as good a response rate in any other survey. I don’t recommend that
to people, but I do want to tell you that that was what was happening.
Let’s look at some externalities. One of the things that you do when you do
linkage is to look at the editing and data quality problems. There’s a cost of
editing and a whole lot of issues. Susan, did I go off the reservation on that
DR. MAYS: No. We were just discussing that we just wanted to make sure
you’re aware that you’re on a public record, that you’re being recorded.
DR. SCHEUREN: I don’t have any problem with that. It’s a true story. It was
a marvelous process because the people at the IRS don’t really want to do EITC.
They see their job as collecting taxes, not benefits. I need to send some more
of them up to Canada because in Canada they collect benefits and taxes at the
same time. The Canadians are a wonderful neighbor for us.
We really need to learn a lot more, as I recommend you do, too, from the
Canadians. They have some really wonderful record linkage ideas. We’re going to
be talking about Fellegi-Sunter, that’s a Canadian paper, in a few minutes. Go
to Canada. They’ll be amazed that you’re there. Listen carefully. They know
their stuff. They’re very good, arguably the best or the second-best
statistical agency in the world. I was in Australia, and I think Australia may
have edged ahead of StatCan. I’m on the record here. Don’t kill me, Ivan.
The cost of editing is an issue. Confidentiality is a double-way issue. One
of the problems with confidentiality is that we now live in a data-dense world.
We used to live in a data-sparse world. The surveys we did and still do were
designed in a data-sparse world. They’re now being augmented with data from the
administrative records as we move towards a data-dense world. We’re not there
with a data-dense world yet.
One of the reasons within the statistical system in the government is
because of difficulties across agencies, but that is being addressed. You’ve
heard it being addressed. It’s being addressed very cautiously. I guess they
knew they were on the public record and I didn’t, but what they’re saying is
What they don’t give you a sense of is a vector. There is movement. It is
moving, but it is hard, but it is moving. The thing is that it is moving, the
Heisenberg principle, you know exactly where you are or exactly at what speed.
If you want to know both at the same time, you have to be a little bit
flexible. Jennifer did a very nice job of focusing on where we are. They are
doing a good job. Census is doing a good job. It’s hard to do, especially when
you’re dealing with Title 26.
There’s a complexity issue. We think when you match them together that you
can treat them as if you collected them at the same time. You know that’s not
true. You have all kinds of context issues that change. If you ask a question
in a survey and the next question comes from the administrative source, it is
going to be answered differently than if you’d asked it in the survey for
obvious reasons. That’s a problem, but not a bad problem. Explainability and
comparability are issues too.
This is really hard to explain. You’ve listened to some of the experts
already. They did a good job, but it’s hard to explain. Here’s the most
important bullet I have. This is very hard. Don’t try to get it done fast. If
you are like this — and there are people like this who believe that the way to
manage something is to give people a date when it’s due — don’t do that. Give
them things to do and let them tell you when they can get them done, and then
hold them accountable to what they tell you they can do. Don’t start the other
way. It’s just much too hard.
Let’s talk about the paradigm itself, the linkage paradigm. You’re going to
match two or more records together from different sources and you want to do it
uniquely, as distinct from doing it statistically. There’s a whole literature
about statistical matching, a very fine book by three Italians on statistical
matching called Statistical Matching. I did a book review in JASA a few years
ago. It’s typically called data fusion in Europe. I like the data fusion title
Anyway, you try to match two records uniquely, so it really is the same
person or the same entity or the same unit. At some levels you’ve already heard
— the geographic levels, high enough geographic levels — the matching is
almost unique, at lower levels less so, and at the individual level you have
all kinds of issues around data quality of the match itself because usually
unless it’s an administrative source being matched to another administrative
source that was designed to be matched, then you’re going to have these
problems of it didn’t quite match. Amy’s story about 10 million people without
names — wow, that’s a pretty big story.
What about the themes here? Obviously Fellegi-Sunter, a very famous paper
in 1967, JASA, it’s a mantra. Everybody says it. None of the things that are in
Fellegi-Sunter are we doing anymore, but the idea of having two limits, an
upper and a lower bound, still exists. I’ll be coming to that in a moment. It’s
a mantra. We should keep that focus, but the tools we’re using now are very
Computer technology is very different. The role of paradata, which is a
recently coined word, which is how we got the data that we have, the process
itself, who the interviewer was, things like that, those turn out to be very
important. When we were doing the earlier work in CPS before the current
method, the earlier work was based on asking the Social Security number in the
CPS, but we aren’t doing that now, it mattered very much who the interviewer
was, because some interviewers would not ask that question. They just wouldn’t
ask it. Of course, if you look at the data, you could tell that.
We’re going to do a little bit about imputation because that’s my goal and
the work I’m doing and how to validate results and how to complete the
inference. I will not be able to complete my discussion of the inference today.
There are essentially three bounds. There are the true links, the upper
bound, which is really the thing you have to focus on the most; the non-links,
which is lower bound; and below the non-links you should start spending money.
That’s really the way I look at it.
Then there’s the middle ground, where there are links in there, but you
can’t tell. At the data level you can’t separate the true link from the true
non-link. The way you deal with that is by using methods that Bill Winkler and I
have used and others have used. The kinds that were used in the old days.
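[Illustrative sketch, not part of the testimony. The idea of an upper and a lower bound with an uncertain middle ground can be shown with a classic Fellegi-Sunter style agreement weight. The speaker notes that current tools differ from the original paper, so this is only a textbook-style illustration. The match variables, the m- and u-probabilities, and the two thresholds are all hypothetical values chosen for the example.]

```python
import math

# Hypothetical probabilities per match variable:
#   m = P(field agrees | the two records are a true link)
#   u = P(field agrees | the two records are not a link)
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "birth_year": (0.98, 0.05),
    "zip": (0.90, 0.10),
}


def match_weight(agreements, probs=FIELD_PROBS):
    """Sum log2 likelihood ratios over the match variables."""
    total = 0.0
    for field, (m, u) in probs.items():
        if agreements[field]:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def classify(weight, lower, upper):
    """Apply the two bounds: link, non-link, or the uncertain middle."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "uncertain"


# Hypothetical thresholds; in practice they reflect the quality the user wants.
LOWER, UPPER = 0.0, 10.0

all_agree = {"last_name": True, "birth_year": True, "zip": True}
zip_differs = {"last_name": True, "birth_year": True, "zip": False}
none_agree = {"last_name": False, "birth_year": False, "zip": False}

for pattern in (all_agree, zip_differs, none_agree):
    w = match_weight(pattern)
    print(round(w, 2), classify(w, LOWER, UPPER))
```

[Pairs that fall between the two thresholds are the ones that historically went to clerical review.]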
Let’s look at this picture. These are the log of frequencies, because the
non-links are huge relative to the links, and we have to push that down. There
are two bounds in here. There’s the lower bound right here and the upper bound.
The ones above the upper bound are ones that we really believe are links.
They’re probably not all links, but there are so few non-links up there that
you have to model them, which we’ve done, to estimate, because you can’t afford
to study them electronically or any other way.
The middle group are the ones which used to be sent for clerical review.
I’ve done some studies with clerical review not just in this context, and
clerks aren’t always right, because they’re human beings, but they’re pretty
good. What they’re really doing is looking for other variables that weren’t
involved in the match itself that was used in the link. That’s what Bill and I
were doing and that’s what can be done now. The world we’re in is a data-dense
world, and we have lots of other variables that are computerized that we can
use, but that’s a new challenge and still open.
Let’s talk a little bit more about bounds. The bounds depend on the match
variables themselves. You really want to look at improving the match variables
if you want to do this well in whatever context you’re in. However, even if you
improve them, you’re still going to have a problem with the linkage
probabilities. They are model-based. What you can do is look at the robustness
through the models. But if you want to really nail that sucker down, get
All the variables on the files can be used for the inference, not just what
are typically called the match variables. That’s a paradigm shift for the
linkers. Fundamentally we think of, in a Deming world, you have a different
bunch of processes, someone finishes one process, they throw it over the wall
to the next person who does the next thing, and so forth all the way to the
Get rid of that idea. Look at it as a total system. You have to bring the
linkage into the analysis itself. You have to bring the analysis goals
backwards into the linkage. That the linkages are done ahead in time of the
analysis should not be a barrier to you.
I’m going to say just a very little bit about this. I’ve said it already.
The one thing that I didn’t talk about is that we’re now linking not just two
systems or two sources; we’re linking multiple systems. That can mean that the
lower bounds are going to be different. The upper bound is set by the user. I
want this much quality.
In the context that Amy’s staff is in the upper bound is what the focus is.
There are many records being matched, and it’s the upper bound that has to be
done well enough. It’s so well done that you can’t estimate in a normal way
what the errors are above that bound. You have to simulate it and you have to
see to what extent the simulation results change the final results.
But the low bounds are different because the quality of each record system
depends on the source. The sources are not designed for your purpose, except if
the sources are within an administrative system.
Let me talk about paradata. When I first started to learn about paradata —
I did a paper a few years ago on this — I was thinking that the linkage
process already had a lot of paradata, and we should use the linkage process as
a model to get better at what we were doing for surveys. But then I got into it
a little bit more and I realized the survey people have already moved beyond
the linkage people, and we need to turn it around and look at the linkage
process and do better at the paradata.
Some of the work that I’m doing with the CPS is focused on using the
paradata that’s on the CPS to aid the linkage. That’s very important. If I’m
doing the next linkage project, I’m going to focus a lot more on the paradata,
the context in which the information is being obtained, from all the sources
One thing you could do for that middle period between the true links and the
true non-links is to model that uncertainty. We’re recommending that you use
multiple imputation. I’ll say that in a minute. You have to move away from
counts and towards estimates. You cannot expect the linkage to be good enough
unless you’ve done it on purpose and paid for it. There is a big difference in
the context of using statistical methods for matching to administrative
Let me say a little bit about imputation. I’ve already mentioned multiple
imputations to you. This is an idea of Don Rubin’s to handle the uncertainty in
the middle. You not only do a better job that way, but you get a measure of the
quality of the job you’ve done. That’s the great advantage of multiple
imputation. You get a distribution, not a single-point estimate or vector of
I recommend this highly, and I recommend you begin doing it and we begin
doing it. We are in CPS doing it. We learned from that what we needed to do to
do a better job of collecting the paradata, because if you get into surveys,
you’ll realize that not everybody is collecting the paradata the same way
within any survey — CPS, NHIS. It’s not being done the same way. There are
regional differences, for example. There are differences with interviewers and
supervisors. All of those need to be looked at if we’re going to use them at
the inference stage.
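[Illustrative sketch, not part of the testimony. One standard way to get both an estimate and a measure of its quality from multiple imputation is Rubin's combining rules. The sketch assumes the analysis has already been run on each of m completed datasets and that each run produced a point estimate and its variance. The function name and the input numbers are hypothetical.]

```python
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Combine results from m imputed datasets using Rubin's rules.

    Returns the pooled point estimate and its total variance, which is
    the average within-imputation variance plus an inflated
    between-imputation variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations, one variance each")
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from three imputed datasets.
pooled, total = pool_estimates([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
print(pooled, total)
```

[The between-imputation term is what carries the extra uncertainty from the links that cannot be resolved; a single imputation would leave it out.]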
Let’s talk about validating results. I think you need to use small samples
to ground-check the data. Even though you have computerized everything, that’s
not good enough. You need to use this idea of Kaizen. This is a Japanese word
which is usually translated to continuous improvement. But if you focus on what
the Japanese are really doing, you keep this word because it’s a cultural
issue. It’s not a technology issue. We need to have that built into our
If you turn out to have worked as many years and in as many different
agencies as I have in the United States, you know it’s there. It’s just not
high enough up on the list of priorities, because we are competing with the
need to be comparable as well. That’s a challenge. Both are needed. I would
recommend you go to Canada and benchmark what’s being done internationally. I
don’t recommend you go to Australia because it’s so far away. I just came back.
I’m glad I’m back.
I mentioned Rubin’s multiple imputation. There’s a wonderful book by Bishop,
Fienberg, and Holland called Discrete Multivariate Analysis, 1975, that has a
great chapter on linkage. The chapter on linkage misses two points.
There’s nothing about the fact that you’re not going to get a perfect match.
That’s kind of big. There’s nothing about calculating the variances. That’s
kind of big too. But it’s a wonderful idea. The hardest part is to explain what
you’re doing in a way that allows the other persons involved to actually find
out whether you’ve been doing it, for their purposes, well enough.
We’re going to look at matching CPS to the DER. That’s the earnings system
at Social Security. It’s coming under Title 26, but because Social Security
wanted to do this, HHS asked about it, it’s being done. Our original goals were
to do this to look at robustness of the CPS poverty estimates, the changes in
The imputation rates have changed enormously from March 1962. I’m sure you
all know this. That was the first year that CPS did imputation for its income.
That was the year that Mollie Orshansky used to begin the poverty estimates.
Mollie was a friend of mine. She’s gone now.
Mollie’s first report was Children of the Poor, which came out in 1964. We
are going to — and I’m working with a bunch of people who worked with Susan at
Census — do a 50th anniversary series around that. A lot of work
has already been done. You don’t start on the year; you start way before, and a
lot has been done.
The relationship between CPS and DER is very important because imputation
matters. Imputation matters a lot more than it did when Mollie started in 1962.
By the way, interestingly enough, although I can’t get historically the
context, the matching to the CPS also started in October 1962, again asking the
Social Security number question on what used to be called the control card in
They don’t ask the Social Security question anymore in CPS, although they
still do in HIS. In HIS they have a much better way of convincing people that
it’s needed. I think that’s the main reason it continues, because it really is
a very important variable to have. But you can do pretty well without it, and
if you are willing to accept a somewhat wider confidence interval, what’s being
done at Census works fine.
I did mention proxy there. We’ll talk about that, but I’m going to show some
pictures in a minute. Another little footnote here is we’re looking at two
measures of earnings, one from DER and one from CPS, that have been matched. In
this context they were matched using the system at the Census Bureau because we
don’t have Social Security number. They are intervals of $10,000. We’re looking
at the agreement between when they’re in the same class.
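[Illustrative sketch, not part of the testimony. The comparison described above, putting the two matched earnings measures into $10,000 classes and tallying whether they agree, could be tabulated roughly as follows. The earnings pairs are made up, and the function names and category labels are hypothetical.]

```python
from collections import Counter

CLASS_WIDTH = 10_000  # width of each earnings class in dollars


def earnings_class(amount, width=CLASS_WIDTH):
    """Map a dollar amount to its class index (0 for $0 to $9,999, and so on)."""
    return int(amount // width)


def agreement_table(pairs):
    """Tally how the CPS class compares with the DER class for matched pairs."""
    counts = Counter()
    for cps, der in pairs:
        diff = earnings_class(cps) - earnings_class(der)
        if diff == 0:
            label = "same class"
        elif diff == 1:
            label = "CPS one class above DER"
        elif diff == -1:
            label = "CPS one class below DER"
        else:
            label = "two or more classes apart"
        counts[label] += 1
    return counts


# Hypothetical matched (CPS earnings, DER earnings) pairs.
pairs = [(25000, 27000), (31000, 29000), (9000, 12000), (50000, 5000)]
table = agreement_table(pairs)
for label, n in table.items():
    print(label, n)
```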
Here’s a nice little picture. This is a histogram. This is the histogram
which looks at the people that agree and people that disagree, one below,
smaller class, one above, smaller class. The rest of these are all like this.
I’m just going to use this one.
You’ll notice, which you don’t expect, that the CPS data is bigger than in
the Social Security. That’s probably definitional. That’s very interesting
because we’ve been saying forever that there’s underreporting in the surveys.
I’m sure there is, but there are a lot of other things going on when you’re
matching surveys and administrative data, definitional.
One of the issues here has to do with some jobs are not under Social
Security. I used to be a newspaper boy. My sister used to be a babysitter. I
never was covered under Social Security at my newspaper job and she never was
covered under her babysitter job. Some jobs just aren’t covered, even though
maybe they should be covered. The rest of these are very similar.
This is the no imputes. This is what you want, very high agreement rate, but
look at what happens to the imputes — a lot more variability. That’s a
problem. Interestingly enough, when you’re looking at poverty — and Joan
Jurich(?) is the driver of this — there’s enough cancellation.
The poverty estimates that we’ve been producing for nearly 50 years are
robust to lots of data problems that exist in the survey. That’s a very
important policy finding. I would not have believed that. In fact, when I
started out, I didn’t believe it. I thought the opposite, and I kept pushing to
find out if, in fact, I was right. Eventually, even I gave up. There are
problems that have to do with misclassification, but even those problems are
This is just how we did it. You have all of that. Let’s ask questions or not
ask questions. Thank you very much.
DR. MAYS: We’re going to take questions of everyone. One of the things I
just want to start with to say is that I’m very appreciative of this
presentation. It’s just incredible. I really learned quite a bit. It is a lot
harder than I thought, so I have a great appreciation. Let’s start with
DR. KAPLAN: I am curious about your thinking about this issue of tolerable
range of error. Sometimes in the census assigning congressional seats precision
is very important, but in epidemiology when we’re just trying to get a sense of
relationships and we’re in a hurry, how do you address those sorts of issues?
What’s the thinking about how much error is tolerable for social science
research around public health?
DR. MADANS: I guess we often talk about fit-for-use. The question really
relates to what it is you’re going to do with the data, as you implied. Some of
the things that we collect for some of the analyses we probably could tolerate
a lot more error, and for others we can’t. The problem is we’re using the same
data collection system for both.
If you look at it from an NCHS point of view — and I think this would apply
to Census — we are kind of monitoring the nation’s health, so a lot of focus
of what we do is on this monitoring function. Have things changed? What are
we monitoring now? We’re monitoring a lot of things about the healthcare system.
Small changes are important, especially in a short period of time.
Everybody’s timeframe has really shrunk. In the old days we used to put data
out years after it was collected. Now where is last month’s data? There has
been a big push for us to get more stable estimates and more valid estimates so
that the policymakers really do know what’s happening across time and things
like insurance rates and characteristics of the uninsured population. For those
we can’t tolerate as much error, whereas if you’re doing more of kind of a
multivariate, looking at various relationships, maybe you can.
There are some things that you have to do, just kind of the infrastructure
costs, to make sure that for the things where you need a high level of quality,
you’re getting it. Then for some of the others you can kind of let up a little. That’s
why we do have different, perhaps, quality requirements for different items on
the survey. But in general, since we don’t know what the information is going
to be used for, we’re really trying to maximize the quality all over.
I think as a statistical system, we feel some responsibility for putting out
information that those who are not very familiar with the data collection will
accept as valid. A lot of the information we’re putting out is not to an
academic community where they can kind of look and say I know there’s a lot of
error in this income stuff and I’ll take that into account, but a very
different kind of audience.
There’s the accountability issue, the transparency issue, and just the
credibility issue. I think that tends to push us towards having less tolerance
for error than we might have if we were just providing data for the research
community. We do that as well, but it’s almost a byproduct of the other things
that we’re doing.
DR. O’HARA: Jennifer pointed out that NCHS wants to monitor what’s going on.
I think Census is similar, but we really want to measure what’s going on. To
echo what she said, fit-for-use is essential. You must know what you’re trying
to measure or what you’re trying to monitor.
In an application where we’re conducting record linkage, to try to
understand the characteristics of persons who appear to be eligible for a
program, but are not participating — we would like to know that we have the
same person in list one and list two.
If you’re looking at the characteristics of neighborhoods or housing
structures in areas experiencing foreclosure, again, it’s the unit of analysis.
If I’m matching at the address rather than the person, it has to do with what
questions I’m attempting to address.
Even in the algorithms that Fritz was describing, the data that you’re
putting into that match and trying to say is it the same person or is it nearly
the same person — our tolerance for that has to do with the eventual question
that we’re trying to address.
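The point that match tolerance depends on the eventual question can be sketched in a few lines. This is a minimal illustration only, not the algorithm Dr. Scheuren described and not Census Bureau practice: it swaps in simple string similarity from Python's standard library as the match score, and the function name, the threshold values, and the sample names are all hypothetical.

```python
from difflib import SequenceMatcher

def is_match(name_a, name_b, threshold):
    """Declare two records the same person if name similarity meets the threshold.

    String similarity is a stand-in for a real linkage score (assumption).
    """
    score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    return score >= threshold

# Hypothetical thresholds: a strict one for person-level eligibility analysis,
# a looser one for exploratory work where some false matches are tolerable.
STRICT = 0.98
LENIENT = 0.90

strict_result = is_match("Jon Smith", "John Smith", STRICT)
lenient_result = is_match("Jon Smith", "John Smith", LENIENT)
unrelated_result = is_match("Mary Jones", "John Smith", LENIENT)
```

Under these assumed thresholds the same pair of records is linked for one use and left unlinked for another, which is the fit-for-use trade-off in miniature.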
DR. SCHEUREN: There is a tendency — and I think rightly so, and you just
heard it said — to over-engineer things. If you want to build a bridge, it
better be good. It better be able to last a long time, and it better be able to
be such that anyone can get across it. If you’re in a jungle and you have a
single line, only an expert’s going to cross over it. But if you’re talking
about getting the semis across as well as ordinary people across —
On the other hand, my focus is on the confidence interval. You’re the user.
You’re the client. You’re the one who has ultimately the resources to do this.
I’m not an employee of the Census Bureau. I work for the University of Chicago,
NORC. They don’t have enough resources, either one of them, to do the job that
you want at the confidence interval that you want, which is really tight.
You won’t get that by getting them to be smarter and faster. They’re
already doing a really good job, by the way. I’ve done this international
comparison. I’ve been in Canada a lot. I just came back from Australia. They’re
already doing nearly best-in-class, but they don’t have enough resources to do
it. This is for the record, too, I realize. They don’t have resources to do it.
Some of the problems we have, like with HIPAA — I was at a conference on
the Hill this week, and there are a lot of issues. We need to think about this as a
total system. Each user has to be cognizant of the other users so that the
resources that are needed get aimed relative. There are some things that are
fit for some uses, but not all uses. Those things are the things that should be
at the data centers for the academics. What should be published should be such
that anyone can use it.
DR. QUEEN: Fritz, you mentioned using paradata to improve record linkage.
How do you do that and what paradata do you use?
DR. SCHEUREN: What we are doing now is looking at the information the Census
Bureau asks already, not the interviewer name, but which records were done by
which interviewers. That turns out to matter a lot. There are regional
differences. Some of those are confounded with the fact that there are real
differences in the society in different regions, but some of them are
The thing that I’m driving most for is these proxy respondents. I don’t have
those charts right now, but you saw those beautiful histograms. That comes from
something you’ll see at the end of the handout that I recommend you look at. It
really gives you a way of seeing what’s going on in terms of relationships.
But when we look at proxy respondents, we’re going to find that the proxy
respondents fall between the high level of self-reporting and the lower level
of the imputed data. There’s a lot of variability in them. They’re a form of
imputation, in fact. They’re imputation by the people in the household.
Sometimes my friends at the Census Bureau will tell you that you can’t say
it that way. They’ll say you have to say this is the self-reported person and
this is everybody else, because the other persons could be in the household at
the same time during the interview. You don’t know that.
DR. KAPLAN: Do you do much in the way of what you might call planned
linkages? For example, CHIS is thinking about linking up with other activities
where they say can you provide consent to your electronic medical record. I’m
just wondering with income information, could you ask people with consent can
you do a credit check?
DR. PARKER: We do have an informed consent process. The survey respondents
are asked if we can link up their data for statistical and research purposes.
We don’t say we’re going to link to CMS specifically, but we do ask about
medical records. We don’t ask about the IRS and income data. We’re not allowed
to do that, income and credit histories and things like that. There has to be a
case that directly relates to health, and I think you can make a case that some
of these things do, but it has to be pretty close to that.
DR. MADANS: There is one of these committees at FCSM that is looking at the
whole informed consent issue. That’s why Jennifer and Amy said that there are
a bunch of our records we cannot link because we were not given approval to
link. There’s an actual question that says can I link. Then how do you write
that informed consent so that it’s informative but not constrictive? Because we
don’t know what records are going to be available later. We work on that
wording. We would like to be able to link to records as they kind of come up,
as new databases come up.
My understanding with IRS is their requirement is not just that we say they
said it’s okay for you to get this. You have to have written consent. The
consent has to be within a certain amount of time, so after three months it’s not
good anymore. I think there are even some issues about how long you can keep
This is where the two agencies, the IRS and us, are not in sync on this
linkage process, which is very different than the relationship they might have
with the Census Bureau, because of our different authorizing legislation. Yes,
we do exactly that, but we cannot link to any record. Some of the record
providers have their own requirements about that linkage.
DR. KAPLAN: This is maybe pushing it to the extreme, but is there anything
in the statute or in the human subjects regulations that would prohibit you
from paying people to consent?
DR. MADANS: We prefer not to refer to it as paying. We do offer incentives
in some of our surveys. We do a survey of immunization where we have to go to
the immunization provider, so there has to be a consent, but it’s a different
kind of process. We do have incentives.
As part of the OMB process and the IRB process you have to have those kinds of
things approved. The IRB is concerned about coercion. If you pay too much
money, then a certain part of the population is going to agree to do it even
though they don’t want to do it, even though we don’t quite pay that much.
Then from the OMB process, this is part of your civic duty. They are very
wary of allowing incentives, although there was an entire conference on
incentives and how they should work. Usually they will require that there is
some experiment that shows — and they’re mostly interested in response rates.
We’ve never actually tried incentives just to get approval for linkage. I
think there is a feeling about the linkage that kind of gets close to a Big
Brother kind of thing, that you have to be very sensitive to invasion of
privacy. Unlike the ACS you heard about yesterday, which is a mandatory survey,
all of our surveys are voluntary.
If we present this data collection in a way that really turns off the
respondent, then we’ve kind of made it worse in terms of the data quality.
Again, you have to weigh how important is that linked data versus how much of a
negative effect it is going to have. That is kind of a constant evaluation that
you’re going through all the time, which is also affected by what’s happening
in the environment.
Anytime there’s a breach on any kind of federal system like what happened
with VA, that has an effect on our relationship with our respondents. It’s a
fluid situation. We may do something one year and the IRB best practices change
the next year and you have to change. I assume it’ll be different in five.
DR. KAPLAN: In the commercial world there are linkages going on all over the
place. Every time you swipe your card at Safeway you’re actually getting linked
to all sorts of databases. They know who you are.
DR. MAYS: I’m going to turn to questions here at the table. I know I have
some and you have some.
DR. GREEN: I have three totally unrelated questions. One of them is about
this issue that we went by very early on in the morning about whether or not at
a federal level there is the location and place, what I’ve heard you call the
statistical community, which I suspect is understood by you guys better than it
is by me. I think you know who each other are. What is the adequacy of the
current situation in terms of bringing the statistical community together to
tackle hard problems? Is it ready to go? Is it highly functional? Is it working
well? Is it sort of working all right? Come clean here. What’s the deal?
DR. MADANS: I will talk from the federal system. I’m sure Fritz will have an
outsider’s perspective. There’s an OMB statement that identifies the agencies
that are part of the federal statistical system. There are 12 that are the main
agencies. We have a lovely little chart that shows all of them on a star with
NCHS at the top. These are kind of the big 12. Then there are a bunch of other
agencies that have statistical functions, but they’re not considered a
NCHS is a statistical agency, Census, BLS, CS. There are these 12 of them.
Their primary function is data collection. Even though they’re embedded in the
departments, they’re governed by principles and practices of federal
statistical systems. That’s something that comes out of CNSTAT. It’s a
book. We can give you copies if you like. There are also international
principles that we also have to kind of go by. There are directives that come
out of OMB that really control what we can do as a statistical agency versus
some other programmatic or policy agency in the government.
The heads of those 12 agencies meet monthly as part of this Interagency
Council on Statistical Policy. The focus of that is in OIRA, the Office of
Information and Regulatory Affairs. It’s in OMB on the management side. It’s
headed by the chief statistician of the United States. There is this ongoing
group that meets and deals with issues of the federal statistical system, and
they meet every month. They do a wide range of things. They’re looking for
things that crosscut.
There’s this other group that I mentioned yesterday and today called the
Federal Committee on Statistical Methodology. It’s a group that is chaired by
and organized by this group in OMB. It has representatives from agencies, but
you don’t go as an agency representative. The members are selected because of
their contributions to the statistical system, something like that.
They meet quarterly, but they have work groups. If you look at their site,
they do a lot of white papers, provide standards, guidance. This is the group
that put out the standards for statistical surveys. They historically have this
long set of white papers and guidelines. There’s one now on cognitive
interviewing that just started. There’s one on the administrative data. Even
though that group only meets quarterly, there’s lots going on.
Yes, there’s a lot of work. There’s a lot of coordination. There are a lot
of things going on. On the other side of the coin, we are not like Statistics
Canada. We do not have one authorizing legislation; we have 12. They are not
often in sync.
The reason that the US statistical system is the way that it is is that
there was more interest in having the agencies embedded in the departments so
they would be closer to providing the information needed by those departments
to do policy and program development and evaluation. If we were all in
Statistics USA, we would maybe have a lot of things that are much easier to do
and we’d have a lot more coordination and a lot more consistency, but we would
be separated from the needs of HHS or for Labor.
We meet with the Assistant Secretary for Planning and Evaluation and other
parts of the department all the time. There’s the Data Council, but there’s
also a lot of back and forth. What do you need, Madam Assistant Secretary, to
monitor changes in the healthcare system? There is a big connection there.
That means we have our own authorizing legislation and we also have our own
requirements to our departments. Sometimes those requirements to our
departments are in conflict with our desire to be a more cohesive federal
statistical system, so it’s a constant balancing act. Sometimes the departments
win and sometimes the statistical system wins.
I think we hope that in the end, we can meet everybody’s needs, but it does
mean that some of the coordination activities perhaps don’t go as quickly or as
directly as they might. But I think the counterargument is that’s because our
primary goal is to meet the needs of the departments in which we are embedded.
I actually think lately it’s been working quite well. There’s a lot of
interaction on these cross-cutting things. We’re really trying to move towards
more consistency where we think it’s appropriate, best practice, especially in
linkage. Probably linkage, I would say, is the one example where the federal
statistical system is working very well together, even though we’re not going
to get IRS to change Title 26.
DR. SCHEUREN: Congress changes Title 26.
DR. MADANS: You are right. I’m sorry. We’re not going to get IRS to tell
Congress to change it. Our little statistical world is very little compared to
changing IRS legislation.
DR. GREEN: You actually covered most of my second question that I’d like to
ask you on balance. What is the statistical community’s position in terms of
current law? Do you have the laws that enable you to do the right thing that
needs to be done, or do you have laws that impede you? Where is the balance
DR. MADANS: The answer there is really the same. There are some laws that,
if they were changed, would be better for us, but there is always the other
consequence. I prefer not to answer more than that.
DR. O’HARA: As I stated earlier, Title 13 gives us the authority to request
data from the various entities, but those entities have their own governing
regulations or statutes that often prevent them from sharing data with us.
DR. SCHEUREN: I used to be on this committee of 12. SOI, the Statistics of Income
Division, which is where I used to be at the IRS, is one of these 12 agencies.
IRS sits at the table, but it sits in a different way because of its different
focus, but nonetheless it’s central that we use the tax data for many purposes.
We have a good system — I think you just heard it well represented — but
it’s not like it was in the ’40s when we were really a single country and we
came together to solve a problem as a single country. Agencies matter more than
they should. Legislation is what it is. Its interpretation could be improved,
and some of the things we can’t do we could do if we would simply sit down
together and talk about how to interpret, and then get clarity.
One of the great things that the national center here has is an IRB. That is
enormously important because it’s the link between the agency and the American
people and the policy and thinking of the people. That’s a crucial step. You
should ask how many other agencies have record linkage activities that are
covered by IRBs.
DR. GREEN: To bring it back to our topic, so much of what we’ve talked about
has much broader implications than just assessing socioeconomic status. Coming
back to SES, a question for you, Dr. Scheuren. I didn’t understand the point
you were making about measuring poverty near the end of your presentation.
Would you mind running back over that again?
DR. SCHEUREN: The system of record that is used to measure poverty in the
United States is the Current Population Survey. Now it’s changed its name, but
still the same system, which is basically the March income supplement to the
CPS. We’ve been looking at that. It was the data system that Mollie started
with in March 1962 and her first paper, and there was a subsequent paper in
1965. We’re based on that.
The interest in linkage, I think, grew out of Mollie’s interest in measuring
the poor. That was a very full area. But in terms of the CPS itself, in early
going there was hardly any misreporting, hardly any understanding of
misreporting, and hardly any non-reporting. Remember, we’re still in the halo
of WWII. We were just very cooperative people then, so we got very high response
rates. That has really changed. About a third of the income that is measured in
the CPS is imputed, and yet we’re still doing pretty well.
DR. MADANS: The CPS is very concerned about making an estimate of the
poverty rate. That is where it comes from. It’s reported very often. We collect
information, as well, on our surveys, and we use the formulas. For each person
we identify where they are on the poverty threshold.
We are not interested in making estimates of the poverty rate. That doesn’t
come from our surveys. But we are interested in looking at health
characteristics by poverty status. Our concern is much more with
misclassification and the amount of misclassification when I say you’re 100
percent of poverty or 200 percent of poverty.
When we do our linkage back to CPS, because CPS can then link to IRS, we
know if we get the exact amount right, then, of course, we don’t have any
problem. But if we don’t, where are the errors coming from? Where is our
misclassification coming from? Are we making people who are poor not poor or not
poor poor, and where in that continuum? That’s where our fit-for-use comes. If
we’re off on the total and it doesn’t affect the relative ranking, it doesn’t
matter to us.
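The misclassification check described here can be sketched as a small cross-tabulation. This is a minimal illustration under stated assumptions, not the NCHS procedure: the single poverty threshold, the band cut points, the function names, and the income pairs are all hypothetical, and real thresholds vary by family size and composition.

```python
from collections import Counter

# Hypothetical poverty threshold for one family size (illustrative only).
THRESHOLD = 20_000

def poverty_band(income, threshold=THRESHOLD):
    """Place an income in a percent-of-poverty band."""
    ratio = income / threshold
    if ratio < 1.0:
        return "<100%"
    if ratio < 2.0:
        return "100-199%"
    return "200%+"

def misclassification_table(pairs):
    """Count (survey band, linked-record band) combinations."""
    return Counter((poverty_band(s), poverty_band(r)) for s, r in pairs)

def misclassification_rate(pairs):
    """Share of records whose survey band differs from the linked-record band."""
    table = misclassification_table(pairs)
    total = sum(table.values())
    off_diagonal = sum(n for (a, b), n in table.items() if a != b)
    return off_diagonal / total

# Made-up (survey-reported income, linked-record income) pairs.
pairs = [(15_000, 18_000), (19_000, 22_000), (45_000, 41_000), (38_000, 42_000)]
table = misclassification_table(pairs)
rate = misclassification_rate(pairs)
```

In this made-up data every dollar amount differs between the two sources, yet only the pairs that straddle a band boundary count as misclassified, which matches the point that an error in the total matters only when it changes the ranking.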
There is a lot of conversation now in terms of we don’t have official
statistics, really, in the US. There are things that come out of the Census
Bureau that are kind of official, but after that there really is nothing where
you can say this is the official statistic. We collect disability and they collect
disability and we get income. Which is the number? There is no number.
We have to figure out as a statistical community how to present this so
everybody’s not confused. Part of the key to that is I’m collecting it not
because I’m making the estimate, but because I need to use it for other things.
Of course, CPS uses it for other things as well. They’re looking at other
things collected on the CPS by poverty status, but their primary thing is to
collect poverty status.
DR. SCHEUREN: Joan Jurich led an effort to look at these cross-comparisons
among the surveys, and John Czajka has the results. That’s a very important
activity. John may even be in the room now. That was a good answer, but that’s
where to go for a deeper answer.
DR. MAYS: Let me ask some questions about linkages, specifically in the
areas of SES. I want to start with education. One of the things that we heard
is that maybe an approach in terms of thinking about getting data that gets us
better quality data about education is to begin to link to information about
schools. It might be the nature of the school. It might be the context of how
well they perform. There are those types of statistics that exist in the
Department of Education. I’ve heard you all talk about linkages that are
predominantly between Census and NCHS. Can you talk about any plans or things
that you think you can do in terms of linking with education data?
DR. MADANS: Most of our linkages are not with Census. We don’t link at all
with the actual census. We link with contextual data from the census, but most
of our linkages are actually with CMS. It’s the healthcare data. It’s the
Medicare, Medicaid, and things like that.
If there is a national database that has consistent data at the Department
of Education, we could link to it. It would mean we would have to get the
linkage information into the survey. What I was hearing yesterday is we have to
get the name of the school. We would have to look into what kind of burden is
that and how hard is it to do and what’s the quality of the data. Then we’d
have to talk to Education about how would that linkage go, what do they have on
their files. We’d have to go back to the IRB and say do you consider this
within the informed consent. They may say no. I don’t know.
Given that there’s a cost of it, what would be the payoff? I think we would
have to justify internally that getting that other piece of information puts us
far enough down the road that it’s worth getting. I guess I’m not sure,
from what I heard yesterday, that the cost would be worth the benefit to just
get where the school is or what kind of a school is it.
California has the achievement data. That was a little bit more interesting.
But that piece of information is on the causal path to things. How one uses it
in terms of the kind of information HIS collects, which is cross-sectional —
how would we use it? I think we would have to think hard about that. I’m not
saying we wouldn’t do it. I’m not saying it’s not an interesting thing to do.
But because we have limited resources, we’d first want to make sure there was
something to link to. We would have to do all those things together.
DR. O’HARA: From the Census perspective, I’ve had some limited experience
negotiating with the Department of Education when we attempted to acquire the
FAFSA data, which is the Free Application for Federal Student Aid. We got
pretty far into the drafting of an agreement to share the data before their
lawyers questioned what the Census Bureau’s intent was with the data.
Their statute states that the data could only be used for the administration
of that program. Because our interest in the linkage exceeded that boundary
because we, of course, wanted to understand the quality of information that had
been collected by the Census Bureau, we then shelved that agreement.
You will get back to an authorization legislative change argument
potentially. But as Jennifer said, if there was a national database with the
school-level information, that would be great. They could attempt to approach
that. To my knowledge, right now they’re managed at the state levels. The
Census Bureau has been somewhat foolish enough in the past to try to negotiate
state by state.
That’s why we have a program that’s called LEHD, the Longitudinal
Employer-Household Dynamics program, that in order to get income data, agreements were
written with each state to get the unemployment insurance wage information.
Then the data were collected at Census and harmonized in order to build this
If a national-level resource doesn’t exist, some agency may have to go and
investigate whether it’s worthwhile to negotiate and try to assemble the
national database, understanding the risks and the timeline and the cost
involved in gathering all the pieces of information and attempting to make them
into a national-level resource.
DR. SCHEUREN: The Census Bureau has a School Staffing Survey, which they do
for education. You could look at that. It’s done relatively frequently. It’s
quite good. It’s done for public schools and private schools.
DR. MAYS: The name of it is?
DR. SCHEUREN: School Staffing Survey. It’s quite a good resource. I’ve used
it before. There are public records for public schools. There’s what we call a
frame in the sampling business. That’s available to you, and it’s pretty good,
by the way. We’ve actually analyzed that frame. Private schools are different,
but there’s some information about private schools that’s available publicly,
too, but it’s not complete.
DR. MAYS: Let me ask about a statistical approach here. One of the things in
terms of income is that we know that income may vary by area, so often you have
some type of geographic approximation that allows you then to be able to make
comparisons. In the area of education is that possible?
For example, if you linked this data, my next thing that I start worrying
about is ninth grade in Alabama is different than ninth grade in California.
You’ve managed in terms of income to be able to come up with a way to do a good
comparison. Do you think that there are statistical approaches where you would
be able to do the same for education?
If I were to buy a house, for example, in Alabama, the cost of the house
might be $60,000. Say, if I were to buy a house in California, the cost of that
house would be maybe $160,000. What individuals are able to do is if you’re
trying to give me a package that’s a housing package, you could tell me what
percentage of my salary makes a difference in terms of being able to buy the
house in one place versus the other.
Sometimes in terms of money what we often talk about is the cost of living
in A is different than the cost of living in B, so therefore your wages in A
would be different than your wages in B, but you could say that you’re really
still getting paid in a very similar fashion.
DR. MADANS: It sounds like you’re saying can we make an adjustment to the
quality of education. No, I don’t know any way of doing that. Maybe if the
education people said based on the standardized tests, we can kind of do a
discounting of what — we know what percentage of the population that census
tract has a given level of education, and they can do a discounting. If they
did that and we could attach it to the geographic, then it’s easy for us to get
it, but we’ve never done that.
DR. MAYS: That was the only thing that was suggested yesterday, was the
potential of the standardized test scores.
DR. SCHEUREN: Have you looked at the work they’ve done internationally
comparing fourth grade math courses and eighth grade math courses? You don’t
have to speak German. You don’t have to speak Japanese. Just see the same
lesson taught in three different cultures, and then understand what our problem
is. You can see that regionally, too, but that work hasn’t been done, to my
knowledge, not much of it anyway.
DR. PARKER: As a health statistics agency, we do focus on those things, but
all sorts of data users come to the Research Data Center with their own
contextual data. Certainly, people with expertise in education or income bring
area-level data, and we can merge it, as I said, quite easily using any of
those census geographies. Or if it’s more pertinent, there are lat and longs
for most of our survey respondents now, and different types of measures can be
done. We rely on the people with that expertise, with those data, to bring
them to us.
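The merge of user-supplied area-level data onto survey records can be sketched as a simple join on a geography key. This is a minimal illustration, not the Research Data Center's actual tooling: the field names (`tract_id`, `pct_college`), the tract codes, and the values are all hypothetical.

```python
def attach_area_data(respondents, area_data, key="tract_id"):
    """Left-join area-level measures onto respondent records by geography key.

    Respondents whose geography has no area-level entry are kept unchanged.
    """
    merged = []
    for person in respondents:
        context = area_data.get(person[key], {})
        merged.append({**person, **context})
    return merged

# Made-up survey records, each carrying a census tract identifier.
respondents = [
    {"id": 1, "tract_id": "T001"},
    {"id": 2, "tract_id": "T002"},
    {"id": 3, "tract_id": "T999"},  # no contextual data available
]

# Made-up contextual file a researcher might bring, keyed by tract.
area_data = {
    "T001": {"pct_college": 41.0},
    "T002": {"pct_college": 18.5},
}

linked = attach_area_data(respondents, area_data)
```

Keeping unmatched respondents rather than dropping them is a design choice in this sketch; it lets an analyst see how much of the sample lacks contextual coverage.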
Those data are owned by the people who bring them into the Research Data
Center in that we do not keep them and distribute them to others, although I
personally have some research arrangements with people to do that. I think
those sorts of things can be done, we just don’t, as a rule, give out other
people’s data when they bring them for research purposes.
DR. MADANS: The other thing Jennifer should have mentioned is what we do do
are some of these kind of higher-level SES in terms of where do you live. Some
of those measures, let’s say, are air quality. We do link to EPA data on air
quality and transportation.
DR. PARKER: As a specific example, with the air quality, I worked with
people at the EPA to do that. Part of the arrangement was that those data would
be public, and also the transportation data. Both of those required expertise
from other people. Part of the in-kind I had a personal relationship to make
those available to the public.
Again, people just coming into the Research Data Center with their own data,
for example, education quality, for lack of a better word, at the census tract
or census block or county level, those would be owned by those people unless
they were collaborating with someone in NCHS to make them more public. Then we
would stick them on our webpage and say they’re available, and that would be
As part of that, part of the linkage would be an evaluation of it. We’re not
just going to take data from somebody else and say we’re going to release this
as an NCHS product. Both the air pollution and traffic data, our part of it was
an evaluation of fit-for-use — I like that term — but what does it mean, what
are the caveats, before we released it to the public, because we aren’t experts
in air quality or transportation.
DR. MAYS: Let me just ask about in terms of occupation, if there’s synergy
between NIOSH and NCHS.
DR. BARON: In 2010 we did an occupational health supplement as part of NHIS.
I think that’s probably the strongest place where we’ve collaborated.
DR. MADANS: Also, hasn’t there been some on the death certificates?
DR. BARON: We have something called NOMS, the National Occupational
Mortality System. We collect the occupation codes on the death certificates and
have a huge data system where people can look at that.
DR. KAPLAN: I am still sort of hung up on this roundtable of this dozen
agencies getting together and talking about things. I was just thinking about
harmonization is something that we’re interested in. On the other hand, there’s
a cost to harmonization also. As we learned yesterday, when people do things a
little bit differently, we discover discrepancies, and that leads to
improvement of the methodologies. I was wondering about this discussion among
the agencies. Is there discussion about sort of common methodologies that you
might experiment with in the future?
DR. MADANS: Yes. That is actually where a lot of the cross-agency
methodological work is done. It’s more on the methods side than it is on the
substantive side because where we’re all separate is in our substantive
interest, our subject matter interest. Where we all come together is in the
methods. That’s where most of the conversation is.
I think where people try to come together is, first, does it really make a
difference if we’re all the same, or does it just look odd, but it doesn’t
really matter? Or can we kind of come together where we have a common core? I
think this is where the idea of the minimum standards comes from.
You start with this core, and then you’re going to go off and do a whole lot
more stuff in income that we’re not going to do, but we’re going to do a whole
lot more stuff in health that you’re not going to do, but you’re going to have
those disabilities ones that we agreed on on your survey, so we can kind of do
a little bit more cross-walking. I think the key question is when are
differences important enough to get rid of and when can we harmonize them and
when do we really have to kind of come together?
The Census Bureau actually has another interest in this. That is they are
the data collector for us on a lot of our surveys. They would really like that
we all ask education the same way because their interviewers will do a better
job if they don’t have to think, oh, god, HIS does it this way. There’s been a push for a
long time from the Census Bureau to say let’s standardize the demographics. I
think the other push is, let’s standardize the core, and then kind of have
everybody go off on their own direction.
DR. MAYS: Nancy has to ask a quick question. You can tell this linkage stuff
— we’re really happy to have you here.
DR. BREEN: This is potentially linkage and also kind of harks back to
Larry’s question about whether the federal government statistical agencies were
linked. Now that we know that there are some really good mechanisms for
collaboration among the federal agencies, statistical gathering agencies, I’m
wondering are there also any links with state and local?
That’s another thing this committee has been interested in, is community
data is very powerful for changes in public health and probably changes in
other things as well. But it doesn’t have an infrastructure, as far as we can
gather. It’s pretty catch as catch can. Are there any formal or informal
networks or ways for the federal statistical agencies or individuals in
those agencies to communicate with people who are doing the same kind of work
or trying to do the same kind of work in local and state contexts?
DR. MADANS: I would say probably every agency is different on that. Probably
the Census Bureau has more contacts than anybody else, perhaps Labor, because
of the way they’re organized and get some of the information from state
agencies. Our biggest connection with the state and local is through the Vital
Registration System. We don’t have a lot of contacts with state-level data
collection. A lot of that comes out of Atlanta for the surveys that they do.
Part of the problem is it’s not always in the same place. We don’t have that
infrastructure. I think in the past there were more connections. I don’t think
they exist as much now.
DR. BREEN: Could you elaborate on that?
DR. MADANS: I think there have been some mechanisms that kind of linked
NCHS. Don’t quote me because I don’t know a lot about it. I kind of have this
vague memory, but it’s a long time ago.
MS. GREENBERG: Obviously, I think the core was always the vital statistics,
but then we were quite instrumental in helping to develop state centers for
health statistics. We have some intention and even an objective of gathering —
at one point we had responsibility for manpower data, and we try to collect
that now at HRSA. There are other areas, too, hospital discharge data, et
cetera, but we really just didn’t have the funding for it.
AHRQ actually works with all the states for their HCUP, the Healthcare Cost and
Utilization Project. They work closely with the state entities, as Jennifer
said, some of which are state centers for health statistics, some are state
hospital associations, some of them are universities that collect. That’s one
of the strongest beyond the vitals.
As was observed and recommended in the report that the committee recently
put out, there isn’t a good infrastructure across the broader areas. There are
some states that have much stronger statistical capabilities and centers. Some
of these state centers for health statistics never really did that much more,
not that it isn’t important, than the vitals, but others are very robust. I
think that could be explored further by the committee.
DR. SCHEUREN: I was going to make a comment about the BRFSS. That has an
advantage in that it doesn’t have an OMB clearance. I’m saying that publicly. I
said it privately too. They have a distinction between fitness for use and
performance requirements. They’re using performance requirements definitions of
quality where fitness for use depends on the user. That should be what is done.
Census Bureau finds a way to balance that. The BRFSS does that.
DR. MADANS: Except it also causes a lot of non-comparable data.
DR. QUEEN: It has one income question, and then they have a single
categorical education variable.
DR. MADANS: On the health outcomes, some are very similar to the national,
some are not, because they collect what they need at the state level.
DR. O’HARA: Two of the agencies that sit around the ICSP roundtable are the
Census Bureau and the Economic Research Service of USDA. There is no national
database for participants in SNAP, the program formerly known as food
stamps that is now the Supplemental Nutrition Assistance Program.
We are engaging state by state to try to acquire and assess and harmonize
SNAP participation data from the Census perspective so that we can understand
whether we are collecting it appropriately in our surveys and also for
downstream usage in the supplemental poverty measures Fritz was describing. But
it is state by state.
In several of the states we are fortunate to be partnering with research
institutions like Chapin Hall Center for Children at the University of Chicago,
the Ray Marshall Center at UT Austin, and Jacob Franz(?) in Maryland, if you
can find research partners that have that in with the states and they really
understand their area’s data.
In New York we didn’t have a research partner in that capacity, so we
actually wrote an agreement with the New York State Office of Temporary and
Disability Assistance in order to enter a data sharing agreement. There is a
different level of understanding of the data that we’re getting because we
don’t have a research partner cut in, but it’s time-consuming and it’s
coalition-building in order to gain information on the sub-national programs.
DR. MAYS: Thank you very much. This has been very useful. As you can see, we
could spend a day with you, but we’re going to do a couple other things today.
Thank you very much for being here. We really appreciate it. We’re going to
adjourn until 11:10.
Agenda Item: Panel: Methodology
DR. MAYS: We’re about to move to our panel on methodology. One of the things
in terms of thinking about the recommendations that we want to make, we also
want to understand lots of things about that and make sure that they are
reasonable. I think part of what we have had the opportunity to do is to hear a
little bit about things that have gone on in terms of the other 4302 standards
to give us some insights to help temper our thinking. Let’s get started with
John Czajka from Mathematica who is going to be talking about measuring income.
DR. CZAJKA: I want to make some acknowledgements of sources of support for
some of the things I’ve learned and will be presenting here. I’ll talk a little
bit about some work that I’m conducting currently for ASPE and for the Census
Bureau. Another part of this talk comes from a report that I prepared for ASPE
with a colleague, Gabrielle Denmead, a few years ago.
I wanted to make a few general observations before getting into some
details. I suspect the observations may be more helpful to you than the details
that are coming. I have a couple slides devoted to a few conclusions.
If we’re talking about income data, there’s really no survey that gets it
right in all respects, despite the great effort the different surveys spend to
collect income data. There are limitations to all of these.
Another is that income is most difficult to measure in probably the bottom
third of the income distribution. Way up at the top there are great challenges,
but those people don’t show up a lot in our surveys, and we don’t report most
of what they say anyway. But the lower third, which is where the policy
interest is, is where we have the greatest challenges. This comes from the fact
that both income and family composition are less stable down there, and the
sources of income that people draw on are more varied.
Another point is that it is, for some purposes, important to distinguish
between current income — monthly income may be a way to think about this —
versus annual incomes. This is important because many of the means-tested
programs determine eligibility based on current monthly income. This is
something that most of our surveys don’t collect. Most surveys collect annual
data. For a lot of the population you can divide annual income by 12 and that’s
really what people are getting every month. That’s not the case in the lower
part of the income distribution.
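
[Illustration, not part of the testimony.] The contrast between annual income divided by 12 and actual month-by-month income can be sketched as follows. The monthly amounts and the eligibility limit are invented for this example and do not reflect any real program rule.

```python
# Hypothetical monthly incomes for one family with irregular work (invented figures).
monthly_income = [0, 0, 600, 2400, 2400, 2400, 2400, 2400, 600, 0, 0, 0]
MONTHLY_LIMIT = 1500  # assumed monthly eligibility limit, not a real program rule

annual_income = sum(monthly_income)
average_month = annual_income / 12

# Eligibility inferred from the annual figure: one answer applied to every month.
eligible_by_average = average_month <= MONTHLY_LIMIT

# Eligibility judged month by month, as a program using current income would do.
eligible_months = sum(1 for amount in monthly_income if amount <= MONTHLY_LIMIT)

print(f"annual income: {annual_income}")
print(f"annual income / 12: {average_month:.2f}")
print(f"eligible in every month by the annual average: {eligible_by_average}")
print(f"months actually eligible on current income: {eligible_months}")
```

In this made-up case the annual average suggests eligibility all year, while the monthly amounts show eligibility in only some months, which is the gap an annual-only survey cannot see.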
Another point that I think occurred multiple times already is that
non-response to survey income questions is among the highest rates of
non-response that we see in surveys. Although I’m not going to talk about
asset data, it’s worse for asset data. It’s a combination of people feeling an
invasion of their privacy in responding to these questions, and many of the
things that they’re being asked to answer they cannot answer readily or not to
the degree of accuracy that they think is being expected of them.
Something that we see in survey data and I’ll be looking at later is that
there can be a lot of rounding in how people report their income. This has much
bigger implications the lower you get down into the income level. This is part
of the difficulty of measuring low-income people, measuring their income.
Another point that’s really not often distinguished is most of our surveys
are household things. They go to an address, they collect data on the people
who live in the household. But within the household you have family, and it’s
not always the case that there’s one family in the household. Then there are
individual people. It can be easy to lose sight of this. We talked about
household income, but when we measure poverty, it’s family income that we’re
using. Something I’ll be showing you is that the poverty rate is sensitive to
how we define the family, who is included in the family.
Another point is that if we’re looking at ways to simplify or even improve
income measurement, it’s important to allot attention to the sources of income
to the extent that they represent total income. Some talk about with earnings
it’s 85 percent of income or something close to that. Presumably you want to
spend more effort collecting that than something that accounts for a tiny
fraction of income.
The last point — and this will also be the last part of the presentation —
is that the collection of retirement income presents a growing challenge. Given
levels of poverty that we can see among the elderly, this is a significant
problem going down the road.
There’s really no gold standard for survey estimates of income. We have the
Current Population Survey that is considered the official source of estimates
of income and poverty in the United States. In work I’m presenting here I’m
looking across six different federal surveys.
Given that there is a general tendency to underreport income in surveys, as
a general rule, if you get more income from your survey, you’re probably doing
better. Based on that standard, if we look across several surveys at some
aggregate amounts of income, we find that across the Current Population Survey,
the American Community Survey, the Medical Expenditure Panel Survey, and the
National Health Interview Survey, there’s just a 5 percent difference in total
income despite huge differences in measurement.
The estimate here for the National Health Interview Survey is based on one
question. The estimate for the Current Population Survey is based on 20
sources. The estimate for the Survey of Income and Program Participation
collects 68 sources monthly. Yet, as you see here, SIPP’s estimate of total
income was 11 percent below the Current Population Survey and 6 percent or so
below the National Health Interview Survey.
Another survey, the Panel Study of Income Dynamics, had a weighted
population that was considerably below the total for the other surveys, and yet
it had the highest aggregate income of all of them. These are differences
across surveys that devote varying amounts of attention, and you don’t
necessarily see that the more effort that goes into income produces the biggest
We’re more interested in looking at how income breaks out across segments of
the population than these aggregates. The aggregates can be driven by what’s
happening at the very top, and that’s less so when we look here. Setting aside
the PSID over on the right, one of the things that’s very striking is how
similar the American Community Survey is to the Current Population Survey.
The American Community Survey — and I guess you heard about this yesterday
— is basically the long form for the Census pulled out and turned into an
annual survey. They basically have eight income questions, but they do ask them
of every person in the household.
Two-thirds of the responses to these data come in through a mail-back.
There’s not an interviewer. People are sent the form, they fill it out, they
send it back, and yet this looks very much like the Current Population Survey
in terms of how income is distributed across quintiles. What I mean by
quintiles here is we take family income and we determine the highest fifth, the
next fifth, and so on, based on persons.
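
[Illustration, not part of the testimony.] One way to read "quintiles of family income based on persons" is that families are ranked by income and cut so that each fifth holds roughly 20 percent of persons rather than 20 percent of families. The sketch below assumes that reading; the family records and the midpoint rule for families that straddle a boundary are choices made for this example only.

```python
# Hypothetical families as (family_income, number_of_persons); invented values.
families = [(8000, 4), (15000, 1), (22000, 3), (30000, 2), (41000, 5),
            (55000, 2), (68000, 3), (90000, 4), (120000, 1), (250000, 5)]

def person_quintiles(families):
    """Assign each family to a fifth so that each fifth holds about 20% of persons."""
    ordered = sorted(families)  # ascending by family income
    total_persons = sum(persons for _, persons in ordered)
    assigned = []
    persons_seen = 0
    for income, persons in ordered:
        # Place the family by the midpoint of its span in the cumulative person count.
        midpoint = (persons_seen + persons / 2) / total_persons
        quintile = min(int(midpoint * 5) + 1, 5)
        assigned.append((income, persons, quintile))
        persons_seen += persons
    return assigned

for income, persons, quintile in person_quintiles(families):
    print(f"income {income:>7}  persons {persons}  quintile {quintile}")
```

A production survey would use person weights rather than raw counts; the structure of the calculation is the same.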
We find that the Medical Expenditure Panel Survey, which includes a weighting
adjustment to make it line up at the poverty distribution in the Current
Population Survey, nevertheless has these differences. It’s higher in kind of
the middle part of the distribution, a good deal lower at the bottom.
SIPP, I mentioned, which devotes all its effort to collecting income, does
capture more at the bottom than the other surveys. It’s about 5 percent more
than the Current Population Survey, but it trails off pretty steeply as we go
up, and it’s only about 82 percent in the top quintile.
I just put these up to give you a sense that the population totals aren’t
the same across these surveys, even though they’re referring to basically the
same point in time. There are differences of a few million, and that can have a
difference in total income when you weight it up. If we divide things through
to come up with a per capita count, you can see what the differences are like
and actually how much per capita income varies across these quintiles.
You still see that SIPP is higher than CPS and ACS at the bottom. The PSID,
this is a long-running longitudinal survey. There may be issues about how
representative it has become after the 30-40 years that it’s been running, but
it’s quite a bit higher. You see the Health Interview Survey.
What you’ll begin to see is with one question they get close to the total, but as
you start to look further at the data, you’ll see that there really is a price
you pay for that simplicity, and a big chunk of the price is down at the
bottom. This will come through more clearly later.
Earnings accounts for, overall, a little over 80 percent of the income. If
you look here at how these surveys differ, I think the most important thing
here is in the bottom quintile, that the ACS and SIPP are getting quite a bit
more earnings than the Current Population Survey.
In the case of SIPP when they’re going out and doing three interviews a
year, one could say that this is a case where it’s really helpful that people
with fluctuating incomes, you’ve got to get out there a lot to pick up all
these differences. But then how do you explain the ACS, which sends out this
mail questionnaire and gets even more income in terms of earnings from the
If we look at unearned income, we see a big difference. The ACS is quite a
bit lower than the CPS at the bottom. Some Census Bureau people have said that
part of this may be a classification issue. People are not necessarily counting
earnings in this survey the way they might in the CPS. They’re giving you
similar totals, but they’re breaking it out differently.
The National Health Interview Survey here has an extremely low figure. Where
this comes from is that while they have just one question for total family
income, there is an earnings question that’s asked of everybody. That’s what we
used in the previous slide. As you saw, even that was higher than the CPS.
Every survey was higher than the CPS in earnings in the bottom quintile.
What we did was to get another income estimate, we took this total and
subtracted the earnings. There were actually a lot of negatives. People
reported more income when you asked them to provide earnings for individual
members than they gave as their total family income. In some cases you look at
the data and it was pretty clear they weren’t counting this kid when they gave
a family total. It does show the hazards of oversimplifying the data
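
[Illustration, not part of the testimony.] The residual calculation described here, total family income minus the sum of members' reported earnings, can be sketched as below. The records are invented; a negative residual flags a family whose single total was lower than the earnings reported person by person.

```python
# Hypothetical family records (invented values).
families = [
    {"id": "A", "reported_total": 30000, "member_earnings": [22000, 5000]},
    {"id": "B", "reported_total": 18000, "member_earnings": [18000, 6000]},
    {"id": "C", "reported_total": 12000, "member_earnings": []},
]

def residual_unearned(family):
    """Implied unearned income: the single family total minus summed member earnings."""
    return family["reported_total"] - sum(family["member_earnings"])

residuals = {family["id"]: residual_unearned(family) for family in families}

# Families whose member-level earnings exceed the family total they reported.
negative_ids = [family_id for family_id, value in residuals.items() if value < 0]

print(residuals)
print("negative residuals:", negative_ids)
```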
There’s a lot of interest in the poverty rate. It’s an important measure.
This is what things look like across the surveys. One thing that stands out
here again with the Health Interview Survey and this simple approach is that
the poverty rate measured by the survey — and these are back in 2002 — was
2.5 percentage points higher than the official rate that comes out of the
Current Population Survey. That’s part of the issue of really not getting a lot
of income out of the lower part and having it show up this way.
The PSID was a good deal lower. SIPP had the lowest poverty, slightly, but
it had the most people in this next group from 100-200 percent of poverty.
Other than HIS, it was slightly higher than these others.
There are pretty substantial differences — and I think Linda’s going to
talk a lot more about this — with respect to how surveys capture participation
in various federal and state programs. If we look at the combination of
welfare and SNAP, which is what used to be called food stamps, SIPP’s approach clearly
is showing a payoff. It’s capturing these at a much higher rate than the other
The Health Interview Survey with their question on food stamps is getting
only 5 percent of the population participating versus 11 in SIPP. The actual
should probably have been around 13 percent at this time. We see a similar
thing with SSI. The bottom deals with health insurance coverage. That’s a
different issue we’ll pass over today.
The CPS has a family definition that’s been in place going back to before
the poverty rate was established. It basically includes all persons who are
related by blood, marriage, or adoption. One of two or three critical pieces that are
not in here is that unmarried partners are not put together into the same
family in the CPS. There are actually instances where an unmarried couple both
parented a child, and yet one of the parents will not be in that family for the
purposes of measuring poverty.
Both the Health Interview Survey and MEPS apply a broader definition of the
family where they include unmarried partners. They also include foster children
who are not included in the CPS family. We were able to compare for these two
surveys the impact of using one definition versus the other.
This is something that’s been pretty consistent across other surveys, that
you will find that it affects the poverty rate by about a full percentage point.
You lower the poverty rate by a percentage point by including unmarried
partners and foster children as well. When surveys set out to define the
family, they may not be thinking about this, but this is a little implication
for income measurement that can come through.
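
[Illustration, not part of the testimony.] The effect of the family definition on measured poverty can be sketched with one household counted two ways. The incomes and the thresholds by family size are invented for this example and are not the official poverty thresholds.

```python
# Assumed poverty thresholds by family size; NOT the official thresholds.
THRESHOLDS = {1: 11000, 2: 14000, 3: 17000}

def poor_persons(units):
    """Count persons in poor units; each unit is a list of its members' incomes."""
    count = 0
    for incomes in units:
        if sum(incomes) < THRESHOLDS[len(incomes)]:
            count += len(incomes)
    return count

# One hypothetical household: a parent earning 12,000, a child with no income,
# and the parent's unmarried partner earning 25,000.
narrow_definition = [[12000, 0], [25000]]  # partner treated as a separate unit
broad_definition = [[12000, 0, 25000]]     # partner included in the family

print("poor persons, narrow definition:", poor_persons(narrow_definition))
print("poor persons, broad definition:", poor_persons(broad_definition))
```

Under the narrow grouping the parent and child fall below the assumed threshold; once the partner's income is pooled, nobody in the household does.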
As you see at the bottom, there were big differences among different subsets
of the population. Not surprisingly, for single parents it makes a huge difference
because a fair number of these parents have an unmarried partner.
You heard about non-response. Basically, to try to measure this across
surveys, it differed dramatically in terms of how many questions they asked.
Rather than looking at the frequency with which people failed to respond, we
looked at how much of the total income was imputed. We find pretty substantial
differences across surveys, starting with the Current Population Survey. In
this particular year about 34 percent of the total income that’s measured in
that survey was imputed. It was assigned to people who did not respond to the
The American Community Survey had a rate of only about half of that. It’s
not clear why it’s so dramatically different. The ACS is a mandatory survey.
Perhaps people carried this over to their willingness to respond to income
questions. But it’s a clear outlier. Everywhere else you’re seeing things that
are pretty comparable to the CPS.
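
[Illustration, not part of the testimony.] The measure described here is the share of weighted income dollars that were imputed, as opposed to the share of people who failed to respond. The sketch below uses invented records and a simplified one-flag-per-person layout; real surveys impute item by item.

```python
# Hypothetical person records: (survey_weight, income, income_was_imputed). Invented.
records = [
    (1000, 40000, False),
    (1500, 20000, True),
    (500, 80000, False),
    (2000, 10000, True),
]

total_dollars = sum(weight * income for weight, income, _ in records)
imputed_dollars = sum(weight * income for weight, income, imputed in records if imputed)
total_weight = sum(weight for weight, _, _ in records)
imputed_weight = sum(weight for weight, _, imputed in records if imputed)

# Share of aggregate income that was imputed (the measure used in the comparison).
dollar_share_imputed = imputed_dollars / total_dollars
# Share of the weighted population with imputed income (a different measure).
person_share_imputed = imputed_weight / total_weight

print(f"share of income dollars imputed: {dollar_share_imputed:.3f}")
print(f"share of persons with imputed income: {person_share_imputed:.3f}")
```

The two shares can diverge sharply when nonrespondents are assigned incomes well above or below the average, which is why the dollar-based measure is the more comparable one across surveys with different numbers of questions.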
We made a distinction here between whether the imputation was done using
information that you could say is related to the missing information. Maybe the
best example is SIPP, where people are interviewed every four months. You might
have a response the previous four months for an income source that you’re not
getting this interview. That’s information that could be used. Arguably it’s
better. We don’t know that for sure.
MEPS has a lot of non-response to their annual earnings question, but in
this survey they get somewhat better response to wage rates and hours worked,
so they do an imputation based on blowing up hourly wages to an annual figure.
In any event, they end up with between 40-45 percent of their total income that
had to be imputed in some way. HIS is very comparable to CPS as well. One real
consistency across the surveys is how unwilling many people are or at least
feel themselves unable to respond to the income questions.
We didn’t have any mechanism in our analysis for assessing how accurate the
reported incomes were. There’s been a lot of linkage work. I don’t know if you
heard anything about this yesterday or this morning. One of the things we can
look at is how much the responses are rounded. In particular, if you look at a
figure, what percent of the people are giving you an answer that’s divisible
exactly by $5,000 or even $10,000?
The problem with the rounding if you’re thinking about a poverty measure is
that rounding to the nearest $5,000 can make an enormous difference in whether
you’re considered poor or not. The other thing is that in trying to look at
changes in income over time, if people are rounding a huge amount, you’re not
going to capture those changes. Then you may see a huge shift.
We cap this at incomes below $52,500, and we look separately at personal
earnings, which is generally one question, versus total family income, which
could be the result of many questions, and ask how often was the figure exactly
divisible by $5,000. In the Current Population Survey it was 28 percent of
earnings was exactly divisible by $5,000. Even 11 percent of total income was
divisible that way.
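
[Illustration, not part of the testimony.] The rounding indicator described here is the fraction of reported amounts below $52,500 that are exactly divisible by $5,000. A minimal sketch, using invented whole-dollar amounts and assuming only positive amounts are in scope:

```python
CAP = 52500   # amounts at or above this are excluded, following the description
STEP = 5000   # an amount counts as rounded when exactly divisible by this

# Hypothetical reported annual amounts in whole dollars (invented values).
reported = [12000, 15000, 18350, 20000, 20000, 23400,
            25000, 31250, 40000, 47800, 60000, 75000]

# Assumption for this sketch: zero and negative amounts are out of scope.
in_scope = [amount for amount in reported if 0 < amount < CAP]
rounded = [amount for amount in in_scope if amount % STEP == 0]
share_rounded = len(rounded) / len(in_scope)

print(f"amounts in scope: {len(in_scope)}")
print(f"share exactly divisible by {STEP}: {share_rounded:.2f}")
```

An unweighted count is used for brevity; a survey estimate would apply the survey weights.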
In ACS it was slightly worse. SIPP is extremely low because it’s monthly
income that they’re collecting, so somebody would really have to be pretty
devious to figure out how to report monthly income that rounds out at the end.
MEPS was a good deal lower than CPS. But look at the result of the
single-question approach in the HIS. 36 percent of total family income is
divisible by $5,000 evenly. The figures run for personal earnings at about 40
That’s something we can quantify about the quality of the data. It’s a much
bigger concern lower in the distribution than it is higher up. I think in the ACS when we
looked at it, the fraction of people who reported exactly $20,000 as their
income was really quite surprising.
I want to say a little bit about where income comes from. In the first
column here we were looking at the percent of families who report each of these
sources or a set of sources here, then continuing with less important sources
on the next slide. For the percent of families reporting, these don’t add up to
anything in particular. People can have many sources.
On the right we look at what fraction of the total income fell into
individual types. We see that wage and salary earnings for the whole population
was 77 percent of the total. Self-employment added about another 5 percent. You
look down to Social Security, you combine that with retirement income, you’re
talking about 10 percent.
But then there are a lot of components that are really pretty small with
respect to how much of the total they account for. When we move to some of
these other sources, they’re well below half a percent of the total. One does
have to ask what is the value of going extensively after all of these sources?
There are really two reasons why you want to go after a lot of sources.
One is you think you get a better total if you ask for a lot of the
components because people may forget that they had this account over there that
they get a couple thousand dollars from. But you may also want to know these
Way back when the Survey of Income and Program Participation was being
designed, there was a real focus on being able to measure eligibility for
various means-tested programs. A lot of these programs treat different sources
differently. They may not count income from a certain source or they may
discount it. So there was a need to bring in all of that detail.
You do see that you get the most bang for your buck going after earnings
because that’s such a big chunk. It does differ by income level. Here I’m
comparing people below poverty and people above 400 percent of poverty. The
share that’s due to earnings falls off quite a bit when you get down to the
poor. It’s about half, and you see that Social Security is pressing pretty
close to a quarter of the total.
But many of these other sources are small for both. We still collect data on
alimony, even though hardly anybody pays it. Even between the low and the high
end of the distribution, it’s really tiny, whereas child support is really
where most of this money is going these days. SSI becomes very important for
the poor, about 10 percent of total income. It’s almost nonexistent above that.
But the fact that among the poor these aren’t frequently really tiny is part
of this problem. This means for particular families many of these sources may
be the bulk of what they’re relying on for income. That’s why a simple approach
is not going to serve you as well with the low-income population as it does
We had some comparison of earnings between SIPP and ACS and the CPS. This is
an area where ACS, SIPP, and CPS are pretty similar in the fraction of families
and unrelated individuals who report that they have earnings. ACS and CPS are
also very similar down at the bottom in how they compare in total dollars,
whereas SIPP falls off really substantially despite this approach that SIPP has
of going to people monthly.
One of the theories about that is that when you ask about earnings
frequently and ask people to report it by month, there may be a tendency for
people to give you take-home pay, even though you’re asking them for total,
because at that level your take-home pay is the salient number. You know that
better than the gross.
But when you get to the end of the year, it kind of flips. People can tell
you their gross salary at the end of the year, but they have more trouble with
the take-home pay. That could be a lot of what’s going on with SIPP in losing
such a big chunk. You get the benefit for people with erratic income, but for
people with extremely regular income, this approach doesn’t necessarily work.
There’s another huge difference in self-employment here where SIPP gets
dramatically more people showing that they have it and dramatically more total
dollars. This was part of a consciously different approach in SIPP to define
what self-employment income is.
SIPP asks people to report not only the profit that they get from a
business, but what they draw in salary, because there are businesses out there
that are failing, doing terribly, but the owners are still paying themselves a
salary. That’s a part of their income. That’s something that has not been
addressed in these other surveys.
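
[Illustration, not part of the testimony.] The two definitions of self-employment income contrasted here can be sketched as below. The figures are invented, and the sketch assumes the reported profit is net of the salary the owner draws, so that adding the draw back does not double count.

```python
def self_employment_income(profit, salary_draw, include_salary_draw):
    """Self-employment income under one of two assumed definitions."""
    if include_salary_draw:
        return profit + salary_draw  # profit plus what the owner pays themselves
    return profit                    # profit (or loss) only

# Hypothetical owner of a business running at a loss who still draws a salary.
profit_only = self_employment_income(-5000, 30000, include_salary_draw=False)
profit_plus_draw = self_employment_income(-5000, 30000, include_salary_draw=True)

print("profit-only definition:", profit_only)
print("profit plus salary draw:", profit_plus_draw)
```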
The last area I want to talk about is retirement income. The traditional
defined benefit pension, as it’s called, where your employer promises to pay
you a certain amount of monthly income for life after you retire is really
becoming history in the private sector. It’s partly a function of change in the
laws governing how adequately pensions had to provide for their future
But increasingly, employers in the private sector have turned to defined
contribution plans where they’re making a contribution to an account that the
employee can frequently manage in terms of how it’s invested. The employee can
put money in that. The IRS came along with complementary changes in tax law
that set up individual retirement arrangements where people can contribute to a
retirement account that accumulates funds tax-deferred and has no
relation to their employment. It’s something they put in.
If you look at where the funds are, in 2009 private defined benefit plans
held about $2 trillion in total assets and annuities, which is a way of
converting accounts into monthly payments, accounted for about $1.4 trillion.
But the DC plans were almost double what the defined benefit plans were and the
IRAs were comparable. We’re not yet seeing this stuff come through in income,
but it’s going to keep growing dramatically, and down the road it’s going to
become more and more important as a source of retirement income.
The basic problem is that the surveys haven’t really adjusted to this
change. If you have a 401(k) and an IRA, you don’t get a payment; you make a
withdrawal when you want the money. Withdrawals from savings have not been
considered income over the years.
But what’s basically happened is that we’ve changed the whole retirement
system from something where people got a steady income stream that was very
easy to measure in surveys because it was so consistent to one where people
have these savings accounts. So a lot of this retirement income has kind of
been moved offline with respect to how the surveys capture income.
Over time, part of the difficulty in dealing with this is that there’s this
conceptual question of what is income, really? As I said, we haven’t tended to
count withdrawals from savings as income, but that’s how these things work.
Unless we come to grips with this, more and more of the income that retirees
have access to is going to move out of the surveys and not be counted as part
of their support.
We’ll be looking at Social Security as basically most of the income that
elderly people receive, and may be classifying more and more of them as poor when they
may be sitting on these multimillion-dollar — not everybody, but certainly
some — accounts and being able to withdraw money at will more than they’ll
ever need, and yet it’s not being measured.
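
[Illustration, not part of the testimony.] The measurement problem described here can be sketched by computing one retiree's income with and without account withdrawals. All amounts, including the threshold, are invented and are not official figures.

```python
POVERTY_THRESHOLD = 11000  # assumed threshold for a single person; not official

# Hypothetical retiree living on Social Security plus IRA withdrawals (invented).
retiree = {"social_security": 10000, "pension": 0, "ira_withdrawals": 30000}

def measured_income(person, count_withdrawals):
    """Income as a survey would record it, with or without account withdrawals."""
    income = person["social_security"] + person["pension"]
    if count_withdrawals:
        income += person["ira_withdrawals"]
    return income

poor_traditional = measured_income(retiree, count_withdrawals=False) < POVERTY_THRESHOLD
poor_with_withdrawals = measured_income(retiree, count_withdrawals=True) < POVERTY_THRESHOLD

print("classified poor when withdrawals are ignored:", poor_traditional)
print("classified poor when withdrawals are counted:", poor_with_withdrawals)
```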
We see a difference here between surveys. SIPP does a better job at this.
The top line here we’re looking at Social Security. Here we’ve got SIPP on the
bottom. CPS is hitting in terms of the number of families reporting at about 87
percent of what SIPP was.
But look down at these other sources. Income from a pension, which would be
mostly the defined benefit, was about 76 percent, other retirement benefits
about 37 percent, and then these withdrawals from IRAs only about 11 percent
being reported in CPS compared to SIPP.
The dollars look a little better for CPS. Social Security is pretty
comparable. But there’s still a 19 percent difference in pension money, about a
37 percent difference in other types of retirement, and then about almost an 80
percent difference in what’s being pulled from IRAs. This is back to the point
where CPS is our official source, and yet for something like this it’s doing
dramatically less well than another survey. That was it for prepared remarks.
DR. MAYS: Thank you very much. Our next presenter is Linda Giannarelli, who
is a senior fellow at the Urban Institute and works in the Income and Benefits
Policy Center. Thank you and welcome.
MS. GIANNARELLI: It’s a pleasure to be here today. I’m the Project Director
for the TRIM3 Microsimulation Project at the Urban Institute. Not surprisingly,
my comments are going to be pretty micro-level. We heard from John about some
of the big-picture comparisons of income collection across a bunch of surveys.
I applaud that report. It’s a fabulous report. It’s huge. It has some amazing
tables that really compare across surveys how they went about collecting
income, what their sampling frame is, all kinds of comparisons. If you’re ever
interested in those things, that’s the place to go.
The context for my remarks is as a user of the federal surveys that collect
data on income and program participation: CPS, ACS, SIPP, and to some extent
also NHIS and MEPS. I’m also going to focus primarily on the low-income
population and subgroups of that population.
I know that you all are interested broadly in socioeconomic status, but a
lot of the work that we do is focused more on that lower end of the income
distribution. Some of the things that, in a broader context, may not seem to be
as much of an issue can be more of an issue when we’re narrowing our focus to
the low-income population or even subgroups of the low-income population.
Some particular issues are analysis of program participation, who is
receiving benefits from various programs, level of dependency, which is
something that ASPE is very interested in. There’s an annual publication on
indicators of welfare dependence not just looking at who’s receiving benefits
from programs, but also the extent to which people are dependent on benefits
from those programs. Also, the ability to assess the impact of hypothetical
changes in policies. That is some of the background to what I’ll be talking
about for a few minutes.
I want to focus primarily on two big challenges that users of these surveys
face. One of them is misreporting and primarily underreporting, but not always
underreporting. We can look at that in a couple of different ways. We can look
at aggregate counts in the survey data compared to actual totals. I guess I
should put “actual” in quotes because there are questions about
administrative datasets as well, but as close as we can get to actual. Also,
findings from exact match projects, which we also heard about a little bit in
the prior panel both from the agency staff and from Fritz a little bit.
The second issue that I want to touch on is missing data and how missing
data are addressed through allocation. Finally, I’ll close with a little bit of
a wish list.
Here’s a big picture. I just pulled this from the work done by some
colleagues at the Urban Institute, Austin Nichols and Karen Smith. They, along
with some other staff, worked on a project that was funded by the Census
Bureau. Urban Institute partnered with Westat in doing this project. The focus
of the project was on the income data in the ASEC. You probably heard about
that or may have heard about that from Chuck Nelson yesterday.
But just to kind of set the big-picture stage here, CPS-ASEC underreporting
looking at calendar year 2009 income data that were collected versus the
national income and product accounts. For each one of these line items we could
debate the suitability of the NIPA or alternate sources, but this is just one
overview way that’s comparable across all these income items.
You can see up at the top that according to a comparison with the NIPA, that
this year’s ASEC data captured 93 percent of wage and salary data. That’s
consistent with what we heard earlier, that the wage and salary reporting is
pretty good. But then we see some rather depressingly small numbers, 27 percent
of interest income, 14 percent of dividend income.
If we’re going to focus for a moment mostly on the lower-income population,
dividend income is probably not the most important thing, although I will point
out that if someone’s primary income is from dividend income and they do not
report it, then they may appear to be low-income when, in fact, they are not.
But if we look down at the bottom two rows at two programs that are very
important for the lower-income population, in that year’s ASEC data it looks
like about 75 percent of SSI income was captured in the data and about 56
percent of SNAP income was captured in the data. I’m trying to be a little
careful with my terminology here and say captured in the data rather than
reported in the data because there is a fairly wide gulf between those things.
We’ll look at that in a minute.
The previous slide was looking at dollar amounts, but often if we’re doing
some sort of research project, we’re not necessarily interested in exactly how
much someone received from TANF. We’re interested in, perhaps, the population
of families who received TANF versus non-TANF.
This slide is trying to look at to what extent is enrollment in these
programs captured in surveys. Again, I’m picking on the ASEC data here, but the
picture would not be terribly different with another survey. SIPP does do a
somewhat better job at capturing enrollment in these programs, but there is
still a substantial amount of underreporting of enrollment in these programs,
even in the SIPP data. I’m sorry I didn’t bring a slide on that.
In this particular slide if we look at the TANF enrollment reported in the
ASEC data for calendar year 2009, 58 percent of that enrollment is captured in
the data, including not only the truly reported enrollment, but the allocated
as well. The 58 percent or so happened in this particular year to be the same
for Medicaid and SCHIP, if we combine those two and look at them together,
because there is often a lot of confusion between those in the reporting of
those programs. We do get much higher, 87 percent, for SSI. Our internal
calculations for that are 76 percent.
Even if we include the allocated responses, we’re not getting a full picture
of the people who are enrolled in these programs. To the extent that someone is
doing research that is a shorthand for a portion of the low SES population that
is looking at enrollment in these programs, they’re not getting the whole
picture.
The last couple of slides looked at the ASEC data. To touch real briefly on
the ACS, as John mentioned, one thing that’s quite different about the ACS is
that many respondents are filling out the form independently, and also very few
income sources are collected individually in the ACS data. I’ve listed them
The only ones that have their own little box for you to write a dollar
amount in are wages and salaries, self-employment, Social Security, and SSI.
Everything else is a box that would be combining a bunch of different things.
All public assistance and welfare income is written down in the same box. All
asset income, interest, dividends, and rent is written down in the same box. All
retirement income and all other income is written down in the same box.
Veteran’s payments, child support, workers’ comp, unemployment comp, and
anything else is all written down in the same box.
To look at some possible implications of that, this slide says likely TANF
income versus program data. The reason it says likely TANF income is that we
don’t know exactly who is reporting TANF. We know that people are reporting
public assistance or welfare. I’m defining likely TANF income here as income
that is reported as public assistance or welfare payments that is reported by
low-income families with children.
If I call that likely TANF, we did some work at the Urban Institute recently
where we were focusing on poverty in three states using the ACS data. That’s
where those three states were coming from, from that project. If we compare the
likely TANF income to actual TANF benefits paid out in the state according to
program administrative data, in Georgia the ACS captures, including
allocations, 84 percent, Illinois 96 percent, Massachusetts 23 percent.
One’s initial reaction to this might be way to go, Illinois. Look at what a
great job they’re doing reporting their TANF income. Sadly, we don’t think it’s
actually such a great thing. Because we were focusing on these individual
states and we were also working with some partners in those states, we did look
quite closely at these data because that 96 percent just seemed too good to be
true.
We looked not just at the public assistance or welfare payments that are
reported by low-income families with children in Illinois, but we added up all
those dollar amounts across everyone in Illinois in that year. Then we talked
with our partner in Illinois to ask what other public assistance and welfare
programs are there in Illinois that people could be reporting there. We tried
to add up all those dollar amounts that logically it seemed could be reported
We could not come up to the total that was reported in the ACS. Honestly, we
really don’t know what was going on there, just that it appears that some of it
was not actually public assistance or welfare or at least not what the survey
designers had in mind by that question.
Here’s the same kind of comparison for likely unemployment compensation.
We’re defining likely unemployment compensation as income that’s reported from
any other source that is reported by individuals with apparent unemployment.
Someone has written down something on that other line. It could be child
support. It could be veteran’s payments. We tried to make a pretty good guess.
If they were a veteran, we figured it was probably veteran’s payments. If they
were a single mom, we figured it was probably child support.
If we compare likely unemployment compensation income versus program data in
those three states, none of them cracked 30 percent. I think this is just
another illustration of John’s point that, in general, the more questions you
ask, the more you are going to get people to remember certain sources of
income.
Finally, SSI income, which does have its own box in the ACS in each of those
states — it appeared that over 80 percent of the SSI income was getting
captured, although the data do show a higher incidence of very high SSI
amounts. By very high, I mean we know what the maximum SSI payment is for
someone in a particular state, at least the maximum payment. It’s the maximum
monthly federal payment plus whatever the state supplement is in that state.
If we take that and we multiply it by 12, ignoring for a moment people who
are getting a retroactive SSI payment, we should know the maximum that anybody
should get in a year. There’s a higher incidence of very high SSI amounts in
the ACS data versus the ASEC data, which may be related to greater confusion
with Social Security, since you are responding to this survey on your own in
I want to just mention real briefly what the exact match studies tell us
about how we’re doing with our current methods of capturing some of this income
for low-SES individuals. We heard in the earlier panel about one use of these
exact match studies, which is to see to what extent are people who appear to be
eligible for programs receiving or not receiving those benefits. But these
studies can also be used to look at how well are people reporting.
We heard from Fritz about the exact match work that he’s done with ASPE and
with partners at Census Bureau matching CPS with DER data. We also heard from
Census Bureau about some of their exact match studies. In general, focusing on
those projects that are looking at program benefits, many actual recipients
fail to report enrollment, and some reporters are not actually enrolled,
according to the administrative data. I just want to put up two examples.
This is published work by Julie Parker at the Census Bureau. At the end of
my handout I’ve got the citation to this. This was an exact match of 2005 ASEC
data with SNAP program data for three states. These are weighted numbers of
households, but keep in mind these are just those three states. She found
833,000 correct reporters, so they said they got SNAP and they were in the
administrative data. She found 922,000 false negatives. That is not a typo.
This is not in any way remotely to criticize the amazing work that’s done at
the Census Bureau, but I do think that sometimes when we say there is
undercounting and there is missing data, we forget, or maybe it’s just too
depressing to think about the extent to which that is true in some of these
cases. She also found 92,000 false positives.
Here’s another exact match project that I’m pretty sure most of you are
familiar with, the SNACC Project, or the Medicaid Undercount Project. I hope I
caught all the agencies there — CMS, ASPE, NCHS, AHRQ, Census Bureau, SHADAC,
RWJ. I just pulled a couple of numbers from the phase five research results,
which are also public. For calendar year 2005 they found 24.8 million people
correctly reported that they were enrolled in Medicaid.
There were 18.7 million false negatives, in other words, someone who was in
the administrative data as enrolled in Medicaid who did not report that they
were enrolled in Medicaid. There were 7.7 million false positives, people who
said they were enrolled in Medicaid who were not. Some of that was confusion
with Medicare. They came up with an adjusted undercount estimate of 32 percent.
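As a rough check on the magnitudes in these two exact match examples, the counts quoted above can be turned into simple rates. This is only a sketch: the function name and the two rate definitions are assumptions made for illustration, and they are not the adjusted undercount methodology used by the SNACC Project, so the results are not expected to equal the 32 percent figure.

```python
# Counts as quoted in the testimony (weighted households for SNAP,
# persons for Medicaid). The rate definitions are illustrative assumptions.

def misreport_rates(correct, false_neg, false_pos):
    """Return (share of administrative recipients who did not report,
    share of survey reporters not found in the administrative data)."""
    true_recipients = correct + false_neg    # enrolled per administrative data
    survey_reporters = correct + false_pos   # said "yes" in the survey
    return false_neg / true_recipients, false_pos / survey_reporters

# SNAP exact match, three states, 2005 ASEC
snap_fn, snap_fp = misreport_rates(833_000, 922_000, 92_000)

# SNACC Medicaid figures, calendar year 2005
medicaid_fn, medicaid_fp = misreport_rates(24.8e6, 18.7e6, 7.7e6)

print(f"SNAP: {snap_fn:.1%} of recipients unreported, {snap_fp:.1%} of reporters unmatched")
print(f"Medicaid: {medicaid_fn:.1%} of recipients unreported, {medicaid_fp:.1%} of reporters unmatched")
```

Under these definitions, roughly 53 percent of the SNAP households and 43 percent of the Medicaid enrollees found in the administrative data did not report the benefit in the survey.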
Why don’t people report their benefits correctly? Here are a few possible
reasons. In the case of the ASEC data, one possible reason is the long
reference period. If the last point you got benefits was last January and
you’re being surveyed in March, you may forget. Confusion between programs —
we think there’s confusion between SSI and Social Security, between Medicaid
and Medicare, possibly between TANF and general assistance or state and local
assistance programs. Stigma is also a possibility.
Interview fatigue: you’re just tired of answering questions. You’ve caught
on to the fact that if you say yes to one more question, there’s going to be
six more, so you say no. There is a lot of ongoing research on this topic. I do
not want to pretend for a moment to be the person who knows the most about this
Census Bureau has a follow-on to that project. Westat and the Urban
Institute have done work. There has been a lot of cognitive testing work on many
aspects of this. These exact match analyses are becoming more frequent, and
we’re continuing to learn more about who is and who isn’t reporting.
Just a couple of brief comments about allocated data. Just to prep this for
a second, one of the speakers on the earlier panel mentioned that there are
many different kinds of users. There are very knowledgeable users, people in
government agencies, people in large research organizations, who may be well
aware of the extent to which allocated data are a part of what they’re
I think there are also a lot of users who just look at a survey like the SIPP
or the NHIS and say look, no missing data. I don’t think it occurs to all users
that what they’re analyzing was not actually reported by the individual in
many of these cases.
This slide is just looking at the impact of allocated data on what we’re
looking at in the surveys for three particular low-income programs. The
percentages are the CPS-ASEC annual dollar amount as a percentage of the
administrative target. For TANF if we only look at the dollar amounts that were
truly reported by the respondents as TANF income, that’s 40 percent. If we add
in the allocated amounts, we get up to 57 percent. In the case of food stamps,
the figures are 43 and 55. SSI gets the highest numbers, 60 percent without
allocation and 82 percent with allocation.
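To make the role of allocation concrete, the percentages just quoted can be used to work out what share of the captured dollars was allocated rather than reported by the respondent. This is a minimal sketch; the dictionary layout and the function name are assumptions, and food stamps are labeled SNAP.

```python
# (reported-only %, reported-plus-allocated %) of the administrative target,
# as quoted in the testimony for the CPS-ASEC.
capture = {"TANF": (40, 57), "SNAP": (43, 55), "SSI": (60, 82)}

def allocated_share(reported_pct, total_pct):
    """Fraction of the captured dollars that came from allocation."""
    return (total_pct - reported_pct) / total_pct

shares = {prog: allocated_share(r, t) for prog, (r, t) in capture.items()}

for prog, share in shares.items():
    print(f"{prog}: {share:.0%} of captured dollars were allocated")
```

On these figures, somewhere between a fifth and a third of the dollars a public-use data user sees for each program were filled in rather than reported.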
I think that the Census Bureau does an amazing job of doing the allocations,
as do the other data-producing organizations. They have hundreds of variables
that they need to fill in. They cannot spend a year on each one of them.
However, as data users, we do see some issues with the allocated data,
particularly for low-income families. I put down three of them here.
One issue is that we’ve seen that people who are allocated to be enrolled in
a program, in other words, they didn’t answer the question, but the
data-producing agency has zapped them with a yes through hot-decking. People
who are allocated to be enrolled in a program are less likely to appear
eligible than actual reporters. If we run a very complicated, very detailed
microsimulation model, TRIM, to assess eligibility, we get a much higher
eligibility rate among the people who truly reported the program than among the
people who were allocated to be enrolled in the program.
In some cases it looks like the allocated data for the benefit programs is
inconsistent with known policies. It’s very easy to see how this might happen,
because the hot-decking cannot be sensitive to all the very detailed things
that are going on. I’ll just give you one example of that.
States have highly different earnings disregards for determining eligibility
for TANF. In one state it might be possible to have quite high earnings for a
portion of the year because a large percent of those earnings are disregarded
as an inducement to TANF families beginning to work. Other states do not have
nearly that generosity in their earnings disregards.
If someone who failed to answer the TANF question in a low disregard state
is hot-decked to someone in a high disregard state, you could wind up imputing
TANF receipt to someone whose earnings would simply not make it possible for
them to receive TANF in that low disregard state.
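The cross-state donor problem described here can be sketched with a toy sequential hot deck. Everything in the example is hypothetical: the state labels, the field names, and the donor rule are assumptions chosen for illustration, and this is not the Census Bureau’s actual allocation procedure.

```python
def hot_deck(records, key=None):
    """Toy sequential hot deck for a missing 'tanf' answer.

    A missing value is filled from the most recent respondent in file
    order. If `key` is given, donors are restricted to records with the
    same value of that field (an imputation class).
    """
    last_donor = {}   # class value -> most recent reported answer
    out = []
    for rec in records:
        cls = rec[key] if key else None
        filled = dict(rec)
        if rec["tanf"] is None:
            filled["tanf"] = last_donor.get(cls)  # stays None if no donor yet
            filled["allocated"] = True
        else:
            last_donor[cls] = rec["tanf"]
            filled["allocated"] = False
        out.append(filled)
    return out

# Hypothetical file: same earnings, but only the high-disregard state
# would allow TANF receipt at that earnings level.
records = [
    {"id": 1, "state": "LOW_DISREGARD",  "earnings": 20000, "tanf": False},
    {"id": 2, "state": "HIGH_DISREGARD", "earnings": 20000, "tanf": True},
    {"id": 3, "state": "LOW_DISREGARD",  "earnings": 20000, "tanf": None},
]

pooled = hot_deck(records)                # donors drawn across states
by_state = hot_deck(records, key="state") # donors restricted to same state

print(pooled[2]["tanf"], by_state[2]["tanf"])
```

With the pooled donor pool, record 3 inherits TANF receipt from a high-disregard state; restricting donors to the same state avoids that particular inconsistency, at the cost of smaller donor pools.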
It is generally recognized that allocated income amounts can also make a
person who actually reported a benefit appear to be ineligible for that
program. When we look at things like who looks like they’re eligible for the
Medicaid program and compare that to did they actually say that they were
enrolled in Medicaid, what we do see not infrequently is someone who actually
They don’t look like they’re eligible for Medicaid. Their income just looks
too high to be eligible for Medicaid, but then when we look more closely, we
see that their earnings were actually allocated. So they truly reported
Medicaid, they didn’t mention their earnings, and they were allocated to have
an amount of earnings that is inconsistent with the fact that they — I’ll stop
because one person is nodding and saying that that is making sense.
Implications. In the aggregate the impacts of misreporting and allocation
may not be significant. It may not make more than a couple of tenths of a
percent of difference in the overall poverty rate or in the income
distribution. But the impacts for specific studies for subgroups, where many of
us spend our time (not even in the weeds, but one little corner of the weeds),
can be substantial. Just to state the obvious, even though we use
allocated data as though it’s the same as the truly reported data, it’s really
not.
My wish list. If I were queen, this would be some of my wish list, because I
read the instructions and it said that we could provide a wish list, so I’m
taking that opportunity. For the ACS, I would ask more individual income items.
As someone filling it out, I’m not sure that I would feel that it was more
burden to see more boxes, given that I can skip over them if they don’t apply
to me.
If I were queen, I would reinstate the question on work-related disability
as being perhaps more tied to some of these programs than the ADL limitations.
Also, asking if a household lives in public or subsidized housing, which is
a very key piece of information for a family’s economic wellbeing.
CPS-ASEC — I haven’t spoken earlier about any education issues, but
thinking about just, in general, issues around SES — and I know that your
interest is not only in the short-term, but in broader, longer-term issues —
being able to identify in the ASEC data individuals who are combining school
and employment at any age, not just 16-24, rather than only being able to see
school attendance or evidence of school or training if that’s a reason why
someone is not working.
My general wish list is continued cognitive testing. Every dime that the
Census Bureau and other agencies put into cognitive testing is a dime well
spent. Why don’t people report all of their income and benefits? It’s not the
same to have people not report it and to then make it up afterwards.
The more that we can understand how we can change the question ordering, the
question wording, so that we get more of that information from the get-go, the
better situation we’ll be in. That work is ongoing, and I hope that
whatever is learned from that can then be used in multiple surveys, not just
Continued refinements to the CATI/CAPI systems, possibly more prompts or
checks when someone says something that is inconsistent with something else
that they have already said. Finally, refinements to allocation methods. We are
never going to get to the point where everybody answers every question, much as
we would like to, but refinements to those questions, recognizing that the
Census Bureau can’t spend an enormous amount of time on any one question, but
possibly trying to think of ways that those allocation methods can be refined
to avoid some of the apparent logical inconsistencies that currently appear.
DR. MAYS: Our next presenter is Susan Queen, who is actually the lead staff
for this hearing. Dr. Queen is with ASPE.
DR. QUEEN: I am just going to sort of wrap things together in terms of
what we’re looking at for SES, relating it back to the standards that were
recently adopted for Section 4302 of the Affordable Care Act. Section 4302 has
special provisions related to disparities and specifically listed the variables
of sex, disability, race, ethnicity, and language as requirements to be adopted
within two years by the secretary.
The Data Council took charge and had an implementation work group that
involved representatives from OMB, from Census, and then across HHS to consider
how to implement such standards using the ACS primarily as the model for the
data collection, since between the CPS and the ACS, these were the measures of
the official statistics. Standards were adopted in October of 2011, announced
at the APHA meetings.
These are the variables. Race and ethnicity, the requirement for that
variable was that it still had to be able to conform to the OMB standards that
had the five minimum categories. The point of the standards was to come up with
a minimum standard that could be more easily complied with by the surveys. It
wasn’t meant in any way to limit data collection.
The middle section of Asian Indian, Chinese, Filipino, et cetera, all of
those can roll up into the one Asian category that OMB specified. The Native
Hawaiian, the four at the bottom, they roll up into the category of Native
Hawaiian or Pacific Islander.
This is the same with ethnicity, which was required under the OMB standards
for ethnicity. Hispanic, yes or no. Here they’ve just expanded the categories.
Again, they roll back up into the minimum.
Sex was biologic sex or sex at birth. Then how well do you speak English, age 5
and above. Then the disability questions came from the ACS.
The point is they were construed to be a minimum and not in any way to be
limiting the data collection or the granularity that will be collected from the
federal surveys. They can collect more data as long as it can be rolled back up
into these categories.
The standards are to be applied to HHS national population health
surveys, specifically those surveys where the information is being collected
from the respondent, where they’re self-reporting or you have a knowledgeable
proxy reporting for the family or the individual.
Fritz mentioned not having a date for implementation. In this case we don’t
have a date. Implementation is to be carried out by the agencies. For new
surveys, if they’re developing new surveys, go ahead and include these within
the new survey. If you have an existing survey, we don’t want to interrupt or
interfere with the routine data collection, so when there are revisions being
made and perhaps at the time when surveys are going back to OMB for their
clearance, that’s the time to implement the new standards.
A monitoring tool has been developed to keep track as agencies are
implementing the standards to the surveys. It was disseminated through the Data
Council. It just asks each OPDIV to report for each of their survey
instruments about how are you collecting these questions now? What’s a
timeline? What’s your schedule for implementing the new standards?
Common language, et cetera. I do want to mention something that Nancy said
this morning for HHS surveys. For the majority of the major surveys like NHIS
and NHANES, NSFG, and MEPS they’re either complying with the standards already
or getting greater information than what’s contained in the standards,
particularly for race and ethnicity.
The one area where I personally think there may be a challenge relates to
the questions on disability because the questions have to be asked using the
wording and response categories that are in the ACS. So a survey that gets
considerably more information or different information, like say you have a
disability that inhibits your ability to work, they’re going to have to ask
these questions. They can tree off from those questions, but it’s going to be a
change for some of the surveys, a big change for that particular area. But for
the most part, they already are getting the information as the ACS is.
Jennifer talked about the Federal Committee on Statistical Methodology and
the Interagency Council on Statistical Policy. Some of the work groups already
do a lot of the coordination and examination of how we ask what we ask. There
are also a lot of informal collaborations going on within the department and
across the federal community to look at research methodology, research
questions, how we’re asking what we’re asking.
A lot of the presentations over the past few days made references to the
work that’s been ongoing between Census, HHS, Labor. I think as we move forward
and we’re looking at the variables with SES, that kind of work is going to
continue. I think we want to encourage it because that’s one of the best ways
for us to benefit from data sharing and exchange of information and who’s
doing best practices with regard to asking the survey questions themselves.
If people are interested in looking at the data standards that were recently
adopted, these two links — the first one’s from the HHS Data Council, the
other one from the Office of Minority Health. Both of these provide information
about the implementation guidance. I think the OMH website has more information
on the background of the process that was used, the public comment period. I
think it may even include some of the comments that have been received.
DR. MAYS: Thank you. Let’s open this up for questions. I was actually
concerned about the data in your presentation. I want to look at the slide that
you have that says CPS-ASEC underreporting reported enrollment versus program
data. You said there’s truly reported and allocated enrollment. I didn’t
understand what that is. Can you tell me what that means?
MS. GIANNARELLI: What this particular slide is looking at is the information
that a user of the public use ASEC data would see, a user who’s using the
public use ASEC data, not paying attention to the allocation flag, but using
the public use ASEC data. I’ll just take the first row as an example.
There’s a variable that says whether someone has TANF income. Then there’s
actually another variable that says how many months they received TANF income.
That variable is also in the data. You can use that information to say
according to the public use data, what’s the average monthly number of families
receiving TANF. We think we know from administrative data the true number of
average monthly families receiving TANF. If you just do that comparison, the
survey’s getting 58 percent of the average monthly caseload.
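The calculation described here, going from a months-received variable to an average monthly caseload and then to a capture rate, can be sketched as follows. The weights, month counts, and administrative total are made-up illustrative values, and the function name is an assumption; they are not taken from the ASEC or from program data.

```python
def avg_monthly_caseload(families):
    """families: iterable of (survey_weight, months_received), months in 0..12.

    Each family contributes weight * months family-months over the year;
    dividing by 12 gives the average number of families per month.
    """
    return sum(weight * months for weight, months in families) / 12

# Hypothetical weighted records: (weight, months of TANF receipt reported
# or allocated in the public-use file)
sample = [(1500.0, 12), (2000.0, 6), (1000.0, 0), (1800.0, 3)]

survey_caseload = avg_monthly_caseload(sample)
admin_caseload = 5000.0   # hypothetical administrative average monthly caseload
capture_rate = survey_caseload / admin_caseload

print(f"survey average monthly caseload: {survey_caseload:.0f}")
print(f"share of administrative caseload captured: {capture_rate:.0%}")
```

Because the numerator uses whatever is in the public-use file, it includes allocated as well as truly reported receipt unless the user filters on the allocation flag.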
DR. MAYS: I guess what I was trying to get at — because the issue of
imputation has been something that we’ve been hearing. For some of this data
that’s what I’m trying to understand, because it has implications in terms
of the income. You used the words truly reported and allocated enrollment. I
was trying to figure out whether or not this is truly reported and imputed
MS. GIANNARELLI: Both of those are included in the numerator for this
particular slide. In this particular slide the numerator is both the truly
reported and the allocated. Even after you add in the allocated, the ASEC is
only capturing 58 percent of the average monthly TANF caseload, 58 percent of
the SNAP caseload. These are unpublished figures, but it’s not inconsistent
with other findings from other years and other surveys. SIPP figures would be a
little bit higher, but we’re not talking 80 percent for TANF and SNAP.
DR. MAYS: Can I ask about the one that’s called SNAP Exact Match Study in
which you say there’s 833,000 correct reporters, 922,000 false negatives, and
92,000 false positives? Can you talk to me about the quality of the data? Also,
this says weighted number of households. If it was actual number of households,
do you have a sense of what the numbers would be?
MS. GIANNARELLI: I should say that this is not my project. This was done by
Julie Parker at the Census Bureau. I have on the last page the citation to her
paper, which is public. These are numbers directly from her paper. I don’t
recall if she actually had the unweighted numbers or not. I think this is
useful more for the relative magnitudes of the correct reports versus the false
negatives versus the false positives.
I should say, since we’re focusing on this slide, that one thing that’s
going on in this particular study, since it was only a three-state study —
again, I’m sure getting data for three states was challenging enough, let alone
50 plus DC.
But one thing that’s going on here is that, for instance, some of the false
positives — someone could have received their SNAP in another state and they
wouldn’t be there. As I’m sure everyone in this room is aware, there are all
kinds of detailed methodological issues that are going to affect the exact
numbers that are here, but given these overall magnitudes —
DR. MAYS: So it’s the magnitude we should focus on.
MS. GIANNARELLI: Right. I think it’s the magnitude of there’s a whole heck
of a lot of false negatives and there are even some false positives. Regardless
of people moving from state to state and all those other issues, I think the
main point I was trying to convey with these two examples from SNAP and from
the SNACC Project is simply that for reasons I don’t think we completely
understand, a lot of people who are enrolled in these programs are not
reporting it in the survey data.
We would all have the same list of what are the 10 or 12 possible reasons
for that, but I think the work, to me, that’s very exciting is the cognitive
testing work that can be done by the statistical agencies. Forget the six items
or twelve items on my list of what it could be. What’s really important is
let’s do more cognitive testing to figure out what is it really so that then we
can change it.
DR. MAYS: My last question is on your last slide, issues with allocated data
for low-income families. There were two things. Could you help me understand
TRIM a little bit better? I was asking Dr. Queen over here about it. Then in
the slide who are these people? It says issues with allocated data for
low-income families. Can you give us any other parameters of these families? Is
it certain areas? Is it certain racial groups? Is it certain age groups? Is
there anything else we can learn about who these are?
MS. GIANNARELLI: To make sure I understand the question, you mean the kinds
of cases where we’ve seen these apparent logical inconsistencies?
DR. MAYS: I am looking at your last slide. I’m trying to understand people
allocated to be enrolled in a program are less likely to appear eligible than
actual reports. I’m trying to understand who might that be. And allocated data
for benefit programs may be inconsistent with known policy. Allocated income
amounts can make a person who actually reported a benefit program appear
ineligible. I’m trying to understand who those people are that this is not
MS. GIANNARELLI: It’s really talking about two different kinds of cases that
we observe in the data. One kind of case that we observe in the data is someone
who truly reported their income. They skipped the questions on, let’s say,
whether they were enrolled in Medicaid. They just didn’t answer it. Whoever
they were hot-decked to was enrolled in Medicaid. So now in the public use data
they’re a Medicaid enrollee. We see a fair number of those cases where if we
run this very detailed simulation model, we can’t see how they’re eligible.
Full disclosure, any microsimulation model also finds true reporters who
appear to be ineligible, because there are limitations of simulation models.
But it is much more prevalent that people appear to be ineligible among those
who are allocated to be enrolled as opposed to those who truly reported
enrolled. I think the shorthand for that case would be truly reported income,
allocated to be enrolled in a program, but they look too rich or they have too
much asset income or something like that.
The other kind of case we see is sort of the reverse of that. They truly
reported program enrollment (Medicaid, SNAP, SSI); they skipped some of the
income questions, and whoever they were hot-decked to had a certain amount of
income, and that income looks like it would make them ineligible for the
program. It’s kind of the reverse.
Obviously there are lots of complications in people who have missing data in
both areas, but I think those are the two kinds of inconsistencies we see in
terms of what they truly reported versus what was allocated and appears
inconsistent with what they truly reported. John knows more about all of this
than I do.
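As an illustration only, and not part of the testimony, the comparison described above can be sketched in Python: among people recorded as enrolled, how often does each group (truly reported versus allocated) appear ineligible? The field names, the toy records, and the single `income_limit` cutoff are all assumptions made for this sketch; a real microsimulation model applies far more detailed eligibility rules.

```python
def ineligible_share_by_flag(records, income_limit):
    """Share of enrollees whose income exceeds income_limit, by allocation flag.

    Returns {False: share among reported enrollees,
             True: share among allocated enrollees}.
    """
    counts = {True: [0, 0], False: [0, 0]}  # flag -> [apparently ineligible, total]
    for r in records:
        if not r["enrolled"]:
            continue  # only enrollees are compared
        tally = counts[r["enrollment_allocated"]]
        tally[1] += 1
        if r["income"] > income_limit:
            tally[0] += 1
    return {
        flag: (bad / total if total else None)
        for flag, (bad, total) in counts.items()
    }


# Hypothetical public-use records; every value here is invented.
records = [
    {"enrolled": True, "enrollment_allocated": False, "income": 9000},
    {"enrolled": True, "enrollment_allocated": False, "income": 12000},
    {"enrolled": True, "enrollment_allocated": False, "income": 30000},
    {"enrolled": True, "enrollment_allocated": False, "income": 8000},
    {"enrolled": True, "enrollment_allocated": True, "income": 45000},
    {"enrolled": True, "enrollment_allocated": True, "income": 11000},
    {"enrolled": False, "enrollment_allocated": False, "income": 50000},
]

shares = ineligible_share_by_flag(records, income_limit=20000)
```

A higher share under the `True` key than under the `False` key would correspond to the pattern described: apparent ineligibility is more common among allocated enrollees than among people who reported enrollment themselves.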
DR. LUCAS: I have a question about whether or not in any of the work you’ve
done you’ve ever actually compared allocation or imputation methods across
different surveys to see whether a single imputation versus a multiple
imputation produces different levels of disagreement. The problem that you’re
describing about people being sort of false positive or false negative —
another way for me to ask that question is has there been any look at the
effects of the type of imputation on the likelihood of that occurring?
MS. GIANNARELLI: That is not something that we would be doing at the Urban
Institute. John may know whether that work is going on.
DR. QUEEN: Joan Turek is working with Chuck Nelson looking at item versus
whole imputation, but not across —
DR. CZAJKA: I have looked across surveys at differences. There’s a striking
difference between CPS and SIPP in what appears to be happening in the
imputation of welfare and food stamps and benefits like that. You see much more
of the kind of inconsistency that Linda was talking about in SIPP where
high-income people have food stamps.
I don’t have anything with me, but I had some tables that, as you went up to
the higher income levels, looked at what portion of the food stamp values
were imputed versus reported, and they’re mostly all imputed at the
high-income level in SIPP, which indicates there’s a problem. This is something
that was actually identified many years ago. They’re using the same general
DR. LUCAS: That was my question, because yesterday there was some discussion
about single imputation versus multiple imputation. I’m just wondering whether
the imputation methodology itself is having some effect on what you’re seeing
in the data.
DR. CZAJKA: I think it’s more important what variables are going in.
MS. GIANNARELLI: This has probably gotten more challenging over time, as far
as allocation for benefit programs, as there have been increasing variations
across the states in the administration of these benefit programs. There was a
time not too long ago when across the whole country the asset limit for AFDC
Now there are states where you could legitimately have interest income that
suggests you have a few thousand dollars in the bank and still get TANF in many
states, although not still in some states. It may be a case that that may be
part of a problem getting more challenging over time.
DR. CZAJKA: Let me just add to that. The American Community Survey, because
it’s so large, actually does all of its allocations within the same state.
That’s a way of controlling for those differences.
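As an illustration only, and not part of the testimony, the effect of restricting donors to the same state can be sketched with a toy hot-deck in Python. The function `hot_deck`, the field names, and the records are hypothetical; this is not the Census Bureau's actual allocation procedure. It simply copies a missing value from a randomly chosen donor in the same cell, where the cell is defined by whichever keys are supplied.

```python
import random


def hot_deck(records, field, cell_keys, seed=0):
    """Fill missing `field` values from a random donor in the same cell.

    A cell is the tuple of values of `cell_keys`. Returns new records with
    an added "allocated" flag; the input records are not modified.
    """
    rng = random.Random(seed)
    donors = {}
    for r in records:
        if r[field] is not None:
            cell = tuple(r[k] for k in cell_keys)
            donors.setdefault(cell, []).append(r[field])

    filled = []
    for r in records:
        r = dict(r)  # copy so the caller's data is untouched
        r["allocated"] = False
        if r[field] is None:
            pool = donors.get(tuple(r[k] for k in cell_keys), [])
            if pool:  # leave the value missing if no donor exists
                r[field] = rng.choice(pool)
                r["allocated"] = True
        filled.append(r)
    return filled


# Invented example: state A's rules allow enrollment at this income band,
# state B's do not. Record 5 (state B) skipped the enrollment question.
sample = [
    {"id": 1, "state": "A", "income_band": "mid", "enrolled": True},
    {"id": 2, "state": "A", "income_band": "mid", "enrolled": True},
    {"id": 3, "state": "A", "income_band": "mid", "enrolled": True},
    {"id": 4, "state": "B", "income_band": "mid", "enrolled": False},
    {"id": 5, "state": "B", "income_band": "mid", "enrolled": None},
]

pooled = hot_deck(sample, "enrolled", ["income_band"])
within_state = hot_deck(sample, "enrolled", ["state", "income_band"])
```

With donors pooled across states, record 5 can inherit enrollment from a state A donor even though state B's rules would not allow it. With `state` in the cell definition, the only available donor is from state B, so the allocated value is consistent with that state's rules.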
DR. BREEN: Thank you very much for the very informative talks. They were
great. The work you’ve done is difficult and much appreciated.
MS. GIANNARELLI: Most of what I presented was not mine.
DR. BREEN: In summary, it sounds like the earnings data, we’re doing a
pretty good job of collecting that part of income, and that’s about 80 percent
of what we need to collect with income. I guess we don’t need to worry about
that, but if we do, you can say something about that.
In terms of the non-earnings income except for dividends and interest, in
other words, what you might call transfer payments or social welfare benefits,
SIPP seems to be the best source. They collect the best data. That was the
take-home message, especially from John’s talk, but Linda’s wasn’t inconsistent
with that. That’s where we stand.
Vickie was asking about the non-earned income. I wanted to follow up on that
a little bit. I think when you said imputation and allocation, this is the same
thing, isn’t it?
MS. GIANNARELLI: Using the Census Bureau terminology, there’s “allocation,” and
they use the word “editing” for certain logical edits that they do. I guess we
could combine both of them as some sort of imputation. There’s the sort of
hot-decking and then there’s editing.
DR. BREEN: You emphasized that you thought maybe one of the most important
things to do would be more cognitive testing on how to get these questions
right. But kind of implicit in what you said was that there is a lot of
information out there that maybe could be incorporated into the CAPI or the
CATI about actual limits that, as John said, sort of happen automatically with
the ACS because they’re only using data from the one state, so they’re not
going to end up making these mistakes, but to actually add in information about
these programs into the CAPIs or the CATIs so that these errors are likely to
come up at the interview point rather than later on.
MS. GIANNARELLI: I think if someone says something, then I think the
CATI/CAPI can do some sort of check. If you say you’re getting public
assistance benefits and you say that you received $20,000 in public assistance
benefits, then a check would be warranted unless you have eight children.
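As an illustration only, and not part of the testimony, an interview-time check of this kind might look like the Python sketch below. The function name and the dollar figures (`base_limit`, `per_child`) are invented for the example and do not reflect any actual program rule or any real CATI/CAPI instrument; in practice the ceiling would come from the rules of the respondent's state.

```python
def needs_followup(annual_benefit, n_children, base_limit=6000, per_child=1800):
    """Return True when a reported benefit exceeds a plausibility ceiling.

    The ceiling grows with the number of children. Both parameters are
    hypothetical placeholders, not actual program maximums.
    """
    ceiling = base_limit + per_child * n_children
    return annual_benefit > ceiling


# A $20,000 report from a two-child household would trigger a probe;
# the same report from an eight-child household would not.
probe_small_family = needs_followup(20000, n_children=2)
probe_large_family = needs_followup(20000, n_children=8)
```

A check like this would prompt the interviewer to confirm the amount rather than reject it, so a true but unusual report can still be recorded.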
DR. BREEN: I don’t know how feasible that is, but I was just thinking there
may be checks that could be incorporated into the question process, because it
seems like what we want to do is improve the reports.
The third that struck me is training interviewers because we’re all trained
to believe that you can’t ask about income and get a very good response rate. I
think the interviewers go in with that point of view too. Maybe it would be
useful to spend some time and effort thinking about how we collect those data
and kind of turning it around with the interviewers so that they don’t feel
that it’s a sensitive question, because it doesn’t necessarily have to be.
The last thing that I’d like to hear a little bit more about is retirees.
They’re a growing proportion of the population, which I guess is why they’re a
growing problem in terms of collecting this information. But you also, John in
particular, mentioned that the kinds of benefits available to retirees are a
lot more complicated than they used to be. Is there something that we could use
as guidance for how we should be collecting that information going forward?
Does the Health and Retirement Survey do a particularly good job? Is there an
example out there or some guidelines or recommendations you could suggest?
DR. CZAJKA: I haven’t done much work with the Health and Retirement Study.
It’s challenging data to use, and we did some work a few years ago that
produced some very high estimates of income for the elderly. I’m not sure we
got their data right. That was part of the challenge of working with it.
One of the things that we suggested with respect to retirement income and
especially the retirement accounts is that there is certain language that’s
used to characterize money you take out of an account. It’s frequently a
distribution, withdrawal, but calling it a payment seems to communicate
something very different to the respondent. I think that that could help.
DR. QUEEN: Do you know of a survey that’s asking it a better way currently?
DR. CZAJKA: I’d certainly look at the Health and Retirement Study, but I’m
not positive that they do. Actually, the Survey of Consumer Finances is one I
hold up. The Health and Retirement Study people have a different view of that
survey, but that’s one that’s conducted by the Federal Reserve Board. Its focus
is assets. The interviewers are trained to collect asset data. The respondents
know this is going to be an excruciating interview about your assets and your
DR. BREEN: But they balk at the health questions on that survey.
DR. CZAJKA: They do very extensive imputation, very careful in that study.
One of the things that they do on the retirement side is that their approach is
to go by account as opposed to a type of income, laying out the different
accounts that you have, and then collecting asset holdings for each of those
accounts rather than having people try to make a distinction between different
types. There are many different names for these things, and they all kind of
act the same. But there is a lot of that potential.
There’s real ambiguity. We had a conversation earlier about the standard
interview prompt. What do you mean by that? Whatever it means to you. That’s
not what you want in collecting these data. You want them to tell you what you
mean. That’s a very different approach.
DR. BREEN: That may be an interviewer training issue too.
DR. CZAJKA: Then I mentioned that there is this conceptual issue about what
is income and what is savings and what is taking something out of savings. As I
said, the difficulty that’s been created is we moved our whole way of financing
retirement into something that’s now savings. There’s a reluctance on the part
of the surveys to collect income that way.
The ACS very explicitly says do not include withdrawals from savings. Then
how do I report something I took out? They don’t mention the word 401(k)
anywhere in the ACS, little things like that that indicate a failure to come to
grips with what’s really going on in retirement income.
DR. MAYS: First, let me thank you because this has been very enlightening.
Some of it has been wow in terms of things that we learned today, and it’s very
helpful to the work that we need to do around income. I want to thank you for
your time and putting together really good presentations that were very
enlightening to us. We’re going to break at this point for lunch.
(Whereupon, a luncheon recess was taken.)
A F T E R N O O N S E S S I O N (1:40 p.m.)
Agenda Item: Committee Discussion
DR. MAYS: What we will do is start to plan what our next steps are. We’ve
heard a lot over the past couple of days. I think we have a lot that has been
presented to us. But I think where we want to start is really to talk about
what is the product that we want to produce. I’m going to start us off with a
suggestion, and that is that since we’re talking about the June meeting, that
we focus our efforts on getting a letter out.
We had talked before about the issue of a report in a letter. It may be wise
for us, given the complexity of what we heard, that we get together very simply
with what we need in terms of thinking first about a letter. If that’s just
such a slam dunk, we can always back into a report. I’m going to put that on
the table for discussion.
DR. KAPLAN: Maybe it would be valuable to talk a little bit about health. It
would be valuable to talk about what we learned and where we think we need to
go. First of all, I thought this was really good. The stuff that we heard was
very well done. It seems to me that there are a couple issues that surfaced.
One was to what extent are the agencies getting together to harmonize? We
actually heard a lot about the 12 agencies talking to one another. I think that
we need to think a little bit about whether or not that’s accomplishing the
goal of harmonizing if it’s agencies talking to one another as opposed to
providers of information and users of information getting together to make sure
that they’re producing the right information. The second thing that I was very
intrigued by was the methodologic research component. To what extent are we
building the right research agenda to move this forward for the future?
DR. MAYS: Let me just comment on why I was suggesting that we start with the
discussion of a letter, and we can change a bit. My concern is that we have a
very specific task that really is about the minimal standards. I want to make
sure that we can meet that first, and then all these other things.
If you remember, we have a long-term and short-term goal. We’ve talked about
whether or not there would be pieces that we could continue with. What I want
to try to make sure of is that we can get us through because I think it’s a
very difficult process of deciding whether or not we have something to say
about the recommendation of a very specific minimum standard.
I want to make sure I acknowledge this. I think the other issues are
important to that, but at the end of the day, if we’re going to make it by the
June meeting, we really need to first get some kind of consensus among the
committee as to do we think that there is a minimal standard that we want to
recommend for each of the three variables.
I think the other challenge that was put before us is, and I think we can
deal with it quite easily, why or why not social class. Then I think everything
after that is pretty much gravy. I just want to hear what others said, because
June is around the corner.
MR. BURKE: Things I recall hearing when we were together in Washington a
couple weeks ago from Jim, who described what he thought of as the design
outputs of the hearing would be, obviously as you said, what are the minimum
standards? Should they remain or should they be changed? But beyond that, it
was about what a higher vision concept should be. It wasn’t just the as-is, but
it sounds like the request was what the to-be should look like.
DR. GREEN: We opened the hearing with this. We opened the hearing this
morning with it again. Our questions for the hearing were three. What is the
state of the art and the standards for collecting data to measure SES in
federal surveys today? Second was what are the variables that are collected?
Three is what opportunities exist to standardize these variables across federal
We learned a lot about one and three and the background work that Susan and
others have been doing pertinent to this. But to follow Vickie’s lead here, we
also heard that the timeline for this is to bring something forward to the
queue for the NCVHS meeting.
I’ve been struggling in the last hour just trying to sort a lot of what
we’ve heard here. As I was saying to Susan, forgive me again for being cursed
with the brain of the doctor, but I feel like we had an acute problem that we
were assigned to take on. As we evaluated the acute problem, we discovered that
the patient has about seven chronic diseases.
When Vickie divides this up into a paper or a letter or whatever, that may
mean something different to you than it does to me, but that makes sense to me
in that if we head toward a letter and we meet our timeline, it’s really been
more narrowly focused on these particular questions.
At the same time, it makes no sense to ignore what we’ve learned. There are
a lot of issues that are as pertinent to other aspects of federal surveys as
they are to SES, that have rolled out in this. A lot of them are process
issues. It seems to me that we don’t have to restrict ourselves to one product.
If I heard you right, you’re saying maybe we need two. Maybe there’s some sort
of report or paper that comes out of the work, and then maybe there’s also this
specific letter that responds to the questions we were asked to address. Am I
tracking with you?
DR. MAYS: I think what I am talking about is the time to do the job that we
need to do. I think that the complexity of just deciding if there’s a minimum
standard is actually one that’s difficult to do, even in the time that we have.
I’m just concerned that we’ve heard a lot of things, and if we discuss all
these other things, which I don’t think we’re going to lose, and I think it
would be great when we have the other committee to bring them up to speed on
that. But I think in terms of deciding about a minimum standard, particularly
given what we’ve heard, is a bit of an onerous task because we’ve got to come
here among ourselves to some agreement, and then we’ve got to write a letter
that’s compelling enough to support that agreement.
DR. BREEN: This is an area that is near and dear to my heart and something
I’ve been thinking about my whole career. I found this whole discussion very
illuminating, but I was also really trying to stay on task because you can go a
lot of places with this information and really kind of go wild thinking about
I think we learned that for education there’s a very clear recommendation of
what to do. I’d have to get back on the notes, but there were two surveys that
were doing it fine and two surveys that weren’t doing it fine. I think we have
the recommendation for that. I think NHIS was doing it fine. I’m not sure
everybody agrees with that, but I can bring out my notes and we can talk about
For occupations, which I thought was going to be a nightmare, it seems like
that also is being consistently done. The problem there is with analyses. There
are many ways to do that, but in terms of data collection, that’s also pretty
consistent. I went over and asked the speakers afterwards, I said you’re
collecting different numbers of occupations. Is that a problem? No, they all
map back to the SOC. These are fine. They’re consistent.
I don’t think there’s a problem there either. I think our recommendation for
that is also clear, because if people are collecting less, they’re collecting
fewer occupations. Obviously you can’t get more out of less, but they’re
consistent in the way they’re being collected.
I think income is more of an issue. There are eight questions that the ACS
uses which seem to be the basis for most of the surveys. I think what’s unclear
is whether those eight questions are included in the MEPS, because the way that
was presented, it sounded like that might be different, or maybe it just might
be more information being gotten. I think that’s some research that we need to
do to see whether that’s consistent or not.
The two areas within income that I think need attention are the poverty
indicator. Right now we’re all using the simple poverty indicator, which is
income based on age of head of household and number of people in the household
who are living in the household. That’s the Census way of doing it and that’s
what NHIS does as well. Connie Citro has developed another way of measuring
poverty that includes transfer payments and costs of childcare, which we may or
may not want to recommend, this more complicated measure, but if so, that’s an
I think the other thing is we heard that there needs to be some work on how
to measure non-earnings income. I think we’re particularly interested for
policy purposes of transfer payments of benefits that people are getting from
programs, but then interest and dividends, that doesn’t seem to be too accurate
either, then finally information about retirement income, that that area is
probably the one that needs the most work and is particularly important going
forward because that’s a group within the population that is growing.
MS. GREENBERG: I found that summary very helpful, since I wasn’t here
yesterday. In some ways it begins to answer some of my questions because I
wasn’t here yesterday. I’ve looked at some of the slides, but also I’m sure the
discussion was as rich or richer than the presentations.
I will just say it looks like those of you who worked on this put together
really a terrific day and a half. I thank you, and I was particularly pleased,
even with some of the limitations of meeting at NCHS, that you did meet at
NCHS, because I think there was quite a bit of interest among NCHS staff,
particularly yesterday. I guess what I heard today, too, were all the problems.
That’s why it was helpful to hear the things learned as all over the place, as
I might have come away from feeling just listening today.
I have a few questions. Other than checking out a few of these things like
you mentioned with MEPS or whatever, to what extent are the gaps in your —
there will be gaps in our knowledge on this subject until the end of the day,
but gaps to try to get some closure on whether you can make a recommendation to
the specific issue.
Another question I have is was there anybody who was saying just collect
education or just collect income or just collect occupation? If not — and I’m
assuming not — then would a minimum standard have to be a minimum standard for
each one of those, although when they say all federal surveys, I suppose there
are some that aren’t collecting any of those. We know the major ones you had
represented are collecting some, obviously, of this, and probably the major
ones all of it.
Is there any hierarchy? If you aren’t really collecting any of this now —
and I don’t know if there are any surveys that aren’t, but of the three, is
there some kind of hierarchy, or do you really need to collect some minimum
standard for each one of those three? I’m assuming you were talking probably
about coming out with some kind of recommendation on a minimum standard for
each of the three.
DR. QUEEN: I have to, unfortunately, leave in about five minutes for a
meeting. I would be hesitant about making an actual recommendation for using a
MS. GREENBERG: For what?
DR. QUEEN: Specific survey items as a minimum standard at this point. We
certainly heard that some surveys are doing it better than others. NHIS and
NHANES and MEPS are fairly closely matching on education, the ACS and CPS, for
I didn’t think that the committee was actually going to be looking
specifically to make a recommendation in this letter of a specific
questionnaire to use as a standard, because we haven’t looked at the way the
questions are worded yet. Yes, they’re all collecting the occupation variable
and industry occupation codes, but the way they even ask about, first, are you
currently working. The capturing of whether or not somebody is working,
unemployed, working multiple jobs, part time.
MS. GREENBERG: Or unusual occupations. At the end of the day, there’s got to
be a specific set of questions, just like for the disability.
DR. QUEEN: But I do not believe that the committee was necessarily charged
with coming up with the final determination of what the standards themselves
PARTICIPANT: Then what was the committee charged with?
DR. QUEEN: Beginning the work that’s necessary to move forward in looking at
the surveys and determining which ones seem to be capturing information in the
best way, which ones are most applicable for the health surveys, for example,
which ones we can look to for best practices, why they’ve changed, to begin the
background work that would lead to the development of some more harmonization
coordination, but not to say this is the survey you’re going to use.
MS. GREENBERG: I thought the goal – but I’m not sure that you can get
to that goal by June by any means, but the goal was to assist the Department in
identifying a minimum standard for collecting SES in federal surveys.
DR. QUEEN: I don’t think the committee could actually say by June, without
having some sort of interagency committee or work group such as we did for the
HHS data standards involving Census and OMB together working, I do not believe
that that could be done.
MS. GREENBERG: I don’t know whether it can or can’t be. I guess that’s for
people who were here the whole time and your thinking on it. I think this was
supposed to be a little bit more than just gathering background information.
DR. QUEEN: Yes, but I did not think by June, I think there was a recognition
– Jim was talking just yesterday about coming up with a letter to move things
forward and to make recommendations, but I don’t believe that we would have a
set standard that was finalized.
MS. GREENBERG: Anything that this committee recommended then would have to
be vetted through the Data Council through whatever the process is. This
committee only makes recommendations. I don’t think the committee should
recommend anything that they don’t feel they are in a position to do. I think
that is the question. What will come out of this or any follow-on in the next
month or so?
I think you mentioned to me, Susan — and maybe I shouldn’t say this — that
June wasn’t in stone. The question is can you say anything by June, and then
what would you say?
DR. MAYS: Let me comment to bring us, since Susan has to go, to what I think
are some of the critical issues. First of all, I think if we go around the
table, we’re going to see that we’re going to disagree as to whether we think
everything is — I already know that.
Secondly, there is this additional work. For example, we do need to look at
some of the materials they’ve been preparing, such as are the tables that show
you all of the questions — what I’m thinking is there’s a lot of work to be
done. I think maybe in another month and a half we are clearer about where
we’re going with it, but I think at this point in time I don’t know how much
agreement we have. We have several people who aren’t here, so you already know
how that process is.
My concern is that I understood that we were making a recommendation and
that that recommendation, just like if NCHS or somebody else made the
recommendation, has to go through all of these different agencies to look at,
but that we would not make a recommendation if we did not think that we could
make a reasonable recommendation. The steps that I think it will take to make a
recommendation are the following. Susan, you have to help me before you go. One
is we need to look at the surveys that this would apply to. I think there are
DR. QUEEN: There are 16 surveys that we were looking at, but they are beyond
DR. CAIN: I think that’s a point that needs to be made. This is not a
recommendation for all federal surveys. This is just for HHS.
MS. GREENBERG: The other recommendations, the other 4302, were those just
DR. MAYS: Yes, they’re just for HHS health surveys. We have the list of
which surveys we’re talking about. We have the three variables that we’ve been
talking about and whether they’re collected. We have a URL that takes us to
that survey. I think the next thing that Susan had indicated she needed some
help with was actually to pull the actual variable itself so that you can see
it, because apparently that’s not an easy task to do.
DR. LUCAS: I just want to mention that even within the scope of just HHS
surveys, this report was released in maybe 2004 or 2005. Actually, we only
scratched the surface of HHS surveys. This covers the full spectrum of HHS
surveys. We haven’t even tapped into the 19 that they’re including. There are a
whole bunch of CDC surveys that we haven’t even tapped in that will be affected
by this standard because they are conducted by HHS.
MS. GREENBERG: I don’t think that we have to do all that.
DR. LUCAS: I was just saying there are many more surveys that are affected
by this than we have had the time to really cover or discuss in great detail in
the course of these deliberations. I’m just saying that there’s a bigger
DR. MAYS: Can you give us a sense of what other surveys? Because I guess I
thought that our domain were those 16, 19, so if it is beyond that. Is it
everything CDC does as well? Can you give me a sense?
MS. GREENBERG: Not some of the surveillance.
DR. MAYS: That is what I don’t know. That is why I want to make sure.
DR. LUCAS: In the background document we provided the citation to the actual
language in 4302. I can make the specific text available to you. It covers all
HHS surveys and programs and other data collection efforts of the –, so it’s
pretty broad in scope.
The reason why I mention this report is because they did look at (side
comments) across race, ethnicity, socioeconomic status, and one other thing.
They laid some of the foundation of what we’re doing in here. (Side comments)
disparities is the report of the National Research Council. It’s called
Eliminating Health Disparities: Measurement and Data Needs.
MS. GREENBERG: Isn’t it just population-based surveys? Let me ask Susan, are
you actually thinking that every surveillance system is going to ask all those
DR. QUEEN: I never said that.
MS. GREENBERG: You’re saying it covers much more, and when I said every
surveillance system, you said yes, but you’re not thinking every surveillance
system is going to ask those disability questions.
DR. QUEEN: We broadened it beyond HHS with the ACS and the CPS, and so we
were doing that just for exploration of who’s asking what and how in the area
of socioeconomic status.
DR. CAIN: I also think there’s a question of if you look at the language,
it’s all HHS-supported surveys. Does that cover all NIH grants then as well?
DR. QUEEN: It is only in the case where a grant is specifically directed to
conduct a survey. Think of it as part of the Paperwork Reduction Act.
DR. COOPER: What about the HRS, for example?
DR. MAYS: What about the National Child Longitudinal Survey?
DR. QUEEN: I don’t know.
DR. CAIN: It’s an odd situation that hasn’t been clarified, so in some ways
it could be expanded to include all NIH grants, in other ways probably not.
DR. MAYS: Is it like a U or an R01?
DR. CAIN: It is not specified in terms of the legislation, so Us and R01s
don’t talk about that. It sounds like what you’re saying is if it’s a grant to
specifically do a survey, then it might be covered by that.
DR. QUEEN: If it is a survey that is getting governmental —
DR. CAIN: That’s what I’m saying. Those are Us. That’s an interpretation
that has been made.
DR. QUEEN: It’s in the implementation guidance that was put up on October 31st
to try to help clarify for the agencies which surveys would even have to use
the current standards.
DR. COOPER: Are you saying recommended the U method?
DR. MAYS: No. She is saying because they would be the ones that would go
through OMB clearance. I don’t think that most of the R01s would go through an
DR. COOPER: I don’t think most of the Us do.
DR. MAYS: No, you’re right.
DR. BREEN: It sounds like we need to figure out the scope of our task.
DR. CAIN: HHS definitely, as opposed to all government surveys.
DR. MAYS: Isn’t it also HHS health-related surveys? Then let me pump it up
one more time, because I also thought it was HHS health-related surveys that
were population-based. If you look at what other surveys were impacted by 4302,
I didn’t see 4302, which went into effect in October, kind of migrate down. I
thought that the individuals who we’ve had come in and talk is the level of
people who’ve had to worry about it.
DR. LUCAS: In the background document it says the law requires that any
federally conducted or supported healthcare or public health program survey,
collect, and report basic demographic data, race, ethnicity, sex, primary
language, and disability status; data at the smallest possible geographic
level; sufficient data to generate statistically reliable estimates; standards
for the basic demographic; and any other demographic data regarding health
disparities as deemed necessary by the Secretary of Health and Human Services.
It is not limited to just population.
DR. MAYS: Is it at all possible for us to get Jim on the phone because I
just don’t believe when he spoke with us – he started our day off, he gave
us our charge. Jim would know that we couldn’t do something in that amount of
MS. GREENBERG: A lot of things are like this. That’s what the language was,
that the Department has made the decision to start with federal surveys.
DR. CAIN: As part of the group that had been taking a look at developing the
standards, what we did was say that in some ways surveys were going to be the
easiest thing to do or that was a place to start. The standards were developed
for surveys recognizing that there are a lot of other programs or more clinical
activities that we may be able to move to next, but this is a place to start
with the surveys.
In the legislation it specified those five different areas. It also said
that the Secretary could look at other areas that might be relevant as well.
When we were developing those five different areas, we said that since it had
come up in several cases, that SES might be an area where standards could be
developed because people had been working on it for 30 years and we thought
there was enough work there that people could take a look at it. But the
committee that developed those particular items was not prepared to recommend
what the standards would be at that time. That’s where this activity came out
MS. GREENBERG: I remember from a few years ago — I’m really not sure what
the current status is, but some of you probably know — that some journals and
others came out with a policy that said you should not or could not report race
and ethnicity, without some type of socioeconomic variable because they thought
it was misleading. Maybe it was only one or two journals.
DR. LUCAS: You mean that results being attributed to race and ethnicity were
really capturing differences in SES, and so just attributing a particular
finding to differences in race was not sufficient?
DR. BREEN: AJE would have been the most likely to do that, but I’m not an
epidemiologist. I don’t recall that, but I know there was a point in the
community, including the cancer community, because we don’t collect SES on the
Cancer Registry data, where people said we’re picking up a lot of information,
we’re attributing it to race, and there’s a good chance it’s SES, so we really
should work hard in order to get some SES data. There has been a change in the
community. I don’t know if there’s a rule or a policy.
MS. GREENBERG: In any event, I think the decision was made to start with
surveys, and probably more population-based surveys. That’s the most you could
really address at this point.
DR. CAIN: I don’t think the intention was to necessarily limit it to, for
example, national surveys, which is what we’re talking about now. I think it
really is in the interpretation. It wasn’t specified in the language, but it
sounds like in the implementation and interpretation they’re limiting it to
data collected under contract that has to go through OMB. That’s okay, but that
also covers a lot of other —
MS. GREENBERG: Does that mean they’re not including BRFSS? That doesn’t go
DR. CAIN: Then I would say probably not.
MS. GREENBERG: I think Jim knows whom this applies to.
DR. CAIN: Because all we had was the legislation.
DR. MAYS: I think it would be great if we could get him.
MS. GREENBERG: I will check to see if I can reach him.
DR. MAYS: I think in the interim what we can do is maybe we can do what
Nancy did, because I’d like to see how much consensus we have. If it would be
possible to do that, that in and of itself would be helpful. Larry, what are
your thoughts about each of the variables and where we are? I think Nancy was
being very clear about what she thought about each of the variables.
DR. GREEN: The things I learned in the last day and a half that I think are
right on mission and right on target here — it was well said in the second or
third presentation yesterday around defining SES survey measures. It was just
such an elegant simple statement. SES is a latent variable predicted by
education, income, and occupation, modified by race and gender. I think we
should adopt that as our position. That was a great expert summary of decades
of work and a lot of literature.
DR. COOPER: Can you restate that?
DR. GREEN: SES is a latent variable predicted by education, income and
occupation, modified by race and gender.
DR. BREEN: Michael Hout said that yesterday. He’s got a little graphic.
DR. GREEN: I heard nothing from any other presenters to refute that. That is
a great organizing conclusion, in my opinion. Why are you talking about
education, income, and occupation? That’s because that is what it is. These are
the predictors of the variable that we’re interested in.
The second thing I learned is more synthetic. I feel a whole lot more humble
about measuring SES today than I did at the start of yesterday. I have an
abiding conclusion that’s just personal. I think we want to be very careful
with recommendations that get extremely explicit about what to do because what
I’ll just call ramifying implications, which I don’t think we understand and
have command of. That leads me to the third conclusion from the hearing.
I actually think that we can write a useful letter that’s responsive to the
charge we started with. We just need to have modest expectations of how
dramatic our recommendations can be and how far they can go. I think it was
either Nancy or Susan or Jack that said we teased out the different locations
of this thing colloquially known as the statistics community: who
that is, where it exists, how they get together, how they operate. This is
an elite group that we should be respectful of.
It could be that we find ourselves making sort of midlevel recommendations,
and then a process recommendation about where to locate the follow-up work. So
a fourth conclusion I’ve come to is I don’t think NCVHS, given all the other
requirements that are placed on NCVHS, Privacy and Confidentiality Standards
Committee — you know the routine here.
Even if this were all we had to do for the next couple of years, I think it
would eat us alive. I’m not sure we’re the best group of people to dissect it
all, but I do think that we’re in a good position to answer our charge. I think
we need to commend Vickie for assembling this group of people, as she and
Susan and whoever else was involved did. They obviously hit a home run in
getting the experts and people who know about this in the room to talk to us.
I think we’re obligated to distill out of that some findings and conclusions
about measuring SES. I am personally disinclined to decide that we don’t have
to do anything for longer so we can think about it longer. I think we ought to
shoot for the June deadline, try to adhere to the charge we got, see if we can
I may be misfiring here, Vickie, from your perspective. I thought you
started us off saying we really need to think about the products we want to
produce. Are we going to produce a letter? Are we going to produce more or
something else? I believe we need to produce a letter. As time goes further,
there’s more work that is going to be in scope for us here. It’s got to be
balanced, our budget, our workforce, our staffing, and all that sort of stuff.
That’s my own distillation of the conclusions about this.
I really like Nancy’s specificity. My version to Nancy is there are people
around that know what the next best steps are to do about education. There are
people around that know what the next best steps are to do with occupation.
There is not consensus on what the next best thing is to do about income. That’s
where I would come down.
DR. KAPLAN: A lot of thoughts in my head, so I’ll try to spit these out in a
coherent way, if that’s possible. Just reacting to Larry’s comment, yes, the
latent variable model is great, but when the rubber meets the road, you have to
know what the coefficients are to get SES. In other words, conceptually that’s
fine, but you still have to have measures of education, measures of income, and
so forth. Then you have to have a model that provides those specific
coefficients to come out with a score. We’re a long way from that.
What I was kind of thinking was what’s wrong in our country in relation to
Canada where they have Statistics Canada? We heard this several times. Given
that we’re not likely to go to their structure, how do we get to some system
that provides a coherent overview of these things, given that there are
My reaction is that I think that we do need some super-oversight. One
recommendation that you could put on the table is the meeting of the 12 people
on this committee. Is that the right way to go? Do we have to look at that in a
different way? If so, how would you pull it together and where would that be?
One possibility would be that this is something that could be given to the
Office of Science and Technology Policy, the OSTP, in the White House, because
it is across agencies and bigger than HHS.
The second reaction I had was it seems to me there are two different things
that you need. One is you need the measurement piece and some sort of
harmonization. Then the other piece is the methodologic piece. You have to make
sure that if you harmonize, you still continue to engage in activities that
will make things better in the future. I don’t know if that would be sort of a
separate task, but I do feel that’s important.
DR. COOPER: The information that has been shared over this last day and a
half has been unbelievable. It has let me know what all I don’t know, and that
when you look at those measures of SES, oftentimes we make assumptions about
what we read and what’s been reported. What’s been shared with us is that there
are so many other things going on that we really don’t know that have been
included in the inconsistencies in the reporting and what some of the variables
do mean as well as what they fail to include. At the same time, we know that
SES is definitely a strong proxy for health outcome.
Yesterday it was an unbelievable task, but I do think that it’s very
important for this committee to come up with some kind of recommendation. It
doesn’t have to be a recommendation that solves the whole problem, but it
should be some type of a recommendation that somewhat summarizes what happened
during this hearing and what the potential steps are in terms of laying them
out and the task that we have at hand, and that we do need others to join in in
When you look at the implications, it is not limited to just surveys, but it
may include NIH and others as well. That’s significant. What we really do not
want to do is continue to collect incomplete data when we know that there’s a
better way to collect that data, because that data has significant implications
in terms of programs that are supported and people’s lives in terms of
treatments and et cetera. When we look at SES, it’s a significant set of
variables. I do think we have a responsibility to do something.
DR. MAYS: From my perspective, when we look at the opening panel where
Michael began to talk about what SES is, one of the things that was throughout
the day is that the reason it’s SES has a lot to do with social stratification
issues. I think that’s where a lot of the difficulty is.
We have to remember it’s not just about an education variable, an income
variable, an occupational variable. But if we’re being asked to talk about them
not as who they each are, but how they work within SES, I think it’s a little
different. From what I heard, I thought that we weren’t settled on education,
whether it’s educational attainment, whether it’s educational achievement,
whether or not in order for it to work in SES, we need it to work with a couple
of other things.
Now I’m giving my personal input here. In terms of income, I think that if
we thought about the issue of what is the requirement of some of these surveys,
they have such different requirements that coming up with a minimum way of
asking about income may be very difficult, particularly right now when there’s
a challenge to how we measure poverty.
We’re kind of crossing poverty and income. Those are actually two different
things. I thought Connie had some great suggestions about it, but poverty is
actually mandated to be determined by the Census Bureau on behalf of the
federal government. It’s almost like we’re not talking about the determination
of poverty. That’s not really, as I understand it, our charge, but the income
variable to SES is.
I almost reached the conclusion that there is not a minimum standard to come
up with for income other than to say to collect it in a certain way. I’m not
sure that I know exactly what that would be other than to say to make sure you
at least have it continuous so that people can use it as they need.
For occupation I was kind of surprised. I thought I was going to be in a
very different place with occupation. I think it was much more exciting to hear
what they said. This is not from having an opinion; it’s ignorance. I don’t
know that I’m ready without a little more work because, very honestly, I didn’t
do the same amount of work on occupation as I did on the other two. I think I
tried to give everybody access to the Widgeo(?) site where I put up about 15 or
so articles. I’ve read most of those. I didn’t read as much on occupation, so
I’m unclear about that.
If I had to say where we are and what I think will work for us, I think a
letter that talks about what we found, what’s important, and an opinion about
the standards might be the best, because the conversation that I had with
Virginia during lunch, to realize the infrastructure that you all had in place,
to come up with those minimal standards, I don’t think we can do that. But I
think what we can do is to make a recommendation about whether we think there
could or couldn’t be a minimum.
I think, Bob, you were saying this. There are some other groups, and I think
then it needs to go to those other groups. But in addition, do we need to opine
about the kind of place and support that those groups need in order to be able
to achieve what we see as important to achieve.
I think this gets into the point that Larry’s making about those groups and
working together. Maybe there needs to be a process, but I don’t know. I don’t
want us to start trying to create an infrastructure to know that SES is the
only thing that’s going to be done, unless the secretary has other things that
she’s also going to do. Then it may be that making this much broader
recommendation about reengineering these groups may not be cost-effective right
now. That’s where I’m kind of settling at this point.
DR. KAPLAN: Let me just put something on the table for discussion. It does
seem to me that the level of precision differs as a function of what you’re
using the information for, and that there may be some circumstances in which we
actually don’t need to go into much more detail. I’ll give you an example.
The tension that we get all the time is investigators who say they want to
measure things in the most precise way possible, but they have to do it in a
very short timeframe. It’s got to be short because there are costs. Follow-up
studies and longitudinal studies, if the questionnaire is too long, then you
get more dropouts and so forth. So for some of the things that we do I’m not
sure that greater precision is going to make that much difference.
We know that there is a systematic relationship between socioeconomic status
and health outcome, for example. If we sharpen up the measures and make them
more precise, we’ll go to a lot more effort, and I’m not sure that we’ll learn
something that we’re not learning already, where for other purposes there are
certainly things about resource allocation in communities that, in fact, we’re
going to need to measure at a greater level of precision. I guess my point is
that one size doesn’t fit all. There may be circumstances in which the minimum
standard is much more a minimum than in others.
DR. MAYS: Jack, are you on line?
MS. GREENBERG: I left him a message and I gave him the phone number here.
The only thing I could find on the Internet was the thing asking for comments.
Obviously we need some clarification, but I do think the question at hand is
given the narrowest definition of your charge, what do you need to do to get
there? Take the narrowest definition, that it’s for federal health or HHS
health surveys. It’s true that the legislation is broader, but what would you
even need to make a recommendation on that, let alone some of these broader
DR. BREEN: Your own remarks reminded me of something that we might want to
add to the types of recommendations that we’re talking about. The reason I
didn’t mention the concerns about education is because we’re capturing
educational attainment adequately.
We’re definitely not capturing educational achievement, but as we probed
that, it sounded like we didn’t have the means for doing that because you don’t
have a means for measuring quality. Some states, like California, have that.
There are things like graduation rates from high school and proportion of kids
that go on to college, but I’m not sure if those are national measures or if
they would suffice. It just seemed like for that, that’s not sort of in the
But an option would be to talk about things that need to be thought about in
the future in the near term or in the longer term, issues that have come up
that are critical to understanding SES, but that we’re not going to resolve
this in six months or a year or by the time that these surveys have an
opportunity to revamp or new surveys start up, because that’s really what it
seems like we’re — from what Susan said, we weren’t asking for immediate
implementation; we were asking for implementation when it’s convenient.
DR. MAYS: The only thing is that I would like to hear from the Department of
Education because I think we didn’t have people from there to know for sure
what it is they have and don’t have or what they can give us or what they
can’t. Then if that’s the case, then I’m done. But are we measuring just
education, or are we measuring the concept that really is SES? I keep going
back to what SES really is supposed to be about.
DR. BREEN: There is a National Center for Education Statistics, and the
deadline was too short for them to get clearance, so they weren’t able to come.
MS. GREENBERG: I will say that this news release that went out in June said
the proposed standards — now they’ve obviously been promulgated and I couldn’t
find that, but it’s somewhere — for collection and reporting of those five
measures in population health surveys are intended to help federal agencies
refine their population health surveys in ways that will help researchers
better understand health disparities and zero in on effective strategies.
That’s obviously what we’re talking about at this point, but exactly how
they’re defining population health surveys and whether that has to do with OMB
clearance and all that I don’t know.
DR. GREEN: What I heard Bob talking about and Dr. Cooper and Dr. Breen
talking about I don’t hear being contradictory to Vickie’s summary. I think
Vickie laid this out close enough for us to head toward next steps. I’ll say it
back to you, and then you fix it.
Basically Vickie said we can do a letter by June about what we found and, in
so many words, said it could be a range of findings. Then I added in we could
have findings that relate everywhere from policy to particulars. It was a rich
hearing. We got a lot. We could render an opinion about the possibility of
minimum standards for income, education, and occupation. Then she suggested we
could then go further with some possible suggestions about some sort of how-to
process to get to those minimum standards for SES.
Then I added one thing to it. We should add cautions because we learned a lot
about places where we could do harm, from NIH investigators to misuse and
having something that’s working in MEPS that gets trashed and we lose the
ability to look at 15 years of sequential data. Is that close to what you said?
I’m scared of going too far with a set of recommendations. I am a humble guy
that just drank from a fire hydrant for a day and a half, and I’m afraid of
saying more than I know.
DR. CAIN: I just want to go back to the point that Vickie was making earlier
about the education. The point that I heard that sort of alarmed me and made me
despair of getting what we want was a realization of the age range of all these
surveys. If you’re getting data about a school that is concurrent with somebody
being in that school or a recent graduate, that’s one thing.
But if you’re interviewing a 70-year-old person whose school has changed
names three times and then disappeared, or when they lived in that particular
inner city neighborhood it was upper class or high income or whatever and you
go back to that same school now and it’s a very different population and it has
a very different quality, even if you were able to make those links.
I think we have to think about the range of surveys that would be affected
by the minimal standards. Maybe the minimal standard, as you say, might be
educational attainment in some way, and that more work needs to be done on the
achievement. But I just can’t imagine how you’d ever get that for somebody who
wasn’t an older person, given that the school has changed so much.
DR. KAPLAN: What I heard from some of the people was the other side of
minimal standards. I heard that we would like to be somewhere else, and we’re
not there yet. With education, for example, there was a lot of discussion about
educational quality. The speakers said we just don’t have anything like that.
Or with income there was this interest in getting to net worth. For people
with equal income, some people have resources so that if they’re threatened by
a recession, they have a pad where others are stuck in these miserable jobs and
can’t do anything. The speaker said we really recognize that as important, we
just haven’t advanced our field to that yet. It would be nice if we were able
to both talk about minimal standards, but also talk about where we need to go
for the future.
DR. MAYS: Larry, thank you very much, because I was busy talking, so I wrote
down what you said I said and it felt right. I think if we can reach consensus
on some things, we’ll do really well. When I was talking about a minimum
standard, I was almost talking about not the actual wording of an item.
But what I was talking about was, for example, if we said that within SES we
collect education, and here are the things that we benefit from, that in the
context of socioeconomic status we may want educational achievement, and then
to say why we think that that’s good and that having that as a minimum would be
great, but that we stop there and not specify an item, and that we talk about
that these things all need work, study, et cetera.
Marjorie, I’m just trying to make sure in terms of from Jim whether he
really wants us to write an actual minimum for him to be working with, or
whether it’s more that what we can give him is advisory about what we think it
DR. KAPLAN: If you want to get this done, the recommendation might go one
step further. It might express it just as you did, very well, but then say now
we want to hand it off to some structure and maybe even go as far as specifying
what that committee would look like or where they would live.
MS. GREENBERG: I would agree with Susan. I think that probably the Data
Council would establish some type of working group to review what you’ve said,
but they wouldn’t have to redo what the subcommittee has done.
DR. KAPLAN: But we should say who to the Data Council. We should name them
so that it’s just harder to disappear into the atmosphere.
MS. GREENBERG: I think the Data Council is the locus. But then they will
also put this out. Whatever they agree to, they will put out for public
comment, just as they did with the other ones.
DR. MAYS: What we begin to hear about is the overarching federal statistics
group, because I think the issue isn’t just HHS getting it right. For some of
these the very issue is we need them to be working with Census and BLS and the
Social Security Administration and Transportation because it’s a social status
issue and we need them at the table.
MS. GREENBERG: I think they are. Also, you have to realize that HHS surveys,
data systems, with the exception maybe of Vitals, are the numerator. You need
Census, obviously, for the denominator. Census is not an HHS activity, but you
have to be consistent with them or you’re not going to have your denominator
DR. MAYS: We have the Department of Education. That’s the group, the Big 12,
I think they’re called.
MS. GREENBERG: We can get information from them, but at the end of the day,
I think what the Department will do is promulgate this for some subset of HHS
data collections because the secretary doesn’t have any authority over anyone
DR. CAIN: When we developed the original standards, it was a Data Council
subgroup and meeting of the Data Council itself, and Census was at the table,
OMB was at the table, the relevant agencies were at the table for the
development of the original standards. I don’t know any reason why they
wouldn’t be in the future.
There is certainly precedent for OMB calling groups together to tackle
particular issues. For example, I’m involved in one right now that’s measuring
relationships in surveys. That has Census. That has Labor. It’s got everybody.
They do have a mechanism for bringing people together for that kind of
DR. GREEN: I sense that we’re about to adjourn, and I want to come back to
Bob’s last comment, which I heard as being will this letter include in it some
aspiration statements about where things could go next. We didn’t get reactions
to that, and I just want to see how other people feel about that.
DR. LUCAS: I have a question related to your comment, Dr. Kaplan, and also
to the products that we’re going to produce. If our letter said something along
the lines of what you’re suggesting, that these are recommendations, and then
we’d like to hand this off to some other entity to study further, I guess my
question is how would they know the information we gathered here? How would
they even be able to build on it without repeating if all we gave them was a
MS. GREENBERG: I know you have been working on a background document.
Obviously they would get more than the letter. They would get that background
document, I assume, or was that just for your own use?
DR. MAYS: Given what I heard today and given what I know of the committee,
it really humbled me also. I think that getting the letter done is primary. A
piece of that document, for example, is this chart that I passed around and
will take Susan an enormous amount of time to finish. I think it can come
later. I think we can make a brief intro to it, but I’m not as convinced that
we can write, because I’m just trying to get the charge done. You’re welcome to
make a recommendation about what you think about that.
MS. GREENBERG: I think a lot of effort has gone into this document. I don’t
think there’s anything confidential in it. It should be available certainly to
the department, but maybe more broadly. My second question is my typical
question. Do you want minutes or a summary from this meeting, or do you just
want to pull what you need from it for either that background document or your
You’ve said how are they going to know what we’ve covered. There will be the
transcript, of course, but only real aficionados sit down and read transcripts.
We could have 15-20 pages maximum, a summary of the information, and then that
would also give something that could be passed on. People can always go back to
the presentations and the transcript if they want more detail.
DR. MAYS: Maybe this is not the way to do it, but it is what I was thinking,
and that is step by step. We need to get the letter and get it through the
committee. Then we finish the document and put the document up on the web so
that the letter goes where it needs to go, but that we have a little bit more
time in terms of finishing the document itself.
MS. GREENBERG: But the committee may not feel that comfortable even voting
on a letter if it hasn’t seen background.
DR. KAPLAN: What happens is the letter would be finding, recommendation,
finding, recommendation. The findings can be relatively brief, a paragraph or
two for each area that leads up to the recommendation.
DR. GREEN: I agree with that.
DR. KAPLAN: What I am suggesting is that the letter would say we held these
hearings. Here’s finding one. That led to recommendation one. Here’s finding
two. That led to recommendation two.
MS. GREENBERG: That is sort of a standard committee approach.
DR. GREEN: That’s our standard approach, and we should do it again. In
talking with Susan yesterday, just a quick comment about this, and then I want
to ask my question again. The first question in our charge was about what’s the
state of the art of where we are with collecting data about SES. One way to do
that is to try to write a tome. I’m opposed to that. Rather, I really thought
that the quality of the presentations was such that I’m inclined to post
presentations and get them up there and distill stuff, get this paper, let this
paper mature and that sort of stuff, and it’s okay for them to overlap.
But I think in our letter that we’re preparing we can redirect people who
want to know more about that finding to the postings from the hearing. We don’t
want to write a very long letter that explains all of our findings in long
detail. I also agree with Vickie that we don’t want to wait until we’ve got a
more comprehensive document and explanation to get our basic findings from this
hearing and our responses to them out.
DR. MAYS: I think the question of what the standard is really probably
should be done the way we’re currently doing it, which is to illustrate what
the surveys are, what’s collected, whether it collects it or not, and something
about it, because the people making the decision — I think this will be
helpful as opposed to the presentations.
What we have to realize is that in the presentations they were designed to
do this hearing. That’s not the field, though. There were people who would
challenge, for example, things that were said here. So I don’t want to hold
that up as that’s the standard. I think when they’re tracking the standard, we
really should say here is what is collected, and that we do it by these charts
DR. LUCAS: I think some of the presentations dealt with things that are
going to be very salient to surveys, like some of the methodological issues
associated with collecting the data. For example, we heard today about how some
of the data collection methods associated with income do well in certain
incomes, but they don’t do well for people in other income brackets.
I think all of those things also have bearing on what kind of
recommendations you make for standards questions because they have an impact
and an influence. I’m not disagreeing with everything that has been said, but I
don’t think those two tables alone do it. I don’t know that any single thing is
going to capture the whole depth and breadth of what was covered here, because it’s
big and it’s complex.
DR. MAYS: It was the standard. The first thing that we’re asked to comment
on is what is currently in the field, what’s the standard, not critique of it.
I think the stuff that was talked about helps to explain the why-ness of it,
but something as simple as where is education collected —
DR. LUCAS: But I think what Susan was saying was that we have no idea — you
know they have questions. What’s the agreement in the wording? What’s the
agreement in the response categories? None of that information is there. That’s
what she was saying. That, in and of itself, isn’t enough. That tells us that
all these surveys have a question on education, but not necessarily the same
MS. GREENBERG: It doesn’t tell you what the standard is, but I think what it
tells you is at least they’re asking something on education, so moving them
towards a standard is probably more likely than if they weren’t asking anything
on education. That’s a pretty low bar. Also, I have that you don’t want minutes
or a summary of the meeting.
DR. MAYS: We definitely need the minutes. For this meeting, the more
detailed the minutes are, the faster we can move. The extent to which we can
have detailed minutes will help the entire committee, particularly because we
have people who weren’t here, that I think it will help them to just read them
or listen to them. Can
DR. GREEN: I am going to give up on my question for now because I really
want to make it to Denver tonight. I think our next steps are clear enough that
we’re headed toward distilling findings and draft recommendations for a letter.
We have to do that with the subcommittee. We’ve got to bring other people into
That means that we’re going to need some sort of an event in April where
there is prep work for that and we reconvene the subcommittee to where we
actually have to move toward having drafted some findings and some conclusions
that galvanize that sort of discussion, from which we can then prepare a draft
to circulate to the full committee.
MR. SCANLON: This is Jim Scanlon. It sounded like it was a good two days.
MS. GREENBERG: As I mentioned in my message, we need a little clarification.
There are two big issues: what you’re expecting from the committee and through
what data collections the standards that the secretary will adopt will apply
MR. SCANLON: The second answer is it’s basically HHS surveys. It will be the
same application as the first set of standards.
MS. GREENBERG: Is that only population-based surveys?
MR. SCANLON: Yes.
DR. CAIN: Are those things that go through OMB or does it also apply to
MR. SCANLON: It’s mostly not grants. There are some grants and cooperative
agreements that go to OMB, but we defined it as synonymous with agency
sponsorship. If it’s a cooperative agreement that would require OMB approval or
a grant that would require OMB approval, then yes. It’s the same as the
DR. CAIN: But it’s the OMB approval —
MR. SCANLON: That is the definition of agency sponsorship. This would apply
to surveys conducted or sponsored by HHS. That’s literally synonymous with OMB.
It’s all of the main surveys.
DR. CAIN: For example, would that include Add Health or HRS?
MR. SCANLON: Yes, if they are sponsored by HHS. I don’t know if they are or
not. If they had to get the approval or will have to get it, yes, it applies.
That’s the only practical way. We can’t get at every grant that NIH does
because you don’t even know when you work a grant what it’s going to involve,
and there’s no way to get back at the grantees. But the place to start,
everyone agreed, was the surveys that we have direct control over. It’s the
NCHS. It’s the SAMHSA. It’s the MCBS. It’s MEPS. It’s the basic National
Immunization Survey. It’s exactly the same as the first set of standards.
MS. GREENBERG: So it doesn’t include BRFSS?
MR. SCANLON: I don’t know if that requires OMB approval or not.
DR. COOPER: It does include the National Child Longitudinal Study?
MR. SCANLON: Again, if it got OMB approval, then yes. The National
Children’s Study will be there. Our Data Council member from NIH is putting
together the list of the surveys to which he thinks it applies. I, for one,
would like to make it apply to the BRFSS, but we can’t literally make it. I
would hope the agencies would step up. But it’s a very specific set of surveys.
There’s no way to control the grant world, to be honest. We don’t even see most
of those. There’s almost no way a priori to do that. Frankly, I wouldn’t even
worry about it. I think it’s more the major surveys that people were concerned
DR. MAYS: Are you wanting for us to actually have recommendations in which
we give you specific minimums, like here is education, here is what we think is
the best approach to asking? Part of what we’ve been discussing is this notion
of to tell you what we learned and what we found and give you some
recommendations about what we think next steps are in order to get you to that
minimum standard if you choose to move ahead.
MR. SCANLON: I’ve been listening. I don’t think anyone is in a position to
recommend the standards just yet. What I think would help is the first stage in
any of these endeavors is sort of what is the state of the art. What are people
collecting now? What are the strengths and limitations? Is there a lot of
variation now? Do some of these seem to be emerging as sort of the best
If you gave us what we might call an environmental scan or kind of a
baseline assessment of how these variables are being collected in federal
surveys now pretty much based on what you heard the last two days, I would say,
that would be very helpful to us. I’m not even sure of the standards territory
yet. I think we’ll need more of a collective assessment.
You’ve heard a lot of good things. You saw the way the major surveys are
collecting this information, you saw some of the newer ways, and you heard some
of the issues. I think if you just pull that together for us with a
transmittal, that would certainly meet our requirements. Then we’d have to go
into more depth about is there potential for a standard yet. I think if you did
that, that would be very helpful to us.
You don’t have to recommend the standard. I think that would be a heavy lift
now, and it would take a lot more hearings and work because you’d need to — a
standard is more than just a good question. It’s something that meets a very
high level of performance. That’s why there are so few of them.
That’s certainly one way to go, a nice transmittal, and you’re basically
summarizing and interpreting what you heard. But you don’t have to give
specific recommendations for standards. You might want to say that for the
major surveys like the CPS or census or others, there are ways of collecting
data that could be looked at. But I think the most you would say is to point a
bit of direction. I could be wrong, but I don’t think I heard anything that was
an obvious standard from the two days.
The other thing that I think would help is if you just reminded everybody
that any standards should be based upon proven methods and measurements and
sort of a concept of the core and the minimum, the constants that we’ve
employed previously and that came up again in the discussion. I think that
would be very helpful.
Then we would have to take it in HHS, along with Census, and we’d probably
have to look at what exactly — is there a standard here for education? Is
there a standard for income? Occupation? If so, then we still have to go
through a full process because the standard makes it mandatory.
This is the minimum, obviously. It’s not meant to limit research that
everybody does or to collect additional information. The goal of a standard is
a very tough thing to meet. I think if you, at one level, simply summarize,
analyze what you heard in the form of an assessment of these kinds of variables
in federal surveys and then gave us some directions, I wouldn’t expect
recommendations for standards. Does that make sense?
DR. MAYS: Yes, it does. It not only makes sense, but you just were like a
great psychologist and brought our anxiety down.
MR. SCANLON: That’s why it’s so tough. You would think it would be easy to
standardize age and sex over the years, but it hasn’t been. Everybody does it
differently. I think that would be immensely helpful to us, and we could take
it to a work group. We’ll get the Census Bureau back and all the agencies back.
They won’t have to plow over the old ground. They will be able to say let’s
look at what people think of the way CPS does it. I think some of the witnesses
you had actually compared educational-level measurement and achievement across
surveys, and I think they did the same for income. That gives us a good head
DR. MAYS: I agree. I think we had some very good presentations to help us
move this along. We’re going to be mindful of your time. We really appreciate
you taking a little bit of time to let us know.
MR. SCANLON: I want to thank the committee. That was really excellent. I was
there yesterday and listened to most of today. It really turned out very
nicely. It was exactly what we hoped for, and it saves us a lot of original
research and assessments that would take us a very long time to do.
DR. MAYS: I think the next two steps are, one, we need to figure out if we
have any gaps in the information that we need. If we do, I think what we talked
about is having a webinar to get that. Then I think the other thing is the
first thing that we do is schedule a conference call of all the populations.
Let’s try to do that kind of late in April. Do you have a sense of how quickly
we can get the transcripts?
MS. GREENBERG: First, we need to get the transcripts. It’s usually 10
working days. I don’t know if they can speed it up a little. Then based on the
transcript, we can have someone do the meeting summary. They have to do it from
the transcript, obviously.
DR. MAYS: What I am going to suggest then, since it’s about that long, is
that the planning group start identifying if they think there are any gaps and
identifying if there are gaps, who we want to give us the additional
information. Then we should, by the time we’ve done that, be able to talk to
you about where the transcripts are and make sure we get everybody who wasn’t
here up to speed. Then probably we’re looking at maybe by the end of April
having a call with the whole Populations Committee.
MS. GREENBERG: Is that the webinar you were thinking of, or you’re not
thinking you’d need the webinar now?
DR. MAYS: We don’t know if we need the webinar until we check and see if
there’s some depth. I know there were specific things that Susan still wanted.
I just need to find out from her if she still wants those. It may be just a
MS. GREENBERG: For the planning group just one of the co-chairs of the
subcommittee should be involved, Larry because he was here, but if for some
reason he can’t, Sally will do it.
DR. MAYS: Any other suggestions, recommendations, anything else that you
think we should be doing? This is like our agenda for the month of April.
DR. BREEN: I see you’ve got us until late April getting together all the
materials we need, and then assessment of if we need to get any additional
information. In April would this call be to look at a draft letter? What would
that be? I’m just thinking it might make sense to do the planning right up to
June or up to the letter.
DR. MAYS: I was a little worried because we have so many people not here.
What I would suggest is that week two of April we try to determine if we have
gaps. Week four of April would be the webinar. Then by the end of that week or
some other time in that week, we would also have the full Populations Committee
to have a conference call.
MS. JACKSON: That’s kind of late. In order to set up something if you really
want to get the products that you want in April, you kind of need to know by
the end of this month whether or not you’re going to have something set up. A
webinar, from what I understand, is a public kind of discourse. Is that what
you’re talking about in terms of getting information and hearing from people
and setting up a time and date? It’s very time-intensive, and you would need to
know by the end of this month if you’re going to even have that. I’m assuming
that you’re going to be talking among yourselves about gaps and know that by
the end of the month.
DR. MAYS: I forgot we’re in March. I was already in April. We can actually
back up and have our decisions made about if we think we need something else,
who those individuals would be, and when to do that by the end of the month. I
think the second week in April would be when we would have it, if it’s
going to be a webinar, as long as you all can do it within about two weeks
afterwards, because we would have to ask the individuals.
Somewhere near that same week or very close after in week three would be
when we would have a full Populations conference call. I think from the full
Populations conference call should be some of the marching orders of what that
letter should look like. Prior to that, as we go probably to the full
conference call, we want some sketchy idea of what we think the recommendations
would be. I think we can do that by then.
MS. JACKSON: Also, we can talk off line about the specifics of the timing,
but the subcommittee as a whole needs to be involved as early as possible.
DR. MAYS: That is what I am saying. The full Populations Committee would
give us some marching orders of how they would like to see the recommendations
— this is what we’re thinking, and then let them opine on it. Then we have
some time in May. What you have to tell us is I know there’s a step where
things have to go to the executive committee, so I’m assuming that’s mid June
MS. GREENBERG: The meeting is the 21st and the 22nd,
so at least two weeks or three weeks before then to be able to get something
into the agenda book. We’ll come back to that when we see where we are.
DR. LUCAS: Is it possible to get packets made of all the presentations for
the rest of the committee members that were not here?
DR. MAYS: I think what would help is to just maybe send an email out to the
full Populations Committee. Then I’m going to also ask that the things that are
on the little Widgeo site get also moved. Then that way, we’ll have everything.
They can read background. They can get up to speed to whatever extent that they
MS. GREENBERG: I think they would like minutes, so we’ll need somebody to do
that. What’s the earliest we could get the transcript?
DR. BREEN: In terms of prep for the full subcommittee call, is everybody
expected to review the slides and read the minutes in preparation for just a
general discussion of what the recommendations would look like?
DR. MAYS: No, I think we back up a little bit more. I think the Planning
Committee has to step in here first before we do that and see the extent to
which it can draft some things. Then that’s, I think, what we would give them.
The Planning Committee needs to come back into this process at this point. I
think Bruce has been gone. We need to make sure that we can get them up to
speed as soon as possible. I’m hoping we can pull all these documents together.
DR. LUCAS: One of the ways that we could identify where some of the gaps are
that haven’t already been identified is to go back and revisit the questions
that we gave to the presenters within the context of the information that
actually got presented and determine whether or not we feel like we got answers
to all these questions, whether or not these are the questions that are still
relevant, or there are different questions that now need to be posed. Then I
think that will help us identify where the gaps are, because I took copious
DR. MAYS: I think that’s excellent. Before we adjourn, I want to really
thank the staff. I’m glad that you loved what happened. But in particular, to
talk about Susan and Jackie and Nancy, it’s a wonder you can’t tell by the
cauliflower ear here in terms of how much they’ve been on the phone and talking
to people nicely twisting arms. I greatly appreciate the assistance to get us
to this. They have also produced a report and some tables, which you’ll get
I want to thank Marjorie, who made sure that I had some guidance. I’m just
coming back into this, so I needed to learn kind of the best way to do it. I
also want to make sure that I thank Bob and Leslie Cooper and also Nancy for
getting the materials out. I hear we had a large number of people who were on
line listening to it. I think they were our colleagues at NCHS and NIH. I’m
very excited about that.
MS. GREENBERG: Thanks to Nicole and Debbie.
DR. MAYS: We are here sitting here being as comfortable as we are. Nicole
was just like give it to me and it was done. She took a lot of worries away,
made sure people had a place to stay. Debbie made sure that everything
happened. So I just want to say thank you to recognize that you played a role
in this. The spotlight needs to be shared. Thank you, everybody.
(Whereupon, the meeting adjourned at 3:20 p.m.)