[This Transcript is Unedited]
Department of Health and Human Services
National Committee on Vital and Health Statistics
Hearing on Minimum Data Standards for the
Measurement of Socioeconomic Status in Federal Health Surveys
March 8, 2012
National Center for Health Statistics
3311 Toledo Road
Hyattsville, MD 20782
CASET Associates, Ltd.
Fairfax, Virginia 22030
- Call to Order and Welcome/Introductions – Larry A. Green, MD, Co-Chair, University of Colorado Health Science Center, Vickie M. Mays, PhD, MSPH, UCLA, Hearing Chair
- Overview: Sec. 4302, Affordable Care Act – James Scanlon, PhD, ASPE
- Purpose and Use of SES Survey Measures Policy – Ernest Moy, PhD, AHRQ
- Defining Socioeconomic Status (SES) – Michael Hout, PhD, UC Berkeley, Dept. of Sociology, Berkeley Population Center
- Panel: Education
- Kurt Bauman, Chief, Education and Social Stratification Branch, US Census Bureau
- Mitchell Wong, MD, PhD, UCLA Center for Health Services, School of Medicine
- Panel: Income
- Connie Citro, PhD, CNSTAT/NRC
- James M. Dahlhamer, PhD, NHIS/CDC
- Tom Selden, PhD, MEPS/AHRQ
- Charles Nelson, PhD, CPS/ACS, US Bureau of the Census
- Panel: Occupation
- Alfred Gottschalck, PhD, Chief, Labor Force Statistics Branch, Social, Economic, and Housing Statistics Division, US Bureau of the Census
- Melissa Chiu, PhD, Chief, Industry and Occupation Statistics Branch, Social, Economic, and Housing Statistics Division, US Bureau of the Census
- Sherry Baron, MD, MPH, National Institute for Occupational Safety and Health
- Thomas J. Plewes, MA, CNSTAT/NRC
P R O C E E D I N G S
Agenda Item: Call to Order and Welcome/Introductions
DR. GREEN: I’d like to call to order this National Committee on Vital and
Health Statistics Population Health Subcommittee hearing on Minimum Data
Standards for the Measurement of Socioeconomic Status in Federal Health Surveys.
Let’s go around the table with introductions, including the other people in
the room. Please identify yourselves, and committee members and others, please
declare any conflicts.
Again, I am Larry Green from the University of Colorado and co-chair of the
Population Health Subcommittee. This morning and for the next couple of days, Dr.
Vickie Mays is leading this hearing. We have a paucity of committee members in
the room, but we have some on the phone. I would like to go to the people on
the phone first, and then we will come right back to Vickie and then continue
around the room. Who is on the phone?
DR. COOPER: Dr. Leslie Cooper, NIH.
DR. GREEN: Thank you, Leslie.
PARTICIPANT: (inaudible) from CDC Atlanta.
DR. GREEN: Thank you. Jack Burke, are you still there?
DR. MAYS: Vickie Mays, University of California, Los Angeles, chair of the
hearing, and no conflict.
DR. BREEN: Nancy Breen, I work at the National Cancer Institute. No conflicts.
DR. SONDIK: Ed Sondik, Director of the National Center for Health Statistics.
MR. SCANLON: Good morning. Jim Scanlon, Deputy Assistant Secretary for
Planning and Evaluation at HHS.
DR. MOY: Ernie Moy from AHRQ, and we would love to have SES standards. That
would make my life a lot easier, so I guess I do have a conflict.
(AV sound goes off)
MS. BEEBEE: Suzie Bebee, ASPE.
DR. HOUT: Mike Hout, UC Berkeley.
DR. HOLMES: Julia Holmes, the Division of Biostatistics here at the National
Center for Health Statistics.
MS. CYNAMON: Marcie Cynamon, National Center for Health Statistics.
DR. WONG: Mitchell Wong from UCLA.
MS. GINGOLD: Janet Gingold from the University of Maryland.
PARTICIPANT: Patricia — from OAE here at NCHS.
DR. CORNELIUS: Llewellyn Cornelius, chair of the NCHS Board of Scientific
Counselors and faculty member from the University of Maryland. No conflict.
DR. JONES: Jo Jones, NCHS, DVS.
MS. COOPER: Nicole Cooper, staff to the committee.
DR. GREEN: Did someone just join us on the phone?
MR. BURKE: Hi. I am Jack Burke, member of the National Committee on Vital
and Health Statistics, member of the Population Health Subcommittee and the
Privacy and Confidentiality Subcommittee, and compliance officer at Harvard
Pilgrim Health Care in Boston.
DR. GREEN: Did someone else just join? I think we have completed the roll
call. Our first order of business is to hear from Dr. Scanlon.
Agenda Item: Overview: Sec. 4302, Affordable Care Act
MR. SCANLON: Thank you, Larry, good morning everyone, and thank you for
being here. Let me start with a little background and set the stage for what we
have asked the committee to help us with. Many of you will remember that
the Affordable Care Act, enacted about two years ago with the anniversary coming
up this month, included provisions relating to reducing disparities in health
and health care.
One of the means of doing that was to improve the data situation;
specifically, the Secretary of HHS was directed to adopt a set of data
collection standards to be used in our major federal surveys. The law
specifies what that first set of standards would be. It focused on
sex, race and ethnicity, primary language, and disability status.
The statute also gave the Secretary the authority to adopt
additional standards if she so wished. The Department went through a
fairly systematic process culminating in the adoption of the standards I
mentioned back in October. The task was given to our HHS Data Council. The
council looked at our whole portfolio of surveys, including the Census Bureau
surveys (ACS, CPS, and so on) plus our own HHS portfolio. It looked at the way
we were collecting the data now and which approaches had proven to work well
in field settings and had some validity, and then put forward a preliminary
set of recommendations. The Data Council did that, and the recommendations
were cleared all the way through HHS.
Then we published them for public comment; I think it was a 60-day comment
period in the summer. By and large the public comments supported the
way we were going. In some cases I think folks misunderstood the difference
between a minimum standard and every conceivable item you might want to ask
about disability.
Again, I will describe the concept and how we approached this. But
otherwise there was a fair amount of support. In many cases people wanted
additional granularity, but again you have to get this just right. And in some
cases people just thought we were all wet, the Affordable Care Act was all
wet, and the administration was all wet, and that we should go home and pack
this all up. But that was a small minority.
We made a couple of tweaks based on the public comments and then finalized
the proposal, sending it once more through HHS, our Census Bureau partners,
and OMB as well. Finally, the Secretary adopted these standards in October.
They were announced at the American Public Health Association meeting, and there
was a fair amount of press activity as well. They are adopted and they are now
available on the HHS website, including the Data Council website and the Office
of Minority Health website.
Having adopted that first set of standards, we are now in the
process of implementing them in our surveys. As you all know,
with big major surveys you don’t just suddenly stop something and change it.
Normally you introduce change at the next revision or in a new survey,
and it would be foolish to make big changes without getting some validation
and planning them into the calendar.
At any rate, we are now looking at the implementation of the first set of
standards in the HHS surveys. It turns out that the Census surveys, for the
most part the CPS and the American Community Survey, were largely collecting
this level of detail already. In fact, we thought that was a good place to look
for a standard with a fair amount of granularity and detail. But, again, we
are working with Census on this. There are a couple of tweaks where we might
make some additions in the future.
But as I said, shortly after we had adopted the standards everyone was asking
what the next set of standards would be. In the consultation process, by far
the most common recommendation was to look at SES. I think what they meant
there was the measurement of income and educational level. You could also
look at industry, occupation, profession, and so on. But I think they meant
the standard SES-type items that we include in our surveys. The Secretary then
said to us, in effect: all right, you have finished the first set of standards;
now I would like your best thinking about the potential for standardizing SES
data collection items as well.
We put it off for a little while because we wanted the agencies to get a
good start on implementing the first wave of standards, and I think we are at
that point now. So now we are turning our attention to the potential for
SES-type variables. We thought that one of the best ways to do this would be
to ask the National Committee, our advisory committee, to hold a hearing, a
public meeting, and pull together for us, in essence, the state of the art
among the current portfolio of federal surveys.
What questions related to SES are being asked now? How are they being asked?
What are the good practices? What has worked well and what hasn’t worked as well?
The aim is to give us a sense of the potential for standardization. As you know,
standardization is a double-edged sword. If you standardize too soon, you are
probably going to hurt the measurement and lose useful variation. And if you
standardize too late, you have a lot of inefficiencies.
It is fair to say that we ask about income in almost all of our surveys, and it
is probably asked slightly differently in each, though many of the versions are
collapsible to standard categories. In addition, I think we always look for the
ability to categorize income as a percentage of the federal poverty level. That
way we can relate it back to policy as well.
I could say the same thing for education. From what I have seen, people are
just asking it in different ways. Again, we don’t know what the potential for
standardization might be. But we have asked the committee to get the best
thinking on current practice in federal surveys, to hear from other experts
who know a lot about this and have done a lot of thinking about it, and then
to pull it all together. Number one would be: what is the state of the art, or
at any rate the state of the practice? And number two: what are the variables,
and what is the potential for standardization? Keep in mind that the level of
effort required to adopt a standard is much higher than for simply a good
question. In other words, you wouldn’t just make up one question for one survey
and decide that is a standard. To reach the level of a standard, a fair amount
of validation and practice is expected. Just because it is an interesting
question doesn’t mean it is going to make a good standard.
When we adopt standards, as we did for the first set, it actually requires
all of the surveys that we sponsor to include these questions and to ask them
this way. It is a fairly high requirement once something is adopted as a
standard.
Now, there are other instances where all we decide is that we need some
research and some good evaluation, so folks may just be trying out various
questions, and that is fine. That is part of the research and measurement
process. A question here or there is not a standard. Please keep that
distinction in mind. A standard carries a fairly high burden of proof for
adoption. It disrupts a lot of things in many ways, so you have to know why you
are doing it, and the benefits have to outweigh what is lost in variation.
That is our request of the committee.
We are hoping to get some findings back from you by the June meeting, at least
in terms of the lay of the land. You could look at this and say the uses of
income data are so varied that, beyond a few principles, there is almost no way
to standardize. Or you could say that if you approach it a certain way, you
might think about this. Or you might just fall back and say there are some
better practices, but they are not yet a standard.
I will make one more distinction and then I will open it up. When we developed
the concept of standards in HHS, and you are all familiar with this, we developed
the concept of the minimum data collection standard. We specify the minimum
categories that must be collected. Any agency can ask for as much more detail
as it wants, needs, and can justify, as long as it can collapse the additional
categories into the standard ones. Or, in the case of disability, for example,
where you have to ask the standard questions, you can then ask as many other
questions as you want. Again, the idea of a minimum data standard is not to
limit the amount of information that anyone can collect about any variable, but
it does lay down a minimum that everyone will use or be able to collapse to,
and you can see why you would want this.
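The minimum-standard rule described above, where any extra detail must collapse back to the standard categories, can be sketched as a simple mapping check. This is a minimal Python sketch under stated assumptions: the detailed education codes and the three-category minimum are hypothetical, invented for illustration, and are not an HHS standard.

```python
# Hypothetical detailed education codes mapped onto a hypothetical
# three-category minimum standard (illustration only, not an HHS standard).
MINIMUM_CATEGORIES = ("less than high school", "high school", "more than high school")

DETAILED_TO_MINIMUM = {
    "no schooling": "less than high school",
    "grades 1-11": "less than high school",
    "grade 12, no diploma": "less than high school",
    "high school diploma": "high school",
    "GED or equivalent": "high school",
    "some college": "more than high school",
    "associate degree": "more than high school",
    "bachelor's degree": "more than high school",
    "graduate degree": "more than high school",
}


def is_collapsible(mapping: dict[str, str]) -> bool:
    """True if every detailed code maps onto one of the minimum categories."""
    return all(target in MINIMUM_CATEGORIES for target in mapping.values())


def collapse(responses: list[str], mapping: dict[str, str]) -> list[str]:
    """Collapse detailed responses to the minimum standard.

    Raises KeyError for a response with no mapping, which would mean the
    survey's extra detail cannot be collapsed to the standard.
    """
    return [mapping[r] for r in responses]


answers = ["GED or equivalent", "some college", "grades 1-11"]
print(is_collapsible(DETAILED_TO_MINIMUM))  # True
print(collapse(answers, DETAILED_TO_MINIMUM))
# ['high school', 'more than high school', 'less than high school']
```

The check makes the design point concrete: a survey may add as many detailed codes as it likes, provided each one lands in a standard category.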
We have talked about this. Ernie remembers this as well. Could we agree on a
social and demographic dataset that we would all more or less standardize and
we could include all those questions in our surveys? We sort of have that, but
it wasn’t really standardized. This is sort of getting at that. Any survey
that we conduct, that Census conducts, or that others conduct would have at
least these items, defined in much the same way. You would at least have a way
of relating the surveys in terms of those basic — questions.
Let me stop there and see if there are any questions.
DR. GREEN: Any questions?
DR. SONDIK: Hi, Ed Sondik from the National Center for Health Statistics. I
know there is a session later on defining SES, which Dr. Hout is going to lead.
You implicitly defined SES in terms of income and education. Other countries
certainly look at SES that way, but they also have other concepts. One that is
prevalent in many other countries is social class, which can be defined in
different ways. I wonder if you see that as having a role here.
MR. SCANLON: When I mentioned income and education, those are the examples, I
think, of what we clearly would look at. If the committee sees others, fine, but
again, we are looking for a standard now. If it is to try out some questions,
that is a different kind of recommendation. We are looking at SES-related
data collection standards and the current portfolio. But to be honest, we don’t
rule out anything. The obvious ones are income and education, and to some extent
occupation and industry. You see those collected in a lot of surveys, but you
don’t often see them analyzed that much, which is a bit odd, since they are used
a lot in establishment surveys. But that is fine. There are other criteria for
what would make a standard. When you are evaluating items, it helps if someone
has collected them, they seem to be getting more or less what you want, and
people seem to understand them. But we are not ruling out any items.
I don’t want to presume to speak for the committee, but I could see, for
example, the committee recommending that three or four items could be
standardized along the lines of the minimum dataset, with room for additional
granularity, and that there are some other items that would be interesting to
look at, not necessarily as standards, but as areas for future evaluation. But
we are not trying to limit it to those two or three.
DR. GREEN: Do any members on the phone have clarifying questions? It is my
pleasure then to turn this over to Dr. Vickie Mays. We are two minutes ahead of
schedule at this point, and Vickie will lead us into the hearing.
DR. MAYS: Actually, what I wanted to do was comment on what Dr. Sondik
raised. I think the issue of social class often is raised within the context of
social status. I think part of what is going to happen during the hearing is
that we will be examining these other areas with that question in mind. I
think in particular we are going to be asking those people who are talking
about occupation where the issue of prestige or status comes in. As for income,
the way we have been thinking about it, income is being used as a marker of
where you are in society, of what that income can actually position you to do
or not do.
I think what we are going to hear about education is the same concept. Are we
still getting at status by looking at educational attainment, or, given the way
education has evolved, do we need some kind of modifier of attainment to give
us a better sense of status, such as whether you were in a public school, a
private school, or a charter school, to get us there? Do we need to also know
something about some other aspects?
I think what you are raising is probably one of the most critical issues of the
hearing: if we are talking about socioeconomic status, it really is about this
position. What we are trying to ask is whether these questions still capture
that, and if not, whether we need to go to something else in order to get it.
MR. SCANLON: One other criterion: the data items are, in essence,
self-reported. It is a survey situation, whether interviewer-administered,
self-completed, telephone, or web-based, but in essence it is self-report.
Certainly for the first set of standards it was self-identification and
self-report, and here I think it probably would be similar.
DR. SONDIK: Dr. Moy may get into this in the next session in terms of
the purposes. But certainly one of the purposes for all of these variables, I
think, is to determine whether any one of them is a barrier in some way. This is
obvious in terms of race, but the same is true of income, education, and the
like. It is just something that hasn’t been as clearly formulated in the US
with respect to class, as to whether it is a barrier.
But there is literature on this, perhaps not a wealth, with respect to other
countries. And certainly when I think of this and I think of our surveys, I
think of it in terms of income and education. And certainly one of the reasons
for standardizing these is so that it is easier to make comparisons across the
different data collection systems.
In the government we are often criticized, even on age for that matter,
because the groups don’t exactly match. This causes a problem. We certainly
have this problem multiple times over with respect to income, and I suppose
education, though I haven’t heard that criticism so much there. That is another
reason, I think, that standardization is important: it will enable us not only
to collect these items in a more uniform way, but to actually make these
comparisons across the different surveys.
MR. SCANLON: Ernie will remember this. In fact, I think AHRQ supported a study
at the Institute of Medicine that looked at, in essence, the first set of
standards: race, ethnicity, and language, but not disability. The focus of the
IOM work group was largely on quality of care; it looked at these factors in
examining disparities in quality of care. In fact, we relied on those
recommendations for the kinds of standards that we were looking at. They did
not look at SES or other factors, and again the focus was quality of care. I
think that is about as far as they wanted to go.
Remember as well that measures like income are very important not only for
epidemiology, research, and disparities, but also for policy. When the
Affordable Care Act provides a subsidy up to a certain percentage of the poverty
level, and then another one above that, it is crucial that the income questions
we ask allow us to make that determination. Otherwise the data are interesting
research-wise, but you have kind of missed the boat. That ties this back to
evaluation and usefulness for policy research. Some of these variables probably
are not policy levers in the sense that you can manipulate them, but they often
explain a fair amount of the variation.
One other thing, on the language side; you can look at our website to see how
we came out. Again, this was for surveys. We looked at the IOM report and at
some others. It turned out that what really made a difference in terms of
variations in health status and health care access and outcomes was not so
much the specific language as English proficiency on a population basis. Now,
obviously, for individual treatment and care and individual decision making the
specific language would be important, and we allowed for that in our standard.
But it was clear that some measure of English language proficiency was what
mattered, and what we adopted was more or less along the lines of the American
Community Survey. That measure actually explained more variance in some of the
important variables than the specific language itself, though again we allow
for additional granularity for specific languages.
DR. MAYS: Thank you very much. I think that you have definitely launched us
on a good focus. The committee is very appreciative of being asked to help out
with this because it is a very important task.
MR. SCANLON: My pleasure. I can assure you that people will be waiting for the
recommendations.
DR. MAYS: They are knocking at the door already. Let me turn now to Dr. Moy.
Our committee knows Dr. Moy very well, as in the past he was a participant in
the population committee when we were doing some of our population health work.
He is with AHRQ, which, for example, has worked on looking at SES among Medicare
beneficiaries. They also come to us with a lot of expertise. Welcome; we are
very glad to have you.
Agenda Item: Purpose and Use of SES Survey Measures
DR. MOY: Thank you. I actually have slides or handouts. I don’t know how
to access them, or else I can just describe them for you. I can start by saying
that I am delighted to be the first external speaker because I can then talk
about all the non-rocket-science stuff, and I think that is appropriate for our
reports, because fundamentally I am going to be talking about work that comes
out of the National Healthcare Quality Report and National Healthcare
Disparities Report and our related products, and their efforts to summarize
information about quality of care and disparities in care for Congress,
congressional staff, and federal policymakers. In other words, making it pretty
easy to understand. And I think that is one of the major benefits of
standardization, and particularly standardization of SES: it would greatly help
us translate this information for lay users, as opposed to expert users like
those here.
I was going to start off by showing you how explicitly we have been
charged with dealing with issues of SES. On the first slide I included the
charge that we have for producing these two reports. One thing Congress has
asked us to do for the disparities report is to focus on disparities related to
racial and ethnic factors and socioeconomic factors. So we have a direct charge
to look at SES as it relates to disparities in health care.
I also have on that slide some of the guidance that we received from the
IOM, first in 2002, before the first reports came out, and again in 2010. That
later guidance was actually related to the race, ethnicity, and language
standardization recommendations that the IOM produced in 2010. But here again
they gave us very explicit suggestions for looking at SES disparities, number
one. And number two, highlighted in red, is one of the things they told us: you
need to stratify disparities by SES, because racial and ethnic disparities are
really not understandable or actionable without stratification by SES. So we
have these two very strong pieces of guidance for including SES in our reports:
first, as a primary disparity to examine, and second, for stratifying other
kinds of disparities.
I include at the bottom a link to a document from our 2002 IOM guidance
because it might be of interest to this committee if members haven’t looked at
it. Marian Gornick produced a whole chapter on the measurement of SES and how
it relates to examining disparities in health care.
We got this charge, and we are supposed to do all this stuff with SES. How
are we going to operationalize it? These are some of the early decisions and
constraints that we encountered using SES in the reports. One of the first
things that we realized was that for the reports we wanted to summarize the
quality of care for the nation. That is a very broad undertaking. We tried to
summarize across all different kinds of settings of care, types of care, and
conditions. We didn’t really have the luxury of collecting new data, nor the
resources to go and tell federal survey people or other data collectors, please
collect it this way, much as we would love that. So one constraint was using
extant data only.
Another early realization was that it is going to be really hard to talk to
Congress or to federal policymakers if we have to have a hundred footnotes
under every table and figure. For this figure, by poor we mean this; by less
educated we mean that. So our first notion was that we had to standardize just
to make the reports understandable, readable, and usable to our audiences.
For this we created an artificial SES hierarchy. We developed this with the
help of our interagency work group, with representatives across the department.
And this is a rendering of the initial hierarchy that we used for SES
measures. You see on the top we have household income, then individual
education, then insurance type, which we thought was relevant as a proxy for
SES for health care, and then lastly area income.
We also had to create some kind of thresholds, and we tried to stick to
Healthy People 2010 whenever it was available. Most of the surveys could
support those kinds of contrasts. Next to each of the items I have the
“standards” that we created artificially for use in our report, and we have
pretty much stuck to these. For income we use income relative to the federal
poverty thresholds, and we break it up into four groups: less than 100 percent,
100 to 199 percent, 200 to 399 percent, and 400 percent or more of the federal
poverty thresholds. For education we break it up into less than 12, 12, and
more than 12 years of education.
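The income and education groupings just described amount to simple threshold rules. The Python sketch below assumes the applicable poverty threshold for a household is supplied by the caller (actual federal thresholds vary by family size, composition, and year, and are not reproduced here); the function names and the example threshold of 23,000 are hypothetical.

```python
def percent_of_poverty(household_income: float, poverty_threshold: float) -> float:
    """Household income as a percentage of the applicable poverty threshold.

    The threshold is an input (assumption): real thresholds depend on
    family size, composition, and year.
    """
    if poverty_threshold <= 0:
        raise ValueError("poverty_threshold must be positive")
    return 100.0 * household_income / poverty_threshold


def income_group(pct_of_poverty: float) -> str:
    """Four groups: <100%, 100-199%, 200-399%, 400% or more."""
    if pct_of_poverty < 100:
        return "<100%"
    if pct_of_poverty < 200:
        return "100-199%"
    if pct_of_poverty < 400:
        return "200-399%"
    return "400%+"


def education_group(years_completed: int) -> str:
    """Three groups: fewer than 12, exactly 12, more than 12 years."""
    if years_completed < 12:
        return "<12 years"
    if years_completed == 12:
        return "12 years"
    return ">12 years"


# Hypothetical threshold of 23,000 for the household in question.
pct = percent_of_poverty(34_500, 23_000)
print(pct, income_group(pct), education_group(14))  # 150.0 100-199% >12 years
```

Computing the percentage first and grouping second keeps the continuous value available, so the same data can also be cut at other policy-relevant cutoffs.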
And for insurance type, for which there was not a Healthy People 2010
standard, we created one that is commonly used both in our MEPS survey and in
the NHIS. We separate people under 65 from those 65 and older. For those under
65, we break it up into people having any private insurance, public insurance
only, and no insurance. For those 65 and older, where more than 95 percent of
people have Medicare, we have Medicare plus Medigap, Medicare plus other public
coverage (typically Medicaid), and Medicare only. Those are the categories that
we used.
Why do we not have more detailed insurance categories? I think beyond this, the
feeling from the MEPS and NHIS teams was that more granular insurance
information typically was unreliable. That is why the data are clumped into
these larger categories.
We also used a lot of databases that aren’t surveys. We used things in our
reports that come from vital statistics or from providers or hospitals. And for
those we had to create alternative SES categories. We used
area income as one of the things that we looked at typically dividing it up
One of the early decisions we also made was not to use a composite. The
rationale was pretty straightforward and twofold. One, we didn’t want to have
to explain a composite to federal policymakers and Congress. And number two, it
was already difficult to develop standards for the measures we did have;
developing standards for a composite would be that much harder. So we made the
explicit decision not to use composites.
I will give you some examples of how we typically show SES in the reports.
The first is a typical rendering that we have tracking a particular measure. I
can’t actually see on my slide what measure it is, but it is tracked over time
by race and ethnicity on the top half and by self-reported income on the bottom
half. You see the typical gradient. This is very commonly what we see: low-income
people get a lot less care than higher-income people, and it typically persists
over time. This is a standard presentation of disparities, by race and
ethnicity on the top and by SES on the bottom, that we show in the reports.
As requested by the Institute of Medicine, we do include stratified
analyses. All of our tables are stratified by race, ethnicity, and SES, so
people can get the exact SES by race and ethnicity component that we are
interested in. This is, again, an example of a typical display that we will
show.
In this particular case, race is on the left and ethnicity on the right,
stratified by education. This is for a measure of adults reporting poor
communication with their usual provider. You see what we typically see as well:
the magnitude of the racial effect varies by SES. The converse would also be
true: if you are focusing on an educational effect, it would typically vary
across race and ethnicity. I think this highlights the importance of
stratifying by SES. If you really wanted to develop policy that targets a
particular group, knowing the intersection of race, ethnicity, and SES allows
you to identify the population at greatest risk and make policy initiatives
more targeted.
And then the last thing: somewhere along the line people complained that we
should include some multivariate analyses. Of course, we have to do
multivariate analyses to get the independent effects. This is just how we
typically include them. We can show the independent effects, in this particular
case as adjusted percentages for race and ethnicity, income, education, et
cetera. This one doesn’t include insurance because the dependent variable here
is actually the percentage of people under 65 who are uninsured all year. And,
again, this highlights that there are independent effects, and they typically
are different. And, again, this potentially could help with targeting policy
initiatives.
Why standardize at all? We actually could have done all of this with lots of
footnotes to say what each of the different categories meant. But I think the
real advantage of standardization is that it allows us to summarize over large
numbers of measures or measure sets. This is a typical rendering that we have.
It highlights parts of the two reports. On the left, and it is probably too
small for you to see, there are a bunch of different kinds of contrasts.
Going from left to right, and I have to look for this, there are a bunch of
racial contrasts: Black/White, Asian/White, and American Indian/White. There is
a contrast of Hispanics versus non-Hispanic Whites. And then on the right is the
contrast between low-income and high-income populations. For a given measure
set, in this particular case our core measures of quality of care, these simply
try to identify the following.
The proportion of those measures where the indicated population, for
instance, low income does worse than the high income population and that is
labeled in green. The proportion of measures was roughly the same in blue and
the proportion where the disadvantaged population does better than the
advantaged population in Black. For this particular thing you have quality of
care measures on the left and access to care measures on the right and you see
what we typically see which is SES or income typically has the largest effect
of many different contrasts that we might otherwise want to examine. But we
wouldn’t be able to do this without standardization. It is only by
standardizing in some way that we could actually compare contrasts across
different measures coming from different measure sets.
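The summary described here (for each contrast, the share of measures on which the disadvantaged group does worse, about the same, or better) can be sketched roughly as follows. This is an illustrative sketch, not AHRQ’s actual procedure: the `MeasureResult` type, the 10 percent relative-difference tolerance used for "roughly the same," and the example rates are all hypothetical assumptions.

```python
from dataclasses import dataclass

@dataclass
class MeasureResult:
    # Hypothetical record for one quality measure and one contrast.
    name: str
    disadvantaged_rate: float   # e.g., rate for the low income group
    advantaged_rate: float      # same measure, high income group
    higher_is_better: bool = True

def classify(m, tolerance=0.10):
    """Label a measure 'worse', 'same', or 'better' for the disadvantaged group.
    The 10% relative tolerance is an assumption, not an AHRQ rule."""
    rel = (m.disadvantaged_rate - m.advantaged_rate) / m.advantaged_rate
    if not m.higher_is_better:
        rel = -rel  # e.g., for uninsurance, a higher rate is worse
    if abs(rel) <= tolerance:
        return "same"
    return "better" if rel > 0 else "worse"

def summarize(measures, tolerance=0.10):
    """Proportion of measures in each category for one contrast."""
    counts = {"worse": 0, "same": 0, "better": 0}
    for m in measures:
        counts[classify(m, tolerance)] += 1
    return {k: v / len(measures) for k, v in counts.items()}

measures = [
    MeasureResult("measure A", 50, 70),
    MeasureResult("measure B", 66, 70),
    MeasureResult("measure C", 80, 70),
    MeasureResult("uninsured all year", 30, 20, higher_is_better=False),
]
print(summarize(measures))  # {'worse': 0.5, 'same': 0.25, 'better': 0.25}
```

Because each measure is reduced to the same three-way label, contrasts from different measure sets can be tallied together, which is the point being made about standardization.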
The other thing that we do is state snapshots. This particular
product is web-based and was developed from the
quality report and disparities report measure set, but it focuses on something
different. Instead of showing national trends, here we look at state
performance. The focus here is comparing one state versus other states. And
traditionally we have looked at quality of care issues. That is pretty
straightforward. But we have gotten a lot of requests for looking at
Now for a couple of years we have looked at racial contrasts and income
contrasts again. This allows us to isolate not just how a state has performed
compared to other states, but how the gap in performance between the low
income and high income populations in a state compares to the national
gap. For many states this has been valuable, and I would encourage
whatever standards we pick to support such analyses, because for some states
their quality of care can actually be pretty good, but they can have large gaps
within the state. This then helps them target efforts towards reducing that
gap rather than necessarily trying to change quality that is already at a high level. I
think it directs them to focus more on the disparities issue than the quality
I will start off by saying we grappled with the issue of occupation and we
don’t know what to do with it. But this is one of the observations we have had.
One of the things that we were asked to track was workforce diversity. And we
noticed that there were “natural” social hierarchies for certain
occupations. On the left here are shown nurses and how the nurse workforce is
distributed in terms of race ethnicity. And on the most leftward one are
registered nurses and on the middle one are licensed practical nurses and
licensed vocational nurses so slightly lower on the occupational hierarchy. And
on the right is the US population. Maybe not rocket science. Whites and Asians
are disproportionately represented among the RNs and Blacks are
disproportionately represented among the LPNs and LVNs.
On the right we have members of the dental workforce. Going from
left to right: dentists, dental hygienists, dental assistants, and then the
general population. Again, there is some natural social hierarchy there. We see
something similar, which is that Whites and Asians are overrepresented among the
dentists and dental hygienists, and Hispanics are overrepresented among the
dental assistants. I think occupation is interesting when there is some kind of
natural social hierarchy that can be established. We are less clear in terms of
what to do when we are looking across all occupations and that kind of
hierarchy of power does not naturally exist.
I mentioned that we don’t do composites in our
report. But I am going to switch hats over to my research hat. We did have a
project that was actually begun at CMS and that we kind of inherited. One
of the components was looking at a composite. This was, very specifically, first
of all a composite SES for Medicare beneficiaries, and second, it was based
on area information, census area information at the block group level. This just shows
the different components that went into this particular composite and the
principal components loadings. And I am told that those numbers indicate that
these components were all important and valuable as part of an SES composite.
We have done stuff with this on the research side and it seems like it
functions very well.
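As a rough illustration of an area-based composite built from principal components, the sketch below standardizes block-group indicators and takes the first principal component as the composite score. The indicator names and synthetic data are hypothetical; the actual components and loadings of the CMS/AHRQ composite are not reproduced here. The returned weights are eigenvector coefficients; some software instead reports loadings rescaled by the square root of the eigenvalue.

```python
import numpy as np

def first_principal_component(X):
    """Return (weights, scores, share_explained) for the first principal
    component of X (rows = block groups, columns = indicators)."""
    X = np.asarray(X, dtype=float)
    # Standardize each indicator so the PCA works on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    weights = eigvecs[:, -1]                 # eigenvector of the largest eigenvalue
    # An eigenvector's sign is arbitrary; orient it so the first indicator
    # (assumed here to rise with SES) gets a positive weight.
    if weights[0] < 0:
        weights = -weights
    scores = Z @ weights                     # composite SES score per block group
    share_explained = eigvals[-1] / eigvals.sum()
    return weights, scores, share_explained

# Synthetic, hypothetical block-group data driven by one latent SES factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
X = np.column_stack([
    50 + 10 * latent + rng.normal(size=200),  # hypothetical median income ($1000s)
    30 + 8 * latent + rng.normal(size=200),   # hypothetical % with college degree
    15 - 5 * latent + rng.normal(size=200),   # hypothetical % below poverty
])
w, s, share = first_principal_component(X)
```

With indicators that all track one underlying factor, the first component carries most of the variance, and the poverty indicator receives a negative weight.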
But again this is nice to have for research purposes. It is not something
that we actually use in our reports, but I thought I would mention it since we
were asked to talk about it.
This is my summary. My overall summary of our experiences working with SES
for the purposes of our report is, first, that most federal surveys do collect
income, education, and insurance in ways that can be standardized. Are they
exactly the same? No. They are not exactly the same. But I think the differences are
issues on the margins. For instance, we have encountered issues
for education. What do you do with GEDs — highest graduation? How do you deal
The other issue where there is some variation is, what is the minimum age at
which you start determining educational attainment? Is it 18? Is it 24 or
26? Exactly what age is the cut point? There is some variation related to that.
For income, some of the things that we have encountered are: does it
incorporate different kinds of government transfers and benefits? Is it just
the cash income that they are getting? But these, I think, at least at my level,
are things that are on the margin, and in general a lot of these issues
The flip side, of course, is that we are not collecting information on a lot of
other things that might be important, like wealth and childhood SES. These are
probably really valuable, but they are not generally available. But against
this we have to balance the fact that we see declining response rates in our
surveys. People say they are too long. Income is already something that people
don’t like to respond to, and trying to get wealth or childhood SES, these other
wonderful things, might be even harder to do. We would love to have it. Is it
gettable? I don’t know.
Another observation that we have made, at least for health care purposes, is that
when you talk about socioeconomic factors, we are really potentially
dividing them into two different kinds of buckets. One is factors that relate
to financial assets and overcoming financial barriers, things like income or
wealth or insurance. And there is another bucket, which might be separate and
discrete, and these relate to social assets. I think a lot of times we observe
that the two have different kinds of effects on disparities. Education might be
important to help people understand disease better or to comply with treatment
recommendations, and there are other issues that might be related to that, like
health literacy or spouse’s education. I would probably encourage people, if you
are going to composite, not to necessarily lump those two together,
because I think they might be capturing different concepts.
The third is occupation. I don’t know what to do with it. It is clearly
valuable, I think, when we are looking at stratified hierarchies, but overall
I am not sure what to do with it. Those are my comments.
DR. MAYS: Thank you. Questions?
MS. QUEEN: Susan Queen. For the disparities across states, what data are you
using to get data for all states?
DR. MOY: We use a couple of different things for that. We use hospital
claims data. That is one of the contrasts that we look at across states. And
then we use BRFSS to look at contrasts across states. We wish there were more
things because this really is an area of huge interest by state policy makers.
There is a lot of activity going on right now to focus on the issue of
disparities at states, and they are looking for data and finding that they
either have limited data or they have data in their own state, but they don’t
have any benchmarks against which to compare. I think this is an area of growth,
let’s say, in the future.
DR. BREEN: Nancy Breen, NCI. Ernie, I don’t know if you think about this
because your mandate is just to get the data out. Have you thought about
whether you think that the data that you collect and report are adequate for
purposes of policy and programmatic interventions or is more needed? The books
are big and they are long. The reader would need to do a lot of integration in
order to come up with some recommendations. I am wondering if you feel they are
DR. MOY: I think the answer obviously is no. The amount of information
that you need to generate goes well beyond what is in the reports. But I will
follow it with our standard response, which I also think is true: we
think of ourselves more as a primer and a motivator. Really, for people to
develop initiatives to improve quality or to reduce disparities in their place,
they need to actually look at their place and get data about where
they are, what specific quality issues they are encountering,
and what very specific disparities they have, because it varies so much.
If nothing else, the state snapshot focusing on quality and disparities shows
us how much variation there is at the state level, and if you drill down to the
local level, that variation is even larger.
I think we as the federal government can’t give them the information that
they need to develop a policy to address a quality or disparities issue. We can
only get them motivated. At some point they have to take responsibility for
going and getting that data, looking at it, and developing the solution because
it is not a one size fits all kind of proposition.
DR. BREEN: What you just said actually is something that the committee has
been struggling with and that is that we are collecting data and reporting data
at the federal level and that is our mandate. But you are suggesting that the
policy prescriptions and the actions need to take place at the local level so
that support for that kind of data collection and possibly even those policies
might be needed even at the federal level.
DR. MOY: Yes, absolutely. And I do think one of the most important federal
things that we can do is develop standards because the states and the
localities don’t want to have to figure out what the best way is to collect
something. They don’t have the resources to do something like that. And in
general they might grumble if we make them change something, but overall they
generally like to be told this is a good way of collecting it. You don’t have
to do it this way of course, but this is a good way and then if you do it this
way, you then have comparisons and benchmarks and some understanding of what
you are collecting.
DR. MAYS: Let me ask you a question. I like the way that you put
this into different buckets. And one of the buckets I would like, if possible, to hear a
little bit about is the approaches that you all took in terms of defining
social assets, because that is really getting at it. It is one thing to say
income, education, et cetera. Those other variables stand by themselves. But
when we want it to be socioeconomic status, that is exactly what we are
trying to get at. Can you talk just a little bit more about the kinds of
discussions you all had about the social assets or social prestige or social
DR. MOY: I can’t say that we have a lot to work with. Pretty much I think we
have had just education. What we have observed is that for some quality
measures we track, we will see strong income gradients, and for some we will see
strong education gradients, or both, but they just don’t necessarily track
together, and that is suggesting to us that they are slightly different
All the work that has been going on in the outside world about what it takes
in essence to get high quality care suggests that the social assets are really
important. We have a whole initiative looking at health literacy, trying to
promote health literacy, documenting how infrequently providers think of health
literacy when they are telling their patients what to do. It is confounded of
course even more by language and translator availability issues and stuff like
that. I think we have simply talked about this as an issue that is probably
discrete from the financial aspects and financial barriers of health care. This
is a very early conversation I think in this particular area. We don’t know how
best to measure this. We don’t know what this particular dimension would look
like going beyond education. But we think it is certainly worthy of
DR. MAYS: Very helpful. Let me turn to — because we have some members on
the phone, and see if there is anyone that wants to ask a question before I
turn to the audience. Anyone on the phone have a question?
DR. SUAREZ: This is Walter Suarez. Can you hear me?
DR. MAYS: Walter is one of our committee members. Yes, we can and welcome.
DR. SUAREZ: Hi. Good morning everyone. First of all, I am Walter Suarez with
Kaiser Permanente, member of the committee, and no conflicts. I wanted to ask
about your perspectives on how your methods and categorization of socioeconomic
status indicators fare, or compare, with some of the
international standards that are being used to capture socioeconomic status in
population surveys, perhaps in other countries, or maybe standards that have
been made available by WHO or by other international bodies. Do you have any
perspectives on that or any sense of how different or similar these
categorizations are when you look at international data?
DR. MOY: I think our knowledge of what is being collected internationally at
least my knowledge is somewhat limited. We do work with the OECD. They do some
disparities contrasts actually, but I can’t say that I have done explicit
categorization or comparison of our approach with theirs. I think some of the
things that are commonly used in other countries we just can’t — would have to
be translated in a large way for our country. I know in England they have
looked at class a lot because they have a whole classification system. I just
don’t know. I kind of feel that is probably not applicable to our country.
MR. BURKE: It is Jack Burke calling. I was taken with one of your remarks in
your summary slide about the general unavailability of childhood SES. Are you
familiar with any other innovations here or other countries that would help us
shine light on that one?
DR. MOY: Which one?
MR. BURKE: Childhood SES.
DR. MOY: I think, and I will look to the pediatric people to help me, as I
understand it, doesn’t the children’s study, the big NIH study, collect some
information tracking forward about childhood income and stuff like that?
MR. SCANLON: The National Children’s Study. That is a longitudinal study
really from birth almost to adulthood. I am not sure what they are asking,
Ernie, in terms of — I think to some extent children’s SES is a function of
the family’s SES. I think we could probably — we don’t analyze it separately,
but we have. I could see where some of the measures of parental education,
parental income, and other factors would help explain. I think that is where
that particular measurement comes from. I think Ernie was saying we don’t
usually publish that way.
DR. MOY: I think there is research done out there that suggests that adult
patterns of health care are determined by their childhood circumstances. But I
think this is at the research level, not at the routine data collection level.
MR. BURKE: Insight about childhood SES from parental SES.
DR. SCANLON: That is one way to do it. But there are other surveys.
Periodically we conduct the health interviews. All of our surveys include
children pretty much. The HANES program, the Health Interview Survey and so on.
They all include — they are household based so they include all sorts of
measures about the family and the household. I think what we haven’t done
though, Jack, is we probably haven’t — unless there was a focus on the sample
child and I think we do have that in the Health Interview Survey. I am not sure
we ascribed — there wasn’t a separate SES for the child. It was derived from
the family I am guessing.
DR. SONDIK: Well, I think that is true. I was just going to bring up that
there was a report produced by a consortium of federal agencies on the
well-being of children, key national indicators of well-being. That has some
key measures that it publishes each time, each year on children in poverty,
education. But those are overall measures and they are not linked in some way
to other measures that you might be getting. Although sometimes some of the
other measures in that summary report are broken out by of course education and
family income. But I don’t know of anything else that goes beyond that in terms
of children’s SES unless we go afield from SES and get into the other aspects
of social status.
DR. MAYS: The only thing I know that is, again, coming up is that this was
discussed. I don’t know what Ashley got in, but this was discussed for the
National Children’s Study. I think this may be something that is to come and
that they are working on.
PARTICIPANT: We can look at the measures there.
DR. MAYS: I think we have time for one more question in case anyone over in
our audience over here wants to ask a question.
DR. BARON: Sherry Baron from NIOSH, the National Institute for
Occupational Safety and Health. I was interested in your comments about
occupation. I am wondering if you ever looked at the issue of whether people
have employer-provided health benefits or employer-provided sick leave, which
is a standard question included in the National Health Interview Survey.
DR. MOY: The quick answer is no. We haven’t. We have taken that whole
private insurance group as a whole, but that is an interesting notion.
MR. SCANLON: We might mention that when the focus of the study is on work
then, for example, all of the health workforce studies as Ernie was indicating
we clearly go into a lot of detail. I think we have 15 or more health
professions and we try to collect data and trend data, everything from
physician to technicians and so on.
When the focus is on health professions per se, and trends and supply and
demand and distribution, then, at least in the health world, you
focus on the specific occupation. I think there we could clearly use more data,
but there I think we sort of know how to do that. It is in a general household
sort of thing that it is a little more difficult. You all have had the same
experience. I think there is a set of nine from the Census. Then you can
provide more detail. But it is a little tricky in terms of what you can analyze
exactly on occupation.
One more thing, multivariate. This is tricky. I think my group was actually
urging Ernie to do — AHRQ to do the multivariate. When you take one or two
variables at a time, so you do age, sex, and race ethnicity. That is what you
have. It is hard to get another variable in there. Or you take any other
variable, health status, quality of care, health care use, insurance coverage,
whatever it is. If you do that kind of a table that is the level you deal with.
When you try to break it down into finer distinctions, you run out of cell
The idea was whether there is some way to include, for example, income or
educational level as well, at least as one more factor, potentially an explanatory
factor. It becomes very difficult because of sample size. There are
multivariate techniques. But what we found is that the techniques
sometimes don’t provide the transparency that people want to see, and they don’t
understand what exactly we did.
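The sample-size problem with cross-classifying one more variable is simple arithmetic: the expected count per cell is the sample size divided by the product of the category counts. The sample size and category counts below are hypothetical, and real cells are uneven, so the smallest groups fall well below the average.

```python
from math import prod

def average_cell_size(sample_size, category_counts):
    """Expected respondents per cell if the sample were spread evenly
    across every combination of the listed categories."""
    return sample_size / prod(category_counts)

# Hypothetical survey of 10,000 respondents.
base = [2, 5, 5]          # sex x 5 age groups x 5 race/ethnicity groups
with_income = base + [4]  # add 4 income groups
print(average_cell_size(10_000, base))         # 200.0
print(average_cell_size(10_000, with_income))  # 50.0
```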
I have a wonderful example from one of our programs. This was to
determine medical supply in local areas. Larry, you will remember this. This
was designating local medically underserved areas. For a long time there was
basically the basic manpower kinds of statistics used in designating those
And then it was thought that a more sophisticated, multivariate approach
would actually be much more sensitive. They actually did a factor analysis and
revised their regulation based on it. And when they published it, no one knew
what they were doing. They literally had to redo the whole thing because no one
understood. People wanted to see what exactly were you counting, number of
doctors per thousand, and number of nurses, all of this per thousand. This idea
of transparency, again, not that you will never use indices and multivariate
techniques, but it has to be done in a way where it is pretty clear how you got
there. And if you are using it to withhold money or spend money or declare one
state or one area in a preferred position to another, you better be very
careful. These things we learn the hard way.
DR. MAYS: Thank you very much, Dr. Moy. That gave us a lot of insight in
terms of what we face.
DR. GREEN: We will take a break here. Be right back at 10:15.
Agenda Item: Defining Socioeconomic Status
DR. MAYS: Welcome back everyone. We are going to get started again. We are
ready to do our presentation in which we are going to talk about
socioeconomic status in terms of concepts and measures. I have my colleague here
from UC Berkeley, Michael Hout, and Michael is also with one of the population
centers that is funded by NICHD and is a sociologist by training. I will turn
it over to Professor Hout.
DR. HOUT: Thank you very much, Vickie. Thanks very much. I hope to address
some of the issues that have already come up as well as some things that
perhaps you haven’t already thought about. Along the lines of some of the
issues that Dr. Moy was just talking about I am also thinking in terms of
occupation and socioeconomic status in terms of barriers as well as resources.
On this slide I address it in the language of advantages, ability, and
privileges. But think of the privileges along the same lines as the barriers he
was talking about.
People who are in a privileged position are in part advantaged and in part
have abilities that we are trying to capture in their educational attainment or
the kinds of work they do so that we think in these terms of what got them
ahead in one sphere giving them advantages elsewhere, as in the famous links
to the labor market or the marriage market: doing well in school improves your
chances in both of those. But it also might be the case that educated people
have better health outcomes as it was just referenced because they understand
what is going on. They understand instructions. They follow instructions and
they get better outcomes for that reason.
Also, the money might help buy things on the market that they can’t get from
more common sources. But I am going to be introducing occupation into this mix
as well because it comes along with a number of advantages or also
disadvantages that might be linked to it.
We also have to start thinking in terms of barriers that occupation and/or
education and income might also represent. I will have more to say on that in a
When we talk about occupation, we are thinking, in the research
literature as well as in commonplace everyday thought, in terms of the working
conditions that come with the job: some jobs are more dangerous than
others, some are cleaner than others, and some are just easier and more flexible.
And these things might also be related to health outcomes. What we need
are the kinds of details about occupation that can capture some of that nuance.
The typical approach is, first of all, that the
data are collected at a fine degree of resolution. You get something like the
detailed occupations that the Census Bureau uses to code the data from the
Current Population Survey or the American Community Survey. And then you
aggregate these occupations according to one dimension or another. You can
aggregate them on pay: these are well compensated jobs. Or you can aggregate
them on education: these are jobs that require high credentials, and these are
jobs that don’t.
But you can also aggregate them in other ways, scoring them according to whether
these jobs come with benefits and these don’t, these jobs come with danger and
these don’t, these jobs have a high disability rate and these don’t. It is
important, I think, to collect the data at a fine level of detail and subsequently
score the detailed occupations on the dimension of interest to
the particular project or the particular policy analysis that is going on.
But the drawback, of course, is that somebody actually has to do that work.
And I want to tell you about a project, an NSF-funded project that some
collaborators and I are just now beginning. In addition to being at Berkeley,
I am also one of the PIs on the General Social Survey, which is a semi-annual,
nearly annual survey of American adults, that has been going on since 1972. We
have had the various census occupational schemes over that time, phase in and
phase out. But what we just got money from NSF to do is redo occupation in the
General Social Survey, go back to the original questionnaires, the verbatim
answers that people gave us about their occupations and code them according to
the new 2010 scheme that the Census Bureau is now using, all the way back to
’72, as well as going forward.
And then in conjunction with that we are going to score those occupations on
a number of measures that we have within the General Social Survey, but also
make use of the American Community Survey and the Current Population Survey to
attach some of the scores I have been referencing to each of the detailed,
three-digit occupational codes from the US Census Bureau. Because it is an
NSF-funded project, we will have a website where we give those data away.
Then, if you have a three-digit code on occupation that is
based on the 2010 protocols, you can attach any of a number of scores;
we anticipate at least 40 (that is what was in the proposal) ways of ranking and
characterizing these detailed occupations. That is going to be a resource that
is available to the research community, but also to the policy community, and I
guess people in other countries as well.
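Attaching occupation-level scores to survey records amounts to a lookup keyed on the occupation code. The sketch below assumes a hypothetical score table: the codes, score names, and values are made up for illustration and are not the project’s actual scores or the Census Bureau’s code list.

```python
# Hypothetical score table: occupation code -> named occupation-level scores.
OCCUPATION_SCORES = {
    "101": {"prestige": 62.0, "sei": 70.5, "benefits_rate": 0.85},
    "202": {"prestige": 35.0, "sei": 28.0, "benefits_rate": 0.40},
}

def attach_scores(records, score_table, score_name):
    """Return copies of the survey records with one occupation score added.
    Records whose code is missing from the table get None."""
    out = []
    for rec in records:
        scores = score_table.get(rec["occ_code"])
        value = scores[score_name] if scores else None
        out.append({**rec, score_name: value})  # copy; do not mutate input
    return out

respondents = [
    {"id": 1, "occ_code": "101"},
    {"id": 2, "occ_code": "999"},  # code not in the table
]
scored = attach_scores(respondents, OCCUPATION_SCORES, "prestige")
```

Since any score is looked up the same way, a survey that records a detailed occupation code can pick whichever dimension suits the analysis.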
But the main dimension and the one that has drawn the most attention and I
think would be most relevant to this discussion of socioeconomic status is a
line of work that goes back to the late 1940s and indeed there were precursors
of it in the 1920s even. And that is the notion that some jobs are just good
and some jobs just aren’t.
It is just one of the most robust findings in social science and sociology
that you can in fact ask people in various ways to rank occupations as jobs
that are desirable or undesirable, good or bad, excellent, good, fair, or poor.
The exact question actually doesn’t matter a whole lot. People generate rank
orderings of occupations that have a very high degree of correlation one with
The rank orderings reproduce themselves with an amazing degree of robustness
and regularity. Studies of occupational intelligence (how intelligent does a
person have to be in order to do this job) from the 1920s correlate almost .9
with a study done in the 1960s asking people whether this is a good or a bad job. And
the proportion of people who said it was a good job correlated very highly with
the proportion that a group of experts said you have to be intelligent to do
Don Treiman, a recently retired UCLA sociologist, compiled ratings from over
50 countries in the late 1970s, correlated the rankings over time and
across countries, and found that on average there was a correlation
of .8 between each specific rank ordering and the weighted average of them. And
within the United States the correlations over time from the 1940s, ’50s,
’60s, up through the 1990s have all been over .95.
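Agreement between two rank orderings of occupations can be summarized with a rank correlation. The sketch below computes Spearman’s rank correlation (the Pearson correlation of the ranks, with tied values given their average rank). The ratings are hypothetical, and the studies cited may have used other statistics, such as Pearson correlations of prestige scores.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for the tied block i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical prestige ratings of the same five occupations in two surveys.
survey_a = [80, 72, 55, 40, 30]
survey_b = [82, 70, 58, 35, 38]  # only the bottom two swap places
rho = spearman(survey_a, survey_b)  # 0.9
```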
We are in the field right now to gather data on the contemporary US and I
will be stunned if it is any lower. But we are using as the basis for this new
study the 2010 protocol that the Census Bureau has developed. We will have
updated scores on socioeconomic prestige or occupational prestige I should say
for the new 2010 protocol. And that will be the first set of scores we
generate. It will be this very robust rank ordering that is a standard of
socioeconomic analysis in sociology and demography.
Bob Hauser did a study around 1990, where he took also 19th
century data and showed that U.S. 19th century data also correlates
very highly with what was available then in terms of 1980 prestige scores. And
all of this work points to then a very robust rank ordering of American and
indeed any kind of post-industrial economy’s occupations in terms of a scheme
that is widely recognized as these are superior occupations and these are
inferior occupations. These are occupations I would want my kid to do. These
are occupations I would hope my kid didn’t have to do. These are occupations
that require a high degree of occupational intelligence. And these are
occupations that don’t. All of these various criterion measures generate pretty
much the same rank ordering of occupations.
The rank ordering, it turns out, is easily predictable from the occupation’s
education and pay. The occupations that people think are good ones are the ones
that require a college education and pay well. And that is also a very robust
finding, first published in 1961 by Otis Dudley Duncan, and it has been
replicated dozens of times since. In fact, in a lot of criterion studies
that have looked at what occupation predicts, the Duncan score, the
average of occupational credentials and pay, actually predicts these outcomes
better than the public rankings do, so that this robust socioeconomic
dimension to occupational desirability or occupational prestige is actually the
more reliable component of it than the public’s perception of what are good and
And that tends to be mostly a function of a kind of public romanticism about
certain jobs. People think that farmer is a really good job. But it doesn’t
take a lot of credentials. Granted, I wouldn’t want to try being a grape
farmer and making wine without credentials and training, but for the most part
farmers are not as highly educated as, say, the people in this room, nor are they
as highly paid. But people think this is a fantastic job. It tends to get a
very high rating, as does clergy.
Clergy are very well educated, but they are not very well paid. And their
outcomes in health, the education of their children, and so on correlate better
with the average of the education and pay of that occupation than with the
public ranking, the very high public ranking, of that occupation. Those are just
two examples to show you why this socioeconomic ranking of
occupations, the predicted score from a regression on pay and credentials,
actually performs better empirically than the public rankings that are the
first step in constructing these indices.
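The logic of a Duncan-style socioeconomic index can be sketched as a regression: fit prestige ratings on an occupation’s education and income levels, then use the fitted value as the score, which can also be computed for occupations nobody rated. The predictor definitions and numbers below are hypothetical and constructed to fit exactly; this is not Duncan’s original specification or his weights.

```python
import numpy as np

def fit_sei_weights(pct_educated, pct_high_income, prestige):
    """OLS of occupational prestige on occupational education and income.
    Returns [intercept, education weight, income weight]."""
    X = np.column_stack([
        np.ones(len(prestige)),
        np.asarray(pct_educated, dtype=float),
        np.asarray(pct_high_income, dtype=float),
    ])
    coef, *_ = np.linalg.lstsq(X, np.asarray(prestige, dtype=float), rcond=None)
    return coef

def sei_score(coef, pct_educated, pct_high_income):
    """Predicted socioeconomic score; usable for occupations never rated."""
    return (coef[0]
            + coef[1] * np.asarray(pct_educated, dtype=float)
            + coef[2] * np.asarray(pct_high_income, dtype=float))

# Hypothetical rated occupations: % of incumbents with a college degree,
# % with high earnings, and a prestige rating built to fit exactly.
educ = np.array([10, 40, 70, 90, 20])
inc = np.array([5, 30, 60, 80, 50])
prestige = 2 + 0.5 * educ + 0.6 * inc
coef = fit_sei_weights(educ, inc, prestige)
unrated = sei_score(coef, [55], [45])  # occupation with no public rating
```

In real data the fit is not exact; the residuals are where occupations like farmer or clergy, rated higher than their education and pay predict, would show up.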
There is another dimension that I thought I had on this slide, but I don’t,
that I am going to raise with respect to questions that have come up. In the
discussion in the previous session, there was a discussion that we have education,
we have income, and we know what to do with those, and we don’t really know what
to do with occupation. I guess my first recommendation is that one thing you can do is
come up with a criterion; you can score it pretty much any way you want,
rank ordering the occupations according to the index you are most interested
But there is also another way to think about this that I want to bring to
your attention and that is a temporal dimension to socioeconomic standing.
After about age 25, or maybe 30 for the person who is getting a PhD in sociology
or history, there is not a lot of change in educational level. For most of the
respondents to the National Health Interview Survey or whatever, once they are
22 or 23 their education is, to a first approximation at least, fixed.
Meanwhile, with income there is a lot of churning.
Greg Duncan has data from the PSID indicating that if you divide the income
distribution into quintiles in a 5-year window, almost 20 percent of Americans
have changed quintile over that 5-year period. They have been in two or
more quintiles in a 5-year period. Income is something that fluctuates a great
deal. And occupation stands between those two extremes of never changing and churning
over relatively rapidly, as a more robust indication of people’s current
socioeconomic standing. Part of the value we get from it has to do with the
specifics of the occupation and the privileges that some occupations bring
along with them. But part of the leverage we get out of it is just its slower
pace of change. You get a more robust picture of a person’s position in society
than you do from knowing either their education which doesn’t change after
their mid-20s or their income which is more volatile.
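The quintile-churning figure cited from Greg Duncan’s PSID work is a count of people observed in two or more income quintiles within a window. A minimal Python sketch of that calculation follows, assuming a balanced panel with one income per person per year and unweighted cut points recomputed each year (real PSID estimates would use survey weights). The five-person panel is invented for illustration and is not PSID data.

```python
import bisect
import statistics
from typing import Dict, List


def quintile_of(value: float, cutoffs: List[float]) -> int:
    """Map a value to quintile 1-5; a value at or above a cut point moves up."""
    return bisect.bisect_right(cutoffs, value) + 1


def share_changing_quintile(incomes: Dict[str, List[float]]) -> float:
    """Share of people observed in two or more income quintiles.

    `incomes` maps a (hypothetical) person ID to that person's income in each
    year of the window; every person must have the same number of years.
    Quintile cut points are recomputed from the whole sample each year.
    """
    people = list(incomes)
    n_years = len(incomes[people[0]])
    quintiles_seen = {p: set() for p in people}
    for year in range(n_years):
        year_values = [incomes[p][year] for p in people]
        cutoffs = statistics.quantiles(year_values, n=5)  # four cut points
        for p in people:
            quintiles_seen[p].add(quintile_of(incomes[p][year], cutoffs))
    movers = sum(1 for seen in quintiles_seen.values() if len(seen) >= 2)
    return movers / len(people)


# Toy two-year panel: persons A and B swap places, everyone else stays put.
panel = {
    "A": [10, 20],
    "B": [20, 10],
    "C": [30, 30],
    "D": [40, 40],
    "E": [50, 50],
}
print(share_changing_quintile(panel))  # 0.4
```

Recomputing the cut points each year matters: it measures movement relative to the rest of the distribution rather than changes in the dollar amount.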
There was one mention of multivariate analysis in the last session. Common
usage is just to throw one score for occupation into a regression along with
everything else. You get the kinds of results that are publishable in an
academic journal but are very hard to communicate to a lay audience, or
sometimes even to your colleagues in the same department.
For that reason there is some demand to take income, occupation, and
education and combine them into some sort of socioeconomic index. I am a big
fan of this kind of work, which takes the components and tries to summarize
them for inferential work in a form people can remember a little better than
the regression coefficients in a long table that has everything but the kitchen
sink in it.
Unfortunately there isn’t a general academic consensus on this or on what
those weights would be. In a world in which the charge is to generate
standards, I think what could be specified would be the standards for the
measurement of the three components: education, occupation, and income. But the
relevant weights might have to be project specific. You could pick weights that
are optimized with respect to your outcome variable, whether that outcome
variable is insurance coverage or a positive health outcome or the birth weight
of somebody’s baby or whatever. You might choose different weights for the
three components for different studies or for different purposes. But if you
had a standard that says we will always have these three variables and they
will always be measured in this particular way, then at least you can always
start from the right baseline.
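One way to build the project-specific composite described here is to standardize each of the three components and take a weighted average. This is a sketch under assumptions not stated in the testimony: z-score standardization, components already coded as numbers (years of schooling, an occupational score, dollars of income), and illustrative weights that a given study would replace with ones chosen for its own outcome. All data and weights below are hypothetical.

```python
import statistics
from typing import Dict, List


def zscores(values: List[float]) -> List[float]:
    """Standardize one component to mean 0 and population SD 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("component has no variation")
    return [(v - mean) / sd for v in values]


def composite_ses(components: Dict[str, List[float]],
                  weights: Dict[str, float]) -> List[float]:
    """Weighted average of standardized components, one score per person."""
    standardized = {name: zscores(vals) for name, vals in components.items()}
    total_weight = sum(weights[name] for name in standardized)
    n = len(next(iter(standardized.values())))
    return [
        sum(weights[name] * standardized[name][i] for name in standardized)
        / total_weight
        for i in range(n)
    ]


components = {
    "education": [12, 12, 16, 18],           # years of schooling
    "occupation": [30.0, 45.0, 60.0, 75.0],  # hypothetical occupational scores
    "income": [25_000, 40_000, 60_000, 90_000],
}
# Hypothetical project-specific weights; equal weights are only a starting point.
weights = {"education": 1.0, "occupation": 1.0, "income": 1.0}
scores = composite_ses(components, weights)
```

Setting a weight to zero drops that component, which makes it easy to compare a two-component index against the three-component one on the same outcome while the measurement of each component stays fixed.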
I think I made this slide before I realized there would be other people
talking about measurement, but I am going to say a few things about it anyway.
I am not going to say anything about education; I think that is going to be
addressed by others. But as to occupation, how hard is it to measure? In our
experience at the General Social Survey, anyway, you ask people: What is it you
do for a living? You told me you were employed, that one of your main
activities last week was working at a job. What is your job title at that job,
and what are your principal duties there? Most people can come up with a rather
coherent answer that gives the occupational coders enough detail to assign a
three-digit code.
It works better in the General Social Survey context because that is an
interviewer-driven survey. The interviewer knows that “I work in a shop” is not
sufficient. But if it is a self-administered question and somebody writes down
“I work in a shop,” you are going to be out of luck. You do have to be
sensitive to whether it is self-administered or interviewer-driven. But when it
is interviewer-driven, people can, with prompts, generate enough detail to code
at a rather fine level.
And then the scoring of the occupations is actually done in the back room of
the research office. From the three-digit code you generate all these other
dimensional codes, such as percent covered by health insurance, percent who have
a college degree, percent who are paid below minimum wage, et cetera, as a way
of attaching scores to the three-digit code. So the respondent’s burden is not
to tell you whether they are in a well-paid or poorly paid job, or whether it is
the kind of job that requires credentials. You don’t put that burden on the
respondents. After the person describes the job to the interviewer, some
post-interview coding generates a three-digit code for that occupation, and the
score is attached to that code.
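The back-room step of attaching scores to a three-digit occupation code can be sketched as a group-by on a reference sample followed by a lookup. The record layout, the two scores, and the codes “101” and “202” are hypothetical placeholders, not actual Census occupation codes, and a production version would also apply survey weights to the reference sample.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# One reference-sample record: (occupation_code, has_health_insurance, has_college_degree).
Record = Tuple[str, bool, bool]


def occupation_profiles(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Aggregate person-level records into percentage scores per occupation code."""
    tallies: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0])  # [n, insured, college]
    for code, insured, college in records:
        tally = tallies[code]
        tally[0] += 1
        tally[1] += int(insured)
        tally[2] += int(college)
    return {
        code: {
            "pct_insured": 100.0 * insured / n,
            "pct_college": 100.0 * college / n,
        }
        for code, (n, insured, college) in tallies.items()
    }


def attach_scores(respondent_codes: Iterable[str],
                  profiles: Dict[str, Dict[str, float]]) -> List[Optional[Dict[str, float]]]:
    """Look up each respondent's coded occupation; None if the code has no profile."""
    return [profiles.get(code) for code in respondent_codes]


reference = [("101", True, True), ("101", True, False),
             ("202", False, False), ("202", True, False)]
profiles = occupation_profiles(reference)
print(attach_scores(["202", "999"], profiles))
# [{'pct_insured': 50.0, 'pct_college': 0.0}, None]
```

The respondent supplies only the job description; every score is computed once per code and reused, which is why the added burden falls on data processing rather than on the interview.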
We actually have it programmed a little into the laptops that the
interviewers are using, so that if the verbatim generates a specific code, it is
already taken care of. They sometimes get a little prompt that more detail is
required: Is that an LPN or a registered nurse, or something like that? For the
most part this is a relatively low burden for the respondents and a somewhat
heavier burden than most items for the data processing office.
There is one other dimension along these lines that I also wanted to raise
and somehow forgot to raise on the previous slide, and that has to do with
wealth. Think about the temporal dimension of socioeconomic status that I was
referring to earlier: education doesn’t change, income changes a lot, and
occupation is a very good midlife measure. But once people have retired, first
of all, they can usually remember what they did, and they will tell you about
the last job they had or the job they had for most of their life.
But occupation becomes less relevant as they age, and so does income. For
people who are living off savings, their income is less important than the nest
egg they are chipping away at. For older respondents I think wealth measures
have to be thrown into the mix. It is not a topic of conversation for today,
and it is not my area of expertise. It is really tricky because most people’s
assets are actually pieces of paper that don’t have value in themselves but are
claims on value when they cash them in. I think I know what my house is worth. I
also think it is not worth what I thought it was worth 5 years ago. There is
something tricky about wealth measurement. But a retired person probably has a
much more focused notion of this. They might even have a reverse mortgage on
the house. Wealth measures would supplement the other three measures once those
measures are less relevant to where a person’s standard of living is coming
from. That is a side issue that I am tossing in here in the middle of a
different discussion, but I think it is relevant and really important.
The other end of the life cycle was mentioned in the previous session as
well. What about kids? Eight-year-olds don’t have real jobs. We ascribe
socioeconomic status to them on the basis of the socioeconomic status of their
parents. But contemporary American social life creates complications in this
enterprise. Some kids have two co-resident parents and some don’t. How do you
combine the educations of the co-resident and the absent parent? And a lot of
kids don’t actually know the education of the non-co-resident parent.
And if you are basing it on a household survey that enumerates only the
members of the household, the absent parent, typically the father, is not in
the household. How do you even access that information? There are a lot of
challenges around collecting the full array of relevant data for kids who
aren’t resident with all their parents.
And often there are other adults present in that household who aren’t their
parents. It might be a step-parent, in which case we can just forget the “step”
and take that person’s occupation. They could also be other people, a
grandmother or grandfather or an aunt or uncle, who are contributing to the
well-being of that kid.
The interaction between socioeconomic status and family structure
complicates the assessment of kids’ socioeconomic status. A tractable solution
isn’t self-evident, nor is there a great consensus in the literature right now
on how to do it, but there are some promising approaches, particularly with
advances in the handling of missing data and imputation methods: impute the
values that would be appropriate for this absent parent, et cetera. There are
procedures on the table, but they are relatively untested at this point.
I can take more questions on that, but the list of questions I got also
raised a concern with neighborhoods and larger contexts beyond the immediate
household. How do we assess the socioeconomic standing of a person in their
neighborhood context? I want to address that a little.
Neighborhood effects were once highly disputed. I think the debate is now
settled on the side of yes, the neighborhood context is important. The
breakthrough that led to the new consensus was the introduction of exposure
measures. Some people are passing through a troubled neighborhood or an
affluent neighborhood and don’t show much effect of that tangential experience.
But the long-term residents of either of those neighborhoods do show those
effects. Once the research literature had access to people’s residential
trajectories, instead of using their current residence as a proxy for wherever
they have always lived, we began to see the power of neighborhoods that
everybody suspected was there but nobody could find before. Basically the water
was muddied by the people passing through.
That puts a big burden on data collection that aspires to take
neighborhoods seriously. If you are going to take neighborhoods seriously, you
can’t just pick the address off the questionnaire. At a minimum you have to ask
a couple of questions, How long have you lived at this address? and What was
your previous address?, as a way of getting an exposure measure for that family.
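With only the two questions mentioned here (years at the current address and the previous address), one simple exposure measure is a duration-weighted average of a neighborhood characteristic over a fixed look-back window. The sketch assumes a 10-year window, a neighborhood poverty rate as the characteristic, and that the respondent lived at the previous address for the rest of the window; all three are illustrative assumptions, not part of the testimony.

```python
def neighborhood_exposure(years_at_current: float,
                          current_rate: float,
                          previous_rate: float,
                          window: float = 10.0) -> float:
    """Duration-weighted neighborhood exposure over a fixed look-back window.

    Hypothetical simplification: any part of the window not spent at the
    current address is attributed to the previous address.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    current_share = min(max(years_at_current, 0.0), window) / window
    return current_share * current_rate + (1.0 - current_share) * previous_rate


# Two years in a 30% poverty neighborhood after eight years in a 10% one:
print(neighborhood_exposure(2, 30.0, 10.0))   # about 14.0
# A long-term resident is fully exposed to the current neighborhood:
print(neighborhood_exposure(15, 30.0, 10.0))  # 30.0
```

The recent arrival scores close to the previous neighborhood, which is the distinction between people passing through and long-term residents that the exposure approach is meant to capture.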
So it is now accepted that neighborhood is important. But most of the
studies that established this literally used dummy variables for neighborhoods:
these kids are in the same neighborhood, these are in a different neighborhood,
and they are getting different outcomes.
Which dimensions of the neighborhood are relevant is still highly in
dispute. I have clustered them here in an arbitrary way that would probably
start a fight at a Population Association of America (PAA) session on network
effects. I am going to hang this out here for you folks with that caveat on it.
One set of things has to do with the kinds of things we have been talking
about: poverty and educational performance and so on. But I have also thrown
crime in there because it often packages together with these other variables.
But there are other aspects of neighborhoods that might be correlated with
income, poverty, education, or what have you, but are not perfectly correlated
with them. Air quality and proximity to hazards are the main things I am
thinking of in this regard. But there is also access to positive things, like
hospitals and food, that matters. These things also contribute to the
significance of neighborhood differences, but they are often unmeasured and
certainly are not measured in any kind of standard way. This entire line of
conversation may be big enough to be beyond the scope of the conversation you
are trying to have. Socioeconomic status is tricky enough without getting into
neighborhood effects. But you raised neighborhood effects in the list of
questions that I got, so that is what I have to say about them.
I will raise one more issue. The more specific you are about the features of a
particular neighborhood the easier it is for somebody to figure out which
neighborhood that is. On anything that is geo-relevant there are issues of
disclosure that have to be taken into account. The data have to somehow be
anonymized. And the usual way of anonymizing it is either to suppress some
detail about the person or some detail about the place so that you can’t tie
person and place together.
And then for the research community, or for the analysis done for a
specific project, you have to have in place a way of getting access to the
entire data array. NORC now has its data enclave. There have long been
arrangements between individual researchers and the Census Bureau. And many of
our campuses now have restricted-access data centers that allow us to get at
some of the geographical detail that the Census Bureau suppresses from its
public-use datasets. Some arrangement of that sort would have to be devised here
at NCHS as well if the neighborhood data are to be used by researchers at the
level of activity that would justify all the effort of collecting them. You have
to have some idea in mind of how to make the data available once you have
collected them, recognizing that if you know a person’s age, occupation, and the
housing values in their neighborhood, or one or two more things about that
neighborhood, you could probably say who that is. And we promised our
respondents that we won’t do that.
The final issue I want to address has to do with your question about social
class. How do we assess social class in the US context or internationally? The
answer, from what the English are doing right now anyway, is that they call it
social class, but basically they are categorizing occupations. They take a
detailed occupational classification scheme much like the one our Census Bureau
has and simply recode it into seven categories. Sometimes they report only five
of those categories. But it is basically a recode operation that takes a lot of
detail and makes some really broad categories. We could do something along those
lines.
Americans are generally less comfortable with that social class language.
You could accomplish almost the same thing with socioeconomic quintiles: assign
scores using one or another of the socioeconomic indices, form five quintiles
of socioeconomic status, and accomplish analytically pretty much the same thing
that they are accomplishing with social class, without the baggage of the
language that makes a lot of Americans nervous.
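Collapsing a continuous socioeconomic score into the five quintile groups suggested here can be sketched with the Python standard library. This assumes unweighted sample quintiles; published estimates from a federal survey would normally use weighted cut points, and the scores below are made up.

```python
import bisect
import statistics
from typing import List


def ses_quintiles(scores: List[float]) -> List[int]:
    """Assign each score to an SES quintile (1 = lowest, 5 = highest)."""
    cutoffs = statistics.quantiles(scores, n=5)  # four sample cut points
    # bisect_right counts the cut points at or below each score.
    return [bisect.bisect_right(cutoffs, s) + 1 for s in scores]


groups = ses_quintiles([float(x) for x in range(1, 11)])
print(groups)  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

The input could be any of the socioeconomic indices, including a weighted composite of education, occupation, and income; the quintile step itself does not depend on how the score was built.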
The cutting edge in class research in sociology, by the way, is what are
called micro-classes: What is it that makes engineers different from other
academics in that kind of work? We are making super-fine distinctions in the
research community, but I think these kinds of broad, useful classifications can
in fact be brought into publications, perhaps skirting the class issue by just
calling them socioeconomic status quintiles or something like that. I am sorry,
that was a bit of an ad lib. I hope it came across clearly. I think at this
point we should take questions.
DR. MAYS: Let’s open it up for questions.
DR. GREEN: Anyone on the phone have a question?
DR. MAYS: We have some questions. We usually do the members and then we will
start over there. Dr. Sondik. Dr. Green.
DR. SONDIK: On one of your slides you had a diagram showing education,
occupation, and income as inputs to SES, and then race and gender. You might
have said this and I missed it, but when I look at it, it raises a question for
me: if I think about a set of variables of interest, what does each one of these
add to the explanatory power?
In other words, suppose I am interested in health at the level of the risk
factors people have, or their behavior, or their ability to change, and so
forth. You have five variables here, but let’s say you had three: education,
occupation, and income. Which two do I need?
DR. HOUT: You would do better with all three, or with the weighted average of
the three, than you would with any one or two individually. One way to think
about it is to consider two people who graduated from high school and didn’t go
on to any post-secondary school. One of them is having better health outcomes
than the other. What you don’t know about them is that one of these high school
graduates is restocking shelves at Walmart and the other is working in an auto
factory and has a lot of job security and an insurance plan. You can’t see that
from their education. You could possibly guess it from the difference in their
income. But you would do a lot better on that dimension if you can tell the
difference between the skilled blue-collar worker and the less-skilled service
worker. That is what occupation brings to characterizing that person, and those
distinctions add predictive power to those kinds of models.
The weighted average of income, education, and occupation does just about
as well as putting the three of them into the analysis at once. For a research
journal the three coefficients are really great to have. Editors want them.
Readers read them. But for the kinds of publications that NCHS is going to
make, you are probably going to want to combine the three and then construct
quintiles or some manageable number of groups from those data. But you will do
better with three than with two.
DR. MAYS: Dr. Breen then Dr. Madans.
DR. BREEN: You mentioned that you were working with colleagues at NORC on
this rank ordering of occupations with the 2010 Census. One question I had was
when do you anticipate completing that because that sounds like that is going
to be important for us going forward in terms of using occupations?
DR. HOUT: We would hope to have something available next year at this time.
We are in the field right now collecting the rankings. A representative sample
of 3,000 American adults is busily sifting and sorting occupations as we speak.
Individuals only have to do 24; they don’t have to do all 580 or so
occupational titles. Nobody is doing all of them. Everybody is doing about 24,
and then we get the full rank ordering because there is overlap. That is what
is going on now. The data will come in-house sometime in June, and then we will
start the cleaning and analysis, and basically a year from now we should have it
ready to distribute.
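The testimony says each respondent sorts only about 24 occupations and the overlap between respondents yields a full rank ordering, but it does not say how the partial sorts are combined. The sketch below uses one simple, hypothetical approach: score each occupation by its relative position within each sort it appeared in (1.0 at the top, 0.0 at the bottom) and average across sorts. Model-based methods such as Bradley-Terry or Thurstone scaling are more principled alternatives. Occupation names and sorts are invented.

```python
from collections import defaultdict
from typing import Dict, List


def aggregate_partial_rankings(rankings: List[List[str]]) -> List[str]:
    """Combine overlapping partial rankings (each ordered highest first)."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        k = len(ranking)
        if k < 2:
            continue  # a single item carries no ordering information
        for position, occupation in enumerate(ranking):
            totals[occupation] += 1.0 - position / (k - 1)  # top = 1.0, bottom = 0.0
            counts[occupation] += 1
    mean_score = {occ: totals[occ] / counts[occ] for occ in totals}
    return sorted(mean_score, key=lambda occ: mean_score[occ], reverse=True)


sorts = [
    ["physician", "teacher", "cashier"],  # respondent 1
    ["teacher", "cashier", "janitor"],    # respondent 2
    ["physician", "janitor"],             # respondent 3
]
print(aggregate_partial_rankings(sorts))
# ['physician', 'teacher', 'cashier', 'janitor']
```

A known weakness of averaging relative positions is that an occupation’s score depends on which other titles happened to share its sort, which is one reason to prefer a paired-comparison model when the full study data are available.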
DR. BREEN: This question is a little more nebulous. But let’s just say that
we were going to add occupation. Currently we use education and income, and
those predict health pretty nicely. But we hardly ever do much with occupation
in health, except maybe for occupational health. One thing we do use is
insurance, which is kind of a proxy for occupation in terms of the benefits that
come with an occupation. I don’t know whether you have thought about how well
that might do as a proxy when looking at health outcomes, particularly health
service use. It is very directly associated in my own data, and I tend to use it
instead of income sometimes. There are caveats to that too.
You did some magical hand waving when you said standardize how you collect
these, but then you are going to have to weight them. And then of course there
is categorizing those occupations, which you suggested is quite labor
intensive. I don’t know whether you would want to give us some references or
talk about that a little. That seems like a pretty big task.
DR. HOUT: The initial coding of the occupations is the really big task. Most
surveys that I am aware of currently do ask about occupation and code it, so
that is already in-house. The question now is how you make use of it. You have
expended the effort, but what you have are 580 highly differentiated qualitative
distinctions. How do you make use of them? The first move I would suggest is to
start assigning some of these scores, the first being this average of pay and
credentials, the socioeconomic index. But you could make a more refined set of
scores using one or the other of them, and use either or both.
The other thing you can do, once you have the occupations in that detail, is
generate additional scores, like the probability of having health insurance
given a certain occupational title, and use that as part of an index instead of
the socioeconomic score or in addition to it. I can send you the reference.
The thing I have my students read to inspire them on how to do this is not
a cookbook, but it is a for-instance on how to do it. It is a paper from 1997 by
Bob Hauser and Rob Warren in a journal called Sociological Methodology. It is
daunting-looking when you download it from JSTOR. It is about 108 pages long,
but the text is mercifully only about 27, and the rest is a huge compendium of
dozens of scores for each of the occupations in the 1990 detailed occupational
classification. Nobody would do that anymore; nobody publishes tables like that
anymore. But back in ’97 print was still the way to go. The article itself is
shorter, and it is an indication of how to make use of these data.
DR. MADANS: This is more of a comment than a question, but I did want to
thank you for your very clear description of these issues. It takes me back to
my sociology days. But I think about it from the point of view of what we want
to collect on these very large surveys and how we go about it. To me the
take-home message is that if we do a good job collecting education, occupation,
and income, and perhaps some geography, which we do now, and we have those
access mechanisms, that gives us a huge number of degrees of freedom in how we
use the data later. We don’t really have to decide now how we are going to deal
with occupation down the road.
If we collect it correctly, we can do a whole range of things, from
exposures, because we know some occupations, especially combined with industry,
which we also collect, carry very clear health-related exposures, to the
somewhat more ephemeral things like prestige and standing. The key is really to
get as many of these building blocks as we can and get them standardized. Then
we can move with the field down the line in terms of how we want to analyze them.
DR. GREEN: I, too, want to thank you for a very stimulating presentation.
You used the word security a couple of times about three-fourths of the way
through. I would like to ask you to zoom back out a little from the specifics to
the model you have on the screen, under our basic label for this part of the
hearing, defining socioeconomic status. Could you talk to us a little more about
the relationship between socioeconomic status and the perceived security of a
person or a family, and whether or not that matters?
DR. HOUT: This is an emerging research area. I couldn’t give you the
sociological consensus the way I can on occupational scoring. But one finding
that I contributed to, in an analysis of the General Social Survey back in the
early 2000s, was that Americans’ worries about job security have increased. As I
have mentioned, this is a project that runs from the 1970s and is still ongoing.
From 1970 to 2000 there was a secular upward drift in the fraction of Americans
who felt they might lose their job in the near future. To a large extent we
could predict the fraction of Americans who would say they were going to lose
their job from the current unemployment rate: roughly every one-point uptick in
the unemployment rate produced a two-point uptick in the fraction of Americans
who were worried that they would lose their job. As a baseline, just look at the
unemployment rate and figure that there are two people who are worried about
their job for every one who is currently unemployed.
But there was also an upward trend in that percentage over and above what
was predicted by the unemployment rate. We don’t really understand where that
is coming from. Some people have said it is the existence of temporary contracts
and other such arrangements, so that people in specific occupations don’t feel
the security they once did. GM workers once felt they had tenure on the assembly
line and have lost that, and the average American is feeling that as well. But
we really don’t have the material to get anywhere beyond this one simple-minded
question: Do you think you are likely to lose your job in the next 6 months?
DR. MAYS: I want to explore just a little bit the fact that you are part of
the GSS. In terms of education, occupation, and income say over the last 20
years, what have you had to change the most and what do you worry about in
terms of those three questions, in terms of the quality of the data?
DR. HOUT: First of all, the model of the General Social Survey is: if you
want to measure change, don’t change the measure. We almost never change
anything. We add. What we have added in the realm of education is a lot more
detail about post-secondary education. We have been really slow on the uptake
here: for the first time, in the 2012 survey, we are asking people to
distinguish between a high school diploma and a GED. We should have been doing
that for the last 10 years, but we haven’t been. But we are getting more detail
about community college versus some other kind of post-secondary education or a
four-year degree-granting college, and we have been doing that since 2008.
Those are the changes on the education part.
On income, we still use the same stem question that was used back in ’72.
But the GSS actually uses a show card to gather income, and it gets a slightly
lower refusal rate than the CPS does by the simple trick of giving people a
card: “I don’t care about the dollar amount. Just tell me which letter
corresponds to your family income.” Every year there are 100 people who will
give us a letter who wouldn’t have given us a number. It is cheesy, but it
works. With inflation you have to change those income brackets. That is all we
have ever changed on income.
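The show-card approach replaces a dollar question with a letter question, and the only maintenance described is rescaling the brackets for inflation. A minimal sketch, assuming hypothetical bracket bounds (not the actual GSS card) and a single price-index ratio between the base year and the current year:

```python
import bisect
import string
from typing import List

# Hypothetical upper bounds (base-year dollars) for show-card letters A-E;
# incomes above the last bound get letter F. These are NOT the GSS brackets.
BASE_UPPER_BOUNDS = [10_000, 20_000, 35_000, 50_000, 75_000]


def adjust_for_inflation(bounds: List[int], price_ratio: float) -> List[int]:
    """Rescale bracket bounds by a price-index ratio (current / base year)."""
    return [round(b * price_ratio) for b in bounds]


def card_letter(income: float, upper_bounds: List[int]) -> str:
    """Return the show-card letter whose bracket contains `income`."""
    # bisect_left keeps an income equal to a bound in that bound's (lower) bracket.
    return string.ascii_uppercase[bisect.bisect_left(upper_bounds, income)]


current_bounds = adjust_for_inflation(BASE_UPPER_BOUNDS, 1.10)
print(current_bounds)                       # [11000, 22000, 38500, 55000, 82500]
print(card_letter(80_000, current_bounds))  # E
```

In the interview the respondent only ever sees the letters; the dollar bounds behind them can be rescaled each wave without changing the question wording.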
We have dabbled in wealth data collection. We have fielded several modules,
and Robert Moffitt gave us a lot of advice on that. We had several experiments
on wealth in the 2000s, but we don’t have a core wealth measure yet because we
are not satisfied with any of the ones we have tried so far.
We have never changed the occupation questions. We switched from the 1970
coding protocol to the 1980 protocol, and that was so disruptive for the user
community that we did not upgrade to the 1990 protocol. But now we are not only
upgrading to the 2010 protocol; we are recoding all the questionnaires going
back to 1972 to the 2010 protocol. Probably a year and a half from now we will
release a new version of the data where every occupation is coded according to
the 2010 protocol, including respondents’ parents’ occupations, because we
collect mothers’ and fathers’ education and mothers’ and fathers’ occupations.
We don’t ask what your income was growing up, but we do get the other two
socioeconomic measures for people’s childhood, and we are recoding all those
occupations to the 2010 protocol as well.
DR. MAYS: Thank you. Can you come to the mike please?
DR. ROSE: I am Debra Rose. I work at the Research Data Center here. You
talked very briefly about what you would have to do to release data. I just had
a comment and a question. The comment is that in the United States we have a
decades-old tradition of creating separate public-use and restricted-use
datasets and of establishing data centers where approved researchers with
approved projects can have access to the restricted-use datasets. Other
countries don’t always have this; they have less transparent relationships
between government agencies or universities. I was wondering whether there is a
model for other surveys or research that we don’t know about here, for data
paid for by federal grants.
Is there a model or a system for creating the two kinds of datasets and then
releasing the data or making them available for analysis that is different?
DR. HOUT: I was just trying to throw out the cautionary flag saying that
these disclosure issues would kick in if you provided more geographical detail
as part of the socioeconomic core items or socioeconomic standards. I am not
actually aware of how you handle it here at NCHS, but I have been a user of the
Census Bureau’s restricted data. And the General Social Survey, by the way,
also has restricted data licenses. Our data aren’t as sensitive so we basically
respond on a case-by-case basis. People write us a letter and we sign a
contract with them on releasing the data. Although now NORC has this data
enclave and we might just make everybody go to Chicago. The bottom line is — I
definitely appreciate the access that we have been able to get. It is not my
field. I don’t have any particular —
DR. MAYS: Let’s thank Dr. Hout for a great presentation. I think it really
captured for us some of the struggles that we are going to have, but also gave
us some insights. Thank you.
DR. MAYS: We are going to turn to our next presentation. Is Dr. Bauman here?
He is here. We are about to begin the discussion of education. Dr. Bauman, who
is with us, is the chief of the Education and Social Stratification Branch in
the Social, Economic, and Housing Statistics Division at the US Census Bureau.
Dr. Bauman, welcome, and thank you very much for agreeing to do this.
DR. BAUMAN: Thank you. I am going to quickly throw in a couple of things,
observations on Mike Hout’s presentation and reactions to things that people
said before about using SES as a measure and the problems that come with having
to explain it. From experience, I know that the Department of Education has
always used SES as a measure. It has been an ad hoc measure that they put
together from various things, like whether you have books in the home. It
doesn’t correspond to what sociologists call SES, but they have used it, and it
has been in wide use for many years. I have never heard anybody challenge that.
I think there may be a possibility to create something called SES, or make up a
new name if you prefer, that could be accepted.
The other thing is that I think there is an opportunity here to use
occupational scores. This is a good opportunity, and it should be really
thought about, because when you do this kind of social background analysis, when
you want to look at education, when you want to look at income, bringing
occupation in is very powerful, and if we come up with a system for coding, that
really helps a lot.
But I am not going to talk about that really. Forget that. I didn’t say any
of that. I just wanted to talk about measuring education. I will try to move
through quickly, even though I have a bunch of slides.
We at the US Census Bureau used to measure education. Somebody asked me at
the outset, what do you know about literacy? In the 19th century we asked about
literacy. Then we forgot about it. Then, starting with the 1940 Census, we
started asking about education. At that time we asked about years of schooling
completed, and that is what I have here, through 1991 in the Current
Population Survey. We asked what is the highest grade or year of regular school
that you have ever attended, and did you complete that grade? That was our way
of recording education. It was very simple. You got one number, and whether it
was the last grade attended or the last grade completed, and you moved right on.
We discovered in the ’80s that there was a divergence. People were taking
very different lengths of time to get through different degree levels. And you
also missed some things. If somebody had 14 years of education, that is, 2
years past high school, did they get an associate’s degree or did they just go
to college for a couple of years? That is what made us move to what you
currently see in the ACS, a system that really focuses on degrees: What is the
highest level of school completed, or the highest degree received? That is what
we have used since the 1990 Census and the 1992 CPS, and in all of the ACS and
all our other surveys.
This is the ACS question. As Dr. Hout was saying, it gets more and more
complicated as we move along in time. We allow for a complete recording of a lot
of detail about educational attainment, though that isn’t everything you could
ask. This is from our paper questionnaire. We have done something here that I
would not recommend you do: we put titles on the sections. We tested that out.
Somebody got very famous by publishing an article in a survey research journal
about how that is a terrible idea, and it is a terrible idea. Don’t use the
titles. But once we test something, we are stuck. We are very careful; once we
do it a certain way, we are going to keep doing it.
As you see, we make a lot of distinctions here. They are very important for
pulling apart some of the things that we lumped together earlier. Take high
school: what does high school mean? Look at the categories around high school
graduate, starting right above high school graduate with 12th grade, no
diploma. You could have finished high school but not gotten a diploma. You got
a certificate of completion, or you passed the state test but didn’t take that
health class you were supposed to take. Whatever it was, you didn’t get a
diploma. Maybe you went right on to college or something. Then come regular
high school diploma, GED, and some college credit but less than a year.
In our old system these all ran together, but they are very different things
and they have different outcomes. So we are trying to phase that out.
We also have a lot of detail at the very high end, which in the old system was
basically swept into “beyond bachelor’s degree,” and those degrees have very
different outcomes. We will see a little bit of that shortly.
I looked at some of the surveys that the NCHS is involved with. As you can
see, some of them have adopted the “modern” standard of educational
attainment, where you ask about degree level. Those are in red. Others really
haven’t changed from the approach they took before. I think at a minimum we
can think about moving all surveys to something that is a little bit more
degree- and credential-focused.
This makes the same point again. Here we are looking at various surveys and
some of the detail that we collect. I don’t know that there is great value in
separating out first, second, third, or fourth grade, but we do that in the ACS
just because of the approach we take. On the other hand, at the higher levels,
there is real value in getting out some of the detail.
Notice the numbers. The numbers look confusing. And that is one thing that
we have discovered: don’t number things 1 through 20, because 12 no longer means
a single thing. Field reps and respondents get confused when they think 12
means high school and it actually means something else. So we moved to a
system where we use higher numbers, anything other than 1 through 20.
Another thing: there are a couple of stars here under one or more years of
college. That is because those categories come from our CATI/CAPI
questionnaire, which I didn’t show you before; I showed you the paper
questionnaire. People don’t volunteer how much college credit they have. The
category some college credit requires a follow-up question in CATI/CAPI
because people are not going to offer that right off the bat. There are a lot
of things you are going to have to probe to get. A danger in all these things
is that once you start doing follow-ups, you start introducing mode effects. We
do have mode effects in the ACS. They are not very troubling, but they are
there. When you start deciding whom you are going to ask whether they might
have gotten a GED, or whom you are going to ask whether they have one or more
years of college or any college attendance, you reclassify people, and that has
had an effect on our data. You have to be very careful about how you go about
that.
We make a distinction between academic and occupational associate’s degrees
in the CPS. That is probably not a very useful distinction, because that is one
place where reliability is very low.
Speaking of reliability, we have done many reliability studies on this, and
this shows reliability in terms of what we call the index of inconsistency. A
high index of inconsistency means poor reliability; I have labeled those items
unreliable along the side. An item with a high index of inconsistency is a very
unreliable item.
For example, a very highly reliable item is master’s degree. If you look
across, it has a low index of inconsistency: when you ask somebody again if
they have a master’s degree, they say yes, I have a master’s degree. On the
other hand, if somebody says they had some college credit but less than one
year, that is low on reliability, with a high index of inconsistency. That
means when you ask again, they are likely to tell you something else.
Most of our items are in the moderate range of reliability, which is pretty
standard for a Census Bureau survey. I want you to notice one thing about this.
As you go up through the grades, at some points we grouped the grades together,
and you can see where they were grouped into a four-grade range; that group has
an index of inconsistency of 21.8. When you get down to the two-grade range, it
climbs a little bit, to the 30s, and when you get single grades it gets higher.
Why is that? Because most of the misreporting is just one square off the
diagonal. If you said 9th grade the first time, you might say
10th grade the second time. That is where a large share of your
unreliability comes from.
As you can see, two of the items with really high levels of inconsistency are
right around the high school graduate level: 12th grade, no
diploma, and some college credit but less than 1 year. Those are things that
are hard to pin down even though they are very meaningful in terms of
educational attainment.
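The reliability comparison above can be made concrete. Below is a minimal Python sketch of the index of inconsistency as it is commonly defined for Census Bureau reinterview studies: for one response category, the gross difference rate between interview and reinterview, divided by the disagreement expected if the two answers were independent, scaled to 100 (by convention, below 20 is read as low, 20 to 50 as moderate, above 50 as high). The reinterview counts are invented for illustration and are not Census Bureau data, and the helper names `expand`, `index_of_inconsistency`, and `collapse` are hypothetical. The made-up table puts every grade-level disagreement one square off the diagonal, the pattern described above, so grouping grades 9 through 12 absorbs most of it.

```python
# Hypothetical reinterview data: {(original answer, reinterview answer): count}.
# Invented numbers; grade-level disagreements sit one square off the diagonal.
COUNTS = {
    (9, 9): 40, (10, 10): 40, (11, 11): 40, (12, 12): 40,
    ("college", "college"): 100,
    (9, 10): 5, (10, 9): 5, (10, 11): 5, (11, 10): 5, (11, 12): 5, (12, 11): 5,
    (12, "college"): 5, ("college", 12): 5,
}


def expand(counts):
    """Turn a {(original, reinterview): count} table into a list of answer pairs."""
    pairs = []
    for pair, k in counts.items():
        pairs.extend([pair] * k)
    return pairs


def index_of_inconsistency(pairs, category):
    """Index of inconsistency (scaled to 100) for one response category.

    Collapses the item to 'category vs. everything else', then divides the
    gross difference rate by the disagreement expected from independent answers.
    """
    n = len(pairs)
    a = sum(1 for o, r in pairs if o == category and r == category)
    b = sum(1 for o, r in pairs if o == category and r != category)
    c = sum(1 for o, r in pairs if o != category and r == category)
    gross_difference_rate = (b + c) / n
    p1 = (a + b) / n  # share in the category at the original interview
    p2 = (a + c) / n  # share in the category at the reinterview
    expected = p1 * (1 - p2) + p2 * (1 - p1)
    if expected == 0:
        raise ValueError("category always or never reported; index undefined")
    return 100 * gross_difference_rate / expected


def collapse(answer):
    """Group single grades 9-12 into one four-grade range (hypothetical scheme)."""
    return "9-12" if answer in (9, 10, 11, 12) else answer


pairs = expand(COUNTS)
grouped_pairs = [(collapse(o), collapse(r)) for o, r in pairs]

single = index_of_inconsistency(pairs, 10)
grouped = index_of_inconsistency(grouped_pairs, "9-12")
print(f"10th grade alone:    {single:.1f}")
print(f"grades 9-12 grouped: {grouped:.1f}")
```

Run as written, this prints 24.0 for 10th grade alone and 7.3 for the grouped range. Part of the drop also comes from the grouped category's larger share, which raises the expected-disagreement denominator, so this is an illustration of the mechanism rather than a model of the actual reinterview results.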
I also want to talk about GEDs. I think Census Bureau surveys have a reputation
for doing a very bad job of capturing GEDs. I am here to defend us. Back in the
bad old days we used to ask yes or no, did you get a GED? It turns out that was
a really lousy question. People just didn’t give you an answer. What we
switched to was a question that said there are two ways you can graduate from
high school: you can either get a diploma or you can get a GED. When we
switched to that question, we really decreased the index of inconsistency. We
had much more consistent responses, and our numbers are beginning to track,
even though we are still a little bit lower in terms of the number of GEDs. It
is not as bad as it was.
We were asked about groups. Looking at educational attainment across
different groups, you can see most every group here is pretty close to the
same. Which ones stand out? The other single race group. What is that? That is
people who said some other race. For the most part that is a subset of the
Hispanic population, though not strictly. It is mostly Hispanics who have
trouble answering the race question: I don’t have a race, I am just Hispanic.
They classify themselves in this way. What that does is leave out the Hispanics
who immediately say I am white, so you end up with a lousy look at the
Hispanic population. That group also overlaps a lot with the foreign-born
population.
Now, there are a couple of things I want to say about the foreign-born. There
is an intrinsic problem in capturing educational attainment: school systems are
different, so you are going to have a problem with matching. The other thing is
that there are real language problems. You can’t just put out a standard
questionnaire. We have just done some cognitive work and reformulated our
questionnaire, trying to be as careful as we can, because when we say something
like (foreign phrase), it means different things in different countries. In
some countries it means a bachelor’s degree, and in some countries it is a high
school degree or even less. The exact terms you use are going to make a lot of
difference.
What does education have to do with anything in the world? Here is how
education lines up against occupation. This is just a broad, one-digit
classification of occupation. There are a couple of things here. First of all,
it is like income, like anything else: the higher levels of degrees, advanced
degrees, are across the top there. You can see in professional occupations we
have 34 percent of people with advanced degrees. There is a really strong
correlation there, but it is not a perfect correlation. There are people in
each occupation with different levels of degrees. Some of that may be because
of differences in detailed occupation, but part of it is just that there is
heterogeneity within every occupation. Occupation alone doesn’t tell you
everything you need to know, just as education doesn’t tell you everything.
They operate at different levels.
Also, it is not inconsequential what level of education you use to define
your occupational score. You can see each education level has a different
relationship with occupation. You need to think about what you want to do.
We are increasingly becoming a highly educated society. Now 30 percent of the
adult population has a bachelor’s degree or higher. That speaks to what I said
before. You used to be able to just write off the top two percent as being a
small category. Now it is a big category.
Finally, different levels of education do have an impact in other realms. On
unemployment, people with less than a high school diploma or GED have the
highest unemployment rate and were the most strongly affected during the
recession. Here we are looking at different levels of education and how they
pay off in terms of earnings. This is another place where separating out the
different types of advanced degrees really does make a difference: professional
degrees really stand out at the high end of earnings. That is really what I
wanted to say about measurement of education. I am happy to answer any
questions, or we can move on to the next presentation.
DR. MAYS: What we are going to do is actually take questions after the next
presentation. If you would like to join us at the table, that would be great.
Dr. Wong, do you want to present from there or go up there? Thank you.
I am going to introduce Dr. Mitchell Wong. He is a general internal medicine
physician at the UCLA Center for Health Services Research. I should say the
UCLA Division of General Internal Medicine and Health Services Research. I am
in health services research, but he is in general internal medicine and health
services research. For those of you online, he is just getting miked up. He is
on his way up there. Thank you, Dr. Wong.
DR. WONG: I want to start by acknowledging NIMHD, which sponsored some of the
research that I am presenting today. I will skip through some of the beginning
slides because much of it was just covered in very good detail. But to start
out, let me talk a little bit about the components of the measurement of
education. The potential components could be years of education, whether you
have a high school diploma or a higher degree, and then also potentially
achievement, preparation for future schooling or occupation, and, as was
mentioned, literacy.
Skipping through these slides: we just heard a little bit about some of the
minor variations in the way education is measured and the way the outcomes are
coded. There is some discrepancy in how associate degrees are handled: whether
it is an associate degree for an occupational, technical, or vocational
purpose, or an associate degree from a 2-year community college that leads to
a higher degree. It is not clear that these distinctions matter a lot.
But it is very clear that the way we currently measure education is really
focused on two components, years of education and diploma or degree, and not so
much on achievement or preparation for future schooling or occupation.
These are some slides that others have produced looking at the relationship
between the traditional measure of years of education and health outcomes. We
see current smoking, obesity, exercise, and heavy drinking, the number of days
in the past year on which you had five or more drinks. As you can see, there is
obviously a really strong relationship between education and health when you
measure education as years of education. There is some variation across these
different outcomes, but for the most part there is not a lot of difference
across 0 to 12 years; then there is a big change once you get a high school
education or more.
There is perhaps more of a gradient when you look at actual receipt of health
care and other protective behaviors: getting a colorectal screening exam or a
mammogram, having a smoke detector in your house, wearing a seatbelt. There is
more of a steady gradient in years of education from 0 to 12; there actually is
a distinction there.
There are a lot of things missing in the measurement of education if you just
measure years of education and degree. To start with, when you ask students
these days what school is, there is a wide variety of what school means:
public, private, charter school, home schooling, which is actually very common,
independent study, vocational schools, and now what is called blended learning
or distance learning. There is wide variation across all these different types
of schools in terms of their outcomes and what they really mean for
socioeconomic status and, downstream, for occupation and income.
Obviously, as an example, graduating from Sidwell Friends means something a lot
different than graduating from Crenshaw High School in Los Angeles. It is
probably as wide a variation as if you graduated from UCLA versus USC.
What you learn in school is very different from whether you graduated from
school and how far you got. What we are talking about here is educational
attainment versus educational achievement. As I said, there is wide variation
in what a high school diploma means. A high school diploma means that you took
certain courses and passed those courses, and passing can mean different
things, and so can the purpose of the education itself. There are some
movements toward taking some kids out of traditional high schools and putting
them in vocational schools, and what does that mean for their future
socioeconomic trajectory? I think most people would agree that there is a weak
correlation between attainment and achievement.
Let’s talk a little bit about what alternative measures there may be. In
California we have the A through G requirements. This is basically: did you
take all the courses that would allow you to apply to and get into a University
of California school or one of the Cal State universities? There are certain
types of courses that you have to take. You have to take English up to
11th grade. You have to take Algebra II, et cetera. That would be
one potential alternative measure, although there are lots of problems with it,
because how you do in those courses means different things in terms of whether
you are really able to step into college and do college work without
remediation.
And of course, there is preparation for occupation and future schooling. What
are the things that you are learning? If you are in a vocational school, how
does that affect your future socioeconomic trajectory?
But where we are now, in the 10 years since No Child Left Behind, is
standardized testing. That is not just the kind of standardized test that you
see more and more in our high schools and secondary schools; we have had
standardized testing for a long time: SATs, ACTs, and Advanced Placement exams.
Obviously the challenge is that many students don’t take them.
Many states have exit exams, which do measure some level of competency, but
the bar is fairly low. And now we have standardized state-level tests that are
required through NCLB. One particular challenge is that every state has its own
set of standards. Some states have really high standards. Some have really low
standards. It is difficult to compare across states. But at least we have a
measure of achievement that all public school students take.
And I will also mention the National Assessment of Educational Progress, which
is a common assessment given to samples of students across states, so that you
can actually make some comparisons between states and know how a test in
Arkansas compares to a test in California and how the standards differ. Is one
harder than the other?
In the next few slides I am going to talk a little bit about what educational
achievement, as opposed to attainment, means in relation to other socioeconomic
measures. In this case, it is income. These are slides produced by Sean Reardon
at Stanford.
This first slide shows the trend in family-income inequality over the years.
Focus on the very top line, the solid line, which shows the ratio of income at
the 90th percentile to income at the 10th percentile
from 1970 to 2010. That line is increasing, meaning that the gap in income has
risen dramatically, roughly two-fold over the last 30 to 40 years.
But not only have income disparities increased; the gap in educational
achievement has also increased, and it has grown more than income inequality.
This slide shows you achievement among those in the 90th
percentile of income and those in the 10th percentile of income.
The different data points come from different studies, and this line
aggregates across all the different studies.
The point here is that if you look at the achievement gap in 1940 between those
at the highest level of income and those at the lowest, there is some
achievement gap. But as you look at birth cohorts from 1940 up to 2000, the
educational achievement gap has been increasing.
This is another slide showing how parental achievement compares to child
achievement. The authors argue that this relationship has been fairly stable
over the years: children’s achievement has been pretty much stable relative to
parental achievement. One of the questions we want to ask is why the
educational achievement gap is increasing. There are lots of possible reasons.
One reason may be that the gap between high and low performance scores
increases over time. But in fact if you look at
the educational gap over time, most of the educational gap is established by
Another hypothesis is that high-income parents today are investing more and
more in their children at a younger age. Even by kindergarten or the beginning
of elementary school, educational achievement gaps have already been
established.
The last few slides that I will show are results from the study that I
mentioned, funded by NIMHD. What we are interested in is the relationship
between educational achievement and health. These data were collected in
low-income neighborhoods in Los Angeles, and the sample is actually taken from
students who applied to one of three high-performing charter schools. It is a
little different from the general population, and it is certainly not
representative of the general population of students who live in poor
neighborhoods. It is a selected sample of those who had applied to a
high-performing charter school. The reason for that is that we were very much
interested in using the charter schools’ admission lotteries as a natural
experiment: kids apply to the school, and some get in and some don’t.
I won’t go through too many of those, but I really wanted to show the
relationship between achievement and health. Here we looked at four different
health behaviors that are common among high school adolescents: smoking,
alcohol, marijuana, and sex. Smoking, alcohol, and marijuana use are measured
over the last 30 days, and sex over the last 90 days. The model includes many
more variables than I am showing here; I am showing just the socioeconomic
variables: parental education, whether the parent is employed, whether the
family owns a home, annual income, and then the very last set of variables, the
student’s own standardized test score. We separated the students into the
lowest tertile, middle tertile, and highest tertile, with the highest tertile as
the reference group.
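The tertile coding just described, with the highest tertile as the omitted reference group, can be sketched in Python. This is a generic illustration, not the study's code: it assumes cut points are taken from the sample's own score distribution with the standard library's `statistics.quantiles`, the helper names `tertile_labels` and `dummy_code` are hypothetical, the scores are invented, and the regression model itself is not shown.

```python
import statistics


def tertile_labels(scores):
    """Label each score as lowest, middle, or highest tertile of this sample.

    Scores exactly equal to a cut point are placed in the lower group.
    """
    low_cut, high_cut = statistics.quantiles(scores, n=3)
    labels = []
    for s in scores:
        if s <= low_cut:
            labels.append("lowest")
        elif s <= high_cut:
            labels.append("middle")
        else:
            labels.append("highest")
    return labels


def dummy_code(labels, reference="highest"):
    """0/1 indicators for each non-reference tertile; the reference is omitted."""
    levels = [lvl for lvl in ("lowest", "middle", "highest") if lvl != reference]
    return [{f"tertile_{lvl}": int(lab == lvl) for lvl in levels} for lab in labels]


# Hypothetical standardized test scores for nine students.
scores = [310, 295, 402, 355, 288, 371, 330, 415, 342]
for score, row in zip(scores, dummy_code(tertile_labels(scores))):
    print(score, row)
```

Omitting the highest tertile means each indicator's coefficient in a downstream model is read as a contrast against the highest-achieving group, which matches how the results are described here.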
You can see that there is some variation in the relationship between the
student’s educational achievement and the behaviors, but in general those who
are in the lowest tertile of achievement are at greater risk for these
behaviors. For alcohol and sex in particular, being in the lowest tertile of
academic performance is predictive of the risky behavior.
Just to note, we control for a lot of other things, including the student’s
gender, grade, race and ethnicity, and language, parental supervision, and
school engagement. We actually have models that also incorporate the behaviors
of the parents, the behaviors of the peers of these students. These results are
We wanted to add one more thing here because I think this is something that we
could actually add to current surveys, which is looking at the school or the
environment that these students are in. The last variable I have added in this
slide is the academic performance of the school. The API score, California’s
Academic Performance Index, is a measure that ranges from 0 to 1000. The
average, or median, score in California is around 600 or so. Any school above
660 or so is doing better than average. A fantastic school would be in the
800s.
As you see, for some of the behaviors, actually the opposite ones from those
where the individual test score mattered, the school environment has a strong
influence: smoking and marijuana use. In fact, that influence is much stronger
than the influence of individual test scores.
I relate this to looking at personal income versus neighborhood income. It
is not only your personal socioeconomic status, but also the socioeconomic
status of the people around you. In this case it is the educational achievement
of the people around you.
There are limitations to using standardized testing. Individual standardized
test scores are available for every public school in every state, but it would
obviously not be possible to get these data through self-report in the
standardized surveys that we have.
The other problem with standardized testing is that you can’t capture all
students. Students in private, parochial, or alternative schools may not be
taking the standardized tests.
And some of the other problems would be comparability and equivalency
between states. The tests differ from state to state. There are efforts to try
to create a standardized way to measure educational achievement on the
standardized tests so that you can know what is a comparable score in two
different states, but there are a lot of challenges to doing that.
Educational achievement is not easy to obtain through self-report, but the data
are available because testing is done in every state. It is important because
it is poorly correlated with attainment, and I think it has a very strong and
influential effect on health.
There are ways, and I will go off this slide for a little bit, in which we can
get educational achievement. One example is what is being done in the
California Health Interview Survey, where they actually ask every student,
every respondent under the age of 18, to name the school that they go to, and
by doing so you can map the school achievement score to that student.
Obviously that is confidential, since through that information you could
identify the person, but it provides an extraordinary amount of predictive
information on behavior. We will see what happens with these individuals down
the road, and we can look at how their achievement scores predict how they do
in the future. It is something that we wouldn’t be able to do directly through
self-report, but you can get enough information to link that student to
state-level data, and that can be extraordinarily useful.
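The linkage described above, from a respondent-reported school name to a school-level achievement score, can be sketched in Python. This assumes a hypothetical lookup table of school names and API scores (the names and numbers below are invented) and uses simple name normalization plus the standard library's `difflib.get_close_matches` to catch near-miss spellings; it is not the California Health Interview Survey's actual procedure, which is not described here. The helper names `normalize` and `link_school_score` and the 0.85 cutoff are illustrative choices. Unmatched names come back as `None` rather than being guessed, and, as noted above, the linked record is confidential.

```python
import difflib
import re

# Hypothetical school-level file: normalized school name -> API score.
# Names and scores are invented for illustration only.
SCHOOL_SCORES = {
    "northside high school": 612,
    "valley charter academy": 845,
    "westside high school": 701,
}


def normalize(name):
    """Lowercase, drop punctuation, expand 'HS', and collapse whitespace."""
    name = name.lower().replace(".", "").replace("'", "")
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    name = re.sub(r"\bhs\b", "high school", name)
    return " ".join(name.split())


def link_school_score(reported_name, scores=SCHOOL_SCORES, cutoff=0.85):
    """Return (matched_school, score), or (None, None) if no confident match."""
    key = normalize(reported_name)
    if key in scores:
        return key, scores[key]
    close = difflib.get_close_matches(key, list(scores), n=1, cutoff=cutoff)
    if close:
        return close[0], scores[close[0]]
    return None, None  # leave for manual review rather than guess


for reported in ["Northside H.S.", "Valley Charter Acadmy", "Unknown Prep"]:
    print(reported, "->", link_school_score(reported))
```

A fairly high cutoff trades some missed matches for fewer wrong ones, which seems the safer direction when a false link would attach another school's score to a respondent.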
DR. MAYS: Thank you. What I would like to do is open it up for questions at
this point. Let’s start by checking if anyone online has questions. Let me
start in the room, then. Bob Kaplan, can you introduce yourself and state
whether you have any conflicts?
DR. KAPLAN: I am Bob Kaplan from the NIH. I don’t have any conflicts. I am
just curious: education is a proxy for all kinds of things. How do you separate
that out, other than with the methods that were talked about earlier?
DR. WONG: I think that is probably true for a lot of the other measures we
talked about. They are proxy measures for SES, and there are different
components to them. One way I think about it is just in terms of what it means
for future socioeconomic success and how it relates to later occupation. How
you do in high school obviously influences whether or not you go to college,
and that influences what kind of occupation you might have and subsequently
your income. It is an early precursor of your socioeconomic position.
DR. KAPLAN: What I was getting at, and I know you have been interested in
this, is this idea that if you want to produce health, it may be better to
invest in education than to invest in medical care. How do you begin to get at
that?
DR. WONG: It is actually a tough question. It is tough to do in observational
studies, obviously, because students who end up investing more in education
have parents who are investing more in education, which takes resources. It
takes income. It takes motivation. You are measuring something that may be part
of the socioeconomic status of the parent, but it may also be something
extrinsic to that particular measure.
What we are trying to do in our particular study is to get rid of these
externalities by taking a group of individuals who are on the same playing
field. In our study we took students who were all living in poor neighborhoods
and who had all applied to a charter school. We wanted to compare the students
who got into the charter school and were exposed to a very high-performing,
academically intensive environment, where graduation rates are close to 90
percent, with kids who went to a traditional public school, where graduation
rates in Los Angeles are probably closer to 50 percent at best, and by some
estimates as low as 30 percent. It ends up being a really hard thing to do,
because the kids who didn’t get into the charter school we looked at had
motivated families, and they ended up in another school.
There is also a degree of choice: even after they apply to charter schools,
they can get accepted, or they can decline and find some other place to go. It
is very difficult to distinguish what it means to give a really good education
to a kid who wouldn’t otherwise have gotten one. What we can say, with the
caveat that there may still be some selection bias, with kids who are more
highly motivated and coming from a more supportive environment going to better
schools, is that going to a better school obviously matters, and it matters a
lot for the few behaviors that we looked at.
One of the things we want to do, and we are actually funded by NIDA to do it,
is start a new study in a few months. We will try to do a much better job of
sampling kids and distinguishing those who are in the same boat but who end up
in a high-performance school versus a low-performance school, and try to adjust
away some of the selection bias. Preliminary studies show that the environment
of being in a high-performance school, in addition to higher educational
achievement, does lead to better health behaviors. And we felt the health
behaviors that we looked at were really important because there are other
studies showing that they account for as much
as 50 to 60 percent of mortality and 50 to 60 percent of the disparity and
DR. MAYS: Okay. Dr. Hout, do you have a question?
DR. HOUT: My question was actually along the same lines and also for Dr.
Wong. Your last slide was a really strong statement that achievement is better
than attainment, and yet in your answer just now you said that part of why
achievement is good is that it predicts future attainment. I think that perhaps
your conclusion is a little too strong there. I was wondering if there were
data that you hadn’t shown us. This is new data, right? You don’t know how far
these kids, the higher achievers, actually went.
DR. WONG: No, we don’t know where they go. I can tell you right now that if you
look at most of these kids coming out of the schools that we are looking at,
even among those who are performing well on the standardized tests, a good
proportion will need remediation in college if they get to college, and only a
small percentage of them will get to college, because they are coming from very
poor families and won’t be able to make it.
I think it is true, I may have overstated it. I don’t know how many of these
kids will actually get to college, complete it, and then change their
socioeconomic position in life and that has yet to be seen.
If you believe that these health behaviors are important, and there are lots of
data to show that they are: for smoking alone, 8 or 9 out of 10 long-term
smokers start in high school. In these studies, going to a high-performing
school cuts the smoking rates in half compared with going to a low-performing
school. There are a lot of different factors here that we try to control for as
best we can. It does matter. The environment matters a lot. If you project out
what that might mean, if you are preventing these kids from smoking, that has
huge ramifications
for the long term —
DR. HOUT: I was just contesting the implication that we might not be
interested in their educational attainment just because we had a measure of
their high school educational achievement. I think attainment and achievement
are both important. We can’t get away without measuring attainment. That is
absolutely important. Not only is it important, but it is also easy to measure,
so we should be measuring that. The point is that even these students haven’t
finished high school yet. We don’t know where they will end up. But we can tell
you right now, on achievement alone, before they have even completed their
education and before we know where they are going, there is evidence to suggest
that there are big differences in their health behaviors, which have long-term
implications.
DR. MAYS: We are going to get a couple of other questions in here. Dr. Green
and then Dr. Breen.
DR. GREEN: I have a question for both of you. Pretend that you get to decide
the minimum standard that would be used to collect educational achievement or
educational attainment across federal surveys. What would it be?
DR. BAUMAN: I think basically that is what I was speaking to. I think it is
important to get detail, to be able to make important distinctions. On the
other hand, it is very difficult to get those distinctions. And also, in terms
of outcomes, a lot of the small distinctions don’t end up playing out very
strongly. I didn’t get a chance to get into it, but, for example, there is
interest in measuring vocational certificates and career and technical
education. They are short-term things. They do have a payoff in terms of
income. We haven’t studied other outcomes. But it is not a huge payoff. Getting
a technical certificate after high school gets you a little bit more income,
but not a lot more income than high school. Missing some of these distinctions
is probably not horrible either.
I am more interested in the distinctions, so we can make it clear where the
lines are drawn, than in necessarily making fine gradations in terms of a
In terms of achievement aside from what I have learned in grad school you
can read my dissertation, but leave that aside. I don’t really know anything —
DR. WONG: I think for what we know we can do in these kinds of surveys and
measure through self-report, the measures that you mentioned, and the way they
are currently being measured, are about as good as we can do easily. The one
thing I did mention, what we are doing in CHIS, is a big step forward, which is
finding out what school they go to. It gets at that issue, certainly for the
cohort who are currently in school. You could ask people who are older what
school they went to, but data on how that school performed are a bit more
difficult to get.
But certainly for the young cohorts, for as long as we have had
standardized test scores and can rate schools in terms of their performance,
it matters a lot. As I said, graduating from Sidwell is different than
graduating from Crenshaw High School. That provides a lot of information, and
there may be ways: just like occupation on the back end, you can grade schools in
terms of their performance. It adds a huge amount of information that I think
probably has a lot of predictive value not only in terms of socioeconomic
status, but also in terms of —
DR. COOPER: Leslie Cooper from NIH on the phone. I just want to commend you
for your presentation, especially for pointing out such things as looking at the
various levels of education, the environment in which the student receives an
education, and the community. One interesting factor is that as you
gather all that information, we know that some individuals in very deprived
conditions continue to excel. These are individuals who are excelling in spite
of the fact that they don’t receive all of the benefits that others receive
in schools where networking is really key for helping
them get not only into the next level of education, but also opening up doors for
jobs.
As we look at these individuals and try to understand the impact on
them, I would like us to be sure that we look at those individuals
and try to see what can be done, because some people are not able to change the
environment in which they are located. But I think these points that you
brought out are very key. Thank you.
DR. MAYS: Thank you, Leslie.
DR. HOUT: With respect to minimum standard on education, I am a big fan of
that ACS question even with its problems with the little green labels there.
With regard to the suggestion that we get names of schools, at the General
Social Survey we have been experimenting with getting the name of the
post-secondary institution that people have attended or gotten their degree
from. That is tremendously aided by the existence of IPEDS, and I don’t
even know what the acronym stands for. Basically it is a reporting requirement.
If you accept Pell grant students you have to report to the Department of
Education, and they maintain a database with a 12-digit code number for your
institution. We make use of that: as the interviewer in a CAPI situation
starts to type in the name of a university or a college or a vocational school
that accepts Pell grant money, names start to appear on the interviewer’s
screen. If it narrows down to just one, they ask, is it this, and they read that
name. If it doesn’t, they ask, which of these three schools are we really
talking about? From that we can then link the person to the IPEDS data for
their school: how big it is, whether it grants 4-year degrees, et cetera. And we
are able to get a lot of the high-quality information that Dr. Wong has about
the high schools.
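The CAPI lookup described above, where candidate names appear as the interviewer types and a single match is read back for confirmation, might be sketched roughly as follows. This is a hypothetical illustration, not the GSS instrument: the substring-matching rule, the three-candidate limit, and the ID codes are all assumptions, and a real instrument would load the IPEDS institution directory rather than a hard-coded list.

```python
"""Hypothetical sketch of a CAPI-style school-name lookup.

The ID codes are made up for illustration; a real instrument would load the
IPEDS institution directory rather than this hard-coded dictionary.
"""

# Hypothetical directory: institution name -> made-up ID code.
DIRECTORY = {
    "Carnegie Mellon University": "HYP-0001",
    "University of California-Berkeley": "HYP-0002",
    "University of California-Los Angeles": "HYP-0003",
    "University of Colorado Boulder": "HYP-0004",
}


def candidates(typed: str, directory: dict[str, str], limit: int = 3) -> list[str]:
    """Return up to `limit` names containing the typed text (case-insensitive)."""
    needle = typed.strip().casefold()
    if not needle:
        return []
    hits = sorted(name for name in directory if needle in name.casefold())
    return hits[:limit]


def interviewer_prompt(typed: str, directory: dict[str, str]) -> str:
    """What the interviewer would read back after the latest keystroke."""
    hits = candidates(typed, directory)
    if len(hits) == 1:
        return f"Is that {hits[0]}?"
    if hits:
        return "Which of these schools are we talking about? " + "; ".join(hits)
    return "No match; record the name as given."


def link_id(confirmed_name: str, directory: dict[str, str]) -> str:
    """ID used to merge institution-level data onto the respondent record."""
    return directory[confirmed_name]


print(interviewer_prompt("university of california", DIRECTORY))
print(interviewer_prompt("carnegie", DIRECTORY))
print(link_id("Carnegie Mellon University", DIRECTORY))
```

Because the list is capped, an interviewer who sees more candidates than fit would simply keep typing until the list narrows; the confirmed name's ID is what allows institution size, degree level, and similar attributes to be attached.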
We can’t do that for people who didn’t go to a post-secondary institution
because there is no national database of high schools. Also, some of our
respondents are 70 years old. When they give us the name of a college or
university, they often know the name it has changed to. Carnegie Tech is now
Carnegie Mellon. They know that. They get the alumni letter. But with high
schools, with mergers, closures and everything else, it is just an impossible task
to go back in time to find them. But for post-secondary data it is a
good way of characterizing the kind of institution they actually attended. They
don’t know if it gives 4-year degrees or not. But IPEDS knows, and we can then
find that out from the name they have given us.
DR. WONG: Let me add a comment because, Vickie, you mentioned this before:
a measure of the type of school they went to. It would provide some
information, but not as much as you might think. Probably the best
distinction is whether you are in a private or parochial school or not. That
usually requires resources, so it is an indication of more family
resources, but other distinctions are harder to get at and most kids can’t tell
you. I would guess if you did reliability studies on that, you would get a lot
of unreliable results.
DR. GREEN: I still want to go back to the other direction from aspirational
to plausible recommendations for standardization across federal surveys. Kurt,
you showed a comparison chart of four or five different surveys, and there was
some variability in how it is done. Would you be willing to give your
opinion about whether the variability that exists now across federal surveys is
justified and helpful? Is it a good thing that there is that variability, or
would it be better to standardize it?
DR. BAUMAN: Well, I don’t think there is value added from the different
questioning approaches. I think that the old question really is a weak
question. That it still exists on the Medical Expenditure Panel Survey is
probably not desirable. At least the way I see it, it would
be better to go to a standard question.
DR. MAYS: Dr. Breen.
DR. BREEN: That was my question. Thank you very much.
DR. MAYS: I want to follow up on this issue of the quality of the data.
In a federal survey we are talking about a lot of states, and we
already know from the rankings that the quality of education really differs
by state. In a federal survey, if I were able, for example, to know
something about the school that a respondent is connected to, can you
propose how I would get equivalency, and whether the quality of the data
is going to depend on the ranking of that particular state? What I am
worried about is the quality of what we are going to get. It will mean a lot
for a state like California, and it might mean little for a state that has
very low per-pupil expenditure and is ranked 49th or
50th. That kind of thing. I am addressing this to all of you.
DR. BREEN: Dr. Wong, the API rankings: are those California or national?
DR. WONG: Every state has its own terminology. They are all required to
use standardized testing for identifying which schools are failing and which
are doing well and meeting adequate yearly progress for achievement. I
was talking to a couple of people in the education world about equivalency
between states. It is a really contentious issue. People have worked on it, but
the problem is that when you take a score in one state and try to make it
comparable to a score in another state, the differences are very
wide. Even with NAEP and the standardized
questions that are inserted in the state tests in every state, hard
questions that are uniform across states, it is very
difficult to say that the scoring in one state is equivalent to the scoring in
But there are moves to try to think of other ways to back into that. One way
to do that is to look at what it takes, what you need to score, so that when
you get to college you don’t need remediation in math and English. There is
some extra effort through the Department of Education to figure out a way to do
that. They are still far from finding an answer.
DR. HOUT: Most of these issues are relevant for people who are currently in
school especially in secondary school. But most of the data collection you are
doing is for people who are long out of school and going back in time on these
measures is even more daunting than going between states. With
educational attainment, at least we know what that is for somebody who is 65
years old. But finding the school they attended 40 years ago is going to be a
DR. BAUMAN: There seem to be a lot of concepts here that we are trying to
get at: educational attainment, quality of the school, achievement of
students, and then somewhere in there just ability or IQ level. These things
are very hard to separate. There are lots of details, and there have been lots of
studies looking at how they all end up working out, not in the
market but in the labor market, or in success in life. I
think that we need to go back to those studies and see which ones are really
important. I am not up on the literature. I think Dr. Hout is. I am putting him
on the spot. To a certain extent I think that is the way to go with these
Let me just throw in another thing, another aspect of education that we
found really does make a big difference in outcomes, and that is field of study.
If you get to the post-secondary level, the field of study that you take,
whether you take a technical field or a professional field or liberal arts,
makes a big difference in terms of earnings. There are lots of things to
unpack about education and related things.
DR. KAPLAN: I am still hung up on this. What can we do in an intervention
sense? I think Dr. Wong mentioned that a lot of the action seems to be
occurring in that pre-K era. We see a lot of this, everything from the IOM
report From Neurons to Neighborhoods and so forth. Do we have any data on quality of
pre-K exposure? Is that just too hard to get?
DR. WONG: I don’t know that literature that well. I don’t know if the other
two do. I have probably heard the same things that you have, that it matters a lot.
I mentioned earlier that the achievement gaps that you observe by
kindergarten or even by 4th grade are pretty stable
over time. Most of those gaps are —
DR. KAPLAN: There is a fairly big investment. Jim Heckman has gotten into
this. And what has happened is that, again, there is this IOM report. It is
about 10 years out, and a whole series of other things have come along
arguing that there are structural changes in the brain that are affected by
these very early experiences. Now a lot of people believe that is where the
investment should be —
DR. WONG: I think the question is whether once you hit middle school or high
school, is it too late? I hope that is not true; otherwise, what are we doing in
our secondary schools. I have heard the same thing that you have heard that
reading to your child when they are young and all those things that can be done
DR. MAYS: Let me just comment, because I think there are two things. One is
Head Start. We have tons of data on Head Start showing that those early
interventions make a difference. I don’t know necessarily about the difference
in terms of health, but they make a difference in terms of quality of
education. I also know that there are several intervention programs that are
really trying to take minority children out of school settings in which
they are performing very poorly and replicate what you see at Head Start
in terms of enrichment. I think the best known one is ABC, which is in Boston,
and they are able to get the kids to the level of Harvard, et cetera. Again, I
don’t know what the health changes are. But in terms of being able to increase
the likelihood of their educational attainment, they are actually able to
really jump start that.
DR. KAPLAN: The most often cited is the Perry Preschool Study, which has lots
of problems with it, but there actually are other studies in the literature
that sort of look like that. There has been a big interest in National
DR. MAYS: Dr. Sondik.
DR. SONDIK: I have a question. It goes back to your conclusion slide, Dr.
Wong, where you had the three points about educational attainment, educational
achievement, and future studies. I am trying to think of this in the context of
HIS, the Health Interview Survey. I want
to ask you how you would modify these conclusions in thinking about the age of
the respondent, which has come up, and the occupation of the respondent. We have
heard a lot about the importance of occupation. In other words, it seems to me,
and we have heard this already, that this should change through time as you get older.
Attainment is well documented and relatively easy to elicit from the respondent,
but achievement is another story.
I am wondering if achievement, in a sense, gives way in importance to
occupation as people get older. I would think it is not
perfect by any means. We can all think of lots of examples of individuals who
are captains of industry or whatever and didn’t finish high school or went to
Reed College for one semester. Do you see what I am getting at?
DR. WONG: I think they are all related. We believe there is a latent
construct in socioeconomic status, and that is why they are all related. But I
think that in all the work that we do we know that they are not identical
measures; occupation isn’t the same as education. Each measure has some
nuances and provides information on its own. In terms of achievement,
the way I am thinking about it, certainly if you look at school
achievement, you are talking about the neighborhood that you are in. You can be
a very wealthy family or a very poor family and you can both be in the
same school. But it is the people who are surrounding you who matter. It is
kind of like a neighborhood effect.
But I think you are also measuring other things about the family and
what the family is doing to invest in education. And you are also getting into
issues like human ability, which will predict later on what occupations they
go into, but not 100 percent, because obviously IQ doesn’t predict 100 percent
what your income or your occupation will be; there are other aspects that
influence your socioeconomic trajectory.
DR. SONDIK: I understand that and I understand it in the sense of a
particular study that is aimed at studying a particular issue, the kind you
were just getting at. But I am also thinking of this in a Larry Green context
if I can put it this way. These national surveys are very
general purpose and have a limited amount of resources in terms of the time you
can spend with the respondent. For these general purpose
studies like HIS, which is extremely valuable in many ways, we have to
choose. What should we be aiming for? Or maybe we should have something that is
a more adaptive standard. For example, it never occurred to me, frankly, to think
about having a question on achievement in the HIS. Perhaps we
should think about this for people under the age of X. This is something that
we should be thinking about for children, teenagers, and very
young adults. Maybe this is something we should think about, and think about it,
of course, in place of doing something like occupation.
In some sense these standards strike me as a rather rigid way to look
at things. Maybe we should also think about being more flexible in this and
being able to adapt the technology: being more flexible in our use of technology
and in the idea of what a standard actually is.
DR. MAYS: Dr. Hout, I see he is trying to get in and I know he does GSS.
DR. HOUT: It is not about the GSS. Actually I have invoked Bob Hauser’s name
before. He spent most of his career analyzing the high school graduates of the
State of Wisconsin in 1957. Two years ago he and, I believe, Alberto Palloni
published a paper that showed the best predictor of longevity. It
started out as an early-career wage study and now it is a study of
The best predictor of both BMI and premature death was high school grades.
If I can make a recommendation: we know nothing about how well
people in their 60s can recall their high school grades. But if we could
somehow validate people’s recollections of their high school grades, we could
perhaps add that kind of achievement-based predictive power to surveys of older
people, if they can recall their high school grades, by asking them, were
you an A student, a B student, a C student? It turns out that the A
students live longer and outperform the C students on a number of health
behaviors. I am pretty sure it is the Hauser/Palloni study. It is the one typically
on measure. They also have IQ scores for them, by the way. The high school
grades do better than the IQ score. They have some argument about showing up and
paying attention being more important than just being smart. But however you want
to read that result, I think it is going to be easier for a person to
self-report their high school grades than their IQ.
DR. WONG: Let me add another point, just to argue why. If you want to do what
is easiest, just measure attainment: years of education and degrees. That is
obviously the easiest thing to do. We are talking about doing something extra.
Obviously that is going to take more time and money and effort. But I
think it is really important here. We can refine every measure: occupation,
income and wealth. But education is something that is potentially
very malleable through educational policy. And in thinking through all the things
that are happening now with educational policy and where we are going, being
able to measure that impact I think would have a lot of policy relevance. It
would be difficult to go back and ask a 60-year-old today what
happened to them back in high school, but I think that is important and maybe there
is a way to get it.
But we are talking about the future of these measures. We have an
opportunity now, as we are seeing a lot of education reform going on. Now that
we have standardized tests, we can take advantage of that linkage and be able to
track that data, which I think we could have. Yes, it would be more expensive to
do, but I think it would be extraordinarily valuable.
DR. MAYS: I think we have time to take a question if there is someone in the
audience who has a question. Let me ask one final thing, because I think that we are
getting near the end of the time. Part of what I think is an issue here,
and I am going to push, push, push at it, is quality. I am going to play the
devil’s advocate because I know the costs of the surveys and the difficulty of
them, especially for things that require recall. Can you comment, Dr. Hout and Dr. Bauman,
on what you find is the quality of the data, and how far back do you usually go
before you start worrying about whether you are getting good quality or not?
DR. HOUT: Actually, we have turnover data in the GSS, similar to what Dr. Bauman
has, on occupation and education as well as the parents’ occupation
and education, and the very surprising finding (by the way, we are replicating
other people’s findings, and this literature goes back to the ‘60s) is that people
report their parents’ characteristics almost as reliably as their own.
You get these reversals, these little neighborhood effects: is it 10 years
or 11 years? Is it a vocational college or a junior college? Those kinds of reversals
between repeated waves separated by two years in the General Social Survey
are only slightly bigger on mom and dad than they are on the respondent.
And the bias tends to be that every mom and dad graduated from high school.
There is a piling up of that, and they tend to do it at one interview or the
other, and presumably also sometimes at both, although that makes it appear
reliable. That is a bit of a bias in the parents’ reports.
Back to the parents who are co-resident: for people under
40 in the General Social Survey, almost a quarter of them have missing data on
dad. They don’t know how far he went in school. They don’t know what his main
occupation was, and presumably it is because he wasn’t there.
DR. MAYS: Alright. Dr. Green, do you want to —
DR. GREEN: Let’s go eat. We will reconvene at 1:30.
DR. MAYS: Let’s give our participants a hand.
(Whereupon, a luncheon recess was taken.)
A F T E R N O O N S E S S I O N
DR. MAYS: Good afternoon everyone. We are going to get started this
afternoon with one of our panels on income which is I think very critical to
this discussion. We really welcome and appreciate those of you who are joining
us. In terms of how we will proceed, we are actually going to do it in the
order that we have unless there is any reason that you want to switch around.
If not we are going to start with Connie Citro. Those of you who know about the
Committee on National Statistics know Connie Citro’s name. Many of us who know
their work know Connie via that. Of course that is with the National Research
Council. Without further ado, Connie, thank you very much for joining us today.
DR. CITRO: Thank you so much. I am just delighted to be here to talk to this
group about — my topic was the income section of this whole collection of
attributes that are often known as socioeconomic status. I am the director of
the Committee on National Statistics. But before I say a little bit about the
committee I wanted to give you my bottom line. That is what they say: bottom
line first. Then you tell them. Then you tell them again. It is critical, but
totally challenging in terms of respondent burden and data quality, to include
good income measures in federal health and health care surveys. And I am also
going to make an argument that there should be a really good attempt to
approximate the new supplemental poverty measure.
It is just as critical in my opinion not to reinvent the wheel. A lot of
health and health care surveys do include income and then we have surveys you
are going to be hearing about in the Census Bureau and other agencies and there
is an amazing amount of variation in how those questions are asked, who they
are asked of, whether it is categories or continuous amounts. While some of
that variation may be justifiable, I think a lot of it is just because we are
all too busy in our own groups and we need to work together.
The Committee on National Statistics, very briefly. I should say Mike Hout
is a member of the committee right now, so we have a two-fer here. It was
established in 1972 at the National Academies to improve the statistical
methods and information on which public policy decisions are based. We do
studies in a whole variety of areas, as you can see, and we try in a small way
to serve as a coordinating force in a highly decentralized US federal
statistical system. In HHS itself there are a whole lot of agencies that do
statistics. The Office of Management and Budget estimates there are 14
official statistical agencies, of which one is the National Center for Health
Statistics, but there are another 70 or 80 agencies that do substantial data
collection, analysis, publication, et cetera.
We have done over the course of CNSTAT a lot of relevant work on income and
poverty measurement. This report Measuring Poverty: A New Approach is the one
that laid the groundwork for what is now the supplemental poverty measure. We
have also done a lot of work on small-area income and poverty estimates using
models with surveys and administrative records. We have looked at a number of
the major surveys that are relevant to the discussion here.
Joan Turek and ASPE sponsored a workshop that we did, Providing National
Statistics on Health and Social Welfare Programs in an Era of Change. That was
when, as you might remember from the date, PRWORA, the welfare reform act, went
into effect and made it much harder to collect data on welfare benefits because
program names, the types of benefits, et cetera were changed quite
We have done a lot of work with the American Community Survey, which is the
replacement for the Census long-form questionnaire, so that every year instead of
every 10 years you get detailed data, including income, occupation, and education,
for small areas. But there are a lot of issues associated with using those data,
having to do with the fact that the survey is continuous, which is great in one way but can make the
estimates sometimes a little tricky to figure out.
We have done a lot of work on the survey of income and program participation
which HHS supports. This is a Census Bureau survey, but they contribute to it.
And it is a major source of data on the subject in its title, and it has had a
number of issues over the years on which we have worked. We also did a review of
the National Children’s Study, which, as I understand it, is changing as I speak.
What have I and these study committees and
the committee as a whole learned over the years about income measurement?
These next couple of slides are obvious, I am sure, but it is important. When
we are hearing about current income, it is not necessarily the best measure
of socioeconomic status. It varies a lot year to year. It has lots of reporting
problems. But it is really needed whether or not you are going to put it into a
socioeconomic status measure. It is the most commonly used measure of the
economic condition of the population nationwide for geographic areas and
Often people look at median household or family income and just as often
they look at multiples of the poverty threshold which is where the new measure
comes in. The Gini coefficient is another, although less often cited, statistic
on income inequality. And for all of these measures, I should say, you need
continuous income data. Getting just categories, like under $20,000 or
$20,000 to $50,000 or something, does not cut it for these important purposes.
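To make concrete why continuous amounts matter for the statistics listed above, here is a minimal sketch computing the median, the income-to-poverty ratio, and a Gini coefficient from individual dollar amounts. The incomes and the threshold are invented for illustration, not official figures, and the Gini uses the standard sorted-rank formula on unweighted data, whereas published estimates use survey weights.

```python
"""Hypothetical sketch: statistics that need continuous income amounts.

The incomes and the poverty threshold are invented for illustration; they
are not survey data or official thresholds, and no survey weights are used.
"""
from statistics import median


def gini(incomes: list[float]) -> float:
    """Gini coefficient via the sorted-rank formula
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, for ranks i = 1..n."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one income and a positive total")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def poverty_ratio(income: float, threshold: float) -> float:
    """Income as a multiple of the poverty threshold (1.0 = at the threshold)."""
    return income / threshold


incomes = [12_000, 18_500, 26_000, 41_000, 55_000, 73_000, 120_000]
threshold = 23_000  # hypothetical threshold for one family type

print("median:", median(incomes))        # 41000
print("Gini:", round(gini(incomes), 3))  # 0.382
print([round(poverty_ratio(x, threshold), 2) for x in incomes])

# If the survey only recorded "under $20,000", the $18,500 household's ratio
# could only be bounded between 0 and 20_000 / 23_000 (about 0.87), and the
# Gini coefficient could not be computed without extra assumptions.
```

With brackets alone, each of these quantities can at best be bounded or modeled, which is the point being made about categories not cutting it.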
Also, many federal programs are tied generally to the poverty threshold or
multiples of the poverty threshold and have detailed income provisions for
eligibility. And agencies very often want to take a look at who is eligible out
there and who we know is participating: what is the take-up rate, what are
the characteristics. The USDA Food and Nutrition Service does this all the time
for SNAP, otherwise known as food stamps.
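A take-up rate of the kind mentioned above is, at its simplest, the number of income-eligible households that participate divided by the number of income-eligible households. The sketch below assumes a single hypothetical income test (income at or below 1.3 times a household's poverty threshold) and invented records; actual program rules have many more provisions, and, as noted later in this session for several programs, benefit receipt is often underreported in surveys, which would bias such an estimate downward.

```python
"""Hypothetical sketch: estimating a program take-up rate from survey records.

The records and the 1.3 x threshold income test are invented for
illustration; real eligibility rules are far more detailed, and
underreported receipt would bias this estimate downward.
"""
from dataclasses import dataclass


@dataclass
class Household:
    income: float        # annual income in dollars
    threshold: float     # poverty threshold for this household's type
    participates: bool   # whether the household reports receiving the benefit


def is_eligible(h: Household, multiple: float = 1.3) -> bool:
    """Simplified income test: income at or below `multiple` x threshold."""
    return h.income <= multiple * h.threshold


def take_up_rate(households: list[Household], multiple: float = 1.3) -> float:
    """Share of income-eligible households that report participating."""
    eligible = [h for h in households if is_eligible(h, multiple)]
    if not eligible:
        raise ValueError("no eligible households in the sample")
    return sum(h.participates for h in eligible) / len(eligible)


sample = [
    Household(15_000, 20_000, True),
    Household(24_000, 20_000, False),
    Household(25_000, 20_000, True),
    Household(40_000, 25_000, False),  # above 1.3 x 25,000: not eligible
    Household(9_000, 15_000, False),
]
print(take_up_rate(sample))  # 2 of the 4 eligible households -> 0.5
```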
You all will be doing this even more as the Affordable Care Act provisions
go into place because, like Medicaid eligibility and Medicare Part D, the ACA
ties a lot of its provisions to income, actually as a multiple of the
poverty threshold, for the subsidies for premiums and copays and so on. This is
just absolutely integral to a primary function of the federal government.
In terms of income statistics to understand relationships here is where it
does get very complicated. In addition to what other speakers such as Mike
have talked about, Jim Smith at RAND and colleagues, using the Health and
Retirement Study and other longitudinal surveys, have tried to
sort out the tangle of whether it is health that makes us poor or poverty that
makes us unhealthy, or some combination of both.
One finding that I think seems pretty sensible and justifiable is that if you
have a major health shock in adulthood or in retirement, it is going to
generally affect your economic status, not just through the cost of the medical care,
but through lost earnings, opportunities or whatever.
But in terms of sort of the health status that you start out with say as an
adult it turns out that current income or wealth doesn’t relate to that nearly
as much as your education and what your childhood family status was. If your
family was higher income relative to poverty and your own education is good, those
are the things that seem to explain your current health as an adult.
Compared with education, and you have heard arguments for
occupation, income is not the one you would pick if you were trying to measure SES
itself and you had to pick one or two of those. But you obviously need it for
all kinds of, again, program evaluation, description, looking at things like
the economic effects of the ACA, episodes of illness, disparities, and access
to health care and all kinds of things.
The rub is that income is very difficult to collect. We take it for granted,
I think. All these reports come out. The media is flooded. Here is the median
income. It has gone up or down. And underlying that is a lot of hard work on
the part of the statistical agencies and also a lot of still noisy data. People
just plain do not report income sources that they get and then they often
underreport the amount even if they give you the source and sometimes by
substantial amounts. People often just leave income questions blank. Either
they don’t know or they refuse. And then, to use income to get the new poverty
measure, which is a much better measure for your purposes than the official
measure, the problem is that you need even more information.
Just to give you some highlights of a lot of work that is out there it is
not easy to get administrative benchmarks to compare against survey reports.
But the work has been done. Administrative records have their own flaws, but
they do generally know when they have paid out a benefit to somebody. That may
not be the right somebody, but they generally know that.
Comparing the CPS Annual Social and Economic Supplement, which is our
official source of poverty and income statistics, the estimates are that it
falls short of comparable administrative benchmarks by about 10 percent, and it
gets worse from there in other surveys.
SIPP oddly enough is quite good relative to CPS in getting things like
welfare benefits, but it falls short on earnings oddly which is one of the
generally best reported subjects. The Health Interview Survey and MEPS are also
lower than CPS. And in the Health Interview Survey they seem to be particularly
missing income at the bottom of the distribution. As for the citations I have here,
I will be happy to provide the full citations for the record should you want
Again, earnings are well reported in surveys. But, again, looking at our best
income surveys, CPS and SIPP, TANF benefits, that is, welfare after welfare reform, are
drastically underreported. Regular Social Security, old age and survivors, is
well reported, but disability is underreported. Unemployment insurance is
underreported and on it goes.
Property sources of income are also underreported, but I think that is of
less concern here because we are really more focused, I think, on the middle
and lower end of the distribution, rather than on whether we are getting the top 1
percent exactly right or not.
Also, because people sometimes tell you they have an
income source but then do not tell you the amount, or for other reasons,
a lot of the income data that are published in our official statistics
are imputed. I won’t say made up. They are not just made up.
Imputation is a very careful process to try to replicate and get the
And there is evidence, from work John Czajka and colleagues have done, that
seems to show that the imputations are doing a pretty good job there. But a
third of the income in CPS, SIPP and HIS is imputed, and over 40 percent in
MEPS. In CPS there are actually three opportunities for people not to report.
They either don’t answer the labor force survey, which is the main component, or they
don’t answer the income supplement at all, or they leave some of the individual
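The imputation methods are not detailed here, but one widely used family of methods is hot-deck imputation, in which a nonrespondent's missing amount is filled in with the value reported by a similar respondent (a "donor"). The following is a generic, hypothetical sketch of a random hot deck within adjustment cells; it is not the Census Bureau's actual CPS or SIPP procedure, and the cell variables and records are invented.

```python
"""Hypothetical sketch: random hot-deck imputation within adjustment cells.

A generic textbook illustration, not any agency's production procedure;
the cell variables and records below are invented.
"""
import random
from collections import defaultdict


def hot_deck(records, cell_keys, target, seed=0):
    """Fill missing `target` values with a value drawn at random from a
    reporting donor in the same cell (records sharing `cell_keys` values)."""
    rng = random.Random(seed)
    donors = defaultdict(list)
    for record in records:
        if record[target] is not None:
            donors[tuple(record[k] for k in cell_keys)].append(record[target])

    completed = []
    for record in records:
        row = dict(record)  # copy so the caller's data are left unchanged
        row["imputed"] = False
        if row[target] is None:
            pool = donors.get(tuple(row[k] for k in cell_keys), [])
            if pool:  # a real system would collapse cells rather than give up
                row[target] = rng.choice(pool)
                row["imputed"] = True
        completed.append(row)
    return completed


people = [
    {"age": "25-44", "educ": "BA", "earnings": 62_000},
    {"age": "25-44", "educ": "BA", "earnings": 58_000},
    {"age": "25-44", "educ": "BA", "earnings": None},
    {"age": "45-64", "educ": "HS", "earnings": 35_000},
    {"age": "45-64", "educ": "HS", "earnings": None},
]
for row in hot_deck(people, ["age", "educ"], "earnings"):
    print(row)
```

Flagging imputed values, as the `imputed` field does here, is what makes it possible to report shares like the one-third and 40 percent figures cited above.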
Then there are complications here from the new Supplemental Poverty Measure.
It is a much better tool for policy analysis and research than the official
measure. The official measure, which you know, is Mollie Orshansky’s minimum
food budget times three for everything else, based on 1955 consumption data, by the way,
updated for inflation and compared with before-tax money income. It is a decent enough
general economic barometer. It goes up when the economy crashes and
vice versa. But it is totally unsuited for policy and program evaluation
because it doesn’t count any of our in-kind benefit programs. It doesn’t take
account of medical costs or benefits. It doesn’t take account of support that
comes through the tax system. It is as if the earned income tax credit, food
stamps, et cetera don’t exist.
To get this new poverty measure which has just come out for the first time
officially this last year, you do though need some more information. You need
family relationships. You need employment status of the parents. You need
in-kind benefits, taxes, medical out-of-pocket expenses, which our panel labeled MOOP (and
that acronym has stuck), child care, and child support payments. Although I have
learned that in the actuarial field in health insurance there is actually —
they have a MOOP too, which isn’t quite the same thing here.
I have said that income is essential, but difficult. What do we do? Here are
some principles that I would offer to you. First of all, our health and health
care related surveys are just that. They should not burden respondents with the
detail that is included in our major income surveys, and actually, generally, they
don’t. But the goal ought to be, I think, a minimum question set that research
shows gets you reasonable estimates, and perhaps a kind of mid-range, somewhat
longer set for when you want more detail, such as in MEPS, which is about
expenditures and where you would like more income detail for
analytical power. But in general a minimum set, and again maybe a middle set,
but not the 60 income types or the 50 or whatever that are on the CPS and SIPP.
And also, again, there are so many variations out there. The age of people
whose income is asked about varies from 15 to 18. Whether bracketing is used to
get response. Whether categories or continuous. How many income types? There is
tremendous variation that I think is not helpful in trying to make comparative
analysis and being sure of what you have. I really strongly urge a stronger
effort on the part of the department working with others, principally Census
and BLS, to develop this minimum set of questions. If there are things that are
already done on health surveys that the folks there think really have some
justification, then perhaps those should feed back into our major income surveys.
But there really ought to be a way where when you are trying to compare across
the datasets to see what you have, you are not constantly having to say well
maybe it is this wrinkle or that wrinkle is why I am not getting the same
I put on the table that the ACS asks the eight questions that were on the Census
long form, plus total income, and they have worked pretty well. They might be a
Also, it is critical even if your survey is focused on an individual
respondent and not the family, you have to have the family income because that
is what the resources are that are available.
Now, approximating the SPM. I think this is important, not just
to have it come out from the CPS, and I assume it will come out from SIPP in due
course, but to approximate it in these other surveys without weighing them all
down. I would say the key information is family size and composition. The SPM
includes cohabitors and their kids as part of the family, also foster children.
Actually, the major health surveys, I believe, are ahead of the Census
Bureau in this regard because they have been doing this for a while.
Employment status of parents. You have to know if the only parent or both
parents are working. Just a yes/no on participation in the major in-kind
programs: getting food stamps, public housing, et cetera. MOOP. Of course,
MEPS: that is what it is all about. It is getting the whole range of expenditures,
both those covered by insurance and those you pay. But in other surveys the CPS
has found I think to its surprise that two fairly simple questions on premiums
and other MOOP work pretty well. I think the HIS also has the two questions. I
would bring that to your attention.
Given that you have got that information much of which is already on many of
your surveys, family composition, employment status, et cetera get the Census
Bureau to develop a calculator which I think can be done for estimating the SPM
with the information from the survey which is going to include cash income plus
in-kind benefit participation, et cetera. And I really think it should be
within the realm of the possible to do a calculation that the health care agency
can just plug into its survey and get, while not a perfect equivalent of the SPM,
a better way of sorting people out than the official measure does.
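The calculator idea can be sketched roughly as follows. This is a hypothetical illustration, not the Census Bureau’s SPM methodology: the class and field names are invented, in-kind benefits are assumed to be already valued in dollars, and thresholds are supplied by the caller rather than computed. It contrasts the official resource definition (before-tax money income) with an SPM-style definition that adds in-kind benefits and subtracts taxes, work and child care expenses, MOOP, and child support paid, the items listed earlier.

```python
from dataclasses import dataclass


@dataclass
class FamilyResources:
    """Hypothetical annual dollar amounts for one family (field names are illustrative)."""
    pretax_cash_income: float
    in_kind_benefits: float = 0.0     # e.g. food stamps, housing assistance, valued in dollars
    net_taxes: float = 0.0            # taxes paid minus refundable credits such as the EITC
    work_and_child_care: float = 0.0  # work-related and child care expenses
    moop: float = 0.0                 # medical out-of-pocket spending, including premiums
    child_support_paid: float = 0.0


def official_resources(f: FamilyResources) -> float:
    # The official measure compares before-tax money income with the threshold.
    return f.pretax_cash_income


def spm_like_resources(f: FamilyResources) -> float:
    # SPM-style resources: add in-kind benefits, subtract taxes and necessary expenses.
    return (f.pretax_cash_income + f.in_kind_benefits - f.net_taxes
            - f.work_and_child_care - f.moop - f.child_support_paid)


def is_poor(resources: float, threshold: float) -> bool:
    # The threshold is supplied by the caller; this sketch does not compute SPM thresholds.
    return resources < threshold


# Invented example families, not survey data.
family_with_kids = FamilyResources(pretax_cash_income=20_000, in_kind_benefits=6_000,
                                   net_taxes=-3_000, work_and_child_care=2_000, moop=500)
elderly_couple = FamilyResources(pretax_cash_income=15_000, moop=4_000)
```

With a placeholder threshold of $22,000, the invented family with children counts as poor under the official resource definition (20,000) but not under the SPM-style one (26,500), while MOOP pulls the invented elderly couple’s resources from 15,000 down to 11,000. The numbers are made up; they only illustrate the direction of the effects on children and the elderly discussed in this testimony.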
The official measure right now understates how much we are helping children.
We still aren’t helping them enough, but we are helping them in ways that are
just not in there. The official measure actually overstates how much we are
helping the elderly because it doesn’t take account of their medical expenses.
Having that better in your surveys is, I think, going to be very helpful.
I had just a couple of other thoughts. We do have this income quality issue.
Health care surveys are not the place to try to solve that. It is really the
Census Bureau and CPS and SIPP. I think we are at the place now where we can
and should be able to use administrative records to do something: calibrate,
substitute, or impute better, to get the CPS and SIPP income distributions
more accurate, which will principally affect the lower end of the distribution,
welfare, unemployment insurance, et cetera, and the upper end of the distribution.
I think there is really no excuse not to have these better estimates.
Given that the flagship income surveys do this then it should be possible to
do some calibration that you all in the health care surveys can then draw on.
But they have done a lot of work about this. But the work has never quite
gotten from we are actually ready to roll to implement something in the
The other one — we had some mention about neighborhood characteristics.
There is now a lot of literature that this really matters. The ACS is sitting
out there with small-area data on things like median income and poverty.
And appending that, not to get small-area estimates, but to get the
characteristics associated with the people and families that are in your health
care survey, I think would be very valuable. There are confidentiality issues.
But I think those can be worked out, particularly if you have essentially a
national survey and you are putting characteristics onto individual records.
There is not a huge amount of additional identification from the neighborhood
There is also non-survey data increasingly out there that you may want to be
on the lookout for, like the food desert indicators that the USDA is developing
based on grocery stores and neighborhoods and so on which you can now get right
off the internet.
That is my food for thought. I look forward to
participating in the discussion, and thank you all for your attention.
DR. MAYS: Thank you very much. Dr. Dahlhamer, welcome. One of the things we
have tried to do is to make sure that we also hear from some of our surveys
that are in-house. And Dr. Dahlhamer is with the National Health Interview
Survey. Welcome and thank you very much.
DR. DAHLHAMER: First, I want to thank the committee for allowing me to
participate in the session today. As you can see from my title, I am going to
be focusing on the collection of income data in the National Health Interview
Survey. In particular, I am going to talk about some recent changes we have
made, some design changes to the collection of that information to hopefully
improve the quality of the data we are collecting, but also some changes we
have made to the data as we release it to the public we hope has improved the
usability of that data.
Quickly here is an overview of what I am going to talk about. I am going to
give a very brief overview of the National Health Interview Survey for those
who are not familiar. Kind of give you a high-level overview of some of the SES
measures or data we collect in the NHIS. Like I said, I am going to drill down
and focus on the collection of income data. And, again, in particular some
changes we have made to the collection and release of that data in recent
years. I think as I go through that, I will address some of
the committee questions that were posed to me in terms of some of the design
and methodological challenges and issues in collecting and releasing this data.
And then time permitting I had a few general responses to some of the other
committee questions that I would like to go over.
Very briefly, the National Health Interview Survey is an in-person household
interview survey. We do allow for some telephone follow up in cases where we
need to complete missing portions of the interview. The interviews are
conducted by trained interviewers with the United States Census Bureau using
computer-assisted personal interviewing. Assuming we are not having sample
additions or subtractions in any given year the expectation is that we will
complete interviews in roughly 35,000 households a year.
Now in terms of the basic structure of the survey instrument, we have an
opening section we call household composition section where a household
respondent provides basic socio-demographic information on all members of the
household. This includes things like age, sex, race and ethnicity. Then for
each family within a household, a knowledgeable adult 18 or older
provides additional socio-demographic information as well as general health
related information on all members of that family. You have a mix of self and
proxy report. That is what we call our family interview. And then we also
randomly select an adult 18 and older and a child under the age of 18 to
receive an additional set of health-related questions, what we call our sample
adult and sample child modules.
Now in terms of the collection of SES in the NHIS the bulk of this occurs
toward the end of the family interview in what we call our family
socio-demographic and our family income sections. In our FSD section we collect
information on education for all persons 5 and older, employment status for all
persons 18 and older, and then for persons 18 or over who were employed in the
previous calendar year we attempt to collect their personal earnings from that
prior calendar year and that is an exact amount. We are looking for continuous
income data here.
Then after the FSD section we have our family income section. We ask about a
variety of income sources. These are receipt questions. These are yes/no. This
is an issue that will come up in the discussion of the HIS. We don’t ask for
amounts on individual income sources. We simply ask for receipt: did any family
members receive those sources of income? Yes or no.
Then we ask a total family income question, a global question where we
attempt to collect total family income for that family in the prior calendar
year. Again, that is an exact amount question. And then we close the section
with questions on housing tenure and program participation.
Just one last thing that I wanted to point out, because I heard it kind of
bandied around the room before the session, is that in our sample adult
interview in the opening section we also collect occupation and industry
information on a primary job. There are also detailed follow ups on things such
as number of employees at the job, tenure, paid sick leave, paid by the hour,
et cetera. We do collect detailed occupation or release detailed occupation and
Primarily what I want to do here is focus on the collection of income data
particularly the collection of family income. And, again, I think the emphasis
here or the approach that I am taking should address some of the questions that
were posed by the committee. What are some of the methodological issues facing
surveys in collecting and releasing this data? What are some of the best
practices or best ways to collect that data? And how can we improve upon the
quality of the data?
I want to start out and just throw up on the screen the question that we
have used to collect total family income from ’97 through 2006. Now before you
get too confused by the information in the squiggly brackets this is just
indicating that there is a fill. In other words, depending on some information
we have on that family we may fill different items in the question text. For the
first one you see there, we would fill either “for you” if it was a single-person
family or “of your family” if it was a multi-person family. The same with last
calendar year, in 4-digit format: if I were interviewing you using this question in
2006, it would fill 2005 in the question. Essentially what we were trying to do
here is to get an amount from that family respondent, again, focusing on the
prior calendar year.
Consistent with what Connie pointed out in terms of the difficulties in
collecting income information this shows you basically the response rate to
that total family income question. As you can see it hovers just below 70
percent. Take the inverse of that and you are looking at your item nonresponse rate to total
family income over this period. We are looking anywhere from about 31 to 34
percent. Potentially problematic if you are working with this data not only in
terms of the bias that could be introduced into your analysis, but the impact
on the precision of your estimates.
One of the ways that we try to tackle this in the field is through some set
of follow-up questions to initial non-responders to the exact amount question.
This is the question that we had in place for quite some time, again, up
through 2006. Long story short, what we were trying to do, at least in the very
beginning, was ask: tell us if your income is $20,000 or more or less than $20,000. If we
got a refusal or a don’t know here, that was it. They were out. We were done. We
weren’t collecting any more income amount information.
If they gave us one of these two responses, then we had an additional
follow-up question where the interviewer would hand the respondent a flash
card, and on that flash card was a series of detailed income intervals.
And they would ask the respondent just give me the letter associated with the
interval that you think your income falls in. And there were different
intervals used depending on whether they had previously answered less than
$20,000 or if they answered $20,000 or more. Now you could probably get a feel
from looking at that those intervals are pretty detailed. Thousand dollar
intervals. You probably know where I am going with this.
The gist of the story is that we weren’t getting much with those follow-up
income questions. This is the same graph that you saw earlier. The green line
over the top is your response to the exact amount question. Just under 70
percent of our families gave us an amount. If you look at the navy blue line,
these are families who didn’t give us an exact amount. They told us that
their income was $20,000 or more or less than $20,000 and that was it. They
were done. They wouldn’t give us anything more. That is a substantial
proportion of your families where you really don’t have much in the way of
usable income information.
And then finally if you look at the bottom, the yellow or gold there, these
are folks who, again, we didn’t get an exact amount, but they at least
completed both of the follow-up questions. We got some income detail on them.
But as you can see, it is a relatively small percentage of all of our families.
The feeling here was that these follow ups really aren’t doing much for us in
terms of getting some partial income information we could use for these
We decided to do a small scale field test in the second quarter of 2006. And
the primary emphasis here was on these follow-up income questions. What we
decided to test was a series of unfolding bracket questions. Essentially what
would happen is if the respondent said don’t know or refused to that exact
amount question, we would follow up and ask them can you tell me if your total
family income is $50,000 or more or less than $50,000. I know this diagram is
kind of hard to follow so we are just going to go across the row on the top.
If they said less than $50,000, we would follow up with a question. Is your
total family income greater than or equal to $35,000 or less than $35,000? If
they said $35,000 or more, we were essentially done. We bracketed them between
$35,000 and $50,000. If they said less than $35,000, what we did is we asked
them an additional follow up. This would be the last follow up they would get
and we did it in relation to the poverty threshold for that particular family
size, family composition. We could at least categorize them at or above poverty
or below poverty.
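The unfolding sequence described above can be written as a short decision procedure. This is a sketch under stated assumptions: the `ask` callback stands in for the interviewer asking the question, returning `True` for at or above a cut point, `False` for below, and `None` for don’t know or refused; the follow-ups after a $50,000-or-more answer are not described here, so the sketch stops on that branch; $0 is used as an illustrative lower bound; and skipping the poverty question when the threshold is already $35,000 or more is an assumption, not documented NHIS behavior.

```python
from typing import Callable, Optional, Tuple

# True = at or above the cut point, False = below, None = don't know / refused.
Answer = Optional[bool]
# Income bracket [lower, upper); upper None means no upper bound.
Bracket = Tuple[float, Optional[float]]


def unfolding_brackets(ask: Callable[[float], Answer],
                       poverty_threshold: float) -> Optional[Bracket]:
    """Walk the follow-up sequence for a family that gave no exact amount."""
    first = ask(50_000)
    if first is None:
        return None                      # no partial income information at all
    if first:
        # The follow-ups on this branch are not described, so the sketch stops here.
        return (50_000, None)
    second = ask(35_000)
    if second is None:
        return (0, 50_000)               # $0 is used as a floor for illustration
    if second:
        return (35_000, 50_000)          # bracketed between $35,000 and $50,000
    if poverty_threshold >= 35_000:
        # Assumption: the family is already known to be below this threshold.
        return (0, 35_000)
    # Last follow-up: at or above the poverty threshold for this family?
    third = ask(poverty_threshold)
    if third is None:
        return (0, 35_000)
    return (poverty_threshold, 35_000) if third else (0, poverty_threshold)


# Simulated respondent with a made-up income of $28,000 and a made-up threshold.
print(unfolding_brackets(lambda cut: 28_000 >= cut, poverty_threshold=20_000))
# prints (20000, 35000)
```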
Long story short, in the field test we got
considerably better response from these revised questions. We decided to move
forward and adopt this series of follow-up questions with the 2007 instrument.
In addition, as part of that field test we also did some tweaking and a
revision of the wording to the exact amount question which you see here which
we have subsequently adopted and have been using since 2007. I can tell you that
during the field test that didn’t have a huge impact in terms of
improvements of response to the exact amount question. We did get a little
boost. But really the bang for our buck was coming from the follow-up
And this slide is again similar to the earlier slide, but now we have added
in 2007 through 2010. If you focus on the navy blue and gold lines, the
proportion of families where we only get some partial income
information, who didn’t give us an exact amount and didn’t complete the full
follow up, drops dramatically when we introduce the new questions in ’07.
The proportion giving us complete follow-up information, when we are not getting
the exact amount, goes up pretty dramatically.
The other thing that is interesting here is that over time there is an increase in the
percentage of families who are giving us an exact amount. It seems like a good
story at face value, but I have some issues or concerns — if we can come back
to that possibly later, we will do that.
Another way to look at this is we are collecting more partial income
information. How usable is it? Can we take that information, compute the
poverty ratio, and then categorize these families into basic categories of
poor, near poor, or not poor which are standard categories we use in official
NCHS publications? You can see we have a dramatic increase in the percentage of
families that we are able to place into these categories with the addition of
the new unfolding brackets.
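The categorization step can be sketched as follows, using the NCHS definitions: poor is below 100 percent of the poverty threshold, near poor is 100 percent to less than 200 percent, and not poor is 200 percent or more. The function names are hypothetical. A family with an exact amount is classified from its income-to-poverty ratio; a family with only a bracket can be classified only when the whole bracket falls inside one category.

```python
from typing import Optional


def category_from_ratio(ratio: float) -> str:
    """Map an income-to-poverty ratio onto the three NCHS categories."""
    if ratio < 1.0:
        return "poor"
    if ratio < 2.0:
        return "near poor"
    return "not poor"


def category_from_bracket(lower: float, upper: Optional[float],
                          threshold: float) -> Optional[str]:
    """Classify income known only to lie in [lower, upper).

    Returns None when the bracket straddles 100 or 200 percent of the threshold.
    """
    if upper is not None and upper <= threshold:
        return "poor"
    if lower >= 2 * threshold:
        return "not poor"
    if lower >= threshold and upper is not None and upper <= 2 * threshold:
        return "near poor"
    return None
```

With a made-up threshold of $20,000, the bracket $20,000 to $35,000 is near poor, but the $35,000 to $50,000 bracket straddles 200 percent of the threshold ($40,000) and cannot be placed. A follow-up question at 200 percent of the poverty threshold is the kind of addition that resolves such cases.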
Now you might say you still have about 12 to 15 percent of families that
you can’t even place into these three basic categories of poverty. Or you might say, I want
to work with continuous income data. As Connie mentioned, if you are a policy
analyst, you are interested in looking at things like program eligibility. You
might want to look at income cut points for program benefits. You want the
continuous income data. And we still have non-response in the 25 percent range.
As most surveys do we have to impute those missing values. We impute missing
values on total family income as well as personal earnings. And we use an
approach called multiple imputation. Probably many of you are familiar with
multiple imputation. At its simplest it is the process of imputing more than
one substitute data value for each missing value.
In the case of the NHIS we impute five replacement values for each missing
family income and each missing personal earnings value. We think that MI is a
much better approach than single imputation routines because it accounts for
the extra variation due to the imputation process and incorporates it into
the analysis. It is an imputation. I will stop short of saying we are
making up data. But it is an imputation. It is not a reported value.
And the problem we run into with singly imputed datasets is that the
software program doesn’t know whether you are working with a reported or an
imputed value, and you are likely to underestimate the variance. We think this
is the right way to go.
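For readers less familiar with multiple imputation, the standard way to combine results across imputed files is Rubin’s combining rules: analyze each file separately, average the estimates, and add the between-file spread to the average within-file variance. The sketch below is generic Python, not NHIS-supplied code, and the five estimates and variances are invented.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def combine_imputations(estimates: Sequence[float],
                        variances: Sequence[float]) -> Tuple[float, float]:
    """Rubin's rules: pool one estimate and its sampling variance from each imputed file."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching estimates and variances from at least two imputations")
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Made-up results from analyzing each of five imputed files separately.
est, total_var = combine_imputations([10, 12, 11, 13, 9], [4, 4, 4, 4, 4])
print(est, total_var, math.sqrt(total_var))
```

In this example, treating any single imputed file as if it were reported data would use only the within-imputation variance of 4 instead of the combined variance of 7, which is the underestimation described above.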
Subsequently we release five imputation files for each year of NHIS data. They
usually come out about 2 to 3 months after our main public use release in the
July after the year of data collection. And we have released these files going
back to 1997.
There is always a tradeoff when you make these decisions. And of course one
tradeoff here is there is increased analyst burden with working with the
multiple imputation files. Analysts now have to work with five data files as
opposed to one. It requires more computing time, more computing resources.
Now I will say this that I think current software packages are getting much
better at accommodating multiple imputation files and I think that burden is
being relaxed somewhat. But it is a consideration that a survey has to go
Some other recent changes to NHIS. One is release of continuous, top-coded
total family income and personal earnings amounts on our public-use imputed
income files. Connie made note of the Czajka and Denmead report. That was a
Mathematica report that I believe was produced under the auspices of ASPE. It was
looking at income from a policy analyst perspective across the large-scale
surveys. This was clearly a critique or criticism and suggestion for the HIS.
You are collecting this data, but you are not releasing the continuous income
data. And, again, this is vitally important for analysts. We took heed and we
are releasing that information starting in 2009 and we are gradually going to
be working back and releasing this data back through our 1997 data files.
Other recent changes on the design side that started in 2011: we have
added two additional questions to those unfolding bracket questions as follow
ups to our exact amount question, asking, one, about income in relation to $150,000.
But I think more importantly, we are also looking at 200 percent of the poverty
threshold. This is going to enable us to categorize even more of our families into
those basic poor, near poor, not poor categories even when we are just
collecting partial income information. We think that is going to enhance the
usability of the data.
Collection of personal earnings. I haven’t really talked about it. I bring it up
here in the context of its relationship with total family income in the
NHIS. This is the basic question that you see here. As I mentioned earlier, we
ask it of all persons 18 or older who were reported to have worked in the previous
calendar year. And it is asked at the end of our family socio-demographic section of
the family interview prior to our income section.
The reason why I bring this up in relation to total family income is we are
kind of unique in the way we collect our income information particularly total
family income. We ask a single, global question to collect total family income
whereas many surveys would ask about amounts for specific sources. They sum
them up. You get your total family income. You don’t get discrepancies. Well,
we have run into an issue where a small percentage of our cases or our families
we have discrepancies between earnings and total family income where earnings
can actually exceed total family income. And, again, this was another issue
that was brought to light in the Czajka and Denmead report. It is something that
we are trying to tackle right now. And we are really experimenting at this
We started in 2011 with an experimental set of verification checks or
screens to see if we can resolve the discrepancies between earnings and total
family income. What we did is take a random half of families. They were eligible
to receive this verification check if they met one of the two discrepancy
criteria that you see on the slide. We are currently analyzing that data. I don’t have
a definite answer. But I can tell you from the usability perspective the
interviewers didn’t seem to like those verification screens all that much.
Their tendency was to not use those, but to back up and try to make changes to
the original field which meant quite a bit of traversing through the
instrument. We have done some revisions to the screens to make them more user
friendly and we have beefed up our training on this for 2012, but the jury is
still out on whether this is going to work or whether there is some other design
solution that we need to explore.
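A screen of this kind can be sketched as below. The two actual discrepancy criteria were shown on a slide and are not in the transcript, so this uses one hypothetical rule: the reported personal earnings of family members sum to more than the reported total family income. The function names, the random-half flag passed in as a parameter, and the treatment of missing values are all assumptions.

```python
from typing import Optional, Sequence


def earnings_exceed_income(person_earnings: Sequence[Optional[float]],
                           family_income: Optional[float]) -> bool:
    """Hypothetical screen: reported earnings sum to more than reported family income."""
    reported = [e for e in person_earnings if e is not None]
    if family_income is None or not reported:
        return False  # nothing to compare
    return sum(reported) > family_income


def show_verification(in_test_half: bool,
                      person_earnings: Sequence[Optional[float]],
                      family_income: Optional[float]) -> bool:
    # Only families in the experimental half who trip the screen get the check.
    return in_test_half and earnings_exceed_income(person_earnings, family_income)
```

Because the NHIS collects total family income with one global question rather than summing components, nothing in the instrument forces these two reports to agree, which is why an explicit check is needed at all.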
Last two slides. I just wanted to veer off here slightly. One of the
questions that was posed to the speakers, were there other types of indicators
that we should be collecting? I wanted to mention just a little experimental
work we did with the collection of wealth. It is part of that field test I
mentioned back in the second quarter of 2006. Our feeling was the literature
seems to be telling us that wealth may be a much better measure of SES for
older adults. Is this something that we can ask in the NHIS? Is there any value
added above the traditional indicators of SES we are collecting? We included
four questions in that field test. One of them is total financial assets. We
looked at property value for home owners as well as the amount still owed on
the property and just a simple question on car ownership.
Long story short, there appeared to be some value in collecting this
information. When we did some analysis, we did some modeling. There appeared to
be some additional explanatory power above and beyond some of the traditional
SES indicators particularly as we hypothesized for older adults. The problem is
we don’t put a whole lot of stock in those findings because we had such
significant item nonresponse to these items as you can imagine. The first three
there listed were all exact amount questions. We had nonresponse approaching
50 percent for some of these items.
And for various reasons for the test we didn’t do a series of follow-up
questions to try to get them into wealth intervals. Long story short is we have
tabled those for now. But there appeared to be some value and it is something I
raised just in terms of consideration for thinking about some minimal set of
questions for health-related surveys.
Finally, I am just going to conclude by throwing out a few design
considerations. And actually I think this is going to come up in some sessions
tomorrow. I know John Czajka is speaking tomorrow. He would probably be able to
address all of these things right off the top of his head. How much detail to
collect in non-income surveys? Connie mentioned this in her talk. Where do you
draw the line? Yes, we could go and collect income amounts for each of the income
sources reported, but that is going to lengthen the income section of a health
survey. And the concern there is, do we put off respondents, so to speak? Do we
risk an increase in break offs and therefore not collect the health data that
is the crux of our survey? Where is the balance in there?
Reference periods. Should you be collecting income data from the prior
calendar year? Should it be the last 12 months? Should it be the previous
month, the previous quarter? All of these things have implications for recall
and the quality of the data you are collecting.
Mode of data collection. Can we arrive at a set of questions that we think
will work equally well across multiple modes? It may be that we have to tweak
some questions to work better in person versus by telephone. Should we
consider, as some surveys have, the use of ACASI for collecting this
type of information. At least on the HIS it is by far some of the most
sensitive data we collect in terms of item nonresponse.
And of course finally I will just mention imputation again. If you are
collecting continuous income data, the likelihood is you are going to have to
impute missing data. You have decisions to make about how you do that and what
the impact is from the standpoint of analyst’s burden on — you also want the
public to be able to use your data and use it appropriately.
I am going to stop there. Thank you.
DR. MAYS: Thank you very much. Dr. Selden.
DR. SELDEN: Good afternoon. My name is Tom Selden from the Agency for
Healthcare Research and Quality. I am here today to talk about income in the
Medical Expenditure Panel Survey. The outline of the talk today is I just want
to briefly introduce income measurement in MEPS and take a look at a very
stylized compressed version of the sorts of questions that we are asking. Talk
a little bit about imputations and talk about comparison to CPS.
I should have perhaps had an introductory slide on MEPS. The Medical
Expenditure Panel Survey is actually a follow-on after people are in the NHIS. A
subset of those is then tapped to participate in MEPS. They will then
participate for two calendar years in MEPS. We gather data through in-person
interviewing with a CAPI design. It is a smaller survey than NHIS. We gather a
tremendous array of data. Our main focus is on medical use and medical
expenditures by type of payer. And we also take a close look at insurance
coverage. But on SES we have education. We have industry, occupation. If you go
to a data center, we can give you geo-coded information that can be merged for
neighborhood variables. We have also income which I will talk about today.
We collect not as extensive an array of income sources as
CPS, but still quite extensive, especially for a health survey. Each of the
items here — I have grouped them. But it is not like we are asking those
together. For instance, interest we ask about separately from dividends, but I
put them in the same line to try to get it all in a comprehensible format on
We ask about wage/salary information, unemployment, worker’s comp, et
cetera. Also, scanning down we have some pension information, alimony, child
support. And then we have Social Security or OASDI, SSI, welfare which could be
TANF or other maybe general assistance. And then we have food stamps, but I
have been a longtime advocate for getting something on housing. We don’t yet have
We are asking those about each individual, not at the family level, but at
the individual level. We are asking it for everyone in the family over a
certain age. I believe it is 16. And the income is collected in the
spring and it pertains to the prior calendar year, much like the March CPS. It
supports calculation of total family income and percentage of the poverty line
because we try to collect exact dollar amounts. And when all is said and done,
our public use variables are continuous
measures. You can compute percentages of the poverty line.
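By contrast with the single global NHIS question, building the family total from person-level, source-by-source amounts can be sketched as follows. The source names and the poverty threshold are made up, the age cutoff of 16 follows the speaker’s recollection, and which sources count toward money income is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Person:
    age: int
    # Hypothetical source names mapped to annual dollar amounts.
    income_by_source: Dict[str, float] = field(default_factory=dict)


def family_income(members: List[Person], min_age: int = 16) -> float:
    """Sum every reported source for every member at or above the age cutoff."""
    return sum(sum(p.income_by_source.values())
               for p in members if p.age >= min_age)


def percent_of_poverty_line(members: List[Person], threshold: float,
                            min_age: int = 16) -> float:
    return 100.0 * family_income(members, min_age) / threshold


family = [
    Person(age=42, income_by_source={"wages": 30_000, "interest": 500}),
    Person(age=40, income_by_source={"wages": 12_000}),
    Person(age=10),
]
```

Here the invented family’s income is $42,500, or 170 percent of a made-up $25,000 threshold; the 10-year-old is excluded by the age cutoff.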
I will also mention financial SES data: we collect fairly
detailed asset data, although I haven’t gotten the housing question on yet. I did
get medical debt just added. We have a couple of questions. Thanks to NHIS,
actually we were able to borrow wording from you folks whereas I could never
get my wording approved. Thank you.
Then we also ask detailed employment information throughout the year. This
can matter. One of the things: I wrote a paper looking at within-year
spikes in out-of-pocket spending. Somebody might be spending over the course
of the year $5000 out of pocket. And for a lower income family that could be
quite a hit. But what if that happens all in one month? And what if that month
is the month immediately following where you stopped working because of an
illness or injury. If you don’t have savings, you are in a tough situation
especially if you don’t have insurance or if you have just lost it.
Within-year dynamics can also play an important role for thinking about
participation in ACA. In Medicaid it is based on a current countable income
notion, whereas subsidies are more based on your prior tax filings, with
some reconciliation at the end of the year. There are concerns
that within-year fluctuations in income will cause churning back and forth
between the exchange plans and Medicaid. For that reason the detailed
employment information can help us look at a more point-in-time view of
income. We align well. That is relative. I will show you later on.
We are about 10 percent low, which is better than some surveys do. CPS is the
gold standard on this, but of course CPS is itself a little bit low, at the
extreme top end, and I guess maybe at the low end too. I don’t know.
We think this is perhaps because we ask all these detailed questions, but we
have no basis on which to say so. Moreover, if you pull out the public use file,
we do really well compared to CPS, but that is because we post-stratify the
weights to match CPS. When I talk about comparisons to CPS, I mean the data just
before that last little bit of fairy dust is sprinkled on.
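Post-stratification, in its simplest cell-based form, multiplies every weight in
a cell by the ratio of an external control total to the survey’s current
weighted total for that cell. A minimal sketch with hypothetical cells and
totals; the actual MEPS adjustment dimensions and procedure are not described in
the testimony.

```python
from collections import defaultdict


def post_stratify(weights, cells, control_totals):
    """Rescale base weights so weighted counts match control totals by cell.

    weights: base weight for each respondent.
    cells: post-stratification cell label for each respondent (hypothetical,
        e.g. an age-by-sex group).
    control_totals: cell label -> target population total from the benchmark.
    """
    cell_sums = defaultdict(float)
    for w, c in zip(weights, cells):
        cell_sums[c] += w
    missing = set(cell_sums) - set(control_totals)
    if missing:
        raise KeyError(f"no control total for cells: {sorted(missing)}")
    # Every weight in a cell gets the same factor: target total / current total.
    return [w * control_totals[c] / cell_sums[c] for w, c in zip(weights, cells)]


adjusted = post_stratify([100, 100, 200], ["a", "a", "b"], {"a": 300, "b": 100})
print(adjusted)  # [150.0, 150.0, 100.0]
```

Because the benchmark is built into the weights, agreement with CPS after this
step says less about the survey’s own income measurement than a comparison made
before it.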
We have a very unconventional questionnaire design. Had there been time and
money for pretesting, this might never have happened, but there was no time or
money for pretesting. The intent of the people who developed it was to improve
income collection by getting people to use their tax forms. We were told we were
going to be fielded after April, so why not get people to pull out their 1040s
if they have them and just read down the list? There are some issues about what
is taxable and what is not, but you can usually get the non-taxable portion off
the 1040. Why not just do that? But people don’t use their tax forms when they
answer MEPS, so all the structure that was added to accommodate this one goal
is now stuff that we have to work around. We thought about replacing it with
something simpler, but there has never been any money to do that.
Here is some idea of the sort of questions that result from thinking this way.
These are not verbatim; I have simplified them so that you can read them more
like normal English. First: Has the person filed a 2009 federal income tax
return? If you say yes: Were you married filing jointly? Let’s assume you
answered yes. Who is the other taxpayer the person is filing jointly with? That
makes you a pair. Again, that is because we were going to be collecting these
amounts. During 2009, how much money did the person and the spouse, the married
filing jointly couple, jointly receive from, let’s say, wages? And then we
follow up: What percentage was received by the primary filer versus the
secondary filer?
If you do not give an exact dollar amount, we follow up with bracketed
responses that are fairly tight. I was just looking at them for wages: for
instance, $0 to $5,000, then $5,000 to $10,000, and then I think it goes up to
$20,000, so the increments get larger and larger, but it is a fairly
That is how most of our income questions are asked. Some types, though, are
asked with a family-level screener: Does anyone in your family receive SSI?
Then we follow up and ask who received it and how much they got.
Changes over time: I am going to mostly skip over this. The main thing is that
there was a skip pattern that we took out. The nice thing about removing skip
patterns is that you can put them back in and simulate what data you would have
collected had the skip patterns still been there. We compared an old editing
method and a new editing method, and we were stunned at how well we had been
doing. Removing the skip pattern introduces a little more burden, but since
people are not using a tax form, some of the skip patterns didn’t make sense;
they were really keyed to the tax form.
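Simulating a removed skip pattern amounts to reapplying the old routing rule to
data collected without it and blanking the items the old instrument would not
have asked. A minimal sketch; the skip rule and item names here are
hypothetical, since the testimony does not say which skip pattern was removed.

```python
def simulate_old_skip(responses, skip_rule, skipped_items):
    """Reapply a removed skip pattern to data collected without it.

    responses: list of dicts, one per respondent.
    skip_rule: function returning True when the old instrument would have
        skipped past `skipped_items` for that respondent.
    Returns new dicts; the originals are left untouched.
    """
    simulated = []
    for resp in responses:
        r = dict(resp)
        if skip_rule(r):
            for item in skipped_items:
                r[item] = None  # the old instrument would never have asked this
        simulated.append(r)
    return simulated


# Hypothetical rule: non-filers were routed past the interest question.
data = [
    {"filed_return": True, "interest": 120},
    {"filed_return": False, "interest": 300},
]
old_style = simulate_old_skip(data, lambda r: not r["filed_return"], ["interest"])
```

Running the old and new editing methods on both versions of the file then
isolates the effect of the skip pattern itself.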
There were also some minor changes we have made, but from everything we have
looked at, it is basically a pretty consistent measure over time.
Missing income is imputed singly. I think it is a great idea to do multiple
imputation.
Each income element is hot-decked separately. Once we have done wages and
salary, we use that to condition the imputation of, say, an interest amount: a
recipient with high earnings will be matched with a donor who has high
earnings, to preserve the correlation between the types of income.
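The sequential conditioning described above can be sketched as a simple
random-donor hot deck run twice: first for wages, then for interest, with the
earnings level (reported or already imputed) used as the class variable. A
minimal sketch under assumed, hypothetical class definitions and a two-level
earnings cutoff; it is not the MEPS production procedure.

```python
import random


def hot_deck(records, target, class_of, rng):
    """Fill missing values of `target` from a random donor in the same class.

    records: list of dicts; None marks an item to impute.
    class_of: function mapping a record to its imputation class.
    Donors are collected before any imputation, so imputed values never donate.
    """
    donors = {}
    for r in records:
        if r[target] is not None:
            donors.setdefault(class_of(r), []).append(r[target])
    for r in records:
        if r[target] is None:
            pool = donors.get(class_of(r))
            if pool:  # with no donor in the class, the value stays missing
                r[target] = rng.choice(pool)
                r[target + "_imputed"] = True
    return records


def earnings_class(r, cutoff=50_000):
    # Hypothetical two-level split; a production hot deck would use finer classes.
    return "high" if r["wages"] >= cutoff else "low"


rng = random.Random(0)
people = [
    {"age": "18-64", "wages": 90_000, "interest": 4_000},
    {"age": "18-64", "wages": 20_000, "interest": 50},
    {"age": "18-64", "wages": None, "interest": 3_500},
    {"age": "18-64", "wages": 85_000, "interest": None},
]
# Step 1: wages, with age group as the only class variable.
hot_deck(people, "wages", lambda r: r["age"], rng)
# Step 2: interest, classed by earnings level (reported or just imputed),
# so a high earner receives interest from a high-earning donor.
hot_deck(people, "interest", earnings_class, rng)
```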
We also impute the bracket responses. We don’t just take the midpoint of the
bracket; we find a donor who gave an exact dollar amount within that bracket
and do the imputation that way.
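The bracket imputation can be sketched as restricting the donor pool to
exact-amount reporters whose values fall inside the recipient’s bracket. A
minimal sketch, assuming half-open brackets and ignoring the other class
variables a real hot deck would also match on; the dollar figures are
illustrative.

```python
import random


def impute_from_bracket(low, high, exact_reports, rng):
    """Impute an amount for a respondent who gave only a bracket.

    Rather than using the bracket midpoint, draw from donors whose exact
    dollar reports fall inside the same half-open bracket [low, high).
    Returns None when no donor falls in the bracket.
    """
    pool = [x for x in exact_reports if low <= x < high]
    return rng.choice(pool) if pool else None


rng = random.Random(42)
exact_wages = [3_200, 6_100, 9_900, 14_000, 18_500, 40_000]
value = impute_from_bracket(5_000, 10_000, exact_wages, rng)
# value is one donor's exact amount (6100 or 9900), not the midpoint 7500
```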
And this is sort of a novel thing about MEPS. Suppose you told the interviewer
throughout the year that you had a salary of $43,000 at your job, and you were
in that job all year long. Then we ask for your wage/salary income. Some people
balk at that point, having already answered all the employment questions, or it
may have been a different respondent at that point, so we can’t get the income
information. But what is our best guess for the wage/salary earnings? I think
it is $43,000. That was your salary.
So we go to the employment data to help us fill in missing values, and we
produce detailed imputation flags; “relied on employment data” would be one of
the imputation categories. But we only rely on the employment data if it was
not itself imputed.
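The fallback described above can be sketched as a small decision function that
returns both an amount and an imputation flag. A minimal sketch; the flag
names, the full-year condition, and the function signature are assumptions for
illustration rather than MEPS’s actual editing code.

```python
from enum import Enum
from typing import Optional, Tuple


class WageFlag(Enum):
    # Hypothetical flag values, loosely following the categories described above.
    REPORTED_EXACT = "reported exact amount"
    EMPLOYMENT_DATA = "relied on employment data"
    NEEDS_HOT_DECK = "left for hot-deck imputation"


def annual_wages(reported: Optional[float],
                 job_salary: Optional[float],
                 job_salary_imputed: bool,
                 held_job_all_year: bool) -> Tuple[Optional[float], WageFlag]:
    """Choose the wage/salary amount and record how it was obtained."""
    if reported is not None:
        return reported, WageFlag.REPORTED_EXACT
    # Fall back on the salary from the employment section, but only for a
    # full-year job and only when that salary was reported, not imputed.
    if job_salary is not None and held_job_all_year and not job_salary_imputed:
        return job_salary, WageFlag.EMPLOYMENT_DATA
    return None, WageFlag.NEEDS_HOT_DECK


print(annual_wages(None, 43_000, False, True))
```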
One of the things about our imputation is that right away we had all these
different income sources and all these different brackets, and for each
imputation we had to decide: are we going to use the bracket information? Do we
have information from NHIS that we could at least use as a conditioning class
variable in the hot deck to help improve the precision? What sort of
information do we have on the person? What type were they? We have a gazillion
When I arrived at AHRQ, I saw people going through giant stacks of paper trying
to collapse cells, because if you try to match on 20 different dimensions to
line up donors and recipients, you might end up with a recipient but no donor.
Or you might want larger cells for various statistical reasons. You collapse
cells so that recipients and donors line up, but it could be incredibly
tedious. I looked at the number of hot decks and said, we aren’t doing this.
We developed, with a guy at SSS, a really slick piece of shareware: you tell it
your rules for implementing the collapses and it does them. If it gets stymied
at one level, it can bypass that level and continue on. It is very useful not
just for automated cell collapsing but also for what came up in the discussion
earlier, when Connie mentioned perhaps merging in additional data that wasn’t
collected, to try to get a sense of what the alternative poverty measure would
be on a different survey. To do that you have to match somehow, and matching at
the most precise level requires something like this. Other places have
variants, but I just wanted to put in a pitch for that.
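Rule-driven cell collapsing can be sketched as a fallback loop: try the most
detailed cell, and whenever it has too few donors, drop the least important
matching variable and try again. A minimal sketch, assuming a simple fixed
priority ordering of hypothetical variables; the SSS tool’s actual rule
language is not described in the testimony and is not reproduced here.

```python
def find_donor_cell(recipient, donors, match_vars, min_donors=1):
    """Return (variables used, donors) for the most detailed usable cell.

    match_vars is ordered from most to least important. When a cell has
    fewer than `min_donors` donors, the least important remaining variable
    is dropped and the search continues; with no variables left, every
    donor qualifies.
    """
    for k in range(len(match_vars), -1, -1):
        used = match_vars[:k]
        key = tuple(recipient[v] for v in used)
        pool = [d for d in donors if tuple(d[v] for v in used) == key]
        if len(pool) >= min_donors:
            return used, pool
    return [], []


donors = [
    {"age": "65+", "sex": "F", "region": "South", "wages": 0},
    {"age": "65+", "sex": "M", "region": "West", "wages": 12_000},
    {"age": "18-64", "sex": "F", "region": "South", "wages": 30_000},
]
recipient = {"age": "65+", "sex": "F", "region": "West"}
used, pool = find_donor_cell(recipient, donors, ["age", "sex", "region"])
# No 65+/F/West donor exists, so region is dropped: used == ["age", "sex"]
```

A fixed drop order is the simplest possible rule set; a real tool would let the
user specify which combinations may be collapsed and in what order. The same
loop applies to matching records across surveys, since that is also a search
for the finest cell that still contains a match.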
These are imputation frequencies. I saw the figure of 43 percent. I think that
might be... did you have any element of income that was...
PARTICIPANT: I believe it is the amount of total income, but it is in the
DR. SELDEN: And maybe I am not... in any case, I come at it from a very
different perspective. It could be that somebody is... I don’t know. I can’t
remember the details. I get a very different view of things.
On this slide I am sort of nudging you all to believe that green is good and
red and its various shades are not so good, but I will leave that up to you.
Exact dollar amounts for wages are dark green. Then there are the brackets; I
am not sure whether that 43 percent figure gives credit for brackets, but we
felt that our brackets were fairly narrow, so those get at least a light green.
If we use employment data, we shade it a little lighter still, but still call
it green. Then we switch over to pink, which is where we knew something: we
knew you were employed, but maybe you always balked at telling us how much you
made. At least we know that you have a positive amount, so we can find a donor
who is employed. And the cases where we have no idea are really quite small. So
there are varying gradations within that 43 percent, however it was calculated.
Looking across income sources, you might say we did a great job on SSI, but
most people can quickly say, no, I am not on SSI, so you get a very high
response rate there. That one is easy.
I computed a weighted average across all of the income sources as the total.
This is not weighted by the amount of income; it is by persons, which I think
might be more relevant for thinking about poverty, because if you care about
low income folks, you want to get their income right even though getting it
right or wrong may not matter much for a dollar-weighted calculation. So it is
person weighted, but I have weighted across income sources by each source’s
share of aggregate income nationally. There you can see that depending on how
much of the green you grant us, we either do quite well or not. I think we
actually do quite well.
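The summary described here is a weighted average: within each income source,
the person-weighted share of recipients falling in each reporting category,
then averaged across sources using each source’s share of aggregate income as
the weight. A minimal sketch; the category names and all numbers below are
made up for illustration, and the exact weighting used on the slide may differ.

```python
def overall_category_shares(by_source, income_share):
    """Average per-source reporting-category shares across income sources.

    by_source: {source: {category: person-weighted share of recipients}}
    income_share: {source: the source's share of aggregate income}
    Income shares are renormalized over the sources supplied.
    """
    total = sum(income_share[s] for s in by_source)
    overall = {}
    for source, categories in by_source.items():
        weight = income_share[source] / total
        for category, share in categories.items():
            overall[category] = overall.get(category, 0.0) + weight * share
    return overall


# Made-up numbers for illustration only.
by_source = {
    "wages": {"exact": 0.60, "bracket": 0.20, "employment": 0.10, "none": 0.10},
    "ssi": {"exact": 0.90, "none": 0.10},
}
result = overall_category_shares(by_source, {"wages": 0.8, "ssi": 0.2})
```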
Where we don’t do quite so well, though, is when you take the subset of people
over 65, because for them earnings matter much less. With these other income
sources, we don’t have a way of backstopping our data collection by looking at
employment. We are also looking at income elements that are more prevalent,
and that reduce
We did some top coding. I don’t know if I am going to get into this, but I
think this is the best way to top code. We find a threshold, then work down
through the people above it, calculating for each the amount by which the
person was over the threshold. We use that in a regression and then predict it
back to them. As long as we have made sure to do any smearing needed for a
transformation, that preserves the means, and it also preserves the means for
subgroups defined by whatever is in X, which I actually promised I would never
tell anybody.
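The top-coding approach described above can be sketched as: take everyone above
the threshold, regress the amount by which each person exceeds it on covariates
X, and replace each top value with the threshold plus the predicted excess. A
minimal, pure-Python sketch; the covariates are hypothetical (the testimony
deliberately does not say what is in X), and it models the excess in levels, so
no retransformation (smearing) step is needed. If the excess were modeled in
logs, a smearing adjustment would be required to keep the means.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def top_code(values, covariates, threshold):
    """Replace each value above `threshold` with threshold + predicted excess.

    covariates[i] is person i's regressor row. Include an intercept or a full
    set of subgroup dummies: OLS residuals then sum to zero overall (or within
    each subgroup), so the top-coded values keep the original means.
    """
    top = [i for i, v in enumerate(values) if v > threshold]
    excess = [values[i] - threshold for i in top]
    beta = ols([covariates[i] for i in top], excess)
    out = list(values)
    for i in top:
        out[i] = threshold + sum(b * x for b, x in zip(beta, covariates[i]))
    return out


# Hypothetical covariates: [group A dummy, group B dummy, weekly hours].
values = [50_000, 150_000, 250_000, 120_000, 180_000, 300_000]
covs = [[1, 0, 40], [1, 0, 40], [1, 0, 60], [0, 1, 35], [0, 1, 50], [0, 1, 70]]
coded = top_code(values, covs, threshold=100_000)
```

Each person keeps a value that varies with their covariates, but within each
group the top-coded amounts sum to the original total, which is the
mean-preservation property mentioned above.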
Comparison to CPS: you can see that we are about 10 percent low. Thank you.
DR. MAYS: Thank you. Charles Nelson is joining us. He is with the US Bureau of
the Census, working on the CPS and ACS.
DR. NELSON: Thanks for inviting me here. I will talk about the Census Bureau’s
efforts in collecting income data. I will touch on the three major Census
Bureau surveys that have a major focus on income: the American Community
Survey, kind of the new kid in town when it comes to surveys; the Annual Social
and Economic Supplement to the Current Population Survey, which has been around
since the 1940s; and SIPP. I will talk about all three of them, the differences
between them, the questions we ask, and the problems we have uncovered, several
of which Connie mentioned.
First, SIPP. The main thing about SIPP is that it is a panel survey: we follow
people over several years. It is really the survey to use if you want to find
out about poverty spells, exiting poverty, entering poverty, and the
characteristics of people whose income changes over time. It also has a wealth
of topical module data, including detailed disability data and data on assets
and net worth. It is the only Census Bureau survey that has that kind of
information, and I know that was one of the topics you were interested in.
In the past, SIPP has asked questions about a very short reference period. We
have gone out every 4 months and asked about income received over that 4-month
period. As Connie indicated, it does a very nice job of collecting income from
transfers and other temporary kinds of payments. It has a very detailed set of
income questions as well as questions on all kinds of program participation.
It is undergoing a major redesign that will take place
CPS ASEC. It is the source of the official national US poverty estimates. It
covers around 100,000 addresses, with around 78,000 interviewed households. It
is a great source of national income data, and it is a state-representative
sample, so you can get state estimates from the CPS, although we generally tell
people to use multi-year averages to get lower standard errors. The CPS ASEC is
collected from February through April of every year, and the questions refer
to the previous calendar year. We use computer-assisted personal or telephone
interviewing. It has a pretty detailed questionnaire; you can report over 50
different sources of income on the CPS questionnaire. As I said, it has been
around since the 1940s. It used to be called the March CPS.
The ACS is a very large sample: about 3 million addresses and around 2 million
interviewed households a year. That large sample lets us produce annual data for
all areas with populations of 65,000 or more and multi-year data for smaller
areas.
The ACS is somewhat different from the CPS in that we conduct the survey
throughout the year. You can think of the sample as being divided into 12 equal
groups, and we interview one-twelfth of the sample every month. Because it is
taken throughout the year, we thought the best way to ask about income would be
to ask about the previous 12 months. When you see a 2010 poverty figure from the
ACS, it is based on comparing each family’s income over its 12-month period to
a set of poverty thresholds for that same 12-month period. Because poverty
thresholds are updated with the Consumer Price Index, which is available
monthly, you can come up with a set of poverty thresholds that correspond to the
reference months of that 12-month period.
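One way to build thresholds for a 12-month reference period, sketched here
under assumptions, is to scale a calendar-year threshold by the ratio of the
average monthly CPI over the family’s reference period to the average CPI for
the threshold’s year. The index values below are invented, and the Census
Bureau’s exact production formula is not spelled out in the testimony.

```python
def reference_period_threshold(annual_threshold, annual_avg_cpi, monthly_cpi):
    """Adjust a calendar-year poverty threshold to a 12-month reference period.

    annual_threshold: threshold defined for a calendar year.
    annual_avg_cpi: average CPI for that calendar year.
    monthly_cpi: the 12 monthly CPI values covering the family's reference
        period (the 12 months before its interview).
    """
    if len(monthly_cpi) != 12:
        raise ValueError("expected 12 monthly CPI values")
    period_avg = sum(monthly_cpi) / 12
    return annual_threshold * period_avg / annual_avg_cpi


# Invented index values: prices over this family's reference period averaged
# 5 percent above the threshold year's average.
threshold = reference_period_threshold(20_000, 200.0, [210.0] * 12)
family_income = 20_500
is_poor = family_income < threshold
```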
The ACS is also different in that it is a mailout/mailback survey. You get
something in the mail, and if you don’t send it back, we follow up by telephone
or in person. The personal follow-up is actually a subsample of the people who
don’t send back their
In terms of income, it is a lot less detailed than the CPS ASEC. We collect
income on eight different income sources.
Here is a depiction of the way the two sets of questions differ. Those are the
eight ACS questions, not the questions themselves, but the topic of each
question, and those are the separate CPS questions that correspond to those
eight. For wages and salary, the ACS has one question about all your wage and
salary income; on the CPS we ask about wages from your longest job and then
other jobs you may have had. For property income, there is one ACS question
that covers interest, dividends, rents, royalties, estates, and trusts. The CPS
has separate questions about all of those.
For SSI and Social Security, the questions are basically the same, but for
every other type of income there is a lot more detail on the CPS. The biggest
difference is other income: the ACS has one question covering really anything
you haven’t reported elsewhere, which on the CPS includes unemployment
compensation, workers’ compensation, veterans’ payments, alimony, child
support, financial assistance, and educational assistance. That is all one
question on the ACS; on the CPS we ask separate questions about each of those.
What kind of difference does this detail make? Not as much as you might think.
The last time we compared the two, in 2006, we sort of recoded the CPS to look
like the ACS. The aggregate income generated by the CPS questions was around
$7.8 trillion, about 4 percent higher than the ACS aggregate, about
As you might expect, one of the types of income where the CPS has a higher
aggregate is other income; asking all those separate questions makes a
difference. Earnings are also important, since 82 percent of income comes from
earnings, and one reason the two come out so close is that the ACS does a nice
job of collecting earnings. The 12-month reference period seems to work, people
report earnings pretty well, and that is perhaps the main reason the two
aggregates are really pretty close. So there was a little bit of a tradeoff.
The ACS actually had higher aggregates for some kinds of income. There may be
some misreporting: people may be including other things in public assistance
income on the ACS. Besides the differences in questionnaire content and detail,
another difference is that every answer in the CPS is filtered through an
interviewer, whereas in the ACS over half of the answers come from mailback,
where there is no filter between what the respondent thinks is their income and
what we take as their income. If you report your food stamps as public
assistance, we will take that, whereas the CPS interviewer would say, no, don’t
report that here. But the ACS, with these simple questions, does generate a
fair amount of income, and we actually get pretty comparable results, though I
think people do a better job of categorizing their income types on the CPS.
Obviously the detail does really
Here are the CPS and ACS poverty rates from 2000 to 2010. As you can see, for
most of this period the ACS has been a little bit higher than the CPS, and some
of that is because the extra detail of the CPS questions probably generates a
little more income. More recently they have converged, and the difference is
not as pronounced as it was in the early and middle parts of the decade. But
the trends are similar, and the ACS really produces reasonable estimates of both
income and poverty. I can show you the same thing for median household income.
You see basically the same trends, and there it is kind of a mixed bag: some
years the ACS is a little higher, some years the CPS is a little higher.
DR. HOUT: ACS has institutionalized population, right?
DR. NELSON: This is based on the poverty universe for both surveys, but there
are some differences between the two. One difference is that in the ACS we
don’t have the detailed household relationships that we have in the CPS.
Suppose you are not related to the householder, but there are other people in
the household who are related to you. The CPS, with its detailed family coding,
calls that an unrelated subfamily, and we can assign poverty status based on
that family’s characteristics. In the ACS, all we know is that these people are
not related to the householder, so we treat them as individuals and assign
poverty status individually. That is one of the differences that shows up, and
it makes the ACS rates a little bit higher. Aside from that, this is based on
the same
One nice thing about the ACS, and Connie mentioned this before: it is a large
sample with a real wealth of information on social, economic, and housing
characteristics. It is one of the few national surveys that has social,
economic, and pretty detailed housing data. I think something could be done
along the lines of Connie’s suggestion. The ACS could be a great source of data
on neighborhood characteristics. There may be issues with disclosure, but I
think those could be
If you are interested in doing something along those lines, the ACS is really
kind of made for that kind of application. For most topics it has a less
detailed set of questions than the national surveys, but it really does get
reasonable estimates for just about everything on it. It is a great source of
information about a wide variety of topics for very small geographic areas if
you use 5-year averages.
Here is a description of how the ACS is published. Basically, for any
geographic area in the country regardless of size (school district, city,
village, census tract), the Census Bureau publishes estimates based on a 5-year
average. When the population is between 20,000 and 65,000, you can get 3- or
5-year averages, and for 65,000 or more you can get 1-, 3-, or 5-year averages,
every year. We have put out two sets of 5-year averages so far, covering 2005
to 2009 and 2006 to 2010. It really is a great potential source of income data
for small areas.
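The publication rule described above maps directly onto population cutoffs. A
minimal sketch reflecting the cutoffs as stated at this hearing; the function
name is hypothetical.

```python
def acs_period_estimates(population: int) -> list[int]:
    """Return the ACS period estimates (in years) published for an area,
    using the population cutoffs described in this testimony."""
    if population >= 65_000:
        return [1, 3, 5]
    if population >= 20_000:
        return [3, 5]
    return [5]  # every area, regardless of size, gets 5-year estimates
```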
When the Census Bureau puts out multiple sources of income data, users always
want to know which source to use. We have something on our website, simplified
here, that tries to generalize about which source you should use. We basically
tell people that for income and poverty, if you are looking at the national
picture, you ought to use the CPS. And if you really want to look at long-term
trends, you have to use the CPS, because the ACS has only been around since
2000 on a national basis and SIPP only since the 1980s.
For states, or anything below the national level, we tell people to use the
ACS because the sampling error is so much smaller than in the CPS, and the data
in terms of income and poverty are really pretty good.
There are measurement issues when it comes to income, and Connie touched on
some of these before. Income questions tend to have the highest imputation
rates on a survey. I remember we were asked in the ’80s to advise HIS on adding
income questions at that time. We went to these meetings and showed you guys
what the Census Bureau was doing with the CPS at the time. I looked at the HIS
questionnaire and said, finally, a survey where the income questions aren’t
going to be the most sensitive questions. I read the survey and saw these health
questions and thought, how do people answer these? And then the feedback came
back from the interviews, and they hated the income questions. They hated them.
Even on the HIS, the questions that gave interviewers the most problems were
These imputation rates are higher than for really any other topic: on every
survey I have worked on, across all its different topics, income imputation
rates are the highest, and they differ a lot by income type.
One thing is that ACS imputation rates are a lot lower than CPS rates,
primarily because it is a mandatory survey. You get it in the mail and it says
you have to answer these questions; it is part of the decennial census
umbrella. That really results in a very good response rate and a very good
reporting rate for income items. There is also underreporting, as Connie
We have done comparisons over the years with the National Income and Product
Accounts. We get about 90 percent of the NIPA benchmark, but that is kind of
misleading because earnings are 82 percent of income, and we are at about 98
percent on earnings. That pulls the overall number up a lot.
For transfer income, the results are mixed. We do pretty well on permanent
transfer income like Social Security and SSI, not so well on means-tested
transfers. Property income and pensions are probably the worst in terms of
underreporting. Underreporting is pretty consistent across surveys, with some
exceptions: SIPP is better on transfers, and ACS and CPS seem to do better on wages
Here are the imputation rates over time, from the late ’70s to the present. The
income imputation rate, that is, the percent of dollars imputed, has increased
from 21 percent to about 35 percent.
There are different types of imputation besides imputing for people who refuse
the survey altogether. There are item imputations, where people took part in
the survey but didn’t answer the income items, mostly the amounts. And then
there are times when you have to impute the whole supplement. The
whole-supplement imputation rate has been relatively stable over time. The big
change is in item imputations, which is good, because you know a lot more about
these people: you generally know their weeks worked, their hours worked, their
occupation. You have a lot more information about the item imputes. So the fact
that item imputation is what has grown over time is actually good; it would be
much worse if it were the other way around.
Some observations about imputations. As I said before, the ACS imputation rates
are much lower than the CPS rates. We have been doing some joint work with ASPE
on matched data. We have SSA earnings records, which are the data that go into
the SSA Master Earnings File, and they are a good independent source of
earnings data. We have been looking at these data matched to the CPS to see if
there are any biases from imputation. In our initial work on poverty status, we
substituted these SSA earnings for the reported and imputed earnings in the CPS
and looked at whether somebody’s poverty status changed, and surprisingly, 94
percent of the time it didn’t. Only about 6 percent of the time did poverty
status change, and it was pretty even, about 3 percent; it was not that much
different for the imputed cases. That was kind of encouraging, although I think
there is a lot more work to do. When we get to the top and middle of the
distribution, I think we will probably see more of an impact of imputation.
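The matched-data check can be sketched as recomputing each family’s poverty
status with administrative earnings in place of survey earnings and counting how
often the status flips, separately for reported and imputed earnings. A minimal
sketch with made-up records and hypothetical field names; it treats earnings at
the family level and leaves out the record-linkage step entirely.

```python
def status_change_rates(families):
    """Share of families whose poverty status flips when survey earnings are
    swapped for matched SSA earnings, split by whether the survey earnings
    were imputed.

    Each family is a dict with hypothetical keys: survey_earnings,
    ssa_earnings, other_income, threshold, imputed.
    """
    counts = {False: [0, 0], True: [0, 0]}  # imputed -> [changed, total]
    for f in families:
        poor_survey = f["survey_earnings"] + f["other_income"] < f["threshold"]
        poor_ssa = f["ssa_earnings"] + f["other_income"] < f["threshold"]
        counts[f["imputed"]][0] += poor_survey != poor_ssa
        counts[f["imputed"]][1] += 1
    return {k: c / n for k, (c, n) in counts.items() if n}


families = [
    {"survey_earnings": 30_000, "ssa_earnings": 31_000, "other_income": 0,
     "threshold": 20_000, "imputed": False},
    {"survey_earnings": 18_000, "ssa_earnings": 22_000, "other_income": 0,
     "threshold": 20_000, "imputed": False},
    {"survey_earnings": 25_000, "ssa_earnings": 24_000, "other_income": 0,
     "threshold": 20_000, "imputed": True},
    {"survey_earnings": 10_000, "ssa_earnings": 9_000, "other_income": 2_000,
     "threshold": 20_000, "imputed": True},
]
rates = status_change_rates(families)
```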
And echoing Connie’s statement: you should think about using a broad measure
of income or poverty when you are coming up with these measures. The kinds of
things we have been doing for the Supplemental Poverty Measure, using a broader
definition of income that includes non-cash benefits and taxes and even
spending on necessities, are a good idea. It is something you should think about
even though it does entail more data collection.
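As a simplified sketch of the broader resource measure described here, in the
spirit of the SPM but not its official specification (the field names and
groupings are assumptions), family resources start from cash income, add the
dollar value of non-cash benefits, subtract taxes net of refundable credits, and
subtract spending on necessities.

```python
from dataclasses import dataclass


@dataclass
class FamilyResources:
    cash_income: float
    noncash_benefits: float = 0.0    # e.g. food or housing assistance, in dollars
    net_taxes: float = 0.0           # taxes minus refundable credits (may be negative)
    necessary_expenses: float = 0.0  # e.g. work, child care, out-of-pocket medical

    def broad_resources(self) -> float:
        """Cash income plus non-cash benefits, minus net taxes and necessities."""
        return (self.cash_income + self.noncash_benefits
                - self.net_taxes - self.necessary_expenses)


family = FamilyResources(cash_income=25_000, noncash_benefits=3_000,
                         net_taxes=-1_000, necessary_expenses=4_000)
resources = family.broad_resources()
```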
And the Census Bureau has a paper that is going to be presented at the
Population Association meetings that will show some SPM poverty rates based on
the ACS. We are starting to do some work along those
DR. MAYS: Thank you. What I would like to do is to open it up for questions
and comments. Anyone on the phone? Any of the members first? Why don’t I start
with Dr. Green?
DR. GREEN: Connie, I would like to go back to a paraphrase of what I heard you
say, and you fix it. You basically said there is no reason to tolerate the
idiosyncrasies of the different approaches to this. Would you like to expand on
that further?
DR. CITRO: I would qualify it slightly. You don’t want a straitjacket where
people cannot, for instance, experiment with better ways to reduce
nonresponse. But when I hear things like MEPS having this questionnaire
structure because they didn’t have the money, I understand; there are budget
implications. But I do think we have surveys that come out with different
results, like different poverty rates. Health insurance coverage has actually
been one of the worst in that regard. If I could use another analogy: in
employment statistics there are often different rates from the CPS and from the
business payroll survey. There are actually reasons for those to be different;
they are very different measurement instruments. One goes to households and
one goes to business establishments. There has been some really interesting
work comparing
But for things like health insurance coverage and income, which are all on
“household” surveys where the universe and the basic mechanism are the same, I
think it serves users and the policy community much better if they can be
pretty confident that things that look strange are not just because some survey
decided to go off completely in its own direction.
DR. GREEN: I would like to invite the rest of the presenters to dissent from
that if you want to.
DR. CITRO: ...be great things that one survey is doing that ought to be fed
into another. It shouldn’t be, for instance, that whatever the Census Bureau
does is the gold standard forever and always. But on the other hand, I think we
shouldn’t throw out fairly time-tested questions just because another survey
doesn’t like them for some reason, but
We saw that this is also true in education, although it is perhaps not quite
such a big deal there. There are other kinds of standard background or
covariate variables, whatever you want to call them, where you would really
like to have some confidence across surveys. Again, you don’t want to turn a
health survey into an income survey, and you don’t want to turn an income
survey into a health survey, but there is a basic cross-cutting set that can be
used, where you have tested that this minimum set actually works pretty well.
That kind of thing would be useful.
DR. GREEN: Were you about to dissent?
DR. SELDEN: I think what we have in MEPS is probably not the way I would design
the income section if I were starting over. But there are issues. Is public
assistance more important for MEPS than for some other surveys? Then you need
more information on it. If we were told we needed to use the ACS questions,
maybe we wouldn’t feel as good about that. There are also questions of
comparability over time, and questions about whether the ACS questions would
work as well in the context of MEPS as they do in the ACS. We do not have the
Census requirement that you must respond.
DR. GREEN: Let me jump in here and react to that. Where do you guys get
together to ask and answer this type of question? You have your different
surveys from different bureaus, different agencies. Where do you come together
to talk this out and decide what you think is the best way to go? Where is the
location? Who calls the meeting?
PARTICIPANT: Connie is the coordinating center.
DR. CITRO: We can take that on if we get asked to hold a meeting. The
coordinating center for the statistical system is the Chief Statistician’s
office in OMB, which is about six people and very highly stretched. But I will
come back to that.
But I did just want to say that I suggested a minimum set and then maybe also a
medium-sized set. MEPS would use the medium-sized set, not just the ACS set,
because you are about health care expenditures and how people are using their
resources. But on coordination, I do think it is a problem for the statistical
The Committee on National Statistics is a standing unit of the National
Academies. It is not a government organization. We have a congressional
charter, but we are an independent nonprofit, chartered to respond to requests
from the government. We have often done workshops and panels on disability
measurement or that sort of thing, but we don’t have any ability to do more than
suggest.
The Chief Statistician’s office is the official body; under the Paperwork
Reduction Act, the questionnaire packages go in there. Again, they are very
stretched. There are interagency groups that get formed on these issues. I
think in some areas, like income measurement, there may historically have been
a tendency to think we know what we are doing here. But with the new poverty
measure and the Affordable Care Act provisions, it seems to me it would be very
nice, and I include health insurance coverage here too, to have people on the
same page so that some of these basic important variables can be more
consistent.
DR. MAYS: Let me raise a couple of questions that we would like to hear a
little bit about. One of the things we want to get a sense of is who these
measures work best for, and for whom these definitions sometimes make it very
difficult. I want to raise two. One is about how the household is defined. We
now have a very different situation as the economy gets worse: we sometimes have
unrelated families in a household. There is also this growing LAT problem,
living apart together. I think that came up in an earlier session where we were
talking about co-residents versus non-co-residents.
I would really like to get a sense of this, because it seems like these are very
traditional definitions of household, and I may be wrong, or whether you all are
looking, as America is really changing, at whether the definition of household
used in asking about income needs
DR. NELSON: In terms of poverty, there has been some work on redefining the unit
used to measure poverty. It has always been a family definition. Official
poverty uses the Census family, so two unrelated people living together have
traditionally been counted as two separate units for poverty purposes. Measures
like the SPM use a broader definition of family, so cohabiters are treated as
one family unit, and their children are also treated as part of that
DR. MAYS: In terms of your cohabiters, what do you do when it is same sex? I
know the Census has a coding problem with that.
DR. NELSON: CPS — I believe you can report same sex —
DR. MAYS: It becomes a family then?
DR. NELSON: My understanding is that it is. And certainly people have looked
at going all the way and using a household definition although we haven’t done
that and we are not sure that is such a good idea, but researchers have. And
certainly a broader unit of family makes a lot of sense because people share
resources and the poverty status should reflect the fact that they are sharing
DR. KAPLAN: Maybe this is best addressed to Dr. Dahlhamer, but to all of you.
There was sort of a hint I think in your presentation that net worth may be a
valuable question — giving up on it or not giving up on it, but it is hard to
DR. DAHLHAMER: A couple of things. One, with the field test that I described,
those questions weren’t the primary focus, but we used it as an opportunity to get
those on there and see whether there is any value added by including these questions. I
think the short answer to that was yes, but we would have to do a much better
job from the design standpoint of collecting that information. And the question
there was whether we go into detailed follow-ups on those wealth questions on top of
what we are collecting for total family income and on top of what we are collecting
for personal earnings. I think there were some tradeoffs in terms of the amount
of detail that we would be adding to the income section. For that reason we
tabled it. We set it aside.
I am not suggesting it is something we couldn’t return to, because there
were some promising results from that data. Maybe that is something we revisit
with an upcoming redesign we have in 2015. At the time, I think it was primarily
just a matter of what we can do right now. Our primary focus is on total family
income. Let’s make those changes. And we can revisit the wealth questions at
some later point in time.
DR. KAPLAN: Just a comment. There has been a lot of interest in this in the
epidemiology community and that it is never really — actually this was before
the economic downturn, but this sense of how much of a buffer you have and how
stressful it is to get close to that zero point. It would be good to get some
— there is a little bit of a hint of it in the women’s health study.
DR. MAYS: Dr. Citro, did you have a comment?
DR. MADANS: I just want to add to what was said in response to Dr. Green’s
question about when we talk about this. I think Connie was maybe a little
negative on that. We were going to give Chuck an office in HHS because he was
over so often talking about how to measure insurance. There is a data council
within the department that does talk about these issues. We always invite
Census when there is a topic to discuss. We don’t always reach a consensus on
changing questions, but we have certainly come, I think, a lot closer on
insurance by working together across the HHS surveys and the Census surveys. I
know there has been a lot of discussion about things like income.
There is also the Federal Committee on Statistical Methodology, which is
related to the chief statistician’s office. The work is not necessarily done by her
staff, although they lead it. And there are a variety of subgroups out of that,
and issues come up that the federal statistical community
should deal with, and a lot of them are about these. What do we do about the
warring estimates, which relate to the differences in the questions? We had that
meeting on disability, income, and employment where we went through
those. There are venues. We do talk about it.
I think our preference would be that we were closer. There are these kinds
of traditional ways and people don’t like to — was it you who said the way you
measure change is don’t change anything? And there is this big push that if we
change the way we are doing some of these things and there is not a really good
reason to change it, then there goes all of our trend data. I think we tend to
do the same thing. We add rather than change, and then before you know it you
have the 3-hour HIS.
DR. NELSON: It is also tough when you have these surveys that just have
a different structure and different reference periods. Sometimes we
see these HIS questions and we say we could probably do something there. But
then by the time we make it look like CPS, it is a different question.
There is more coordination than you think. We are both aware of what we are
doing, but sometimes translating that into making a change is hard. There are only so
many opportunities to make changes, and the other parts of the surveys are
different. Coordination sounds easy, but it really is
tougher than that.
DR. MADANS: I think we have also made some progress; disability especially
was a good example. You have that. We have changed some questions and we have
analyses across the data systems. CPS is household based. Does anyone in the
household — it gets a little tricky.
DR. BREEN: Thank you all for great presentations and this was really helpful
I think in understanding what is going on with income. I didn’t thank the other
speakers for their great presentations too, but all the presentations have been
extremely good. Thank you so much for coming and helping us out.
As I was listening, I was trying to do a comparison in my head. Charles
Nelson gave a nice comparison of the CPS and the ACS questions. It is kind of a
similar table that Tom provided for the MEPS question. Tom, you weren’t here
when Jim Scanlon spoke and he said that this would be a floor, not a ceiling.
If there are eight questions that ACS asked — it looks like you are asking
those questions, but you are just doing a lot more than those eight questions.
It wouldn’t limit the MEPS to have these eight questions. It looks like you
might want to respond to that.
DR. SELDEN: I think that the thing with MEPS is that there has never
been any money, even now, to pretest it, let alone post-test it, to ascertain
whether asking questions at the couple level, which we do for a lot of people
who are married filing jointly, and then having them break the income sources
apart is a big challenge. Maybe it is. We don’t know. Otherwise, yes, we are
sort of like ACS. We are somewhere in between CPS and ACS. We perhaps make more
use of family-level screeners for things like SSI to reduce respondent burden.
I think we are in some sense already there except for this one thing, which
would require a substantial change in coding; there needs to be money attached
to a recommendation for that to occur.
The other thing is to bear in mind that the ACA is taking full effect in
2013. If someone were to change their questionnaire design in a fundamental way
that interrupts the ability to look before and after health reform, I think
that would perhaps be quite unfortunate.
DR. MAYS: I think it is —
DR. CITRO: You are absolutely right on. If I had my druthers I would say to
the appropriators that every major survey, and we are talking about major surveys
here that have been going on for a long time, needs funds for what amounts to an
ongoing methods panel where you do experiments and it is just part of the
program. And also where there is both money and techniques for bridging series,
which is done: I think when the unemployment rate or the CPI
makes some major changes, they absolutely have to do some overlaps and so on.
And if that were built in, we could get out of this paralysis that we sometimes
get into, where you can’t change things because it is going to upset the time
series. And there are ways to deal with that. But you are absolutely right.
They require resources.
DR. BREEN: Thank you for that. That is probably a good idea. Maybe a joint
one, as a matter of fact, or one that is —
You talked about the best time to measure income and this came out
especially when Tom was talking about MEPS being designed to use the tax forms.
I guess if I were thinking about this and answering this question, I think you
would probably get the best answer out of me if I used my tax forms. But then
you said the previous 12 months from the interview is more consistent with what
ACA is going to want: what is your current income? If I understood well.
I am just wondering can you all weigh in on what would be the best approach
for the timing of the measurement of income in these surveys for the ACA of
DR. NELSON: Well, certainly, I think there is pretty good evidence
that people have a good idea of their income on a calendar year basis. It does
not work as well for transfer payments as it does for earnings and income.
Asking about previous-12-month income in a survey you take throughout the year
is, to me, an operational decision more than a data-driven decision. You don’t
want monthly income because that is too variable. You want
income to be measured over a long enough period that it tells you somebody’s
economic status. You have to take the survey throughout the year because
operationally that is the only way you can handle a survey of this size. Then that
kind of leads you to ask about previous-12-month income. If you could do a big
survey every year and ask about previous calendar year income, I would probably
do that, if it is only for income. Obviously, building an annual
dataset from questions about shorter reference periods would be even better, but that is very
costly. That is what SIPP does. Clearly that is probably not something you
could do on a big survey.
DR. KAPLAN: If I understand, aren’t you just giving up one quarter? By April
15th people —
DR. BREEN: It depends on when you are interviewed. Most of these surveys are
continuous throughout the year.
DR. KAPLAN: You would probably have poor information for those interviewed in the
first quarter. By April 15th most will have completed their tax
returns for the last year.
DR. BREEN: I didn’t know if it made any difference. That is the question.
DR. DAHLHAMER: I don’t know that I have an answer to that question, but I
will throw another conundrum into your question: what we are leaving
out here is how you define the family and when you define it. For example, with
the HIS we collect income for the previous calendar year, but we define the
family at the time of the interview. The further that you get out from the
prior calendar year, the greater the chances that that roster has changed over
time. When they are thinking about income from a prior year, are they
conflating that with current members in that family roster and how does that
affect the quality of data you are collecting? That is also another issue.
Do you use some type of fixed household composition at the end of the
prior data year for the collection of income and then something different for
collection of other data? Those aren’t easy questions to answer. I know that in
the Czajka and Denmead report they did some simulation work on that,
looking at the time lag between the point at which the interview was conducted
and the prior calendar year, and they did see some impacts on the poverty
rate, but they were really minor, less than 1 percent where they simulated.
They would get an increase in the number of persons in poverty the further you
moved out in defining the family from the previous calendar year.
There has been some work done on that. I don’t think I have a good answer for
your initial question, but it is even more complex than that.
DR. MAYS: Let me move us to our last comment before we go on break. I just
want to say that I think this issue of how family is defined is probably the
crux of making sure that it is working and that we can understand
comparisons. Dr. Hout, let me give you the last comment.
DR. HOUT: I have comments on net worth and also relative timing. First, on
the relative timing, since that is what we are talking about. There is a
similar time mismatch on occupation. You ask about what did you make last year,
what did you do last week, and how many hours did you work last week. You can’t
actually back wages out of that even from the wage and salary income. That is
perennial. That has always been there. There are the outgoing rotation groups
at CPS that allow you to jigger that a little bit to study wages per se. If you
are measuring socioeconomic status, you always have that time mismatch there.
I am a big fan of getting something on net worth because I think that
people’s socioeconomic status particularly later in life is better indicated by
people’s net worth than by the occupation they retired from a decade earlier.
It is very hard to measure reliably. People don’t know what their house would
sell for next week if it were to sell because they are not going to sell it
next week and that is their main asset. That is extremely hard to do, but I
think it is really important. And then, to tie the two ideas together:
There is some work by Dalton Conley and Mel Oliver and Tom Shapiro that
indicates that parents’ wealth is less important for kids’ college education
than grandparents’ wealth. And yet it is not co-resident grandparents. We
measure net worth of the household and yet this kid is going to a school they
can’t afford. How are they doing that? That is because grandma thought about
that 20 years ago and set it up so that things are working out for her
granddaughter. She has never been under that roof, and so we miss that
altogether. They took advantage of the PSID’s wealth module, which built into the
design questions about where else you get money and how else you are paying for it. It
was a specific thing on where did all the money to pay for Jane’s education
In these household-based surveys where this sphere of bringing — you assume
everybody who brings money into the household is in the household. You miss
DR. MAYS: I want to echo what Dr. Green said. I want to thank you. You may
notice this is one of our largest panels, because this was one of the conundrums
where we knew we really had to get as much expert input as possible. I want to
thank you immensely on behalf of the group because this will help move us
along. We may come back and ask you did we get it right and ask you a few other
questions. Don’t be surprised.
Let me let the group go and let us take a 10-minute break. We should be back
at what time? 3:35.
DR. MAYS: We are going to get started with our next panel. Thank you very
much for being here. They are great troopers in doing this because a couple of
them we got kind of late. We are very appreciative of it. Sherry, are you going
to join us up here? Thank you very much. I think we are ready to go. We are
actually going to start with Dr. Gottschalck. He is with the US Bureau of the
Census, in the Social, Economic, and Housing Statistics Division. Thank you.
Agenda Item: Panel: Occupation
DR. GOTTSCHALCK: Thank you. First, I want to thank Dr. Mays for inviting me
to speak. She saw my presentation at FCSM or knew of my presentation. This is a
presentation of the same findings, but geared a little bit more towards the
purposes of the panel’s questions. This is work I co-authored with some of my
colleagues, Braedyn Kromer, David Howard, and David Hedengren.
In the 2008 American Community Survey we made changes to the questionnaire
that modified and improved the existing questions for several subject areas. In
particular, we improved the labor force questions to better capture data on
This paper will detail some specifics of the employment status question
change. We will also present comparisons to BLS data, both the Current
Population Survey and the Local Area Unemployment Statistics program, for the
years 2007, 2008, and 2009.
Just a preview of our results and our findings. We believe the modifications
that we made to the ACS series of questions had the effect of increasing the
number of employed persons captured in the 2008 ACS. As a result of those
changes, we now find that the ACS data is more consistent with the benchmark
BLS data from the CPS and the LAUS.
Some background. In January through March of 2006, the ACS conducted the
first field test of modified content since the ACS reached full implementation.
One objective of this field test was to test a new series of questions to
measure employment status, addressing several limitations that we had identified
in past research in terms of the question wording.
Past research revealed that employment levels were underestimated and
unemployment levels were overestimated relative to the CPS and the LAUS
program. Three of the employment status questions were modified: whether the
respondent worked last week, was temporarily absent from their job, and was
looking for work. All three of these questions get at very key concepts in
measuring employment status. I will discuss these a little bit more.
Obviously whether you worked last week is the driving force, the driving
question for this concept. Whether you were there full time or not is also very
important. But even more so is whether you were looking for work. And the key issue
here is whether you were actively or passively looking for work. When I show
you the revised set of questions, I will discuss that a little bit more. But
essentially, under the BLS definition, to be officially unemployed you need to be
actively looking for work. If you are only looking through the want ads, that is not
considered an active search. It is a passive search. When you see the revised questions, I
will point out how we address that.
The findings from this field test showed that the test questions produced a
higher estimate of employed people compared to the control. The test questions
did not produce a lower estimate of the unemployed, though the overall
unemployment rate was lower. Given these positive findings, the revised
questions were implemented in the 2008 ACS.
One of our first questions was who answered this revised sequence of
questions. As I had mentioned before, past research had found we were missing
some employed people. And primarily it centered around this worked last week
question. And findings from this field test and other research indicated that
we weren’t completely capturing those workers that had a marginal or irregular
attachment to the labor force.
On the left hand side you will see the 2007 version of the employment status
question series. On the right hand side you will see the 2008 question series.
As I said, the driving question is number 28 which is whether you worked last
week. And in 2007 you can see we had a very large text portion of this question
where we were still asking whether you worked or did any work for pay last week
and then a yes/no categorical response.
In 2008 we broke out that parenthetical reference where we highlight the
fact that we consider work even if it is only for one hour in that particular
week. What we found in our research and our test was that people were missing that
parenthetical. They weren’t considering even one hour in a given week as work.
What we did in 2008 was break that out into two separate pieces. In the first part we still
asked, last week did you work for pay at a job or business? Yes or no. If you said no,
then you would skip to the second piece, where we would emphasize: last week did this
person do any work for even as little as one hour? That emphasizes the hour component.
The other questions get at, as I was talking about, whether they were on
layoff, temporarily absent, and so forth. And then in question 35 we asked
whether this person has been looking for work during the last 4 weeks. This is where
the active versus passive concept comes into play. And in 2008 what we did
is emphasize that we consider only an active search as a positive
response to this question, and, coupled with that, the questionnaire, be it mail or CATI
or CAPI, had help screens indicating what we consider an active search.
Back to question 28. What we are really concerned about then is who answered
this question in this sequence. Who said no to the first part and then who said
yes to the second? These “marginal” workers are regular workers who
just have a minimal attachment to the labor market.
When we looked at that data, we had data from our internal files. We didn’t
have a true control to compare to because we didn’t administer the ACS in 2008
with two different versions of the questionnaire. That would be the ideal
situation. But what we can do is see how people answered those two pieces
from our internal data. We only release a composite employment status
variable in the ACS, and all of these questions feed into that ultimate
But we can look at the two pieces and see the characteristics of those. In
2008, there were approximately 1.2 million people who were considered marginal
workers who fell into this category. The demographic and economic differences
between these two groups were consistent with characteristics of workers who
may work temporarily or have changing work schedules. Prime examples of this
were those who were enrolled in school and those who were self-employed, those types of
arrangements. We were happy that we saw that in the data. It met our expectations.
The next step was then to try to get a handle on how those question
changes impacted our data relative to the BLS benchmark data. The BLS
produces the official statistics for employment and unemployment for both
the nation and the states. Our goal is not to replace that in any way. Our goal
here is to ask: can we compare, how do we differ, and where are areas we can still
improve? We used those official estimates to compare back to, and we did
that comparison at the national level and also at the state level.
On the left hand side you will see a graph that displays the
employment-population ratio. And on the right side you will see the
unemployment rate, and this is at the national level for the years 2007, 2008,
and 2009. The year 2009 is presented as this is the second year where we have
had this question sequence in the field. An additional question is whether this
question sequence is behaving more or less as it did in 2008.
In other words, we didn’t revert back to our 2007 pattern.
As we can see in 2007 we had a sizable difference in both the
employment-population ratio and the unemployment rate. When we instituted the
question change in 2008, we saw a significant narrowing of that gap — still
different, but now much closer and in particular with the unemployment rate. We
really saw a narrowing there and we were much more in line with BLS data. And,
again, in 2009 we see that that narrowing of the differences
is still clear and consistent.
And the second piece is at the state level. ACS is very powerful in regard
to providing estimates at the state level and below. And BLS also has state
level estimates from their LAUS program, the Local Area Unemployment Statistics,
that we can compare to. And, again, we are presenting the employment-population
ratio and the state unemployment rate. Now these are the numbers of states that
are different from each other when we compare between the two sources. I think
this graph is even more striking in terms of the impact of the question change,
where in 2007, for the employment-population ratio, we see very large numbers of
state differences. We now see those gaps narrowing. But what is even
more dramatic is the unemployment rate, where before we were significantly
different from BLS data almost everywhere across the board. But when we instituted
the question change, we cut that in half. And, again, in 2009 we saw a nice
In terms of additionally explaining some of these differences, which was
highlighted by Chuck Nelson, the ACS is administered in three different modes.
It is administered by mail. It is also administered by computer-assisted telephone
interview and also by personal interview. It is not entirely FR administered. The
CPS, though, is. The FR in the CPS can help the respondents understand concepts and
get them to the correct skip patterns. With the ACS mail
questionnaire, not every respondent filling it out is
perfectly following every skip pattern. You do see some inconsistencies there.
And not everyone will refer to the materials explaining concepts and so forth.
Again, we are going to differ.
But I think what can be taken away from this is that the ACS has a very
limited set of questions that ask about employment status. ACS really only has
about eight or so questions that get at this, depending on skip patterns. But
the CPS asks a very large battery of questions, again depending on skip
patterns and so forth, roughly 16 to 20 questions. One of the really key
differences in the BLS data from the CPS is how they collect information
about active search: they ask what type of search you
were doing. For example, if you said you were just looking through the job ads,
that is not considered active. That is part of the list.
That being said, the ACS, with a very limited set of questions, is able to get
at some of these same concepts, and I feel we do a
very good job of that.
In conclusion, we feel the ACS labor force data is now more consistent with
the benchmark data from both the CPS and LAUS program administered by BLS. We
believe this revised worked last week question is capturing those additional
workers that we were potentially not identifying before.
In future work, we have a preliminary
project where we are going to be linking some of our ACS data to administrative
records to see whether we can identify additional characteristics of the people
who answered that two-piece question of whether you worked last week or not, to see
if we can get further information about their income and their other
characteristics, and to see if we can do an even better job of capturing those
individuals. Given all that, I think we have made great progress with the question
change. And, again, the ACS has lots of strengths in regard to geography
and sample size and being able to come up with disaggregated estimates that
the CPS potentially can’t get at because of its sample size.
DR. MAYS: Thank you very much. Our next presenter will be Dr. Melissa Chiu.
She is also with the Census Bureau. Because I am without my internet access,
I am going to ask you to tell me the branch that you are the chief of.
He put his on his slides. Just so that we are very clear.
DR. CHIU: I am the chief of the Industry and Occupation Statistics Branch.
Just to give you an overview of this talk. This talk will be focused on data
collection of industry and occupation statistics at the Census Bureau mainly on
the ACS. I will go through a little bit of the context, how we collect industry
and occupation information, how we code and classify it, how we edit it. I will
touch briefly on data quality, on studies of data quality, and some
considerations and recommendations for standards and best practices. I know
that there are a lot of slides here. Some of these things I will go through
rather quickly, but I wanted you to have the slides for future reference.
For the history of I&O data, and by the way I treat industry and
occupation together. We treat them at the same time whenever we process and
collect the data. That is why I will talk about both of them together. Industry
has been collected continuously since 1910, but it was first collected in the
1820 decennial census. Occupation has been collected continuously since 1850.
And class of worker has been collected since 1910. Basically, these questions
describe the work activity and occupational experience of the American labor
force, and they are tied into a broad swath of other items that are on the decennial
census and have now moved onto the American Community Survey.
There are many broad uses. People formulate policy from these. Businesses
use it to measure compliance with antidiscrimination policies, for instance,
and develop business plans. They figure out where to put their ATMs, all sorts
of things like that. And then of course researchers use it to analyze social
and economic issues.
Just to give you an overview of what we do on the ACS to our I&O data.
First, we complete the questionnaires. We collect it. And we have three modes
for that. We capture the data after the data have been given to us. We clerk
code it and then we edit in post processing.
Starting with the three modes of data collection we have paper. The ACS is
still currently — about two-thirds of our records come from the paper mode.
Then we have computer assisted telephone interview, CATI, and we have computer
assisted personal interview which is CAPI. And just a heads up the internet
mode should be rolling out; we are planning to have it rolled out in 2013.
Everything that I have right now is basically through the 2012 questionnaire.
The 2012 ACS paper questionnaire, you have probably seen this before, but
basically 3 million addresses annually, 21 housing questions, 5 demographic
questions, and 42 population questions. The I&O questions are pretty close to the end, right
after labor force and before the income questions.
Now our universe is a little bit different. Even though Dr. Gottschalck’s
questions ask about employment and current employment, et cetera, we actually
ask for industry and occupation for anybody who is 15 years and older now and
who had a job in the last 5 years. If you held a job last week, we ask about that job. We
only collect one job. We ask you to report the job with the most hours. If
you worked an equal number of hours on two jobs, then we ask for
the one that you most recently reported to. And if you weren’t working last
week, then we ask you for the most recent job in the last 5 years.
Class of worker is about the type of ownership of the organization. Industry
data, it describes the kind of business that is conducted by a person’s
employing organization. We ask for the name of the company as well as what kind
of business it was. Then we have a check box version of that.
Occupation. We ask what kind of work was this person doing and we also ask
for the person’s most important activities or duties.
All of our questions are standardized and there is a rigorous change process
for development of questions and for changing those questions.
And then, in addition to the collection, we have a rigorous process for
developing the instructions to the respondents as well as to our field
representatives, who we call FRs. For industry, for instance, we ask them to
describe the major activity at that place. For occupation, we ask for
more specific descriptions, like registered nurse or personnel
manager, rather than less specific ones such as nurse, teacher, manager,
et cetera. We try to get as much specificity as we can.
Job duties. Directing hiring policies, typing, filing, et cetera.
For the field reps there is a little bit more play there, but we have very
extensive training for them and strict procedures for our FRs for
interacting with the respondents. We describe to them the purpose of the survey
questions. We give them instructions on what kind of information is sought so
they really get at the concepts that we are trying to get. We stress the importance of
accuracy and of getting more specificity: multiple-word descriptions rather
than single-word descriptions. We emphasize that the accuracy of the text
affects the I&O codes and classifications that are assigned.
We tell them to probe. Could you be more specific? What type of clerk? What
kind of engineer are you? What field of research, et cetera?
After the data are collected, we create what we call a data capture
file. Basically, anything that came in via mail is keyed from a scanned
image of the questionnaire. We have about 45 keyers daily at our National
Processing Center in Indiana. We have keying rules. They are supposed to key
exactly what is seen, including misspellings and foreign languages. This is also
where illegal value responses, such as multiple check marks
and invalid entries, are identified. Our I&O items are truncated to 60 characters, but most
of the time that is plenty. Most people don’t even come close to using 60 characters.
Here is an example of what we actually receive at the end of the day. You
will see a misspelling over there under child care. Under occupation you will
see “all jobs that needed doing,” which is not very specific. About two-thirds
of the way down there is an entry in Spanish. We do have Spanish-language
coders, whom I will talk about in a minute, but any other language gives us
great difficulty.
I know that our telephone centers have people who speak multiple languages,
but for coding we only have people who code in Spanish. Part of the reason we
can support Spanish is that we have the Puerto Rico Community Survey, which
goes hand in hand with the ACS.
The next step is I&O coding. Currently all ACS I&O coding is computer-assisted
clerical coding. We have 200,000 ACS cases a month, which is a huge task and a
huge project. But starting next week we will be launching an auto coder. We are
pretty happy about that. It should take off about 30 percent of our cases right
there, and it will change our consistency in a different way. It will give us a
different problem to think about in terms of consistency and standards.
What is the clerical coding process as we have it right now? First a case goes
to a first-level coder, who could be qualified or unqualified; basically, that
means whether or not the person has a certain amount of experience. If the
coder is not qualified, the case then goes to a dependent verifier. For tougher
cases, coders can punt to a referralist, who has more resources, tries to get
the best code, and is usually among the people with the most experience. The
last step is quality assurance coding, which is done on all records.
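The clerical coding flow just described can be pictured as a simple routing function. This is a hypothetical illustration, not Census Bureau software: the function name `coding_path`, its two boolean inputs, and the assumption that verification happens before any referral are ours.

```python
from typing import List


def coding_path(coder_is_qualified: bool, is_tough_case: bool) -> List[str]:
    """Return the review steps a write-in passes through (illustrative only)."""
    steps = ["first-level coder"]
    if not coder_is_qualified:
        # Work by coders without enough experience goes to a dependent verifier.
        steps.append("dependent verifier")
    if is_tough_case:
        # Hard cases can be punted to a more experienced referralist.
        steps.append("referralist")
    # Quality assurance coding is described as applying to all records.
    steps.append("quality assurance")
    return steps


print(coding_path(coder_is_qualified=False, is_tough_case=True))
# ['first-level coder', 'dependent verifier', 'referralist', 'quality assurance']
```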
To code I&O we use age, sex, date of birth, educational attainment, the
residence county and state, whether or not the person is on active duty, class
of worker, employer name, kind of business, the industry type, the kind of work
the person does, and their job duties. The coder looks at all of that and puts
it together to assign both the industry and occupation codes.
Where do these codes come from? Another standard. All our Census I&O codes are
designed for our household surveys. That is a different situation from some of
the business surveys, for instance, which can collect very refined information.
As I showed you a few slides back with the write-ins, we can’t always collect
that level of specificity. We can’t get all the detail needed to code down to
the full six-digit SOC code. The degree of aggregation varies by sector or
major group. We update our codes as the NAICS and the SOC change.
Let’s take the industry codes first. NAICS is the North American Industry
Classification System. It is a standard that all federal agencies must use
whenever they collect data on industry. NAICS classifies establishments based
on the similarity of their production processes. Our codes cover all 20
sectors. Each of our codes is somewhere between a two-digit sector and a
six-digit code; it depends on which industry it is. We end up with 269 Census
industry codes. I believe NAICS has around 1,200 codes, so that is how many
codes we have to boil down into our 269 industry codes.
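Because Census industry codes sit at different levels of NAICS detail (anywhere from a two-digit sector to a six-digit code), one way to picture the crosswalk is a longest-prefix match. The sketch below assumes that technique; the table entries and the `CENSUS-IND-*` labels are invented for illustration and are not the actual Census crosswalk.

```python
from typing import Dict, Optional

# Hypothetical crosswalk: NAICS-style prefixes of varying length mapped to
# invented Census labels. This is NOT the real Census industry crosswalk.
CROSSWALK: Dict[str, str] = {
    "11": "CENSUS-IND-A",      # a whole two-digit sector collapsed into one code
    "62": "CENSUS-IND-D",      # fallback for the rest of a sector
    "6244": "CENSUS-IND-B",    # a four-digit group kept as its own code
    "621111": "CENSUS-IND-C",  # a six-digit code kept as its own code
}


def census_industry(naics_code: str) -> Optional[str]:
    """Return the label for the longest crosswalk prefix matching `naics_code`."""
    # Try the longest prefix first, shortening down to two digits.
    for length in range(len(naics_code), 1, -1):
        prefix = naics_code[:length]
        if prefix in CROSSWALK:
            return CROSSWALK[prefix]
    return None  # no match: in practice this would need clerical review


print(census_industry("624410"))  # CENSUS-IND-B
print(census_industry("622110"))  # CENSUS-IND-D
```

Checking longer prefixes first is what lets a detailed entry override the broader sector entry it falls under.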
This just gives you an idea of the industry code crosswalk. I won’t go
through that now.
For the occupation codes we have another federal standard. Both the SOC and
the NAICS have chairs and a board, but they are technically headed by OMB.
Again, all federal agencies that collect occupation data have to follow the SOC
in some form or another. Our Census occupation codes are based on the SOC. The
SOC classifies work primarily on the work performed, the tasks you actually do
on the job, not necessarily training. Only in very rare instances are training
and education taken into account.
Our codes cover all 23 major groups. And, again, the detail ranges from
two-digit codes down to, in some cases, six-digit SOC codes. We end up with 539
Census codes, compared with 800-some codes in the SOC.
In addition to all this information and the classifications, we maintain an
index regularly for use by the coders. It is a ground-up kind of process.
Whenever new records come in and coders don’t know what to do with, say,
French teacher, they look it up. If French teacher is in the index, they know
that it goes to 7870. Many kinds of teachers could go into code 7870, but
specifically if somebody writes in that they are a French teacher, we have that
reference. Whenever we see something new, we add it to the index so the coders
have that reference.
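The coder index behaves like a lookup table that grows as new write-ins are resolved. A minimal sketch, assuming matching ignores case and extra spaces; the class `CoderIndex` is hypothetical, and only the French teacher to 7870 pairing comes from the testimony.

```python
from typing import Dict, Optional


class CoderIndex:
    """Hypothetical write-in index: normalized text -> Census code."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        # Collapse case and whitespace so "French  Teacher" matches "french teacher".
        return " ".join(text.lower().split())

    def lookup(self, write_in: str) -> Optional[str]:
        return self._entries.get(self._normalize(write_in))

    def add(self, write_in: str, code: str) -> None:
        # New write-ins resolved by coders are added so later coders can find them.
        self._entries[self._normalize(write_in)] = code


index = CoderIndex()
index.add("French teacher", "7870")  # pairing taken from the testimony
print(index.lookup("french  TEACHER"))  # 7870
```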
After all the data have been coded, they come back to headquarters, and we do
post-processing data edits. We do universe checks, some logical edits, and some
imputation. Here is an example of an edit for lawyers. If you are employed, you
are a lawyer, your educational attainment is less than a Master’s degree, and
you earn less than $930 a week, we are going to downgrade you to a paralegal or
legal assistant. We do not upgrade anybody in our edits, but we do some
downgrades.
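The lawyer edit is a downgrade-only rule, sketched below. The `Education` ordering and the function name are hypothetical simplifications; the Master’s-degree cutoff and the $930 weekly threshold come from the testimony.

```python
from enum import IntEnum


class Education(IntEnum):
    # Hypothetical, simplified ordering of attainment levels.
    HIGH_SCHOOL_OR_LESS = 1
    SOME_COLLEGE = 2
    BACHELORS = 3
    MASTERS = 4
    PROFESSIONAL_OR_DOCTORATE = 5


WEEKLY_EARNINGS_CUTOFF = 930  # dollars per week, as stated in the testimony


def edit_lawyer(occupation: str, employed: bool, education: Education,
                weekly_earnings: float) -> str:
    """Downgrade-only edit: it may lower an occupation but never raises one."""
    if (occupation == "lawyer"
            and employed
            and education < Education.MASTERS
            and weekly_earnings < WEEKLY_EARNINGS_CUTOFF):
        return "paralegal or legal assistant"
    return occupation


print(edit_lawyer("lawyer", True, Education.BACHELORS, 800))
# paralegal or legal assistant
```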
For incomplete data we assign values from donors using a hot deck. At the very
end of everything, we do a consistency check that evaluates consistency among
all the labor force, education, and income data.
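The testimony says only that missing values are assigned from donors using a hot deck. One common variant is the sequential hot deck, sketched below under the assumption that donors are matched within cells defined by a few variables (here `sex` and `age_group`, chosen purely for illustration); the actual ACS cells and record ordering are not described here.

```python
from typing import Dict, List, Tuple


def sequential_hot_deck(records: List[dict], cell_keys: Tuple[str, ...],
                        target: str) -> List[dict]:
    """Fill missing `target` values from the most recent donor in the same cell.

    Records with no earlier donor in their cell are left missing.
    """
    last_donor: Dict[tuple, object] = {}
    filled: List[dict] = []
    for original in records:
        rec = dict(original)  # copy so the caller's records are not modified
        cell = tuple(rec[k] for k in cell_keys)
        if rec.get(target) is None:
            if cell in last_donor:
                rec[target] = last_donor[cell]
                rec["imputed"] = True
        else:
            # A reported value makes this record the current donor for its cell.
            last_donor[cell] = rec[target]
        filled.append(rec)
    return filled


people = [
    {"sex": "F", "age_group": "25-44", "industry": "A"},
    {"sex": "F", "age_group": "25-44", "industry": None},
    {"sex": "M", "age_group": "25-44", "industry": None},
]
for row in sequential_hot_deck(people, ("sex", "age_group"), "industry"):
    print(row)
```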
Here is a summary of all the standards we apply to the I&O data. We have
collection instructions and interview procedures. We have keying procedures
for the mail-out/mail-back responses. We have occupation and industry
classification systems that come down to us from OMB. We have coding
procedures and quality assurance. We have editing and consistency checks. We
also have a rigorous process for question changes. Dr. Gottschalck was talking
about the most recent content test, from the 2008 content test cycle. In that
process we have a pretest, which is a cognitive test; a small-sample field test,
which we call the content test; a questionnaire design test; and instrument
tests. We have to do all of those things in order to get new or revised content
onto the ACS.
This is a process that we use on many surveys at the Census Bureau: not just
the ACS but also the CPS, the Survey of Income and Program Participation, and
some reimbursable projects, including the National Health Interview Survey.
Now some considerations for when you choose a data source. The ACS has a
larger sample, so it will give you the most complete distributions on
occupation and industry. If you are looking at a small occupation, the ACS will
do better for you. It also has breadth of topics, more general usage, and very
good geographic coverage. But it is not a health survey. It is good for broad
usage, and good for occupation because of the sample size, but it covers a
breadth of topics rather than being specific to health.
I point out the Survey of Income and Program Participation because it is a
survey where we collect multiple jobs. It is longitudinal: we interview people
in each panel every 4 months over 2 to 4 years, depending on the panel. If you
are interested in occupational transitions, here are some other things to
think about.
On a longitudinal survey you will have respondent consistency issues. If you
were to ask me one day, I might tell you I am a sociologist. Four months later
you come back and I might say I am a survey statistician. That introduces
inconsistency. Then there is inter-coder reliability. During coding, some
occupations definitely require interpretation, and those interpretation issues
are amplified in a longitudinal survey. If you want to study transitions, that
is, to see that somebody has had a job change, be aware that when a job has
multiple interpretations and neither is necessarily wrong, an apparent change
might be an artifact of inter-coder reliability rather than a real change in
position or occupation.
I will touch briefly on establishment surveys. We don’t work on them in my
branch. But establishment surveys might be a little more consistent in their
reporting. They won’t capture any activity in the informal economy, however,
because they are not household surveys; they cover above-board establishments.
You might get more on the organizational context, but you will get less at the
person level.
Specifically, the Occupational Employment Statistics (OES) program, which is
run by the Bureau of Labor Statistics, covers wage and salary workers only. It
has no self-employment, and only one job is reported per worker. Those are just
some things to think about when choosing data sources.
And then some considerations specifically for collecting occupation and
industry data on health surveys. Think about how well you can really link
occupation and industry to health outcomes. There could be specific on-the-job
injuries versus long-term disease onset. With long-term disease onset, the
person may no longer be in that job, unlike with on-the-job injuries. Some
occupations may or may not be prone to very acute, specific events on the job.
Then there is the distinction between occupation and job, and whether you want
the current job, the most recent job, the longest job you ever held, the job on
which you spent the most time, et cetera. There is also whether the person has
multiple jobs and occupations, which present different hazards. Observing all
of those things over a lifetime would be quite a task.
In addition, I have only been talking about occupation so far. As I showed
you with the write-ins, the information we get is not specific enough to
describe specific tasks and activities. Nobody says, “I lift 50-pound boxes at
work every day, every hour.” They might say they move boxes, but they might not
tell you how heavy those are. You might need additional questions on specific
tasks and activities that might be relevant to health. And then you also have
to ask yourself whether the data you need are really collectible and
standardizable.
As for industry, it is not necessarily a substitute for information on
workplace conditions or on-the-job exposure to hazardous chemicals or anything
like that. It also does not measure organizational structure or dynamics. From
industry alone you don’t necessarily know how hierarchical the organization is,
how much autonomy the person has on the job, how much stress they have because
of the organizational structure and dynamics, or how work actually happens in
that organization. Industry is not a proxy for that either.
Finally, if you want more information, there is our branch information. I am
the chief of the Industry and Occupation Statistics Branch. And my boss is
Jennifer Day, who is the assistant division chief for Employment
DR. MAYS: Thank you very much. Dr. Baron is with CDC, NIOSH, National
Institute for Occupational Safety and Health.
DR. BARON: Thank you very much for inviting me here. When I was first
invited, I thought I would be in the unenviable position of trying to convince
you of the utility of occupational information. But I was very gratified that
the morning’s presentations included a lot of very positive comments about
this.
What I would like to do in this presentation is give you a bit of an
overview. You have heard a tremendous amount of detail about collecting
occupational codes. First I will talk about some of the major approaches that
have been used for taking these occupational codes and using them as elements
of SES and SEP. This will reiterate many of the points that Dr. Hout made this
morning.
Also, one of the questions you were interested in is whether there are
differences by gender, education, race/ethnicity, or income in the quality of
the occupational survey measures. I will present a small amount of data from
some of the studies we have been doing related to that, and then conclude by
talking about what I think are some of the most important survey items to
consider.
The first issue is what the approaches are for examining occupation as an
indicator of SES in health studies. I show you a number of different
approaches, which aren’t necessarily mutually exclusive. Many of these were
discussed this morning by Dr. Hout. I will go through each of them very
briefly.
The first approach is to use socioeconomic status as a relative rank in
social hierarchies. In this case occupation is used either as a reflection of
education and income or as a reflection of social status. We heard about that
very eloquently this morning. I will just give you a couple of examples of how
this has been applied in health studies.
The first is the idea of using occupation as an indicator of income and
education. I use here the example of the Nam-Powers score. As was described
this morning, you can take the occupational codes from the Census and match
them to the education and income information of incumbents in those jobs.
Sociologists have developed an index from that information that rank-orders
people, by occupation, on a scale of 1 to 99. These scores have been developed
using the 1980, 1990, and 2000 Censuses, and I assume they will do one for 2010
also. Some examples: a doctor scores very highly because of high income and high
education, plumbers are somewhere in the middle, and maids are at the bottom.
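As a rough illustration of how an education-and-income score can be built from occupation-level data, the sketch below ranks each occupation by the share of workers in occupations with lower median schooling (and, separately, lower median income), counting ties at half weight, and averages the two ranks. This is a simplified approximation in the spirit of the Nam-Powers approach, not the published scoring procedure, and the numbers in the example are invented.

```python
from typing import Dict, Tuple


def percentile_midpoints(values: Dict[str, float],
                         workers: Dict[str, int]) -> Dict[str, float]:
    """Percent of workers in occupations with a lower value, plus half of those tied."""
    total = sum(workers.values())
    result = {}
    for occ, v in values.items():
        below = sum(workers[o] for o, w in values.items() if w < v)
        tied = sum(workers[o] for o, w in values.items() if w == v)
        result[occ] = 100 * (below + tied / 2) / total
    return result


def status_scores(occs: Dict[str, Tuple[float, float, int]]) -> Dict[str, int]:
    """occs maps occupation -> (median years of schooling, median income, workers)."""
    edu = {o: v[0] for o, v in occs.items()}
    inc = {o: v[1] for o, v in occs.items()}
    n = {o: v[2] for o, v in occs.items()}
    edu_pct = percentile_midpoints(edu, n)
    inc_pct = percentile_midpoints(inc, n)
    # Average the education and income ranks and round to a whole-number score.
    return {o: round((edu_pct[o] + inc_pct[o]) / 2) for o in occs}


example = {
    "doctor": (20, 200_000, 10),
    "plumber": (12, 50_000, 30),
    "maid": (10, 20_000, 60),
}
print(status_scores(example))  # {'doctor': 95, 'plumber': 75, 'maid': 30}
```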
This is an example of a study that used Nam-Powers scores. It was done by a
colleague of mine at NIOSH, who looked at all-cause mortality by SES using the
Nam-Powers scores divided into quartiles. You can see that people in the lowest
quartile had twice the all-cause mortality rate of those in the highest
quartile.
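The quartile comparison can be sketched as follows: split people by score quartile, compute the crude proportion who died in the lowest and highest quartiles, and take the ratio. This is a simplified, hypothetical illustration; a real analysis would typically use person-time and adjust for age, the boundary conventions (`<= q1`, `> q3`) are our choice, and the data are invented.

```python
from statistics import quantiles
from typing import List, Tuple


def quartile_rate_ratio(people: List[Tuple[float, bool]]) -> float:
    """people: (SES score, died during follow-up).

    Returns the crude death proportion in the lowest score quartile divided by
    that in the highest quartile. Assumes the highest quartile has deaths.
    """
    scores = [score for score, _ in people]
    q1, _, q3 = quantiles(scores, n=4)  # three cut points for four groups
    low = [died for score, died in people if score <= q1]
    high = [died for score, died in people if score > q3]
    return (sum(low) / len(low)) / (sum(high) / len(high))


cohort = [(10, True), (20, True), (30, False), (40, False),
          (50, False), (60, False), (70, True), (80, False)]
print(quartile_rate_ratio(cohort))  # 2.0
```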
Another example of this approach that we heard about this morning is
occupational prestige. In this case occupation is used as an indicator of
social status. As was described this morning, respondents to the General Social
Survey are given a set of occupations and rank-order them according to their
perception of the social status of those jobs. These rankings are then combined
to generate a score from 0 to 100. The last update was in 1989, but again we
heard this morning that a new update with updated scores is going to be done.
This is just to give you an idea of the difference between the Nam-Powers and
the prestige scores. Occupations like registered nurse or minister actually
score better on prestige than might be predicted from their income or
education alone. Others, like bill collectors and salespersons, who have lower
status in society than their education and income would suggest, score worse
on prestige.
And, again, this is a study that used prestige scores for health outcomes. It
was also done by one of my colleagues. She looked at occupational prestige as a
predictor of self-rated fair or poor health, assigning occupational prestige
scores using the occupational codes. She found that incumbents with higher
prestige scores were less likely to report fair or poor health. When she
adjusted the models for demographic variables, income, Census occupational
categories, and the work stress measures collected as part of the General
Social Survey, the relationship between higher prestige and lower rates of
reported fair or poor health still held.
The second approach to using these codes, which was also talked about this
morning, is occupational class. I would like to explain the concept a little.
It considers differences in employment conditions and relations within a
workplace. This may be related to education, income, and social standing, but
it is not directly a measure of them. What this means will become clearer when
I explain one of these systems.
The biggest and most important system using this kind of classification is
the United Kingdom’s National Statistics Socioeconomic Classification System,
or NS-SEC. This is the system used for all of the basic vital statistics within
the United Kingdom. In addition to the occupational codes, the system needs
three other variables: whether a person is an employer, self-employed, or an
employee; the size of the organization, to differentiate small workplaces from
larger ones; and the supervisory status of the individual.
Again, this was described briefly earlier. This is the reduced five-category
version; there is also a slightly expanded seven-category version. Each of
these is what is called a class, and the classes aren’t meant to be
hierarchical. They just divide the workforce into these various kinds of
classes. You have managerial and professional, and what they call intermediate,
which covers technical, administrative support, financial services, et cetera.
And then they separate out small employers and the self-employed, and what
they call lower supervisory, with the idea that these differ from management
personnel because they don’t necessarily have the same control over assets,
resources, and decisions, but may be making some decisions in terms of
supervision. At the bottom are what are called semi-routine and routine
occupations: production workers, service workers, et cetera.
And this is just one of the many statistics that the UK system puts out: male
life expectancy at birth by NS-SEC class. You can see this works quite well for
them in differentiating the classes in terms of life expectancy and also changes
The idea of social class in US statistics is a largely unexplored area, as
was also mentioned this morning. Self-employment, as we heard from several of
the speakers, is collected in most federal surveys, but it really hasn’t been
incorporated into health surveys and could be quite important. In fact, one out
of nine US workers in 2009 was self-employed.
And then supervisory status really hasn’t been explored at all in health
studies or in statistics. This may be because people believe that supervisory
status is subsumed under the management category, but that might not
necessarily be the case.
I am just going to show you a few statistics here that might pique your
interest in some of these factors. This shows rates of self-employment. You
should concentrate on the red line, which is the unincorporated self-employed,
shown by education level. You can see here that the people with the highest
rates of self-employment are actually those with the lowest education level,
which may run counter to what people normally think of the self-employed.
In terms of supervisory status, as I said, this is very underexplored. We
have been very interested in looking at it to see if it is useful in terms of
socioeconomic status, and we have incorporated it into a recent collaboration
with an NIH-funded longitudinal study at the University of Alabama that is being
set up to look at stroke rates. We added an occupational module to the study’s
annual telephone survey. The study includes 30,000 subjects, all over age 45,
with large populations of African Americans and Whites, and it is nationally
distributed.
We included a question drawn from the NS-SEC questionnaire: Do you have formal
responsibility for supervising other employees? These are the results from
11,000 respondents, referring to their longest held job. Across the bottom are
all the different occupation categories that Melissa was talking about, into
which these occupational codes can be aggregated. What is very interesting is
how widely supervisors are spread across all of the different occupation
categories. While management has the highest percentage, we find supervisors
within all these different categories, which suggests this might be something
interesting to explore.
The third approach in terms of thinking about occupation in terms of SES is
the actual content of the work itself. There is a wealth of occupational health
research demonstrating the impact of work exposures on health. And specifically
there have been studies that have tried to look specifically at the
contribution of work exposures to explaining health disparities by SES and SEP.
People have looked at many different exposures: chemical, physical,
biological, and safety risks; work-organization factors and work stress; long
work hours and nonstandard work shifts, which are very common; job insecurity
or temporary employment; and also, as we talked about this morning, the fact
that health insurance and sick leave come as part of your employment
relationship.
I just want to point out that many of these things are important not only for
the individual but also for the family unit. If someone works long hours or has
a stressful job, that also spills over into their family life and into the
community.
We held a conference last fall, and we had some white papers put together that
specifically addressed health disparities and the role of occupational factors
and SES. I show you the website there that has the white papers, and they will
also be available as part of a special issue of the American Journal of
Industrial Medicine later this year.
Why do I really want to emphasize work content? The definition of health
inequities put out as part of the CDC Health Disparities and Inequalities Report
says that these are disparities that arise from modifiable exposures associated
with social disadvantage. Workplace exposures are one of these modifiable
exposures. The more we can get at work content and its contribution to health,
the more opportunities we have to intervene and make some changes.
We are going to have a whole presentation after this on something called
O*NET. This is a new system created by the Department of Labor. It replaces the
Dictionary of Occupational Titles and links to the Census occupational codes,
and it provides a wealth of job descriptors covering many of the exposures I
have been talking about. We know it is going to be difficult enough to get you
to include the basic information needed for occupational codes in standard
surveys. The importance of O*NET is that linkages are available that give us
more detailed information about the content of work, which might be
interesting for health studies.
Examples of measures included in O*NET are exposure to contaminants; physical
demands on the job, like time spent sitting, bending, and kneeling;
psychological demands that might be used in work stress indicators; and work
hours and work shift. Several studies have specifically looked at the construct
validity and predictive ability of O*NET compared with self-reports of
occupational exposures. And, again, we will be hearing much more about this in
the following presentation.
One aspect of work content that I wanted to emphasize is temporary employment.
This may be difficult to get from linkages because temporary employment may
fluctuate with economic conditions. But there are many examples of how to
capture temporary employment on federal surveys. The Current Population Survey
Contingent Worker Supplement has been repeated several times, though not every
year, and collects information on temporary and contingent employment.
Temporary employment was also included in the NHIS occupational health
supplement in 2010.
This is some data from the 2010 NHIS showing the proportion of people who
reported temporary jobs. I just want to emphasize that African Americans,
Hispanics, the young and the old, those with low income, and those with less
than a high school education were disproportionately represented in temporary
employment. Again, this might be something important to think about in terms
of disparities.
The second issue I wanted to cover is how demographic variables impact the
measurement of occupation, because I know this is something you are
particularly interested in. I will give you a little data on the quality of
industry and occupation coding, and also on the importance of capturing the
longest held job in addition to the current job, which we discussed before.
In terms of whether demographic variables affect coding quality, I will give
you some data from a project we have been involved in with another study called
MESA, an NIH-funded longitudinal study of subclinical atherosclerosis. This
study has very large proportions of minority populations: White, Black,
Hispanic, and Chinese. Again, it is an older population, 45 years and older.
The investigators collected open-text industry and occupation information
through a self-administered survey, using questions identical to those in the
Census. We received and coded that open-ended text for use in
This just shows you an example of the survey. Again, this is self-reported
data collected when participants came in for their visits. It gives you an
idea, as Melissa showed, that these are very brief answers. As she said, 60
characters are more than enough to capture what most people will tell you.
For each record we received, occupation was coded independently by two coders
to the three-digit 2000 Census code. If the two coders didn’t agree, which we
call discordant, the NIOSH lead coder would then assign the final code as a
quality check; that occurred in about 35 percent of the records.
In total, through this process of double coding and review by the lead coder,
they were able to code 99 percent of the records that had an occupational
history. This demonstrates that it was in fact possible to generate
occupational codes for most of these self-reported records.
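The double-coding procedure can be sketched as a reconciliation step that also reports the discordance rate. The function `reconcile` is hypothetical, and the `lead_coder` argument is a stand-in for the lead coder’s judgment, which in practice is a person rather than a rule.

```python
from typing import Callable, List, Tuple


def reconcile(
    pairs: List[Tuple[str, str]],
    lead_coder: Callable[[str, str], str],
) -> Tuple[List[str], float]:
    """Return final codes and the share of records on which the two coders disagreed."""
    final_codes: List[str] = []
    discordant = 0
    for code_a, code_b in pairs:
        if code_a == code_b:
            # Concordant: both independent coders assigned the same code.
            final_codes.append(code_a)
        else:
            # Discordant: the lead coder assigns the final code.
            discordant += 1
            final_codes.append(lead_coder(code_a, code_b))
    return final_codes, discordant / len(pairs)


codes, rate = reconcile(
    [("A", "A"), ("B", "C"), ("D", "D"), ("E", "F")],
    lead_coder=lambda a, b: a,  # placeholder decision rule for the example
)
print(codes, rate)  # ['A', 'B', 'D', 'E'] 0.5
```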
We were interested in the reliability between the two coders. We wanted to see
whether things like race, ethnicity, age, education, current working status, or
immigrant status made the two coders more likely to give discordant codes.
What we found is that none of those demographic variables had any effect on the
reliability between the coders. The biggest factor in lack of reliability was
the occupations themselves, with some occupations being much more difficult to
code.
The second point about demographic variables concerns collecting the longest
held job versus the current job. This is data from the 2010 NHIS. Overall, for
about a third of individuals the current job was different from the longest
held job. That is a fairly substantial portion of the population. But you can
also see that this didn’t vary much across demographic categories.
Surprisingly, people 65 and older were only a little less likely to have their
current job be the same as their longest held job; the difference was not as
large as you might expect if people who retire often take another job that has
nothing to do with their longest held job.
Finally, I would like to talk a little about a new initiative NIOSH has been
working on over the last couple of years: the NIOSH Industry and Occupation
Computerized Coding System, which we call NIOCCS. It is a new artificial
intelligence system being developed at NIOSH that will automatically code text
data into Census industry and occupation codes. It was not developed so much
for federal surveys, but potentially could be used for
But for all the other kinds of community and NIH-funded surveys that don’t
have the personnel to do coding, we wanted to make the auto coder available to
try to decrease the burden on people collecting this data. Getting this auto
coder developed and out to users is one of the top priorities of our institute.
We anticipate that a beta version will be available in the fall of 2012, and we
hope the first public-use version will be available in December 2012.
This just shows you the computer-assisted coding screen; it is kind of hard to
read. You can see at the top that this person works in daycare. They are a
child care worker, they work for Kinder Care, and they watch kids, change
diapers, and give snacks. The screen below shows that for this combination of
industry and occupation the auto coder was able to select the right code, and
the green on the side indicates that the system considered this a good match.
At the bottom it lists other occupations that might be similar, in case someone
reviewing this didn’t agree with the match it gave.
I am a little reluctant to give you results of the preliminary testing,
because it is like the glass being either half full or half empty. I want to
emphasize that this isn’t ready for prime time yet. But we have been testing it
with both death certificate data and survey data. We have about 15,000 death
certificate records that were coded manually. The system was able to auto code
about 87 percent of those records, and its accuracy compared with the code given
by the human coder was 74 percent. On the MESA data I talked about before, which
again is self-administered survey data, it was able to auto code 73 percent with
67 percent accuracy. But we have many more algorithms that are going to be added
to this system. We are very hopeful that it will become available, perhaps not
for federal surveys that have the kind of state-of-the-art coding Melissa
described, but in a way that makes it much more feasible for many surveys to
start collecting this data.
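Coverage and accuracy combine multiplicatively if accuracy is measured only on the records the system chose to code. The testimony does not state this explicitly, so treat it as an assumption: under it, about 64 percent of death certificate records (0.87 × 0.74) and about 49 percent of MESA records (0.73 × 0.67) would receive a correct automatic code. The helper `autocoder_metrics` below is hypothetical.

```python
from typing import List, Optional, Tuple


def autocoder_metrics(
    results: List[Tuple[Optional[str], str]],
) -> Tuple[float, float, float]:
    """Each item is (auto code, or None if the system declined to code; human code).

    Returns (coverage, accuracy among auto-coded records, share of all records
    that received a correct automatic code).
    """
    coded = [(auto, human) for auto, human in results if auto is not None]
    coverage = len(coded) / len(results)
    accuracy = sum(auto == human for auto, human in coded) / len(coded)
    return coverage, accuracy, coverage * accuracy


# Back-of-envelope check with the reported rates (interpretation assumed):
print(round(0.87 * 0.74, 2))  # 0.64, death certificates
print(round(0.73 * 0.67, 2))  # 0.49, MESA survey data
```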
In conclusion, what should be measured? We feel very strongly that including
the standard Census industry and occupation questions is important and, if
possible, including both current and longest held job, because about a third of
people have a current job that differs from their longest held job.
We hope the coding burden could be decreased through this computer-assisted
or auto coding system. These codes will be made much more useful through
linkages with work-related exposure systems like O*NET and other kinds of
databases that will be available. As someone said this morning, which
linkages become pertinent may depend on what you are looking at in your health
survey. You may use one kind of linkage for one outcome and other kinds of
linkages for others.
Other kinds of questions we might want to think about, which have been
included in other surveys or for which linkages are available, include
self-employment, which is already on many, many federal surveys; work hours,
which are included as part of NHIS; and work shift, which was included in the
2010 occupational health supplement. Interestingly, about 30 percent of the
respondents to that survey worked a nonstandard work shift. It is becoming
increasingly common in our country for people not to work 9 to 5. That could
have big implications for health.
Other things are temporary employment, which gets at the idea of job
insecurity. And then, again, recognizing that health insurance and sick leave
are not national policies but are tied to employment, they may be something
important to capture as part of someone’s job.
And then this idea of supervisory status, again, hasn’t been well explored,
but may be an area for future evaluation of its utility, especially
considering the British idea of social class.
I would just like to thank all of my colleagues at NIOSH who helped generate
data for this. Thanks.
DR. MAYS: Thank you very much. Our next presenter is Tom Plewes who is with
CNSTAT at the National Research Council. If I didn’t say your name properly,
please feel free to correct me.
MR. PLEWES: Thank you. I really enjoy being the last speaker of the day
because if you speak really fast and end early, it is much appreciated by the
audience. I will keep that in mind as I go through my presentation today.
My title is on the first slide as shown to you on the hard copy. The
important thing you should know is I work for Connie Citro. We have you covered
today. Thank you very much.
And the other thing that you might want to know is that in a previous life I
was the associate commissioner of the Bureau of Labor Statistics in charge of
employment/unemployment data, where I helped develop the Standard Occupational
Classification system and NAICS, things we have heard about today. I have kept
this interest.
These are the things I am going to talk about today. My job, I think, is to
take Dr. Baron’s two slides, expand them into 28, and get that done in about
10 minutes. We will see if I can meet that. What is the Occupational
Information Network that we have been talking about here? It was launched in
1998 to replace something called the Dictionary of Occupational Titles. The
Dictionary of Occupational Titles is a huge document that had been around for
many years. It was developed around World War II; in fact, development started
during the Great Depression, to define occupations and help match workers and
jobs, about 15,000 jobs. They were collected and updated by a crew of folks who
went out to job sites, looked at what people were actually doing, and came back
and put it in a document. It got old real quick because it was hard to
maintain.
The Occupational Information Network was developed in the 1990s and
introduced in 1998. It was meant to provide information for career guidance,
reemployment, counseling, workforce development, and, by the way, research,
which was kind of an afterthought. Very little was done at the beginning that
would allow us to focus on the kind of questions you are asking; the system
was not developed with this in mind. I will talk about some of the
consequences of that.
It has two components: the content model, which is really a framework for
occupational data that I will show you in a second, and a very large and very
useful searchable electronic database. It is fun to go in there and take a
look every once in a while because you can find lots of interesting things,
though not much that is useful for what you are trying to do.
Here is the content model. Six domains: worker characteristics, worker
requirements, experience requirements, occupational requirements, workforce
characteristics (tossed in on top, drawn from other sources outside), and
occupation-specific information, all gathered through a very structured
process, unlike the old process of going out and knocking on doors and
collecting information on occupations.
It is a continuing data collection program that populates and maintains the
database. It is conducted by an outfit called the National Center for O*NET
Development down at NC State through its contractor, Research Triangle
Institute (RTI).
The information is collected in a two-stage design. They use the
Occupational Employment Statistics program that Melissa talked about to
identify where they might find the occupations they are interested in. Then
they draw a random sample of businesses expected to employ workers in the
targeted occupations, and within those companies they draw a random sample of
workers in those occupations to answer a number of questions.
There are four standardized questionnaires, and several hundred rating
scales come out of them, so you know those questionnaires have got to be big.
Each questionnaire contains a different set of questions. The sampled
incumbents for each occupation are randomly assigned to fill out one of them,
so the four questionnaires are spread across the sampled workers in each
company. You have company characteristics on top of that. In addition, the
respondents are asked to talk a little bit about their tasks and provide some
general demographic information. No income information is collected.
A fifth questionnaire, focusing on abilities, is completed later by a group
of occupational analysts using the updated information from incumbent workers.
The whole thing gets a fresh review by trained and qualified subject matter
experts (SMEs) before it goes into the database. It is a very structured
process, and, by the way, the data in the form in which they exist are
actually pretty good.
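The two-stage collection and the random assignment of questionnaires described above can be illustrated with a toy Python sketch. It assumes simple random sampling at both stages and equal-probability assignment of the four incumbent questionnaires; the testimony does not specify the actual selection probabilities, and the establishment names, occupation labels, and questionnaire labels below are hypothetical. The abilities questionnaire completed by analysts is not modeled.

```python
import random
from dataclasses import dataclass, field

# Hypothetical labels for the four incumbent questionnaires.
QUESTIONNAIRES = ("Q1", "Q2", "Q3", "Q4")


@dataclass
class Establishment:
    name: str
    # Hypothetical structure: occupation label -> IDs of workers in it.
    workers_by_occupation: dict = field(default_factory=dict)


def two_stage_sample(frame, occupation, n_establishments, n_workers, seed=0):
    """Return (establishment, worker, questionnaire) tuples for one occupation.

    Simple random sampling at both stages is a simplifying assumption.
    """
    rng = random.Random(seed)

    # Stage 1: keep establishments expected to employ the target occupation
    # and draw a random sample of them.
    eligible = [e for e in frame if e.workers_by_occupation.get(occupation)]
    chosen = rng.sample(eligible, min(n_establishments, len(eligible)))

    sample = []
    for est in chosen:
        # Stage 2: random sample of incumbents in the target occupation.
        workers = est.workers_by_occupation[occupation]
        for worker in rng.sample(workers, min(n_workers, len(workers))):
            # Each sampled incumbent is randomly assigned one questionnaire.
            sample.append((est.name, worker, rng.choice(QUESTIONNAIRES)))
    return sample


frame = [
    Establishment("Center 1", {"child care worker": ["w1", "w2", "w3"]}),
    Establishment("Center 2", {"child care worker": ["w4", "w5"]}),
    Establishment("Office 3", {"file clerk": ["w6"]}),  # not eligible below
]

for row in two_stage_sample(frame, "child care worker",
                            n_establishments=2, n_workers=2):
    print(row)
```

Fixing the seed makes the draw reproducible. A real design would also carry the selection probabilities from each stage so that responses can be weighted back to the population, which this sketch omits.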
Here are the kinds of questions that land on a person’s desk. I just
selected this one: static strength. You can tell right off that there are
probably a few issues with the cognitive aspects of these kinds of questions.
For example, what level of static strength do you need to perform your current
job? Is it more like pushing an empty shopping cart, pulling a 40-pound sack of
fertilizer across the lawn, or lifting a 75-pound bag of cement into a truck?
You have to think about that a little bit, don’t you? Questions like that call
into question some of the concepts we were talking about here. But that is how
it is done.
We did a study. O*NET, launched in 1998, was 10 years old, and we were asked
to do a study and look at all these things. We did actually look at research
issues, but not this issue. Here is the report we did. I didn’t bring copies of
the report. It is available on our nationalacademies.org website under
reports, and you can just download a free PDF copy of it.
If you download it and do what I did yesterday, which is go through it and
search for socioeconomic status, you will come up empty. We didn’t talk about
it. It wasn’t one of the research issues. In fact, the research issues we
talked about in chapter 9, if you are interested in going into it, had to do
with understanding human resources issues and labor market issues, and there
is not much there for you.
Rather than go through the report as it was, I tried to distill what we
learned about the strengths and weaknesses for the purpose we are talking
about here, which is measurement of socioeconomic status. Here is what it
does. First of all, as I mentioned before, it is theoretically informed. It is
research driven and it is pretty good in that way. It provides a common
language across all jobs. Everybody looks at it in the same way, and it is
hierarchically organized. It includes both work activities and worker
requirements. The content model I talked about and all the other things that
go with it are available to describe each particular job.
It has multiple types of descriptors, collected in different ways, from job
incumbents and others, as appropriate for the different domains. The
descriptors don’t all come from one source. There are multiple sources, so
there is a redundancy to the process. There is a rigorous sampling plan and
high response rates. RTI does a fantastic job on the survey, quite frankly.
There is strong business participation, basically because businesses are
interested in getting the results; they use this information for their own
workforce purposes. There are frequent updates. The database is national in
scope. It provides metadata, which a lot of databases don’t, describing what
information supports each of the data items. It provides mechanisms to collect
new and emerging occupations; most recently they have been focusing on green
occupations. The methodology allows occupational description beyond the most
detailed occupational level of the SOC. We will get into aggregation issues in
a second here. And it is vetted through the OMB clearance process, like the
surveys you just heard about today. It is a pretty good
It is really easy to use. It doesn’t cost you anything, which is wonderful in
these times. It minimizes burden to the public. It standardizes information
within organizations. It reduces the costs of information gathering. You can
read the rest of it. Those are the positive parts of it.
There are a few problems. What are the weaknesses? There are 812
occupational categories, and they are too broad. The data are not specific
enough compared to the 12,000-job DOT; actually, there were about 15,000 at the
peak. And they are really not useful for some applications, like selection.
They may be a little bit too broad even for some of the uses we have in mind
here. The descriptors are very general and not sufficiently differentiated.
The context analysis may not provide sufficient information, and there are
some issues with some of the descriptor elements.
There is a lack of real data currency. And although I mentioned that they do
focus on capturing new and emerging occupations like the green occupations,
the fact of the matter is that there is a lag there. It is very difficult to
find out what is going on in this great economy of ours in terms of capturing
the new occupations, and yet we would think that if you are trying to measure
socioeconomic status, the newest and greatest are probably the ones you really
want to start focusing on, because that is where the economy seems to be
heading. The database is really underrepresented in terms of information
operations and those other cutting-edge kinds of things.
It leads to undersampling of small establishments. And of course, as Melissa
talked about, it doesn’t cover self-employed, part-time, or contract employees,
which are a very important part of understanding socioeconomic status. It
covers only US workers. I have talked about, I think, all of this.
There are aggregation issues. I want to move quickly through this. When you
talk about occupations and occupation classifications, you are talking about a
whole bunch of systems. We would like to think we are talking about one
system, the Standard Occupational Classification system. But you have O*NET,
which has a little over a thousand occupations. It collects data on 965, and it
has data on 820. You have to start thinking about what you have in the
database. The SOC has 840 occupations. The Census Bureau, we just heard, has
536, and it has added occupations, but there is no clear picture of what the
occupational total is for you to use as a benchmark for distributing it. If you
come with information on socioeconomic status that is occupation specific, it
is very difficult to aggregate it to a meaningful total. You can aggregate the
Census Bureau data to a meaningful total within the Census Bureau’s definitions,
and O*NET within its definitions, but going across is very difficult.
We came up with a number of recommendations. One question was whether we need
to change the occupational classification system. Unfortunately, the committee
decided that we really didn’t know enough to make that decision. They just
kicked it down the road.
How are the data used for socioeconomic analysis now? As we know, the primary
research datasets don’t have information on job tasks and skills. We have heard
plenty about that in the previous presentations. Researchers have imputed task
requirements from O*NET to individual-level observations from surveys. There is
a lot of guesswork involved in that imputation, I would guess. The work I talk
about here is by Autor and Handel, only because David Autor was a member of our
committee and he talked a little bit about this in his work. By doing that,
they found that the tasks workers perform on a job are significant predictors
of hourly wages, that job tasks vary substantially, and that there are
relationships to race, gender, and English proficiency that can be teased out
of the data. O*NET has been, and can be, used in labor market analysis by
assigning two O*NET factors to occupations.
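The imputation approach described above amounts to a lookup keyed on occupation code: because survey records carry only an occupation code, occupation-level O*NET scores are attached to every respondent in that occupation. The Python sketch below is a minimal illustration, assuming a hypothetical crosswalk from survey codes to detailed O*NET occupations and hypothetical score names and values. Where one survey code maps to several detailed occupations (the aggregation problem raised earlier), it takes an unweighted mean, although an employment-weighted mean would often be preferable. It is not the method of Autor and Handel or of any other study cited here.

```python
from statistics import fmean

# Hypothetical occupation-level scores (illustrative values, not real O*NET data).
ONET_SCORES = {
    "onet-A1": {"physical_demand": 3.2, "autonomy": 2.1},
    "onet-A2": {"physical_demand": 2.8, "autonomy": 2.5},
    "onet-B1": {"physical_demand": 1.5, "autonomy": 4.0},
}

# Hypothetical crosswalk: one survey occupation code may cover several
# detailed O*NET occupations.
CROSSWALK = {
    "survey-10": ["onet-A1", "onet-A2"],
    "survey-20": ["onet-B1"],
}


def impute_task_scores(survey_code):
    """Return occupation-level scores for a survey code, or None if unlinked.

    When a code maps to several detailed occupations, this takes an
    unweighted mean of their scores (a simplifying assumption).
    """
    linked = [c for c in CROSSWALK.get(survey_code, []) if c in ONET_SCORES]
    if not linked:
        return None
    names = ONET_SCORES[linked[0]].keys()
    return {name: fmean(ONET_SCORES[c][name] for c in linked) for name in names}


def attach_scores(respondents):
    """Copy each respondent record and add the imputed scores (or None)."""
    return [dict(r, tasks=impute_task_scores(r["occ_code"])) for r in respondents]


respondents = [
    {"id": 1, "occ_code": "survey-10"},
    {"id": 2, "occ_code": "survey-10"},
    {"id": 3, "occ_code": "survey-20"},
    {"id": 4, "occ_code": "survey-99"},  # no crosswalk entry, so no scores
]

for row in attach_scores(respondents):
    print(row["id"], row["tasks"])
```

Every respondent with the same code receives identical scores, so any variation in tasks within an occupation is lost. That is one source of the guesswork mentioned in the testimony.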
Other kinds of studies classify O*NET job titles into SES strata based on
educational requirements and responsibilities. Again, they make assumptions
about that, because there is no database that allows you to do that from the
original data collected from the respondents. But if you do that, you can find
things. For example, D’Errico found that an analysis of work injury risk by SES
showed higher risk in lower SES categories. Another finding was that O*NET can
be used as a job-level source of psychosocial exposure information on health
care jobs. That was the first Cifuentes article, and it was done in a
Massachusetts nursing home. He then went on to do the work we referred to just
a minute ago, which I will talk about now.
There are still other studies that use data collected by O*NET for job
classification in the analysis of health issues. In other words, what are the
raw data collected by O*NET, and how do you assign them to other survey
results? There was noise exposure work and work on predictive validity for
health outcomes, and overall about 28 studies have been documented that used
O*NET work exposures where they had health and safety outcomes. What they do
is make assumptions about socioeconomic status, apply them to the O*NET data,
and then apply that to outside survey sources. That is the only way you can do
it. There is no direct way you can get to the issue.
Those are the references. If you need any more information, here I am. I
accomplished my goal. Thank you.
DR. MAYS: Thank you very much. What I would like to do is to open this up
for questions. We will start here at the table.
DR. LUCAS: Jacqueline Lucas from NCHS. I just wanted to make a comment in
support of something that was mentioned this morning, and that Dr. Baron just
mentioned in her presentation, about the importance of maintaining detail in
the occupation information collected in surveys and really thinking about how
the data are going to be used in analysis. One of the things we are doing
right now with the National Health Interview Survey is looking at Hepatitis B
vaccination in Asian populations.
One of the things we wanted to control for in our analysis was occupation,
because there is some risk of occupational exposure to Hepatitis B. It is not
the direct focus of our analysis, but it is something we needed to control for.
Because of the level of detail we have on occupation in the National Health
Interview Survey, we were able to isolate the particular occupations for which
there is that risk of exposure.
It has lots of different applications, not just in looking at occupational
health, but in looking at other things where occupation is important. I just
wanted to offer some support for encouraging the standard, when we are
thinking about it, to maintain the detail, because once you have the detail,
you can apply it in all kinds of ways, and I think that was the take-home
message delivered this morning. With occupation in particular, maybe a little
more is better than less.
DR. MAYS: I am going to remember that when we come to the surveys. I would
actually like to ask a couple of questions. What are the occupations that are
most difficult to code, in terms of which categories they fall into? For
example, in California we are very technology driven; those jobs are
frequently coming out and the nature of those jobs changes. And then there are
people who are more in and out of the job market, and I don’t know the extent
to which you measure that, or who have more informal types of jobs. They are
working a cart, and they do that for a certain number of hours versus
something else. To what extent do you get what we would probably call informal
work in your write-ins, and how do you code it?
DR. BARON: I think informal work actually isn’t necessarily that much of a
problem, in the sense that people in the informal economy tend to answer “what
do you do?” with something like “I cut grass.” Their actual job titles are
often very easy to code. It is just that their jobs aren’t very steady, and it
is a matter of making that distinction. Melissa can talk about this much more.
In our coding of the MESA data, we actually did a qualitative assessment of all
the cases where there was discordance between the two coders, to try to find
out what factors led to the discordance so we could correct it.
From that, I think the main issue that came up was people who don’t describe
their job using a job title that is well recognized. The Census Bureau has a
very large index that includes a whole lot of job titles, and it keeps getting
expanded every time they get a new one. But people use all sorts of ways to
describe their jobs. If it is something that doesn’t show up in the index, then
you have to rely on the job duty description, and it depends on how good a job
the person does describing their job duties. For example, they might say their
job title is MS2SP or some weird company job title. Then you are completely
dependent on their job duties to assign the job code, and that depends on how
specific they are in describing them. I don’t know if you want to add to that.
DR. CHIU: Specificity is one of our concerns as well. If somebody says I am a
teacher, we like to give it a code at a specific level: middle school,
elementary, et cetera. But if they don’t give us the specificity, we code them
as far as we can. We usually try to code them to the highest
level. But if they just say that they are a doctor, we do our best with what we
have been given. They might not go into the attrition, but they would at least
go into the general doctor’s —
As for the informal economy, some of what we get we know is informal economy.
People do respond with prostitute, drug dealer, et cetera, and we have codes
for those. Drug dealers are coded as street vendors. Again, we do our best to
enter every new occupation into our index so that people can refer to it and
be as consistent as possible when they encounter something like that. Being in
the informal economy is about how you are paid more than anything else. It is
not necessarily linked to occupation per se. It could be construction. It could
be massage therapy. These are all potentially informal economy, as is cleaning
houses, et cetera. It may not be criminal activity; there are many things that
are not criminal activity that are in the informal economy.
DR. BARON: One of the things I just want to emphasize too is that we want to
be as precise as possible. We want to get as close to the right code as we
can. But I also think it is important not to get too hung up as long as you
are close. There are many different kinds of clerical workers. If you get
slightly the wrong code, the reality is that, in terms of their job tasks,
they may not be all that different.
That is actually one of the things we have been looking at: cases where it is
not exactly clear which job someone might fit into. You can compare coding it
one way versus the other. Say you wanted to predict physical activity levels
at work or how sedentary the job is; it may not make a difference to that
exposure measure which of those two jobs you assign, because they are pretty
similar. That is one of the things you get into when you have 530 or 1000, 102
categories. You may not have to get to that level of precision if the jobs are
not all that different from a health perspective.
DR. MAYS: But I think the issue, from a socioeconomic status perspective, has
to do with status and prestige and where someone is in society. This, rather
than the exposure, is really what I am struggling with. If a clerical person
says I am a clerical person at a daycare center and another person says I am a
clerical person at the White House, the way each person is treated and what
they have access to is probably very different. I am really trying to
understand how, in your coding and how you approach it, you account for the
issue of prestige and status.
DR. BARON: That is a tough one, because I think the prestige score is not
going to take into account whether you are the president’s secretary or a
secretary someplace else, because they would have pretty much the same job
code. One thing you can do is use the combination of industry and occupation.
If you are a clerical worker in a service industry, you can separate that out
from a clerical worker in a different kind of industry. That is one way of
getting at it, but it is going to be imperfect.
DR. CHIU: We don’t take anything procedurally into account except for
education. We don’t even have income when we code.
DR. MAYS: I think there are some questions. Dr. Green.
DR. GREEN: I have two completely unrelated questions. I don’t think either of
them is particularly fair, but I really would like to know what you think
about them. Here we are at the end of an extraordinarily interesting day,
drinking from a fire hydrant all day about measuring socioeconomic status. We
have these three categories out on the table, and you happened to draw the
first straw or something and wind up at the end of the day with occupation.
What is your thinking about the essential additional contribution that
occupation brings to measuring socioeconomic status that is not captured by
education and income?
DR. CHIU: Stress, and perhaps physical burden on the job. If you are a
laborer or a mine car operator, for instance, some things are very clearly
related to certain exposures. If you are a chemical engineer, perhaps. Maybe
that is actually more of a health issue than a socioeconomic issue. We do
spend many hours of the week at work, and some of us bring it home even after
we have spent those hours in the workplace. Stress, I would imagine, would be
a new thing that comes specifically from occupation, meaning workplace stress.
And the physical demands of the job. I don’t know if they have
DR. GOTTSCHALCK: As an economist, I would say the duties associated with the
occupation let you get at, as Melissa says, the additional level of stress,
the additional level of intrinsic effort the individual needs to exert on the
job, and how that affects their life, their family, et cetera.
DR. CHIU: Just to piggyback on that. I can’t remember the researchers’ names
off the top of my head, but I do remember reading studies back in the day
about how people in blue-collar jobs on the line had different behaviors at
home and ran their families differently. People in professional and
white-collar positions tended to treat their children differently: if
something bad happened, they would talk about the intent or non-intent of the
act, whereas a lot of the blue-collar workers on the assembly lines would focus
on the consequences of the act. Don’t quote me on that necessarily. It has been
a long time since I have read those studies, but I do distinctly recall those
linkages between occupation and family. The causality might be endogenous, but
there are correlations there.
DR. KAPLAN: In O*NET or any of these other systems, do you actually have
coding? Do you code for stress?
DR. BARON: Job stress is often made up of a variety of different components.
There are psychological demands in the workplace; decision latitude, meaning
how much control you have over your work; and social support. Those are all
dimensions, and there is something called job strain. You may have heard about
that. People have actually constructed job strain scores from various items
within O*NET that mimic those dimensions.
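The testimony does not say which O*NET items or which scoring rule those researchers used. One common way to operationalize job strain, following Karasek’s demand-control model, is to split occupations at the median of a psychological-demands score and a decision-latitude (control) score and label the four resulting quadrants. The Python sketch below does that; the occupation names and scores are hypothetical stand-ins for composites that would have to be built from O*NET items, and counting values at the median as "low" is an arbitrary tie-breaking choice.

```python
from statistics import median


def job_strain_quadrants(scores):
    """Classify occupations into demand-control quadrants via median splits.

    `scores` maps an occupation name to (demands, control). Values equal to
    the median count as "low" (an arbitrary tie-breaking assumption).
    """
    demand_cut = median(d for d, _ in scores.values())
    control_cut = median(c for _, c in scores.values())

    quadrants = {}
    for occupation, (demands, control) in scores.items():
        high_demand = demands > demand_cut
        high_control = control > control_cut
        if high_demand and not high_control:
            quadrants[occupation] = "high strain"
        elif high_demand and high_control:
            quadrants[occupation] = "active"
        elif high_control:
            quadrants[occupation] = "low strain"
        else:
            quadrants[occupation] = "passive"
    return quadrants


# Hypothetical composite scores (demands, control) on a 1-5 scale.
example = {
    "occupation_a": (4.5, 2.0),
    "occupation_b": (4.2, 4.5),
    "occupation_c": (2.0, 4.0),
    "occupation_d": (1.8, 1.5),
}

for occ, label in job_strain_quadrants(example).items():
    print(occ, label)
```

Because the cut points depend on whichever occupations are in the data, the same occupation can change quadrant when the set of occupations changes; some studies instead use fixed cut points or a ratio of demands to control.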
DR. KAPLAN: — to code sleep — sort of sleep disruption shift work.
DR. CHIU: Shift work is one of the items. It will tell you whether people in
that job typically worked a normal work shift or an alternative work shift. It
gives you composite scores.
DR. BARON: I was just going to answer. I think it gets at why we are measuring
socioeconomic status in health surveys. If the only goal is to show that there
is a gradient across society, you are going to show that with income and
education. We know that. It is there. But I think the issue is that we want to
do something about it. We want to understand what is causing it. If you just
measure education and income, it might take you away from certain, as I said,
modifiable kinds of interventions; it focuses on things like wage rates or
improving education levels. There is a granularity across occupations: people
at the same education and income level can have quite a range of exposures
depending on their occupation. You miss all of that if you don’t take into
account what people are actually doing in their jobs.
To me the key thing is coming up with policy, interventions, or programs that
are actually going to decrease the SES burden. We are not going to get at that
if we don’t measure what people do at work and take it into account in how we
formulate them.
DR. GREEN: Let me make a big leap here. I was taken by the O*NET discussion
about its capacity to be dynamic. I am blessed here by ignorance, not knowing a
whole lot about the rate of creation of new jobs or new job categories and
types. But in the world I live in, I am impressed by how hard it is for many
people to explain what they do to their mother. The old titles just don’t seem
It strikes me as a very important issue going forward. If occupation is to be
a critical ingredient in measuring SES in a standardized way, then we have a
very dynamic workplace where jobs are being created to do different types of
work, particularly in the information age. What is your thinking about how
statistical systems are going to be able to cope with that? It is sort of
encouraging to look at the way O*NET is being constructed and thought about,
but I would like to hear from all of you. There is the stability of years
gone by, monitoring things, being able to relate knowledge to the knowledge
base that existed before, but then there are these new job categories. What is
the plan?
MR. PLEWES: I think O*NET is a very imperfect way of identifying new
occupations, because you have to identify the occupations before you can go in
and survey the tasks within them. In the old days, the job analyst used to go
out and talk to employers and ask them questions: Do you have any new
occupations that you can tell us about, and so forth? That is not happening
anymore. Now you use secondary sources or the kinds of information collected
by the Census Bureau. You get some oddball answers when it comes to
occupations, or occupations within industries, that signal you need to go out
and take a look.
Or you have a special review, like the one they are doing right now with
green jobs, for example, because it is so interesting. They did one on health
jobs several years ago. Now they are doing one on green jobs, where you just
focus on that, go out and talk to various companies, and find out what is
going on. That is probably the only way you can do what you are suggesting.
It is very difficult, within the context of the ongoing program, to identify
what is new.
DR. CHIU: I should say that OMB officially runs the SOC panel, but BLS
currently chairs it. It is a panel of several agencies, including BLS and the
Census Bureau. We have members from the National Academies. For some reason I
am blanking. NSF, I think, and education. In any case, we have a military
person in there for the military occupations. The SOC has been on a schedule
of changes every 10 years, with an automatic planned cycle for reviewing
occupations: which ones should come in and which should go out or be
collapsed. We no longer have buggy drivers, but we now have data architecture
and systems analysts in the SOC.
For the next SOC, we are going to shift the cycle to years ending in 8 instead
of years ending in 0, so the next one will actually be in 2018. The panel
continues to meet on a quarterly basis. We continually get requests asking
where an occupation goes. People want their occupation represented by a unique
6-digit code. We get some of these new occupations, things like doulas. Do
they go into a new occupation or not? Where do they fit? That came up last
year, but we constantly have requests. If some of these small occupations start
coming to us often enough, they go on the list; we are already gathering the
list of occupations to review for the next SOC revision. It is currently early
2012, we have already been gathering that list for a year, and the revision
won’t come out until 2018.
DR. MAYS: Do you think that that is frequent enough given that there are
some industries that within 8 years could start and fizzle out and then other
things start and they are there for a long time? Is your recommendation less
than 8 years?
DR. CHIU: The industry classification is revised every 5 years.
MR. PLEWES: It is a trade-off. If you want it good and new, then you are
going to lose some statistical power. You are not going to be able to follow
what is going on within the occupations over time. I think it is a trade-off.
I like the idea of 8-year updating. I think that is better than 10.
DR. CHIU: No, it will be 10 again, but switching from —
MR. PLEWES: That is the trade off the government has made quite frankly.
DR. BARON: All that is very important. I think one of the things about
occupation is that, since it is tied to the government’s economic and
productivity statistics, there is a priority on trying to keep up with new
jobs, and we can benefit from that. I also want to point out that the job with
the highest percentage growth and the highest number of new occupants projected
over the next 10 years is home health care worker. We have to balance the fact
that there are these new jobs coming in with the fact that, in terms of
numbers, probably the largest proportion of workers continues to do jobs they
have been doing for many years. We want to stay current, but we don’t want our
concern about staying current, which may involve a small percentage of
workers, to make us afraid to account for the much larger proportion of
workers in the tried and true jobs that we can describe well and that have
been in our economy for a long time.
DR. GOTTSCHALCK: Another issue is that if you change too quickly, you lose some of the opportunity to measure churn, that is, people moving back and forth between occupations. A somewhat wait-and-see approach gives you more power. It goes to your point about doing a time series and being able to analyze over time.
DR. CHIU: Another point about data collection: at the Census Bureau we are legally mandated to protect the data. If an occupation is too small, we can't show it anyway; we have to aggregate it in some way to protect the confidentiality of our respondents. That is one of the things about household surveys. We have to balance specificity and what is meaningful against what we can actually show from a data confidentiality and stewardship point of view.
DR. MAYS: Let me ask one other question. Like Dr. Baron, I started off thinking we would be here today really battling over occupation, and it looked very different. Can you talk about what accounts for the difficulty of using the data? I think what people often struggle with is the complexity involved in using these data. Can you talk about the changes you are trying to make, like O*NET and the autocoder? Those are great. Are there any other particular barriers, or any other particular fixes, that you can think of?
DR. BARON: In my sense, the biggest problem has been that unless you have a very large study, you have to aggregate into fairly large occupational groupings, and when you aggregate into a few groupings you lose granularity. So people have been using groupings like white collar, service, and blue collar. Those groupings are so heterogeneous that they tend not to have much explanatory power within a model, and people say this isn't very useful. Then there are the 23 categories, which make it very difficult because you need a very large dataset to show your data with that many categories.
I think the key thing is being able to link to other databases and impute job characteristics, for instance if you are interested in things like physical work demand. In the MESA study, for example, the current workers, about half of the workers, had self-reported physical activity. We found excellent correlations between the self-reported physical activity measures and the imputed measures. So rather than grouping by occupation, if what you are really interested in is sedentary work or high levels of physical activity as a predictor of some disease outcome, you can get that from imputation.
I think moving toward imputing the exposures of interest, particularly for health studies, will improve the ability to use the information, as opposed to just these codes that are very difficult to work with.
DR. KAPLAN: Just curious. Is this where we are going in the future? For example, in the current era, when it is easier to use machine learning or natural language processing to create multiple categories, might we be heading toward a time when people can generate different classifications and codes for their own purposes from the same database?
DR. BARON: I think that is somewhat what Dr. Hout was describing that they want to do with the GSS: once you have the characteristics of jobs, can you classify them into job groupings that make sense for particular outcomes? If you are doing a mental health study, you may want to focus on the psychosocial characteristics of the job. If you are looking at cardiovascular outcomes, you might want to look at those as well, but you might also want to include physical activity. There have been very good correlations between measured noise levels and imputed noise levels. That might be of interest because there is a lot of data on noise levels and various kinds of health outcomes.
DR. KAPLAN: I have been impressed with the NIH database, which we have now made public so people can go in and learn about who is getting grants and so forth. I have been impressed with how the computer people have built in so much flexibility that there is an infinite number of ways you can cut the data with the same database, something we weren't able to do until relatively recently.
DR. MAYS: Any other comments?
MR. PLEWES: In the kind of research I talk about at the end of my presentation, I am very impressed by researchers' ability now to make assumptions and do the imputations we have been talking about, which allow you to cross between the standard occupational groups and the job tasks, which are not collected in the standard surveys. I think that is the way of the future. By the way, in chapter 9 of our report we recommend a number of things O*NET could do to help with that process, for example, making its microdata more widely available so that people could go back and see the characteristics, at least the collected demographic characteristics, of the people who report these various traits and job content. A number of things could be done to sharpen that over time. Quite frankly, I am very impressed by what they have been able to do and by the common-sense results that come from this research.
DR. GOTTSCHALCK: To that same point, from my perspective, what we are doing with the microdata is providing codes that help less sophisticated users, for example codes that summarize someone's employment status in a particular month or week. But we have also been doing a better job of giving users the individual pieces used to construct those codes. If they want to disentangle how we construct them, they can do that themselves, given better documentation, better access to the microdata, and so on.
Another thing that can't be emphasized enough, at least from our division's perspective: we are strong proponents of using the data ourselves and then going out and publicizing our findings, showing people the data are usable. That is a way of publicizing the data sets so that they become better known to researchers and the general public.
DR. BARON: One thing we have been thinking about is tying the O*NET database to the autocoder. When people send in their batches for coding, they could click to say, "I would like to get physical activity, decision latitude, noise exposure," and so on. We could then download the specific variables to their database, with an explanation of all the limitations that go along with them. Again, by making it more available, I think we will see people using it more and will begin to learn more about its utility.
DR. KAPLAN: If somebody is a roofer or a firefighter, that is of very high importance, whereas it may not be as much for other jobs, and you would be able to code that out.
DR. GREEN: Totally different question. Earlier this afternoon there was discussion about the Census data, the size of that sample, and its ability to support conclusions for populations as small as 20,000 or so. This intersects with the Population Health Subcommittee's work last year on communities, learning health systems, and localities being critical to health, and that line of thinking. We continue to struggle with smaller analyses from all sorts of perspectives, not least privacy, security, and that sort of thing.
When it comes to occupation, what would your advice be to community leaders who start from the position of "What can we do to make our community healthier?" What should they know about occupation and the occupations in their community? Is there any sort of guidance for those folks about what they should be looking at and why?
DR. BARON: It is a big, broad question. I think lots of things might be of interest in terms of providing services for communities. For instance, if you knew that your community was made up of a lot of people who work nonstandard shifts, you might want to offer daycare options outside of standard daycare hours. We have talked about local farmers markets and other sorts of things as nutrition interventions. Depending on their location and timing, if your population can't take advantage of them, they may not be useful.
I think just creating profiles of the workers in your community (what kinds of jobs they do, what risk factors they face, where they live) might be very useful for thinking about targeted intervention programs.
Also, particularly in our society now, large concentrations of workers in one place are becoming less and less common. If you want to do interventions, what formerly might have been workplace-based interventions have to be much more community-based. It may even be possible to develop job training, education, and other kinds of programs that are useful for your community, particularly if your workers are freelancers and others who don't have the advantage of a centralized workplace. Home health care workers are a perfect example because they have no central workplace. If you have a low-income community where many individuals work in that job, you could actually have community services for them. I think there are many possibilities.
DR. CHIU: As far as data for those communities go, because the ACS is so broad, we can link it to things like workplace geography. We have information in the ACS not just on where you live but also on where you work. I actually used to work on commuting data. We have received many phone calls from people trying to figure out where to put rural health clinics, for instance. We have workplace geography data crossed with occupation and with industry separately.
We also have information on health insurance, a broad swath of categories. I don't know exactly how many, but somewhere between five and eight categories of health insurance. You can cross that with occupation, and perhaps with industry. That is the power of the ACS: the large sample size gets you down to smaller geographies, maybe not perfectly, and links a huge range of demographic characteristics with different statuses such as health insurance. What kind of job do they do? Where do they live? Where do they work? You might want to treat those places differently. For instance, the daytime population of Washington, DC is vastly larger than its nighttime population. You might want to locate your service, your business, or your health care service where people work, not necessarily just where they live.
We also have information on when people arrive at work. We ask when you leave for work, and we also ask commuting time, so we can back out when you arrive at work. That gives you a sense of when people arrive for work. If they are working the late shift, they arrive at work at 11 p.m. or something like that.
DR. GOTTSCHALCK: One example of the power of ACS data that we have gotten a couple of times now relates to developing enterprise zones. They like to have data down to the block group level on how many people are employed or unemployed in a particular block group, so they can construct their enterprise zones and make decisions based on that. We have had requests from state data centers for that. We also had a request from the Treasury Department when President Obama was proposing an expansion of these enterprise zones, not in this current State of the Union but the prior one. Communities can use the ACS at very low geographic levels.
We are in the process of trying to find a way to release that block group level data, at least for the employment status item, to give communities that information so they can construct geographies that they feel fit their community.
DR. GREEN: How is it going, getting those data released?
DR. GOTTSCHALCK: It is under review now. How long review takes, that I can't say.
DR. MAYS: Let me just follow up, because as I understand it you have two types of centers located out in the community. You have community census centers, but as I understand it they don't get funded by you. And then you have census research data centers, like the ones at UC Berkeley and UCLA, where as a researcher I can go in and actually do some linkages. Can you talk a little about the community data centers, which aren't funded even though, if I understand correctly, you bring them in once a year, they get some training, and their job is to help communities understand the data? Could you talk a little about where those are in terms of being able to access specific data?
And also, is there any plan on the table for linkages outside of the centers? Could I call you and say I want to link these data and you would do it, or is it only through your centers that we could get these data to do all these linkages?
DR. GOTTSCHALCK: I am more familiar with the state data centers than with the community data centers. On your second point about linkages, I was at a recent conference focused on administrative data and linking them to survey data. The view in one of the sessions was that ultimately there should be no reason why people cannot access those data via the internet from different parts of the country and download them. Obviously you have issues of disclosure, security, and things like that. But I know it is on the agenda of quite a few individuals, and theoretically it should be possible.
As for the state data centers, what I am aware of is a regular conference, I think in DC, specifically focused on local employer-household dynamics data, which are linked to the confidential restricted data you mentioned in the RDCs. The state data centers use that a great deal for their own economic development programs; typically it is the state department of labor entities that use it. They meet each year to discuss their uses of the data and what they would like to see in terms of usability and flexibility. Actually, one of those conferences was the first time I heard about O*NET, because someone there was giving a demonstration. So I know that is one vehicle that is used. But I am not familiar with the community data centers.
PARTICIPANT: I have nothing to add to that. I am not familiar with the
community data centers either.
DR. MAYS: Let me see if we have any more questions. We kind of abused you a little because you were on the end, fielding all our questions since there is nobody after you. But let me just make sure my colleagues don't have any more questions. Any questions in the audience? Thank you very much. Again, this was very helpful to us. As you can tell, we had a number of questions that I think will be very high on our agenda as we try to accomplish our task. Thank you very much for your time.
DR. GREEN: Is there anyone on the phone? We will make some plans for
tomorrow. Otherwise we are adjourned.
(Whereupon, at 5:40 p.m., meeting adjourned.)