[This Transcript is Unedited]

DEPARTMENT OF HEALTH AND HUMAN SERVICES

NATIONAL COMMITTEE ON VITAL AND HEALTH STATISTICS

SUBCOMMITTEE ON QUALITY

THE MEANINGFUL MEASURE SUPPLY CHAIN –
BUILDING MEASURES THAT MATTER FOR OUR NATION’S HEALTH

October 13, 2009

National Center for Health Statistics
3311 Toledo Road, Auditorium A
Hyattsville, MD 20782

Proceedings by:
CASET Associates, Ltd.
Fairfax, Virginia 22030
(703) 266-8402

PROCEEDINGS (9:10 a.m.)

Agenda Item: Introductions

DR. CARR: Good morning. Welcome to NCVHS. The NCVHS serves as a statutory
public advisory body to the Secretary of Health and Human Services in the area
of health data and statistics. In this capacity, the Committee provides advice
and assistance to the Department and serves as a forum for interaction with
private sector groups on a variety of key health data issues. Within the NCVHS
Charter there are a number of roles and responsibilities. The Subcommittee on
Quality focuses primarily on the first function outlined in the Charter, which
is to monitor the nation’s health data needs and current approaches to
meeting those needs, and to identify emerging health data issues, including
methodologies and technologies of information systems, databases and
networking that could improve the ability to meet those needs.

A second area of focus is to identify strategies and opportunities for
evolution from single purpose, narrowly focused categorical health data
collection strategies to more multi-purpose integrated shared data collection
strategies.

So I would like to pause here and ask the members and guests to introduce
themselves and then I will give an overview for the next two days. When you
introduce yourselves, could you also add a line about your role in your
organization or field as it relates to today’s subject? Please also state
if you have any conflicts.

I am Dr. Justine Carr, co-chair of the Quality Subcommittee and member of
the Full Committee. I am an internist and hematologist and chief medical
officer at Caritas Health Care System, the largest community based health care
delivery system in New England. We are in year two of our EHR implementation
across six hospitals and 1200 physicians. Other than that, I have no conflicts.

DR. TANG: Paul Tang, chief medical information officer of Palo Alto Medical
Foundation. We have had an EHR implemented for ten years. I am co-chair of
this Subcommittee, a member of the Full Committee, and have no conflicts.

MR. QUINN: I am Matt Quinn from the Agency for Health Care Research and Quality.
I am staff to the Subcommittee.

DR. MIDDLETON: My name is Blackford Middleton. I am from Partners Health
Care and the Brigham and Women’s Hospital. At Partners, I am the
corporate director for Clinical Informatics Research and Development and I
chair the Center for IT Leadership, and in both roles I have an abiding interest
in quality measurement and the measurement of IT impact on quality. No
conflicts. I am a member of the Population Subcommittee and the Quality Subcommittee.

MS. JACKSON: Debbie Jackson, National Center for Health Statistics,
Committee staff.

DR. FITZMAURICE: Michael Fitzmaurice, Agency for Health Care Research and
Quality, senior science advisor for Information Technology to the Director,
liaison to the Full Committee and staff to the Subcommittee on Quality.

MR. REYNOLDS: Harry Reynolds, Blue Cross Blue Shield of North Carolina,
Chair of NCVHS, a visitor to this Committee. I spend a lot of my time on Health
IT policy and Health IT in general, both at my company in North Carolina and
throughout the country.

DR. CARR: So the ability to measure our nation’s health care system and
health will be crucial to addressing such national priorities as improving care
coordination and health statistics. Today’s hearing is about meaningful
measurement, the supply chain of functions of choosing and building a measure
and the gaps that exist in the measurement of our four major national
priorities including coordination of care, health care disparities, health care
value and efficiency and population health.

The ability to measure key aspects of our health care system will be crucial
to addressing such national priorities as improving care and reducing
disparities, driving efficiency and value, and fostering population health. Too
often, however, ease of measurement has taken precedence over measuring what
matters. The availability of new data sources including electronic health
records, as well as the heavy reliance on measures of meaningful use to allocate
distribution of funds under the HITECH provisions of ARRA, underscores the
relevance of understanding the supply chain for measure development and
improvement.

So the NCVHS Quality Subcommittee seeks to gain feedback from measure
developers, endorsers, system developers and reporters to articulate the
process whereby meaningful measures of quality are created, introduced to the
field and refined. In addition, we seek input and recommendations on the state
of meaningful measures in national priority areas.

So the goal of our hearing over the next two days is to address the
following four questions. One is, How do we approach building meaningful
measures? Two, What is the current process for developing measures and does it
adequately address measure development for key national priorities and
sub-populations? Three, How do we introduce new data sources, clinical data
from EHRs, user generated data, et cetera and how do we exchange them for old
measures based on administrative data? Four, How do we maintain and update
measures and what are the health IT system implications?

So based on the testimony, the NCVHS Quality Subcommittee will summarize the
findings, identify gaps and develop a set of written recommendations for
consideration by the Full Committee for transmittal to the Secretary.

I am grateful to our speakers for sharing their time and thoughts with us.
We are also very grateful to Matt for putting this all together behind the
scenes. So I encourage the Subcommittee members to listen carefully to the
speakers and engage in dialog that will bring meaningful recommendations on
meaningful measures.

With that, I will turn to my co-chair, Dr. Paul Tang, to share his unique
perspective. Since I read everyone’s bio, I will give yours. Paul is an
internist and chief medical information officer at Palo Alto Medical
Foundation, consulting associate professor of medicine at Stanford, vice-chair
of the federal Health Information Technology Policy Committee and chair of
the Meaningful Use Group. He also chairs the National Quality Forum’s Health
Information Technology Expert Panel and is a member of the NQF Standards
Approval Committee.

DR. TANG: Thanks Justine. I thought what I would do is go into a little bit
more detail on the HITECH context for our hearing today.

As everyone knows, the Recovery Act provides an estimated maybe up to $46
billion in incentives to accelerate the adoption of HIT, in particular EHRs.
And there are four criteria to meet in order to earn the incentive. One is that
you use a certified EHR. Two, that you use it in a meaningful manner. Three,
that you exchange health information among EHRs and Four that you report
clinical quality measures as opposed to quality measures that are defined based
on administrative and claims data. So the HIT Policy Committee that was set up
in the statute is to recommend to the National Coordinator criteria against
which we would evaluate whether hospitals or eligible professionals would
qualify.

So we have a number of options. One is that we could take a structural
approach, meaning does the EHR have these features? Two, we could take a
process approach; are you using these features? And three, we could have more
of an outcomes oriented approach; are you getting any benefit from use of this
technology?

Because one of the criteria says that you need to use a certified EHR, we
left it to the certification program to decide the structural approach. That is,
are you using a product that can qualify you for meaningful use? And we on the
Meaningful Use Work Group of the HIT Policy Committee focused on two and three;
that is, process wise, are you using these features and, more importantly,
outcomes wise, are you getting any benefit from the use?

We focused on these “clinical” quality measures. We found that
there is a paucity of clinical quality measures meaning most of the measures,
and there are over 500 that are endorsed by NQF at this point, are defined
using administrative or claims data. So in fact, they are not using clinical
data that might come out of an EHR.

Another point is that most of the measures apply primarily to primary care
and do not cover specialty care very well.

Third, although the statute asks us to measure quality and stratify by
characteristics of an individual that would allow us to assess disparities in
care, often times we do not collect that information or properly report on
using that information.

Another problem is that there are national health priorities such as
coordination of care, or assessing and improving population health, yet we have
very few measures that focus on those aspects of care.

Finally, efficiency is very important, certainly in the context of health
reform and there we also have a lack of quality measures or a way of assessing
that. So in short, we lack meaningful measures. So that was sort of the
motivation for having this hearing and hearing from the very
stakeholders in this quality measure supply chain. What is the current state of
the practice? How do we encourage more measures to be developed that use
clinical data from EHRs, and how can we assess the things that are so important
for health reform or for the current health priorities?

So that is sort of a setup for how we invited folks to come and testify on
these matters.

While Helen is getting set up, let me introduce her. Dr. Burstin is the
Senior Vice President for Performance Measures at the National Quality Forum and I
think all of us recognize that NQF is the endorser of quality measures.
Currently they have, as I said, over 500 measures that are endorsed and I am
sure Helen is going to cover the criteria by which measures are endorsed and
how they are trying to encourage measures that pertain to the health priorities
of the country. Prior to joining NQF, she was the Director of the Center for
Primary Care, Prevention and Clinical Partnerships at AHRQ and she oversaw the
HIT portfolio, which invested over $166 million in research at the
intersection of HIT and quality.

I think most of us know Helen quite well and appreciate her taking the time
to present to us. Thanks.

Agenda Item: Setting Priorities For Measurement

DR. BURSTIN: My pleasure. You guys were early. You are throwing me off. It
is so unusual to start five minutes earlier than usual. Apologies for the
delay.

So I think I am actually doing two talks. I will talk about criteria for
measurement and our endorsement criteria second. I think what we want to do with
setting this up is talk a little bit about the national priorities at a broad
level. Where we would like the measurement field to go and as you have already
seen, obviously, from the Meaningful Use documents that came out of the Policy
Committee, a lot of emphasis already on at least several of the big national
priorities. So I will just go through the process and leave a lot of time for
questions.

So for those of you who do not know NQF, in addition to endorsing national
consensus standards, which has been a traditional part of our role for 10 years
(actually this is our 10th anniversary), and publicly reporting
on performance, NQF has over the last couple of years added to our mission
statement the goal of setting national priorities and goals for the nation,
thinking that if we had a set of aligned goals, we would perhaps make more
progress than we do with the 500 measures, for example, that continue to grow
each year.

And lastly, trying to think through the piece around how we would actually
get these measures used. And the fourth piece, and you will hear from Floyd
Eisenberg, Senior V.P. for Health IT, this afternoon, is that we really see
ourselves also playing an important role as the bridge between the quality
measurement community and the HIT community, thinking through how EHRs can
really be brought to bear to get us closer towards useful HIT based measures.

So about a year ago, NQF updated our evaluation criteria and the first one
here is the one that I just want to highlight. I will talk more about this in
the second session with David specifically focused on what makes a measure
meaningful? But one of the important things we did was really make the case
that not all measures are equal. We really want to get at measures that are, in
fact, truly important to measure and report. Some would say meaningful, as
perhaps that term is used at the moment. But really trying to get at the fact
that we only want to measure things for which we can actually make a
significant improvement.

So what is the level of evidence for the measure that has to obviously be
clear? Is there an opportunity for improvement? We do not want to be measuring
things that are tapped out or fairly close to tapped out. We really want to be
measuring things where the act of measurement can actually be an important
force in terms of improvement. And that would either be in an absolute gap or a
significant gap across the providers who are being measured by the measure.

And then lastly, is it related to a priority area (the NPP areas I will
talk about in a moment) or to a high impact area of care? And the high impact
area of care we are getting far more structured on in the short term, with some
work we are doing under our HHS contract, which is evaluating all the top 20
conditions that are currently considered the highest priority for Medicare and
aligning them across nine different criteria to begin understanding issues of
cost, prevalence, impact, morbidity, mortality, complications, the whole gamut.
So from our starting point, that has got to be an absolute and, in fact, this
is an absolute criterion. If you do not pass this criterion, the other NQF
endorsement criteria do not even apply; we will just stop evaluating the
measure.
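
[Illustrative sketch, not part of the testimony.] The gating role of the
importance criterion described above could be modeled roughly as follows. The
class, field and function names are hypothetical, and treating all three
sub-questions (evidence, opportunity for improvement, relation to a priority or
high impact area) as required is an assumption made for the example; NQF's
actual evaluation is a committee process, not an algorithm.

```python
from dataclasses import dataclass


@dataclass
class MeasureSubmission:
    # Hypothetical fields summarizing the three importance questions.
    name: str
    has_evidence: bool
    has_improvement_gap: bool
    relates_to_priority_or_high_impact: bool


def passes_importance(m):
    """Assumed rule: all three importance questions must be answered yes."""
    return (m.has_evidence
            and m.has_improvement_gap
            and m.relates_to_priority_or_high_impact)


def evaluate_measure(m, other_criteria):
    """Apply the must-pass importance criterion before anything else.

    other_criteria is a list of (label, check) pairs; the checks are only
    run when the importance criterion has been met.
    """
    if not passes_importance(m):
        return {"importance": False}  # evaluation stops here
    results = {"importance": True}
    for label, check in other_criteria:
        results[label] = check(m)
    return results
```

In this sketch a measure that is already "tapped out" (no improvement gap)
never reaches the remaining criteria, which mirrors the "we will just stop
evaluating the measure" behavior described in the testimony.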

So that was a lot of the basis of the relation to the national priorities.
It is that same thinking of only measuring what is important. So I will come
back to these other criteria in the next session.

We have been trying to move the field also towards higher performance and
Paul knows this well from sitting on our ultimate approver of endorsed
measures, our Consensus Approval Committee, that we have been trying to push
the field away from very narrow process measures that feel like interim steps
to an outcome, often very distal from the ultimate intermediate or final
outcome and drive towards higher performance. Really what is more proximate, if
it is a process measure, proximate to the ultimate outcome? And I will talk
more about that.

Definitely a shift towards composites trying to think about a comprehensive
view of measurement, measuring disparities in all we do. Paul mentioned that in
his opening remarks and I will talk more about that towards the end.

Harmonizing measures across sites and providers: we continue to have
measures that are not harmonized between the physician level, the hospital
level and the plan level, and we really see that as an incredibly important
role for NQF and the field in general.

Then promoting this concept of shared accountability. The right measures
often times cannot be assigned to a given entity. Readmissions are
perhaps the best example here. Hospitals argue they cannot be held solely
accountable for reducing readmissions, and yet community providers will also say
the same thing: “I cannot be held accountable. I do not get any
information from the hospital that my patient, first of all, was even there”
(which I can attest to for most of my patients) “or, even if I got it, I do
not have adequate information for a really effective transition and hand off.”

So measures that really get at what is the right measure for the population
regardless of how it ultimately gets attributed or accountable to an individual
provider has been an important focus for us as well as beginning to think
across patient focused episodes. This is not necessarily episodes in terms of
groupers of, you know, you stopped having billing for this particular event,
but actually from a patient’s point of view the longitudinal view of where
we are going.

And so we are increasingly moving towards seeing measurement being, in a
sort of two dimensional framework, across these high priority conditions across
episodes to get the most comprehensive view as well as across the national
priorities and goals as I will mention.

These episodes are increasingly moving towards getting us closer to
measuring outcomes, measures of appropriateness, and then in the next six
months or so, also adding to that cost and resource use measures coupled with
quality measures.

This is a schematic of one of the initial forays that we did into this
episode framework of beginning to understand how, if we just focus on the way
we have done measurement to date, all of our measures are in Phase II. They are
all, for the most part, in the acute phase bubble. Did you get PCI within a
certain period of time? — Things like that, but we kind of miss the larger
picture.

And so it begins with the concept of a population at risk, so the population
health goal that has also been embraced by the Meaningful Use listings. A
population at risk for whom an MI could have been prevented, and then moving
across the concepts of, well you know there is an acute phase, a post acute
phase, secondary prevention, but then also recognizing that not all patients
will wind up equal at the end of an MI.

Some will have had an early intervention and wind up needing quality
measures that would look quite different than a patient who winds up with
congestive heart failure or who may have multiple co-morbidities that
complicate it, in which case you would want to be sure to include measures
that get at functional status, quality of life, advance care planning,
appropriate for all, but especially important for the patients who have that
trajectory, as well as certainly for the relatively healthy person at the end
of the day — a strong emphasis on those secondary prevention population health
kind of oriented measures.

We would also like to begin seeing care that is longitudinally assessed. So
a patient with an MI perhaps you would see their care in those first critical
30 days to get at that acute phase in acute rehab as well as perhaps looking
one year out, to really begin getting a really robust picture of the care we
are providing rather than these very narrow slices that we have been doing to
date.

So that has been the background for the way we have been thinking about the
conditions piece, but there has also been a strong emphasis on seeing that the
cacophony of hundreds and hundreds of measures is not getting us where we want
to go. And so NQF served as the convener of a group called the National
Priorities Partnership. And the logic here was that if we could agree that
everybody would focus on the high leverage areas, then this
harmonization across the multiple groups, these effector arms as we like to
think of them, around common goals for improvement could actually
significantly and fundamentally drive improvement in a more rapid way.

And so the goal of this group was to establish national priorities and goals
for public reporting, to focus measurement improvement efforts on those goals,
and it was led, obviously quite ably, by Don Berwick from IHI and Peggy
O’Kane from NCQA. There are now 32 leadership organizations who sit around
the table.

The initial set of priorities and goals has been completed, as I will show
you, and the next steps are also thinking about what they could collectively do
across a set of drivers, of which measurement is truly just one driver.

We often talk about it because it is the easy one to kind of grasp. But in
fact, regulation, accreditation, pay for performance, the various ways we
could drive improvement, are what these groups are trying to do by coming
together.

So as they set the national priorities, the goal was saying, what are those
high impact areas? They focused on the areas for which we would achieve these
four key aims: providing effective care, eliminating harm, removing waste, and
eradicating disparities. And although disparities are not one of the six
national priorities and goals, it is considered the fundamental cross-cutting
area that we want to ensure across all the national priorities and goals.

So let me just run through the six national priorities here. So the first is
patient and family engagement, one that has also been included in the Health IT
Policy Committee list of important areas to move forward on. And the idea here
is we cannot make significant progress until we engage patients and their
families in managing their own healthcare and making better decisions about
care.

And so the specific areas of focus, the goals underneath this priority area
would be: patient experience of care assessment in all settings of care, with
feedback to the providers of how they are doing. I think we have made
significant progress on the measurement side here and have endorsed patient
experience of care measures in almost every setting.

The second is what is really involved in patient self management and some of
the measures should really get at patients having the tools they need to better
manage their care. It is a whole lot easier to manage your diabetes or your
hypertension if you in fact know what your readings were from your clinic and
when you were seen. So huge opportunities there, I think, for a bi-directional
flow of information to dramatically, hopefully, improve patient chronic care
management.

And the last one that, I think, is going to be in some ways the most
challenging, because the measurement field is probably at its earliest stage, is the
idea of making sure every patient has an opportunity for shared decision
making, before they make a decision for a treatment or a procedure.

And the field of decision science is still evolving into what that looks
like. There are some very nice condition specific or procedure specific, ways
of doing decision support for patients. But this is probably the area where I
think we are going to see the greatest growth over the next couple of years
because we just have so little to date.

The second one is population health. So much of our measurement focus
has been in the silos of our healthcare providers and entities, and here it is
really taking a more global view. How do we improve the health of the
population? Another really important area, I think, that Health IT can just do
dramatic things for, in a way we have not been able to do.

The three specific foci here are: improving healthy lifestyle behaviors,
really focusing in on those behaviors that we know have a significant impact.
Ensuring that all Americans get, at least as a starting point, all the
evidence-based preventive services that are indicated for them by age, gender,
and risk factors. And the third one, which is, I think, probably the most
innovative and the one I find most exciting in terms of thinking about the
potential for Health IT, is this concept of a community index. Such an index
would allow us to assess the health of a community, which would likely be a
composite of many different kinds of indicators of community health. But the
kind of thing very few of us — even as a practicing doc in a community health
center for decades, I have never seen, for example, the health of a community
that I serve in although I know it is pretty poor — it would be very helpful
to be able to get that.

And the ability, for example, of using geographic information systems, GIS,
and other indirect methods, to in fact even be able to target in and say it is
this particular sub-part of this community for which A1Cs are alarmingly
high, and figure out what the issues are. That is why I think the capacity of
bridging some of the community registry data, the population health data, to the
personal healthcare system quality measurement data is very exciting.

Safety, obviously one of the foundational elements of the healthcare system,
has to be there. A lot of focus specifically here on healthcare associated
infections. We have been trying to think about what that looks like across
sites of care. The ability, for example, to have Health IT connected facilities
here is especially exciting.

If you think about the example of surgical site infections and how hard it
is to be able to do the 30 day assessment or, if you have a device, the one
year assessment to see if there is an infection, that can really be driven by
this ability. Again, we are focusing in on some of the key issues, like serious
adverse events that we are beginning to look at, as well as mortality and then a
broader view of the different causes of mortality.

Care coordination is probably the one that, I think, could be most significantly
impacted by a really interoperable Health IT system. Ensuring patients receive
well coordinated care across all provider settings and levels of care, with
specific foci here on ensuring that medication reconciliation can be done in a
way that is logical at the appropriate transitions in care and not a huge
burden without impact.

I am still the kind of doc who likes the brown bag at every visit filled
with every pill at home because I still do not trust what I get most of the
time. But hopefully the systems will catch up so that in fact the systems will
allow us to do what I can only currently do with a — actually they are almost
always Target plastic bags now because so many patients get the $4.00 generic.
So emptying them all out on the table, but boy, to be able to do that in
a way that is IT enabled and sharable across sites is really the goal here, as
is preventing hospital readmissions and preventable ED visits, trying to think
about how to use care better.

Palliative care is probably the one that, I think, may have
been the most surprising for folks who saw the national priorities and goals.
This was the idea of looking and ensuring that we have appropriate and
compassionate care for all patients with life limiting illnesses. And that is
not just end of life, that is not just hospice care at the end of life, but
ensuring that patients who need relief of physical symptoms, whether it is from
COPD or dementia, are getting the appropriate care they need, as well as help with
the psychological, social and spiritual needs that patients and families will face.

And then finally, thinking about what is access to high quality palliative
care and hospice services and how we ensure and measure the quality of those
services.

Overuse is a critical one that, I think, all of us agree on, and for anybody
following the healthcare debate it is hard not to focus in on this one.
We want to ensure that we are eliminating waste, while ensuring the appropriate
delivery of care, the delivery of appropriate care.

So there are specific areas of focus that the National Priorities Partnership
identified, including inappropriate medication use and unnecessary lab tests, and
the data are just replete with evidence of repeating laboratory tests,
repeating diagnostic tests, because we cannot get access to what was done at
another setting; again an opportunity, hopefully, for some of the IT to come to
bear.

Unwarranted maternity care interventions, diagnostic procedures, other
procedures, unnecessary consultations, preventable ED visits and
hospitalizations, inappropriate non-palliative care services at the end of
life, and then potentially harmful preventive services with no benefit.

And this is the classic example of the D-list from the U.S. Preventive
Services Task Force that I used to oversee at AHRQ, where we know the risks
exceed the benefits. These should be the easy ones to try to, at least as a
starting point, move forward on.

We do have a lot of work to do on convincing patients in the communities
that more is not always better. Certainly not something I have ever convinced
my mom of, not for lack of trying for years and years and years, who thinks that if
she comes home with several referrals and procedures to be followed up,
that is a really good thing.

So those are the six national priorities and goals. Lastly, I also just want to
at least emphasize, because I think it is really important, that although disparities
was not listed, as I mentioned, as one of the six national priorities and goals,
there is a huge opportunity for us, especially in an IT enabled
environment, to ensure that disparities measurement is not an afterthought, is
not something we do after the fact to see if there were disparities. But in
fact, we should be routinely assessing our quality of care by race, ethnicity,
language, and socio-economic status as part of routine measurement. And we are
obviously exploring both the direct methods for collecting these data from
patients in a way that is patient centered and effective, but also some of
those indirect methods I mentioned earlier using GIS or geocoding.

And finally we understand we probably cannot stratify everything, but we at
least want to ensure that measures for which there are known
disparities get stratified.
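
[Illustrative sketch, not part of the testimony.] Stratifying a measure as part
of routine measurement, rather than after the fact, amounts to computing the
same numerator and denominator within each group. The record layout (a
`met_measure` flag plus demographic fields such as `language`) and the function
name are assumptions made for this example, not a real EHR or NQF data format.

```python
from collections import defaultdict


def stratified_rates(patients, stratum_key):
    """Return the measure rate within each stratum (e.g. race or language).

    patients is a list of dicts; each dict has a boolean "met_measure" and,
    optionally, the demographic field named by stratum_key. Records with a
    missing value are grouped under "unknown" rather than dropped, so gaps
    in data collection stay visible.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [numerator, denominator]
    for p in patients:
        stratum = p.get(stratum_key) or "unknown"
        counts[stratum][1] += 1
        if p["met_measure"]:
            counts[stratum][0] += 1
    return {s: num / den for s, (num, den) in counts.items()}


# Hypothetical records for a single ambulatory measure.
example = [
    {"met_measure": True, "language": "English"},
    {"met_measure": True, "language": "English"},
    {"met_measure": True, "language": "Spanish"},
    {"met_measure": False, "language": "Spanish"},
    {"met_measure": False},  # language not collected
]
rates = stratified_rates(example, "language")
```

Keeping an explicit "unknown" stratum is a design choice for the sketch; it
reflects the earlier point that the stratifying information is often not
collected in the first place.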

So we have come up with an initial set of measures that we have classified
as disparity sensitive in the ambulatory care setting, where a set of criteria
were applied to say, if you are going to stratify any measures, at least start
by stratifying these, with a focus on the prevalence of the condition, the
impact of the condition for the disparity population, the impact of the quality
process (are there known interventions we can do to, in fact, reduce
these disparities?) and the size of the quality gap.

Our plan moving forward is to, in fact, go through the entire portfolio of
measures, at least the ones we think are important, to say which of these
across all settings of care should always be stratified for disparities.

And so just putting it together very briefly, this is the last slide, just a
visual for us of where we think we are going here. It is that same slide of the
acute MI across an episode, but with the overlay of the cross-cutting national
priorities here in gold.

And so you would, for example, assess those patient preferences at the acute
phase of what they would like done. You want to ensure care coordination across
the entire arrows that cross our various sites of care. You want to get at
issues of overuse, in terms of cardiac imaging or procedures; you want to get
at population health at the starting point to ensure that your population, who
is potentially at risk or could be at risk, reduces that risk for an MI. You
want to understand and ensure we have got the palliative care for the patients
coping with end of life, or if nothing else, just coping with the need for
relief of symptoms. And then you want to have safety, obviously, as a system
property that goes across the entire thing.

So it is this vision of where we are hoping measurement will go, that allows
us to get that comprehensive view across conditions and multiple conditions for
many of our patients, which I think will be one of the challenges in a
measurement way of trying to think about how you take these episodes across
patients who have multiple conditions. And then finally, overlaying these
national priorities and that is, at least our vision, of where we are hoping we
can move forward in the field. And I will stop there and take questions.
Thanks.

DR. CARR: Thank you, that was terrific. Could I start off with one question?
In terms of socio-economic disparities, I am wondering if incorporated into
that are the factors that have to do with low income folks who cannot take time
off from work. In fact, let me refer in particular to the Massachusetts
experience, where health insurance is held by most folks, but that health
insurance may have a very high deductible or co-pay. So have we found a way to
think about the kinds of disparities that result from — I want to get
preventive care, but I cannot take time off, I cannot afford the co-pay, I have
no one to babysit my kids, things like that?

DR. BURSTIN: It is an excellent question. I think some of this is the
limitations of the dataset, whether we have the capacity to include those kinds
of data, and I think those are probably difficult to incorporate into some of
our IT systems, for example. But patients’ ability to self-report some of
this might be particularly important.

I think the other thing is that this is one of those areas where that might
be an appropriate way to at least be able to pull out — and I will talk about
that in the next session — understand the issues around exclusions. And there
has been a lot of discomfort, for example, about excluding patients from the
denominator of a measure for which there were financial issues.

This is a classic example for me. It is very difficult to get ARBs, although
I can get ACE inhibitors for $4.00 from Target for my community health center
practice. Getting an ARB is pretty difficult and if somebody has a reaction to
ACE inhibitors I cannot get an ARB unless I try to go through the prescription assistance
program. Should I have an exclusion, for example, that allows me to say I
cannot get my patient this drug?

I think this is a philosophical issue and I think it is one of those issues
where I think, from our perspective, the key here is transparency. So if you
are going to be excluding patients like that, at the end of the day we should
be able to see the percent of times I am excluding patients based on
patients’ ability to pay, whatever the case may be.

So first of all, we can begin to understand where the issues really are and
second of all, the last thing you want it to do is become sort of an easy way
out. So I view myself as a safety net provider. When I see patients, I should
be kind of going that extra mile to try to make sure I can get those things
done. But we also want to have the transparency to allow it, so that if I can
routinely not get my patients in for colonoscopies, which is quite
difficult to do in D.C. if you are uninsured, there is at least a way to track
it and see it over time.
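
[Illustration, not part of the spoken remarks: a minimal Python sketch of the
transparency idea described above, reporting a measure rate together with the
share of patients excluded for each reason. The patient records, the reason
labels, and the function name are all hypothetical, and this is not an NQF
specification.]

```python
from collections import Counter

# Hypothetical patient records: (met_measure, exclusion_reason or None).
patients = [
    (True, None),
    (False, None),
    (True, None),
    (False, "cannot_afford"),
    (False, "cannot_afford"),
    (False, "contraindication"),
    (True, None),
    (False, None),
]


def measure_with_exclusion_report(records):
    """Return the measure rate plus the share of all patients excluded per reason."""
    total = len(records)
    reasons = Counter(reason for _, reason in records if reason is not None)
    # Only patients with no exclusion reason stay in the denominator.
    denominator = [met for met, reason in records if reason is None]
    rate = sum(denominator) / len(denominator) if denominator else float("nan")
    exclusion_share = {reason: count / total for reason, count in reasons.items()}
    return rate, exclusion_share


rate, exclusion_share = measure_with_exclusion_report(patients)
print(f"measure rate after exclusions: {rate:.2f}")
for reason, share in sorted(exclusion_share.items()):
    print(f"excluded for {reason}: {share:.1%}")
```

[Reporting the exclusion shares next to the rate is one way to keep an
exclusion from becoming an invisible easy way out, since the shares can be
tracked over time.]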

DR. CARR: Right, yes. I think your point is well taken. Not necessarily
excluding, but even, and actually we were speaking last night just about
saying, what is someone’s deductible? If you have a $50 deductible you are
in one cohort, and if you have a $3,000 deductible you are in another
cohort, and that represents two populations.

DR. TANG: Thanks, Helen, for a really articulate description of the National
Priorities Partnership, and the way it is folding into measure
development is really very nice.

Do you have a sense of the timeline of getting from where we are with this
big bulk of measures to the kinds of measures that you are talking about? What is
the timeline for the development and what is the timeline for adoption? How
long will it take to get there?

DR. BURSTIN: That is an excellent question. I tend to be a fairly impatient
person, so I hope not very long. We are fortunate in that we did get a sizeable
HHS contract that allows us to do some very wide-ranging projects. We have just
launched one with the Steering Committee next week on outcomes across 20
conditions.

I do not know that we are going to get all the outcomes that we want, but we
see it as something iterative, so that at least we can identify where those gaps
are. What are the most important measures and kind of provide a menu of what
needs to be developed going forward.

We are going to do the same thing shortly around resource use. I think some
of the measures around care coordination are in development. I know you will
hear more from some measure developers later, who have those in the pipeline,
at NCQA for example.

So I think we will fill out those national priorities within the next one to
two years. I do not think it is going to be a very long period of time. I think
the challenge is going to be if you overlay that with what we are trying to do
in Health IT, it is not clear how many of those can be jump started, by
developing them from scratch in an EHR enabled environment.

So we often have this concept of developing a measure and at least as you
will hear later on from Floyd, we have been trying to think about how we then
retool them to make them work in an IT enabled environment to get at issues of
meaningful use for example. I think the challenge is going to be to have
measure developers kind of reframe it in their mind and say stop, if you are
going to build for the future here, build a measure built off of these
interoperable HIT systems, even if we do not have them in our hands yet, so
that we do not go through that secondary step of retooling. That, to me, is
most exciting.

Somebody just told me about a very exciting safety measure and he said,
“Well the only problem is you can only do it if you have an EHR.” I
was like, great, bring it in. That is the way to go. We do not need to just
wait and say, do this on pen and paper. Bring the right measures in and I
think, hopefully, if the other incentives move us forward we will be able to do
that.

But the capacity to pull in registry data and things like that, I think,
especially you asked the question earlier about specialty measures, I think,
one of the biggest incentives to move the specialty measures forward is going
to be the capacity to pull in registry data, good clinical data that clinicians
view as being critically important to understanding outcomes with the data from
an EHR. If we can do that we can really move on specialty measures in a way
that we are not going to be able to do quite as easily with the typical data
that is within the EHRs we have now.

DR. FITZMAURICE: Helen, that was a great presentation that really gives a
good grounding to what NQF is doing and why. I noticed an awful lot on the
demand side, here are the priorities, here is what we consider important, not
an awful lot on the supply side. Can these quality measures be supplied? What
is the cost of producing them? Are the data available? Does it take a lot of
physicians’ time, expensive time versus staff time? Is that a
consideration in the quality measures that you choose to go forward with?

DR. BURSTIN: Absolutely. And actually it is a little strange, because there
is a second presentation I will do shortly, on what makes a measure meaningful
which would have been logically pulled together, and in there I will
specifically talk about and outline what feasibility is, for example. What are
the bars we would set for a measure in terms of how difficult it is
to collect, the cost of measurement, the feasibility of measurement, the
capacity to pull it off of IT enabled systems?

But I think the other important consideration is the fact that there has not
been a lot of dollars out there for measure development. So we can only go so
far. We can help with some of the retooling, but there still is a gap in terms
of some measure developers wanting to move forward with some of these newer,
more complicated measures. Most outcomes require pretty significant risk
adjustment, for example, that is not inexpensive to do and test.

So I think the ability to pull that stream of funding in, which is part of
some of these proposals going forward, would be very exciting.

DR. FITZMAURICE: I remember when NQF was first set up, it was set up as a
consensus body, a standards body, so that, among other good features, a federal
agency that wanted to adopt quality measures could adopt them by reference.
They did not have to go through and develop them themselves; since it was a
United States standard, they could put them into regulations. Indeed, I would expect
something like CMS, maybe VA and DOD, to be very interested in that feature as
part of the National Technology Transfer Act, and OMB has got good
directives. We are all executive departments; we are all encouraged to do that. Do
you see a lot of feedback from that in interest of CMS, DOD, VA for just that
reason?

DR. BURSTIN: It is an excellent question. It is actually a huge part of what
we do and it is particularly on the CMS side. We have not had a lot of
interaction specifically with VA and DOD. But CMS, for example, routinely looks
to NQF endorsement as a requirement to move forward, unless there is a
compelling reason not to with a measure set.

DR. CARR: I think why don’t we just move into our next section and then
it is you again.

Agenda Item: What Makes a Measure Meaningful?

DR. BURSTIN: So just continuing on that theme, I will skip some things that
may be duplicative. Their specific question was what makes a measure
meaningful?

It is amazing how many layers of definition “meaningful” has taken on in the
last couple of years. Specifically, I have already gone through this concept of
trying to move towards where we are hoping measurement will move us to.

I think it is important to consider where we are. We have had a huge growth
of measures, based on several important drivers. The need for measures for pay
for performance, specifically at the individual physician and clinician level,
disparity sensitive measures, patient experience of care measures,
cross-cutting measures, but we still from our perspective from where we sit,
have a couple of key questions. Do we have too many? Do we have too few? And
are they the right measures? And I think that is a lot of what I talked about
in the first presentation.

The availability of data sources for measurement becomes critical and then,
obviously, this whole transition to EHRs, I am hoping will be transformative in
the way we look at what is a meaningful measure.

Just to focus in the criteria a bit, I talked already about importance to
measure and report and I will go through each of these in a bit more detail.
The three other criteria that are especially important are scientific
acceptability of the measurement properties, which is really about the measure
itself. What is the reliability of the measure going forward? Usability, can
the intended audiences understand and use the results for decision making? The
ultimate be all and end all for NQF-endorsed measures is to be publicly
reported and used to make better decisions, something that we are certainly in
a transition phase for at the moment.

And lastly, feasibility, can that measure be implemented, Mike’s point
earlier, without undue burden, capture it with electronic data or use EHRs to
capture it?

So I will run through each of these because I think at least the lists of
bullets are identified. Some of the key issues are already captured within
these criteria. So just to compare the old and the new, the key issues that I
mentioned are that importance to measure and report is now a must-pass criterion and
that feasibility now has a much stronger emphasis on Health IT. And it is
probably not a surprise that Paul Tang has his hand in developing these
criteria as well for us. Then, lastly, the issue of usability.

So importance to measure and report, I mentioned some of this in the earlier
talk, but essentially, is the juice worth the squeeze? Is the effort expended
to produce these measures worth it because it allows us to get at measurement and
reporting in an important area in which improvement is possible?

The specific sub-criteria I mentioned: is it related to a priority
area? Is there evidence to support the focus and opportunity for improvement?
And this, I just pulled this in, I have seen this certainly many times in the
Meaningful Use slides from the Health IT Policy Committee, but this is really
what we are attempting to do here with these new criteria of trying to move
towards the outcomes piece. The advanced clinical processes here, I sort of
view as the process measures most proximate towards the improved outcomes, as
you will see in the way we think about how measures that are meaningful are
developed.

So specifically, we would want to have evidence for each of these kinds of
measures, and measures are very, very different so on an outcome measure, for
example, you want to have evidence that that intermediate outcome, for example
blood pressure control, leads to improved health or avoidance of harm.

On a process measure, as I mentioned earlier, we want to specifically know
that that process measure is proximate enough to an outcome, that it actually
has an impact on improving the desired outcome. And those can be intermediate
outcomes. But we have had measures submitted to us, for example, that say, was
the patient assessed as to whether they needed a flu vaccine? As opposed to,
did the patient have a flu vaccine? So we no longer want to deal with the
measures of assessment and things like that, unless they are closer to the
ultimate end game of being able to track the outcome.

Structural measures continue to be very important and actually Blackford
co-chaired a committee for us just about a year or year and a half ago,
specifically thinking about Health IT structural measures. So what is the
evidence that that structure ensures consistent delivery of effective processes
or access to get to avoidance of harm or improved benefit.

And lastly, efficiency, an area of increasing emphasis for us, is thinking
about the association between the measured resource use and the level of
performance. So again, as I mentioned, we are only interested in resource use
or cost as it is coupled to quality, so we can see the two of them together and
get at the concept of efficiency.

I want to talk a moment about clinical guidelines because I think as we
think about the evidence base here; so much of what we do is driven by the
clinical guidelines. As much as we can complain about the state of quality
measurement, in fact much of that is driven by the state of clinical guideline
development.

The clinical guidelines are often not developed with quality measurement or
clinical decision support, our ultimate goal of improvement, in mind. There is
often a lack of specificity, for example, we may know what services we should
be doing, but there is no emphasis on the periodicity of that testing because
often the evidence base is not as clear to make that determination.

There is lack of precise definitions. High risk patients are a classic
example, if we cannot specify that sufficiently, a good quality measure cannot
follow it. And then as a corollary there, the decision support rules cannot, in
fact, pick up the right patients either. And then the lack of precise action
terms, you know, “may consider doing something” does not lend itself well towards
translating to a measure.

So the consideration of appropriateness, for example, makes this very
difficult. It is hard to take the three-inch tome of an appropriateness
guideline developed by some of the specialty societies, which may be elegant,
beautiful work, but it is hard to distill that into what becomes a meaningful
measure.

We also tend to have a focus on those measurable branch points. They may not
be the most important, but they are the ones we can kind of grab the data on.

I used to have the pleasure of working for John Eisenberg for three years
and John used to love to talk about the drunk looking for his keys under the
lamplight, you know under the lamp post. And I think, unfortunately, a lot of
the way we have been doing measure development has been focusing on, well I can
see those data under the lamp post; let’s make measures out of that. As
opposed to saying, what is the most important thing to measure and then trying
to find the data to do that? — Again, obviously, a pretty big IT proponent. I
think this is where we can hopefully get at the right data to get to the better
measures.

And then, ultimately I think, as we think about standards on the quality
measurement side, we equally need standards on the clinical guideline side to
ensure we are getting clinical guidelines that are computable and useable for
both measurement and improvement.

This is just a slide that Danny Rosenthal, who works with us who is a
medical informaticist has put together, just making the case that all of what
we do is a case of shared evidence. We all rely on the same evidence in terms
of measurement and improvement. Guidelines really become the trunk of that tree
and ultimately those clinical decision points are those branch points. Clinical
decision support should be focusing in on the things that, if you can remind
somebody to do something or encourage a different clinical path, you can
improve outcomes. Ultimately, I think we are hoping that quality measures will
move from those very early narrow branches where there is something measurable,
towards those leaves at the outer point of getting towards outcomes.

Scientific acceptability of the measure properties, we want to ensure
that a good measure going forward should have precise specifications so that
you could replicate it from site to site and do effective comparisons. You want
to ensure that there is some level of testing.

This has been a challenge for us, as measures have moved into the field with
such rapidity to meet the needs of various programs. We are not getting a lot
of measures that have been adequately tested. We are now in the process of this
year beginning to see all the testing data for that first year of measures that
came in under our time limited endorsement option. We will begin to see how
well they work going forward.

We need to see some demonstration of comparability of different data sources
we are using. I do not think we are going to get this in the short term. I
think one of the real challenges for us going forward, will be that as we have
EHR enabled measures, it is not clear that they could be compared to measures
off of administrative data or compared to measures off of chart review data.

And I am hoping some of the research work will help us begin to see, in
fact, how often we have measures that allow that comparability across different
data sources. We often think of the quality of the data sources, again some
work Paul led with us in our Health IT Technical Expert Panel, of thinking
about the quality of the data that is within the measure itself.

I think as we begin thinking about measures that may be based purely on
administrative data, probably the quality of that data, certainly on the
outpatient side, is not great. But as we move up that path towards getting at
clinical data from an EHR or other clinical registries and moving up towards
that, we are sort of assuming that chart review data and EHR data should be
fairly similar. We do not expect a whole lot of comparability to measures built
purely off of administrative data. But the jury is still out.

We want to make sure that the specifications should allow us to look for
disparities, as I mentioned, should have risk adjustment, certainly if it is an
outcome measure, and this issue of exclusions is the one I think is going to be
a really critical issue for us moving forward.

We all know exclusions significantly increase the complexity and the
measurement burden on what we do. It limits our ability to use electronic
sources, we often times have measures, for example, especially at the hospital
level, that require you to go to a chart, or pull an EKG, or pull a vital signs
sheet to get at exclusions. And it is a real barrier towards getting at
harmonization.

So we in our updated evaluation criteria, and this is an area where we have
not been as strict as I think we need to be going forward, need evidence
presented that the measurement results would be significantly distorted without
the use of that exclusion.

We oftentimes have measures where there are 25 or 30 exclusions, and if you
actually did a sensitivity analysis, one could argue probably 20 of them may
increase the confidence of the provider being measured but, in fact, in terms
of the actual impact on a result, are really quite small. And we need to have a
better sense of what happens with those going forward.
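
[Illustration, not part of the spoken remarks: a minimal Python sketch of the
kind of sensitivity analysis described above, recomputing a measure rate with
each exclusion switched off in turn to see how much the result moves. The
records, the exclusion names, and the function names are hypothetical
assumptions, not a method endorsed by NQF.]

```python
# Hypothetical records: whether the measure was met, plus the set of
# exclusion criteria each patient triggers (empty set means none).
records = [
    {"met": True, "exclusions": set()},
    {"met": True, "exclusions": set()},
    {"met": True, "exclusions": set()},
    {"met": True, "exclusions": set()},
    {"met": False, "exclusions": set()},
    {"met": False, "exclusions": set()},
    {"met": False, "exclusions": {"allergy"}},
    {"met": False, "exclusions": {"hospice"}},
    {"met": False, "exclusions": {"allergy", "hospice"}},
    {"met": True, "exclusions": {"refusal"}},
]

all_exclusions = {"allergy", "hospice", "refusal"}


def measure_rate(rows, active_exclusions):
    """Rate among patients who trigger none of the active exclusions."""
    kept = [r["met"] for r in rows if not (r["exclusions"] & active_exclusions)]
    return sum(kept) / len(kept) if kept else float("nan")


def exclusion_sensitivity(rows, exclusions):
    """Change in the rate when each exclusion is dropped, one at a time."""
    base = measure_rate(rows, exclusions)
    return {
        name: measure_rate(rows, exclusions - {name}) - base
        for name in sorted(exclusions)
    }


baseline = measure_rate(records, all_exclusions)
deltas = exclusion_sensitivity(records, all_exclusions)
print(f"baseline rate with all exclusions applied: {baseline:.3f}")
for name, delta in deltas.items():
    print(f"dropping '{name}' changes the rate by {delta:+.3f}")
```

[An exclusion whose removal barely changes the rate is a candidate for
retirement; what counts as "barely" is a judgment the sketch does not make.]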

We need to address this issue we talked about earlier that Justine raised
about SES, but also patient preferences. We need to have transparency to
understand when a patient preference is potentially the reason for an
exclusion.

And we have very strongly made the case that the last thing we need to do,
as we are trying to move towards measures that are more feasible, is require
additional data sources beyond what is needed to do the actual measure itself
to get at an exclusion, unless really without it, you would be significantly
hurting the validity of the measure. This is a challenge for all of us.

And I think going forward, it is not clear what the best approach is going
to be in an EHR, in fact, from talking to folks about, do you want to have a
great deal of specificity on some of the clinical exclusions, for example, that
are contraindications and embed those into the EHR? Or do you want to allow for
more open-ended exclusions with the ability to go back and audit those fields
to figure out what were the most likely exclusions and what was appropriate?

Usability, I mentioned really from our perspective, requires evidence that
those measure results would be both meaningful and understandable to intended
audiences. And we really do, again as I mentioned earlier, focus on measures
that are appropriate and usable for public reporting and informing quality
improvement.

So measures purely used for internal QI do not need to come through NQF.
Those are great. But if they could not pass the four NQF criteria, we do not
necessarily need to bring them through our process. They may be very
meaningful, but they are not going to be meaningful for the comparisons between
providers and public reporting.

And we have also now specifically honed in on this idea that we have to have
measures that are harmonized, to ensure that we are adding measures that have
distinct or additive value. This is probably the toughest to implement. But I
think is the one, in some ways, that is the most important.

CMS supported a project for us about a year and a half ago on
immunizations for flu and pneumococcal vaccine. Because we had this cacophony
of measures across nursing homes, and home health and clinics and hospitals
and, in fact, when we began looking at the project, we had 35 candidate
measures for flu and pneumococcal vaccine.

One could immediately go, that makes no sense, and so the idea is, we came
up with a set of what we thought were the appropriate specifications that we
think all the measures should align to. We recognize there may be different
data sources, OASIS or MDS, but at least the science should not be modified or
different, based on the kind of measures you are using going forward.

Feasibility, obviously probably the most important from the perspective of
where we are sitting today, in terms of thinking about getting at the data
without undue burden, and as much as possible, trying to use data that are
routinely generated as part of care. So again, EHRs, clinical registries,
whatever the case may be, and we want to ensure that the required data elements
that are there, and I will mention some of the work that we have been doing on
our quality data sets shortly, are either in electronic sources or there is at
least a credible near term path towards getting to electronic data collection.

We are probably expecting, I would guess, within the next one to two years
depending on who you ask and I am not sure exactly what that curve will look
like, but we are going to require specifications for EHRs at submission.

As well as the fact that all measures are now going through measure
maintenance, thanks to our HHS contract, we were able to do a much more
rigorous job on measure maintenance. And I suspect many of those measures that
are currently endorsed, will not pass through measure maintenance going
forward.

And the requirement will be, going forward I suspect again and probably
around that same time period of one to two years that all the measures up for
maintenance, will have to submit their EHR enabled specifications. Again the
appetite for chart-based measures except in very limited areas has really
dropped considerably.

And the last issue from our perspective on feasibility is, since we are now
allowing proprietary measurement systems to submit to NQF, one of the
feasibility considerations is the cost associated with the use of that
proprietary measure system.

This is Paul’s slide, you have probably seen it, but again as we are
thinking about the HIT enabling of this and we want to get at that shared data
element, that sweet spot, as Paul likes to call it, between quality measures,
decision support and clinical guidelines.

And so I will just end with just a couple of thoughts about some of the work
we have been doing around this quality data set. And trying to make the case
that from where we sit, the way to make measures more meaningful going forward,
is to ensure there is harmonization and ensure we are getting at the right
kind of clinical data. This would enable us to, for example, be able to very
clearly always know the code set, the code list that is required. For example,
if it is an active diagnosis of diabetes that is required, we will specifically
indicate, that has got to come off a problem list — that cannot come off an
ICD-9 coding in the outpatient setting.

So this is a real transition for us. And ultimately, as you will probably
hear from Floyd this afternoon, we are envisioning this quality data set
getting built into a measure authoring tool, that will allow measure developers
to have again a publicly available measure authoring tool, that will allow them
to immediately pull up the right code list. Pull up where you would find the
data within an EHR so that again we are trying to build measures that are more
consistent and get at what are the clinical areas of importance.

This was a list of some considerations we had come up with early on as part
of our Health IT Expert Panel around what would make measures especially
meaningful from the IT perspective. And we have already talked about national
priorities or high impact, but we also talked about, does the measure reflect
leverage of something really important in Health IT?

Are you using your system to get at what is most important? Can you, for
example, pull aspirin off your medication list in your EHR? Something
notoriously absent from most things because it is an over the counter drug, it
is the most important preventive services indicator for adults in terms of
impact and prevalence and yet most of the time we cannot get at it now. So if
you leveraged your IT system to get at that, would that be another
consideration of a meaningful measure?

Are you getting a more credible representation of quality because you are
using good clinical data as opposed to assuming what comes out of
administrative data for example would work adequately.

The next one here, I think, is more future tense but I was really pleased to
see it in the 2013 and 2015 considerations for Meaningful Use, which is the
measure of innovative patient centered data sources. So as patients begin to
submit some of these data, do we get a different set of measures potentially
through their input?

And lastly, is the measure sensitive to effective coordination of care and
data sharing across sites and providers. We know just having the measure within
your own setting is not sufficient if you really cannot share across settings
of care.

And this was your work, again, that some of the HIT Policy Committee had done, but
I think we are moving towards trying to think about what is a meaningful
measure, which I think is in evolution. And I think getting to the point where we
have those advanced care processes with decision support on the path to
outcomes, we really see much of our work focusing in on getting us towards that
path of the more meaningful measures at the end of the day. We have already
talked about the goals.

And the last slide I wanted to show is a wonderful slide that the RWJF
Aligning Forces for Quality group have put together about thinking about the
comprehensive data that is needed to generate performance information. And I
think if we really begin thinking about what makes a meaningful measure, we are
going to have measures that are a whole lot more meaningful if we can, in fact,
have these data streams that allow you to pull in the information from
pharmacies, labs, EHRs, hospitals, registries and I would also add to this
patient report, PHRs.

That is the way to get at that comprehensive integration of data, to get at
what is truly a meaningful measure. With that I will stop and take questions?
Or whatever you would like.

DR. CARR: Thanks, that was great. David, why don’t you start now? David
is the Archstone Foundation Chair and Professor of the David Geffen School of
Medicine at the University of California, UCLA, and chair elect of the American
Board of Internal Medicine Board of Directors.

DR. REUBEN: Let me introduce myself a little bit. I know a number of the
people in the room and some I do not. I am Dave Reuben and I am a geriatrician.
All my patients are Medicare patients. My oldest is now 100 and my youngest is
probably in her late 60’s. So that is one of my pediatric patients. Most
of my patients are in their 80’s and 90’s. Part of my day job is
running the division of geriatrics at UCLA, but another part of my day job is
working on both quality measurements, developing quality indicators and
improving physician performance on quality indicators.

I have been part of the ACOVE Team which won the Eisenberg Award last year,
so I have a little bit of experience with this and I will draw in some lessons
from that. My nights and weekends job, and where I am actually supposed to be
right now, is in South Carolina because I am the chair elect of the American
Board of Internal Medicine. And Cris Cassel apologizes for not being able to be
here. She sends her regrets and as a public disclaimer, I will say, I am no
Cris Cassel. But, glad to be here.

So I am going to talk a little bit about meaningfulness criteria. My take on
this is a little bit different and I think complementary to what you heard from
Helen. We could call it the gospel according to Reuben. But I will talk a
little bit about validity, importance and longevity of measures. Then I am
going to talk a little bit about physicians’ organizations and how they
relate to quality measurement and improvement. And then specifically, I am
going to speak about efforts of the ABIM, with respect to board certification,
rote learning into quality measures and unrelated quality measures. And then
finally I will close with talking about how these board measures align with
other efforts.

So in terms of validity, one of the things you always want to be concerned
about with validity, is does the measure capture what it is intended to? So,
for example, a pretty common measure is smoking cessation counseling. Indeed,
if your patients are smoking, you want to tell them to do smoking cessation
counseling.

So, I have two patients left that still smoke. All the other ones either
have died or I have convinced them to stop smoking, so I have two that are
left. And every time I see them, I tell them to stop smoking. I just tell them
to stop smoking. If they are still smoking five or six cigarettes a day,
stop. And sometimes I sit down there and I go with a very scripted routine of,
let’s set up a start date, here are the different patches and different
ways of doing it, I will call you on certain dates, and that is more or less
what is meant by smoking cessation counseling. Other times I say, you know, you
ought to stop smoking. It is really important to your health. And that is it.

For smoking cessation counseling, those would be valued as equal in some
senses. I would satisfy the measure either way. But it is not always the same
thing. So does that measure really capture what was intended? What was intended
by the measure, is me sitting down and spending about 10 minutes with a patient
going over how to stop smoking.

The second, does measurement discriminate performance among providers? And
here you have to decide, what is a reasonable sample to distinguish one
physician from another physician in terms of how they are behaving? We at the
American Board of Internal Medicine, and I will show you some examples later,
think that probably 25 is a reasonable sample size. With 25 diabetics, you can
get a reasonably decent measure of somebody.

Then the question is whether you are measuring at the individual level, in
other words, the individual provider level versus in the practice or the
system, so some of the things that are measured, in terms of quality measures,
are things that are very easily measured, because they are physician behavior.
Some are actually really dependent upon how good the infrastructure of the
practice is, so what are you really measuring, the individual provider or the
system?

The third question about validity is does improvement on the measure result
in improved outcomes? So what you really want, and this is kind of the way
measurement works in general, is that you have a randomized clinical trial and
it shows that a new therapy is effective, say beta blockers for MI. And then you
will have a professional organization say this is a guideline now. And then you
will develop quality measures that reflect whether that guideline has been
implemented. Then you will do a quality improvement step to improve the quality
on that. And then, what you should see is better outcomes. So you are linking
RCT data to better outcomes.

And we have a fair amount of information on this first part, up to about here,
but for the back translation, going from clinical trial data to quality
improvement to better outcomes, sometimes we do not have those data. In fact,
sometimes we can improve the process of care and not improve the outcomes of
care. So this is the linkage that you really want to have, but we do not always
have it, especially this final link.

And finally let me just say a word about forced responses to move on to the
next screen. In a lot of the electronic health records, to get to the next
screen, you have to answer something. You cannot get out. You cannot escape.
And there is some data actually on pain control where they have taken a look at
quality of care in practices that had to have a forced response. So you had to
click to get onto the next screen versus those that did not. And the quality
was actually worse when you had to click to get to the next screen. So people
who would do it would say, get this out of my way. I need to move on and do
other things and they actually did not do the process.

Importance, how much impact does satisfying the quality measure have? So
here you have two quality measures. One would be weighing the patient at every
visit to see if they have weight loss. And the second would be providing
nutrition counseling. They may be two quality indicators that are both valid,
but they are probably not of equal importance. You could say that weighing a
patient, which takes about 10 seconds, may not have a whole lot of importance
compared to providing nutritional counseling which may take 20 to 30 minutes,
but they may, in fact, be treated equally.

Related to this also is the value of individual measures versus composite
scores and I think Helen was mentioning that the field is really moving much
more towards composite scores. In some of the work we did with ACOVE, we actually
showed that improvement of quality resulted in better survival. This is really
a nice thing to find. So we said, gee, we can take it down and find out what
the real drivers of that were. We dug and we dug and we found that the one
thing that consistently showed improvement in survival was Pneumovax.
Now rationally, it does not make any sense. But, in fact, it was a surrogate
marker.

So trying to rely on a couple of markers and say that if you move these, it is
really going to make a difference, is probably not the way to do it. You
probably need a composite outcome.

How long does a measure remain current? Well, one of my colleagues, Paul
Shekelle, looked at this a number of years ago and said that, based on the
evolving science, a guideline lasts about three years. So if you build in quality
measures and they are there forever, guess what? They may be old and may not be
measuring the right thing. Witness the example of estrogen. One year it was a
quality measure that if you did it, you got credit for it. The next year, it
was a quality measure that if you did it, you got penalized. So in fact, the
science moves on.

How long does it take to game the system? There are a lot of electronic
health records now that are tailored towards quality measures and if you do not
do anything, actually the quality measure is satisfied because the default goes
to satisfying the quality measure. This is some scary stuff, that if you do
not do something, you can actually satisfy it.

I am also very impressed by the marked capacity of the American free
enterprise system to quickly respond to economic incentives. Indeed if it is
being measured and if you are being paid for it, they will find a way to solve
the problem.

So I am going to shift gears entirely here and talk about physician
organizations. This is the view from 30,000 feet. The first are medical
societies such as the American College of Physicians and the American College of
Cardiology. The society that I belong to is the American Geriatrics Society.
These are membership organizations.

And they are designed to advance the field, to advance the field or
discipline in a way that helps the profession and also serves the general good
of the public. They promote education, they typically provide CME, they publish
clinical guidelines, and they publish journals. They are clubs and they are
somewhat parochial. They really represent their members.

The second is licensing boards and licensing boards are generally state
based and they are required for practice. And they are very bureaucratic, they
are just state regulated.

The third level is certifying boards. The overall certifying board governing
body is the American Board of Medical Specialties. But within that, you see the
American Board of Surgery, the American Board of Family Medicine, the American
Board of Internal Medicine, Orthopedics; there are about 27 or so boards here.
These are not for profit. They are oversight organizations and they are not
membership organizations.

You do not become a member of the American Board of Internal Medicine. You
become a diplomate of the American Board of Internal Medicine. They do not
accept support from any kind of pharmaceutical or device companies and one of
their responsibilities is to define the field. What is a cardiologist? When
does a new specialty become a specialty? So they are in a sense, a very trusted
agent.

So a little bit about the American Board of Medical Specialties, actually 24
boards. The American Board of Internal Medicine is the largest of these. We
account for about a third of all practicing physicians; many of the
subspecialists, the cardiologists, the endocrinologists, the rheumatologists,
the geriatricians, are all internists. An important fact is that about 85
percent of American physicians are board certified.

So whatever the boards do and whatever you have to do for maintenance of
certification has a lot of teeth, a lot of reach.

The ABIM’s mission is of the profession and for the public. We feel that we
are accountable to the public. And we work through the profession to benefit
the public.

So how do we improve quality? Improving quality is a very important aspect
of the mission of the American Board of Internal Medicine. We have something
that is called maintenance of certification. And basically, in lay terms,
maintenance of certification is keeping up. It is keeping up. It is to make sure
that the physician you are seeing is as capable on the day you are seeing them
as on the day they first got certified. This has been required since 1990.

So the majority of internists out there have to maintain their certification
continuously. There are four parts. The first is maintaining a valid license.

The second is a process called self-evaluation. In self-evaluation,
basically, you receive questions you have to answer, you look up the
material, you see how you do, and if you do not meet a certain threshold you
have to take the test again. But it is a self-study program. It is designed
so that you evaluate where you are in terms of knowledge and improve it.

The third component is a written examination of knowledge. This is a test
that is a high stakes examination. The way it currently works is that you drive
or fly to a testing center, you have to strip naked essentially, you cannot
bring a pencil in there, you cannot bring a piece of paper in there, you have
to get fingerprinted to get into the room, you cannot bring anything with you.
And you sit there for four to six hours and you take an exam on a computer,
very high stakes. And you can pass or you can fail it. There is no in
between.

The fourth is something that may be the most relevant to today’s
discussion, and that is evaluation of performance and practice. Looking at what
you do in practice and improving it.

So board certification is not to be taken lightly. It is important. It is
important on the quality landscape. A series of studies has been published on
some of the value of being board certified versus not being board certified.
Better outcomes and more reliable care, better quality of care for patients who
are being treated for high blood pressure. Actually, the time since your
certification correlates with worse care. So if you were certified a long time
ago and you had not maintained your certification, the quality of care you
deliver for high blood pressure declines.

15 percent lower mortality for myocardial infarction, higher rates of
preventive services, lower rates of mortality for colon resection, and fewer
low birth weight babies. So by and large, maintenance of certification is a
quality measure in itself for the physician. And those of you who have
physicians, next time you go there make sure they are board certified.

So let’s talk just very briefly about parts two and three, which are the
knowledge assessment and the examination. And these are very complementary to
the performance measures. First of all they test diagnostic acumen. So one of
the things about quality measures is that if you have the wrong diagnosis, even
if you do really well on the quality measures, it really does not make a
difference. So if the wrong diagnosis is acute MI, and you get all the quality
measures right on acute MI, guess what? You probably have not helped the patient much.

So this is really important that you are making sure that you are working
with the right diagnosis. The other thing that the examination and the
self-evaluation modules do is test clinical judgment, which is really
important. They also allow us to really explore conservative management, things
that maybe should not be done.

That said we believe very strongly that performance measures matter. So the
ABMS, as the parent organization, requires all the boards to implement assessment
of performance. And we will give you an example of the ABIM’s practice
improvement module, which is web based, uses NQF measures when available, and
includes a rapid cycle PDSA to address areas.

Also included in these practice improvement modules are patient
experiences. The voice of the patient is, we feel, exceptionally important. We
also assess practice infrastructure, essentially using the NCQA PPC to see what
is available there. Also included in this are peer surveys of how a physician
is doing. So these are very broad.

So when you think about the overall landscape here and where these fit in,
if you go down quality measurement, first of all, national priorities are
not set by the ABIM. They are set by governmental agencies, the IOM, NQF. This
is the goal line. This is where we are headed. These are our priorities. Then
guidelines are done frequently by medical societies; researchers develop
guidelines; voluntary health organizations like ADA develop guidelines. And
then they are operationalized and endorsed: NCQA, PCPI, and think tanks like
RAND operationalize them, and then NQF endorses them.

The assessments are developed to see whether those measures are being
completed appropriately. NCQA does that and the boards do a lot of that as
well.

Finally providing reports and feedback, NCQA does as well as the boards and
finally reassessment to see whether improvement has occurred and to our
knowledge only the boards are doing that.

I am going to show you kind of an interesting example of that. So this is
the anatomy of a practice improvement module. It has three components to it. It
has the patient survey, it has the practice survey and it has a medical record
review, actually digging into the medical records. Most of this is done now
manually, through hand-written records, because that is mostly what is
out there. But these can be adapted to using electronic health records. There
is the performance report. The diplomate has to have an improvement plan, which
is a plan, do, study, act cycle. And then the impact and what was learned. Those
are the basic principles driving a practice improvement module.

So let me give you an example of one. This is the Diabetes PIM and a lot of
work has been put into the Diabetes PIM. And I am going to show you kind of a
testing sample of these. This was a sample of 957 physicians, most of whom were
general internists. And to do this practice improvement module, a total of
20,000 patient charts, or roughly 21 patients per physician, and almost 19,000
patient surveys, about 20 per physician, were required.

So one of the things that was done here was the composite score: how well
the docs did on these diabetes measures. And the way that was done was to
convene an expert panel. By an expert panel, we mean
internists and ophthalmologists who are actually practicing docs and who are
academic experts, who review the performance on the measures.

How well did these docs do? We reviewed the reliability of individual
measures, selected clinical and patient measures, weighted the importance of
these measures, reviewed the reliability and reproducibility of the composite
measure, looked at the actual performance on the composite measure, and defined
a “borderline candidate”. So what is a borderline candidate? A
borderline candidate is the person who is at the cut point between a pass and a
fail.

We do this all the time when we certify physicians on the examination. We
use a process called the Angoff Process that asks the people who are on the
committee or the expert panelists to say, what would the borderline candidate
do on this? And finally, setting a standard for performance.

So this is an example that comes from NCQA data, and these are measures for
diabetes. For an eye examination, the criterion for passing would be 60
percent of their patients: they would get a pass for that if 60 percent of
their patients had an eye examination. Similarly, for a foot examination, 80
percent of their patients would have it. And then these have been weighted in
terms of how many points you get based on how important they are. So there are
100 points possible and those are the measures.

So here is actually some data, using the data on those 957 physicians and
their patients. And this shows both the mean and the reliability of the
abstraction. On the mean score, 58 percent of them had eye examinations
done for 60 percent of their patients or more. And you see the overall clinical
measure score was 73 percent. And this is simply going through the NCQA types
of data.

And you see how they did. They did pretty well on some, like smoking
cessation, although of course you do not know what they counted as smoking
cessation. But they did not do so well on others, like the foot examination.

So the reliability, how was that done? No, I am not going to invent
something. Rebecca Lipner can tell you about that.

PARTICIPANT: (Comment off mic.)

DR. REUBEN: I cannot tell you the details of the reliability measure, I am
sorry.

So here is where the really interesting stuff comes. Those were simply using
the NCQA criteria, how they would score it. Here they used the expert panel to
actually do a couple of things. One is to set the criteria for the minimum
acceptable candidate. What would you expect the minimally competent internist
to be able to perform? And on an ophthalmology exam, 28 percent would be the
pass rate, essentially.

And then they weighted the relative importance of each measure, and the weights
are a little different from NCQA’s, but they still add up to about 100. So this
is the minimal threshold. So if your overall score on your diabetes patients is
less than 47 percent, you would fail this threshold. You would be below the
minimally acceptable candidate.
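
[Illustration: the weighted composite and cut score described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the ABIM's actual scoring method. The measure names, point weights, and per-measure rates are hypothetical, and giving proportional credit for each measure is an assumption; only the 100-point scale and the 47-point minimum threshold come from the testimony.]

```python
# Minimal sketch of a weighted composite score with a pass threshold.
# Measure names, weights, and rates are hypothetical placeholders; only the
# 100-point scale and the 47-point cut score come from the testimony.

PASS_THRESHOLD = 47  # minimum acceptable composite set by the expert panel

# Each measure maps to (weight in points, percent of patients meeting it).
# The hypothetical weights are chosen to total 100 points.
example_physician = {
    "eye_exam": (20, 50),
    "foot_exam": (20, 40),
    "blood_pressure_control": (30, 60),
    "smoking_cessation_counseling": (30, 90),
}


def composite_score(measures):
    """Return the weighted composite on a 0-100 scale (proportional credit)."""
    total_weight = sum(weight for weight, _ in measures.values())
    if total_weight != 100:
        raise ValueError("weights must total 100 points")
    return sum(weight * pct for weight, pct in measures.values()) / 100


def meets_standard(measures, threshold=PASS_THRESHOLD):
    """Return True if the composite is at or above the cut score."""
    return composite_score(measures) >= threshold


score = composite_score(example_physician)
print(score, meets_standard(example_physician))
```

[A physician whose composite falls below the threshold would be flagged as below the minimally acceptable candidate; the actual panel weights were not given in the testimony.]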

And here is how these 957 physicians scored. So you will see that the mean
score is about 66 percent here. And that the standard, the minimum threshold,
was 47. You have these physicians here who scored below what we considered
minimally acceptable. So 89 percent were considered competent and about 11
percent were considered not to be competent. And so this tells us, even with a
self-abstracted instrument, that you can differentiate physicians who are doing
pretty well from those who do not do well.

And guess what? They looked at these physicians, who we like to call the
bottom feeders; these are the people who are not doing so well. And they looked
at their exam scores and they looked at everything else that is measured in
maintenance of certification, and they do not do very well there. So this tells
us, this is a population that needs to be focused on.

So this is just the Diabetes PIM. The most popular ones are diabetes,
preventive cardiology, and hypertension, but there are a whole slew of these
PIMs, and other ones that are under development. So for just about any
condition, any specialty, there is a PIM.

And PIMs make a difference. There are 11 articles published in print, that
show the validity of these and they have clinical meaning. I will show you a
couple of them.

Five studies, including two controlled trials, have demonstrated positive
changes in care. And guess what? The re-measurement aspect, the one we talked
about before, shows that the PIMs make a difference. If you get your data
back and you develop a plan to improve it and then you re-measure, and these are
for hypertension, there is a mean 28 percent improvement when they
re-measured, for blood pressure or lipid control, medication adherence,
non-pharmacological treatment, 50 percent improvement. So, in fact, this makes
a difference.

So what about the docs? How do they respond to the PIMs? In the experience of
over 5,000 physicians with completed PIMs, 73 percent say it changed their
practice. 82 percent would recommend it to a colleague. Now it does take time.
Let me just tell you, it takes time to do a PIM. And doing hand abstractions of
your records is painful to say the least. But in fact, the docs are doing them.
If you would like to test drive a PIM, the slide here has the demo site where
you can walk yourself through it.

So finally, I would like to end a little bit with board alignment efforts.
How are we working with other organizations? How do you change behavior? You
can regulate, you can use economic forces, or you can rely on professionalism.
And the professionalism is where ABIM comes into place.

So how have we aligned? We have aligned with health plans. Maintenance of
certification has been integrated into some reward and recognition programs.
So, for example, with Aetna, Cigna, a number of the health insurance companies,
if you are board certified, they recognize you as being so. You are on their
preferred list; sometimes it actually includes increased payment.

The second are Bridges to Excellence programs. What the Bridges to
Excellence program has done, is take data from our PIMs, and we are working
with them in terms of providing some help, but they are using that to provide
bonuses to physicians for better performance, using the PIM platform.

Alignment with other quality improvement efforts: now let us just say
that you have a great electronic health record and you are generating this
kind of data all the time and you are giving feedback to docs. There is
something called an Approved Quality Improvement Pathway, where we can actually
certify a healthcare organization, like a Kaiser Permanente, to do this
themselves.

And in fact, for Mayo Clinic, we have given them, essentially, a five-year
blanket to continue their quality improvement activities, and those will be
recognized for Part 4. So the idea here is that, if an organization is
really stepping up to the plate with this and doing it well, we are not going
to force them into this widget, but we are going to make sure that, in fact,
those standards are being met.

With the public sector, we have been very aligned with PQRI: the board
modules function as a registry, and in the Senate Finance Bill, participation
in maintenance of certification is currently included as a pathway to PQRI.
And we strongly believe that it should remain there — this is a way of really
putting some teeth into it.

Discussions are underway currently about alignment with Meaningful Use. Once
again, we believe, as Helen was alluding to earlier, EHRs that are going to be
marketed need to have the components of maintenance of certification in there
with the measures that are going to be used. Because what we have to do and
this is critically important for physicians, is to reduce the redundancy. We
cannot have them doing 50 different things to get credit for 50 different
stakeholders.

So there are some people who have that, including the VA system; they can do
the data collection component of these practice improvement modules in just a
few minutes, because those measures are in the electronic record and can be spit
out so they can see what they are doing.

We are modifying and building new MOC assessment tools that align with and
support meaningful use goals. The MOC tools are always a work in progress.

So just to sum up where ABIM is and where the boards are: they are
aligned with where the quality field is headed. The efforts at ABIM are
complementary; they are consistent with everything that you will hear from NQF
or NCQA. We are all headed in the same direction.

The board requirement for maintenance of certification, reaching those 85
percent of physicians which is going to even be higher, engages physicians in
improving quality of care. We have this hook, a very powerful hook to
physicians, to get them to participate. So we are really part of the solution
here.

Maintenance of certification and other tools are comprehensive. They include
performance measures, but more: the judgment and the diagnostic acumen. We
have data now showing that PIMs change physician behavior for the better. They
are readily adoptable and not too burdensome. I am going to say not too
burdensome; they are somewhat burdensome at this point, but in the
future, hopefully, they will not be burdensome at all.

And finally, public and private payers can leverage this existing, well
regarded infrastructure to align QI efforts and accelerate improvement. We want
to work with other organizations to make this work. So I am going to stop here
and ask for any questions. Just do not ask me about the reliability.

DR. CARR: Thank you, that was just very, very thought provoking, very
interesting, very great work, so I think there are a number of questions. Mike,
do you want to start?

DR. FITZMAURICE: A very good presentation and thank you for coming. I really
got a lot out of it. The first one is — are PIMs equivalent to board
certification or are they separate from board certification?

DR. REUBEN: They are a component of board certification. So there are four
components of maintenance of certification. The PIMs are one, the test is
another, maintenance of licensure is a third, and the fourth is a self-assessment.

DR. FITZMAURICE: You mentioned 85 percent of physicians are board certified
and so immediately it comes into my mind, why aren’t the other 15 percent?
And when I look at docs, I know I can go online and find out if they are board
certified. So why aren’t the other 15 percent of the docs board
certified?

DR. REUBEN: So there are a couple of reasons. One, there are some that are
not eligible to be board certified. A lot of the international medical
graduates cannot sit for board certification. Others, not so much now, but in
the past, in the ’50s and ’60s and early ’70s, did not think it
was very important, and never got certified. And then there is that last
category of people (actually two other categories) who wanted to be certified
and could not pass the exam, just could not get certified. And then
there is another category and that is the lapsed certification. So as I said,
from 1990 on, and in family medicine this has been since the inception of the
field, you have to maintain your certification and if you do not you
lapse and you become uncertified.

DR. FITZMAURICE: Did I see that a physician has to be recertified every four
years now?

DR. REUBEN: Well, that is a really good question. It varies by board, and
where the field is moving is, instead of saying that you have to re-certify
every ten years, every five years, every seven years, it is really what they are
calling continuous maintenance of certification. So you have to do something all
the time. So, for example, you would have to do a PIM every two years. You
would have to do a self-assessment piece every two years. But you might
have to do the exam only once every ten years.

DR. FITZMAURICE: So if you do a PIM, then you have got something like an
automatic reminder. I was low on this from last year, so every time one of
these things comes up, have a reminder somewhere to let me do this so I get a
good score the next time. Plus, it is better for the patient.

DR. REUBEN: Yes. And the way I would think about this, is that every couple
of years I would do a different PIM. So for a couple of years, I am going to
work on my asthma treatment and then I will work on my heart failure treatment,
because you are not only making a quick fix, you are making a permanent fix.
You are doing this PDSA cycle to change how you approach these patients in
general. So if you do five over ten years, you have got five big conditions
that you have addressed.

DR. FITZMAURICE: Thank you.

DR. CARR: Just one question. Do you have an assessment of what you said that
85 percent are certified, and that includes those that are grandfathered in,
prior to 1990? Do you know how many there are in the group that are
grandfathered in?

DR. REUBEN: I think it is 25 or 30 percent. Yes, 30 percent.

DR. CARR: I think that is powerful because basically it lowers that number
down significantly.

DR. MIDDLETON: Thank you, David. Blackford Middleton from Brigham and
Women’s Hospital in Boston. I guess the one sort of obvious question on the table
is, if we look at certification, 85 percent being certified, and then we look
at Beth McGlynn’s data, suggesting 54.6 percent of the time, we are not
actually doing what we think we know we should be doing. So where is the
disconnect? If we are certifying, getting up to snuff on the knowledge base,
but it is not being applied, how do we address that disconnect?

DR. REUBEN: Well, essentially, that is where I spend most of my time. It is the
same thing if you take a look at the ACOVE data, the exact same number for older
people. And if you look at geriatric conditions like osteoporosis, and falls
and incontinence, it is even worse. So why does it happen? There are a variety
of reasons. One is knowledge but many times it is not knowledge. It is not
knowledge. It is systems and time.

So there was a study that was published five or six years ago by one of the
family docs, and they said if you followed guidelines for ten conditions, just
followed guidelines, and you did not do anything for anybody else, it would
take you 10.2 hours every day just to meet those guidelines for ten conditions.
And that does not include what the patient came in for. So that tells us a lot
— if you are a primary care doc, then you are on this continuous treadmill.

So what we have to do, we have to keep physicians’ knowledge up, but
even more so, we have to make it easier for them to do the right thing. And
part of that is actually better systems. It may be electronic health records,
it is team care, it is delegation, and there are many, many solutions to that.

But the PIMs really get at that. The PIMs get at how you are going to change
your practice, not just how you are going to prove your knowledge. Because
knowledge generally is not the issue, it is generally being able to implement
appropriate behaviors.

DR. MIDDLETON: I think you are right on target. I guess the challenge is,
that when we think about kind of overcoming the inertia of clinical practice in
the heat of the battle, and you are acknowledging it may not be a knowledge
based issue, how do we actually address then the clinical requirements for the
physician to be a knowledge broker, more than a knowledge manager if you will.
So on behalf of the care team, on behalf of the patient, he or she becomes more
expert in processes of information management, knowledge access, use, and
application, as opposed to being the repository of knowledge. Where will
certification go to address those kinds of issues?

DR. REUBEN: Yes, that is a terrific question. And I have to tell you, based
on my own practice, we have a group practice, there are some docs in our
practice who are really great docs but I will never be able to teach them about
systems. They are never going to be leaders in system redesign. And I think
what is going to happen is that there are going to be docs who are good at
that, who are going to be leaders within groups to do redesigning of systems
and the others are going to be just good docs and they are going to follow
these systems. But you cannot expect every doc to be a system change leader.

DR. MIDDLETON: Maybe just this last one, if I may, you know, I think in this
line of reasoning, you have to think about what are we measuring with quality
measures and certification? You alluded to this I think appropriately in your
talk, but I think sometimes about the concept of sort of attributable qualities
across a care team, borrowing from an attributable risk in epidemiology, how
would you actually say well, this internist versus this nurse versus this care
manager versus this sister, brother, mom, dad, cousin at home, is actually
contributing to the diabetes outcomes of interest? It strikes me that, in a
way, we have kind of a myopic view in the traditional, paternalistic medical
model. We are assessing the physician component, but perhaps ignoring,
currently, the rest of these components.

DR. REUBEN: I think you are absolutely right. I think the way we have to
think about this is working backwards. So working backwards is — what is the
outcome that you want? You want the process to be done, you want the better
care and how you get there. And it is both. You have to have a competent
physician working in a well-functioning practice.

Sometimes you have a really good doc working in a system that is not so
good. So the really good doc can offset some of the deficiencies. And sometimes
you have a really good system that can offset some of the deficiencies in the
doc. So you have this kind of wiggle room here. But when you get to the
extremes, when you have a totally dysfunctional healthcare system or practice,
or you have a really bad doc, then you have got huge problems. Then this doc
has got to be either remediated or eliminated or this system has got to be
scratched. But within that kind of a little bit out of range, you can
compensate one way or the other.

DR. FITZMAURICE: Thank you.

MR. REYNOLDS: Now this is for both of you. Excellent, I am not a physician
and so I learned an incredible amount this morning, so thank you. But as you
know, we were involved in a Meaningful Use Hearing and then I am sitting here
with a document in front of me, the latest one, and I look out at 2015 and it
talks about clinical outcome measures, efficiency measures and safety. And then
I hear these presentations on measures and then you have somebody who has got
to set them as a group.

In North Carolina right now we are working on all of this obviously, and
about 60 percent of our small docs do not have electronic health records, and
so I am sitting here thinking about them as I listen to this whole ecosystem.
So you go to accepted use and both of you mentioned understanding, people have
to understand it. And that is everybody involved.

And then it has got to get in their work flow and it has got to get in their
capture, it has got to get in their system that they are going to buy, which,
oh by the way, we are talking about where we are going to be in 2015 and they
may have to buy that now.

And then you talked about how measures might not last more than three years.
Now again, that was just a statement, that is not an indictment or anything, it
is just a statement. And so you find yourself in a situation that the measures
are extremely important, but also how we move this entire environment to a
place, and as we make recommendations and other things, it is almost like you
have a structure of measures so that the right information could be captured
this year and then next year and if one goes away, another one comes in, so
that you are not chasing a ghost, literally, in being an implementer of
systems.

I see a lot of ghosts flying around. Because about the time I figure that
that would be important, what you are saying is that this might be important.
And the way to grab it might be important and this percentage might be this or
that. And so help some of us that are not as attuned to this as you, how are
you guys going to be, when we think about really meaningful measures?

How do we build an ecosystem, that whatever you decide is a meaningful
measure can take it and use it and use it not in ten years and not use it in
five years. And oh, I got involved in this and got my incentive yesterday, but
oops, it does not work quite so nicely tomorrow. A long winded question, but it
concerns issues.

DR. BURSTIN: Incredibly important question though. I think two responses on
the big picture level. I think one of the key issues is that I do not think it
is so much of an issue on embedding the measures into electronic health
records, as it is embedding the key data pieces, the key data elements and data
types. The key data elements and data types are not going to significantly
change.

We could add to that quality dataset going forward, but no one would argue
that the ejection fraction in patients with congestive heart failure will
not be an important piece of a measure going forward, regardless of whether
there are varieties in the kinds of medications you may use going forward. So I
think that is the first thing.

It is less about embedding the measure logic into the system, as much as
it is about making sure you have got the right data in your hand. And I do
think that quality dataset should evolve to take on the additional important
measures that come forward to capture that. I mean outcomes for example, may be
difficult if it is just the EHR on my desk as opposed to an EHR that is
interoperable to other systems and registries and things along those lines. So
I think that is the first piece.

So I think the second issue is that you know we increasingly are moving
towards the outcome measures piece of it. So I think the less we rely on the
narrow processes, I think increasingly what we are going to see, is that those
process steps become clinical decision support. At least that is my hope.
Remind me to do the key steps along the path.

Build that into decision support systems, the knowledge management piece
that Blackford was just mentioning, but in some ways allow the measurement
systems to rely on whatever the data source may be. And I think we are going to
have to accept the fact that probably depending on your optimist/pessimist
view of the world, the next five years, for example, I am an optimist, we are
going to live in an environment that is going to be uncomfortable. We are going
to have lots of different data sources measuring people, docs, and hospitals,
whatever the case may be. I think we are just going to have to live with that.

And I hope there are sufficient dollars around to, in fact, research and
help us to understand what those key differences are. But I have no doubt that
we need to build the right kind of measures we need for the systems that are
coming, even if we do not have them in our hands yet, Harry.

But I think that at the same time, we have to acknowledge the fact that we
are going to continue to have to have a library of measures that say, if you
can only do charts, this is what you do. If you have got administrative data
and you can pull in a couple of key pieces of clinical information
electronically, this is what a clinically enriched administrative measure may
look like. And this is what it is if you have an EHR.

But this next five, X number of years, I think, is just going to be a
difficult time period that we are just going to have to accept. And those 60
percent of providers in your community probably are not going to be ready by
2011 or 2012 even to turn the button on and immediately produce their EHR
based specs. But hopefully, they will be able to do a combination of clinically
enriched administrative measures or, as necessary for some of the outcomes, do
something chart based.

DR. REUBEN: Yes, I think I agree with Helen in that perspective. Think
about a launch to the moon, which is what this is, basically. You know
where you are at Cape Canaveral and you know where the moon is and you are
going to shoot for it. But there are going to be a lot of mid-course
corrections. You do not just set on one trajectory. And as long as the goal is
there, and the techniques and the equipment are there to be able to make these
mid-course corrections, we are okay. But we are not going to get it right the
first go around. There are going to be a series of revisions and revisions and
revisions until we get to the goal.

So as long as there are mechanisms and data elements, then you can revise
these things. If a big study comes out saying that estrogen is bad for heart
disease or bad for breast cancer, you can switch it from one to another. So
those are possible.

I would like to say a little bit of disagreement about outcome measures
because I am always a little nervous about outcome measures because the
relationship between process measures and outcome measures is not one to one.
It is not one to one for a variety of reasons. One it takes a long time to go
from a process measure to getting an outcome. And the second is that sometimes
they do not occur. Sometimes you can do everything right and get a bad outcome.
Sometimes you can do everything wrong and get a good outcome, so some of these
outcomes are beyond our control.

So, for example, my oldest living parent died at 62. My dad died of terrible
heart disease and this and that and he was doing everything wrong. He just did
everything wrong. And I am doing everything right. But there may be something
in my genetic code that at 62, I am going to flip off. And that I cannot do
anything about.

So outcome measures are a little less controllable. Process measures, either
by getting the physician to behave well and the system to behave well, you can
do something about. So obviously, you want to get them as closely related or as
distal as you can, but those are things that you can change. You cannot change
some outcomes.

DR. BURSTIN: I think the whole point is that there is also an important interim
step as we get towards getting to look at more outcomes, which is also just the
process of assessing outcomes. It sounds a little odd, but for example, just a
clinical example from where I practice, you know, we have got a small community
health center, and just got an EHR about a year ago, actually. We now have our
nurses’ aides, who have at best a high school education, doing a mental
health assessment of patients, to screen for depression. They walk in the door,
it is color-coded right in my EHR, and it is color-coded in red if somebody
does poorly on their PHQ. I know exactly who I need to intervene on.

Again, the outcome may not be at the end of the day, that I have
significantly improved the outcomes for patients with depression, although
ultimately that is what I think we need to be held accountable for, but boy, if
nothing else, you are in fact doing the outcomes assessments of patients, the
screening, the functional status assessments, to allow us to even see what
patients are on that trajectory. Until we know what patients are on our
trajectory, it is just a difficult situation.

Our end-stage renal disease project, a couple of years ago, had two
patients on the Committees who insisted, insisted, we had to have a functional
status measure for patients on dialysis, because you feel terrible. But a lot
of the docs say, I cannot be held accountable for how patients on dialysis
feel. But as an interim step, at least, there was a requirement that every
year, all patients on dialysis have a functional assessment done. So I think
there are steps on that path that get us closer towards what is meaningful to
patients, in terms of outcomes.

DR. MIDDLETON: Just following up that line Helen and David, one of the
things that is sort of akin to the attributable quality idea, is to recognize
that actually the determinants of premature death only are impacted to a small
degree by what we do in healthcare, of course.

Larry Green not being here, I have to speak on his behalf, that he would say
sometime about now, that healthcare in fact accounts for 10 or 15 percent of
the determinants of premature morbidity, but the rest of the determination is
based upon community and behavior and genetics and, et cetera.

So I guess what I am suggesting is that one of the things I would love to
see research focus on, is how to then take this case-mix idea and really
account for the heterogeneity of the patient population, based upon all those
other determinants so the doc does not feel that he is doing the right thing
and getting a bad outcome. Or the patient is doing the wrong thing, but still
getting a good outcome — all those kinds of vagaries.

If we could better, actually tune the measure, to account for not the
typical connotation of case-mix, but a more subtle connotation if you will,
that accounts for genetics, for behavior, for community or social exposures, et
cetera.

DR. CARR: Thank you both. These have just been tremendous presentations. And
interestingly, they come from, in my mind, I am seeing them on two ends of the
continuum, where if we work hard and less is more to measure certain things
that NQF endorse, and yet I am impressed with the engagement of physicians in
this practice management. My observations are often that as we are pulling
together, whether it be in-patient or out-patient measures, maybe you are a
physician who is involved in one case or one off here or one off there, and
there is a little work around to kind of make sure that that discharge summary
does not go out without that input or whatever.

And I am impressed by the process of looking at your cohort of patients and
looking at what you wrote, and how you manage, because implicit in that, is
your ownership of the continuum. So that when you get to the one off measure,
you have a context to put it in. And I think there has been an asynchrony with
the physicians being involved in measurement and they are often just hearing
about what you forgot, what you did not do, what you should have done. And I
love this idea of really owning the cohort.

The second thing is the whole P4P campaign in Boston. People spend an
inordinate amount of work getting from 98 percent compliance to 99 percent. And
similarly, groups are tiering practices in Boston and those that fell below 98
percent might be tier 2 and those above 98 percent. So there is a, perhaps I am
exaggerating a little but not that much, the point is that we are wasting
effort worrying about that one thing that was perhaps not even preventable,
when in fact this kind of in-depth analysis is very rich. And I think as we do
these measures, we have to give some thought to how they are used in P4P,
because when we talk about waste that is a waste. And it is at the expense of
this kind of rich look.

DR. TANG: Well I think part of what Helen and David mentioned about the
goodness of a measure, was a reflection of an opportunity and 98 percent
represents very poor opportunity. It also, I almost think, effectively says
high gaming because it is just not humanly possible to achieve that kind of
quote performance. So can I go to the next topic?

DR. BURSTIN: One brief response and I think it is an excellent point, and
again something that does not have an option for improvement, would not at
least bring forward the NQF endorsement. It just would not be a measure that we
would consider important enough to make the effort to publicly report. You may
want to do it internally for QI to make sure you do not fall off the cliff, as
soon as you stop publicly reporting it. But at the same time, it probably does
not rise to that level.

I do want to make one other, I think important, point that I think has not
had a lot of discussion, which we often talk about just the absolute value of a
measure as opposed to the trajectory for getting there. And I think,
particularly for a lot of providers who may not be at the 90th percentile
perhaps, the trajectory of recognizing a huge improvement over time,
is something that should also be rewarded. I do not think we have seen a lot of
that. It is a bit in the value-based purchasing program from CMS, some
hospitals on that path towards the trajectory, the absolute threshold, get
payment as well.

And as a safety net provider on my day a week, I have the same perspectives.
I can have somebody walk in my door with an A1C of 15 and I can get them to
eight pretty quickly. Getting them from eight to seven can often be difficult.
So I think again, understanding the trajectory, as well as the absolute
threshold, I think is another consideration.

DR. TANG: Part of what I felt was so exciting about both presentations is
the direction towards better quality indicators of performance; let me put it
that way. So in the NQF way, it is looking at the quality of the data and what
David talked about, I think that is what is in the reliability column. But let
me check my understanding of the vision, particularly on the certification
side, the ABIM, is just like in CME we know how ineffective it is to plop
somebody in a seat in Hawaii. And there is a trend towards going towards
incremental CME, using adult learning methods of, gosh the best time is when I
have a question with a patient in front of me could I get credit for looking
something up right at that time? That would be reinforcing good behavior and
better care. Your counterpart was the PIMs, because in a sense, what you have
done is, I think, is created a notion of self-examination and doing something
about it, versus sitting down for a written test in front of some computer.

Now, I think you talked about continuous accreditation or certification. If
I interpreted you correctly, you are trying to instrument people's practice to
figure out whether they are still practicing up to date, good quality medicine.
And to the extent that we can both on the NQF endorsement side and the
accrediting side use the same measures that depend on the same data in the same
EHRs, to essentially like, Carolyn Clancy F7, would get me both my quality
reports and my certification, if you will. Is that sort of the vision?

DR. REUBEN: That is the vision. And the vast majority of the PIM measures
and we have I think 525 PIM measures or something like that, are NQF endorsed.
Now not all measures for all conditions have NQF endorsement yet. So there are
some that are not. But, in fact, we are trying to align with NQF, so that it is
the same measures. We are all working towards the same goal.

DR. TANG: So that is where I go to, is I really am interested in the
reliability, because it stuck out that that high gaming smoking cessation
counseling measure, I believe, has a low reliability and we should probably go
away from things like that. Because it is not clear to me that the evidence
says forcing something into the documentation about smoking cessation
counseling, improves the smoking rate.

And so why create the gaming because with EHRs as you know, it is instant
verification. You can cause it to happen. And the other things that I looked
at, I noticed in your diabetes PIM, I think you showed, is the A1C greater than
nine measure, so as the not bad measure. And in a sense we are trying to go, I
think, away from those kinds of measures in NQF endorsement process, because it
does not align well with what physicians think about their own performance. So
you do not think you want to be not bad. You know what the guidelines say and
you want to head towards those guidelines and to the extent that your mental
model of what it is to do good quality measures, lines up with the measure,
that, essentially I think, is positively reinforcing in terms of the good
behavior. So in some sense, it may be instead of going after the current NQF
measures, we want to go towards where the puck is going, for the measures that
are aligned with practice and the way physicians think, and I think can really
create a positive reinforcement move.

DR. REUBEN: Yes, we have actually thought about this a lot. Not so much with
ABIM, but certainly with the ACO measures which is really a floor. That is, if
you are not performing these - they are not a ceiling, they are not the
real goal. But if you are not performing these things, you probably are not a
very good doc kind of thing. And as you get higher and higher up that
aspirational ladder of what you really would like to have, you get much more
push back. You get much more push-back because other things happen. You cannot
achieve that 98 percent, that theoretical 98 percent. But if you say, you know
this is a floor and if you do not meet this floor, that you are not doing a
good job. How can you argue if you are in that group that is the bottom
feeders, that less than 44 percent? How can you justify that you are doing a
good job as a doctor when the floor is low?

DR. TANG: So maybe that is my real question. Philosophically, is a
certification organization like ABIM going after the floor to eliminate people
or going for the goal that motivates people? And I almost think that you are
going to capture far more docs on the latter.

DR. REUBEN: So you are absolutely right. And if you take a look at our
strategic plan, our goal is to continuously set the bar higher. So here we are
today in 2009 the bar is here. In 2011, the bar is going to be a little higher.
And the bar is going to be higher in 2013 so you keep dialing it up.

DR. TANG: So the question is are you setting the bar for the floor or the
top part? I feel better about getting 60 and then 70 and then 85 towards my
real goal, versus passing over the 20 percent goal.

DR. MIDDLETON: I think Paul and I have a similar vision or aspiration here.
In many ways, I would much rather be pulled up than smacked from behind. In a
way, the low bar is the smack from behind. The aspirational goal or the stretch
goal might be that in fact, certification takes on a new flavor. It is actually
continuously monitoring, continuously educating and providing instantaneous
feedback at the point of care, so that I actually know, is my diabetic
population within the guard rails or is Mrs. Smith actually within the guard
rails of her diabetes care.

There is a problem though, I think, that we have to sort of recognize here.
If history tells us anything about the Internet, I am concerned we may see some
cognitive substitution in healthcare, in ways that have happened in other
industries. That is, many more decisions may be made by many more different
types of people. So rather than raising one bar, I would suggest maybe a
spectrum of pro-active certifications, if you will, in the manner in which we
just described, that might actually delineate a range of professionals rather
than an increasingly god-like physician.

In other industries, of course, many different types of folks are making
many different types of decisions based upon tools and utilities and knowledge
and the access on the Internet, all the disintermediation stuff, et cetera. So
they can gain knowledge, rather than accessing the provider, per se. So it is
just a thought, thinking about the high bar and a range of professionals, all
perhaps warranting certification under the ABIM, but in different roles.

DR. REUBEN: We have talked about this. ABIM is famous for talking and being
very careful and ruminating and considering many things. And in fact, we have
talked a lot about whether currently it is a dichotomous threshold you are
certified or not. But the whole idea of tiering physicians and having physicians
who are exemplary and fine and good and whatever, we have not crossed that
threshold yet.

It is not that it is out of the question for the future, but it is not there
in 2009. And part of the reason, it is really interesting, this came up
yesterday at the board meeting, is do all physicians have to be superb
physicians or can they be fine physicians? And is it okay to go to a fine
physician, or do you have to go to that person who scores in the top 10 or 20
percent. For me, I am fine with my physician being certified and being
competent. I do not need that top one or two percent. I do not gain that much
from it. So these are really good questions, but the idea is to move the bar,
continuously raise the bar. And that is right in the strategic plan.

DR. BURSTIN: I want to follow-up on that. I think that beyond even
certification, the capacity of that magical F7 button, allows providers within
a group, and I say providers in the broadest sense of the word, Blackford. I
mean, all clinicians sitting down together at a systems level reviewing the
data out of your practice. I mean to really have that in real time obviously,
with my time at the Brigham, when I headed quality measurement, I had that. I
would hand out to all the docs our entire internal medicine practice, here are
the measures we initially had it A through Z. That lasted about ten minutes,
because we all unblinded in about 30 seconds because we wanted to see who was
best.

But that is that kind of real time feedback that allows for really dramatic
systems improvement and helps you figure out — it really should be the medical
assistant who just routinely does flu shots without an order from me. I mean
that is the kind of stuff that, I think, becomes a systems piece of it and
becomes so apparent when you have the ability to rapidly report. Not a year
later get the reports from an external body saying this is how you did. But in
real time and even without necessarily having to do your own chart review,
seeing this is how you are doing on a continuous basis.

DR. REUBEN: And that is the Holy Grail for us. The Holy Grail is that
doctors in real time can monitor the care for a number of different conditions.
That said, that is not going to be possible without an electronic health record.
You just cannot get there.

MR. QUINN: Thank you both for your presentations — This sort of gets into
one of the thoughts that I had. The PIMs and the MOC is a really compelling
lever for motivating clinicians. And many well-intentioned and well-designed
quality improvement efforts, have not really gained physician buy-in for a
variety of reasons.

It seems to me that if you can align these two, the PIMs as well as the
measures that gain buy-in, there is an opportunity for a synergistic effect. Is
it the measures that have to gain buy-in from clinicians themselves or how do
you design this so that it just makes sense, and that you do gain maximum
uptake?

DR. REUBEN: I do not think the measures have been a problem. The measures
are based on randomized clinical trials. They are based on guidelines. They are
not the problem. The problem is getting the docs to be able to do it.

And there are a lot of issues why you know, for example, if you have an old
person who has got 12 problems, for some reason or other, doctors feel
compelled to do a lousy job addressing all 12 at each visit rather than spend an
entire visit addressing one problem, doing a good job. It is just somehow or
another, how we were trained as doctors.

And there is a tremendous amount of inertia in what goes on in a
doctors’ office. A little quality improvement project I did a few years
ago, I sat and watched every one of our docs interact with patients and saw
what went on in the visit. And there was a lot of time with talking and a lot
of time with counseling, there was a lot of time addressing those issues, but
the kinds of things that would be measured under quality performance just were
not there. They did not have the time to do that.

So I do not think knowledge is the issue. I do not think what the measures
are, is the issue. I think being able to get that behavior, the knowledge into
practice is really the limiting step.

DR. BURSTIN: I agree with David. And just to add to that, I think that while
the measures themselves may be okay and they are evidence based, I think, in many
clinicians' view, they just do not go far enough. They are not even thinking about
the 2013, 2015 kinds of measures that are envisioned for meaningful use, care
coordination, patient and family engagement, does take us to a different place.

And those are not the traditional measures that most docs have had fed back
to them or even the ones that we have had as part of our assessments, other
than patient experience, which I think many would argue, has been really
transformative for a lot of health systems to have that HCAHPS data or other
kinds of patient experience data.

But I think going beyond that to get at even other levels of how often do
you, in fact, get a discharge summary back in a timely manner? Are all the labs
followed up at discharge? I know what happened. I mean, those are the kinds of
things that I think providers, and I do not want to say docs, because I think
it is bigger than that, it is really the team, would find so meaningful.

So it is not just measurement for the sake of measurement, but it is
actually measurement that is meaningful to me as a clinician to do a better job
because it gets more at the systems.

DR. FITZMAURICE: You have painted a picture of really how difficult it is
and yet how far we have come from NCQA to NQF to the board certifications and
holding people more and more accountable. I sense it is not going to get
easier. We have the quality measures and I work with Floyd at the HITSP
population technical committee, where we worked on quality measures and
surveillance measures and since then there is this continuous, this is not
precise enough, we need to send it back to NQF to the HITECH Committee that was
chaired by Paul to get more precise measures of just what was meant in the
quality measure.

Now as we come up into 2010, 2011, 2012, 2013, 2014, we are going to be
moving from ICD9 to ICD10 and we are going to be changing some of the claims
information. Beyond that, we may be moving more and more to SNOMED as
the underlying coding system. What that means regarding difficulty for quality
measures and for meaningful use measures as well, is that the kinds of things
that the electronic health records are massaging are going to be different.
They are going to be split into parts and some things will have to be
aggregated.

Is NQF working with the quality measure developers to bring some realization
to this? I have seen some really good work that Floyd has done in seeing a
piece of the ICD9 codes, these are the exclusion codes. It is all going to have
to evolve into something better and better as we get into better and better
coding systems. Is this on the radar screen, Helen?

DR. BURSTIN: It is completely on the radar screen and in fact we are also
doing some additional work under our HHS contract that is an expert panel that
is meeting shortly to begin understanding what are those transition issues and
how do we even envision them? When all the measures come up for maintenance,
for example, by 2013, they will have to be in ICD10 or SNOMED, part of the
determinations of what happens with the HIT Policy Committee. So that is
definitely the trajectory.

And again, this idea ultimately of saying, if you are authoring a measure for
an EHR de novo in 2011, 2012, 2013, and you have the capacity to go to a
measure authoring tool, it will automatically pull up the appropriate code set.

I think part of the challenge is that we do not often have all the
crosswalks we need between ICD9 and ICD10 and SNOMED. But I think, to me,
those seem like technological issues that are achievable, as opposed to, I
think, the harder stuff which is actually around capturing some of the logic
and are we actually staying true to what was intended through the guidelines,
through some of those transitions.

And I think some of the measures that have been developed, again, have been
based on clinical guidelines that are not as precise or pristine as we would
like, and so some of the logic of trying to translate some of that to an EHR is
actually the harder piece than the coding. I would argue the last known well
time, of when stroke symptoms began, is pretty darn hard to put in an EHR. So you
have to really rethink from scratch, how do you develop a measure that gets at
the timeliness of stroke treatment as opposed to starting with the measure that
we have, which was developed for a very different environment that is not IT
based?

DR. FITZMAURICE: Could I ask another question? And that has to do with
proprietary quality measures. If I am a health plan or a group of providers, and
I want to measure quality, I may say, this is really a good quality measure,
but do I have to pay 25 cents a patient every time I apply it to the quality
measure developer for 1,000 patients that is $250?

Is there some way to handle the economics? We did it in one case with
SNOMED. We bought out the license for five years, here is a $5 million payment
and so much per year, to use it and then at the end, we can use it as it stands
in perpetuity. That is one subscription model that could be used for paying for
quality measures. There may be others. Do you see having to pay for the use of
quality measures to be a large or a fairly small barrier to the use of quality
measures?

DR. BURSTIN: It is an excellent question, Mike. To date, all the
current NQF measures are available without fee. There is definitely a move
towards some more of these proprietary systems, especially if you look at the
complex proprietary systems where there have been years and years of investment
in a complex risk adjustment data base. Some of those coming through may have
associated charges. I think one of the challenges going forward, if for
example, the grouper methodology is something we want to move forward with,
maybe should that ultimately become a public sort of analogous situation to
SNOWMED? I do not think we know the answer yet. But I think you know,
increasingly, we are seeing that without the public support for the measurement
side, and that I think has been a challenge for us, you know as the clinical
registries have been developed, oftentimes by specialty societies with
specialty society funds, as they transition that to being for public reporting,
how does that evolve? And I do not think we know the answers yet. But I think
it is something we are all, kind of, keeping a close eye on. Hopefully, the
additional support, as I mentioned early on that is hopefully in some of that
health reform, for measure development will get us to maintain more of that
critical measure development expertise in the public domain.

DR. REUBEN: I would just like to echo that. I think the way to do this is
really to have these measures in the public domain. And to do that, you are
going to have to pay for the development and the maintenance of these measures.
And rather than doing that on the back end, after they are developed, as a
for-profit industry, a wise investment for the government is to
invest in the development and the maintenance of these measures. It is just
going to make it much easier to distribute and much more widely accepted.
Otherwise you get into the terrible issue of the best measure may be
proprietary, but you cannot use it.

DR. CARR: Thank you so much for this very rich and exciting discussion.
Could I ask that we get copies of your PowerPoints? Thank you and we
will distribute them. And so I think we are going to break now actually, for a
long lunch break. A number of folks are flying in and will arrive at one. So we
will reconvene at 1:00 p.m. Thank you.

(Whereupon, the meeting was adjourned for lunch at 11:32 a.m.)


A F T E R N O O N S E S S I O N (1:10 p.m.)

Agenda Item: Current Measure Development, Endorsement,
and Adoption Process

DR. TANG: On behalf of the Quality Subcommittee of NCVHS, I would like to
welcome you back to the second part of today’s program and we had really
good testimony this morning. I am sure we will be continuing this afternoon
with really excellent panelists. I appreciate them being here.

We talked this morning about the process of identifying high priority
conditions, NQF for example, and the vision of sort of continuous certification
from David Reuben, the Chair Elect for ABIM. Now we are going to move into sort
of the measure development. We talked about how there are lots of measures out
there but maybe not enough good measures, and so we are trying to hear from this
panel what the process is for getting a measure developed and endorsed, and how
we can even encourage, promote, and pull some more good measures out of this
system, now that we have this window of opportunity with the HITECH Act saying
we are actually going to pay for some of these measures in a substantive way in
terms of relief or compensation for implementing EHRs. This may be a big moment
of opportunity where we can put some more effort into this area because there is
money on the table, so to speak, for the adopter side.

With us today we have Karen Kmetik, who is the Director of Clinical
Performance Evaluation at the AMA and works with Bernie at the PCPI, the
Physician Consortium for Performance Improvement. Sarah Scholle is the Assistant
Vice President for Research and Analysis at NCQA, a major measure developer, and
Dr. Bernie Rosof is the SVP for Corporate Relations and Health Affairs at North
Shore Long Island Jewish. I think you are the hospital that made news saying
you would up the ante even more, almost doubling the potential incentive.

I think we are really at an inflection point where, if we had good
measures, we could really change the face of quality and quality improvement.
And Frank, we will let Justine do, because you are such good friends.

DR. CARR: Frank Opelka, superb surgeon and President Executive of the
Louisiana State University Healthcare Network, physician executive, and
recognized national leader in patient-centered healthcare for the surgical
patient. Thank you, all of you, for being here today.

DR. TANG: Is the order in which we introduced you a satisfactory one, or do you
have a different order that would make sense?

DR. ROSOF: Thank you very much, Paul, for inviting me and for the
opportunity to provide some comments, which I hope will be useful to the topic
you have just introduced. From my perspective performance measurement as a
science has many purposes and you have probably heard a few of them already
this morning. It should be an integral part of all efforts to improve the
quality of care. It should encourage performance improvement with the ability
to benchmark individual or a group performance against regional and national
standards. It should advance efforts to support quality improvement at the
point of care specifically by integrating measures into electronic health
records and electronic health record systems, which you will hear a little bit
more about in a few moments. And when the data is valid, when it is tested
and it is risk adjusted where appropriate, it can be used for public reporting
to help shared decision making and ultimately choice.

Now the Physician Consortium for Performance Improvement and other
organizations, are beginning an effort to advance the alignment of performance
measures to be integrated into maintenance of certification programs. You probably
heard that from ABIM this morning. All of this clearly will enhance our efforts
to provide patient-centered care, the object that we are all after.

The tools necessary to accomplish this include commitment from the
profession, further research into the science of performance measurement and
quality improvement, education of healthcare professionals providing care as
to the value of performance measurement, adequate funding, health information
technology integrated into the work flow of the providers of care, and a true
commitment from the academic educational community to incorporate into the
curriculum of medical students learning about the principles of quality
and measurement. No longer is it appropriate to ask the question, is it
necessary to teach quality. It is.

Assistance for some of this is already underway with funding appropriated
for comparative effectiveness research, information technology, and a clearly
stated commitment from healthcare professionals, from professional societies,
and of course from the American Medical Association.

How does one go about selecting a high priority clinical area to develop
quality measures? There are competing pressures to accomplish this: CMS and the
PQRI program, consumers and purchasers for public reporting, well-defined gaps
in measure development as articulated by many medical specialties, particularly
as they become necessary for PQRI and other pay for performance type programs,
and of course accrediting bodies and maintenance of certification programs,
among others.

Now recognize also that the smaller specialty societies perhaps have a
little bit more difficulty in this area having not had the resources that other
large specialty societies do and so this may add to the problem additionally.

The selection process must include the necessities of developing
specifications for multiple data sources including EHRs, a protocol to test
measures, a compendium of clinical guidelines and best practices to
facilitate measure development, and truthfully a plan for implementation. We
never should forget the necessity, as we begin this process, of a plan for
implementation.

A hierarchy to enable decision making in this complex setting as determined
by PCPI and others, including the National Quality Forum, includes importance,
includes scientific accessibility and acceptability and the evidence becoming
available, usability, and not only usability but interpretability of that
usability, feasibility, and to determine where gaps in measurement exist.

Now the gold standard would obviously be that the measures would be
generated from a strong evidence base, that they be based on clinically rich
data, and that they would employ a strong risk adjustment mechanism.

Now a real concern, and somewhat of a danger that I ought to mention at this
particular point, is that we move forward conceptually as this kind of a group
and leave behind the essential drivers who are both absolutely required to
effect and create change and who will maintain the public trust. Those are the
practicing physicians, nurses, and other healthcare professionals. We need to be
certain that in what we do we don’t lose them as we move forward in the agenda
that we are about to move forward in, and be sure that we incorporate both
education and learning at all levels so they are involved in this decision
making, and that we don’t make the decisions without them being involved.

If we look at the first slide, what we are interested in doing is increasing
these numbers. Only one physician in five receives process of care data. It
kind of builds on what I was saying just a few moments ago, and less than one
physician in five receives clinical outcomes data. We all know that physicians
like to respond to data. They like to have accurate data. They do not want to
be outliers and they respond very, very effectively to data if it is provided
appropriately.

Also, in terms of meaningful use of electronic health records, meaningful use
needs to be meaningful to all healthcare providers and that includes all
healthcare providers. The goal probably would be to engage all physicians and
healthcare professionals from varying specialties in meaningful use of measures
and meaningful use of electronic health records.

Now the PCPI is convened and staffed by the AMA. Let me go through this
although many of you may know this already. Membership consists of more than
125 national medical specialty and state medical societies, ABMS and its member
boards, and CMSS, the Council of Medical Specialty Societies. AHRQ, CMS, Joint
Commission, NCQA, and NQF are also active participants and represented at PCPI
meetings.
Experts in methodology are an integral part of the PCPI and 13 non-physician
healthcare professional organizations participate actively at PCPI meetings. In
addition we now have a healthcare consumer/purchaser panel, which I will chat
about in just a moment.

Our current measures portfolio consists of 42 measurement sets, 260 plus
individual measures, and approximately 70 percent of the measures in CMS PQRI
were developed by the PCPI.

In terms of the consumer/purchaser panel, fresh off the press, we had a
meeting with the consumers and purchasers; Peter Lee, Debra Ness, and David
Hopkins were part of that meeting. As part of that discussion we brought up the
issues of measures that matter, and consumers’ and purchasers’ views of uses
and users. There was some consensus related to this, so I thought I would bring
that to you this morning, and the consumers and purchasers felt that this would
be a good opportunity to make this part of the presentation.

The consumers and purchasers want measures that are useful for
accountability and for quality improvement, performance improvement. These
include outcomes, for example, functional health status, morbidity, mortality,
et cetera, composites of multiple process measures, resource use, care
coordination, patient experience, measures that taken together provide a
comprehensive picture of a provider’s care, which has become more and more
important as we move this forward, measures that show gaps and/or variation in
care, and also measures that show disparities among different populations.
Where are the gaps in equity that impact the delivery of care and,
specifically, outcomes? A very important point, maybe a little difficult at the
moment to include as part of all measures, but it has to be a goal as we move
measurement forward.

In terms of the criteria for topic selection, the required characteristics
I mentioned briefly, but important also are gaps and variations in care.
Remember that the measures we have been able to accomplish up to the present
really are measures that are more appropriate for large specialty societies.
The small specialty societies have not been involved as much, but must be, and
those gaps are obvious and we will move forward to correct those gaps. Evidence
base and high impact.

In terms of high value characteristics, care coordination is extremely
important. Patient safety is involved where care coordination is eliminated,
specifically when it comes to transitions of care. So care coordination is an
important aspect, as are patient safety and appropriateness and overuse. As you
know, I chair the Overuse Committee for the National Priorities Partnership, and
part of the work flow for the PCPI going forward is to coordinate the effort
between the National Priorities Partnership and our development of measures as
we move this forward.

Our portfolio of measures, again, as you can see, has
measures across all specialties, recognizing once again that there are clear
gaps that we will make efforts to fill as we move forward, but we have tried to
create measures that are cross specialty and cross cutting so that they can be
used by a variety of practitioners.

Now the current work plan is to fill those gaps to include specifically
appropriateness topics. Now we can say appropriateness. We can call this, in
addition, overuse and appropriateness. For example, the surgical and
nonsurgical management of back pain, which is a target that we have with the
National Priorities Partnership; percutaneous intervention for chronic, stable
coronary artery disease; maternity care, for example, induction of labor and
C-sections; sinusitis, with antibiotic prescriptions and sinus radiography; and
diagnostic imaging. Diagnostic imaging seems to be a theme as we move forward
in the overuse arena.

It is important to recognize, when we discuss overuse, that it is always overuse
while delivering appropriate care. It’s not overuse in itself as a single word,
but it is overuse in the delivery of appropriate care. We don’t want to
misuse the word overuse without understanding that this is for the delivery of
appropriate care.

Currently also in the work plan is care coordination. Phase I is transition
from hospital discharge to home or other facility, transition from emergency
department, this area of transition, which is so important, and phase II
transitions across ambulatory care, outside of the hospital environment where
transitions are as important.

I think if we look at the current work plan to round out our measurement
sets, let me give you an example for the Heart Failure Measurement Set, because
that is exactly what Karen is working on or has been working on recently. We
want to round that out by including outcomes, intermediate outcomes, and
process. Once again we get into the area of appropriateness. But we can’t
always talk about appropriateness and overuse without considering underuse in
addition. There are many areas. A quick example is in the treatment of
pediatric asthma, where underuse of medication is a specific concern, and so
underuse is sometimes as important as overuse. And also we want to be certain
that we include both the inpatient and outpatient arena.

As we move forward the current work plan is also to integrate into EHRs.
Karen is going to talk to you about that in a moment. We are working
collaboratively with NCQA. We have a very specific collaborative relationship
with NCQA in developing certain measures particularly the measures of
appropriateness, and with NQF once again to try to coordinate and align our
efforts with their efforts in the National Priorities Partnership’s goals.

We are working with the EHR vendors to incorporate the use of measures
within the electronic health records, and also we haven’t forgotten the
physician users of EHRs. Actually, this is the group that is going to implement
and make things happen, we hope.

That is briefly the overview I would like to present. Karen will go into a
little more specifics related to some of the work we are actually doing. If
there are any questions we will be happy to answer those as well. Thank you.

DR. CARR: Thank you. That was a great overview. I think what we will do is
go through all speakers and then come back with questions.

DR. KMETIK: Thanks very much for the opportunity to share some further
information with you about what we are trying to put in place and
operationalize so that we do have those measures that are meaningful to
different stakeholders and we are getting them into electronic health record
systems. On the next slide, what I want to overview with you today is a
model that we put in place in our measure development process to try to move
forward in this arena.

I also wanted to throw out a proposal about how we might take some of the
lists that are out there now of measures and maybe make it a little more
tracking of progress, a readiness kind of list, and then lastly, to go into a
little more detail about what Bernie mentioned regarding specific areas that we
are focusing on to try to fill the gaps.

When we thought about this activity of building on the core foundation of
the PCPI, which is we have all those specialties around the table. We have
those 13 other healthcare professionals around the table. We have a consumer
purchaser panel. We have consumers, patients, and employer groups on every work
group. We have that foundation. How can we move it forward in the right
direction?

We put this model in place where we said for thinking about the measures and
putting them into electronic health record systems and taking advantage of that
new rich clinical source that we have all been hungry for so long, we said
first we are going to start with our core base that we have these folks around
the table and we have this growing portfolio of measures across specialties and
subspecialties.

We want to make sure then for the measures that they work on that they are
covering those critical areas. We want to then make sure we are developing the
appropriate specification so that those measures can be integrated into EHRs.
We want to vet that with those physicians who already have the EHRs, as well as
the EHR vendor community, sit down around a table, and roll up our sleeves. Does
this work? And then our notion as part of our model is to have a series of what
we call incubator groups. I will explain a little bit more about that. That is
sort of the model we have in place to make sure we have the right people at the
table, we are advancing and evolving the types of measures that everyone is
looking for, and we are building them in a way that they can be integrated into
EHRs.

If I just go to the next slide, this is just a visual representation. Now, I
am from Chicago and we lost the Olympics, but we are in denial. We are using
rings in everything that we do because we are just in denial. But this is just
a physical representation of the model I talked about where in the center there
the one and two, that is our body that we work with to develop the measures. We
specify the measures. We then vet them with the EHR vendor community in circle
number three, and then around the periphery there the fours, are these
incubator groups, which we think are so important and again I will explain
those in a minute.

In the next slide if I just take an example and I am building off the heart
failure example that Bernie mentioned. This is a measurement set that we
developed with the ACC and AHA and it has been around for a while now and
primarily up until now included process measures.

What you will see coming out soon then is a new set of measures for heart
failure that includes inpatient and outpatient, as Bernie said. It includes
a measure of overuse. It includes a functional assessment type of measure. We are
building them out as we said and it is really gratifying to be able to pull
those groups back together who have learned and are now ready to go in that
next direction.

We are also benefiting from some of the researchers. I don’t know if
you have seen the article by Steve Persell from Northwestern, who is saying,
let’s take advantage of that data. Maybe it’s not just what is the blood
pressure but how many drugs is that person on. What are the doses of those
drugs? What have you done recently to try to control that blood pressure? We
are trying to take advantage of the research as well. If I look at one measure
out of that set, just by way of example, it is the ACE/ARB measure for LVSD.

What do I mean by saying we are trying to develop the right specifications?
In working with Floyd at NQF, we are using this language so we can talk to each
other and not get tripped up too much. We have level 1 EHR specifications we
call them. That means we are going to take that measure in words and we are
going to turn it into all of the code sets, the algorithms, the calculations,
the rules, that one would need to be able to use that measure in an EHR
environment. And I will be honest. Right now we are going to provide every code
set option because the nation has not exactly landed on the code set nor have
these code sets been built into every EHR product yet. We are tracking that
with HITSP and others, but right now we are going to put it down there so
everybody can see. We are going to look at, are there SNOMED codes that work,
are there the ICD-9 codes that we need, are there CPT codes. In the area of
drugs, we have got to give both NDC and RxNorm right now. When everybody
jumps on one bandwagon, we will happily be right there with everyone. We are
giving the CPT-II codes also as an option, as well as some of the codes there
that can be used for the exception reporting. That is what we call level 1. That
is now a
product of the PCPI and we are doing much of that in concert with NCQA and NQF.
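
[Illustration added in editing; not part of the testimony.] A minimal Python sketch of what "every code set option" for one concept might look like. The class name, function name, and all code values are hypothetical placeholders and are not real NDC or RxNorm codes; an actual level 1 specification would also carry the algorithms, calculations, and exception rules.

```python
from dataclasses import dataclass, field


@dataclass
class ValueSet:
    """One clinical concept expressed in several code systems at once."""
    concept: str
    codes: dict[str, set[str]] = field(default_factory=dict)

    def matches(self, system: str, code: str) -> bool:
        # An unknown code system simply never matches.
        return code in self.codes.get(system, set())


# Placeholder codes only; these are NOT real NDC or RxNorm identifiers.
ACE_ARB = ValueSet(
    concept="ACE inhibitor or ARB therapy",
    codes={
        "NDC": {"NDC-EXAMPLE-001", "NDC-EXAMPLE-002"},
        "RxNorm": {"RXN-EXAMPLE-A"},
    },
)


def measure_met(medications: list[tuple[str, str]], value_set: ValueSet) -> bool:
    """True if any (system, code) pair on the medication list is in the value set."""
    return any(value_set.matches(system, code) for system, code in medications)
```

Listing the same concept under each code system lets a site that stores only one of them still be evaluated, at the cost of keeping several lists current.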

Level 2 specifications we call something different. That is putting it into,
I just call it, an IT-savvy format, and that is for the programmers at
the EHR vendor companies to be able to take our specifications and put them
into their products without having to rewrite code. We came up with a prototype
for that. It is now going to a standard development organization with NQF
sponsorship and we are tracking that and we are ready to take that step when
again everyone lands on what that should be. Those are two parts.

If you go to the next slide then, the evaluation. We think this is an
important part and that is why we built it into the PCPI’s model. We want to
talk early and often with the EHR vendors, not that we want to be limited by
what the products are today, but to be able to have that conversation to say
you know, we really need to someday have in that system not only prescribed
data but dispensed data. If you are doing the ePrescribing can that be
connected. We are trying to have those conversations again toward being able to
move toward a next generation of measures. We are having those conversations as
Bernie said, with physician users right now to say how are you using the EHR
now? What pops up on your screen? Where in there are you reminded about the
aspect of this measure and can you report the data? We think that is an
important part of the model if this is going to be successful.

The next slide then talks about these incubator groups that I mentioned,
which are just something that I love personally because it gives us a chance to
really sit down with a group of physicians who have electronic health records.
We give them the actual national specifications. We give them those level 1
specs I talked about. We collect data. We have it sent to a warehouse and we can
analyze where the issues are. And again, we think this needs to be part of the
model of measure development, that it needs to be yes, the expertise, the
evidence base as Bernie mentioned to have the measures, to put them into the
specifications, to vet them with those who are going to use them, but then we
need some real live groups that can give us that rapid feedback to say you know
what, you described it this way and that ain’t working or how about this
way.

We feel honored that we put this group together actually four years ago now
with some grant funding, different practice sites, different specialties,
different EHR products and that is critical, different EHR products.

On the next slide I will just share with you a little bit about what we have
been able to find from that. Again, if you just stick with that measure as an
example, the ACE/ARB measure, we found discrepancies between the NDC codes in
our measure specifications and the NDC codes that are in different EHR products
in those practice sites. Even if we could have the exact same coding, first of all
it depends on the EHR vendor sending the NDC update to the practice site, which
is on all different schedules, and then the practice site actually installing
the update.

We found that we had errors not because anybody was doing anything
intentionally wrong but because we may have had what we thought was the most up
to date NDC list for ACE and ARBs, but that list had not yet been uploaded in
those practice sites. That sheds a great light on something. RxNorm could
really help with that and conversations I know, are going along nationally
about that.

But we would never be able to articulate this without those incubator
groups, without actually trying it seeing what happens. In this scenario for
this incubator group for the cardiovascular measures we have data now being
sent quarterly from these practice sites to the warehouse. They are able to use the
data themselves of course, first and foremost in their practice. We can
calculate performance rates. We can calculate exception rates and we can
validate the data.

In this scenario for exceptions reported to the warehouse, we went in and
took a sample, manually abstracted the data from the EHR, and had a hundred
percent agreement. If it was reported, we found it. The same when the data
warehouse said the measure was met: we were able to validate that 90.48 percent
of the time. Then we also wanted to validate what appears as a failure. When the
warehouse said this measure was not met, we went back in, and actually there we
see a big disconnect. In only 19 percent of the cases reported to the warehouse
as a failure was it in fact a failure when we looked at the data, and the
mismatch was things like the NDC code. The drug, an ACE/ARB, actually was
prescribed and recorded in the medical record, but it wasn’t exported to the
warehouse because it didn’t pick up the right code.
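
[Illustration added in editing; not part of the testimony.] A minimal sketch of how agreement between the warehouse status and a manual chart abstraction could be tallied per reported status. The function name and the toy sample are invented for illustration and do not reproduce the PCPI sample or its results.

```python
from collections import defaultdict


def agreement_by_status(samples):
    """samples: iterable of (warehouse_status, chart_status) pairs.

    Returns {warehouse_status: fraction of sampled cases where the
    manually abstracted chart status agrees with the warehouse}.
    """
    totals = defaultdict(int)
    agreed = defaultdict(int)
    for warehouse, chart in samples:
        totals[warehouse] += 1
        if warehouse == chart:
            agreed[warehouse] += 1
    return {status: agreed[status] / totals[status] for status in totals}


# Invented toy data: each pair is (what the warehouse said, what the chart showed).
toy_sample = [
    ("met", "met"), ("met", "met"), ("met", "met"), ("met", "not_met"),
    ("not_met", "met"), ("not_met", "met"), ("not_met", "met"),
    ("not_met", "not_met"),
    ("exception", "exception"),
]
rates = agreement_by_status(toy_sample)
```

Reporting the rate separately for each warehouse status is what exposes the pattern described here: reported successes can validate well while reported failures mostly turn out to be export or coding mismatches.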

These are things we certainly want to figure out and solve before we go
national with this and put different aspects of importance on the different
data. I wanted to share that with you and that is what we mean by incubator
groups. We have one for cardiovascular care. We have one for HIV/AIDS measures. We
would love to be able to have a dozen of those in different disciplines. That
we believe would be very powerful.

I will move now to another item I wanted to mention, which we want to build
into the PCPI process, which is tracking the progress. It is one thing to say
these are the measures that are important and we have covered all the bases of
a set and that is down the left hand column there. You want process. You want
intermediate outcome, cost/utilization so we can track that. But I would
suggest that we need to also then track the readiness for sort of going live
in a national meaningful use program. This is just a simple way in which we are
trying to keep track of it, to say we got the measures. Have they been
NQF-endorsed, or is there a chance to go through NQF? Do we have the level 1 and
level 2 specs? Have they been vetted by the vendor and physician communities?
Have we tested it in the incubator group? And then are all the fields that we
need in X percent of EHRs today? There has to be some comfort level that, on
the measures that we land on, the ones we want, all these steps have occurred.
I am suggesting that maybe it would be helpful to us, as we move toward defining
more of the measures we want in meaningful use, to start to track it this way
as well. Everybody has a comfort level, and it’s not, well, that’s a great
measure but we are nowhere near. Well, let’s see: what pieces along this
continuum do we have or are missing?
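
[Illustration added in editing; not part of the testimony.] The readiness list described above could be sketched as a simple per-measure checklist. The field names are hypothetical labels for the steps named in the testimony, and the "X percent of EHRs" threshold is left as a yes/no flag because no value was given.

```python
from dataclasses import dataclass, fields


@dataclass
class Readiness:
    """One row of a readiness list: has each step occurred for this measure?"""
    nqf_endorsed: bool = False
    level1_spec: bool = False
    level2_spec: bool = False
    vetted_by_vendors: bool = False
    vetted_by_physicians: bool = False
    incubator_tested: bool = False
    fields_in_enough_ehrs: bool = False  # the "X percent of EHRs" question

    def missing(self) -> list[str]:
        """Names of the steps that have not yet occurred, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def ready(self) -> bool:
        return not self.missing()


# Hypothetical measure that is endorsed and has level 1 specs, nothing else yet.
example = Readiness(nqf_endorsed=True, level1_spec=True)
```

Calling `example.missing()` answers the question posed above, which pieces along the continuum are still missing, rather than giving a single pass or fail.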

Lastly then, there was a comment that we got a lot of measures but maybe we
still don’t have the complement that we need. Again, we feel like we got
the people around the PCPI table. We are going back to all the groups and
saying, where we have sets, let’s round them out, as Bernie and I use that
term, meaning, as we did for heart failure, let’s add what is needed there
to make that a full set, bring in the inpatient as well as the outpatient, et
cetera. Some particular areas that we are putting a focus on, in addition to
what Bernie said, just to put a fine point on it: the care transition measures
that we have done, which are going through the NQF process right now, are
from the hospital side to another location or to home. And a big part of our
effort going forward is ambulatory care transitions, just the hand offs, the
referrals to specialists, et cetera. We think that will be very valuable to a
lot of different specialties and sub-specialties.

We are also putting emphasis on pediatrics. I think everybody would agree a
big shortage of measures in that area, and then again looking at the different
sub-specialties that still don’t even have that core yet that is going to
be meaningful for that patient population.

I will wrap up. I am sorry if I have taken too much time, but again our
thought is for the measure development side, to have a model in place to get
where we all want to go and we feel that it includes working with the
specialties and sub-specialties, building on the measures that we have,
rounding out those sets, new measures where we need them, making sure we have
the level 1 and level 2 specifications, vetting them with the vendors, testing
them in these incubator groups, and tracking this progress. I think it is very
transparent then for everyone to know where are we and where do we need to
focus next. Thank you for the time.

DR. GREEN: I am Larry Green. I am the member of the committee, member of the
subcommittee and no conflicts. Is that the routine?

MS. SCHOLLE: Good afternoon. I am Sarah Scholle from NCQA and I am happy to
talk with you this afternoon about the work that we have been doing in thinking
about how we can expand our measurement opportunities using EHRs and health
information exchanges. What I wanted to talk about today were the steps in
creating eMeasures and some of the activities that we are working on to try to
develop new measures that take advantage of the capabilities of electronic data
sources and also our process for updating.

Today what kinds of measures do we have and what data have we been working
from? Our measures have focused on retrospective review where we are looking at
care after it has occurred, usually with a single point in time over a set
period of time where we use a specific threshold; for example, is blood
pressure less than 140 over 90, and where we are thinking about multiple levels
of healthcare but that means different data sources for the same measures
because we are looking at different organizations or people.

The data sources that we are using are most often claims data, visits,
procedures, and labs. To some extent we have electronic lab results available.
Sometimes less frequently, we have clinical data, like the results of labs and
radiology or CPT category II codes or medical records data or patient survey
data. The measures that we have today are framed in our data sources today, and
what we could do with those existing data sources.

But as we look to the future we are thinking about a different measurement
setting and different capabilities, different data sources. We are hoping that
the measurement will be concurrent with clinical services so that we will be
able to influence care at the same time that we are measuring and monitoring.
It will be linked to real time clinical decision support so that we will be
giving clinicians an opportunity to look at the guidelines and say, what should
I be doing. We will be working with the data source that is not dependent on
who you are measuring. We are not using health plan claims data to measure
health plans and medical records for physicians, but trying to think about
electronic data sources that can build up and be used at multiple levels, and
it brings us the opportunity to look at more clinically relevant measures. We
can look at change over time. It is really hard to do in a chart review or with
claims. We can look at actual levels and the amount of improvement, instead of
trying to say did you pass the threshold and then if you are one point below
the threshold we don’t give you any credit.
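[Editorial illustration, not part of the testimony: the contrast drawn above
between pass/fail threshold scoring and crediting the amount of improvement can
be sketched roughly as follows. The function names, the use of a systolic-only
cutoff of 140, and the two sample patients are assumptions made up for
illustration; none of this is an NCQA specification.]

```python
def threshold_credit(systolic, cutoff=140):
    """All-or-nothing scoring: credit only when the reading is below the cutoff."""
    return 1 if systolic < cutoff else 0


def improvement(baseline, latest):
    """Continuous scoring: how many mmHg the reading fell from baseline."""
    return baseline - latest


# Hypothetical patients: (baseline systolic, latest systolic)
patients = {"A": (150, 139), "B": (180, 141)}

for name, (baseline, latest) in patients.items():
    # Patient B improves far more than patient A but earns no threshold credit.
    print(name, threshold_credit(latest), improvement(baseline, latest))
```

[In this made-up example the patient with the larger drop in blood pressure
gets no credit under the threshold rule, which is the situation the speaker
describes.]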

We can look at multiple values. We can look at treatment intensification. We
can try to say not just is the blood pressure at goal, but also we can try to
take into account the different situations where, well, we tried. We did
everything the guidelines said we should do and this person just hasn’t
come to goal. We are giving more benefit and recognizing the efforts of
clinicians to really do well by their patients.

The data sources that are available in the future we think will be claims
combined from multiple health plans. There are some real advantages to claims
data. That is where you get the payments and some services that are better from
those claims data but also electronic records, electronic patient surveys,
personal health records. In the future we think the data sources are going to
be richer and that we can even dream of an environment where it is all linked
together and would allow us to really look at care over time in a
patient-centered and population-based way.

What does this mean for measure developers and evaluators like NCQA and
PCPI? Immediately what we need to do is convert our existing measures into
measures that can be used in this electronic environment. At the same time we
need to be thinking about creating new measures that can really capitalize on
what electronic data can offer. And moving to evaluation models that take into
account the electronic data collection and outcome measures, so that we are
able to look at the full range of care.

But there are a number of issues that we need to be thinking about as we try
to move forward. What are going to be the formats for EHR-based measures? Where
do we look for the data in the EHR? Does the diagnosis come from the diagnosis
field in the problem list? Does it come from the medications that were
prescribed or from the lab results? In an EHR you can use any of those. What is
the hierarchy for data searches? Does the problem list trump the medication
list, or whatever? What code sets should be used? Should it be concurrent or
retrospective? Should we be thinking about care for this individual patient at
this point in time? Did you do the right thing today or should we be looking
retrospectively? One of the reasons that will get complicated is if everybody
has sort of a different look-back period, because we are looking at one patient
who came in May and another patient who came in December and then we are trying
to put that information together. Should it be visit or population based? How
do we update the measures, the codes? Research changes. The guidelines change.
We need to change the measures. What is that whole process? As we move forward
these are going to be some new challenges for us.

How does meaningful use in an EHR setting, in this new world, change our
measure development process? This slide shows you on the left the traditional
measure development process: review the evidence, develop clinical logic,
identify data sources, evaluate feasibility, field test, write specs. In that
traditional measure development we have different specifications depending on
where it is being implemented. We have HEDIS for health plans and we have
HEDIS for physicians.

In the meaningful use environment a lot of the basic steps are similar, but
after we review the evidence and develop the clinical logic we are going to
have to identify the data elements that we need, where we want to pull them
from, and what the source codes are. We need to put those into an XML format, a
format that is machine readable, test with EHRs the way that Karen described
the work that they have done with these incubator sites, and then provide
vendors standardized and encoded measure specifications, machine-ready specs.
We are really looking at moving into that process.

There is a proposed draft standard for what this should look like, the
eMeasure or HQMF, which is the Health Quality Measures Format, and that is the
model for how we would get to this electronic measurement. It is a
structured representation of the performance measures using XML to tag the
elements and this is what would allow us to import data elements and measure
logic into EHRs. This is the specification that needs to happen to allow the
EHR vendor to say I have taken this logic and now I can spit out a report of a
performance measure. What it will look like in practice is that you go from
what is on the left here, which is the HEDIS specification for A1c poor
control. It has all the instructions. And then on the right side is what
the XML code would look like. I can’t explain it to you, but Shane
can read it.
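[Editorial illustration, not part of the testimony and not the slide shown at
the hearing: a rough sketch of what "using XML to tag the elements" of a
measure can mean. The element and attribute names, the value-set labels, and
the 9.0 percent cutoff are all invented for illustration; this is not the HQMF
schema and not the HEDIS specification.]

```python
import xml.etree.ElementTree as ET


def build_measure():
    """Build a toy, machine-readable description of a quality measure."""
    measure = ET.Element("measure", {"id": "example-a1c-poor-control"})
    ET.SubElement(measure, "title").text = "HbA1c poor control (illustrative)"

    # Who is being measured: patients with a diagnosis from a named code list.
    population = ET.SubElement(measure, "population")
    ET.SubElement(population, "criterion",
                  {"type": "diagnosis", "valueSet": "diabetes-codes"})

    # What counts toward the measure: a lab result above a cutoff.
    numerator = ET.SubElement(measure, "numerator")
    ET.SubElement(numerator, "criterion",
                  {"type": "labResult", "valueSet": "hba1c-tests",
                   "operator": "gt", "value": "9.0", "unit": "%"})
    return measure


xml_text = ET.tostring(build_measure(), encoding="unicode")
print(xml_text)

# A receiving system can read the tagged elements back out of the text.
parsed = ET.fromstring(xml_text)
for criterion in parsed.iter("criterion"):
    print(criterion.get("type"), criterion.get("valueSet"))
```

[The point of the sketch is only that the data elements and the logic are
carried as tagged fields a vendor's system can parse, rather than as narrative
instructions.]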

What is our path for retooling our existing quality measures? Right now we
have a number of quality measures that have been developed. They have been
endorsed. They are in implementation now. Our steps are we need to make these
ready for pulling from EHRs. We are actually working with PCPI and with NQF to
convert the specifications to basic EHR value sets and logic using the level 1
kind of EHR that Karen described. We will be reviewing those converted
specifications early next year with NQF and beginning to test those measures
and incorporate them into EHRs. This depends on vendors working to incorporate
those measures and for vendors to be able to report those measures out in a
standard reporting framework. That would be the path to get from the measures
we have today into measures that can be reported out of EHRs.

Right now, with support from HHS and NQF, we are actually doing this. We are
beginning to convert our high priority measures to EHR ready measures. We are
looking at 35 existing measures that would be available for use in 2011 going
through those steps I just mentioned.

That is sort of taking what we have and making it EHR ready, but what we are
really excited about are the new opportunities. With these electronic data
systems we have an opportunity to really think about measurement as enhancing
working on a number of different fronts. If you think about the measurement
process I said, it started with the evidence development and with guidelines
and then we develop measures. But what we have here in this middle of the chart
you have clinical decision support, performance measurement, patient decision
support, and patient education materials. With electronic systems our vision is
that these things would all work together and that the guidelines and that
information would be available to the clinicians and to the patients and it
would be used as the basis of performance measurement and that you would use
the information to track the results and track outcomes, track what happens
because a lot of those guidelines aren’t going to apply to every patient,
and a number of subgroups of particular importance, such as low-income or
different racial and ethnic groups, are not going to be represented in all
those randomized controlled trials that are used for guideline development.

So having information that comes out of the electronic systems to help us
understand and learn what is happening in the real world, and then being able
to feed back that information into updating guidelines and again updating
performance measures, clinical decision support, patient decision support so
that you really get to an ongoing process of quality improvement that builds on
as we are learning.

There are a number of areas where we are really interested in trying to use
the capabilities of electronic systems to support new measurement and
development: overuse, care coordination, treatment intensification, and I will
talk about these in a minute. I am going to hold the discussion of care
coordination because I will be talking with you about that later this
afternoon.

But we wanted to talk a little bit about the priorities for meaningful use
measures that have come out and where our work or the work that we are doing
with the PCPI is trying to address these issues and where it is building on the
opportunities that EHRs bring to us. In imaging that is where we are looking at
overuse measures building on the kinds of appropriateness criteria that have
already been developed. Patient experience, how can we use electronic systems
for that? Some of these items actually may better fit in tools like the
Physician Practice Connections - Patient-Centered Medical Home, so they become
structural requirements for how EHRs are used. That applies to some other
things like home monitoring and comprehensive patient data. But we are also
working on readmission measures, care coordination, and things like preventive
services, having comprehensive preventive services.

I wanted to just touch on a couple of these activities that we think are
particularly important. First, overuse and appropriateness. With support from
AHRQ, we sponsored a national working meeting in June to think about the issue
of measurement of overuse and appropriateness. This is a difficult topic. Some
of the early work that RAND did on measuring appropriateness of care was
done 30 years ago, and it hasn’t led to measures, primarily because of the
difficulty and the lack of feasibility of doing the detailed chart reviews and
then also the reliance on an expert consensus database and the challenges of
doing that.

Nonetheless given the concerns in the economy and the national priorities we
heard a lot of interest in moving in this direction but that we should proceed
with caution and focus on overuse measures in a particular area and we are
working with PCPI on that, but also thinking about how we might take those
appropriateness criteria, building on research like the work that American
College of Radiology and American College of Cardiology have done, that
required detailed clinical data, often detailed clinical data both from
ambulatory and inpatient settings and to be able to develop new measures and
think about the opportunity for measurement and decision support.

Another opportunity that we are involved in is looking at the Archimedes
model, which maybe some of you are familiar with. What is interesting about
this model, and it is very different from the way our current HEDIS measures
are set, is that this approach combines clinical decision support with measurement of
outcomes. Instead of saying the guidelines say everybody should have a blood
pressure less than 140 over 90, you calculate a specific risk score for a
patient and you adjust and it provides decision support to the clinician based
on that patient’s particular characteristics. We are working with Kaiser
in Hawaii to test this.

Then some other opportunities for measurement relate to developing new
outcome measures that take into account risk adjustment at the physician level
and then looking at treatment intensification and the opportunities for using
electronic survey models to get to patient experiences.

Finally, I wanted to mention the idea of updating measures. In some ways
having electronic measures there should be some opportunities for making
updating more feasible because you can make it electronically available to
vendors but it depends on having a process for updating. NCQA formally
reevaluates all HEDIS measures at least every three years but if there is new
clinical evidence we might update it more frequently. We have updated our
diabetes measures every year for the past three or four years. It is going to
be important that there is support and opportunity to do that kind of updating.
It is an important role for metric developers. Thank you.

MR. OPELKA: For those I haven’t met, Frank Opelka with the American
College of Surgeons at Louisiana State University in New Orleans. Going last is
great because all the important things have already been said. I will try and
shorten this and hit some highlights that might perhaps open up some dialogue
in some other areas for measure development to enhance specific issues for
quality improvement and to help patients with patient decision making as they
move forward through the healthcare system.

I am involved with several different aspects of performance measurement. At
home I am in a 10-hospital safety net hospital system. We have a very
elaborate ambulatory patient measurement system that pulls clinical information
together on about 8 to 10 chronic diseases and then in house, in the hospital,
the inpatient side, we have a very elaborate system that looks at current
measures that are being used and that whole program is in a learning network
across all 10 hospitals for quality improvement. That is separate from some of
the other aspects I will talk about today, which is from the College of
Surgeon’s standpoint, where we have multiple registries, some of which are
very old. We have the National Trauma Registry. We have the National Cancer
Data Base that the college runs in conjunction with the American Cancer
Society. We have a new registry that is out there, the Trauma Surgical
Improvement Program. We have an old system that is about 10 years old for the
college, but it is about 20 years old given its VA history, and that is the
National Surgical Quality Improvement Program. Those are all measurement systems
that we have that are up and running today. Some of them have highly validated,
highly reliable data. Some of them have very poor data entry points and thus
are problematic when you get to the point of performance measurements and
quality improvement because of the problems with the data itself.

I will speak more about the procedural-based care because we have talked
about global aspects of measurement and I want to just focus a little bit, since
there is, I think, a whole different realm of performance measurement when we
get into procedural-based care, and it has a much harder drive for outcomes.
This is much more active, aggressive, short care that needs to have some kind
of aspect of outcome assessment. We are very strong proponents of those being
risk adjusted.

The National Surgical Quality Improvement Program currently over 30 days
captures about 130, 135 data elements. About 30 or 35 of those are just part of
the risk adjustment. We have now actually taken that system and deployed it. It
is in 300 hospitals across the country. We find a lot of problems with that
kind of intensity of measurement.

First of all we have learned that we don’t need 35 risk adjustors. We
could probably get by with, in fact we have shown we can get by with,
about six to eight risk adjustors in the care of the surgical patient. That is
a big help.

Secondly, we currently sample multiple procedures, but the sampling is only
around maybe 20 or 25 percent of the total procedures that we are looking at in
a hospital. One of the things we have done is said, can we assess a hospital
and its physicians by decreasing the number of samples we get and increasing
the number within a given particular condition. If we did 80 to 90 percent of
all colons, rather than 25 percent of all colons, and picked up a whole bunch
of other background noise of very low volume procedures, we have actually
intensified the high volume, high risk areas in their sample size and shortened
the number of data elements that we are collecting for risk adjustment, which
allows us to actually create, I think, a much more detailed view and a less
expensive view from the hospital standpoint for performance measurement.

I think the second key area once you get into procedural-based care, and I
didn’t include patient safety on here because I think that one is already
out there, but the one area we haven’t put enough focus on is
appropriateness. There are huge challenges in appropriateness of care, but
perhaps we are trying to take too much of the elephant. Perhaps we need to take
a smaller bite and get down to something that is actually more easily
collected.

Then finally I think if you are going to put this into some meaningful
aspect on the patient side and perhaps other stakeholders in this, it is to
create a composite and put composites together to try and bring it all together
on a particular condition for a patient in the procedural-based world.

I have mentioned the NSQIP. It is facility level measurement at this point
in time. We have done some modeling at provider level measurement, but you
really run into sample-size problems. If somebody has only done 8 or 10 of
something in a given year, can we reliably talk about performance and can we
reliably assess where quality improvement efforts need to be trained in one
individual? We have not found that to be successful. But if we take the entire
group of people in a given hospital who are doing this we can literally make
movements for huge efforts in quality improvement. We have demonstrated that
now by creating several different learning networks across many of our
hospitals that are in the same region. We have a learning network in Tennessee.
They all collaborate. They see their own data and then they see the group
aggregate data and it has made huge efforts towards quality improvement and
enormous cost savings.
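[Editorial illustration, not part of the testimony: the sample-size problem at
the individual provider level can be sketched with a standard Wilson score
interval for a complication rate. The case counts below (1 complication in 10
cases for one surgeon, 50 in 500 for a whole hospital) are invented for
illustration and are not NSQIP data.]

```python
import math


def wilson_interval(events, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion events/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Same observed 10% complication rate, very different certainty.
surgeon_low, surgeon_high = wilson_interval(1, 10)      # one provider, 10 cases
hospital_low, hospital_high = wilson_interval(50, 500)  # whole group, 500 cases

print("single surgeon:", round(surgeon_low, 3), round(surgeon_high, 3))
print("hospital group:", round(hospital_low, 3), round(hospital_high, 3))
```

[With only about 10 cases the plausible range for the true rate is far wider
than with the pooled hospital volume, which is one way to see why the speaker
reports measuring at the facility level.]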

If you were today to take all the complications that NSQIP has prevented in
300 hospitals, and put those in the major hospitals across the country, you
would be saving somewhere in the neighborhood of $50 billion to $80 billion per
year in this country in the cost savings from quality improvement. That is not
hitting every hospital. That is just hitting some of the major hospitals. The
potential savings are astronomical.

What comes from this though most importantly, is this is trusted and
meaningful to the providers. The reports they get and put in their hands are
actionable. In measure development we really have got to be thinking in terms
of actionable measures. We have measures out there now in perioperative
antibiotic use and then we try to correlate that with SSI, Surgical Site
Infection. Well, there are so many other drivers for surgical site infection,
perioperative glucose control being one of them, but the patients’ other
comorbidities, all of those factors, the surgical skill and judgment, which we
don’t have a good way of measuring today.

When we look at antibiotics and SSI and we try to make that correlation,
since it doesn’t correlate to the provider there is not a huge effort in
there to correct it. It is more for the hospital to get this done in a
standardized and reliable fashion, but it may not have the impact we are
looking for in SSI. Looking at the current CMS data, in fact it is showing
there is no correlation. Great antibiotic usage still had significant surgical
site infections. Poor antibiotic usage didn’t necessarily have significant site
infections. We
have to make sure that what we are measuring meets that level of
appropriateness and meaningfulness at the provider level.

A lot of the comments have been about updating all the
aspects of measures and how we put them together. The risk adjustment actually
requires updating. We have looked at the NSQIP today and we went in and we
looked at the beta weighting of all these different data elements that are
collected. They were based on a VA population and there in the VA population
albumin was a huge clinical driver of outcome, but it is a poorly nourished
population that has a lot of alcohol abuse and it certainly does show up that
albumin is a key driver and it gets heavily weighted. So that beta weighting
needed to be changed to meet a particular market that is being assessed for
that improvement and those kinds of changes are ongoing if you look at the
different variables that are a part of the risk adjustment.
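[Editorial illustration, not part of the testimony: a rough sketch of why the
beta weights matter. It assumes a simple logistic risk model, which the speaker
does not specify, and every number in it (the intercept, the weights, the
patient) is invented; these are not NSQIP coefficients.]

```python
import math


def expected_risk(features, weights, intercept):
    """Logistic model: turn weighted risk factors into a predicted probability."""
    score = intercept + sum(weights[name] * value
                            for name, value in features.items())
    return 1 / (1 + math.exp(-score))


def observed_to_expected(observed_events, expected_risks):
    """O/E ratio: observed complications divided by the sum of predicted risks."""
    return observed_events / sum(expected_risks)


# Hypothetical patient with low albumin and age over 70.
patient = {"low_albumin": 1, "age_over_70": 1}

# Hypothetical weights: one set fitted where albumin dominated outcomes,
# one re-estimated for a population where it matters less.
original_weights = {"low_albumin": 1.2, "age_over_70": 0.5}
updated_weights = {"low_albumin": 0.4, "age_over_70": 0.5}

risk_original = expected_risk(patient, original_weights, intercept=-3.0)
risk_updated = expected_risk(patient, updated_weights, intercept=-3.0)
print(round(risk_original, 3), round(risk_updated, 3))
```

[Because a hospital's risk-adjusted result is commonly expressed as observed
events over expected events, weights borrowed from a different population
shift the expected side of that ratio, which is the reason given above for
re-estimating them.]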

A couple of comments on appropriateness. I think the best way to approach
this is to be condition specific, evidence based. From a procedural basis I
have thrown up two examples just to think about this and Bernie already
mentioned this. Spine surgery for low back pain after failed medical trial in
patients with no urgent triggers. There is a definition of appropriateness and
we could enhance that through a panel of experts with the evidence behind it,
but that is the kind of thinking that, if we put that out there as a measure of
appropriateness, we think will actually improve the quality of care, so that
people get the right care at the right time for the right reason.

Cholecystectomy, a very common operation. But if you look from laparoscopic
surgery forward, there has been a change in cholecystectomy. We used to operate
on gall bladders that had gall stones and if you were diabetic and
symptomatic. If you were diabetic we did it if it was asymptomatic because of
the problems in discerning symptoms in the diabetic. If you were in the ICU and
you were septically ill and you had gallstones and an inflammation of your gall
bladder, we called this acalculous cholecystitis and it was an indication for
surgery.

Today a lot of patients present with right lower quadrant pain and they have
biliary dyskinesia. There are no stones present. There is some mild thickening
of the gall bladder. They might or might not have a gall bladder ejection
fraction determined and they might or might not have that gall bladder ejection
fraction related to reproduction of their symptoms. But there are a lot of
cholecystectomies performed for this acalculous cholecystitis and in fact that
is an area that we should explore: is that most appropriate? Do we
have the parameters for defining that in a most appropriate fashion? How does
this level of appropriateness merge and fit into the issues of comparative
effectiveness and how do you then push that into patient shared decision
making? These measures are being developed for the entire downstream effect and
the point of care and I think it makes a clinical difference for patients and
it is a big gap in what we do.

Patient specific factors, however, really confound us. That is where
appropriateness gets to be a mess. If somebody has an inguinal hernia, you
could say well it is appropriate to operate on someone with an inguinal hernia,
but then you start to change the story. They have an ejection fraction in their
heart of less than 20 percent. They are 90 years old. They are bed ridden and
that hernia never bothers them. Perhaps it’s not appropriate to operate on
that hernia.

Those factors in trying to develop measures for that level of detail I think
will fail. There are just too many confounding variables to try to put that
together. But a good beginning is to start with condition-specific,
evidence-based appropriateness. I think that can be done in a meaningful way.

I also think that when we start with developing measures particularly in
areas of procedures, we have to look at this as it all comes together so that
when we are beginning the measures we can see how it is actually going to
complement and fulfill what that patient needs and their decision making, what
other stakeholders are looking at and want to see for the patient. There is
value in structure from a procedural basis. That value may be volume driven. We
know that there are some procedures out there that have volume association with
them and we have to be mindful and respectful of that on behalf of representing
that to patients.

The process measures. There is not a lot of enthusiasm for these because we
can’t cleanly tie them to outcomes, but it is still a good important step
in creating standards and reliability in care. It has a role but I don’t
weight it as heavily as I would risk-adjusted outcomes. However, I still think
that measure is at the facility level. When you build the composite, the first
two, structure and process, are provider level and some facility level, but
outcomes are more facility level based.

We have completed and just put into the public domain or it is about to be
launched into the public domain through AHRQ, a surgical CAHPS, which is much
more specific for the surgical patient. It was developed by those patients
through a proper facilitator to represent those things that they are seeking
and not the general CAHPS that you see in the hospital, which will tell you
much about the parking and the painting but not so much about the expected
outcomes and did we achieve that and were you apprised of what you thought you
needed to be apprised of when you went into your operation now after your
operation. And then the appropriateness I mentioned, and then finally
efficiency.

Currently under the NSQIP forum we can measure efficiency. We can add in
appropriateness either within the registry itself today or we can add it in a
parallel registry that is about to be launched in conjunction with NSQIP, which
is referred to as a Surgical Quality Alliance registry. But these are different
pathways than these other pathways we are talking about for measures, and yet
all these pathways are important. We have to bring all of this together to
represent the structure and process with these risk-adjusted outcome measures.

That completes my remarks about this. The last thing I would say is that I
think, from a procedural basis, as we develop the measures it should not just
be based on priorities that we set; this has to be in conjunction with the payor
community. We have gotten a lot of information and help by sitting down with
payors and asking them specifically even market by market, where are they
seeing variance, where are they seeing gaps, and how do we partner with the
measures that they have to complement those with the clinical measures?

We are working with the vendors who run the current clinical databases and
all of those now are dropping patches into the EHRs that run through the
practice management system, to at least bring in some of the initial data. We
can pull in all the demographic data and reduce the burden of data aggregation
through those patches. That is working. To get it to the point where we can
pull in necessary clinical information such as in NSQIP, we have actually had
to use certified nurses who are certified to the process to do that data entry
so that we have auditable, reliable integrity in the database. Thank you.

DR. CARR: Thanks very much. Really excellent presentations and remarkable
synergy in terms of where we are headed. I would like to open it up to
questions now and I will start with Paul.

DR. TANG: Thanks to the panel for excellent discussion. Maybe starting with
Bernie in terms of some of the measures, what do you see as the timeline or maybe
the process of transitioning from the kinds of – well, actually I think
Karen addressed that in terms of you are working on the EHR-derived measures.

One topic that came up earlier from Helen Burstin actually was the whole
ACE/ARB. Let’s say you have the ACE/ARB measure. How would you propose we
account for, let’s say, the ACE is on the formulary but the ARB is not and they
have an adverse reaction to the ACE? Is there a way to account for those kinds
of things or how are you viewing that in terms of measure definitions?

DR. KMETIK: I don’t have the specifications in front of me but I am
pretty sure we acknowledge that with ACE/ARB there is a way to document
intolerance.

DR. TANG: I really liked your matrix where you talked about checking off
where a measure was with respect to the meaningful use. Have you by any chance
done that for all of the MU measures?

DR. KMETIK: We are in the process. I would be happy to share that. I think
it makes an important point, if I could elaborate on that a little bit, which
is we all appreciate the need for deadlines, but to arbitrarily, and I
don’t know if it is arbitrary, pick a date and say that’s when
we go live is troubling because we want to all be assured that we are there and
I think that tracking exercise helps to acknowledge where are we and we can
focus our efforts on the things that still need to happen and let that drive
the appropriate dates of things.

DR. TANG: I got to go back and say fine, that Medicare trust fund is going
bankrupt and it is fine to say let’s wait until we are ready. How do we
make that compromise with when do we think at the current pace we can be ready
versus when could we be made to be ready?

DR. KMETIK: I’m not saying we sit back and just wait until it all
happens. I am just saying that laying it out that way I think, makes it pretty
transparent to all of us and in some cases we are going to need to set some
dates and say this is what we need to achieve by then. But without that
specificity, you tend to get people a little frustrated sometimes and they will
throw up their hands in frustration. I think it is better to articulate this is
the goal – this is the goal we want to reach. Here is where we are in
different things so let’s try to put some aggressive but reasonable dates
to each aspect of that because without the realities sort of tracking
mechanisms, it is hard to engage in a conversation.

DR. ROSOF: I can also say that this is a resource-intensive kind of work,
especially with the extensive number of measures we have on our plate, plus the
measures that the consumers and purchasers would like to put forward, et cetera.
Adequate funding obviously is an issue to help Karen along in this quest.

DR. MIDDLETON: Let me join Paul in thanking you all for really great
presentations and thanks to all for coming. I guess I am sort of sitting here
wondering where is the breakthrough. If each of you could respond perhaps with
what you see as the top one or two or three breakthrough opportunities for us
to develop measures, to accelerate the development of measures, to implement
measures. I am feeling a little underwhelmed so far, in the sense that we have
a lot of activities going on. We certainly know that we haven’t impacted
to a large degree yet behavior change and changed quality, outcomes, and cost
of care. Trying to be a little bit provocative, but gently, what are the top three
breakthroughs you would like to see in your own work or the work of others?

MR. OPELKA: I think that is a great question and in fact I think we have
asked that of ourselves many times. I think that one very important
breakthrough that we have to have is harmonization across the payors. It’s
just not working to have multiple different payors hitting these providers with
multiple different quality programs. It is highly inefficient. If we want to
talk about removing inefficiencies or improving efficiencies in medicine, let us
not be the cause of more inefficiency in this process. We lose credibility with
the providers immediately. They already see these unfunded mandates coming in
as one more burden and it’s not an opportunity for quality improvement.
They really want to give good care. I have never met anyone in medicine who
didn’t want to give the best care every day and if these things come in and
they are just burdensome and they get one from United, from CIGNA, from Blue
Cross, from WellPoint, from all the different plans and then there is CMS. They
are not harmonized. That is not helping the American patient one bit.

I think that is going to take government intervention to tell everyone you
are going to get in a room. We are going to get this together. It is very
important.

Secondly, just from the surgical perspective, we do have – it’s
not compelling evidence, it is overwhelming evidence, that we have improved the
quality of care in the NSQIP hospital. It is huge. It is no longer denial. The
return on the investment is also there. Business case. So the upfront funding
is returned. Good quality care is cheaper care and not only that, there is
massive buy-in; the surgeons love this. The information that comes
back to them is scientific; they see risk-adjusted data that is being
shared. They see themselves. They see their hospital, and then they see here
are the high-performing hospitals, here are the low-performing hospitals, and
there is this gaggle in the middle. But that starts the learning network and
the quality-improvement process.

I would say there is an upfront investment to make that happen and I think
that is also an all payor responsibility. I don’t think it is a government
responsibility alone but I think it is an all payor responsibility.

MS. SCHOLLE: When I think about the breakthrough I think about two pieces.
One piece is having the information that we are collecting be something that a
clinician can use and can explain to a patient. That is the piece of trying to
look at how these measures can tie to decision support and could be reported
out to patients. Choosing a small set of measures that you can really build
into, get out of the problems of free text and other stuff so that you can
actually get a report that is meaningful to the clinician that there is
decision support that can help guide that clinician and that they can also use
that to help patients understand what their care needs to be I think is an
important piece.

That can only happen when you have an HIT system that can support the
collection, the use of these data elements into a measure and when you have a
workflow process that supports that and staff that are
trained that have responsibility for printing out the report or for entering
the data and using that. Our experience as we have been looking at measurement
with the EHRs is that those are the critical first steps and so that happens by
having structural measures that say this is what the system should be capable
of doing and these are the tasks that the team needs to be able to perform and
need to be assigned so that you can get those reports that they can build into
your decision support that they can be shared with people so everybody is
working on the same page and maybe trying to do that on a few really good
measures that makes sense to everybody would help us to understand how to do
this bigger and better for different conditions.

DR. ROSOF: So as Frank said that is a very good question and it is a
question that I asked when we started doing the work on overuse not knowing
where to start. I started with the clinicians and specialty societies. Their
response to me was try to eliminate the confusion. Try to develop national
standards that would be agreed upon by the providers, by the payors, the
consumers, the specialty societies, and the boards. Yes, you can build that
into my workflow. That would be terrific so that I don’t have to do other
things during the day but it would harmonize current performance measures and
perhaps I could use the same for my maintenance of certification as I would for
my pay for performance, PQRI, et cetera.

That whole issue of confusion amongst the already overburdened clinician is
something that we would really like to accomplish. If I could look at one
thing, I would say let’s try to develop national standards agreed upon by
all those stakeholders.

DR. MIDDLETON: This is really terrific because I think you guys are laying
out kind of the inner thinking behind your presented thinking, if you will. I
wanted to follow up Frank with you. In a way we had a conversation this morning
with David Reuben from the UCLA and ABIM, and it was very interesting to think
about how in fact measurement of physician performance through quality directly
or indirectly through other means, process assessments, what have you, may
raise the issue of whether or not we are establishing a floor of competency or
whether or not the process can be turned on its head to establish a goal, an
aspiration or reach or stretch goal, which then changes the dynamics, not only
the human dynamic and the psychology, but also the measurement process in
responding to measurement and assessment as a tool for me to continue self
improving and lifelong learning and a learning network as you described, as
opposed to the kick in the butt when necessary if underperforming or over
performing in untoward ways.

I wondered if you could comment on the learning network and whether or not
there is an individual component to this, which has this feature or this
characteristic or whether or not you think something like that might be
possible?

MR. OPELKA: I have had this discussion many times and with many of our
stakeholders and it is fascinating, one of the stakeholders Peter Lee, he and I
came to the same conclusion that at a national level we needed a baseline. We
needed the floor. But at a personal level, we needed the upper 10 percent, so
you have to do both. I don’t think we can escape that. I think all of us
want to know where the floor is, that you have to be above this. If you are the
outlier on the low end for a particular condition or treatment of that
condition, it has to be identified and the public has a right to know. This is
their health. And at the high end, we need that so that we can get into those
systems, look at those systems, and see what is happening.

What is so different about that system because most of those successes are
not one individual who is doing something absolutely extraordinarily great and
miraculous. It is the fact that they put together those systems that created
the standardization and reliability. If you look at what Geisinger does that is
so good, all their goals and all their targets are standardization and
reliability. It is nothing more than that and they let performance of the
individuals distinguish themselves beyond that. I think you have to have both.

The reason you have the learning network isn’t so much to ferret that
out because that is intuitive I think in everybody. Nobody wants to be on the
bottom. Everyone wants to be on the top. But we all are coming to work already
thinking we are doing the right thing. Medicine moves and changes so fast,
particularly in this day, that we have to teach each other on a continuous
basis. Going to a monthly or even a semi-annual society and listening to some of
the latest and greatest doesn’t translate into getting it back home.

I think the quality improvement efforts have to be driven by these measures
that these are specific to where our issues are and then we have to own them.
We need a team effort to own that to figure out how we are going to change. It
is very hard for an individual who fully believes in what they are doing to
change alone. You need the learning network.

DR. KMETIK: I want to just echo the themes and maybe say in terms of a
breakthrough. I do worry that we are missing in our vocabulary today in our
conversations about meaningful use et cetera, both these pieces that I think
you are getting at Blackford, and that Frank has articulated. I worry that we
have had past national programs where the emphasis has been on get the data
somewhere and we have done that. We have done that exporting.

That has done pretty little to get timely data into the hands of physicians.
I think the vocabulary needs to make sure we are giving those two things equal
weights. We got to get the data in the hands of the physicians. They are going
to use it, as Frank said. If it is good data they are going to react to it. We
understand the need for exporting data and sending it to others but if that is
all we do and we haven’t built this first, we have missed the golden
opportunity but we have just replicated what we did with claims and that would
be so frustrating I think to all of us if we look back five years from now.

DR. TANG: I am picking up on Blackford’s and I really liked this last
discussion. I think Frank answered the question about why the floor because I
challenged ABIM on let’s stop raising the floor and let’s start
shooting because I find that individuals and is to your point of facility
versus individuals, maybe the floor is for the facility but the individuals
react very positively embracing on even transparent measures about themselves
and that has been our experience.

The breakthrough might be to make sure that our quality measures align with
the physician’s line, which is towards the individual. I am finding less
and I can understand why we developed failure quality measures. I just
don’t see much room for them anymore. An example is the hemoglobin A1c
greater than nine. It doesn’t tell me any way – it doesn’t tell me
how close I am to the goal of less than seven. It seems like we want more
measures that help individuals because I think the public reporting and that is
what the evidence shows is that consumers maybe because we don’t present
them in a usable way, but that is not what is changing their choice. It is
actually changing the docs’ behavior to improve it more. Maybe that is an
example of an improvement. Design the quality of measures to help docs perform
at a higher level.

The second potential breakthrough and this in reversed order because Floyd
is going to talk about the quality dataset and I know some of you are familiar
and some of you aren’t. Do you think that kind of – because you
talked about efficiency and each health plan having their own definition,
et cetera, and the problem of getting that into the EHRs and clinical decision
support. Do you think the idea of a QDS, a quality data set, that all your
organizations could draw from would be one of those breakthroughs to the extent
that you understand what is going on there.

MR. OPELKA: Let me just ask a question to help me answer that, Paul. If I
were to get a feed that gave me a list of data metrics, that feed could then be
mined locally even further.

DR. TANG: It is more that the – somebody showed the supply chain and I
don’t know if that was Helen from before. Was it Helen? Okay. You start
out with creating the evidence, to creating the guidelines, to creating the
things that would affect behaviors like the rule sets, clinical decision
support to creating the quality and the performance measurements and then
feedback. Each one of those steps is done by a different part of our sector and
they all invent their form of the data definitions and the measure
definitions. Wouldn’t it be great if we had one place where everyone could
go and draw out from the very start of the design of the protocol and the data
you collect all through and spit out the quality measures. In that sense that
would facilitate the data and hopefully would agree on 6 instead of 35 data
elements to feed into your registry, and case adjust these things and spit out
the quality measures that would go back to the EHR. Maybe that is the
bi-directional aspect. The outcomes that you get from your registry data
could sit in front of my face at the time I am making these decisions.

MR. OPELKA: I am thinking of two things with this right now. I have two
measure data sets; one is this LSU data set. Take my diabetics. Our
diabetic measures are pulled from the lab. The hemoglobin A1c is pulled from
the lab and we track less than 7, 7.5, 8, 8.5 all the way up to 10. When we
report we don’t say anything. We give an individual doc all their scores
and the system has all their scores, but the system then allowed those
interested parties to mine the data further. When we mine the data further is
where we found the answer.

What we noted in our own population was — just as an example, if you were
over 65 in our system your hemoglobin A1c tended to be pretty well controlled.
You were more mature. You paid attention. You read other problems. Your doctors
were important to you but for the group that was under 35, which just happened
to be the musicians of New Orleans, the lifeblood of the city who drink and
smoke all night long but make beautiful music, none of them cared much about
their diabetes. Then we put in improvement programs focused. In this instance
performance measurement to physicians meant something because it came home to
treat and make a patient better.

Looking at that data if the data allows me to re-mine it and the NSQIP data
does give you a dump and you can do a lot of mining and re-looking at things.
If there was a system where it had the parameters you could pick and choose
from allowing you to get the other analytics so you can do a deeper dive. As
you know that is how physicians think. Okay you give me this but now answer
this, this, and this.
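
The threshold tracking described above (less than 7, 7.5, 8, 8.5 and so on up to 10) can be sketched in a few lines. This is an illustration only, not the LSU or NSQIP system: the 0.5 step between thresholds above 8.5, the function names, and the sample values are all assumptions made for the example. It also shows the "how close am I to the goal of less than seven" view that a single greater-than-nine cutoff does not give.

```python
from typing import Dict, List

# Assumed thresholds: the testimony names 7, 7.5, 8, 8.5 "all the way up to 10";
# the 0.5 step for the remaining values is a guess for illustration.
THRESHOLDS: List[float] = [7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0]


def share_below(values: List[float],
                thresholds: List[float] = THRESHOLDS) -> Dict[float, float]:
    """Fraction of A1c results strictly below each threshold."""
    if not values:
        raise ValueError("need at least one A1c value")
    n = len(values)
    return {t: sum(v < t for v in values) / n for t in thresholds}


def gap_to_goal(value: float, goal: float = 7.0) -> float:
    """How far a single result sits above the goal (0.0 if at or below it)."""
    return max(0.0, value - goal)


# Hypothetical panel of four patients for one physician.
panel = [6.5, 7.2, 8.1, 10.4]
profile = share_below(panel)
gaps = [gap_to_goal(v) for v in panel]
```

Reporting the whole profile rather than one pass/fail cutoff is what leaves room for the further mining the speakers describe, such as splitting the same values by age group.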

DR. TANG: I think that is exactly right. The QDS would allow you to have
this is an A1c, this is what it means, this is how I calculated it and how you
interpret it so you can do that mining that you suggested. The breakthrough I
think I am thinking about is our quality reports – the quality measures we
currently report on actually take away that ability and that is the fallacy. I
think the breakthrough is to report differently. In our organization where we
are improving quarter by quarter because of the timely reporting back to the
docs we have a separate team to create the measures that we use and then a
measure that we have to report to all the various definitions that the health
plans have and that seems counterproductive. Maybe the breakthrough is pay
attention to what it is that will change the physician behavior. I think the
rest will follow versus the other way around. Report to health plans and then
expect physicians and patients to change.

MR. REYNOLDS: Frank, you commented earlier about the payors. The quality of
whether somebody has quality shouldn’t decide on whether they are a blue
patient or an – patient or something else. As I sat and listened to this
and I do a lot of standards work, I sit and listen to this and I hear we a lot
around the table, and every single presentation has been fascinating to me, but
each one of them has a different slant and even the most recent discussion in
Paul’s comment about what they do in their institution.

How do we take what all of you are working on and again as you look at the
chart from ONC and you look at the states that everybody in every state is
asking for money to go ahead and implement these things for all their docs and
everything else. What is the process that you all see that allows this to be
raised above whether it is payor provider or anything else, and you set the
quality standards and you have it in an open forum and as you say it is agreed
to like a lot of the other standards that go on and then once that is done then
– so for example as a payor, I’m not speaking for our company, but
it would be awful nice if there was a set of those out there that everybody
agreed on that was the right thing and oh by the way it played for Medicare, it
played for Medicaid, it played for everybody else and you can use it and
improve North Carolina one patient at a time, regardless of who they are
covered with.

But in listening each one of you is doing something really good and even
some of our members on the panel are doing good things, but left as it is then
others are going to come and say, here is what we are going to do or here is
what a state is going to do and everything else. That is what worries me about
getting to the meaningful use and getting to the things that are going on is
that there is really, really good work but no process to actually vet it in an
environment. Being NCVHS we get stuff all the time. Not recommending that there
would be it but it is vetted. It is vetted in one environment. It is vetted in
a way that is agreed to, and then once it is agreed to it could be adopted by
HHS or others, and that is how we start rallying around.

I chair a committee of all the payors on eligibility claim status, the same
kind of thing. You know you get the one. Any comments you could make because
that could be a way NCVHS could make some recommendations as to how to help you
get to that point, if that is a worthwhile point. If not then it is probably
going to be one state, one payor, one issue, one situation at a time, and
that’s not going to get any of us I think where we are all trying to be
when you put up any of these thoughts.

MS. SCHOLLE: I think that is the major challenge we have had with HEDIS, which is
that everybody wants to make a little bit of change specific to their state or
their population, and it is really hard to get people to agree to a standard
and working with NQF to get to endorse measures should really help that. But I
think this idea of having standards and then getting vendors to agree to put it
in because we have had – the vendors don’t want to put the resources
into any kind of measurement standard unless they think this is going to work
for everybody.

MR. REYNOLDS: Where do you see the process that says vendors you should do
this? I know some of you talked about meeting with vendors, but where is the
one body that says okay, this is it guys, go do it?

DR. TANG: We had a discussion of this at lunch. This moment or opportunity
where the vendors all said, well our customers aren’t asking. Well all of
a sudden because of the meaningful use dollars the customers are asking the
vendors, which the measure developers could never get the vendors to do. Now
the vendors have to listen to the measure developers because –

MR. REYNOLDS: If the measures are accepted across the board as that’s
what it is and let’s go.

DR. TANG: That is our moment of opportunity and I think maybe this hearing
is a plea to the measure developers and the people on that supply chain, hey,
this is time to hit the breakthrough measures and meaningful use is the excuse
to do it.

DR. ROSOF: This is not a new discussion, obviously. When I talk as a call
for national standards it was the discussion. At the beginning of the effort at
health reform this discussion began. Who is going to be the organizer of
national standards? Who is the recipient that is going to make the rules that
this is the national standard for X, Y, and Z or for the vendor situation?

That led to a lot of confusion, internal concerns, et cetera and still not
resolved. There are some answers that could be accomplished. Organizationally
this could be accomplished within the framework that is already established;
however, it would have to be agreed upon by so many stakeholders that unless
there is somebody who says you must do it. Unless there is somebody who says
national standards are the gold standard, it’s just not going to happen
meaning the discussion started perhaps with National Quality Forum being the
individual who would be the group that would handle that that created a lot of
discussion and didn’t go to where it was going to go. The same thing with
the national standards related to development of measures. PCPI could have that
role. There is no question about that. It has the background with the medical
specialty society, the clinical side, the methodology side, et cetera. It is a
group that should have that responsibility. It would have to be agreed upon by
so many stakeholders that it becomes – it is not a forum where you say yes
or no and 50 percent you win or 60 percent like Senate. We will see.

DR. CARR: A quick follow on and then Larry –

DR. MIDDLETON: Just two quick thoughts. Just a quick reaction is sufficient.
In a way I am struck with the assertion that the physicians of course are
interested in providing high-quality care and aim always to do so but of course
the result isn’t that. I am concerned that we are developing a new
measure framework and all the stuff we have been talking about but absent a
sustainable business case for quality. I am just concerned it’s not going
to happen. Not to suggest that docs are mercenaries who are working only for
quality or the almighty dollar but could you each react to what is the
sustainable business case for quality post stimulus?

MR. OPELKA: I think you have to stop paying for volume. If you are going to
have a quality system and you want this transparency in all the key pillars of
healthcare reform, there has to be a realignment of incentives across the
board. Until you do that I don’t think all of this stuff will go exactly
where it has to go and be as successful as it can. I also think you must
address the issues of defensive medicine. That if you are going to incentivize
me and my practice by $2000 of PQRI money and I put myself at risk for a
$20,000 increase year after year in my malpractice because I no longer practice
defensive medicine, and I was caught in a court system that isn’t educated
to quality metrics, it is easy for me to decide which risk I am going to take.
I am going to keep my malpractice insurance rather than $2000 on quality. That
isn’t necessarily something I bought into as the best quality.

You have to have meaningful quality and really something that can at the
local level create quality improvement, excitement about the practice of
medicine because of the state and this information. You have to have aligned
incentives that have to include payment realignment that fits that and you have
alignment incentives that correct the disparities that we see in the way we
practice defensive medicine.

MS. SCHOLLE: I think you also have to think about the accountability not
just being with an individual clinician for an individual patient at an individual
point in time. Care happens over time to patients and they are getting care
from different people. The accountability needs to recognize that it’s not
just the surgeon. It is the primary care physician, the hospital, the health
plan, that are all taking responsibility for the care for this patient. That is
the other kind of financial alignment that needs to happen particularly since
some of the measurement that you might want to do at the clinician level it is
useful for quality improvement, but in terms of knowing – it is useful to
let that clinician know what is happening with the patients but to really get
something that is stable, you are going to need a lot more observations.

Like Frank was talking some things make more sense at the facility level.
Some things are going to make more sense at a community level to try to
understand how well the community is doing; the community of healthcare
providers and the healthcare system are doing for a population of patients.

DR. KMETIK: I will just add a little bit to Larry’s previous point as
well. I think we should use this moment of meaningful use. We should leverage
the heck out of it in a thoughtful way because it has enabled us at least to
begin to have a different dialogue with the vendors than before, as Paul said,
step one.

Another important step though is I think we can begin to have the different
conversation with the payors because as opposed to an old conversation of you
want to merge your big databases that you have. It is a new day. We are both
getting data we never had before. If done right, the physician has data they
have never seen before at the moment of the patient care and the plans, the
payors, they never had cancer staging data before. They never had prescription
data to marry with dispense to see is the patient picking up the drug or not.
Again, our vocabulary maybe needs to have a shift to say what do we all win
here if we do it right as opposed to just we are going to get backed and that
is what you have to do. What is the thing where we all win and it is
information we never had access to before and that is even important to the
plans.

DR. GREEN: I want to try something here. I want to ask all of you to respond
to questions that in my mind tie back to prior NCVHS work, the work of several
of our subcommittees over the last few years. All the questions I am going to
ask are capable of being answered with yes or no. The first question also I
built off of Sarah’s slide for she had the word dream and she also had
another slide that had the word stewardship on it. This question would be are
you experiencing in the current measure development, endorsement, and adoption
process difficulties that relate to privacy, confidentiality, or what a research
community would call IRB issues? Yes or no? Or is this a nonexistent problem?
Privacy and confidentiality are not in play here.

DR. ROSOF: The answer to your first question is no.

MR. OPELKA: Yes.

MS. SCHOLLE: Yes.

DR. GREEN: Two yes’s, two no’s. Then we in the last hour and a
half heard Harry’s “we”. Does the “we” share a common
data model? The “we” of the vendors, data developers, the payors, the
purchasers, the government, providers, et cetera. As we massage this thing
anywhere lays a common data model. We are all working on that type of data
model.

MR. OPELKA: No.

DR. GREEN: No, that doesn’t exist. Then what was not said in this
section is there current active work on measure development, endorsement, and
adoption of measures of function or a patient feeling like they got some
relief? Four yes’s. Thank you.

DR. TANG: As a closing comment I just want to thank this panel so much
because I think there was such really good information and discussion. The
other part is maybe on an upbeat note. Bernie said that you need this power. I
think that you have more power than you realize because of the moment and it
really is leveraged through the meaningful use and really NQF because it is a
voluntary consensus standards organization, CMS has to use those when they
exist. The measure developers have a lot of power at the moment. Maybe that is
the upbeat way of saying this. I think the work is really good and to just to
accelerate it not because it is comfortable to but because I think the country
needs it.

DR. ROSOF: Some need the encouragement to utilize that power.

DR. CARR: With that I think we will just take a 10-minute break and then
reconvene for Floyd and Blackford and I add my thanks to this group. I know you
made this timing work and I appreciate it very much and thank you so much for
enlightening us.

(Break)

Agenda Item: Building Meaningful Measures –
Adoptability

DR. CARR: We are running a bit behind but really very enriched for all the
discussion. The next session is building meaningful measures adoptability and
Floyd and our own Blackford Middleton. Floyd Eisenberg and Blackford Middleton
are going to present. We will start with Floyd and I suspect we are going to
run over. Do we have Kathy McDonald on the line yet? Not yet. We can just let
her know that we are probably running 25 minutes behind but looking forward to
her testimony and if that is a problem for her just let us know.

MR. EISENBERG: Thank you very much. I thank you for the ability to speak
here today and I am Floyd Eisenberg, Senior Vice President of Health
Information Technology at National Quality Forum. I am very pleased to discuss
the topic of building meaningful measures and adoptability. Just in our last
discussion I heard a lot about vendor participation with using measures and
implementing them and also is there a model of information. What I am going to
be addressing in this presentation is pretty much a model of information for
quality. It was developed by the Health Information Technology Expert Panel,
chaired by your own Paul Tang, and one of our work group co-chairs was
Blackford Middleton. I feel like I am home here. It was funded by the Agency
for Healthcare Research and Quality, where iteration as the health IT panel.

Our original goal, the task we were given, was to develop a quality data set
and to identify workflow for quality in a clinical setting. I will be
discussing that with you as well as then how do we get this implemented, which
is the rest of the presentation.

To start with, in thinking about quality we call this a quality data set
thinking of the way – if you remember when Helen Burstin presented this
morning the quality tree, everything starts with guidelines and evidence and
the branch points are where guidelines recommend different components and the
leaves are the measures. It starts with the study designer, the guideline
developers that start with a concept. This is all about diabetes, as your
example. As we look at that we want to understand what do we mean by diabetes
and here the measure developers often provide a list of codes to say what is
diabetes. In the example provided it is ICD-9, could easily be SNOMED, and will be
SNOMED in the not too distant future, the list of codes that represents diabetes.

In order to provide additional meaning for any part of the measure I need to
know that there is an active condition of diabetes. I have now merged together
a concept, the codes, and the data type, the type of data that is active. It is
a problem or a condition and it is active. All of these together make up a
model of what we call a QDS element or a quality data set element. That element
then can be stored and properly identified, be reused in a database of other
quality data elements so any time we need to understand for a measure in a
numerator or a denominator the use of diabetes active diagnosis that
combination of code list, data type, concept together as a QDS element can be
identified and reused. Understand I am talking here about the quality side. I
have had a lot of interest from those in public health for public health
reporting and public health identification of illness, as well as a research
side either folks working with the same concept concern at National Cancer
Institute and elsewhere. The quality data set in these elements can be reused
for data out of electronic records and electronic data streams for multiple
uses not just quality, but we are going to be addressing quality mostly here.
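
As a rough sketch of the idea just described, a QDS element can be pictured as a small record that bundles the concept, the code list, and the data type, held in a store so that it is defined once and reused. This is not the HITEP specification; the class names, the store, and the two ICD-9 codes are placeholders chosen for the example and are not a validated value set.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class QDSElement:
    """One quality data set element: concept + code list + data type."""
    concept: str            # the clinical concept, e.g. "diabetes"
    code_system: str        # e.g. "ICD-9" today, possibly SNOMED later
    codes: FrozenSet[str]   # the code list that defines the concept
    data_type: str          # e.g. "diagnosis, active"


class QDSLibrary:
    """Hypothetical store so an element is defined once and then reused."""

    def __init__(self) -> None:
        self._elements: Dict[Tuple[str, str], QDSElement] = {}

    def register(self, element: QDSElement) -> QDSElement:
        # Keyed on (concept, data type); the first definition wins.
        key = (element.concept, element.data_type)
        return self._elements.setdefault(key, element)

    def lookup(self, concept: str, data_type: str) -> QDSElement:
        return self._elements[(concept, data_type)]


library = QDSLibrary()
active_diabetes = library.register(
    QDSElement(
        concept="diabetes",
        code_system="ICD-9",
        codes=frozenset({"250.00", "250.02"}),  # placeholder codes only
        data_type="diagnosis, active",
    )
)

# A numerator and a denominator can both point at the same stored element.
denominator_element = library.lookup("diabetes", "diagnosis, active")
numerator_element = library.lookup("diabetes", "diagnosis, active")
```

Because both lookups return the same stored object, a change to the code list (for example a later move from ICD-9 to SNOMED) would be made in one place rather than in every measure that mentions active diabetes.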

Once I have done that I now have my quality data element and in the context
of a measure I want to know where is it coming and what is its source. What is
the workflow of it? When we talked about workflow in the HITEP we were talking
about the source of the data, that is, the originator. It could be a device. It
is a reading of a blood pressure off of a blood pressure monitor. The recorder
could also be the device electronically. It could be a clinician and it could
be the source as a patient and the recorder as the patient. We are talking
about those two elements.

We also want to identify in the flow what is the setting in which it
occurred. Is it in a hospital, in a home, ambulatory care setting, skilled
nursing setting? And in what health record field do I expect to find the
information especially if I am submitting data from one EHR to another, and in
this case we want to know that this is an active diagnosis of diabetes on a
problem list. The data flow would identify the location and also the source and
health record field. Once I have that multiple ones of these, multiple elements
make up a measure and those measures then each are stored in a measure
database.
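
The data flow attributes just listed (source, recorder, setting, health record field) and the way several elements make up a stored measure might look like the following. It is a sketch under assumptions: the field names, the example measure, and its numerator are invented for illustration and do not come from any endorsed measure.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DataFlow:
    """Where a data element comes from and where it is expected to be found."""
    source: str                # originator: device, clinician, or patient
    recorder: str              # who or what recorded it
    setting: str               # hospital, home, ambulatory care, skilled nursing
    health_record_field: str   # e.g. "problem list"


@dataclass(frozen=True)
class MeasureElement:
    """A QDS element (referred to by name here) used in the context of a measure."""
    qds_element: str
    flow: DataFlow


@dataclass(frozen=True)
class Measure:
    name: str
    denominator: Tuple[MeasureElement, ...]
    numerator: Tuple[MeasureElement, ...]


# Hypothetical measure database keyed by measure name.
measure_database: Dict[str, Measure] = {}


def store_measure(measure: Measure) -> None:
    measure_database[measure.name] = measure


diabetes_on_problem_list = MeasureElement(
    qds_element="diabetes / diagnosis, active",
    flow=DataFlow("clinician", "clinician", "ambulatory care", "problem list"),
)
blood_pressure_reading = MeasureElement(
    qds_element="blood pressure / physical exam finding",  # assumed data type name
    flow=DataFlow("device", "device", "ambulatory care", "vital signs"),
)

store_measure(
    Measure(
        name="example: blood pressure recorded for patients with diabetes",
        denominator=(diabetes_on_problem_list,),
        numerator=(blood_pressure_reading,),
    )
)
```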

In order to, again, looking at the model I have the quality data set
elements on your right, the measure database on the left, which is composing
those into measures and each of those measures then can be used to manage
measures in EHRs, registries, health information exchanges, et cetera. I give
credit to Danny Rosenthal, who is tremendous with graphics and he did a very
good job on this.

Once we have identified all that and we have our data types. If I were
thinking of something like medication the medication is represented by a set of
codes and it is either medication administered, which would be one data type
medication ordered or prescribed being another, each one of our concepts has
multiple data types. If I look at those as the center bar in this particular
drawing, that is kind of the Rosetta Stone, the quality data set. Measure
developers identifying the elements for the denominators and their numerators,
and those providing clinical decision support to identify how to define my
population and what intervention should have occurred, can reuse that quality
data set. Rather than an EHR vendor on the other side looking at a text set of
criteria in a measure, reading every sentence in every line and trying to
figure out what that means in their EHR, they can identify: I know this is a
QDS element. This is active problem of diabetes or just active problem. I know
where active problem is in my EHR. EHR 1 has it mapped once, and anytime I have
an active problem whether it is diabetes, congestive heart failure, whatever, I
know I expect to find it in that same location in EHR 1.

That might be slightly different in EHR 2 because of some innovation in a
way that is configured but they handle problems in a standard way also. Once
they mapped that to the active problem then they have mapped to the Rosetta
Stone on your right. The measure developers and users and the seekers of
information, map to it from your left. Using that same model without having to
individually look for the same element every time there is an element in a
measure or a guideline or somewhere else, to go find it individually in EHR as
long as you find it in the center on the quality data element and the quality
data type then you are able to do that mapping much more easily.

In order to do that we had to link. The next step to implement is how do I
link each of these data elements to a model of information that EHRs might be
using. Now HITSP has done a lot of work on this effort. The Quality
Interoperability Specification, known affectionately sort of by some of us as
IS06, it has a table in the appendix, which we will be going to panel, I think
in the next month, that identifies all of the standard elements, the HITEP data
elements, a definition for the HITEP quality data types, and identifies where
in an interoperable way that information would be found in electronic setting
and it identifies the terminology that most effectively could be used to do
that.

HITSP’s terms for that would be C83, which is the location in a
structured document, C80, which is the vocabulary. There will be new numbers
for after comment. I won’t comment on them. I think it is C154 is going to
replace one of those but let’s not confuse the issue.

This was to take the HITEP elements and definitions, make sure there is a
setting in the record that represents each of those. A measure developer
doesn’t have to understand the entire model of electronic records but
understands the data type. The electronic record doesn’t have to
understand everything quality developers are thinking about. They know where
that information is mapped to and they can deal with that. That is the process.
The table is obviously longer than two rows, quite a bit longer, but I have
just showed that as an example.

What we did find was in the HITSP side there were some gaps. Some of it was
in thinking about transferring data and using data for individual patient care
the discussions and standards go so far but when we think about sometimes we
need more granularity where it can be free text and it doesn’t have to be
specifically defined for individual care as long as they know it’s in a
packet that deals with physical examination.

It does matter which part of the examination should be identified in
SNOMED and which in LOINC or is there a difference because right now what HITSP
will tell it’s one or the other and what the EHR vendors told us is which
one and when. There is increasing granularity that needs to be built into the
standards work and that is where there is some more harmonization and
specification that is required that was one example.

There are also areas where there isn’t a good clear evidence of where
that should be or is in an EHR setting and some of that patient functional
status, a functional status survey exam. Where do I include that? It’s not
necessarily in the same place in any one electronic setting that needs work;
the patient’s care experience, the provider’s care experience both of
which are HITEP data types, communication to and from the patient. Devices.
There are standards for devices but sometimes measures are actually looking for
the use of devices where there are no standards. One of the examples was TED
hose or antithrombotic devices put on the leg. There wasn’t a good
standard for that. That is where there are some gaps.

To understand a patient has declined without asking a physician to enter a
code to say the patient refused this or declined it, how do we get that
information one from the patient and two how do we standardize how that is
represented so we can reuse it or treatment offered, which should happen before
it is declined we would hope.

Another area where there is a gap is we do have and Blackford chaired for NQF
a team on structural measures so we have a number of those. Most of them are
basically indicating I have the capability in my office that I can do this
procedure that I can do ePrescribing and sometimes I can identify when each
patient left the office. Did I send any prescription or did I not and the
reason?

What our next step and this is one of the panels that NQF will be calling
for nominees very shortly is looking for folks who can tell us what kinds of
information can I get out of routine EHR utilization to determine that the EHR
was used for this purpose. Rather than have a doctor have to put in a code to
say I just wrote an ePrescription, which sounds like double work, how do I just
get that stream of how many of those happen and how many prescriptions were
written directly out of the EHR? The reason for that is to identify if we think
of meaningful use, what are you actually doing with your EHR? Are you using it?
That was a gap that we are working toward filling.

In order to then now take the quality data types and allow them to be built
into measures – this is a very early picture of a prototype. This is not
what the tool will eventually look like but to try to give an example of what
would happen is a measure developer will be deciding on a new measure. They
will say I have a new measure, in this case hemoglobin A1c management. They
will look through a QDS dictionary where there already are quality data set
elements and they can pick one, see that it is there, the hemoglobin A1c. It is
a lab test. It is identified by LOINC codes. Here are the codes. If it
doesn’t exist they can then add it here. All labs in this tool should be
in LOINC and then the ability to add them in so that they can add elements that
don’t exist or reuse those that do. Once that is selected all of
the quality data elements for this particular measure will be down in this box
to show here is everything related and there will be a place for describing the
logic of what goes in numerator, what gets added, the mathematical operators to
fit those together.

The intent is once that is all defined is to put that into an electronic
format. In the background this measure authoring tool will create an
electronic measure. That is now going through HL7. It has gotten through
ballot. It has not had a final vote to determine if it is accepted or goes to
second ballot but is an electronic measure. It has a representation of all the
elements within the measure mapped to the reference information model, the
model of information within EHRs and all of the measure elements come from the
data types. It then provides logic and with it provides a human readable style
sheet and an XML version from which EHRs can then extract the components they
need to implement the measure. That is now in progress. We will be testing that
in the next couple of months with some meaningful use measures that are being
retooled.

Let me just mention on that and I think I will talk about that more
tomorrow. There are 72 measures that we were asked to have retooled with
existing measure developers. The original stewards of those measures who
understand the logic, the evidence and what went into it to create them more
into a QDS format and electronic format meaning that hemoglobin A1c greater
than nine becomes hemoglobin A1c show me the value and not check a code that
says it is greater than nine but A1c give me the value.

Now there have been folks who asked us in a measure like that can you tell
me hemoglobin A1c six months ago and the one now and what is the delta. There
is a lot of value in that but that would change that measure. In order to
retool we have to keep the measure intent similar to what it was. To create new
measures we are very happy to look at those and send them through the
endorsement process but in order to retool we have to keep the meaning of the
original measure to keep it the same.

What are some new data types, new sources? Some of these were the gaps I
showed you but functional status, care experience, communication. How do we get
information from the patient? How do we get information transmitted to manage
the different new types of information we need? What we will see as we work on
the national priorities and look at the patient’s care experience, the
patient engagement and care, we will find new data types and new elements that
we need to add to this and address so we are expecting that.

I mentioned the HITEP data flow. The reason I included those here was new
data sources will be coming directly from devices. How can we identify if I see
a blood pressure what the source was? The more we can persist the origin of the
data with the data as it goes through the system it might be a health
information exchange having come from an EHR which came from a PHR which came
from a device in the patient’s home. If I know it is from a PHR I
don’t know if the patient entered it. I don’t know if it was the
device. How can we manage and I don’t know if it is clear yet how that
happens. Here is the new source. How do I keep persisting where it originated
so that I know the value of this data to use it for quality research or other
purposes?

Now this is purposely complicated because it is. One of the comments I was
asked to talk about a little was data collection. In the standards committee we
were asked what are our transactions and we tried to look at well we have to
look at what this flow. I have a measure source so the measure developer
provides this measure. Let’s assume it is electronic. It goes to an
electronic record. It might go to a personal health record, PHR. It might go to
a registry. Perhaps that EHR does everything. It assembles all the data in a
registry within it. It creates the report and sends it out, which is why you
are seeing all these different boxes that connect it. It may send the
information and work directly with feedback with the data assembly system, a
third-party warehouse, a health information exchange, a registry like American
College of Cardiology, and that might be collecting the data. There is a
transaction here that needs to be addressed. Sometimes the processing entity
that reviews the data to validate it is correct is separate from the assistant.
This is drawn to show there are various architectures that exist today. Perhaps
one architecture is best but I don’t know that we can get necessarily
easily to one. I think what we want to do is address the transaction from the
direct care delivery to whoever is assembling the data and feedback has to be
frequent so that the care site knows how to improve care and not just get an
end of year report. But this one is done to look at what are the different
levels of data collection. It can be complex but I think what is really
important is this collection of the data directly from its source.

In order to keep them current one of the issues as we talk about the quality
data set is maintaining currency of the very atomic particles for those
measures. That is the codes. If I am looking at RxNorm, for instance, as a
source for the codes for my medications, that changes frequently. Every time it
changes that code set or value set as some IT people would call that, is going
to change. Does that change the measure and at what point does it? We have a
team scheduled to meet in a couple of weeks to help us set criteria for when
does that change or make it a new measure. When does it make a new code list
and how do we version those and how do we manage them?

How do we maintain the QDS is our next job is to make sure we add to it and
modify it as we learn more through creation of measures. Measure maintenance
process, I think you heard some of from Helen Burstin this morning, and that
what we do have is a regular endorsement process. Every measure has to be
reevaluated every three years and every year, if there is a change is
reevaluated. All measures now being tooled electronically will go through a
modified process because they are not changing the measure. So the process we
will evaluate one did I change the measure and if it did change then that means
it is a new measure. Did it change the meaning of the measure that it was
retooled and two, was it retooled appropriately to address the QDS correctly?
That is an alternate process for retooled measures.

I am trying to keep to time but that is a broad overview of a new model for
managing measurement, which I think will work well for not only quality but
also public health use of data coming in and out of the EHR and also research.

On the code list issue I think what is important is there is currently in
public health something called PHIN VADS, Public Health Information Network
Vocabulary Access and Distribution System, and they create value sets or code lists for all
reporting for public health. NCI has something similar. I don’t know the
meaning of the acronym as clearly, but I think it is caDSR and they do something
similar for research.

What we don’t want to do is create another silo of creation of code
lists but our intent is to talk to the National Library of Medicine, CDC, and
NCI and get together with stakeholder groups to figure out how to do this best
nationwide and very shortly. Thank you.

DR. CARR: That is great. Thank you. It is so helpful the way you organize
the data.

MR. QUINN: Mike Fitzmaurice isn’t here but there is a thing called
USHIK, which I don’t know what it stands for but it is a big meta-data
repository.

DR. EISENBERG: US Health Information Knowledgebase. Let me just say I did
not mean to leave AHRQ and USHIK out of the picture. USHIK actually is doing a
lot of work with HITSP to store all HITSP documents and to create a kind of a
data repository for all of this and they have offered to do some of this for
the quality enterprise. What I think is important is that we all sit together
and figure out what is the way to do this. I don’t know if it would be
productive to have all the quality data sets or the quality information sitting
in USHIK and the public health sitting in PHIN VADS and the research sitting at
caDSR. I think it would be more helpful if there were a way that even if that
did happen how they are all coordinated.

MR. QUINN: To put them all in USHIK.

DR. EISENBERG: If that is the answer that is fine.

DR. CARR: Blackford, go ahead. Do we have Cathy McDonald on the phone now?

MS. MCDONALD: Yes, I am here. Thank you.

DR. CARR: It is Justine Carr. We ran a little bit over because of various
schedules. Are you okay to stay on the line?

MS. MCDONALD: Yes, I am.

DR. CARR: Great. Thank you.

DR. MIDDLETON: Good afternoon everybody. Blackford Middleton. I am from the
Brigham and Women’s Hospital, Partners Healthcare, a member of the quality
subcommittee and a member of the full committee. I was invited this morning to
share a few thoughts with you about these quality measures adoptability. I was
jotting on the spot and put together a few slides.

I guess when I think about this and I am very happy actually to have the
opportunity to share with you some thoughts really from the implementer’s
point of view. I really tried to take a cut at this as an EMR developer and
implementer having worn that hat more than a few times. It struck me that maybe
there are four core components to think about when assessing these measures and
their implementability or adoptability in HIT. I just thought I would consider
HIT broadly. It might be that these are implemented in a variety of HIT
systems, EMR and hospital-based records and other hospital systems as well as
ambulatory EMR, et cetera.

Anyway first and foremost the quality of the measure, secondarily
implementability or the adoptability, thirdly practicality of use in clinical
practice and then lastly the maintainability or as I think Justine said this
morning perhaps intentionally adaptability in addition to adoptability. Let me
talk about each of these very briefly.

I think many of these issues have already been discussed. This is a bit of
an eye chart and I apologize, but clearly we have to start with a high-quality
measure, well specified, clear conceptual understanding, interpretability as we
heard before, measurable and discriminating among the candidate population
being assessed, precise enough but not too much perhaps, statistically sound
and valid, independent repeat measures and coherent and composite measure and
the like, free of any bias structurally or random bias error that might occur,
and what is the underlying distribution of the population? Is the measure
appropriate for the underlying distribution? Is the requisite instrumentation
from the HIT well understood? How is this data gathered? Is it gathered in the
process of care or as a byproduct of care or in some other automatic way
through an interface to an ancillary system or what have you? Are these
different instrumentation requirements considered by device or by interface
appropriately? Is the customer of the measure well understood? Is their use and
intent with the measure well described whether it is payor, medical management,
quality assessment, et cetera, and is it maintainable? Does it recognize
changing, evolving code standards in nomenclature work flow processes, roles
and responsibilities in the clinical care environment, and accountabilities,
again, as has been mentioned already and of course has it been validated?

There are fewer words per slide going forward. To address the
implementability part though really getting at the task at hand in HIT, I think
as we have seen in the QDS work and the NQF work here that already as Floyd
described can we assess the numerator and denominator using standard data
elements? Is that even possible? Is it realistic? By the standard data elements
therefore then if they are to be used in the EMR typically and even if they are
in the EMR, are they well populated? There are several hurdles to scale,
mountains to climb. Does the measure rely on a particular HIT feature or
function for its acquisition? If it does is that being used? Does this require
use of a standard template or a form or a documentation widget? Or if a
physician is dictating, is the data element achieved in another way? Are these
functional requirements considered actually at the time of the measure
specification? It could be clear to the HIT implementer how the designer
expected the data to be gathered.

Practicality of use in clinical practice with HIT. Are the standard data
elements well populated as I mentioned? Are they being captured automatically
or as a byproduct of care? Do the methods of data capture measure or are they
biased in any way as the measure is being assessed? And of course two critical
forms of error are bias, both systematic, nonrandom, but systematic and then
random as well and they will of course bias the results. Are they captured as a
byproduct of care or is it outside the routine clinical workflow? If it is
outside the routine clinical workflow, who is going to get it into the system
so we actually have it as a source of measurement? Does the workflow in which
the measure is captured bias the measure in any way? And were these workflow
considerations considered when the measure was being specified?

Second and last on the practicality of use of these measures in HIT. Does
the data source itself bias the measure in any way? One lab versus another may
have systematic, nonrandom, or even random error and they could be different.
When these data are pooled they may therefore bias measures. Are there
different coding schemes with partial concept consonance? I think that is a new
phrase, Paul. I have not heard that before. If the terms defining the elements
being measured that are stored in the record aren’t exactly the same thing even
if they have a standard code and label applied to them, if they are not exactly
the same thing, we will have this concept dissonance rather than consonance.

In the manual data sources does the quality of the data used in the measure
vary by which person and which role is gathering the relevant data? A classic
example here I think is the nurses are trained to gather the BP only measuring
at the 5’s. We are trained as physicians depending upon the level of
intensity of your professor of medicine to either measure on the even numbers
or even on the even and odd numbers. The nurse might take the BP and get 95
over 65 whereas I might take it myself and get 98 over 62 whatever.

Can a measure report be implemented? It is one thing to have the data even
if we have gotten this far in this system. Can the measure report be
implemented in a useful way for each use of the measure or each user of the
measure? Can the same measure scale for multiple uses? Can the measures be used
for different purposes, et cetera, at the point of care for the medical
director, for the payor, for public health reporting? I think this is one of
the visions in the NQF CDS work or QDS work rather that there will be a
standard toolkit, a set of measures which could be used and reused in multiple
different ways and have the same validity and integrity hopefully.

The last point though is about this maintainability or adaptability and
this is something I think those who implement EMRs and try to derive data from
them wrestle with on a daily basis. Even if you have gotten this far and there
is some kind of data in the record and you are trying to get it out of the
record to make reports, does the measure support quality reporting at the point
of care? Is it biased in all the ways we have talked about or is it biased
actually in use in some interesting and usability type ways?

The key problem we face routinely is simple things like provider
assignment: is the report about my patients only or is it about any number of
patients, which I might own or not own in my provider panel specification, et
cetera.

But equally important I think and from a sustainability point of view, can
the measure be updated easily and practically. Is there a way to change
numerator and denominator specifications easily and practically? Can we change
coding standards and evolve and maintain the history of evolution of definition
or coding standards being used and can we evolve the different nomenclatures or
vocabularies that might be used, SNOMED or other? And even within a terminology
can we actually handle the way they do the update process? In SNOMED this is
actually pretty difficult because SNOMED will do things like move an entire
branch of a tree from one place on the hierarchy to another place on the
hierarchy and any time you have any definition of anything using that branch
and if you use any of the hierarchical relationships, you have to go and look
back at all of your data and all of your specifications. For example, if you
define diabetes as a subset of a variety of trees in SNOMED and they move one
tree or one sub-tree from one place to another, that whole subset
classification, which could be a critical element of the measure, has to be
updated and that can be very tricky.

I guess lastly I just want to point to the issues therefore more broadly of
semantic and syntactic integrity. How do we maintain these concept definitions
and how do we evolve and migrate concept definitions as our needs change based
upon measure requirements? I think Floyd raised the issue or rather Larry asked
the question of is there a common information model for measures. Certainly the
HL7 QRDA reporting data architecture is kind of working towards that end and I
don’t know a great deal about this. It’s not yet been relevant to our
work but I think they are attempting to kind of define a meta-model if you will
or a common information model for quality reporting. I think that might
actually make sense in the long run but it may simplify things in ways that we
really need to have them simplified.

Lastly, are there messaging and reporting standards, methods for the
extract, transform, and load procedures that have to be done in the HIT with the
measure specification to get the data out and report on it? What is the
architecture for knowledge management and curation? One of my favorite topics
at Partners Healthcare. We have built a group there to worry about this kind of
problem and as definitions and terminology and specifications change on a
regular basis it’s not just define it once. It is define it and update it
and update it continuously. Thank you.

DR. CARR: Excellent. You didn’t need that time to prepare. It was
excellent. Let me open this up to questions. Maybe we could get the lights back
up as well. Could I start off with one question and maybe it is a naïve
question and maybe I missed something. As we are looking toward our future
state, certainly having the elements in the EHR will take us a long way, but as I
think about registries and taking that data and turning it into specific
measures and/or even to do drill down, it seems that there is a need for an
intermediary data aggregator who can take and pull data from perhaps different
sources or can do manipulations and apply risk adjustment. Is that a
fundamental piece of the big picture?

DR. EISENBERG: Yes, I think a data aggregator definitely is a fundamental
piece of the picture. In enterprises, say hospital enterprises, it may be that
the hospital system acts as the aggregator as opposed to necessarily using an
outside registry. In other cases it would be a health information exchange or
registry. But I agree. There does have to be an aggregator capability.

DR. MIDDLETON: And I would agree too, Justine. From an architectural point
of view it is clear that the sources of data on which we wish to report will
come from a variety of different places even within one institution and even
potentially within one clinic. Depending upon the instrumentation of that
clinic, laboratory data sources, other types of physiologic assessment, BP
assessment tools, whatever, the data will be aggregated once for clinical
purposes in the EMR in some form or another but often times for reporting it
needs to be combined with administrative claims data or other sources of data
again. And further not only is it combined but also it really is cleaned up. I
haven’t seen a record yet and I have seen a few in detail that really
have data in the clinical record in a way that is ready for this quality kind
of reporting.

DR. CARR: It seems that as we think about that for all the things that our
earlier panel talked about in terms of risk adjustment, are there 35 risk
adjustors or 6 and are we all using the same ones because even today I think
that is an issue across various roads. Do we have resources or committees or
panels focused on this?

DR. TANG: Focused on what?

DR. CARR: On the data aggregation function and risk adjustment. We had EHR
and meaningful measures.

DR. TANG: Somebody I’m sure knows more about this than me but there is
PQRI registries, for example, and I believe some vendors will actually extract
the information needed and will act as the PQRI registry and extract the
information from your EHR for submission. That would be almost the best –
I mean that is an intermediary, but the nice part about that is it is coming
from your EHR. Now we need to have an API so that you can have some kind of
aggregator act for multiple EHRs but that is at least a model for how things
should be.

DR. CARR: In the administrative data world UHC is that kind of thing where
there is this standardized risk adjustment model and a cleaning up of the data,
but even hearing today about the variability and risk adjustment that seems
like a whole body of work that will be critical to meaningful measurement.

DR. TANG: But even without risk adjustment though – I mean the whole
clean up that – I guess my question back to the panelists are how
feasible is it to get to the F7 and the other is Blackford you had a long list
of considerations for how to consider which – the make up of the quality
measures. Who should be responsible for all that to consider all that?

DR. MIDDLETON: Well the NQF obviously. Right, Floyd? First of all there is a
question about the method if you will on a national scale. What we are figuring
out on a local scale is somewhat arcane and laborious procedures that you have
to go through to get the data from all the different systems in this
aggregation place. We call it quality data warehouse. It is parallel to and
corollary to the clinical data repository and even another one called the
research data repository. But whatever the architecture I think this is an
excellent line of thinking. What do we actually need to have by a way of an
architecture nationally for quality data reporting and management. Starting
locally with QDS and specification, implementation, and use, but then that is
all the afferent limb. On the efferent limb, what do we do about data
aggregation, extract, transform, and load, et cetera? The problem I think
will be ameliorated to a degree when we have much more standardization of the
cardinal data types in the record. When the problem list has to be SNOMED if it
is going to be SNOMED then theoretically over time the problem list will clean
up itself in use; however, when the problem list is in evolution from ICD-9 to
ICD-10 or maybe partly to SNOMED and all of the rest of it, we are going to
live in this room of cleaning up the data all the time. I think it is going to
be an extraordinarily valuable service, if you will. The second question is who
pays for that if it is coming from small office environment, et cetera. How
does that get managed all the way to CMS and all the way back down?

DR. TANG: It almost seems like we have to in Floyd’s term the workflow.
We almost have to prescribe from the measure developer point of view or
upstream the clinical trials folks deciding who enters the blood pressure
therefore manifest in the EHR has to be a place for that kind of person to
enter the blood pressure so that information can be stored and passed on to
fill that measure definition.

DR. EISENBERG: Well I think when I was talking about workflow is what
information do you want. If I wanted a patient engagement I really might want
not blood pressure from – I’m not trying to create a measure here but
just hypothetically. If I want to make sure the patient is doing their own
blood pressure at home then I want it to come from the patient and I need to
find a source for that. If there is not a source obviously that is a problem. I
don’t know that it is prescribing what the EHR looks like as much as I
know the data I want and how the EHR captures clinician entered or patient
entered data if you will is in some ways up to EHR as long as what we are
looking for is identified as long as we are clear what we need because then
both can kind of build toward what would be the Rosetta Stone in the middle.

DR. GREEN: I want to thank Floyd for what for me was a pretty darn elegant
model and I appreciate it very much. As is often the case someone presents an
elegant model and you realize how sloppy yours is. I still practice medicine and
as I was looking at that model I kept trying to map it to a mental model of why
care is
obtained and rendered. Then your last comment just then about if you want a
blood pressure. The entire asynchrony of healthcare that is our future seems to
me has to be harmonized with this process and it sounds to me that NQF is
further along in its modeling than physicians and nurses and other folks are
with their models. I am ventilating now.

I really have a question for Blackford. Could you just say more about two
things on those slides? One is you asked the question about are repeated
measures independent and can you just talk a little more about what the nature
of that independence is to you and the other one is about your scaling.
I’m not sure I understood your point. I thought you were talking about
scaling them for multiple uses across settings and sectors and that sort of
thing, independence in scaling.

DR. MIDDLETON: These are relatively off the cuff remarks because I literally
did prepare them this morning, but here is my concern. In many ways the
statisticians will worry about kind of whether or not measures, either repeat,
one time, or composite, are of course actually reflecting the underlying concept
being assessed. The issue with repeat measures is if they are not independent,
if they are dependent, there may be a false assurance that you are actually
measuring something – the measurement may be better than the state
actually is because the dependent measures will confound one another.

Similarly, what did I say the first time? The other one. You can be over
confident or under confident based on the independence or the lack of
independence of measures. This is the same kind of thing that happens in
decision making actually. When you ask physicians of course about their a
priori assessments of the probability of disease and relationship to findings.
If you find the coexistence of elevated ALT, elevated AST, and maybe a total
bili elevated, it doesn’t give you much additional information than simply
seeing elevated total bili for hepatitis or some obstructive disease in the
liver. Similarly, I worry that if we have multiple measures of quality that are
highly dependent, we may not actually be measuring more than one central
concept. If you have independent measures and you can assess them as being
independent, you actually get much more information.
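
[Illustration added to the record; not part of the spoken remarks. A minimal sketch of the point about dependent measures, assuming invented scores for five hypothetical clinics. The helper `pearson` and the names `measure_a`, `measure_b`, and `measure_c` are assumptions made for this example only. Two measures that move together are highly correlated, so the second adds little beyond the first, while a measure that varies on its own carries separate information.]

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-clinic scores (percent of patients meeting each measure).
measure_a = [60, 70, 80, 90, 95]
measure_b = [62, 71, 79, 91, 94]  # tracks measure_a closely (dependent)
measure_c = [80, 60, 90, 65, 85]  # varies largely on its own

r_ab = pearson(measure_a, measure_b)  # close to 1: little new information
r_ac = pearson(measure_a, measure_c)  # much weaker: separate information

print(f"a vs b: {r_ab:.2f}")
print(f"a vs c: {r_ac:.2f}")
```

[A correlation near 1 between two measures suggests they reflect one central concept; this is only one simple way to screen for dependence.]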

The second thing about scaling, sort of the simple thought there I think is:
does the measure have the same applicability when measuring 20 or 100 or
100,000 or a million? Does the same measure apply to populations in some kind
of way? I’m not sure I am going to be able to express it well. As well, can
it be reused in multiple contexts, for multiple purposes? There is a scaling
issue about measuring more than just my clinic: how do I measure entire
populations?

DR. CARR: Just one comment on that. Just to get back to one of the comments
from the earlier session again the importance of being able to drill down on
data because having this data source as well as other descriptors of the
population who is being measured becomes critically relevant. In other words,
if you have a bimodal cut population and your composite is average then you
have missed the opportunity to improve the low performers. Again, it goes back
to this intermediary aggregator. How do you configure the data and make it
available in such a way as to take it to the next level?

DR. MIDDLETON: That was the notion of understanding the underlying
population distribution for whatever the measure is trying to assess. If it is
not normal you may miss completely different subpopulations when a bimodal
or other type of distribution occurs, and simply miss it.
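
[Illustration added to the record; not part of the spoken remarks. A minimal sketch of the bimodal point, assuming invented control rates for ten hypothetical practices. The names `rates`, `low`, `high`, and `summarize` are assumptions for this example. The single average describes none of the practices, and splitting the data shows the low performers that the average hides.]

```python
from statistics import mean

# Hypothetical control rates (percent) for ten practices, in two clusters.
rates = [38, 40, 41, 42, 39, 88, 90, 91, 89, 92]

overall = mean(rates)

# Split at the overall mean; a real analysis would use a proper method
# for finding subpopulations. This split is only for illustration.
low = [r for r in rates if r < overall]
high = [r for r in rates if r >= overall]


def summarize(values):
    """Return count, mean, and range for a list of rates."""
    return {
        "n": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


print("overall mean:", overall)
print("low group:", summarize(low))
print("high group:", summarize(high))
```

[Reporting the distribution, or at least the subgroup summaries, keeps the low-performing cluster visible.]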

DR. FITZMAURICE: I guess the second part of my question has been answered
about how do you get the distribution reported. The first part is the data
aggregator: is the aggregator a part of the enterprise that is doing the
reporting, that is, part of the single hospital, the single physician group, or
the single chain of hospitals, so that they have control or maybe they get to
see it before it is reported, as opposed to some comes in from here, some comes
in from there and then it goes up and the person who is responsible for it at
the enterprise offices says, “I don’t know where these numbers came from.” It
seems to me that it has to be aggregated at the enterprise level first; then
when reported up, you have to report at least the denominator so that people
can multiply the fraction by the n and aggregate across, get the total
numerators then total denominators. If you just get fractions you are getting
averages of fractions rather than an aggregate for the whole population.
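
[Illustration added to the record; not part of the spoken remarks. A minimal sketch of the aggregation point, assuming invented numerator and denominator counts from three hypothetical sites. The names `sites`, `average_of_fractions`, and `pooled_rate` are assumptions for this example. Averaging the site fractions gives every site equal weight regardless of size; summing numerators and denominators gives the rate for the whole population.]

```python
# Hypothetical (numerator, denominator) pairs reported by three sites.
sites = [(9, 10), (45, 100), (400, 1000)]


def average_of_fractions(pairs):
    """Unweighted mean of per-site rates; ignores site size."""
    return sum(n / d for n, d in pairs) / len(pairs)


def pooled_rate(pairs):
    """Total numerators over total denominators: the population rate."""
    total_num = sum(n for n, _ in pairs)
    total_den = sum(d for _, d in pairs)
    return total_num / total_den


print(f"average of fractions: {average_of_fractions(sites):.3f}")
print(f"pooled rate:          {pooled_rate(sites):.3f}")
```

[With these invented counts the small, high-scoring site pulls the average of fractions well above the pooled rate, which is why the denominators need to travel with the reported rates.]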

And the second part, which I think Justine brought up and you answered, is how
do we get the distribution? You find out a lot more things from distributions
than just from one point.

DR. MIDDLETON: This is helpful actually to discuss these thoughts, which are
completely off the cuff. In a way, Paul, maybe a heuristic Justine that we can
take away is at a level we want to argue for a parsimony of measures. We want
to have the minimal set of required measures to assess quality because more
than that is just burdensome in all the different ways we have been discussing.
Is there one diabetes measure that will really cut the mustard?

DR. TANG: One of the challenges is that we have multiple siloed measure
developers and diabetes of course is cross cutting among specialties. And even
though there are NQF endorsement criteria that ask them to look at the
contribution of the next measure, we have a challenge when it comes to –
who is making the decision? Is this one better but actually it would be better
yet if the such and such – and then you actually have to retire the other
measure otherwise the public reporting folks are going to ask you to do that
and it just adds to the measure burden. The cycle time for that is every three
years. It is timing, et cetera. As you were talking I was thinking about how to
have some kind of coordinating body over the measure developers and that is
where it actually goes back to QDS. If the QDS and its authoring tools somehow
at least inform the different silos creating measures, perhaps there
would be fewer new marginally contributing measures developed.

DR. CARR: We have time at the end to come back to some of this discussion.

MR. EISENBERG: I was just going to agree with Paul, that I think the QDS
will help that because one is it will keep the elements and the sets of codes
you are looking for somewhat consistent because without that reuse of the
measure with different levels and versions of coding is going to change
reporting and is going to affect that. I also think that there are sometimes
additional data elements that are often used for risk adjustment or for
adjusting performance. Those additional elements can be added to the measure
and reported even though they are not calculated in it. As long as that
availability is there, I think that may keep measures fairly consistent so that
you could do some analysis based on some variation with additional elements.
Does that make sense?

DR. TANG: Maybe NQF can offer that service as proposals come in, using the
QDS and its set of measures to try to do its own sensitivity analysis in a
sense. Is NQF funded to convene the supply chain to start talking about the use
of and population of QDS?

MR. EISENBERG: We are. We are funded to continue and that is advice we are
looking to you for as our HITEP chair, where we move forward for this.

DR. CARR: Did you have a comment? With that then, I would like to
reintroduce Sarah Scholle and then Kathy McDonald is on the phone. Kathy, why
don’t we start with you?

Agenda Item: Meaningful Measures for Care
Coordination

MS. MCDONALD: Would you like me to just go ahead and start with my remarks?

DR. CARR: Yes.

MS. MCDONALD: Do you have my slides there?

DR. CARR: We have handouts. Let me make sure everybody has them in hand. I
think they are in our blue books here. They are in these blue folders.
Meaningful measures for care coordination. Kathy, can you just say a little bit
about your background and then start right in. We have our slides.

MS. MCDONALD: Are they showing too, because there are a few spots there in
color? If they are not in color I can make sure to point out what I am talking
about.

DR. CARR: I think we are set. Yes, we have them on the screen. Just say when
you want us to change and we will take care of that.

MS. MCDONALD: Okay. Thank you. Can everybody hear me okay? Okay great. I
work at Stanford University. I am a Senior Scholar here and Executive Director
of two research centers that operate under the umbrella of Stanford Health
Policy. I have been involved in quality measurement development for about 12
years, and will sort of do some further introductions in my remarks here.

I appreciate the invitation to speak on this topic of meaningful measures of
care coordination. On the second slide I can introduce some of the work that I
will be drawing from as I make my remarks. If that slide were up that would be
great.

I have been the Associate Director and an Investigator at Stanford and the
UCSF Evidence-based Practice Center for the last decade and during this time we
have produced numerous evidence reports on healthcare quality and patient
safety. The three that are the most relevant building blocks for today’s
discussion are listed on this slide.

The most recent evidence report focused on care coordination as part of a
series called Closing the Quality Gap that reviewed quality improvement
strategies from the 20 topics identified by the Institute of Medicine’s National
Priorities Report as most promising for improvability. Care coordination was
listed as one of the cross cutting areas and our report included background on
the topic of care coordination especially stakeholders’ suggestions about
areas that required more research including measure development, a working
definition of care coordination, conceptual frameworks, and systematic reviews
pertaining to care coordination quality improvements.

This work will serve as a foundation for a new project recently started and
supported by the AHRQ Quality Indicator’s Program on care coordination
measure development for ambulatory care.

The other two reports listed in my background as building blocks for this
conversation constitute the initial evidence reviews for the first three AHRQ
quality indicator modules: the inpatient quality indicators and the prevention
quality indicators, which were both covered in the 2001 report, and the
patient safety indicators, the evidence on which was presented in the 2002
report.

Next I would like to give you a little history of the AHRQ quality indicator
program since this is the home for the new care coordination measures project
that I am leading. AHRQ’s QI program and the evidence-based practice
center program each represent wonderful examples of producing research that is
tied to the actual needs of the field. The EPC program routinely and publicly
asks for nominations of topics where the expectation is the stakeholder will
use the product of an EPC’s work, that is, the evidence base synthesized
well and comparatively where applicable.

The AHRQ QI work started within the EPC world after AHRQ and HCUP
partners requested an evidence project to refine the original HCUP quality
indicators. The AHRQ quality indicator development has always been grounded in
evidence-based medicine methods applied to measurements.

In addition the measurement work has always been tightly coupled to
users’ needs since AHRQ already had its HCUP partners who wanted tools to
work with their data. The motivation for the original work was to satisfy the
needs of those who were collecting the data and who were working in their
states to supply hospitals, legislators, policy makers, and of course the
public at large, patients and their families with something meaningful based on
the routinely collected administrative data sets available at the time and
still in use.

As the program evolved AHRQ initiated the support contract to assure ongoing
refinements of the indicators, and this represents the guiding philosophy of
the program that of continuous quality improvement based on user experience and
changes in the medical evidence. In addition, the program includes expansions
within domains and data sets initially covered as well as expansions to new
domains without ties to any particular data set, reflecting new priorities in
the healthcare field, such as the new project on care coordination measures.

All along AHRQ and the QI team have continued to innovate to expand
measurement methods always evaluating measures from initial assessments to
implementation of the measures and then feedback and support throughout the
life of the measures in the field.

I would like to suggest two analogies based on these observations of the
history of the AHRQ quality indicator program. First there is an analogy between
continuous quality improvement for industries that deliver goods or services
and a similar need for measurement to involve continuous improvement as well.

Second, and more apropos for the current session, there is an apt analogy
between coordinating care to deliver seamless and effective delivery of
healthcare to patients, and coordinating measurement efforts to deliver a
non-burdensome and effective picture of the system’s ability to deliver
high quality coordinated care. In those cases it is crucial to minimize
important gaps, gaps in care or gaps in measurement. We should be most
interested in gaps in care that have the potential to contribute to bad outcomes
for patients, and similarly in the analogous case we need to pay special
attention to gaps in measurement that could result in missed opportunities to
improve the quality of care. It is often more difficult to pay attention to
what is missing as opposed to evaluating what is present.

I’m going to shift gears a bit and give my perspective on the four
questions posed as the goal for this hearing with the focus on care
coordination measures. The first question that was part of the agenda was how
do we approach building meaningful measures. For care coordination I think it
is important to think carefully about this domain and how measures might best
promote the best care. For care coordination I think there are four main
concerns. First in care coordination we have to have a working definition of
what that is. In our previous work in the evidence-based practice center we
found over 40 unique definitions of care coordination, which we used to develop
the working definition. I would be glad to share that if asked.

Second, in considering care coordination measure development we need to draw
from one or more conceptual frameworks for being able to deconstruct activities
that lead to well coordinated care so that we can determine the desired
outcomes that might be related to that and establish a causal chain about where
in that process measurement would be most valuable. I know that in an earlier
session it looked like there was discussion of structure, process, and outcomes,
the classic Donabedian model. That would be one framework. With care
coordination we might also want to think more creatively given the conceptual
challenges of this particular domain, and I will share some of our thinking on
this in a moment.

Third, in terms of developing meaningful measures we require some research
evidence that shows that any measurement proposed actually maps to components
of one of the frameworks, which allows us to say that there is some face
validity to the measure.

Finally, in building meaningful care coordination measures we need to make
sure that a measure set adequately covers the areas most likely to drive
quality improvement efforts, have transparency, and ultimately gain the
outcomes that we believe are sensitive to coordination or failures in
coordination.

It was suggested that we needed to have a conceptual framework or frameworks
of care coordination as the basis for measure development and evaluation
in this domain as in any domain. I offer two examples. Go to the next slide
with conceptualization option number one. These aren’t meant to be
definitive or exhaustive. They are simply to exemplify the importance of using
some logic regarding the connection between what might be measured and how that
measure could monitor the situation.

In this slide we see a diagram adapted from management sciences and
organizational design. The key point of this framework is that care coordination
is the product of a good fit between the information processing requirements of
a particular care situation and the information processing capacity of the
system or in many cases the non-system delivering care. We can see that for an
area prone to failures of coordination, such as a handoff or transition of care,
as we think about that we can keep in mind this organizational design framework
and are reminded that we might want to monitor the fit. That is what is in the
middle circle between information processing capacity and information
processing requirements. On the left side of the diagram, processing
requirements depend on the situation, and empirical work has shown that varying
levels of interdependence, uncertainty, and complexity can be addressed best by
varying the mechanisms on the right side that are used to provide information
processing capacity.

Eric Coleman from Colorado has led the development of the Care Transitions
Measure or the CTM. There is a 15-item version and 3-item version and also
interventions to improve transition from hospitals to home. One of his studies
about an intervention to improve these transitions offers a good example of
taking stock of the care transition situation by getting input from patients
and caregivers about the areas of interdependence, uncertainty, and complexity
similar to that shown on the left side of the diagram. He refers to them as the
four pillars: medication management, personal health information, follow up
visits, and specification of red flags. This in turn allows for establishing
and testing the mechanisms to address the coordinating needs of the situation.
In the case of some of his work a transition coach is a key feature of one of
the interventions that Dr. Coleman has evaluated and could be thought of as an
example of structural linking between settings of care.

Another key feature is a personal health record, which is part of the
operational processes box on the right hand side as a supporting tool to
provide coordination and it is used along with specific tasks for patients and
coach aimed at specific goals.

For measures we could target any of these steps in assessing the fit between
the needs of the situation and the mechanisms to address those needs, or with
measures we could focus on understanding patients’ preparation for the
discharge transition as is the case for the Care Transitions Measure,
which is based on direct questions to the patient.

Another approach to conceptualization is shown in the next few slides that
again try to reduce care coordination to the components that might be
measurable. Is the next slide up? Conceptualization option number two.
We built upon NQF’s work and Mathematica’s evaluation of care
coordination demonstration projects, and the slide lists essential care tasks in
bold and associated coordination activities in italics. Measure development
could be organized around ways to observe or quantify the activities and their
efficacy.

Then on the next slide we note common features of interventions to support
coordination activities in bold and list examples in italics. To the extent that
there is evidence that the presence of the supporting feature improves care
coordination, it would make sense to consider measuring the presence of the
feature. For example, in a 2008 article in Health Affairs by Diane Rittenhouse
and colleagues they used data from the Second National Study of Physician
Organizations and the Management of Chronic Illnesses or the NSPO2 to quantify
the extent of adoption of infrastructure components of the patient-centered
medical home. One of these measures is the care coordination integration
component consisting of several questions and resulting in an index from zero
to five, which asks questions about use of electronic medical records, exchange
of information across settings and presence of registries and nurse managers
for specific chronic diseases. All of these concepts and examples are shown in
our second framework.

These frameworks can be used by measure developers to consider the
relationship of potential measures to the ways the measures might be meaningful
to improving care coordination.

Now turning to the second question posed for this hearing, what is the
current process for developing measures? Does it adequately address measure
development for key national priorities and subpopulations? I would like to
give a quick thumbnail sketch of the AHRQ QI process and its application to
care coordination measure development. The remaining three slides provide
an overview of the process and evaluation criteria.

The first slide labeled indicator set development shows a standard AHRQ QI
development process, which starts with producing a list of candidate indicators
based on a variety of sources and leads to an evaluation of each individual
indicator, the development of a set of indicators after a selection process,
and then evaluation of the indicators in practice, which leads in turn to
further evidence and ultimately feedback to improve measures.

For care coordination, given the dynamic nature of this area with lots of
work underway, we have added the box in red called “in development” to highlight
the importance of looking for measures in development.

The next slide shows the steps, sort of a stair step, to evaluating each
promising indicator in our candidate list. We start at the bottom and work our
way up the steps. If an indicator doesn’t capture an aspect of quality
that is important and subject to control by the healthcare community, it
isn’t going to be meaningful. For care coordination we have added the red
text emphasizing the patient in the first step. This is an area where failures
in coordination are often only experienced by the patient. This is particularly
true in transitions across settings for example. Of course there are also times
where the patient does not know immediately or perhaps ever that a failure of
coordination has occurred, as in a missed diagnosis based on a test result that
the provider missed getting or seeing or reacting to.

The rest of the stair steps are no doubt well understood by this group and
already covered to some extent in remarks from earlier speakers. I can go
through these in more detail if desired later. The main point is that we apply
these evaluation criteria to all AHRQ QI development efforts but can emphasize
new aspects if the new domain calls for such enhancements.

Similarly, in the next slide labeled evaluation stages for indicators we
undertake three main activities of background research, external input and
supplementary research. Again, items in red suggest enhancements that may be
necessary for the care coordination domain. Two of these, coding consultation
and empirical analyses, lead me to the next question posed for this hearing.
That question is how do we introduce new data sources, clinical data from EHRs,
user generated data, et cetera, into the measure development process.

It is important to start without anchoring to any one data set, keeping
the options for candidate indicators wide open and looking for data sources
that support those indicators or concepts. Expertise from those who have worked
with identified data sources has been critical, as have empirical analyses to
test alternative definitions, assess rates, variation, and relationships among
potential indicators and also to test risk adjustment methods.

A given data source sometimes starts a measure development process, as was the
case with the HCUP data and the initial AHRQ QIs, but in the case of care
coordination I believe it is vital to draw from a variety of data sources to
reflect patient, clinical, and organizational knowledge and experience.

The last question for the hearing is how do we maintain and update measures
and what are the health IT system implications. Here I would suggest that we
draw from the experience of the AHRQ QI program, which follows a continuum from
measure development to translation of measures into practice and then support
for users and feedback to the developers about opportunities for improvement.

In conclusion I anticipate that we might expect care coordination
measurement to follow a similar path to what our team experienced with the
development of the patient safety indicators, one of the AHRQ QI modules. Eight
years ago healthcare’s heightened attention on patient safety resulted in
numerous research efforts to understand practices for improving patient safety
and ways to measure progress. Our group worked on a patient safety practice
evidence report called Making Health Care Safer and at the same time developed
patient safety indicators, which started from a modest evidence base that built
over time with validation efforts by our team and others, ultimately achieving
NQF endorsement.

Similarly, the AHRQ QI care coordination measures project will draw on our
experience understanding the state of evidence in care coordination based on
our research for the EPC report on care coordination. Given the increasing
level of activity in this area we plan to form a stakeholder and
informants’ work group to assist in identifying all relevant current
activities, in addition to our usual practice of inventorying and assessing
measures from prior published measure development studies.

In addition we recognize the importance of the research community in this
developing area. Specific interventions to improve coordination are still under
development and testing either as part of research studies or demonstration
projects. As a result evaluators of care coordination improvement strategies
need guidance about what to measure in their evaluations and how to prioritize
potential measures. To address this need the AHRQ QI team’s care
coordination measures project will form a second work group of evaluators,
evaluatees, and research experts to assist and develop an evaluation tool to
guide choice of measures for research purposes. This part of the project
responds directly to one of the conclusions of our care coordination evidence
report, and that is that studies of care coordination quality improvement
strategies have immense heterogeneity in measures, making comparative
assessments of what works best to improve care coordination challenging, if not
impossible.

Any QI development process must adapt to the needs of a particular domain
under study. We have a toolbox of AHRQ QI development methods from which we
choose approaches best suited to a new area. As needed we also develop new
research methods or tools to tackle unique aspects of a new domain such as care
coordination. In contrast to the dynamic and adaptable nature of our growing
toolbox, the AHRQ QI development approach has kept the same standards and
criteria for achieving meaningful measurements and I think that this experience
is highly useful for the goals of this hearing.

Thank you for the opportunity. I am honored to convey my experience and
thoughts on this important topic and I would be glad to respond to any
questions at any point.

DR. CARR: Thank you very much. I think what we will do is go straight onto
Sarah Scholle and then come back and take questions for both speakers.

MS. SCHOLLE: I would like to talk with you about some of the work that we
have underway with colleagues at Johns Hopkins and Park Nicollet to look at
care coordination and I just wanted to start off with key points that I will
try to make in this discussion. First of all to suggest that care coordination
can be measured by thinking about the structures, the processes, and outcomes
and that processes are most actionable but that is where we have the fewest
measures. What is challenging about the measures is that when you are measuring
this process, you really need to have it embedded in care. Your measurement
needs to be embedded in the care delivery and it needs to help with decision
support as well as be something that leads you to measures that can help you
monitor care and improve care. In particular, when we think about measuring care
coordination in an electronic environment we need HIT systems that can track
essential data elements and can support the care coordination process, but we
also need the workflows that use the systems. I will try to expand on that.

What is care coordination? It is the information sharing that happens across
providers, patients, different services, sites, and across time. What are we
trying to do when we coordinate care? We are trying to make sure that
patients’ needs and preferences are respected and that care is both
efficient and leads to good quality outcomes. We know that care coordination is
most important for people that have the more complex situations because they
are seeing multiple doctors, because they have more complex needs but it is
likely to change over time.

Care coordination. It can be structure, process or outcome measures. I think
structural measures are an important starting place. They lay out what
capabilities need to be in place, information systems, staffing. They
articulate expectations. Process measures try to get at what is happening. How
is the information being exchanged? Is it being exchanged? Outcomes of care
coordination are probably the things that are most relevant to families and
policy makers. These are measures like readmission rates and failures of care
coordination are probably easier or more relevant to families and policy
makers, but the problem is that you need risk adjustment. Some of these things
like risk or readmissions are rare and they may be difficult to attribute to
particular actions and players.

We have an ongoing grant right now where we are trying to think about this
whole measurement framework in the context of vulnerable children, children who
are at risk for developmental delay. This diagram tries to lay that out. It is a
starting point for thinking about structure, process, and outcomes. What
do we mean by structure for care coordination? In a primary care
practice it could be a process for tracking referrals and it might be different
at different levels of the healthcare system. At the state level it might be
service capacity. Within a community it might be whether you have someone to act
as a navigator to help families understand what the service opportunities are.

The process measures. Here are some examples of process measures and I will
talk about it some more but it is really: does the information get sent from
the primary care provider to the specialist? Does the information get back? Does
somebody act on it? Is there a care plan that identifies who is responsible for
what?

Then outcome measures could be clinical outcomes. Poor control of diabetes
could be a failure of adequate care coordination. Patient and family
perceptions are an important factor but I’m not sure. The challenge right
now is that we don’t really have very good measures of care coordination.
I think we are starting to have better measures on the structural side. I think
we have some sense of what are good or bad outcomes like readmission rates but
they are hard to measure. We have measures of care coordination from patient
surveys, but that is probably one of the weakest points of a CAHPS survey today
and part of it is that I’m not sure that people really know what it is
like to be in a system of care that is really well coordinated. It is hard to
measure that.

In our research work we are really going to try to hone in on this from the
perspective of looking at children with chronic conditions.

I said that we have been doing a lot of work on the structural side, and I
think that Physician Practice Connections and Patient-Centered Medical Home,
these standards for what practices should have in place to organize care well
through managing their patient population, try to get at some of these structural
aspects of care coordination by looking to see whether practices have tools and
processes for tracking tests and referrals.

To some extent the measures that are in PPC reflect – are limited by
what we think a practice should be responsible for, is accountable for. It
looks to see did the practice have a way to track whether information went to
the specialists, but in some ways that primary care practice can’t really
be responsible for getting the information back from the specialist if
the specialist doesn’t want to do it. That is one of the challenges that
we have with measures of care coordination: where is the accountability
and how do you measure in that setting?

I have been working with Jonathan Weiner at Johns Hopkins University and
several other of his colleagues there as well as Jinnet Fowles from Park
Nicollet, to think about measuring care coordination specifically in the
ambulatory setting and to look at the process of referrals and consultations,
communication between a primary care practice and medical specialists. We are
just finishing up this work that was an initial phase of trying to identify
potential measures, develop preliminary technical specifications and then to
work with some practices and clinicians to try to understand whether these are
usable and acceptable measures.

This is the model for ambulatory care coordination that we developed in our
project through a number of iterations with clinicians and key stakeholders and
on the left side you see the logic model, the process flow of care coordination
from the physician discusses options for referral with the patient, identifies
relevant information that needs to go to the specialist, sends the information
with the request. The specialist receives the information, sends the results
back, and shares it with the family. The primary care physician sees the
information and acts on it, whether that is updating the care plan or talking
with the family.

That was the process flow that we developed and then we talked with our
expert panel and with clinicians and others in practices about what are the
meaningful points along this process where it would make sense to measure, so
that the measure could be useful for informing the care process. It could be
useful for monitoring care and measuring.

What you will see is that I have these in different colors. The measures
that have the dotted lines are measure concepts that we thought about but
dropped and the reason we dropped those, actually several of them have to do
with communication with patients about shared decision making. Our panel said
focus on a narrow set, a small set of measures that we can really get
implemented and they said that is really important but it is not happening so
don’t start there. Updated care plan was the same way. They said really
important but there’s not one yet so don’t try to measure it yet.
Start with the pieces that are most reasonable today and that is where we get
the piece of critical – it is really a referral feedback. It is that the
PCP sends critical information to the specialist. The specialist gets the
information. The specialist sees the patient, takes care of it, sends the
information back, and the primary care physician gets it back, referral loop.

Now what we have also added in here is that some of that information is
communicated to patients, both from the primary care physician, this is why I
want you to go see the specialist, and also that the specialist
communicates the results to the family. The green ones are the ones that we
thought were primarily on the accountability of the primary care provider and
the blue are on the accountability of the specialist. These are the ones we are
moving forward with.

I also want to mention the one about the visit scheduled within a requested
timeframe. One of the things that we were interested in from a community
accountability point of view was did this visit happen when a referral was
suggested. We had to drop that for a couple of reasons. One because it
wasn’t clear who would be accountable for it but the other because of this
idea about a timeframe or did the referral happen within a particular
timeframe. This is not something that people document and that is what we
found. I want to talk about some of the measurement issues that we have
identified. This applies both in a paper-based world and in an electronic
world.

First of all, to really do any measurement here of completed referrals, if that
is your goal, you have to kind of know which ones you really expect to happen.
When we talked with clinicians, they said basically there are some referrals
that are stat, some referrals where you really want them to happen maybe in the
next couple of weeks, and then other referrals are just kind of when you think
about it, or they might be I want you to get a colonoscopy in the next six
months, but they are not tied to a timeframe. That makes measurement very hard,
and that information, if it was in there, wasn’t defined and wasn’t in a way
that would lead to measurement.

Effective communication with patients and families. We saw where practices
were using clinical summaries, visit summaries that they print out from the
EHR. When we talked with them about that being a measure, like did the family
get the results, they said I could tell you that the report was printed. I
don’t know whether it actually got to the family or anybody talked to them
about it. That is a challenge for measurement.

Accountability is an issue that I talked about a little bit that there was
concern about are you going to hold me responsible for getting the completed
report if that specialist never sends reports. What should those be?

The other interesting thing is that we did our work in both integrated
settings and in non-integrated communities, where we were just talking with
private practices that were interested and were using an EHR, and we saw
really different concerns about care coordination that affect measurement and
affect – one issue is are you worried about patient dumping or patient
stealing. In the integrated systems they were concerned that the primary care
physicians were sending them too many patients that really didn’t need to
be seen in a specialist setting. In the non-integrated settings they were more
concerned about, well, if I refer this patient to see a cardiologist just to
get a question answered, I will never see that patient again. The distinction
between referrals and consultation – Medicare pays differently for a visit if
it is a consultation than if it is a referral. They pay more for a consultation
than for a referral. The integrated settings are really concerned about making
sure that they identify what is a consult versus a referral. A consult means
that you are just asking a question and you intend as a primary care provider
to provide ongoing care for that patient, whereas with a referral you are
transferring care to the specialist. The integrated systems were really
concerned about this and in the non-integrated settings we really didn’t
see a lot of sensitivity to this. This could really change in an accountable
care organization model where those payments would differ.

The other piece is that actually in talking with integrated systems this
idea of the feedback loop and the information – we share a medical record.
They know that they can see in the medical record whether the patient saw the
specialist. They can look at the report. What does it mean? How do you know
whether somebody actually looked at the relevant pieces in the shared medical
record to be able to say that the information was exchanged? This is so hard to
measure.

We actually have done site visits in three integrated settings and two
non-integrated settings to try to think about how to measure this and we have a
grant that we just submitted to AHRQ that we hope will be able to conduct some
research to try to understand this. What we learned is that even in all of
these settings where we had leadership wanting to improve care coordination and
HIT services to support it, we found that either there wasn’t electronic
functionality to report measures or the workflows didn’t really support it
or both. Even in integrated settings where they had this referral requesting,
all the data was free text. We didn’t have that timeframe. We didn’t
have the referral consult. It was very hard to construct a measure of completed
referrals and the same was true in the non-integrated settings where sometimes
the staff workflows didn’t support it.

As we go forward in thinking about how to use electronic health records to
measure this concept, we are really concerned about a couple of things with
EHR-based measurement. One is whether we might have underreporting of the
numerators. If you want to look at a completed referral in the EHR, often that
information is going to come back from a scanned document. It is the specialist
report coming back. It is scanned in, which means somebody has to take the
scanned document and report, yes, the doctor reviewed it, and that is different
from the referral coming back. You might have apparent quality failures where
really
and truly the loop was closed but the EHR can’t identify it.

The other concern is the eligible population. Even when you have
this referral tool available, will people use that and can you track all the
referrals that happen? Those are some of the challenges that we have seen as we
have really tried to get into the details of measuring care coordination in
the ambulatory care setting.

We do think it is valuable to measure this process of care coordination and
to try to figure out how we can get some of these measures to come out of EHRs
but we think that having structural measures to support what the capability
should look like and what the workflow should look like and the staffing is
going to be
important. Monitoring the outcomes will be essential as well.

We are hoping that these kinds of measures could really help to trigger
quality improvement, improve decision support, and help us to monitor the
quality of care.

DR. CARR: What about this? Could you just say did you see the last note from
the last provider and did you do what you were supposed to do? Just have it
sort of yes, no. I am being facetious but I mean we heard about that this
morning where we have so much complexity we become paralyzed and we lose sight
of the immediacy of what we are trying to say.

MS. SCHOLLE: I think part of what we are going to be trying to look at is
where would you put that question.

DR. CARR: Every time you log on to a new patient the first message that
comes up is did you see the note from their last provider. Who was their last
provider? Did you read the note? I am being simplistic.

DR. GREEN: Can I ask you to go back to one of your prior slides? It had
figure one on it, model for ambulatory care coordination. I feel a little
premature here in what I am about to do. What is setting up for me this
afternoon is I want to think about this overnight and come back to it tomorrow.
But what is setting up for me is I am learning something important I think,
which is how essential it is that the development of meaningful measures move
in lock step with explicitly articulated statements about the care process
about which the measure is to be pertinent. Justine, your question about that
and other comments that we make is teaching me how each of us at the table
carries around with us an understanding of the care process in which we
participate as a provider or a patient some way or another and that always is
coloring the way we think about measures and their development.

If you will indulge me I want to make two comments. Most of the
afternoon’s presentations have appealed to me as a health services
researcher type but Sarah’s and also Kathryn’s they appeal to me as a
physician trying to take care of folks. Let me use this figure to exemplify
this. As a physician who also served on the board of directors of a
multi-specialty group practice, this model for ambulatory care coordination for
a person with what we would take as part of our concept is that it was a
dermatological condition. It had to do with skin. It would be completely
different depending on one piece of knowledge. Is this person in a payor
system in which dermatology providers are capitated or are they in a fee for
service model? Without knowing that piece of information this is nonsense. I
assure you. This is nonsense. You can work on a lot of measures that will have
no meaning if dermatology is capitated and you will be missing the measures
that will be crucial if they are in fee for service.

Jump from that to just what I saw happen a few days ago. Imagine that you
are seeing an 88-year-old mostly deeply demented Alzheimer’s patient who
has an unhealing lesion on the top of his left ear. He has already disrupted
the entire practice because he is confused about where he is. He is getting a
little noisy. He is getting hard to handle. Now we need to decide if we are
going to cut this guy’s – how much of this guy’s ear are we
going to cut off? That is the crucial question. Or are we just going to let this
thing keep eating away at his ear until he dies of the complications of his
other morbidities? The way this problem actually gets done is how many of you
have a camera in your pocket right now? Debbie doesn’t seem to have a
camera. The way the ambulatory care coordination occurred was photograph, email
to the dermatologist, phone call saying look at it now, 90 seconds later
dermatologist says if that guy is going to live longer than 6 months he should
have more surgery on this and we can do it Monday and he goes home. Now look at
the things on here that don’t exist to measure what happened. I would
argue that that was optimized care.

I want to go back from my two examples to this: developing measures that are
not contextualized into the care process almost certainly compromises
meaningfulness. What a humbling presentation you have made.

MS. SCHOLLE: Does it work for many situations of those outliers that you
just described?

DR. GREEN: They work for many situations. Like I said I felt like I might be
premature and I think I am a little premature. I would like to think about this
probably the duration of my tour of duty on the NCVHS. This strikes me as a
pretty significant challenge in getting to the meaningful use of measures.

MS. MCDONALD: Can I just comment on that comment? That is why I actually
shared the somewhat complicated conceptual diagram of organizational design
frameworks because I think that does capture the statements that were just made
that you really do have to think about the exact setting and the exact patients
and what the interdependencies are. You mentioned the form of payments, the
types of contextual factors that would then relate to what is going on that
would then cause you to think about different coordinating mechanisms like
taking a picture and sending that to a dermatologist and there under those
circumstances you had a good fit, the best fit possible perhaps and what is
hard to do is to measure that, but we can intuitively tell if that was a better
option under that kind of context than sending the Alzheimer’s patient over
to the dermatologist. That is why I showed that diagram because I think
thinking in a linear Donabedian framework may not be as appropriate for care
coordination as it is for some other areas.

DR. CARR: Once we have the measures who is accountable?

MS. SCHOLLE: This was clearly a concern. We talked primarily with primary
care physicians and they were interested in having the denominator changed so
that for the question about whether the critical information was
sent to the specialist, that was fine for everybody who they thought
needed a referral. But on the last one, about whether the primary care physician
reviews the specialist report, they wanted that denominator to be people who
actually saw the specialist but then what happened to the people who needed to
see a specialist but didn’t get one and that is really the concern about
accountability. At a community level it should be all referrals.

The other question on the specialist’s side was what about these self
referrals. When patients go directly to the specialists, is the specialist
responsible for trying to get critical information from the primary care
physician when they didn’t get a referral? It wasn’t really a
referral. It was a self referral. The patient just went to the specialist.

MS. MCDONALD: We talked about this in terms of starting up our projects on
inventory and care coordination measures and audiences and accountability and
such. We thought that maybe the first step is to look at those patients that
are in some sort of system of care in thinking about measurement within that
system since a lot of care coordination failures occur in the white space where
no one is accountable. That would be the next level of measurements that would
be needed but then there is this problem of but no one is accountable. If we
measure it, what will really happen? Our thinking is that maybe what is most
useful is to start at least with patients for whom they are saying the patient
is under medical home, practice setting, or some sort of integrated system and
that does not provide a full picture of everyone and all care but it would be a
reasonable starting point.

DR. TANG: I am hung up on Blackford’s challenge about looking for
breakthroughs. It looks like the descriptions of decomposing these processes
into steps and then try to instrument the steps. One question I have is has
the cost of complying with reporting been studied already? Yes. You are nodding
your head yes. You are nodding in agreement. My concern in today’s world
and that is the breakthrough I am talking about. You used to have to chase down
people with paper and pencil and then come in it. That seems so 20th
century and it seems like today care coordination is an outcome. I know it is
made up of processes but it is sort of like what Larry was saying. We all have
our own. They work locally. They work or they don’t work. But we want care
coordination. We don’t want a letter with the date and time stamp and a
– all that does is add to the cost instead of trying to subtract cost.

It also does not leverage. In this office you might have the CNS, the
clinical nurse specialist who just drives your practice and thank goodness for
that person and that is how it is going to get done. But if I measured all
these other things not only would I pay for all these other things, I actually
am not helping pay for the person who is driving my care coordination.

Isn’t the time now to focus on the output? For example, readmission.
It’s not such a rare thing. It is 20 percent. Plenty of opportunity to
measure and to detect changes in whatever it takes for you to get care
coordination done. The reason I say the other style is 20th century
is I think we have new tools and whether it is – there is new tools to try
to figure out what our processes are and how to streamline or redesign those
processes but they are local. We are really trying to measure the final goal.
Is now the time to focus a lot on the final goals and worry less and burden less
the precious resource we have? Primary care providers just aren’t a dime a
dozen and if we spend a nickel chasing after processes that may not be relevant
in a practice, I worry about that.

MS. MCDONALD: I think that is a really important point. There is a measure
out of a group in Australia. I think it is Clara McGinnis and Beverly
Sibthorpe, that simply asks the question of a patient: In the past three months
how often did you feel the care you received was well coordinated? Of course the
issue with having that be the outcome is that sometimes patients’ perceptions are
not going to pick up everything. But certainly prioritizing something related
to the perception of coordination on the patient’s part, the family’s
part and the health provider’s part would be important in this whole
picture.

DR. TANG: And I know Sarah wants to say something too but that is what I
thought about when Justine asked her question: in Amazon it pops up and asks
how I like that vendor or how I like the product. Well, when you open up
that consult result request we can ask, did it meet your needs? Just like,
Kathy, you were saying the patient did it meet your needs and to the specialist
did it meet your needs. That actually might be a whole lot closer and more
meaningful in feedback to each other than all these processes.

MS. SCHOLLE: I guess I have been making a point that these are process
measures but in reality they really depend on structure.
Essentially what you want is that you want this structure to be in place so
that somebody’s asking – when they are sending a patient over for a
referral for a specialist’s visit that you have given the patient enough
information, you have given the specialist enough information to know what
service is requested so that the patient understands why they were supposed to go
there and that somehow the information gets back. I do think your point about
are we putting too much – are we investing too much into getting a measure
out is a big challenge to us. On the other hand, we are going to lose these. In
an integrated system, this is where the difference between integrated and
non-integrated is huge. In the integrated system, okay. Well, the medical
record is there. You are going to have to assume that somehow the primary care
physician is going to look at the data that came back and it is going to be
there and have a chance to do it. But what happens in a non-integrated setting
is that that information doesn’t come back or if it comes back it may not
come back in a timely fashion and you have already seen the patient and you
have already done the same set of tests over again or when the specialist gets
the patient they end up doing those things again. The other kinds of outcomes
you might want to look at would be duplicate testing or numbers of visits.
There may be other ways to try to get at that area of efficiency and they are
going to be different in different settings.

I’m not convinced that we should say no to tracking this. When I talked
with people in non-integrated settings, they think this is hugely important to
try to understand what is happening because they think that is a way to understand
their community and their system. If you talk with John Blair, this is why he
set up all the work that he has done in the Mid-Hudson Valley of New York.

MS. MCDONALD: The other reason that the intermediate outcomes or process
types of measures can remain important as long as there is not too large a
burden or pressure on especially these primary care doctors as was pointed out
is that that is often where there is actionability. If you can measure it and
there has already been linkage to something like lowered readmission rates
when some process has been in place that coordinates care better. If you can
measure that and pick that up as an issue then that is exactly where the
interventions and actions can occur and improve quality. That is what makes
that part of the measurement potentially meaningful and useful.

DR. CARR: I just want to jump in here before Floyd, and really amplify what
Paul was saying. Just as we were talking I was looking it up: the Commonwealth
Fund has put together Why Not the Best, and you can look at all the core
measures. I just looked to see who were the top performers on did you
understand your discharge instructions and 95 percent of them are specialty
hospitals, surgical hospitals, orthopedic hospitals. In that split second, I
learned a lot of things that just one thing, one question, asking the patient
and immediately seeing that the procedural areas have got it down. I can go
back and learn from that.

I am just concerned that the complexity of this is so overwhelming that we
might never get there. We might not get where we are going as we are trying to
figure out how to measure. I think a lot of the benefit of measures is to ask
the question. Again, this was a story that was in Europe or something that some
consultant from IBM told this story, that there is a mountain in Switzerland
and there is a tunnel through it, and they wanted to say if it is daylight,
lights are off, turn them on. If it is nighttime and your lights are on, turn
them off. They struggled with this for weeks and weeks and finally they said,
are your lights on and that was the sign that they put above. I feel like that
is the right question.

We talked about this at the AHIMA meeting. We did this very complex glucose
augment and finally STS and CMS said was your glucose less than 200 on the
morning after surgery. I think sometimes asking one question can catalyze a
universe of systems that will answer in a way that is right for them and for us
to try to take the elements of the answer before anyone has completely answered
it could put us in a cycle that would not see daylight.

MR. EISENBERG: I was just going to add to that because I see a lot of
complexity and I don’t want to go back to my QDS discussion, but a couple
of the elements we identified in there are care goals and experience. The
experience is not just the patient’s experience but also the
provider’s experience. One of our other panels actually through HIT we
have a couple of expert panels coming up. One is in care coordination to
identify what elements are required and can we help push standardization of an
EHR so that we can manage measurement and part of that will be care goals
because it is based on the plan of care and the care experience. I hate pop ups
so I understand, but when I see something did it meet your needs to be able to
answer that answers a couple of questions. One is it means I have read it
enough so you can know that I have looked at it. It also means there can be
feedback to specialists. If that specialist isn’t meeting my needs on a
regular basis I may not use them again but maybe somebody else ought to know
they are not meeting my needs. A lot of this is not in EHRs today and it is
written somewhere on the side.

I remember on paper every time I got something I would initial it to say I
have read it. I know people initial it when they don’t read it but it
shows you have looked at it. I don’t want to create that attesting in the
EHR but I think there are ways to go about this that are kind of in progress
that may help us out.

DR. TANG: I just hope we don’t get to a point where we need measure
coordination on top of our care coordination but it really sounds a bit like
that. Another example is Geisinger got a lot of attention for their program
they called ProvenCare. All that is is an outcome measure. I leave it to them
to figure out how to pay less output on that guaranteed price. I think that
would serve us well as a way of measuring care coordination rather than all
these process measures. I don’t know. Maybe it’s a bit much.

MS. SCHOLLE: I think these points are well taken. I can assure you these are
questions that have come up in our mind as we have done this work and tried to
think about what is worthwhile. Where should we be leading the field? It is
really going to be a different issue for integrated versus non-integrated
settings but in the short term for non-integrated settings where having a
pathway to what should you expect and to what information should be shared with
the patient with the specialist. Is the specialist getting the information they
need and is that information coming back? Having a simple way to know whether
on a community level you are accomplishing that is important and it is built in
these pieces of did the information get shared and come back, or it is surveying
patients and families, which is also another expensive approach. As you can
tell from our broader framework, it is certainly something that is on
our plate to try to think about the benefits and drawbacks of all the
different approaches.

DR. CARR: I think we wanted to have a little bit of time to just recap and
prepare for tomorrow.

MR. QUINN: I was going to say another way to look at this is not in the
context of care delivery so much but in the context of social networks. Imagine
a social network with the patient in the middle and all of these other people
around them. There is a world of measurement around the activity of social
networks, how well they are used as well as indicators of how well
connected a patient is and people in a social network are connected. To use the
paradigm of friending someone, well, maybe we call that a data sharing agreement
signed among a particular patient’s providers to show that they are able to
receive messages. And then we can also look at, and I’m no expert on how
to measure social network activity, but how much traffic is on there. These are
the sorts of things that can be done in an automated way versus this. I think
what we
are going to find is that looking at those patterns of social networks for
patients some of them are going to be better trafficked. Others are going to be
less trafficked and they are going to be very different networks in an
integrated system versus not. I think that that is a different paradigm than
this for looking at it and it’s not necessarily tied to all the
dependencies here.

DR. CARR: Thanks very much to both of you. It really stimulated a lot of
thinking and a lot of appreciation for the excellent work that you both are
doing. Thank you very much.

Agenda Item: Re-Cap and Discussion

DR. CARR: I realize it has been a challenging day in terms of incoming
information but we had set aside a little time until five thirty anyway to
recap and discuss. One thought I had would be to just get major themes that we
heard today and think about them because our goal had been as we talk about the
national priorities, care coordination as one, tomorrow disparities, value,
and population health, we want to think about the lessons learned about
measurement today and then take that and apply to those priorities to identify
again are these challenges present in these particular issues. I guess I will
open it up to just have folks throw out themes that we heard today.

DR. TANG: Well, it was a very informative day I thought and I think the
challenge that most fits where I am thinking is what Blackford said. Where can
we go from what we have been doing which is very process laden and based on the
paper world to where we would like to go recognizing that it will have a new
application in health reform and it seems like we want to if you start with the
morning discussion where we talked about the counterpart to CME where you sit
down in some location and try to promote knowledge to practicing with ongoing
feedback and that is where the quality measures can come on. Ongoing feedback
about how you are doing and how to change. That would be really different and
as a consequence that might free us up to instead of saying do you have what we
think should be process A and B and C and look more towards are you getting
outcome E again can be measured electronically through some of the systems that
we are now putting into place. That would be a very different way of looking at
both what to measure and how to measure it, and at least it stimulated my
thinking on that.

DR. MIDDLETON: Just a couple of quick thoughts building on Paul’s
summary. I just wrote down several themes listening through the day. One is how
do we enable this idea of the real time quality and management and feedback and
reporting and perhaps have an aspirational new approach to quality as opposed
to the sometimes pejorative approach, that is, I’m not doing good enough.
I would rather it tell me, how do I do better. And maybe even combine with that
some of the CME and other sorts of affinity type activities that might relate
very well to both data acquisition and management for quality.

We talked a lot about the payors becoming harmonized around a core set of
national standard quality measures, which certainly would make things a lot
easier on the payor side perhaps, the HIT side, and maybe the provider side, as
well.

There was a subtle point, I think, that Frank Opelka made about actionable
quality. Make the reports, make the data actionable from the provider’s
point of view to try to bring them along. I raised the issue of what is the
sustainable business case for quality post-stimulus. We are in a great
transitional stage now, but what happens when all that goes away?

I guess the last thing would be can we through all of this think about a
parsimonious set of quality measures and approach, and what national architecture does
that imply to get quality all the way from the point of care, all the way up to
CMS and to get insights and payment reform all the way back to the provider.

DR. GREEN: I thought one of the most important consolidating statements was
that there has to be an aggregator and data cleaner. I thought that was pretty much
right on. I don’t feel like we have matured our thinking adequately about
this, but I do hear a solid conversation about what was at one point called
concurrency of clinical decision making, local care arrangement, and
Paul’s point about can there just be a few measures that are independent
of all of these things including type of practice arrangements. That’s
where the sweet spot is. They have to come together. They have to operate with
this word, concurrency, as a notion.

When they are not concurrent one of the apparent consequences is that things
get very complicated, very fast and they get very expensive very fast and they
probably get undoable very fast. That struck me as an important thing.

Matt didn’t develop it and we didn’t really hear this except from
his comments, but what a great thing to take home tonight to think about:
instead of thinking about this around protocols of care for chronic diseases
for mostly older people that have expensive problems, maybe we should be
thinking about measuring social networks. That is a good question, in my
opinion.

MR. REYNOLDS: What was that again, Larry?

DR. GREEN: That is an observation that instead of us continuing to think
about this the way we are accustomed to thinking about it, as these care
processes, care protocols, the usual suspects and players that are trying to
communicate with each other, maybe we should jump and think about seeing
this as a social networking problem and think about how do you measure the way
a social network functions and does it accomplish what it is supposed to.
Something like that. I’m not doing it justice.

DR. MIDDLETON: I can see the NEJM article, Larry, with your name on it. Just
like your health behaviors are basically contagious. You are only as smart as
the people you practice with.

MR. QUINN: If you are a primary care doctor and you don’t have a lot of
friends, then you are probably not coordinating care.

(Laughter)

DR. GREEN: Let me invert that. I think it was T.F. Fox back in Lancet, about
getting dangerously close to 50 years ago, also pointed out that one of the
functions of one’s personal doctor is to protect that person from the
overzealous specialist, and we have a lot of evidence that this is true. When
you start measuring the social network you can argue that a robust primary
care system is one where the referral-to-visit ratio is low. That measure can
either be neglect or it can be really robust care, and we have to have a way
of approaching that measurement that distinguishes the two.

I really like Matt’s idea. I wish we could talk about it more and I
wish we had someone talking to us about it. Do we have someone testifying
tomorrow that knows a lot about this?

DR. CARR: No, we did have that NCVHS afternoon. I don’t know if you
were there.

DR. GREEN: (off mic)

DR. CARR: That is why we keep meeting.

MR. REYNOLDS: First, my comments are not as chair of NCVHS. They are as a visitor
to this committee. I actually was pretty disappointed. I heard an incredible
number of experts be experts on their piece and experts on why their piece was
good and what is happening with it and I think that is excellent. But I
didn’t hear any grouping up. I have actually been a huge fan of meaningful
use and I use it a lot at home. I use it a lot everywhere I go. Some of the
ability starts sliding further to the right as I listened today, as far as who
is agreed with what’s in what column. If the industry in essence is truly
committed to move this versus prove that individuals or individual groups are
the one. That worries me.

I think some of the things that I did like — I think Blackford we ought to
do a primer on how to think about this with some of the slides you did and that
Paul did. It really decide this idea because again, when you go home and you
start thinking of all the people that are going to have to catch up that
don’t even know this is going on today and haven’t seen the chart and
oh by the way they got to get somewhere between now and 2013 to do something.
There is a lot of help that needs to be done at that end of it and we can use a
social network and we use whatever we talk about to get that group to move. We
have a lot of educated players around this table but there are a lot of people
out there that don’t even know what the table would look like. I think
that is a key thing.

I think we have to have a light to go to and I think all of us complain
about things like HIPAA but somebody forced it to happen and said this is it
and now we have come out with the next versions, which maybe it is going to
make a difference, but it grouped everybody up. I think rather than just
inviting the vendors, which some of the things that we are going to make sure
that we do something that they will play in because I think we all would say
there aren’t many doctors writing their own systems and there aren’t
many practices out there with a whole lot of time to figure out how to use it
so we better deliver that in the right way.

The standard data set, I will speak as a payor. I would love to know there
was a standard data set and I am telling you we would do everything possible to
push people there so that once you have the standard data set then you could
start actually having audible discussions. Right now it is everybody picking
their own side of the discussion and then you quickly start losing your thread
of what I really think about and what data is or isn’t actually there. I
always call them home base. As an implementer I love a home base. We can argue
away from home base but if we don’t have a home base we just argue on
lines and premises that don’t happen.

I just think it is a great time for NCVHS to be doing this. I think it is a
great time for us to be partnering with David and others, because maybe in some
ways we may be able to say things that maybe wouldn’t be able to be said
in other ways.

PARTICIPANT: Say more about that.

MR. REYNOLDS: I would rather not right now. I am only speaking as a visitor
today.

PARTICIPANT: (comment off mic)

MR. REYNOLDS: In other words, in all the meaningful use hearings we had
observations. Out of these hearings we could have observations and/or
recommendations. In doing so we could help move the ball forward, because one
thing is they have their shoulder to the ball. They are having to push a lot of
balls at the same time. We happen to have a clean space to step back for a moment
and we are looking at this – we don’t have columns, but yet we have
understanding of columns and we have understanding of timeframes. We have
understanding of things we have done in the past, and the industry has done in
the past.

Hopefully we could create a body of work that could be picked up as Paul and
I talked, as David and I talked and others have talked, could be picked up and
used in ways to make recommendations that didn’t necessarily come just
from them. That would be my summary.

DR. CARR: I wanted to add one thing too. We are talking very facilely
today about SNOMED, and SNOMED and problem lists. I would just like to point out
that I don’t know many physicians who are nimble with SNOMED. I think
there is a whole educational piece where obviously we will have to do some catch
up work, but I mean really building into curriculum, not just for physicians but
nursing and all health education. There needs to be a track within the
curriculum about measurement and documentation or taxonomy or something because
we do not have a workforce that understands this right now.

DR. FITZMAURICE: Maybe the physicians don’t have to understand SNOMED
but their vendors do and have to give them choices that will lead them to a
SNOMED concept so that they can be added across.

DR. MIDDLETON: Harry, thank you for opening the door to allow us to maybe
say a little bit more about what we really feel. I struggled a little bit
because I found myself, and it’s not particularly that I know that much
about quality, but I found myself kind of in and out. I found a lot of the same
sort of presentations about kind of the same stuff in their points of view and
what not. That is all well and good. People are expert in their individual
areas and what not, but when I got to the late morning presentation, whatever
that was, I wanted to kind of get us to think about the bigger picture here. We
really have an opportunity to think about the bigger picture because CMS may
not exactly be accountable for quality. AHRQ isn’t exactly accountable for
quality. Who is the quality czar in this country? Is there such a thing? Could
that person, with an HIT czar, help with healthcare reform, help reform
along? I think that is kind of the bigger set of issues that have to be put on
the table. As one of the folks back home likes to say, as Steve always says put
all the blood on the table and then let’s see what is going on. I
don’t think we have done that yet.

MR. REYNOLDS: I would like to see the excitement on the table, excitement to
go to the same place and to make a difference on that list – and try to do
some of this even though I don’t always have the ball.

DR. FITZMAURICE: Social networking. Sunday, my son Jim who is 39 years old,
went out playing touch football with my son Joey who is 41 who was a
quarterback and a bunch of other kids, kids of 41. Somebody caught a pass and
my son Jim put his hand in there to knock the ball out and jammed the index finger
so that it wound up breaking two metacarpal bones here. This finger was
pointing in the wrong direction and he drove himself to the local hospital.
They couldn’t get it relocated. They brought in a surgeon who was able to
move it and finally get it in, splinted it. They had a before and after. They
didn’t pin it or anything. He came to our house and his wife came to our
house too, with the kids. He gave me a disc that had a DICOM reader on it and
his images. I put it in and we looked at it and it looked horrible. It really
did. You wonder is that stuff going to stick together or not with nothing
holding it.

We sent it around to the FTSE Google group. It is a group of nine kids and
their spouses, and maybe a couple of others. The suggestions started coming
back. One of them has a relative that just got out of the military. He is a top
hand surgeon in York, Pennsylvania. He was at the Ravens game but he is willing
to not drink, not have any beer, so that he could look at it tomorrow and then
operate on it. Somebody else said here is someone who did Johnny’s ACL.
You can try him. A lot of suggestions just from people who are not very
knowledgeable in medical care. My wife is a nurse. One of my daughters is a
nurse. But we got good information that at least gave him some paths to go. He
decided to go to the ACL guy and get a referral to maybe a hand specialist if
the guy thinks he needs it.

Can you imagine having that with medical professionals, that you send it
around and you start getting this feedback?

What are the themes that I saw? Like Harry, I couldn’t see how to fit
it all together and move forward. I am struck that there is a lot of investment
and demonstration of quality measure effectiveness in changing what we want
changed.

I saw quality measure architecture and workflows. I liked seeing that. I
wanted to see here is the way it should be but I don’t know if we are
there yet or not. But I think that is what Harry wants to see is what should we
be working for. How do we make this happen? How do we aggregate quality
measures for effective change? Provider information for clinical decisions and
for patient choices.

Another theme. Is there a financial reward system that promotes and rewards
quality improvement and coordination of care? I don’t know. Should we
focus on is the mean greater than the bar or should we focus on is the
distribution changing? The percentage of patients who are below the bar,
percentage of patients who are above the bar. Are we moving toward more and
more above the bar? The bar is wherever we set it.

We have a transition coming up, a transition from quality measures to other
quality measures to performance measures. We need to keep track of the data
elements and their attributes. We need to get into different versions of code
sets and this is over time. This is not just in the next three or four years.
We are having changes in coding systems. How do we handle that and does that
impact – do our measures mean the same thing? Do we have the same validity
as we carry across these changes?

I have already covered the social networking. Maybe a social network is I am
a patient and instead of giving me a disc, can you email this to this Google
group? Can you email it to my primary care physician so that he can get on the
ball and find the best specialist and get me some answers very quickly? It
starts to pull everybody together as a team and with some accurate information
and then maybe physicians come back and say, you know I really need this
additional information. And the tingling at the end of the fingers, is it
grossly swollen? That is what the military doctor said. I need to know that
before I can do anything.

DR. CARR: Well, thank you everyone. Thank you for calling the question here
because I think it really helps us to focus on the job that we set out to do.
With that I think we will call it a day and we resume tomorrow morning at nine.

(Whereupon, the meeting adjourned at 5:35 p.m.)