[This Transcript Is Unedited]
NATIONAL COMMITTEE ON VITAL AND HEALTH STATISTICS
AD HOC WORK GROUP FOR SECONDARY USES OF HEALTH DATA
July 18, 2007
National Center for Health Statistics
3322 Toledo Road
Hyattsville, Maryland
CASET Associates, Ltd.
10201 Lee Highway, Suite 180
Fairfax, Virginia 22030
Table of Contents
- Context Setting, Background on Seminal Work (Cont’d)
- Considering a Framework for Optimizing Uses of Health Data
- Steve Labkoff
- Meryl Bloomrosen
- Impact of Standards on Uses of Health Data
- John Halamka
- Floyd Eisenberg
- Methods of Protecting Privacy in Uses of Health Data
- Glen Marshall
- Lori Reed-Fourquet
- Providers Who Capture and Use Data
- Practice-based Research Networks – Kevin Peterson
- VHA Upper Midwest Quality Performance Improvement Alliance – Jennifer Lundblad
- Northern New England Cardiovascular Disease Study Group – William Nugent
P R O C E E D I N G S [9:05 a.m.]
DR. COHN: Please be seated. Good morning. I want to call this meeting to
order. This is the second of three days of hearings of the Ad Hoc Work Group on
Secondary Uses of Health Information of the National Committee on Vital and
Health Statistics. The National Committee is a statutory public advisory
committee to the U.S. Department of Health and Human Services on health
I am Simon Cohn. I’m Associate Executive Director for Kaiser
Permanente and Chair of the Committee. Now I want to welcome committee members,
HHS staff and others here in person and do want to welcome those listening in
on the Internet. And for the benefit of those with cable, I do want to tell you
that we actually are currently on the Internet. So as always, I would advise
you to speak clearly and into the microphone so those on the Internet can
appreciate our deliberations and testimony.
I want to again thank the National Center for Health Statistics for their
hospitality and for hosting us for this meeting.
Let’s now have introductions around the table and then around the
room. For those on the National Committee, I would ask if you have any
conflicts of interest related to any of the issues coming before us today,
would you so publicly indicate during your introduction. I would also expand
that to even if there are no outright conflicts of interest, if there are
interests or participation in the various bodies that will be testifying today,
you probably should publicly indicate even though it is not precisely a
I want to begin by observing that I have no conflicts of interest although
I am an active member of the American Medical Informatics Association which
will be one of the testifiers today. Harry?
MR. REYNOLDS: Harry Reynolds from Blue Cross Blue Shield of North Carolina.
I’m a member of the committee and no conflicts.
DR. CARR: Justine Carr, Beth Israel Deaconess Medical Center, a member of
the committee, no conflicts.
MS. GREENBERG: Marjorie Greenberg, National Center for Health Statistics,
CDC and Executive Secretary to the Committee.
MS. BRITT: Myra Britt, I’m a contractor to the Work Group.
DR. DEERING: Mary Jo Deering, National Cancer Institute, staff to the NHII Work
Group of the NCVHS.
DR. LOONSK: This is John Loonsk with the Office of the National Coordinator
for Health Information Technologies.
MS. GRANT: Sharon Grant, Booz-Allen & Hamilton, Contract Support.
MS. HAMILTON: Christine Martin Allison, Booz-Allen & Hamilton, Contract
DR. LABKOFF: Steve Labkoff, Director of Healthcare Informatics, Pfizer
DR. BLOOMROSEN: Meryl Bloomrosen, American Medical Informatics Association.
DR. OVERHAGE: Mark Overhage, member of the Committee and the Work Group,
and while a member of AMIA, I have no conflicts.
MR. VIGILANTE: Kevin Vigilante, member of the Committee, Booz-Allen &
Hamilton. No conflicts.
DR. STAMEN: Phil Stamen, Health Policy. I’m a member of the Committee
and no conflicts.
DR. STEINDEL: Steve Steindel, Centers for Disease Control and Prevention,
staff of the Ad Hoc Work Group and Liaison to the Full Committee.
DR. TANG: Paul Tang, Palo Alto Medical Foundation, member of the committee,
no conflicts, current Chair of AMIA.
MR. ROTHSTEIN: Mark Rothstein, University of Louisville School of Medicine,
member of the Committee and no conflicts.
MS. JACKSON: Debbie Jackson, National Center for Health Statistics
MS. HAZENER: Beth Hazener, America’s Health Insurance Plans.
MR. CONNORS: Chuck Connors, Geographic Information Systems Development at
DR. EISENBERG: Floyd Eisenberg with Siemens Medical Solutions, also with
HITSP Population Health Technical Committee and the IAG Quality Domain.
MS. SISK: Jane Sisk, Director of the Health Care Statistics Division at the
National Center for Health Statistics.
MS. CONN: Heather Conn, Informatics Specialist, National Center for Health
MR. LANDAHL: Morris Landahl, Office of the National Coordinator.
DR. COHN: Welcome all, as well as those listening on the Internet.
Before we move into the agenda review and before I actually turn over the
microphone to Harry Reynolds, one of our Co-Vice Chairs to actually run the
meeting, I do want to just make a couple of comments and go through agenda
review, and some of this will be a little repetitious of what we discussed
yesterday. But since we have actually everybody here this morning, it’s
useful to sort of remind everyone a little bit about what we’re doing, the
charge and, I think, the plan.
As commented, this is the second day of this first set of hearings. We
intend before we’re finished to have somewhere between six and eight full
days of open hearings, hear from a variety of stakeholders and interested
parties on the whole issue of secondary uses.
Now just to remind you, we have been asked by the U.S. Department of Health
and Human Services and the Office of the National Coordinator to develop what I
would describe as the overall conceptual and policy framework that begins to
address some of the secondary uses of health information both for a way of
thinking about it and certainly a taxonomy, and, I think as we began to talk
yesterday, certainly a definition of terms. And there are a lot of terms out
there that get bandied about, and I think if we come away as part of our report
providing some clarity for HHS and the country in terms of all of this, this
will be one very good outcome of the deliberations and work.
Now beyond that, and once again, probably the reason that we’re really
here talking is we’ve been asked to develop recommendations to the U.S.
Department of Health and Human Services on needs for additional policy,
guidance, regulation and public education related to expanded uses of health
data in the context of developing a nationwide health information network. And
obviously, this is a broad area. I think our intent and our initial area,
initial emphasis is on uses of data for quality measurement, reporting and
Now I should add that beyond this what I describe as almost high level
recommendations and frameworks and all of that, I would also interpret our
charge to include coming up with what I would describe as more specific
recommendations, actionable recommendations for HHS in relationship to tools,
technologies, other approaches that may be able to minimize any unidentified
risks that we identify in the course of our work. So it isn’t just telling
people to be educated better, to enforce regulation better, but certainly if
there are things that we can identify that really will minimize identified
risks, I think that will once again be a very valuable outcome and once again,
it gets us relatively concrete certainly as well as makes our recommendations
potentially actionable both in the near term as well as the longer term. So we
actually do have a relatively wide variety of work products that we need to
come up with, and we’ll talk about some of the activities and how the
agenda is structured moving from the general to the specific and then back
Now, as yesterday, I do want to thank all of the members of the Work Group
for donating their summers to this activity. Paul Tang, Bill Scanlon, Mark
Overhage, Mark Rothstein, Kevin Vigilante for being willing to serve. I do want
to particularly thank Harry Reynolds and Justine Carr once again for donating
their summers to be co-vice chairs of this activity. And then, of course, key
liaisons and staff which includes John Loonsk from the Office of the National
Coordinator, Steve Steindel, Mary Jo Deering, Marjorie Greenberg, Mary Beth
Farquhar who hopefully we will see and John White who I know called in
yesterday and may be calling in again today.
Obviously, we also have staff, Debbie Jackson, Cynthia Baur who is here. I
want to thank obviously Erin Grant who isn’t exactly staff, but I guess is
consultant. Erin Grant as well as Christian Martin Anderson, and of course
Margaret Mosiaka who is now sort of lead staff on this project. So obviously
thank you all for your participation and contribution, and I think it will be a
very interesting set of hearings this summer, in case you were wondering what
else you were going to be doing this summer.
Now the intent, as I commented, is structured in a way we’ll be
talking almost simultaneously about – you know, I apologize. I actually
didn’t thank Mike Fitzmaurice who I’m sitting here looking at for his
involvement and participation, and welcome this morning, representing the
Office of Agency for Healthcare Quality and Research, and I’m sitting here
looking at him and I didn’t really specifically acknowledge him as one of
our liaisons. So thank you. I know I’m going to miss somebody in these
Anyway, the agenda is really meant to be sort of this combination of
looking at broad framework issues as really more specific issues that really
involves quality and, as I was describing, tools and technologies and
approaches to help minimize risk. And we saw some of that conversation
yesterday as we started out with framework conversations and then dug down
deeper into areas. Obviously, this morning we again are talking at sort of a
high level and then begin hopefully to talk about taxonomy and some definitions
and then hopefully drilling down as the day goes on also.
This is meant to sort of create a dynamic tension between sort of high
level views as well as like let’s get real and stay grounded. And we all
have to judge, as the hearings go on, about how successful we’re being on
this. Certainly, at the end of today and into tomorrow morning, we will be
talking about some of the structure of the August hearings so that we can fine
tune whatever we need to do to make them more effective and useful based on the
information we’re hearing today.
Now, the hearing today starts out and obviously we’re again at the
sort of the framework level. The topic is considering a framework for optimizing
uses of data, and we’re very pleased to have Steve Labkoff as well as
Meryl Bloomrosen joining us from the American Medical Informatics Association.
They had what was a very successful not in terms of offsite, but symposium and
session looking at secondary uses of data to framework issues at the framework
level, and we’re very interested in hearing obviously in open session some
of your thoughts about that as well as recognizing, I think, your bringing
forward sort of concepts, approaches, frameworks, tools, trying to think about
how they might be able to help us in all the work that we need to be doing. So
we’re obviously delighted that you could join us this morning.
Following our break, we have John Halamka calling in, and then we have
Floyd Eisenberg joining us in person and thank you, talking about the issue of
standards and how that plays into our conversations. And then this afternoon we
move into issues of definitions, anonymization, pseudonymization,
de-identification and all of this, and the overall issues of security and how
that plays in.
And then finally late in the afternoon, we’ll have a session talking
about providers who capture, use and share data and how this all begins to play
out in that environment.
Late in the day, we’ll have some work group conversations. We really
do want to make sure that we’ve gotten everybody’s thoughts, insights
from the day recorded as we begin to move into the third day of our
With that, I will turn this over to Harry. Harry, thank you for being
willing to facilitate the session.
MR. REYNOLDS: Okay. Our first panel’s in place, so that’s good.
We’ll just go in – do you want to go in the order that you’re
listed on the presentation, Steve and then Meryl.
DR. LABKOFF: Sure. Meryl and I are going to hand off side to side during
the presentation. So I want to thank the Committee for inviting us to
participate and to provide our thoughts about secondary uses of health data.
AMIA has been for several years now trying to examine this issue and surface
what are major concerns and issues in the arena. And in the course of today,
we’re going to go over – I guess I better do the slides, I generally
will be going over some definitions that we came up with, an overview of things
that we did in 2006, and we’ll get into the conference that we just
convened back in June and go over some pre-conference working groups and what
their findings were, and then get through some of the preliminary themes that
have emerged and some of the thinking around those issues.
I do want to stress that the work we’ll be presenting is not
conclusive work. The meeting convened only a month ago, and most of the work in
distilling and coming up with final conclusions is still ongoing. So we’re
just sort of giving you a preliminary look at what our thinking is and what the
meeting came out with. We would be very pleased to come back in a few
months’ time when papers are written and presentations are more honed to
give you sort of final thoughts on the subject.
So with that, I’ll hand it off to Meryl.
DR. BLOOMROSEN: Thanks, Steve, and I want to just echo Steve’s
thanking you all for inviting us to be here. We actually did testify in front
of perhaps it was the full committee last year – November, I believe it
was, and that was to give you a summary of our work from last year. That
testimony actually had the benefit of six or seven months having passed when we
first convened the meeting.
But to Steve’s point, our 2007 meeting occurred just about a month
ago, and we’re very delighted to be here and would be pleased to answer
questions, but would caution. I hope all the answers aren’t, well,
we’ll get back to you. But there are quite a few people thinking about
this from a multi-disciplinary approach, and we’d be happy to come back,
as Steve said, to augment or clarify any additional information that might be
Let me just digress for a minute and make sure, as we did before, that we
thank the leadership of AMIA of Paul as our Chair and to extend thanks to the
Board of Directors, Don Detmer, our President and CEO as well as David Bates,
our incoming Chair, Charlie Safran who may be on the phone listening in and
also available to participate in this presentation if he can. But very much
this is a concerted team effort, and we’re very much wanting to express
our thanks to those folks.
So to take a little bit of a step back, a year ago or so, we took our first
look at secondary use of data. And in order to do that, we came up with the
definition that you see in front of you. Well, what is primary use of health
data, and I will say and I understand from just side bar conversations that
there are quite a lot of discussions happening about the term primary and
secondary in and of themselves.
We in fact had significant conversations and discussions both last year and
this year about the use of these terms. But for starter purposes, we have made
a distinction between primary use and secondary use. And as you can see here,
for our thinking, primary use of health data are data that’s collected
about and used for the direct care of an individual patient. Most of the time,
in our way of thinking about this, it’s occurring in real time as the
clinicians and the providers are in fact taking care of the patient.
And we recognized that there might be some connotations to this definition,
but we felt like we had to start somewhere. And in fact, when we looked at
where existing policies, regulations and guidance concentrate, it is in our
minds mostly concentrating on the primary use of data. And we found that there
was reason to elevate the level of thinking and thought that was going into
other uses of data. Next slide.
So we began with some definitions that are, as Steve mentioned, in process.
We’re honing them, we’re refining them, and we’re trying to
distill them. And at this point in time, the secondary use of health data is,
in our minds, defined as non-direct care use of personal health information
that includes, but is not limited to, analysis, research, quality and safety
measurement, public health reporting and monitoring and surveillance, payment,
provider certification, accreditation and marketing and other business
activities that include strictly commercial activities. And as you can see on
the slide parenthetically we kind of say a/k/a everything else.
I will point out, and I think Steve will go into a little more detail, that
as we looked at this definition, some of the terms within our definition become
subjects of discussion. And as those of you and several of you around the table
were either at our meeting last year or even at our meeting this year, Simon
and Justine and Mary Jo, you probably remember that the use of the word
commercial became fairly contentious and subject to a lot of discussion. So
that might be something that we’ll suggest and continue to look at
We also want to point out that we’re identifying the second bullet
which is a secondary use of health data occurs when data are used for purposes
other than those for which they were originally collected or obtained. So we
think between the two statements here that it’s a good reflection of
trying to make this distinction that we thought was important between primary
and secondary use.
So last year in April of 2006, AMIA convened an invitational meeting, and
we published the findings of that meeting in January of this year in the
Journal of the American Medical Informatics Association. The 2006 meeting was
– we had about 30 individuals representing various stakeholders’
viewpoints and constituencies. And in our report, we identified a couple of key
things from that initial convening.
We found that there were widespread and incredibly diverse secondary uses
of health care data. We were sort of touching the tip of the iceberg when
people started to talk about what uses were they making or could they make if
they want to make of data. That public trust issues – the consumer, the
patient, the patient’s caregivers, the trust issues dominated.
In fact, a lot of the reason for that is there seemed to be a gap in an
understanding or an awareness of what uses there were of data along the
continuum. There seemed to be multiple people along this chain that are
obtaining and using data and then re-obtaining and using data, and there seemed
to be a consensus that the general public was not aware of all of these uses.
We found last year that technological and technical capabilities – and
to your point, Simon, if the committee is going to be looking at tools and
techniques and technologies, we found that capabilities technologically were
outpacing policies, procedures, guidelines, et cetera in the sense of dealing
what could and should be used – dealt with in terms of data.
So the ability of technology to allow people to de-identify but then
re-identify patients, or to merge different data sets, even if they’re
disparate, to link health data sets with potentially non-health data sets, for
example, employment types of data, became of concern and of importance to our
group last year because the ability to do that without facing the fact that
policies on the books national and state even did not address the fact that the
futuristic uses of these data were not covered.
And then also that we felt there was considerable need for attention and
leadership at the national and state level to explore these issues, and
we’re obviously thrilled to see that the Secretary and NCVHS and AHIC and
others are beginning to look at this. Next slide.
So last year, we proposed a framework for secondary use of health data
that’s identified on this slide, and I’d like to mostly identify a
couple of the key words. You’ll see them again as Steve outlines what we
continued in 2007. The transparency issue – that people need to know, all
of us need to know what is going on if there are people collecting our data,
the transparency of that, the fact that we should know how and what is going on
is very important.
We early on last year acknowledged that data ownership was going to get us
into a circular conversation, and we didn’t want to simply restrict our
conversations to ownership. The group felt that conversations and issues
relating to privacy and security of data were able to address ownership issues.
But in fact we felt that data stewardship and the control of the data and the
use of the data sort of surpassed issues relating to ownership.
There was a general consensus that privacy, policy and security still
needed attention, public awareness, education was important, that uses and
users of data were in fact a complex issue. And what you’ll see is that we
started last year with the recommendation that AMIA could or should or someone
should look at developing a taxonomy of users and uses and, again, national
leadership. Next slide.
So with that as a backdrop, Steve is going to dive into what occurred about
a month ago when we convened our 2007 conference on secondary data. And I would
like to mention that this year we actually wound up having a hundred plus
people at the meeting. We expanded the broadness of the participants and
continued the dialogue. Steve.
DR. LABKOFF: Thank you, Meryl. In addition to that, one of the things that
we undertook between ’06 and ’07 was we teased out three specific
areas that we felt needed more work than could be accomplished in just a couple
days in a meeting and convened three working groups, and we’ll talk about
them in turn. A working group around taxonomy that was led by Chris Chute and
Stan Huff, a working group on data stewardship which was led by Melissa
Goldstein, and an identification/de-identification working group – I
forgot who led that off –
DR. BLOOMROSEN: Doug Barton.
DR. LABKOFF: Sorry, Doug Barton. And we’ll go through what those
working groups found. Then we decided that at the course of the meetings, the
nuts and bolts of it was going to be to attempt to try and tease apart a
framework for how to consider secondary use of health care data in four
different domains, and those domains were the research domain, public health,
quality and commercial space. As Meryl indicated, the commercial space became
somewhat cantankerously debated during the course of the meeting.
We also came out with some principles around data stewardship. And for
anyone who’s gone on the Internet, you can get all of the pre-meeting work
and preliminary materials at the link that’s included on our presentation
So to start off, we’ll talk about the data stewardship area. And I
guess the first thing was what was the definition of data stewardship that we
used, and we decided that data stewardship would encompass the responsibilities
and the accountabilities associated with managing, collecting, viewing,
storing, sharing, disclosing or otherwise making use of personal health
information, and that principles of data stewardship would apply to any and all
personnel and systems and processes that engage in health information storage
and exchange within and across organizations and would provide guidance for
those for all of the discussions about secondary uses and lay the groundwork
for principles, and this became sort of the starting point for that working
Now one of the tasks for the working group was to come up with some
preliminary principles and the question as to why would we establish them would
be these. They provide a rationale and safeguards for legitimate use of
secondary uses of healthcare information. The principles could be used to
describe enforcement mechanisms that provide reassurance for appropriate usage
and describes the benefit to the field of having trusted data stewards who
adhere to these principles.
And the issue here of trusted stewards is really an important one because
effectively if you have two different organizations both of whom adhere to a
given set of principles and are considered trusted stewards, then the ability
for them to share information among each other without having to go for
independent transactional guidance or transactional approval is facilitated. So
this concept of having a series of principles or a certification, perhaps,
among organizations to allow sharing could facilitate things such as the NHIN
or local health information exchanges.
The principles that came out of the meeting included those around
accountability, including governance, oversight, and to the extent and level of
acceptable regulations, openness and transparency. This issue of transparency
is one that continues to come up, around structure, processing, delivery of
data and business processes and practices.
Notice to patients is also one that was exceptionally important so that
patients are informed that their data, whether used in a blinded or non-blinded
fashion, they were informed of how their information is going to be utilized
outside of the health care domain – outside of them receiving health care.
Privacy and security, including data quality, de-identification, the cost
of re-identification, granularity of patient consent, permitted uses and
disclosures including data aggregation and analysis, and enforcement remedies.
And the enforcement remedies are important basically because if you
don’t have some series of remedies in place and someone does violate these
principles in practice, there’s really no recourse for the patients or for
the organizations at hand.
Another distillate that came out of the working group was what was called a
consumer guide regarding personal health information, and this is a series on
the next two slides a series of questions that one could levy at organizations
that are meant to steward that information. And I won’t go through every
single bullet of these at this point, but will just highlight a few.
Does the organization have privacy policies that are written in clear
understandable language, definition and terms for patients and their
Are patients notified when there are changes to these privacy policies.
or not and whether or not it can be identified. Does it identify how data are
protected. What happens when an organization is sold, or it merges, or another
organization receives the data through some business transaction, or if the
company goes into bankruptcy.
What happens in the case that the individual terminates their agreement for
working with the health organization. Will the data that they’ve generated
become purged from their system. Will it remain on the system in a blinded
form, or in an identified form.
The answers to these questions helped generate a series of levels of
stewardship that individuals can take and use as an assessment against
organizations that they’re making choices as to whether information will
be stored or how to be stored.
The next working group that convened was that around taxonomy, and it was
agreed early on after the first meeting that a taxonomy was absolutely
necessary to have our people talking on the same page about terms and
processes. That work was convened prior to the June meeting and was vetted at
the June meeting. The taxonomy was meant to identify possible non-political
uses of personal health information and to clarify societal and public policy,
legal and technical issues.
The taxonomy supports more focused and productive discussions regarding
health information, health data and their uses.
DR. BLOOMROSEN: Excuse me, I just want to mention that we don’t have
copies with us of the draft taxonomy, but they are among the many documents
that are posted on our website, and they’re available to the public. So if
you wanted to start looking at that, that’s a work in process, as Steve
mentioned, but that is available on the web.
DR. LABKOFF: The axes of the taxonomy at the moment include what are the
categories or classes for secondary usage, how is the data meant to be used,
and what are the existing potential sources of secondary data and who are the
users. In short, it’s basically attempting to identify all the different
ways of talking about the subject in a consistent and common framework.
Now you’ll notice that we’re not going to be talking about
de-identification and identification working group, and the reason that
we’re not is because over the course of the pre-work it became very clear
to us that that issue became very difficult to work through, and issues around
that became distilled and worked through in the rest of the meeting at the June
meeting. So that working group did not come up with policies or recommendations
or a final work product to vet at that point. But we caught that early and were
able to bring it to the whole group a little later.
The next portion of the meeting that took place at the June meeting was
vetting what we were calling our framework, and it was this framework actually
that caught the attention of some folks and I think generated some of this
interest. And we figured out early on that this is such a complex issue in many
different dimensions that we tried to figure out a way of dissecting the
dimensions in such a way that you could address different axes, you know, kind
of concurrently, but in different domains.
And the four domains that we identified were those of public health,
research, quality and commercial uses. And an exercise that took place at the
June meeting was trying to figure out in each of those domains where along a
continuum would basically would your framework sit. So within these six
different areas, accountability, transparency, patient consent, the cost of
re-identification, oversight and regulations, where would it be the right place
for these in these four domains for these basically sliders to sit. And
we’re not mentioning these sort of independent sit at 25 or something, but
sort of a band and given different situations, the band would expand or
contract but in a given range.
And we wanted to give you the definitions of where these end points were in
the framework because we do feel that this is a workable framework. And by the
way, if there are other domains that need to be addressed, the framework can
accommodate that simply by adding another domain but still using the same
series of points.
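The framework as described here can be sketched as a small data structure: a grid of domains by dimensions, where each cell holds a band (a low and a high value on a 0 to 100 scale) rather than a single point. This is only an illustrative sketch under stated assumptions. The class and method names (`Band`, `Framework`, `set_band`, `add_domain`, `unset_cells`) are hypothetical, and no slider positions are implied, since the testimony reports none.

```python
from dataclasses import dataclass, field

# Domains and dimensions as named in the testimony.
DOMAINS = ("public health", "research", "quality", "commercial")
DIMENSIONS = (
    "accountability",
    "transparency",
    "patient consent",
    "cost of re-identification",
    "oversight",
    "regulations",
)


@dataclass(frozen=True)
class Band:
    """A range on the 0-100 continuum (hypothetical representation)."""
    low: int
    high: int

    def __post_init__(self):
        if not (0 <= self.low <= self.high <= 100):
            raise ValueError("band must satisfy 0 <= low <= high <= 100")

    def contains(self, value: int) -> bool:
        return self.low <= value <= self.high


@dataclass
class Framework:
    """Grid of (domain, dimension) -> Band; cells start out unset."""
    domains: list = field(default_factory=lambda: list(DOMAINS))
    bands: dict = field(default_factory=dict)

    def add_domain(self, name: str) -> None:
        # A new domain reuses the same set of dimensions.
        if name not in self.domains:
            self.domains.append(name)

    def set_band(self, domain: str, dimension: str, low: int, high: int) -> None:
        if domain not in self.domains:
            raise KeyError(f"unknown domain: {domain}")
        if dimension not in DIMENSIONS:
            raise KeyError(f"unknown dimension: {dimension}")
        self.bands[(domain, dimension)] = Band(low, high)

    def unset_cells(self) -> list:
        # Cells for which no band has been agreed yet.
        return [
            (d, m)
            for d in self.domains
            for m in DIMENSIONS
            if (d, m) not in self.bands
        ]
```

Adding a domain with `add_domain` leaves the dimensions untouched, which mirrors the point that the framework accommodates new domains while using the same series of points.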
So for accountability, we said that the accountability area was that around
leveling sanctions or penalties for disclosures of inappropriate use of patient
health care information, and one end point was that there was no
accountability, that there would be no penalties, and the other extreme would
be at 100 would be criminal sanctions. So if something were to occur, there
would be, you know, absolute fines or even potentially prison time for misuse
DR. BLOOMROSEN: At the risk of – what we did as part of the exercise
was ask people to break into small groups and to tackle one of the four domains
at two points in time – the current state and potentially a future desired
So in an effort to inform, we asked people to sort of take a Polaroid
snapshot of where the participants thought reality was today in each of the
domains across each of the dimensions, but not necessarily thinking that
everyone felt that the way things work today was the way they could or should
be, and then to take a snapshot of where we thought the future state should be
that we should be aiming for.
And one of the reasons why we’re not ready yet to distill all the
findings from each of these small groups is these are still very much a work in
process as much as the domains are themselves and as much as the attributes are
themselves. So we’re still distilling all of the commentary we received
during our meeting to try to flesh out because we were asked at one point to
try to tell you where the sliders fell, and we’re not in a position yet to
make a definitive suggestion that the slider should or could be at any
particular point on any of the continuums.
DR. LABKOFF: But as they say, stay tuned. The issue of transparency, the
measurement of transparency was the extent to which practices governing the use
of patient health data are known and understood by those who disclose or use
the data and by the patients whose data are the subject of use.
And the end points in this instance were zero where the patient was
completely unaware of the secondary uses of the data, or 100 where the patient
is informed of every use of their data and at the time of its occurrence. So if
their data was meant to be used for a quality measure, they’d be notified
for that particular instance each time it would be used.
So you can see we strived to find really disparate end points here for each
of these areas. In terms of patient consent notification, that dimension was
around the opportunity offered to patients to allow or permit the use of
their health care data; notification refers to the mechanism by which patients
are informed of their right to consent. And the end points in this instance
were no choice at zero, or 100 would be opt in.
Now I mentioned earlier that the de-identification working group had a lot
of difficulties, and it very clearly became the issue of, well, maybe the issue
isn’t around de-identification so much as it is from a practical
perspective the cost of re-identification because I guess there’s issues
or questions as to how much one can actually completely blind information. So
if the point really isn’t that – we won’t even go into whether
or not that’s possible or not. But if you go to the point of saying how
expensive is it, that puts a metric around how hard it would be for end users
to really decode that information.
And in this case, we said the end points were zero, where the decoding
would be actually very easy, or 100, where the decoding would be actually
very, very expensive and difficult, sort of akin, I guess, to something like
cracking RSA encryption or something like that.
In the last two areas – last two dimensions – oversight was the
extent to which an entity is subject to governance or supervision, including the
ability to impose remedies for breaches, and the end points here were zero for
internal, residing with the entity that has the data, or 100, which would
be external, residing with a public governing board, so that if it was at 100 a
governing board or an external body would be the ones who would vet the uses on
a per-use basis.
And lastly, regulatory or law, and this framework really was the framework
around regulations and laws that govern secondary uses of health care data,
including penalties and enforcement guidelines. Zero in this case meant no
regulation, 100 meant fully regulated and, again, with penalties.
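As an editorial illustration (not part of the testimony), the slider framework
just described can be sketched as a small data structure. This is a minimal
sketch under stated assumptions: the names DIMENSIONS, Band and make_profile
are hypothetical, the end-point wording paraphrases the speaker, and the band
values in the example are invented for illustration only. The presenters had
not yet reported where any slider actually falls.

```python
from dataclasses import dataclass

# The six dimensions with their 0 and 100 end points, paraphrased from the
# testimony. The dictionary keys are hypothetical identifiers.
DIMENSIONS = {
    "accountability": ("no penalties", "criminal sanctions"),
    "transparency": ("patient unaware of secondary uses",
                     "patient informed of every use as it occurs"),
    "consent": ("no choice", "opt in"),
    "reidentification_cost": ("decoding is very easy",
                              "decoding is very expensive and difficult"),
    "oversight": ("internal, with the entity holding the data",
                  "external, with a public governing board"),
    "regulation": ("no regulation", "fully regulated, with penalties"),
}


@dataclass(frozen=True)
class Band:
    """A slider setting expressed as a range on the 0-100 continuum."""
    low: int
    high: int

    def __post_init__(self) -> None:
        # Reject bands that fall outside the continuum or are inverted.
        if not 0 <= self.low <= self.high <= 100:
            raise ValueError(
                f"band must satisfy 0 <= low <= high <= 100, "
                f"got {self.low}-{self.high}"
            )

    def contains(self, value: int) -> bool:
        return self.low <= value <= self.high


def make_profile(domain: str, bands: dict) -> dict:
    """Build a profile for one domain, requiring a Band for every dimension."""
    missing = set(DIMENSIONS) - set(bands)
    unknown = set(bands) - set(DIMENSIONS)
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)}")
    return {"domain": domain, "bands": dict(bands)}


# Hypothetical example only: these numbers are invented, not findings.
example = make_profile(
    "public_health",
    {**{name: Band(0, 100) for name in DIMENSIONS}, "consent": Band(0, 10)},
)
```

Adding another dimension, as the speaker suggests the framework allows, would
mean adding one entry to DIMENSIONS while keeping the same 0 to 100 scale.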
An area that sort of got into the meeting and was presented at the meeting
was one that wasn’t really a working group but looks like it will become a
new working group, and that is around analytic principles. And you know,
I’ll put my Pfizer hat on for a second because this one was generated from some
of the work we did internally at Pfizer in terms of thinking about if people
are going to start using secondary – using health care data for secondary
uses, there really ought to be some basis for guidelines or principles around
how these data will be reported especially in analytics.
You know, if someone is going to make a claim that something works better
than something else, or one treatment is better than another, it would be
helpful if we were able to use a framework for which the analytics were
actually done in a consistent fashion. That sort of led to this concept of data
analytic principles, and we presented them at the meeting as suggested
principles or to start the conversation, and it kind of took.
So the rationale around this is that a statistically sound approach is
necessary for secondary data analysis of large clinical practice data sets,
that random analyses or unstructured data mining actually could yield
associative conclusions or potentially introduce false positive associations.
Standard data analytic principles provide a framework for sound studies with
credible reproducible results and for minimizing errors that are possibly
introduced during analyses. And data analysis principles mitigate the risk of
false positives, and finally they provide grounds for multiple parties such
that the analyses can be more readily compared.
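The false-positive concern raised here can be illustrated with simple
arithmetic. The sketch below is an editorial illustration, not something
presented at the hearing. It assumes independent tests, each run at the same
significance level alpha, with every null hypothesis true. The function names
are hypothetical, and the Bonferroni adjustment is one standard correction
chosen for the example, not a method the speakers proposed.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across independent tests.

    Assumes every null hypothesis is true and each test uses level alpha.
    """
    if not 0.0 < alpha < 1.0 or num_tests < 1:
        raise ValueError("alpha must be in (0, 1) and num_tests at least 1")
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test significance level that keeps the overall rate at or below alpha."""
    if not 0.0 < alpha < 1.0 or num_tests < 1:
        raise ValueError("alpha must be in (0, 1) and num_tests at least 1")
    return alpha / num_tests


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        unadjusted = familywise_error_rate(0.05, m)
        adjusted = familywise_error_rate(bonferroni_threshold(0.05, m), m)
        print(f"{m:>3} tests: unadjusted {unadjusted:.3f}, adjusted {adjusted:.3f}")
```

Under these assumptions, 20 unplanned comparisons at alpha = 0.05 give roughly
a 64 percent chance of at least one false positive, which is the kind of risk
a prospectively defined hypothesis is meant to limit.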
Now the proposed principles are fairly simple, and they include, number
one, an agreement that people who are going to be reporting analyses of
these data sets would all have agreed to use the principles – that’s
sort of the first principle.
Then establishing local data access committees or data guardians or
gatekeepers in institutions, and these gatekeepers would be meant to provide
governance and guidance to run data sets and run analytic projects. So this is
similar to how IRBs work in university settings. But in other settings,
there aren’t necessarily the same types of rigor that are out there, and
we felt very strongly that having that kind of a watchdog over these large data
sets is actually very important.
A hypothesis is actually needed to start the analyses: a
prospectively defined scientific hypothesis or purpose for
hypothesis testing is required prior to doing analysis. And the corollary to that
is that if hypothesis generation is actually the goal, that’s fine, but
no changes to medical practice would be made until hypotheses that were
generated in that process were validated in a second experiment or second
A sound experimental design would be needed. We should have a clear
statistical experimental design necessary to undertake the study. And lastly
and most importantly is to close the loop so that these data gatekeepers
actually are informed about the results of what’s been done in the course
of the studies. And by having this, it helps mitigate many of the risks
involved with these false positives, Type 1, Type 2 statistical errors. And,
again, these are not prescriptive. These are to start the discussion. There may
be other principles that are needed. In fact, one other principle that was
suggested was around being able to show an audit trail of how you arrive at
various conclusions in the course of a study. And I would posit that that might
make a good sixth principle possibly.
So some of the key takeaways from this 2007 conference include the
following: Secondary use of data is important and valuable. And although the
value may be in the eyes of the beholder, there’s a need to broadly
educate various audiences about the value of secondary uses of these data.
Issues are very complex and ongoing work is needed, and the environment
continues to be dynamic and fluid.
Consumers have an important role, although there were various opinions
about what that role might be. Taxonomy is an important tool to help inform the
greater community, and it will need expansion and maintenance and a process
built to keep that continued.
There’s still some confusion around various rules, specifically HIPAA
and privacy, FDA’s – chuckles out there – FDA’s human subject
protection regulations, and the Common Rule, none of which may be applicable or
adequate to address the secondary uses of health care data.
Data stewardship principles need to be refined. They must address implicit
chain of trust as data changes hands. Data must be of minimum quality. It must
be accurate, reproducible, complete, timely and credible. Data limitations
should be acknowledged and described and data analytic principles adhered to
Some of the plans, work products and outputs that ought to be expected from
the meeting later this year will include white papers, commentaries,
recommendations on the value of secondary data, health data stewardship,
framework and principles for secondary uses of data, and refinement of the
framework instrument that we described here today. Data stewardship and
definitions and principles are being worked on, and the taxonomy of users and
uses continues to be refined.
Some next steps that we’ll be working on are to synthesize and distill
the conference discussions. We’ll be reconvening the working groups and
introducing a new working group in the case of the analytics principles,
refining interim work products for the meeting, continuing the public
discussion at the AMIA 2007 annual session later this November, and participating
in ongoing discussions and forums such as this one and AHIC and Connecting for
Health, eHI and several others.
And lastly, we would like to acknowledge the Steering Committee for the
Conference. As you can see, there is a wide array of stakeholders who came
together to help work on this, including Doug Barton from Lockheed, Meryl and
Bob Browne from Eli Lilly, Bart Harmon from Tricare and the DOD, Mike Lieberman
from GE Health Care, Susanne Markel-Fox from GlaxoSmithKline, Charlie Safran,
Sam Scott of Lockheed and Bill Tierney from Indiana University.
I’ll flip through the other slides to demonstrate who the other
sponsors were, and, again, you can see there’s a wide array of
corporations that care about this subject and participated and just demonstrate
on the slides the lengthy list of folks who participated in these working
groups. Again, taxonomy was led by Chris Chute and Sam Huff. Stewardship was
led by Melissa Goldstein, and De-Identification working group was led by Doug
And with that, I think we will conclude at this point and open it up to the
floor for questions. Thank you very much.
MR. REYNOLDS: Thank you to both of you. I’d like to thank you in a
couple of ways, first for an excellent presentation, but also the work that you
did helped a number of us that aren’t as astute in this phase as others
around the table really get a jump start on how to think about it. So we do
thank you very much for what you did and what you delivered even prior to this
session. So thank you for that.
Questions? Mark, and I have Justine, Simon.
MR. ROTHSTEIN: I have a question and a comment. The question is what do you
expect to come out of this work? Is it a way of trying to control the
development of public policy based on the expertise of your members, or is it
best practices or some sort of professional guidance?
DR. LABKOFF: I think
there are multiple expectations from this work. One of them in fact is to help
guide policy creation. It became clear to us in the very beginning of this
effort that many of the policies that exist today just didn’t address some
of these issues because it hadn’t been really discussed in open forum up
until about two years ago. And I would say that would be one of the major hopes
for this work. Also to inform corporations and inform the businesses around the
country in terms of what the rules of engagement are for how to work with data
in a transparent and a trusted fashion.
The issue of commercial uses, again, was hotly contested during the course
– just even the word commercial actually was hotly contested during the
course of the meeting, and being able to help distill out what that means and
provide a series of rules of how to engage in a consistent fashion is, I think,
quite important. Meryl?
DR. BLOOMROSEN: Yes, I think to augment that, we’d like to inform and
provide assistance as much as possible by providing and leveraging the thought
leadership that’s available through the organization. As an example, the
folks who have been working very hard on the taxonomy – and you saw their
names – are people who work and immerse themselves in issues relating to
taxonomy on a very regular basis. And what we would like to do is continue to
bring their expertise to bear as we refine, for example, the taxonomy. It was
conceived of a year ago, and then we’ve been working quite hard to get it
vetted as much as possible with as wide an audience as possible.
As an example, to Steve’s point, at the Spring Congress of AMIA, we
had a town hall meeting at like six o’clock in the evening to vet the
draft taxonomy thinking, oh, no one’s going to show up for this. In fact,
we had standing room only, and we recognized that by continuing to work on
this, we will hopefully be contributing to, you know, the public good as well
as bringing taxonomy expertise to helping classify secondary uses and users. So
I’m hoping that answers the question. That’s an example.
As Steve mentioned, these work products are a work in process, and I’m
not sure that we envision that there’s a start and stop to necessarily any
of them because of the way the environment is changing.
For example, we are looking at and re-looking at – I’ll use the
taxonomy again as an example. It started out with three axes, and, as Steve
mentioned, it’s sort of evolving or morphing based on a lot of the vetting
that took place that it looks like we may need five axes. But how that taxonomy
might ultimately come to bear on policy making and other venues, we would hope
we can help inform what kind of uses could be.
So, for example, if someone were to make a request for a secondary use of
data, perhaps they could use the taxonomy to explicitly state what uses
they’re making of the data so that we’re all, as Steve mentioned
earlier, on the same page about the uses and users. What we then sort of found
is this is a very complicated question. There were no easy answers. Although
we’ve honed in on the commercial use conversation, there were no easy
answers in many of the break out sessions and as we fleshed through these
MR. ROTHSTEIN: Okay, and now my comments. Having read your material and
heard your talk, I’m concerned about the primacy that you give to consumer
choice. In my experience, consumer choice is often set up as an alternative to
any regulatory activity. And under the current regime, the fact of the matter
is that consumers have virtually no choice with regard to secondary uses.
So, for example, disclosures for public health purposes did not require any
sort of consent or authorization. Disclosures for health care operations, one
of your secondary categories, did not require any consumer input. And when
consumers are required to execute an authorization, they are often in the
position where the authorization is compelled. So that if you apply for a job
or life insurance or disability or long term care insurance, and the list goes
on and on, as a condition of applying, you sign and it releases all your
So I value the concept of consumer choice. But in reality, there is no
consumer choice or very little consumer choice. And if you put too much
reliance on consumer choice as a principle as opposed to regulating the
substantive aspects of what can be disclosed and what can be used, I think that
undermines the situation.
And finally, I was interested that you did not include – perhaps you
did, but I did not see any discussion of the concept of contextual access
criteria. This was a part of our June 2006 letter to the Secretary where we
recommended that contextual access criteria would be – should be
explored, researched, and if feasible, implemented to limit the disclosure of
information to secondary – for secondary use whenever possible so that
when disclosure is made to an employer, only job-related information goes. And
unless we have the capability to do that in an informatic sense, then even the
law requiring that only certain information goes is basically worthless.
So was there any thought to including discussion of contextual access
criteria?
DR. LABKOFF: So the answer is yes, there’s a great deal of thought.
And in the course of the – I put back the slide on the screen about the
framework. And to your point, we’re not trying to be prescriptive in the
course of this meeting. We’re trying to actually elicit where these
sliders need to be set. With respect to consumer consent, if you look at –
if you check this in your head, you know, put a little circle over public
health, well, in the public health I’m not going to report absolute
findings, but I will tell you that just from memory that, of course, in the
public health domain consumer consent really was way down at zero because
patients don’t really have consent even by law, I believe, for their data
to be used for public health reasons.
But in other domains, there might well be a need for consumer consent, and
it might be very appropriate. Where in these domains those sliders fit really
is what we’re trying to figure out. So, for example, in research, I do
believe you do need to ask patients’ permission to use their data to do
research on it. And so in that case, the slider might be set much higher than
it would be in the public health setting.
So in each of these exercises, so the point of this exercise wasn’t to
get prescriptive and say this is where it must be. The point of the exercise is
to say where is it now and where do we think it ought to be in the given
And that particular one that you mentioned was one that was absolutely
discussed, and we’ll have more to say about that as the meeting materials
get distilled down. With respect to contextual access, again, we couldn’t
present every single thing that was presented at the meeting in this short
period. But that was an issue that was discussed in the meeting in a variety of
the break out sessions and will be addressed again in the distilling of this
work. It wasn’t addressed specifically in this presentation today.
DR. BLOOMROSEN: Mark, what I would suggest the Committee consider is when
we broke up into the four domains, it became quickly apparent that we even
within the domain couldn’t over-generalize. So on our website are some
discussion questions and scenarios that we used. And in terms of potentially
setting the slider or having conversations about these issues, it became clear
that we might need to do it on a scenario-by-scenario basis which I would
believe is somewhat analogous to your recommendations relating to contextual
access. So that we felt like we tried to divide the health world into these
four domains just for purposes of getting moving on distilling it so that
secondary data use was compartmentalized.
And then once we got into those domains, it quickly became clear that we
couldn’t over-generalize even for public health or for research. We had to
probably go down a little more detail. So I would invite you to look at our
scenarios as another way to address some of that.
DR. COHN: Can I just follow on only just because I was there and just to
maybe be a little more gentle on the conversation.
At least in my memory or my sense of the conference – gentle, gentle,
not general. But I think a lot of the conversation and I think a lot of the
value of the discussion was, as you commented, really trying to figure out
where things ought to be here versus some sort of a future state, at least
along these axes.
And at least my memory was – and obviously this will be additional
work by AMIA and by the Board was really this issue, which I think Mark was
referencing, was more along the lines of tools, technologies and approaches to
minimize risk in the context of all of this. And I don’t at least to my
memory of being in the conversation, that was not really – that second
part was really not the focus of this session, but might likely be a focus of
another expert panel or separate meeting.
And if you think I’m wrong in my characterization of that –
DR. LABKOFF: No, I think you’re right on, Simon. I think that’s
precisely how we tried to frame this up. I mean, we’re not claiming this
as the end all and be all in this discussion. This is the starting point of the
discussion, and it’s meant to provoke conversation and thought, you know,
discussions on policies and potential regulations if necessary. But it’s
meant to start those conversations, not end them.
DR. BLOOMROSEN: And I think you’ll see that the taxonomy probably
– at least our hope is the taxonomy will help us speak to the different
uses in a context within these domains.
MR. REYNOLDS: Okay. Justine.
DR. CARR: Well, I think some of my questions have been already touched on,
but thank you for excellent work and an excellent presentation and ongoing
So a question I have is, again, this slider was actually a very effective
tool in helping clarify where things were. In a way, we heard a bit of this
yesterday as we heard different speakers come through where they talked about a
project where there is a lot of trust, there was less concern for some of the
regulatory and so on, and where there was confusion there was reaching for a
lot of regulatory.
But is it the expected outcome that with this there’ll be hundreds of
scenarios, each of which would have a different solution?
DR. LABKOFF: That’s honestly hard to say. I would hope it
wouldn’t render into the hundreds, but certainly I think there might be a
series of scenarios with which you then would have to extrapolate from there. I
would hope we wouldn’t be vetting six or seven hundred scenarios at the
end of the day. That would be a fairly onerous task. But I do think that there
are probably, you know, I guess you want to call someone who tends to be a
lumper rather than a splitter, I think that someone’s to my left over
here. I think the objective here is to figure how to lump some of this together
in ways that make it easier rather than more complex for general use and
utilization. If we’re able to lump many of these scenarios together and
try as hard as we can to provide guidance. But you know, at the end of the day
it’s how people interact with people that is going to make these things
work or not work. No matter what the regulations or laws that are intended, we
have to trust that if you put out policies, people will adhere to those
policies. I don’t think that we can hope that that will be –
there’ll be some balance that will stop misuse.
DR. CARR: Well, my recollection also of the meeting which was great was
that you could move – you’d be ultimately able to move towards
lumping with three of the domains, and that what was called commercial
represented a vast opportunity of splitting. And will that area be – will
you be doing a deeper drill down on that area?
DR. LABKOFF: That is the expectation, and, again, we’re just beginning
to distill out the materials here. But the expectation is that, in figuring out
different areas of the commercial sector – some people framed it as data for
dollars, if you will – there does need to be a clarification because it turns
out that corporations are not the only organizations that pay for data. It
turns out that research institutions do it as well, and for the data to be
collected and aggregated in the first place, somebody actually has to have a
business model around doing that application. It doesn’t happen for free.
So whether it’s a corporation buying data for some research project or
for some commercial venture or for some marketing campaign or a research
institution purchasing data just to advance their own research agenda,
there’s a commercial transaction that takes place there. And we need to
understand better how those are going to be framed in the future in a way that
is respectful and transparent and basically, you know, keeps the information
safe.
DR. BLOOMROSEN: Justine, I think your point – I think a point that
sounds to me like it’s implicit in your comment because you were at our
meeting is that there may be some interrelationships or even interdependencies
with some of the slider categories. As you said, as people were studying the
sliders during the exercise, if transparency was high, then there might be an
implicit patient consent that goes along with it, that as long as I know what
you’re doing with my data, I’m more comfortable or we would think the
patient should be more comfortable. So the level of consent wouldn’t
necessarily have to be that you have to tell the patient every single time in
real time what’s happening with the data. So I would like to emphasize
that as part of an implicit finding. We haven’t really distilled it all
completely yet. But that seems to be where people are headed.
DR. CARR: Yes, that’s how it seemed at the meeting as well.
MR. REYNOLDS: Simon, and then Mark and then Kevin.
DR. COHN: Well, I guess I sort of had two questions or maybe a comment and
a question. The comment goes along with Mark Rothstein which is that obviously
we’ve heard a lot about HIPAA and had conversations about that yesterday.
I’m of course always struck that, and I think Mark said this already,
the patient consent is one dimension of activity. But the other one is fair
information practices. And as someone who – I would apologize probably, in
open session, to describe to you what I usually do with information policies and
consent forms that I get in the mail – but it’s sort of like, in a sense, at
least in some idealized fashion, you want to make sure that people are protected
with their information policies and practices as much as possible, and that in
a busy environment a busy person doesn’t have to be consenting or not
consenting at every turn in the day. And I’m not sure that I saw that well
represented in any of this although patient consent is a subset of that overall
issue, but it doesn’t really do that.
But let me get to really the basic point that I have which is – and
I’ve been sort of listening and I have participated in the meeting. One of
the things I came away a little confused by and maybe you can now clarify this
for me is how to figure out how this overlays or doesn’t overlay or
interdigitates with the current world of HIPAA as you know it.
And I can see touch points, but I can’t decide whether this is meant
to deal with things beyond HIPAA or really is meant to be a different way of
completing the architecting or at least overlaying all uses, and I guess if
we’re indeed talking about all uses of data, then I’m sort of going,
well, what about the payer, what about the O part of PPO and where does the
employer fit into all of this stuff.
And I guess I’m just trying to figure out where – I mean, it sort
of fits but doesn’t quite, and obviously I’m sitting here grasping
for words. But I need some guidance in terms of how we could sort of leverage
this if we’re sort of thinking in a world where HIPAA exists.
DR. BLOOMROSEN: I’ll take a stab at that. I think the best way to
answer the question is that we’ll be doing more work through the data
stewardship working group. But to answer the question regarding HIPAA, and if
you’ll look at who was on and who participated in the data stewardship
working group, there were obviously many attorneys present and folks who have
experience in HIPAA.
I think our conclusion is that these discussions were outside of the scope
of HIPAA, I believe. Having said that, we haven’t definitively reached the
conclusions. But I think we’re saying that there is confusion about where
HIPAA fits in the use of secondary data. Also the FDA’s regulations and
rules and the Common Rule. I think there’s a slide that actually
identified for you that there were sufficient differences of opinion and
confusion that we believe that’s an issue that could be addressed via
policy or further clarification. We weren’t necessarily looking to solve
any HIPAA deficiencies or address HIPAA exclusively. It came out as a byproduct
of the discussion that it appears that these conversations raised questions
about does HIPAA address this, and, if it does, is it sufficient.
There was, I think, and this is Meryl talking, a conclusion that probably
these fall outside the scope of HIPAA and maybe someone needs to be thinking
about how to bring these back into some policy framework to the extent that
they’re not currently addressed.
DR. COHN: I think that’s a very reasonable issue and certainly one
that we’re pondering. And the first part of the question is, knowing that
HIPAA is hundreds of pages long and has initial guidances on top of it,
where are things that are not clear, where are things that are just not
known to people very well, and I think that’s probably in some of these
borderline areas or some of these issues we certainly need to dig down and
figure that out.
DR. BLOOMROSEN: And Susan McAndrew is here and was certainly at our meeting
and participated in our working group. So I don’t want to put words in
anyone’s mouth in particular, but I do think the conversations were
complex enough that at least it’s an area that does require further study,
and that may be something your group may want to look at.
Having said that, I’m not sure we would want anybody to connote that
we’re advocating a redo of HIPAA. That is not what we’re saying.
We’re on the Internet, and we are not suggesting that HIPAA be redone.
What we believe we’ve identified potentially are some nuances or gaps that
HIPAA may not address or maybe did not intend to address because we believe
we’ve been talking about uses of data that are beyond HIPAA.
MR. REYNOLDS: I always like it when we get consensus from our speakers.
DR. OVERHAGE: I guess one area I’d like to drill down a little bit, I
heard you talking about the potential for a fourth working group focused around
the analytic principles, and I understand the notion of rolling that forward,
and I think it gets back to the issue of in some ways the value proposition and
the transparency that go along with the research. If I know flaky research
being done with my data that puts my data at risk, I may not want done creating
research that’s going to say people fly badly once done and I may make
different trade offs.
It struck me that a lot of the principles that you list in this whole area
is fractals. You know, every time you drill down, there’s another whole
set of things that you have to try to sort through, and the research is one
that feels that way to me.
Your principles that you listed, recognizing that they’re early on in
their evolution and so on – understood, yes – seem to be focused around what we
call big formal studies as opposed to, for example, one of the most common
secondary uses of this kind of data that we see, which is covered by HIPAA:
use of data sets like these for hypothesis generation and preparation of
proposals for studies and things, exploratory sorts of analyses that obviously
would be held to a different degree of rigor than this is.
So one of the fears I have, I guess, is that this is yet another ballooning
area that’s going to be difficult to bound. And so I’d be curious to
your comments about (a) in the theory of analytics as you think about it, do
you have thoughts about how it ends up being bounded and scoped and whether
these other uses fit into it or have to be addressed separately. And if so,
this challenge, I think, that sort of heavy weight process of a board hearing
about it and reporting back and things probably becomes infeasible pretty
DR. LABKOFF: So I guess the answer is that I think just like most of the
things we’re discussing here today is there’s a spectrum of ways we
can approach this. And you’re right, the studies that we – when we
started thinking about this in my company, it was around groups that do large
data analyses, in the epidemiology group and in the outcomes research groups,
these are folks who care deeply about the fact that their studies are done in a
valid way and are perceived in a valid way. That’s their bread and butter.
They need to make sure that it’s done that way.
And when you get down to smaller – actually, there is one of the
corollary principles was that in the case of smaller things when you’re
trying to do hypothesis generation, there’s a provision for that that
we’re suggesting that hypothesis generation itself can be the purpose
Now with respect to the question on overly complicating the scenario in
terms of making it, you know, everything has to go through an IRB or an
IRB-like being, and that can slow things down and become too bureaucratic.
Again, just like I answered the question earlier, there’s got to be
guidance here around how things are done. You know, perhaps I don’t know
where the line is drawn, but when you’re doing small stuff, I mean, you
abide by principles that are relevant to small things.
But when you start doing large data dives over millions of patients where
Type 1, Type 2 errors can come in where if more than one organization –
more than one group and organization using the same data set for research
purposes, that in and of itself can introduce error into everyone’s
studies. Those types of things ought to be controlled, I believe, and I think
that’s kind of where we’re trying to shoot for.
But please don’t misinterpret the suggestion of principles as being
prescriptive or as being mandatory, but just as being guidance in certain
domains, in certain types of environments where they really need to apply or
ought to apply. Does that kind of get to your question, Mark?
DR. BLOOMROSEN: I’d just like to augment that to say that I think we
envision that the data analytic principles, Mark, might in fact become part of
the data stewardship principles. In other words, that data analytic principles
need to be part of being a data steward. And we’re not really sure where
we’re going, where the outcome would be on that. But I think we welcome
your comments and insights on that as well.
We also recognize that it overlapped a lot with health services research
processes and disciplines and things like that.
MR. REYNOLDS: Kevin.
MR. VIGILANTE: Thanks for a great presentation. Now as Paul was alluding to
yesterday, you know, it’s when we think about our charge, it’s nice
to be able to sort of triage things into areas of high yield or high importance
so we can focus our activities. And I’m intrigued by your exercise you
conducted where you sort of had people using this tool in terms of what is and
then what ought to be.
And I just to – I know you told us you don’t have any conclusive
evidence. But are there early indicators of areas that might be more productive
for us to focus on either because they’re controversial or because
there’s a lack of consensus, or because in comparing what is and what
ought to be there is a paucity of guidance, policies and regulations, a huge
chasm that really needs to be addressed. And if the answer is we can’t
tell you now, when could you give us some early indications of it because I
think that kind of information would be useful as we think about this triage.
DR. LABKOFF: I think that one of the early areas that I think that
absolutely needs to be teased out some more is the area around the commercial
emphasis, and I guess to Justine’s point, that may not be a lumpable
We need to dissect deeper. It was very clear in the time spent with the
subgroup in that space, I mean, there was fractured conversation left and right
and center, you know, trying to figure out in this scenario what does it look
like versus that scenario versus – and then the points about the research
community actually is a commercial entity in and of itself, although it’s
under a different umbrella.
So all those issues need to be teased out. So that is an area where we
anticipate additional work to be done. But in terms of anticipating outcomes
from this, we’re hoping that we’ll have some material out, I would
guess, by November/December time frame, something like six months out from now.
MR. VIGILANTE: That’s kind of late for us. Is there any sort of early
– you know, because we get the insight into some of these thoughts in
terms of which – and even the principles which were the ones that –
DR. LABKOFF: Well, let me back up and ask – I wasn’t aware that
there was a time component to this work. If there is, please let us know, and
maybe we can re-triage our priorities and re-organize in such a way that we can
be – we want this work to be influential. We want it to be able to be
useful. And if it’s going to come out too late, then that’s clearly
not going to be helpful.
But if you tell us that you need stuff done, please don’t tell me
it’s due by the second week of August.
DR. BLOOMROSEN: Well, Kevin, I would again ask perhaps Margaret and others
can follow up with us at the staff-to-staff level. But certainly the taxonomy,
the working draft is available. The sliders, you know, that’s a work in
process. The principles – all of this is, you know, again a work in
process, and we’ll try our best to be able to be as informative as we can
for your processes.
DR. COHN: Yes, and I should clarify the time line just because this is
based obviously on the request from HHS. But I think the intent was to have our
work substantially done in the October time frame, and I think that was, as I
said the initial agreement. So obviously we’re looking for things – I
mean, our work – I just want to say this to our work group. Obviously, our
greatest delight is to identify what pieces that we can endorse as opposed to
having to reinvent. And so I think in all these things like taxonomy, we
obviously need to further review it. It may need guidance from you in terms of
who we should talk to to better understand that and understand when it moves from
draft to executive version 1.0 type of work also.
DR. BLOOMROSEN: Yes, the taxonomy’s a good example because it’s
pretty concrete, and yet it’s pretty in depth at the same time. And what
we’re doing, as Steve mentioned, is it’s been vetted twice so far
formally. It’s been vetted at the Spring Congress, and then it was vetted
at this meeting, and we’re still accepting comments and we’re in the
process of reworking it.
So there’ll be another version of it that we were planning to take to
the AMIA meeting in November. But certainly we’ll be posting, whether
it’s 1.1 or 1.0 or whatever, it’s –-
DR. LABKOFF: Let me go back to the – let me turn back to the committee
the question, of all the things we’ve presented here, which are the ones
that you prioritize as being needed sooner rather than later because if you can
tell us your priorities, I think that we can try to rejigger some of our
priorities and try to get that work done in your time frame.
MR. REYNOLDS: Paul, do you want to make a comment?
DR. TANG: So one, I wanted to do upfront because I didn’t get in the
line early enough and I just wanted to disclose that I am the current Chair of
AMIA. So I won’t comment on things that they might recommend.
I do think it would be helpful, Simon, if you specified a time line –
October, I think, we’re to be turning in our final report which so, as an
example, October would be way too late for input, yes, and I think it’s
acceptable for me to say that Kevin’s question I thought was
excellent in terms of how it would help – that work probably would help
because he drilled right down into let’s get to the meat, and there’s
so many things to discuss. Let’s find out where there’s the most
controversy, where the most help can be, and so I thought his comment was
particularly useful. But obviously it has to meet a time line, and perhaps you
can shed light on when is it in time enough.
DR. COHN: Yes, well, I certainly think the taxonomy should be — one would
hope that it could be a low hanging fruit. In turn, I’ll ask Margaret to
begin to take a look at that, and the question would be, as I think we’ll
talk off line to identify whether somebody needs to come and brief us
specifically on the taxonomy, that’s going to be likely a living document.
So it’s more a question of –
DR. BLOOMROSEN: Yes, and I was going to say –
DR. COHN: But on the other hand, it’s hard to deal with things that
just remain in draft forever. So I think one can appreciate endorsing or
supporting a version one. It’s a little harder if it’s a version .4,
and that’s just sort of an observation of processes.
You know, I think that clearly the intent of having you here at this early
date knowing that it hasn’t been very long since that conference — since
you had your work completed on that expert panel, expert meeting with the
intent to begin the dialogue, and I think that things, for example, Kevin is
asking like, well, what are the big areas and what are the big areas. And I
think certainly you’ve identified commercial as an area that needs more
thinking and more work which, of course, is the area that Paul Tang brought up
yesterday, I think, using slightly different terms but was an area that I think
and other speakers have presented as an issue that is particularly vexing. And
it isn’t that it’s bad or good. It’s just that it’s complex
and probably has different things in it.
DR. BLOOMROSEN: Simon, perhaps another way to address – and Harry, the
question is to potentially have you consider looking at the fact that
there’s HIPAA, there’s the Common Rule, there’s the FDA
protection of patients and research, et cetera, et cetera, et cetera. There are
some disparate components that may address some aspect of secondary use, and
that might be another drill down where the extent to which there are any gaps
in existing code beyond HIPAA that address the issues of data, that’s
something we just touched on, but we have not drilled down yet. But I believe
that might be another area sort of like with commercial uses that there’s
lots of conversation.
MR. REYNOLDS: One other that we’ve been looking at is having used the
sliders, it’s becoming more and more apparent, at least to some of us,
it’s more like a sound mixer. Have you ever watched somebody doing a
recording because what happens is – one thing that I don’t see that I
would like and Simon brought up earlier, there aren’t clear definitions of
the data. De-identified, anonymized, pseudonymized, and so on and so on. Once
you fix your slider, any discussion you have, all I got to do is take those
four kinds of data, and I may move every slider again. And so I think
that’s a cut because as we try to explain this – because part of what
we want to do is come up with something that once it happens the general public
understands it, too. And a lot of these discussions and a lot of this focus
wouldn’t necessarily be transferable to somebody trying to understand it.
Simon also mentioned employers today because, if you listened to some of
our testimony yesterday, individuals are most concerned about what data their
employers may or may not get. So we say commercial, we say public health, we
say quality, but there’s also a big group out there that makes it so.
Taking some of those things into consideration also because as we come up with
these definitions, that moves a lot of these very quickly even after they may
DR. COHN: I’ll make another comment which is that I think that the
work of AMIA in terms of trying to fix where on the sliders
everything fits is, I think, laudable. I’m not at all certain that’s
really part of the charge of this committee.
I think our committee – I mean that the thinking that these are
important dimensions that need to be considered and then the issue of how you
minimize risk, maximize public benefit and all that really is the way I think
we need to think about it as opposed to having long discussions about whether
for a certain use you get 50 percent or 65 percent or 70 percent. I do think
that that may be an area where obviously that may be important for you and
especially how we mix all of this together.
DR. LABKOFF: I think the issue is – I’m not going to claim that
it’s necessarily important for us or anyone. But I think what was
important was to be able to take the temperature of folks in these domains and
these axes to figure out, you know, when you start out having a discussion and
you can’t distill it down into discernible parts, it makes it even more
complicated to have a discussion.
By breaking it out into dimensions and these axes, it makes it at least be
able to start the conversation and start being able to understand where folks
are thinking about it. And, again, not meant to be prescriptive, but to be able
to start these discussions and get a sense so that when this committee decides
where things ought to be around research and patient consent, all of a sudden
you’ve got a way to hone down right to that area. We don’t
care where you set the slider, that’s not the point. The point is that you
can actually decide in a secondary use of data discussion that there is a way
to get the discussion to the research component and then get down into these
various axes. I think –-
MR. REYNOLDS: Mary Jo’s been holding a comment, and then Kevin you
have a comment.
DR. DEERING: It isn’t about – one thing about the slider issue,
but then a larger question. Getting back to Mark’s initial comment about
MR. REYNOLDS: Is this – I thought they were asking us what we need to
tell them because there’s a list of people who have questions.
DR. DEERING: Okay. Then the first one is the priority. You began your
conversations by recognizing that you had heard a lot of discussion about the
full term secondary use, and that you recognized the issues about it, et
cetera, et cetera, and then you said that just for starters you were going to use
And then all the discussion, of course, about secondary uses and primary
uses. So you’ve actually locked in. And by noon or by one o’clock
yesterday morning, you would have felt the whole universe was against using the
term. There seemed to be a very strong movement not that the work group
necessarily is going to ultimately decide to go in that direction.
So one question to you would be, do you feel that it would be useful and
could you ask yourself that question. How strongly do we feel about the utility
of the term “secondary data” and make a conscious decision as to
whether this is –- the use of it is purely opportunistic because
that’s where it’s at and just to get on with it, or whether you feel
strongly that it is in fact useful.
Now maybe you can’t do it. Maybe it’s out of your time frame. But
since there was so much strong feeling yesterday by many other people that,
gee, why don’t we try and get away from it, it would be interesting if
AMIA could move toward at least making a pronouncement on that.
DR. LABKOFF: I don’t think AMIA is trying to pronounce that thou shalt
call this science secondary use of the health care data. I don’t think
that’s the point. I think when we started the discussions around this area
two years ago, it felt like we needed a way to describe it, and, you know, if
you start with the definition that primary use is used for health care –-
the providing of health care, and everything else is secondary to that,
that’s kind of where the term came from. I don’t think anybody’s
kind of wedded to the term. But you know –
DR. DEERING: Some of the other people said that those other things are for
MR. REYNOLDS: Let’s do it this way. Let’s do it this way.
We’d love you to consider whether or not what you think or don’t
think and just at some point come – that’s great.
What I’m going to do is –- we got a tight time frame for the next
panel coming. So I’m going to – Kevin and John, and I know
there’s other people, I have about six other people on the list. I’m
sorry, but –
MR. VIGILANTE: I’ll be very brief. This is just a sort of a follow up
to the previous conversation, I’m sorry. And I’m just being really
tactical here about thinking about the work we have to do and accomplish in a
fairly short time frame. And my only – I’m not really talking about
where the slider should be set or shouldn’t be set. I’m just sort of
trying to tap your experience and what you’re doing now to help us focus
where we get the most bang for the buck. And so, you know, if — and I’m
just going to throw a hypothesis out there, you know. If you’re just
saying that the domains that require the most scrutiny or the highest level of
scrutiny and concern are – ranked something like this, you know,
commercial, research, quality, public health, and within each one, when you
talk about those principles, when you think about what is and what ought to be,
there is more of a paucity of regulation, guidance, different principles, then
that helps us direct our thinking about what recommendations to make to the
Secretary. And if we can identify those gaps and those areas early on, it helps
us triage our activity, and that’s all I’m saying.
MR. REYNOLDS: John?
DR. LOONSK: Thanks. I want to commend the work that’s been done and to
also commend the offer to help further with the work of this committee. I do
think that it would be desirable for the committee to take AMIA up on that in
some specific areas that I’m not sure are quite formulated yet. And so
hopefully as the work here matures a little bit and as the thinking here
advances, that that offer will be still held out and that this committee,
obviously time frames are short, but can formulate that specific ask that
would be something that AMIA might respond to.
I do, you know, this is a tremendously complicated area, and the concept of
sliders is clearly one tool to address some of that. There are also multiple
axes that need to be considered here, and a number of them are on the screen
here. I think, you know, part of the discussion here is that secondary use is a
slider, and it’s a sliding term and that to try to nail it down exactly is
fraught with issue as well.
But I’m also particularly – I think that these axes and these
sliders are very helpful in helping to frame the conversation which is how
they’re offered, I think. My general question is about the commercial
aspects because that — I’m not sure that that’s the right axis to
put a slider on. And part of the question about that is, you know, because I
can see at the same time just looking at it from a consumer perspective that
brazen commercialism associated with the use of their health data might be very
unattractive. But that in fact where we have commercial interests that align
with advancing the quality of care and the quality of how they are treated,
that’s a win-win that is obviously encouraged.
So I do think that it would be helpful to continue to advance sliders on a
number of different axes, some of which were on the screen, some of which are
not, cognizant of the fact that eventually from a consumer perspective, this
may have to be packaged into three or less considerations. Even what you have
here is obviously complicated and complicated to this audience, much less
complicated to the consumer.
Just on the commercial side, was there – you said there was a very
robust discussion. Was there any light that came from that in consideration for
other ways of articulating an axis or axes in that domain that would perhaps
come –- advance that conversation because it does seem like just
commercialism may not be the operant axis.
DR. LABKOFF: Well, I’ll answer that and then I’ll hand it to
Meryl because I think she sat in that particular working group. I think the
answer is you’re right on is that it covers, you know, we were looking for
a term to just, again, sort of provoke conversation and discussion and, boy,
did we provoke.
I think what we need to figure out is how do you dissect that into
different channels because, you know, a commercial transaction for the sake of
safety is certainly different than a commercial transaction for the sake of
marketing. Both are necessary in the world, but they’re going to be
treated differently most likely, and just labeling them and lumping them
together under commercial doesn’t do either of them justice, nor does it
help the discussion. And we learned that along the way. We had to start
somewhere, and that’s where we began.
But I think that figuring out what those transactions look like or what
those pieces look like in a more granular fashion would help, I think, help
this committee and probably help overall in the space. Do you want to —
DR. BLOOMROSEN: I would just say that what was said is an accurate
reflection of the conversation. The term commercial, of all the terms that we
decided to use and tried to define, seemed to be one of the terms that was very
subjective and might need to be further clarified because it was –- we
were talking a lot about business models even in that commercial quadrant, if
you will. So that under what circumstances or specifically under what business
relationships are data bought and sold might be something to explore and for
what purpose, and not just for money. I mean, I don’t – it became
much more apparent that it was not just controversial; it was very complicated.
DR. LABKOFF: A use case that was presented at the meeting was wrapped
around how a health information exchange would sell their data so they could
sustain themselves in a business setting, and that’s where the discussion
started, and then it fragmented into different places.
But, you know, if we get into the health information exchange debate –
and I don’t want to open that door here yet, but bottom line is a lot of
RHIOs and a lot of HIEs are thinking about ways of using their data in a
way to create a sustainable business model, and that’s a very complicated
discussion, and that’s kind of where this all began.
MR. REYNOLDS: Okay. So in terms of where that discussion went, you would
point to use as the principal axis for helping to further delineate
commercialism, because potentially a multitude of these axes could be applied
DR. BLOOMROSEN: Well, again, I think another way to imagine potentially
using the various tools is in conjunction with one another. So you’ll see
a lot of granularity within the current draft taxonomy with what I would guess
would be further granularity. And if you took those and looked at how they
might speak to the different domains, that might be another way to help lump
and split, if you will.
Implicit in these quadrants, by the way, is education as well. So I think
our thinking about this is evolving somewhat. If you look at what started in
2006 and where we are now, we’re sort of –- we’re getting a
little more specific and trying to nail down some definitions. As problematic
as they might be, to Mary Jo’s point, we feel like these terms need
defining at least to have the context and the discussion. So we have the
taxonomy. We struggled with creating scenarios that made sense. Steve mentioned
one. And then we recognized that where people might include something like
education also needs to be teased out.
There was also some potential blurring between quality and research, I
think, in the conversation. So, you know, we want to be as helpful as we can.
But I think some of the synthesis is allowing us to make some other conclusions
MR. REYNOLDS: Okay, I’m going to cut this off now, and I apologize for
those of you who didn’t get to. But as we said yesterday, we’ve got a
large body of work to cover, and it’s cumulative. So keep asking your
questions – your same question that you didn’t get to will probably
be used in one of the next six or seven discussions that we have. And are
either of you going to be around for the rest of the day?
DR. LABKOFF: I’m going to stick around for a good chunk of the day.
MR. REYNOLDS: Well, good, and if necessary, if any of these subjects come
up it will be helpful to call them back for their opinion, we’d be happy
to do that. So thank you very much, and we’re going to take a ten-minute
break – not 15 because we do have a tight panel coming in next. Thank you.
MR. REYNOLDS: Okay, we’re going to start, please. Everybody, take your
seats, please. John, can you hear me on the phone?
DR. HALAMKA: I hear you just fine.
MR. REYNOLDS: Okay, our next panel that we’re going to be hearing from
is going to talk about the impact of standards on uses of health data, and we
have – everybody knows John Halamka who’s on the phone with us and
also Floyd Eisenberg.
So we’ll go in the order that’s on here. So, John, thanks for
joining us, and we appreciate your being willing to call in, and I’ll tell
you if anybody makes any faces while you’re speaking or anything.
DR. HALAMKA: Oh, that’s perfect. Thanks so much, and I really wanted
to be there in person. Today in Boston we’re doing a documentary on the
release of human genome sequences to the web. There are ten individuals that have
volunteered to release 60 million base pairs of our individual person
identified genetic data plus all of our medical records. And, of course,
there’s a very interesting standards implication because, of course, we
now have to gather all of our medical records from every place they may be and
represent them in some standard way on the web along with our genome sequences.
So I will certainly report back to NCVHS at a later time the interesting
implications and issues that come out of this initial meeting of how one as a
patient gathers all data and releases phenome and genome to the public.
To the matter at hand, we will spend 15 minutes introducing in general what
HITSP has done with primary and secondary data standards, and then you’ll
hear from Floyd in really great detail on the best thinking around how
we’re going to take some of the aggregate data that we gather in
hospitals, clinics, pharmacies and labs and use it for secondary uses such as
quality analysis and population health and research.
Now I presume folks do have the slide stack that I sent along?
MR. REYNOLDS: Yes, we do, John.
DR. HALAMKA: Great. And so I will just start with the first slide, and I
will cover the AHIC priorities and timeline so you can see what’s primary
and secondary uses of data from the use cases we get from AHIC, talk about what
we’ve done thus far and really what implications it has for secondary uses
of data, and to show you some detail on the complexity of what we’ve had
to work through to ensure that primary data sources are also useful for
So if we can go to what would be slide number three in your stack which is
labeled AHIC Priorities and Use Case Roadmap, this is a slide that I apologize
for. Of course, if John Loonsk is in the room, he knows this is what we
call the eye test slide. But it does in one slide illustrate all of the
standards that we’ve worked on in 2006, 2007, what’s coming in 2008,
and what you’ll see 2009 and beyond.
So specifically in 2006, we were given consumer empowerment, electronic
health records and biosurveillance where consumer empowerment really meant
demographic data, who is the patient, and that includes age, gender, aspects
that might be zip code, name, such things, medication lists and history.
The EHR narrow use case was laboratory result reporting. But the
implication is if we’re going to come up with a set of data standards for
the uniform transmission of a laboratory result that we should also include all
those aspects around that process that might be used for secondary purposes,
and I’ll show you some detail on that.
A lot of effort has been made to really be quite broad in our use of
laboratory standards reporting so that it can be repurposed for multiple use
cases in the future. One great theme of the standardization that HITSP is doing
is the last thing we want to end up with is 100,000 pages of guidance simply
because every use case is a one off. We really try to think about these use
cases and break them down into base standards, what we call transaction
packages, composite standards. These reusable chunks that say if laboratory
data which implies not only results but the vocabulary around naming a lab test
and who was it ordered on and why was it ordered, that that set of transactions
can be repurposed for multiple uses, primary and secondary.
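[Editorial illustration, not part of the testimony: the idea of a reusable building block serving both a primary and a secondary use can be sketched roughly as follows. Every class, field, and function name here is hypothetical and is not a HITSP artifact, and dropping an identifier and truncating a ZIP code is only a stand-in, not a complete de-identification method.]

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for a reusable lab-result component.
@dataclass(frozen=True)
class LabResult:
    patient_id: str  # who the test was ordered on
    test_code: str   # controlled-vocabulary name of the lab test (placeholder value)
    value: float     # numeric result
    zip_code: str    # demographic detail


def clinical_view(result: LabResult) -> dict:
    """Primary use: the full result for the treating clinician."""
    return {"patient": result.patient_id, "test": result.test_code, "value": result.value}


def surveillance_view(result: LabResult) -> dict:
    """Secondary use: the same component with the patient identifier
    dropped and the ZIP code truncated to three digits."""
    return {"test": result.test_code, "value": result.value, "zip3": result.zip_code[:3]}


result = LabResult("patient-001", "example-test-code", 7.2, "20782")
print(clinical_view(result))
print(surveillance_view(result))
```

[The point of the sketch is only that one definition of the lab result feeds both views, so a new use case adds a view rather than a new standard.]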
And biosurveillance is a perfect example where, if we are going to detect
outbreaks or bioterrorism events, we may want to look at radiology report
text, laboratory results which are numeric, a variety of demographics so that
we understand the distribution of whatever is detected. And so all of those
kinds of standards that would be necessary for identifying a patient and
identifying a lab and identifying a radiology report are now a secondary use of
public health monitoring of the de-identified or pseudonymized data set coming in
from multiple sources of data.
In 2007, we’ve been given a variety of use cases which also have
primary and secondary purposes where consumer access to clinical information
means that every individual with a personal health record should be able to get
at their data regardless of where it’s sourced, be able to take that data
and put it in both a network and even a physical piece of media like a thumb
drive or a CD and deliver it to a consumer of data, be it a doctor’s
office or a hospital. And so we’ve had to come up with a variety of
standards allowing extraction and the transmission of data through multiple
means and not controlled by the patient.
The emergency first responder use case is to deal with Katrina-type mass
casualty incidents, how do we ensure that medical records are available for an
individual, and then when treatment is delivered in such a setting, how does
that get part of the permanent record.
Medication management includes all aspects of e-prescribing and inpatient
medication reconciliation. So quite a lot of interesting issues with dealing
with longitudinally keeping the medication lists from all the various types of
environments of care consistent.
And quality is a perfect example of a secondary use of data. So we’ve
been working very closely with the NQF. You’ll hear all of the details
from Floyd on making sure we can look at all of those standards that are
necessary to support AQA, HQA, HEDIS, JCAHO, et cetera and not that we’re
going to go through every single one of those hundreds of quality measures, but
we believe you can say if you have atomic lab data, medication data, problem
list data, medication management data, that you will be able to derive the bulk
of these quality measures from good vocabulary controlled problems, meds,
allergies, medical visits, et cetera type data. And so although it’s
certainly true that many of the data elements we’re going to need are used
for the primary treatments of a patient in direct clinical care, we’re
going to repurpose many of the existing standards we have already named for
this secondary purpose of quality analysis.
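[Editorial illustration, not part of the testimony: a rough sketch of deriving a quality measure from atomic, vocabulary-controlled data. The records, the codes, and the measure itself are hypothetical placeholders; this is not the definition of any actual AQA, HQA, or HEDIS measure.]

```python
# Hypothetical atomic records; the codes are placeholders, not real vocabulary codes.
problems = [
    {"patient": "p1", "problem": "DIABETES"},
    {"patient": "p2", "problem": "DIABETES"},
    {"patient": "p3", "problem": "ASTHMA"},
]
labs = [
    {"patient": "p1", "test": "HBA1C", "value": 6.8},
    {"patient": "p3", "test": "HBA1C", "value": 5.4},
]


def measure_rate(problems, labs, problem_code, test_code):
    """Share of patients with the given problem who have at least one matching lab test."""
    denominator = {p["patient"] for p in problems if p["problem"] == problem_code}
    tested = {lab["patient"] for lab in labs if lab["test"] == test_code}
    numerator = denominator & tested
    return len(numerator) / len(denominator) if denominator else 0.0


rate = measure_rate(problems, labs, "DIABETES", "HBA1C")
print(rate)  # 0.5: one of the two patients with the problem has the test on record
```

[The same problem-list and lab records that support direct care supply both the denominator and the numerator, which is the sense in which the measure is a secondary use of primary data.]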
And in 2008, just early looks at what we might be getting are remote
monitoring which would include such things as glucometer interfaces, device
interfacings from the home, remote consultations which may be a web visit,
could be telemedicine. We will do referrals and transfer of care for continuity
among providers which would include problem lists, reasons for transfer and
results of such a consultation.
Some definitely new personalized health care PHR work as we get more and
more mature on our consumer empowerment, 2006, 2007 and 2008 efforts. Public
health reporting of reportable infectious disease and direct communication to
labs, again repurposing as we can all those microbiology and laboratory results
that we had already used in 2006 and 2007 for other use cases and response
management which would include again more of the bioterrorism, mass casualty
event kinds of data.
In 2009, you can see a vast array of possible other use cases that we could
get to, and, you know, working very closely with the Office of the National
Coordinator and AHIC will ensure that we do in a Pareto kind of fashion those
standards which have the greatest impact on primary and secondary uses of data
collection. And I might imagine such things as clinical trials, clinical
devices. These will be things that we will be given in 2009.
Overlaying all of this is an immense amount of work on privacy and security
because privacy and security are foundational and cross cutting across every
one of these use cases. The great challenge that we have is that there is not a
of security standards that empower whatever regional variations may occur in
whole variety of specificity of what a patient might want released to whom and
in what situation. So we have to deal with all the aspects of security and
confidentiality from authentication to role-based access control to auditing to
consent management, secure transmission. And by October of this year, we will
have a first cut on all the standards that are necessary for nine different
classifications of this whole security framework that HITSP has developed in
conjunction with other organizations — the AHIC Working Group on Privacy and
Security and the HITSP Foundations Committee looking at all the best practices
for security management across many industries.
The next slide, just to show you the timeline on all these standards, you
can see 2006, 2007, 2008, 2009, 2010, how we go from AHIC having an idea about
meeting a standard for primary or secondary use to HITSP coming up with
interoperability specifications to those being recognized by the Secretary to
CCHIT incorporating them in its functional criteria for certification of
electronic health record and HIS systems. And, obviously, when we have the
recognition by the Secretary, that has implication because federal requisition
or federal purchasing of any type of system then will have to be compliant with
these standards. That’s really the definition of recognition of a standard
by the Secretary.
So you can see the timeline goes really from idea to complete vendor-based
adoption in about three years or so from idea to completion. And you can see
the litany of work that’s in process and the timeline for getting it all
The harmonization process, just so you know where we get these ideas and
where our work products come out, AHIC will prioritize primary and secondary
requirements for data standards and deliver use cases to HITSP which then turns
them into actions, actors and events, very detailed use cases understanding,
well, if a patient wants to do this, if a public health agency wants to do
that, if a doctor needs it for a particular purpose primary or secondary, who
are the actors, actions and events that are important and then what standards
exist or what gaps exist such that we, you know, will produce an
interoperability specification using as many existing standards and ask SDOs to
fill gaps where there aren’t existing standards.
This is a very regimented process that we go through with multiple layers
of review and public comment. We have created comment periods of one-month
duration at several points that enable a broad array of input from all
stakeholders on the appropriateness of the standards selected and their
implementability in the environment by vendors and institutions and certainly
comments from the public on such issues as privacy and security concerns.
So just briefly then on what we’ve done thus far and then I’ll go
through some detail on showing you how we’ve prepared for secondary use of
the data. On consumer empowerment, we have really worked with all of the
standards development organizations, SDOs that include ASTM, CAQH, CDC, Federal
Medication Terminology, HL7, Integrating the Healthcare Enterprise (IHE), NCPDP,
X12 and SNOMED to create a single document-based clinical summary that includes
the demographics, the medications, the allergies and certain aspects of patient
preference like advanced directives in an XML-structured vocabulary controlled
document that incorporates all these standards, really the best thinking from
every one of these organizations.
So that was a lot of effort to achieve that level of collaboration in a
singular interoperability specification which is now completely finished. Every
aspect of it is documented, on the web, been accepted by the Secretary and
every one of the implementation guides are finished by the standards
Now next slide, slide seven on harmonization, just describes some of the
efforts that we had to go through to bring all of this together and really take
a lot of the niche standards that had been developed in the past and bring
them together into one common framework. I won’t go through that detail,
but in essence we took the CCR and the CDA from ASTM and HL7 and brought
them together into one document, the CCD.
Biosurveillance –- now here’s a very good example of how these
standards that we have brought together for consumer empowerment and for EHR
can be used for secondary purposes. So this is, of course, an opportunity for
taking anonymized or pseudonymized data that is raw clinical data as might be
used in consumer empowerment in labs but using it for public health
surveillance and looking at the demographics of where events may occur,
symptoms may occur, lab and radiology results that may occur that point to a
specific disease process or event.
Next slide. The key take home is that we have repurposed for
biosurveillance the EHR laboratory result reporting we did for clinical care.
The lab to doctors EHR standard is the same for that clinical use as well as
the biosurveillance use. So we have worked with HL7 to create –
there’s really two kinds of standards, a messaging standard that uses
HL7 2.5.1 to go from a lab to a doctor’s EHR to transmit in real time data
from system to system, but as well the documents that I described that might be
used for consumer empowerment is a mechanism by which an XML vocabulary
controlled human readable and computable document containing lab data can be
Now for the EHR, we have had to work with HL7 to create HL7 2.5.1
implementation guides that are appropriate for all the AHIC use cases, EHR,
biosurveillance and consumer empowerment, and that ballot is currently active.
It required HL7 to actually amend their existing standards. That ballot closes
on August 4th. We do expect that that will be a successful ballot.
And so Secretary Leavitt knows this one is coming. We finished everything about
EHR except this one ballot, and therefore recognition and federal procurement
will take place likely in the March timeframe because of the delay in getting
this ballot done.
Next slide. Just to show you some detail of that laboratory message that
HL7 has now incorporated in this ballot that enabled this standard to be used
for primary and secondary purposes, we had, for example, to have a vocabulary
controlled institutional identifier to understand what institutions ordered the
test, what labs did the test. This really enables biosurveillance and public
health monitoring to localize test results to a regional lab or place, an
ordering physician, an ordering entity.
There are many aspects of the work flow around the lab beyond the simple
result which would include why was it ordered, was this an employment-related
illness. So if you want to look at such things as injuries that occur on the
job, or infections that may be – and there’s a comulant infection or
gathered because of a needle stick, that kind of data needs to be placed in the
laboratory order. The clinic name if the patient was admitted in the hospital,
what type or acuity of admission was it, how were they discharged into what
kind of care, long term acute care, home or other place, when was the service
delivered, what are the physicians involved in the care process, and then a
whole variety of controlled terminologies such as LOINC for what lab test was
ordered in vocabulary controlled detail, SNOMED for the nature of the lab
test and the reason for its order, UCUM for the unit of measure of the
laboratory result itself. And we’ve standardized those controlled
vocabularies across all the AHIC use cases.
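As a rough illustration of the coded laboratory result described here, the following Python sketch checks whether a result carries the coded fields that reuse for public health would depend on. The field names and the idea of a "required for reuse" list are assumptions made for illustration; they are not taken from the HL7 implementation guide or from HITSP documents.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabResult:
    # All field names below are hypothetical, chosen only for this sketch.
    test_code_loinc: Optional[str]          # which lab test was ordered (LOINC)
    value: Optional[float]                  # numeric result
    unit_ucum: Optional[str]                # unit of measure (UCUM)
    reason_code_snomed: Optional[str]       # reason for the order (SNOMED)
    ordering_institution_id: Optional[str]  # who ordered the test
    performing_lab_id: Optional[str]        # which lab did the test


# Assumed list of coded fields needed before a result can be reused.
REQUIRED_FOR_REUSE = [
    "test_code_loinc",
    "unit_ucum",
    "reason_code_snomed",
    "ordering_institution_id",
    "performing_lab_id",
]


def missing_for_reuse(result: LabResult) -> List[str]:
    """Return the names of required coded fields that are absent or empty."""
    return [
        name for name in REQUIRED_FOR_REUSE
        if getattr(result, name) in (None, "")
    ]
```

A result that fails this kind of check could still be reported back to the ordering physician, but it would be harder to localize or aggregate for surveillance.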
And then finally just to show you again, the very granular detail that HL7
has created an implementation guide that modifies individual data elements in
the PV1 and PV2 segments that empower the secondary uses of data for public
health, for research, for biosurveillance, for surveillance and those sorts of
things that would not have otherwise been included if this was just purely a
constrained message reporting a result back to an ordering physician in an EHR.
So next step, we continue to work with AHIC and, as I’ve said, that we
will finish up the ballot for the HL7 2.5.1 message August 4th. There
is one other message still outstanding from OASIS, a standards organization
that’s working on a measure for hospital resource availability. If in the
case of a mass casualty or a bioterrorism event you would like to know what
hospital beds are available in a given geographic location, this data standard
that reports on the nature of what a hospital’s capacity might be will be
done in the fall time frame, and we’ll also give that then to Secretary
Leavitt probably the first of October.
We’re working very closely with CCHIT to align their certification
processes with the generation of the standards by HITSP so that the vendors
will incorporate these standards, and we will therefore have products that
empower these secondary uses of data, of course, presuming privacy, security
and patient consent are achieved. But if we have the vendor products gathering
and transmitting the data according to HITSP standards, that’s truly going
to empower all the other uses of data public health and other agencies may
We are really focused, as I mentioned, on security and making sure we do
document consent, ensure auditing, ensure patient access to audits, all the
things that will make data transfers possible. Security standards are
absolutely requisite before the public can feel good about the data standards
that transmit their data to various entities. And, of course, finishing up our
additional 2007 use cases and awaiting the December delivery of 2008 new use
cases from AHIC.
So that’s the broad strategy, the broad overview of primary and
secondary uses of data, and a capsule summary of the work to date. And Floyd
will, of course, now show you in great detail our best thinking on how
we’ll measure quality, how we’ll gather data for secondary purposes
and some of the aspects of coordinating SDOs and filling gaps around them.
MR. REYNOLDS: John, thank you. Floyd?
DR. EISENBERG: Okay, what I’ll try to do, and there is a fair amount
of detail in some of these, and I’ll try to go through them fairly quickly
today so we look at some of the issues around policy, the data re-use
You’ll notice in my slides since last year when we talked about
secondary use in our HITSP Committee, we had a lot of feedback, which
apparently you did yesterday. But secondary that –- especially public
health users are not secondary users. They use the data primarily except it
didn’t originate for the purpose for public health.
So I have adopted the term re-use data. Others have suggested value-add,
but let me not get into all the – what you want to call this. Let’s
just say we’ve dealt with that in HITSP as well — look at some of the
sources of data quality measurement, re-use management and some of the next
steps. Obviously, the urgency from the Presidential Order, Executive Order for
promoting quality and efficient health care in federal government administered
or sponsored programs.
In HITSP as well as in IHE, there has been a lot of discussion about –
while we were waiting for the use case itself, about the re-use of data and
where data is reused. And in the Patient Care Coordination Committee at IHE, it
was requested that folks from the groups from the quality area as well as
public health as well as clinical trials get together to try to identify what
are some of the commonalities of their needs for data and what are some of the
differences. And in doing so, there was some initial analysis of there are a
lot of re-use data issues for financial analysis, biosurveillance, reporting of
infectious disease, as you’re seeing the next use cases come out, data
share for quality as well as research.
And today, coming from, as this committee is well aware, coming from
existing systems, there are many point-to-point interfaces to make that happen
and very complex.
So in order to more effectively deal with that, we actually came up with
three areas of data re-use, and they are either rule-driven reporting, meaning
a retrospective analysis of the data that exists to determine how well
something occurred or to determine if there is – what is the appropriate
cohort of patients to deal with in a trial or to identify those who have
certain adverse outcomes in – so rule-driven reporting.
Related to that is simple case reporting, identifying any single case with
an adverse event or who might fit into a category. And the third was a bit more
complex, and this we have not bitten that off yet as part of the use case from
ONC, and that is clinical decision support, the decision guidance concurrently
as a problem is identified for a patient to determine what that next step
should be as part of the direct clinical care provision that also applies to
the secondary use and that will be collected for measurement or for placement
into the cohort.
DR. OVERHAGE: Could you distinguish the first two before you go on. In
fact, I didn’t understand –-
DR. EISENBERG: Okay. Actually, these terms have driven and they may well be
very similar. The simple case reporting is for one case in the middle, and the
rule-driven reporting is the aggregate. But in many respects, it’s the
same rules that are driving the decisions. So it’s just the left side is
the aggregate, the middle is the individual, and that’s more retrospective
and concurrent with the decision guidance.
But all of these – these three different areas of need is what we
identified so that if there are data to be able to have the data input so that
the rule-driven reporting, the aggregate analysis could subscribe to the data
that are needed to the data being published so the reporting group knows what
they can subscribe to, and here in the individual case, what’s the case
submission and the case report, but very similar to the aggregate.
The decision guidance is more guideline related, and we have not really
approached that and what standards we would use. That is our next step after
this year’s effort – one of our next steps, and you’ll see there
are from the HITSP group we actually have three phases, and that is between
second and third phase.
But in the IHE group, also information coming from the collaborative for
performance measure integration into EHRs which is a collaborative of AMA, CMS
and NCQA as well as the IHE group, there were certain types of work flow that
were identified needed to determine a cohort and to manage that cohort around
And so starting with quality, to identify what are the criteria,
what’s the group, the site that we’re looking at, how do we identify
the cohorts so the patient or all patients meeting those criteria for inclusion
into that group, what are the exclusion criteria that take the patient
either out of the denominator if it’s a quality measure or out of the
numerator because for some reason, if they’re expected to receive a
medication but they’re unable to for medical reasons, what are those
reasons that remove them now from the study. And what are the reporting data,
how is it reviewed and fed back, how is it analyzed, is it mapped –-
actually, before analysis, aggregated and communicated.
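The cohort workflow just described (inclusion criteria, exclusions, then numerator and denominator) can be sketched roughly as follows. This is only an illustration under stated assumptions: the Patient fields are hypothetical, and the choice to drop patients with a documented medical reason from the measured population is one reading of the description, not a rule taken from any specific measure.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Patient:
    # Hypothetical fields for this sketch only.
    patient_id: str
    meets_inclusion: bool        # e.g. has the qualifying diagnosis
    denominator_exclusion: bool  # e.g. discharged to hospice care
    received_intervention: bool  # e.g. medication prescribed at discharge
    medical_exception: bool      # documented medical reason for not receiving it


def measure_counts(patients: Iterable[Patient]) -> Tuple[int, int]:
    """Return (numerator, denominator) for a simple process measure."""
    # Step 1: identify the cohort and apply denominator exclusions.
    cohort = [
        p for p in patients
        if p.meets_inclusion and not p.denominator_exclusion
    ]
    # Step 2: remove patients who did not receive the intervention
    # for a documented medical reason (assumed handling).
    measured = [
        p for p in cohort
        if p.received_intervention or not p.medical_exception
    ]
    # Step 3: count those who received the intervention.
    numerator = sum(1 for p in measured if p.received_intervention)
    return numerator, len(measured)
```

The same filtering steps would apply whether the cohort is being assembled for a quality measure, a research study, or a public health report; only the criteria change.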
So we looked at those categories, and what was agreed by the groups that
were present at the IHE Committee was that the quality strategies, research
strategies and public health strategies pretty much aligned around with similar
workflow to identify cohorts of patients out of a set of data from a
And there was a white paper requested that we will be putting together, and
there was some delay in getting that done as various groups in public health
asked to have additional input. But so far all of them mapped to the same
Also looking at to measure this population what are the data elements.
There has been and this – I called this AHIC. I apologize. I wasn’t
able to change the slide. It’s AHRQ had asked NQF, the National Quality
Forum, to create a health care information technology expert panel to determine
data elements to manage for the quality use case, the AQA, Ambulatory Quality
Alliance and Hospital Quality Alliance measures the data elements required to
get to the high priority measures. And these data elements are listed here.
There’s some liberty I took from what actually came from the HITEP Panel
because I added some additional terms to that, but they’re the same data
elements. So the definition of the measure, demographics about the patient,
results which might include laboratory results, imaging results, and, in this
case, they may be quantitative or qualitative.
An example is a left ventricular systolic dysfunction measured by ejection
fraction. The quantitative is an ejection fraction numerical greater than or
less than 40. The qualitative is a set of terms that indicates moderate to
severe dysfunction that may be allowed by different measures, and I’ll
show a slide also not meant for reading but what is described in some of the
measures to see how we’d be able to actually collect those data.
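To make the quantitative versus qualitative distinction concrete, here is a minimal Python sketch. The threshold of 40 comes from the example above; treating a value below 40 as dysfunction, and the particular list of qualitative terms, are assumptions for illustration and would in practice come from the measure definition.

```python
from typing import Optional

# Hypothetical set of qualitative terms a measure might accept.
QUALITATIVE_LVSD_TERMS = {
    "moderate dysfunction",
    "moderately severe dysfunction",
    "severe dysfunction",
}


def has_lvsd(
    ejection_fraction: Optional[float] = None,
    description: Optional[str] = None,
    threshold: float = 40.0,
) -> Optional[bool]:
    """Decide whether a result indicates left ventricular systolic dysfunction.

    Prefers the quantitative ejection fraction when present, falls back to
    a qualitative description, and returns None when neither is available.
    """
    if ejection_fraction is not None:
        return ejection_fraction < threshold
    if description is not None:
        return description.strip().lower() in QUALITATIVE_LVSD_TERMS
    return None
```

Returning None rather than False when no data are present keeps "not documented" separate from "documented as normal", which matters when deciding who is in the cohort.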
Substance administration, which includes medication, oxygen, other
substance administration. That term being used because it was the term used by
HL7 for medications and other substance. Procedures performed, location, and
location is specifically important here with respect to not just where is the
patient, but where are they going next. So if it’s a hospital discharge,
if the patient is going to hospice care, that becomes an exclusion for the
measure. So location was important to be able to calculate who is and who is
not in the measure.
Events – clinical observations from findings, problems which include
allergies but not limited to it, diagnoses and history. And as we looked at
these in the HITSP Committee, we had additional data discussions over
procedures and diagnostic tests. Depending on the study on the measure, we
could be looking at ordered, tests that are performed, those that have results
and will require a result and also require procedure, date and time especially
with respect to surgical measures that have a pre-operative antibiotic within a
time frame related to an incision. So the trigger events, given that example,
for date and time.
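The date and time dependency for a surgical measure can be sketched as a simple window check. The 60-minute default below is a placeholder assumption, not a value stated in the testimony; a real measure would supply its own window.

```python
from datetime import datetime, timedelta


def antibiotic_within_window(
    antibiotic_time: datetime,
    incision_time: datetime,
    window: timedelta = timedelta(minutes=60),  # assumed placeholder window
) -> bool:
    """True if the antibiotic was given before the incision and within the window."""
    elapsed = incision_time - antibiotic_time
    return timedelta(0) <= elapsed <= window
```

A check like this only works if both the administration time and the incision time are captured as discrete, time-stamped data elements.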
Lab information, the result value as well as the order because different
measures – some look for the order, some look for the result. Symptom
information, physical findings and observations, vital signs, physical exams,
med allergies, true or anticipated side effects, diagnoses, and what types of
diagnoses, principal, admitting, chronic conditions, acute, and also some of
the information especially related to exclusions may be coming from family
history, patient past history which could include medical or surgical, social
history, allergies, medications, existence, orders and trigger events.
Basically, the record – in other words, to look at measures and to
look at secondary or re-use of data to be able to find a cohort in a
population, there’s very little that’s missing. So that’s a
challenge because is there – now our challenge is, is there a standard to
get to all of these elements, each of these elements so that we could define
them for dealing with this.
So we started with –- we looked at the list of 60 measures, 30 of
which had a lot of detail in it, and I’m going to show one example here
where we can find some good standards and some where it’s a little bit
problematic. In this case, it’s the Ace Inhibitor or Angiotensin Receptor
Blocker, and this could be – I’m going to refer to the HQA measure,
so discharge of a patient with left ventricular systolic dysfunction. Do they
have a prescription for one of these medications.
And so it’s are they are on the medication, do they have dysfunction
— systolic dysfunction. Do they have allergies, or do they have other medical
reasons. And in terms of do they have an MI, which is actually the first criterion,
that was well defined. That’s ICD-9. It certainly could be SNOMED. No
problem with that.
Did they have a test for left ventricular dysfunction, and we certainly can
look at that as far as the standard. But if fraction of the record from joint
commission and our problem was how do we take this to is there a SNOMED code
that indicates this, that, that, that and all these components, and there very
likely is. Part of what we will do is in HITSP is we’re taking five
measures down to that very detailed analysis to say are there really SNOMED
codes that exist here because we’re looking at SNOMED as the method to
pull those data. But if in fact – and if there aren’t, then recommend
there needs to be.
But more importantly, we’ll be referring to the HITEP Expert Panel and
back to standard for measurement developers that these kinds of descriptions
need to be codified more in the terminology if we are to be able to get this
information out of a problem list and elsewhere in the record, whether
we’re using parsing technologies or we’re taking discrete codified
data. So that becomes problematic.
Another one was heart failure patients with documentation that their
caregivers gave them – they or their caregivers were given written
discharge instructions on these six items. The six items are listed in bold.
There were no SNOMED codes. There were no other codes. That was the end of the
measure which we then went through SNOMED –- or I went through SNOMED to
see is there some terminology, is there some term or code that would apply to
this. And if it were codified out of the record, could I say, okay, we met the
Well, SNOMED played really well. There was a term. Very likely, especially
– and I can guarantee the first one came from some nursing terminology
that is now in SNOMED. I suspect that most of these others did as well, but not
the same nursing terminology which led to a discussion in our group that we
have public domain nursing terminologies and the two public domain ones that do
not cost – that have cost to providers, we should put them both into our
standard spec. So they were Clinical Care Classification and Omaha.
And what we found was both of them, as others, are also in SNOMED. And
after just a small bit of discussion, we came up with a consensus statement.
Again, you don’t have to read it all here. But for the purpose of HITSP
and for purposes of interoperability with respect to the ONC quality use case,
mapping is required through SNOMED CT. So that will be the standard at least
we’re recommending. We’ll see what the comment period comes back for
the terms for quality and secondary or data re-use.
For local interface — user interface terminology, any of the preferred
nursing terminologies are fine as long as for the data re-use they are mapped
to SNOMED, and that was the consensus we came to with agreement from all the
nursing groups on the terminology groups in the discussion.
And briefly – I’m not going to go through the quality use case,
but basically as we look at the hospital-based care and we look at the clinical
care, one thing that’s important is in each of the scenario flows –
and the reason I included this – is the patient level data is identifiable
in most of these situations until it’s aggregated. And I believe
there’s a good reason for that, and that is to identify the entire
population, whether it’s for quality, whether it’s for any cohort
determination, but whether it’s for quality, for clinical trials or public
health, unless you know your entire population and the full data set, you could
be eliminating patients from the set if you require consent to get into that,
to get them into your population analysis, and that could be skewing the view
of the population.
So I think it is important that we have the full patient level identifiable
set to do the initial analysis with appropriate data stewardship, but to do
that analysis and we see the same on the clinician ambulatory side that was
recommended as identifiable patient-level data.
And for provisioning, it’s after the aggregation and for getting back
to –- whether it’s clinical trials, quality or otherwise, getting
back to the individual patient that anonymization or re-identification would
need to be dealt with or pseudo-anonymization. But in the initial analysis, my
suggestion is that this –- it include the entire population of data.
I’ll just skip through this.
One of the other reasons was another piece of re-use case, and that was
augmenting clinical information. When information either gets to a health
information exchange or some regional analyzer or third party analyzer,
there’s often some missing data that may be present in written records
that are not scanned, that are not available to the analyzer.
And the use case, for many good reasons, gives the opportunity for the care
provider to augment it and not make it up. I don’t like the term augment
because it makes it sound like it’s going to be fabricated. That’s
not the intent. The intent is the data do exist, but they’re not in
electronic format, and it does imply some auditing will be required to make
sure those data that do exist really did exist when they’re reported. But
if it’s to be augmented, this has to go back to who was that patient so
the local site can identify the right patient and augment the data.
Basically, for re-use management, the information required from sources,
and just a summary, there are quantitative and qualitative sources. Some of it
will come from freeform text. Some of it will be codified. Some will be
unavailable, and that’s where augmentation will come in.
Text parsing is one way to get to it. Terminology mapping is another. But
hybrid methodology, I think, will remain a long term requirement for quality or
public health reporting as well. But basically of the three strategies, which I
did not get into commercial use –- secondary use of data, but for
research, public health and quality, it’s pretty much the same
As far as anonymization, just briefly benefits or privacy protection, the
limitations is to truly anonymize, you have a limited data set. For the
biosurveillance use case last year, the intent was just to find a trend that
there is a new syndrome out there, that there is a situation with influenza,
with anthrax, and it didn’t matter who the patient was, and it even
statistically – it could be adjusted for the fact that some patients might
be reported more than once because they were seen in more than one facility
because we were looking at aggregate population trends. Once we get to quality
and public health at the local level, it has to get back to that patient, and
looking for real quality measures and identification of cohorts for health care
operational management, more detail is needed.
Pseudo-anonymization does add some privacy protection but also increases
some burden and cost for creation of pseudonym, re-identification and
re-identification through where this pseudo-anonymization occurred — the local
level, regional and larger regional area, or all three. And because we are
asked to be architecture neutral, it could actually be occurring in all three
depending on the local implementation. So —
MR. REYNOLDS: Could you give us a little more definition on what
DR. EISENBERG: Pseudo-anonymization – and actually you’ll be
hearing a lot more of that from Lori Fourquet just after lunch, that means
providing a pseudonym for the patient so that the patient cannot be identified
by the receiver of that information on the other end.
Now there are a lot of ways to take minimal data and know who the patient
was, whatever that pseudonym is, but that’s the basic premise is to
provide a pseudonym so the direct patient identifiers are not accompanying the
data. Then it becomes issues of do you provide just the age, just the year of
birth, just the month of birth, and depending on how high level you make that,
it becomes more difficult to determine how well you’re doing on a quality
measure in the first four months of the year if you’re really not sure who
in your population is what age at what time. And depending on how high a level
you take that, you’re limiting your ability to use the data.
But pseudo-anonymization is basically providing an alternate ID, and many
of us have done that in local settings in simple ways, but not using standards
for years for VIPs in our own hospitals. That’s been going on, but
we’re looking at it from a standard way. Okay.
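One common way to produce an alternate ID of the kind described here is a keyed hash, sketched below in Python. This is an illustrative technique chosen for the example, not the method HITSP or any standard named in this testimony specifies; the function names and the generalization levels are assumptions.

```python
import hashlib
import hmac
from datetime import date


def pseudonym(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a patient identifier.

    The same identifier and key always give the same pseudonym, so records
    can be linked, but the receiver cannot recover the identifier without
    the key held by the originating site.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def generalize_birth_date(birth_date: date, level: str) -> str:
    """Reduce a birth date to year or year-and-month precision."""
    if level == "year":
        return f"{birth_date.year:04d}"
    if level == "month":
        return f"{birth_date.year:04d}-{birth_date.month:02d}"
    raise ValueError("level must be 'year' or 'month'")
```

The trade-off described above shows up directly: the coarser the birth date, the harder it is to decide who belongs in an age-based measure at a given time.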
As far as some of the detail of what we’re looking at now for
collecting some of the data, it’s to be looking for in using some of the
profiles from IHE, the Retrieve Form for Data Capture, to be able to identify
the measure definition, collect the data through a form filler where that form
filler is unable to get the data electronically, how a human may be able to add
additional data to complete the report.
And this actually was one of the optional items from the biosurveillance
use case interoperability specification for public health reporting which we
know we will be able to re-use for the use case that we expect in December on
public health reporting. But it could also apply to quality measures as well.
We also see for the query for existing data – QED which is a profile
out for public comment now from IHE to be able to get data from different
repositories as needed for quality measures in order to get information for
display. The RID profile from IHE which would be displayed for a human to read
and abstract but at least to get data on an analysis level, to use XDS
registries to be able to get data out of existing documents out of the CDA
structure, and so to re-use many of the existing profiles that already exist
that are already out there to be able to get the data for analysis, ask for
additional information where necessary and get it back to the agency.
What we’re not tackling right now is how that measure is defined by a
measurement developer in a standard format because of work ongoing, and
I’ll get to that in a minute, or how the report back to the measure
requester will look because there are different ways that happens today but not
a definite standard. So that’s still in development.
And actually so our near term is to leverage IHE, also message-based
collection of data, and those will be part of the constructs we will be doing
over the next two months and also to identify a standard export and import
model. What we’ll be doing is looking at efforts from the collaborative
for performance measure integration with EHRs which needs a shorter name which
is looking at an XML schema for expression as well as identifying standard
There’s currently HL7 structured report activity ongoing which should
soon be going to ballot as well as work being done for expression of reportable
diseases and hospital-associated infections from CDC with HL7 as well.
So hopefully it’s given that these are ongoing now. These will be
items that we’ll be able to tackle in 2008 for the rest of the use case.
So what we’re looking at is to re-use and look at the anonymize if it
applies and where it applies, to management document sharing. These are all
packages –- transaction packages from last year. There are some new ones
we’ll look at – patient demographic query, query for existing data as
a new package that we’ll be looking at this year.
Patient-level export of quality data is a conglomeration of several profiles,
and we’ll be looking at that. That’s also new this year, and retrieve form
for data capture, how that would apply for quality. And so there are a couple
of new ones. I won’t go into all the details of these.
I thought I had another slide – I guess not. Okay. That’s the
overview of our current activities, and I’d be happy to take questions or
MR. REYNOLDS: Good. Thanks to both of you. John, you still on there with
DR. HALAMKA: I am indeed.
MR. REYNOLDS: Okay, good. Well, I’m going to open up to questions.
Justine has one, and then –-
DR. CARR: Yes, thanks very much, John and Floyd. I have a question, and
maybe it’s very fundamental. So apologies if it’s very concrete. But
what you’re mapping here is how we get quality information for quality
reporting, how we get clinical information for quality reporting. And so
you’ve demonstrated that we have a crosswalk of how data elements could be
standardized and transmitted.
And so my question is on the other side, as we’re building electronic
health records, is it – how much will data entry by the clinician change
in order to facilitate this crosswalk? In other words, can things go on just
the way they do today and somewhere behind the scenes all this crosswalk will
happen, or will clinical notes become more checklists and bounded in writing a
DR. HALAMKA: Well, this is an excellent question, and let me answer it in a
couple of ways. Today at Beth Israel Deaconess, we have problem list management
in our ambulatory care areas that is a combination of free text entry and
choosing terminology from a controlled vocabulary.
Well, that means that about 70 percent of our problem lists end up being
free text which is non-computable from a quality standpoint. So, you know, what
are your options.
On the one hand, one could use diagnostic codes which are billing and
administrative codes as a proxy for a problem list. But wait a minute,
that’s not exactly the same because a diagnostic code is an historical
item whereas a problem list is a snapshot in time of all conditions you may
have. So they’re different.
So what the implication may be is that we are going to now cause doctors to
use controlled terminologies that may be nursing terminologies like what’s
called the CCC, a patient classification of disease state, or SNOMED and not
give them the option of just typing in free text, “I think the patient has a
headache today.” That will be a doctor change.
Allergies, also in the past, have been a mixture of codified and
non-codified data. Those will have to be cleaned up, and as Justine has said,
one might imagine that certain aspects of text-based data such as today I have
left ventricular ejection fraction in a PDF that is a non-computable
text-based data element will now require a structured data element in addition
to what may be a free text note describing an overall impression.
So I think we’ll see a staged implementation over time of cleaning up
problem lists, allergies and making notes combinations of structured data
entry plus free text.
DR. CARR: Thank you.
DR. EISENBERG: And just if I can add to that, I think it will be a
combination, and I think you will see different implementations of those
combinations at different sites. Some will have different ways of getting to
it. There was a Gartner report recently, I believe it’s December, maybe
January review of an implementation of a clinical system and that exact
problem, that the problem lists were free text and the doctors didn’t want
to use the diagnoses, and they used a proprietary solution to be able to get
from what they wanted to say to what was important for a measurement, and that
was IMO which is another company which I have no relationship to.
But just – there are other mechanisms to do that, to get from the free
text, or if they want to answer, to the measurement criteria that we need.
Another issue is how does one answer an exclusion. Today, it may be in some
free text notes to say that this patient can’t tolerate this drug, but
it’s not on an allergy list because there are no allergies. But there may
be implementation mechanisms of making it part of the order set on admission
for that problem that, rather than a computable order, it becomes a documented
piece out of the order set to say I’m not ordering this for this reason.
So there are different ways to implement components of this. But it will
change how some of the electronic records will turn out.
DR. CARR: Just one other part of that. As you know, we’ll be in a
hybrid stage for a while. So you think there will be a larger role for this
scanning technology, sort of word identification and written notes that then
can trigger this structured data element?
DR. EISENBERG: I’m interested in John’s comment on that. But I
do, and I think there are challenges around that to make sure that that is
– that it needs to be sure it’s valid and it’s accurate in terms
of collecting this data, especially as we don’t just use those for
measurement, but we use the output of that scanning and parsing as part of the
decision support we want to do. But I do think there will be more of that as
time – as we –
DR. HALAMKA: And I would just comment this is very hard to do. And that is,
to give you an example, is that if you build a natural language processing
system and an optical character recognition system that will parse free text
data and try to codify it in the structure, you have to really be careful that
you don’t run into the following.
Imagine that I dictated a note that states, “The patient uses alcohol
swabs to disinfect their skin before injecting insulin.” Well, the natural
language parser sees, “The patient uses alcohol,” and has now
identified them as a candidate for an intervention for, say, Alcoholics
DR. EISENBERG: Especially since he injects alcohol. No, that is a challenge
and it’s not only that, but what is negation and how does parsing handle
that. There are a lot of challenges around that which actually suggest that
there needs to be validation on what that parsing comes out – what the
results of that is, that this is from the clinician. This is what they meant.
And the last thing we want to do, though, is create extra workload on the
clinician who’s trying to enter the data to make sure we’ve
identified it correctly.
So it is a challenge. It’s going to be an art to see how this works.
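The parsing pitfall described above can be sketched in a few lines. This is a minimal illustration only, not any system discussed in the testimony: the trigger phrase, the list of non-beverage words and the negation cues are all hypothetical, and a real natural language processing pipeline would need far more than substring checks, plus the clinician validation the speakers call for.

```python
import re

# Hypothetical trigger phrase and cue lists, chosen only for illustration.
TRIGGER = re.compile(r"\buses alcohol\b", re.IGNORECASE)
NON_BEVERAGE_FOLLOWERS = ("swab", "wipe", "pad", "prep")
NEGATION_CUES = ("denies", "no history of", "does not")


def naive_flag(note: str) -> bool:
    """Flag a note whenever the trigger phrase appears anywhere in it."""
    return TRIGGER.search(note) is not None


def context_aware_flag(note: str) -> bool:
    """Flag a note only if the trigger is not explained away by nearby words."""
    for match in TRIGGER.finditer(note):
        following = note[match.end():match.end() + 20].lower()
        preceding = note[max(0, match.start() - 40):match.start()].lower()
        # "uses alcohol swabs" refers to a supply, not to drinking.
        if any(word in following for word in NON_BEVERAGE_FOLLOWERS):
            continue
        # "denies that he uses alcohol" is a negated statement.
        if any(cue in preceding for cue in NEGATION_CUES):
            continue
        return True
    return False


swab_note = ("The patient uses alcohol swabs to disinfect their skin "
             "before injecting insulin.")
drinking_note = "The patient uses alcohol daily."
```

On the dictated example, `naive_flag(swab_note)` fires while `context_aware_flag(swab_note)` does not; both fire on `drinking_note`. Even the second function is only a heuristic, which is why its output would still need validation before it feeds measurement or decision support.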
MR. REYNOLDS: Simon?
DR. COHN: Well, Floyd, thank you for reminding me how difficult all of this
is, and I’m actually also reminded, having – as you all know, I work
for Kaiser Permanente, and as a managed care organization, we’ve been
living with HEDIS and NCQA measures for many years. And I am reminded that hybrid
manual chart abstraction will probably have a major role going into the future.
I guess I’m also reminded why for some of these measures I guess the
interim or near term solution is going to be adding, I think, it’s CPT
Category II codes and G codes and all of this to the actual bill to get this
stuff going which is what I understand is the sort of the near term strategy
versus the longer term strategy which is, I think, being perceived here.
Let me ask you maybe a more fundamental – I’ve got one or two
sort of more fundamental questions here. And, you know, let me think in which
order I want to ask them because I really have sort of two.
One is, and John, you may want to comment on this also, that yesterday we
heard testimony describing the quality paradigm as terribly broken in the sense
that we spend a tremendous amount of our time doing quality reporting, quality
measurement that doesn’t somehow connect in well with quality improvement.
And obviously I see a very sophisticated and detailed review getting the
measurement, and I think that’s what we’ve seen. I do know what
I’ve seen at a high level AHIC quality use case somehow bringing this back
into quality improvement, though, some of this is sort of escaping me in terms
of our view of how we get this into a quality improvement paradigm. Maybe can
you comment or enlighten me on how this actually really fits together.
DR. EISENBERG: Well, I’m actually sure there are others here who could
comment very well. But one concern I’ve had and in fact am presenting to
the HITSP panel on Monday a couple of questions came up as we’re doing all
of this effort to show, as in the example, that someone checked all those boxes
or a box that indicated all that education was given to the patient. We have no
idea if the patient understood it or if it was followed up, and the patient
actually did what was necessary. So what are we – what is the purpose of
the effort to collect data to prove that a health care provider did X if X
doesn’t really improve the patient’s care and doesn’t change the
There was a review in December, I believe it was in JAMA with an editorial
by Susan Horn that there was very minimal improvement in mortality, and I
believe length of stay related to good compliance with quality measures. There
was a little bit, but not a lot. So are we looking at the right measures.
And I think there are two answers. One is sometimes the measure has a
different – we’re looking at one outcome that the measure
wasn’t intended to reach, so we have to make sure we look at the right
outcome to associate with the measure.
But the other is, I think, the measure – we need growth in the measure
area to be sure we’re really looking at improvement in health care and the
quality of care delivery, not just the process piece itself.
And A1C is a nice surrogate for good quality care, and it’s a nice
process step that I can look at. But some of these elements are not quite as
clearly well related. And so the more we can get to good surrogates for an
outcome, the better. But I think we have some limit yet.
DR. HALAMKA: And I completely concur. I think of quality measures, there
are two kinds – process measures and outcome measures. A process measure
suggests a patient with congestive heart failure was given an ACE inhibitor
upon discharge. An outcome is they didn’t return to the hospital five
times in 2007 like they did in 2006, and I think the challenge with this whole
quality measurement schema is the measures themselves are going to change over
time because they may go from process measures, as Floyd has said, to outcome
measures in the future.
So HITSP has to come up with a standard that will allow us to measure
quality regardless of how the measure changes. As long as you have data around,
well, problems, meds, visits, you know, these kinds of things, you’re
good. And this is why I’m slightly reluctant to say, oh, here is a HITSP
report and here is 100 best quality measures in the world, and we’re going
to have standards for those.
I’d rather break the quality measures into their atomic data elements
and say we’re supporting problems, meds, allergies, visits, history, and
you do with that data whatever you think is best to measure quality.
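The idea of standardizing the atomic data elements rather than the measures can be sketched as follows. This is a minimal sketch under stated assumptions: the `PatientRecord` fields and both measure functions are hypothetical names invented here, and the two measures simply mirror the process and outcome examples given in the testimony.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical container for a few atomic data elements."""
    patient_id: str
    problems: set = field(default_factory=set)
    discharge_meds: set = field(default_factory=set)
    admissions_by_year: dict = field(default_factory=dict)


def _chf_patients(records):
    return [r for r in records if "congestive heart failure" in r.problems]


def ace_inhibitor_at_discharge_rate(records):
    """Process measure: share of CHF patients discharged on an ACE inhibitor."""
    chf = _chf_patients(records)
    if not chf:
        return None
    treated = sum(1 for r in chf if "ACE inhibitor" in r.discharge_meds)
    return treated / len(chf)


def fewer_admissions_rate(records, earlier_year, later_year):
    """Outcome measure: share of CHF patients admitted less often in later_year."""
    chf = _chf_patients(records)
    if not chf:
        return None
    improved = sum(
        1 for r in chf
        if r.admissions_by_year.get(later_year, 0)
        < r.admissions_by_year.get(earlier_year, 0)
    )
    return improved / len(chf)


records = [
    PatientRecord("a", {"congestive heart failure"}, {"ACE inhibitor"},
                  {2006: 5, 2007: 1}),
    PatientRecord("b", {"congestive heart failure"}, set(),
                  {2006: 2, 2007: 2}),
    PatientRecord("c", {"diabetes"}, set(), {2006: 0, 2007: 0}),
]
```

Both functions read the same records, so replacing a process measure with an outcome measure changes only the function, not the data that has to be captured and exchanged.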
DR. EISENBERG: And that’s basically the approach we’re taking in
the Technical Committee for that reason. But I think being able to collect that
information on a broader scale than just sampling will enable us to identify
what are the quality measures that really do lead to outcomes because right now
we’re dealing with sample populations and limited data to actually find
the evidence for outcome. So I think it will help, but it’s going to take
DR. COHN: This is actually not really a follow on, and I’ll pause on
this issue. If you want to do that one, and then I’ll ask the other
DR. TANG: I mean, I think you had a very good point, Simon. One suggestion
for the group, just like Marc Overhage asked you, what does rule based
reporting mean, and just like we dealt with the term secondary use. As a
suggestion, it might be useful just to make it simple and call it population
reporting, case reporting, and concurrent decision support. It just helps in
So related to that, it’s ironic and I don’t think you can change
this overnight, but the question that Simon is asking, you said that we’re
not working on the concurrent decision support part now. Ironically, it’s almost
like we should do that first because that was the goal and then figure out how
to report the results of that, rather than report on things actually that
don’t target the outcome and then figure out how to make it happen.
In some sense, there should be almost a paradox, but that’s not
something we can figure out tonight.
DR. EISENBERG: Well, I fully agree. I think the issue was given the time
frame, what’s the task that can be done in the near term. But I think
identifying the standards for the terms that could be used routinely will help
the front end game from the back end rather than starting out with the clinical
decision support. So knowing what it is you’re looking for helps the
decision support even though what you’re looking for isn’t quite
robust enough to show the right outcomes, but at least it’s closer.
So I don’t think there’s a problem with that, and they’re
concurrent efforts right now about how to do the concurrent measures. In the
quality domain that was sponsored by American College of Cardiology and
American Heart Association along with NHINS, and their measures are all
concurrent. So it was an interesting discussion to say I don’t just want
the problem that’s well defined; I want the problem that I think is going
on now as you say or are admitted to the hospital, and that’s what I want
the decision support to be based on.
And it could be that that’s not what you end up with as your final
diagnosis or the final reason for being in the hospital. So your population
cohort may be different on a concurrent versus a retrospective measure. And how
to resolve that has been in discussion, but it’s really been a time frame
issue as to how do we get it done.
MR. REYNOLDS: Okay, you’ve got Simon and then Kevin and then Paul, and
then we’ll break for lunch.
DR. COHN: Okay, I’m standing between you and lunch. This is bad. Just
another question and I apologize. You may not be the right one to ask. I
probably should be asking this to Kristen or Erin around the quality use case.
As I’m looking at obviously all the diagram and all of the data being
shipped and anonymized and really sent from provider environments or health
plan environments to sort of central entities for processing which is a lot of
what this whole diagram is all about in the quality use case, I was actually
reminded of a presentation harking back to the EENIA(?) sessions that we sat in
on a couple of weeks ago where they, at least in Europe and I believe it was in
England, and they were talking about models where effectively queries were sent
to the actual holders of the data with the idea being that they got responses
back on those queries.
Now, of course, England is a different place than the United States. They
likely have more comprehensive data stores. But recognizing that we do live in
a decentralized environment, does all of the work that you’re doing
contemplate a situation where, rather than all of the data going to a central
environment or to a data steward that instead it may – the query may go
out to 40 or 50 or 100 different environments or actual holders of data for
them to run the queries locally, come back with the results that go centrally,
or is that not contemplated in any of the models that we’re seeing.
DR. HALAMKA: Let me start with that one, and that is what HITSP tried to do
is not impose architecture. Anything that we do should work in all
architectures whether centralized or decentralized or federated. So in the
State of Massachusetts, everything we do is very decentralized and federated.
The approach we’re taking to quality measurement is exactly what you
describe. Let’s imagine the chief executive officer of the physicians
organization on a Tuesday night at midnight says I wonder what our diabetes
care is like. Well, it turns out we have 500 physicians, and some are in the
central EMR, but many have private EMRs. We’re working with eClinicalWorks
to enable the exact kind of distributed query you describe that, oh, at
midnight here comes the query, how many diabetics have a hemoglobin A1c less
than seven, and using the kind of methodologies that Floyd has described from a
standards perspective that within an EMR the vendor would do a computation and
report back, oh, there are six out of 100 patients that fall outside this
range. The person in the central quality office would not see patient
identified data, but the results by practice of many, many EMRs reporting back
in answer to the query. And so I think the answer is we’ll support both
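The distributed query described here can be sketched roughly as below. It is a hedged illustration, not the HITSP construct or any vendor's interface: `run_local_query`, `distributed_query`, the record fields and the practice names are all hypothetical. The point it shows is that each site computes against its own records and only aggregate counts travel back to the central office.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregateResult:
    """What a practice reports back: counts only, no patient identifiers."""
    practice: str
    diabetics: int
    outside_range: int


def run_local_query(practice, patients, threshold=7.0):
    """Runs inside one practice's EMR against its own patient records."""
    diabetics = [p for p in patients if p["diabetic"]]
    outside = sum(1 for p in diabetics if p["a1c"] >= threshold)
    return AggregateResult(practice, len(diabetics), outside)


def distributed_query(sites, threshold=7.0):
    """Central office: send the same question to every site, collect counts."""
    return [
        run_local_query(name, patients, threshold)
        for name, patients in sites.items()
    ]


# Hypothetical local data stores; in practice these never leave each site.
sites = {
    "practice_a": [
        {"id": "a1", "diabetic": True, "a1c": 6.4},
        {"id": "a2", "diabetic": True, "a1c": 8.1},
        {"id": "a3", "diabetic": False, "a1c": 5.2},
    ],
    "practice_b": [
        {"id": "b1", "diabetic": True, "a1c": 6.9},
    ],
}

results = distributed_query(sites)
```

Here the function call stands in for whatever transport a real deployment would use. Very small counts can still identify individuals, so a real implementation would also need a policy for suppressing small cells.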
DR. EISENBERG: Yes, and it really was intended to be architecture neutral.
If you look on this diagram, that’s what the query for existing data would
be a very good standard to manage that. So that’s exactly what we’re
MR. REYNOLDS: Thank you. Okay, Kevin.
MR. VIGILANTE: Thanks. That was a great presentation, I thought, that
really illustrates the challenges in collecting data and identifying data for
quality measurement and then with the added challenge, of course, that measures
are going to evolve, and, you know, John is right about having more itemized
data at our disposal.
I think what I heard at the end of the day is that you basically have to
collect everything or close to it – a lot of data. And here’s my
question, and it may reveal my ignorance, but it also may help me understand
our scope a little bit.
So you have all this data about all these patients. They’re a really
rich data source. And if you’re a big institution like a Kaiser or other
big system, you have a ton of data on a lot of people under the rubric of
quality measurement, which is part of, you know, the treatment, payment and
operations covered by HIPAA. And so we kind of know what to do with that and
how to handle that and what the guidelines are.
But now you have all this data, and somebody comes and says to you –
and there’s, say, two scenarios. One is a medical device company that
makes stents, and they want to see the sort of outcomes of drug-eluting stents
versus bare metal stents, and they’re willing to pay for it. Or somebody
else is doing research, say, on, I don’t know, using bicarbonate
administration in folks with ketoacidosis, and they’re not going to pay
for it, but they’re doing research.
And it seems to me that that’s where it gets interesting for us
because now we’re treading – you’re talking about data that was
collected under the rubric of TPO for HIPAA which is now migrating into an
environment that is not so well defined.
And the question is – so that’s a secondary use of that secondary
data that was – and then there could be a tertiary user as this thing,
stuff keeps getting so – and I guess – is that really where we need
to be focusing on that margin or on those hand offs that are just not well
defined yet and vague. Is that the – and that’s a question for the
group. I mean, I don’t know.
DR. EISENBERG: Well, actually, one of the questions that I had posed to me
a couple of months ago was from some pharmaceutical groups that were discussing
that to set up a clinical trial costs money at any site. So they would like to know
where the greatest population at risk for whatever it is they’re trying to
treat with their medication or their stent – where those people exist so
they only set up their studies at the highest volume sites.
Now at that point they don’t need to know who those people are, but
they – who the patients are until it’s time to really try to set up a
trial and find an investigator. But I do think that’s something that needs
to be addressed as – is that information, or how do we handle that
information and that access to the information because I know drug companies
and manufacturers of stents and all that do want to identify those sites.
And perhaps if it’s post-marketing surveillance looking at
pharmacovigilance, how much of that is part of operation of health care that
should be going on, and how much of it is commercialism which also should go
on, but maybe different security around it. So I think it needs to be
addressed, and hopefully the secondary use group will be looking at that
because we discuss it in HITSP all the time, and I have to get us back off
tangent to let’s look at the standards. But it comes up almost every
MR. REYNOLDS: Paul, you want to ask yours, and Kevin, I think it’s
absolutely the question on the table. That’s what our continuing
discussion is how do we – because I think your example was excellent in
that the first reason it was given out, it was given out somewhat de-identified
or whatever term we want to use for it.
MR. VIGILANTE: Well defined.
MR. REYNOLDS: But as soon as they get to de-identified, then it’s
really not helpful until everybody can go that next step. And then who
generates the next step? Is it the original holder of the data and what is
that? Is that marketing, and then do they get permission for the – so
you’re exactly right. That’s where it starts getting difficult.
Okay, John and Floyd, thank you very much for everything you covered.
We’re scheduled to be back at 1:15 if everybody would rejoin us then and
keep your questions and concerns coming.
DR. HALAMKA: Well, thanks, and certainly if I can answer any other
questions, I am on email and certainly I do again apologize for not being able
to be there in person today.
MR. REYNOLDS: John, thank you very much. You were very helpful again.
DR. HALAMKA: Thank you.
(Whereupon, a luncheon recess was taken.)
A F T E R N O O N S E S S I O N
MR. REYNOLDS: Okay, let’s go ahead and get started. What we’re going
to be hearing about will have to do with methods of protecting privacy in uses
of health data. We’re going to hear from Glen Marshall and Lori
Reed-Fourquet who we’ve heard from a number of times before. So you want
to go in order of the agenda, or did you guys have a preferred agenda.
MR. MARSHALL: It would be probably better if I were to go first and give
them the framing.
MR. REYNOLDS: We’d like that. Good.
MR. MARSHALL: Okay. Actually, just to point out that a lot of the questions
and the comments that I heard in Floyd’s presentation are sort of a good
lead into the presentation that myself and then Lori will have. We’re
going to get into somewhat more of the issues around the protections in the
Just to introduce myself, I’m Glen Marshall. I work for Siemens
Medical Solutions, but I’m also the Co-Chair for the HITSP Security and
Privacy Committee as well as Co-Chair for HL7. I’m not identifying that on
the slide because this slide set has not been vetted with any of those people.
So this is really my professional view on it, but I should at least
reveal my affiliations and point out that what I’m going to say is not
necessarily their opinion. However, you will see some of the thought processes
going into those qualities as a result of all of this.
Okay. Even though a lot of the questions that we were asked pertain to
specific methods of data protection and hinted that there were some things to
be directly addressed, one of the things that I always do when I’m asked
questions about, well, should we use this specific technique or should we use a
specific standard or what have you, gets into, well, why would you ask that
question in the first place. And it really is that in order to successfully
address these issues, we do have to dwell a bit on the risk management aspects
associated with it.
So it’s really these questions what data assets are at risk, what are
the stakeholders, and who are they, what is exactly at stake, what policies are
applying and what do the policies require, what threats exist and also
specifically what happens if a threat is realized, and, of course, what
controls already exist in the system because if people are proposing novel
controls that duplicate the purpose of existing ones, why bother.
Okay. I’m going to make some assumptions about the data assets, and
we’re talking about secondary uses. And so really what we’re dealing
with is data obtained for these purposes, measurement quality improvement which
did come up last time as a question, population health assessment, health
improvement, and, of course, clinical research which could be not only pure
research, it could also involve industrial uses of the data.
There may be other uses, but these seem to be the topics that we’re
DR. TANG: Can I ask one clarification.
MR. MARSHALL: Yes.
DR. TANG: When you say obtain for this purpose, did you really mean the act
of acquiring that data for this purpose, or did you mean use for this purpose?
MR. MARSHALL: Yes, I really have to deal with both of those because from a
risk management standpoint, the way that we acquire the data as well as the way
we use it are sources of risk and also are covered by policies. So it’s
quite important that we cover and our scope be both.
Okay, the stakeholders – and, again, there may be more of these, but
it’s important that we’re dealing with the healthcare subjects, and
the subjects of the data, and these are the patients or people who are somehow
identified with healthcare provision. So it could be a patient, their
caretakers, a variety of situations. It could be an entire family, for example.
Healthcare providers who collect data are obvious stakeholders. Healthcare
data repositories, and I do want to qualify that I am not advocating a
centralized view. I am just noting that data tends to be at rest some place.
Okay, consumers of the data obviously that could be anybody that wants to use
the data, and then public beneficiaries, and these are really people who
benefit from the outcomes of all of this, and they all have a stake in it. And
each one of them has specific risks and specific benefits that need to be taken
Now I’ll just mention parenthetically here to give you an example. Not
only do we have risk of privacy to the individual subjects, but let’s say
I was a bio-terrorist. What would be the first thing I’d want to
compromise would be the public health data system, and then I’ll introduce
my pandemic. Okay. There’s a threat.
Or let’s say that I am a clinical user of the data for clinical
research, and industrial espionage is of use. So each one of these people has
an interest to protect, and we have to keep all of these interests in view and keep
all of them balanced as we choose what we’re going to be doing.
Okay, now this is the laundry list and I ask that question, do we have
enough with sort of tongue in cheek. I would, in my opinion, say that we have
too many masters that we’re serving here, and this just at the federal
level is enough to cross your eyes. And then you get into all of the state and
local variations on that, and this is an essential thing. In the HITSP world,
we have been dealing with individual patient consents not as privacy controls,
but as policies that are developed by the patient on their own behalf to apply
to their specific cases, okay. And if you deal with them that way, it actually
becomes technically easier to treat them.
Obviously, we have enough and perhaps too many, and we say that we do not
need more, we need less and far more understandable.
Okay, let’s take a look at the kind of threats that we’re going
to have to consider here. Obviously, loss of data confidentiality. That’s
what patient privacy is really all about is the patient’s right to have
their data kept confidential. Confidential means not observed by people who
they don’t want to have observe it.
Loss of data integrity, this being that the data could be corrupted in some
way, either maliciously or just accidentally. Loss of data – just outright
loss of data via accident or whatever have you. These are all categories. Loss
of data collectors. Now here’s a case where if you need to know something
and you don’t have the people who are collecting that data, you’d
have no chance of knowing it, and that is a threat that you have to protect
against. Loss of the data repositories themselves which would suggest maybe
strategies for back ups and that kind of stuff.
Loss of funding for threat mitigation. This stuff is not free. It’s
not automatic. It doesn’t happen as a result of the system. If I had a
data repository, I’m going to have to fund the stewardship of that data
for as long as the data is useful, and that could be periods of decades.
The side effects from threat mitigation. For example, if I have a
mitigation that shields and de-identifies the data and later on I have an
urgent public need to know who had that case because let’s say it’s a
drug-resistant TB or something like that which we recently had a case of that,
then I may have an urgent need to re-identify the data. But if I don’t
have those links, all I have is a case record.
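The trade-off between shielding identities and keeping the ability to re-identify can be sketched as below. This is a deliberately simplified illustration: `LinkingSteward` and its methods are hypothetical names, and real de-identification involves much more than replacing one identifier. It only shows why losing the links leaves nothing but a case record.

```python
import secrets


class LinkingSteward:
    """Hypothetical steward that keeps the pseudonym-to-patient links."""

    def __init__(self):
        self._links = {}

    def de_identify(self, record):
        """Return a copy of the record with the patient ID replaced by a case ID."""
        case_id = secrets.token_hex(8)
        self._links[case_id] = record["patient_id"]
        shielded = {k: v for k, v in record.items() if k != "patient_id"}
        shielded["case_id"] = case_id
        return shielded

    def re_identify(self, case_id):
        """Return the patient ID for a case, or None if the link is gone."""
        return self._links.get(case_id)

    def discard_links(self):
        """Stronger shielding: re-identification becomes impossible afterwards."""
        self._links.clear()


steward = LinkingSteward()
case = steward.de_identify(
    {"patient_id": "MRN-001", "diagnosis": "drug-resistant TB"}
)
found_with_links = steward.re_identify(case["case_id"])

steward.discard_links()
found_without_links = steward.re_identify(case["case_id"])
```

With the links retained, `found_with_links` is the original patient ID; once they are discarded, `found_without_links` is `None` and only the case record remains. Whether to keep the links, who holds them and how they are protected are policy decisions of the kind the testimony describes.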
Terminology overload confusion. I’ll give you an example there. The
term audit as a security professional, that has very, very specific meaning to
me. As a privacy professional, it has a different meaning. As a database
administrator, it has a different meaning. As an accountant, it has a
profoundly different meaning. What is auditing. It actually is a whole variety
of activities and recordkeeping about the data. But to give you another example
and this happened during the development of this talk, the term authorization
is oftentimes used colloquially to be synonymous with consent or permission.
From a technical standpoint and from an operational standpoint, they are
profoundly different concepts and they’re done by very different people.
So we have these issues. If you talk to me as a security professional, and
you want me to do auditing and what you really mean is you want a record of all
of the patient consents, okay, I may give you an answer that you didn’t
expect because I heard the question differently. That happens all the time.
And, of course, there’s outright human ignorance.
We have a lot of people – this is a difficult topic, and it’s
– on the surface it sounds simple. But when you get into the weeds, it is
quite difficult and education is somehow required. So this isn’t ignorance
in the sense of illiteracy; this is just you don’t know better.
Okay, let’s take a look at in the current system and in the current
environment, we have a variety of controls – manual protection procedures,
keeping things under lock and key, barring people from the door, having guards
at the door. There’s a whole variety of things you can do there. Educating
the stakeholders, okay. The more people know about this kind of stuff, the
better pre-prepared they’re going to be.
Physical and network protections. This gets into things like encryption at
the network and making sure the wiring closets are closed or making sure the
computer room is secure, those kinds of things. Penalties for privacy and
security violations. We already had some prescribed in HIPAA. Arguably, there
could be more. But the point is that they serve to discourage people from doing
things they shouldn’t.
Insurance – something that’s oftentimes overlooked is you may
want to protect your financial risk because of – just by insurance because
insurance can help you fund the recovery of data that you may have misplaced.
And then, of course, there are localized controls that are very, very
situational, and a lot of our healthcare providers have those in place already.
So let’s make sure that we don’t overload them and make them
So I’m going to give you a quick tour through – now this is from
a security professional standpoint. How do I go about doing my job. I start
with the privacy policies. What am I required to keep confidential, and that
could be keep confidential in terms of patients, it could be keep confidential
in terms of I’m a healthcare provider and don’t want my competitors
to see the data, or I’m a clinical researcher. I don’t want to have
industrial espionage. So there’s a variety of things that are policies.
Policies are timeless statements of what outcomes you want. They do not tell
you what to do; they tell you what should happen or what shouldn’t happen.
Security policies are very similar. They are actual protection of the data
assets, and they’re different from privacy because of the technical nature
of them. But they really do tell you what security outcomes you want. You
really get those two together, form a body that I would call system object
Threats themselves usually come in as a list of things that could happen,
and then you have to go through a step of risk analysis which is basically
determining the probability and the economic value of those threats and somehow
getting some sense of priority about them. And, of course you have
environmental protections that do go into the analysis.
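The risk analysis step, weighing probability against economic value to get a sense of priority, can be sketched as a simple expected-loss ranking. The threat names echo the list given earlier in the testimony, but every probability and dollar figure below is invented purely for illustration, and real risk analysis would also account for existing controls and environmental protections.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    annual_probability: float  # assumed likelihood of occurring in a year
    economic_impact: float     # assumed cost in dollars if it is realized

    @property
    def expected_loss(self) -> float:
        return self.annual_probability * self.economic_impact


def prioritize(threats):
    """Order threats from highest to lowest expected annual loss."""
    return sorted(threats, key=lambda t: t.expected_loss, reverse=True)


# Illustrative numbers only; none of these come from the testimony.
threats = [
    Threat("loss of data repository", 0.01, 2_000_000),
    Threat("loss of data confidentiality", 0.10, 500_000),
    Threat("accidental loss of data", 0.30, 100_000),
]

ranked = [t.name for t in prioritize(threats)]
```

With these assumed figures the confidentiality threat ranks first, even though the repository loss has the largest single impact, which is the kind of prioritization the analysis is meant to surface.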
The combination of system access policy, risk analysis and environmental
protections together will form a body that’s known as security objectives.
Now for those of you that may be familiar with any security technology, this is
going to start looking a lot like the common criteria and the process that you
use, the ISO 15408 standard, because this is how you wind up going through and
arriving at security objectives in the 15408 framework. From that, from the
objectives there are three outputs, technical requirements. These are the
things that you do technically to protect the data from the – to enforce
the policies and to mitigate threats.
Then you have assurance requirements. These are things that you do to
maintain security over time. For example, a technical requirement of security
is to produce audit logs. An assurance requirement is somebody has to read the
things, okay. An assurance requirement is that somebody has to educate the
users. They have to make sure the system is installed and wired up correctly
and a variety of things like that. So those are things that you do that
aren’t necessarily technical, but they serve to provide and lock in a
level of assurance. It also, by the way, means that periodically you have to
test the system and stress it to make sure the protections are in fact
Then you have environmental requirements. These are things that, at the
final analysis, are not technical requirements, you wouldn’t do from
assurance, but you want to be provided to you in the environment. For example,
I could say I need a specific computer system. But in order for that computer
system to run, I have to increase the tonnage of my HVAC. That’s an
environmental requirement. Okay, so that’s pretty much the way that you
run a risk management-oriented program, and this by the way, because it does
marry up with a standard, the ISO 15408 standard, it also considers the
framework to evaluate products and the adequacy to fit into a pre-established
framework of this sort.
So I wanted –- the reason I went through this little exercise on risk
management, unless you have it in place, you’re going to have proposed
controls that may or may not work. They may ignore policies. They may not
protect against the threats that you make. They may duplicate and therefore add
cost to the system without improving the security of it, thereby incurring
unreasonable costs – unreasonable meaning that there’s no rational
reason that you’d spend that money, and they provide no real assurance,
and who do they provide assurance to. So there’s an exercise here that you
need to go through so that you can at a later point actually provide a rational
basis for your choices.
Okay, so now I’m going to get into some proposed controls. I am going
to provide these without architectural recommendations specifically thinking as
a HITSP co-chair any of these things do not imply a specific architecture, but
any examples I give are in fact just that – examples.
So the first thing is that you want to achieve accuracy for the data at
rest. That means clinical records – not so much data in motion across the
networks like transactions but when you get a repository of some sort. You
really want to use standardized data sets. Why? Because you can detect a whole
bunch of good things if you’re using standardized data sets as opposed to
freeform text. So there’s a lot of automatic controls that become
HL7 CDA seems to be something that’s used a lot. If you take a look at
the HITSP constructs, it most certainly is, because it’s an essential
part of the IHE approach that a lot of their work has gone into.
Using a standardized vocabulary, now there’s really two dimensions
here. You have to use a standardized vocabulary obviously for the clinical
data. But we know that these vocabularies are always in flux, and we talked
about earlier, as Floyd talked about mapping concepts. So if I have nursing
data sets, for example, CCC or Omaha, if those are changed and new terms are
brought into them, they have to be mapped against SNOMED. So there’s a
harmonization process. And that’s something you just have to keep doing.
Also, orthogonal to that, make sure that when you introduce a term like
auditing or authorization that you use it in a way that everybody understands
exactly what you mean, and it means the same thing to everybody. Providing
assurance for the data subjects. It turns out that if you provide a certain
degree of assurance to the people who are the patients that you’re
collecting the data from, they are far more likely to provide you accurate
data. If you provide no assurance, they’re going to lie. I mean, how many
times have you given your exact phone number to somebody at the checkout
counter who asked for your phone number when you’re paying with your
credit card. I mean, I always give them a fake number. It’s none of their
business because I have no assurance they aren’t going to publish it.
Okay, so consent and confidentiality controls are obviously part of that.
Now providing incentives for data sources. This gets into the ugly little fact
here that, in the current reimbursement arena, just the costs of care
are covered. Very little money is left to invest in these new technologies.
People, therefore, will go to seek additional revenue sources. Now clinical
research is one of those primary sources. So basically healthcare providers
have a very large incentive, and it’s a financial incentive, to collect
data. But is it an incentive to collect the data that we the public need? I
don’t know. Right now, if they’re being asked to collect data and
they’re not being reimbursed for it, we are not likely to get as good data
as we’d like.
And, of course, educating the data subject and the sources so that they
understand the necessary implications of everything that they’re doing. So
what I’m saying here has to go well outside of this committee and get a
lot of detail.
More controls – integrity of the data at rest. A very simple thing
here. This is a FIPS standard. It was developed as part of the rollout of their
recent AES, and it basically is a very strong form of hashing, which means that
if you provide a hash code for your data at rest, you can likely prove
its integrity over a period of decades without risk of spoofing.
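The integrity control described here, keeping a hash code for data at rest so that later changes can be detected, can be sketched as follows. This is an illustration rather than part of the testimony. The speaker does not clearly name the hash algorithm, so SHA-256 from Python's standard `hashlib` module is used as an assumption, and the function names and sample record are hypothetical.

```python
import hashlib


def fingerprint(record: bytes) -> str:
    """Return a hex digest for a stored record (SHA-256 is an assumed choice)."""
    return hashlib.sha256(record).hexdigest()


def is_unchanged(record: bytes, stored_digest: str) -> bool:
    """Recompute the digest and compare it with the one saved earlier."""
    return fingerprint(record) == stored_digest


# Hypothetical record content, for illustration only.
original = b"patient=12345;dx=J10.1;date=2007-07-18"
digest = fingerprint(original)

print(is_unchanged(original, digest))
print(is_unchanged(original + b";edited", digest))
```

The digest only proves integrity if it is kept somewhere an attacker who alters the record cannot also rewrite it, which is one reason the stewardship and assurance points that follow matter.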
Providing assurance to data subjects. Obviously, it turns out that the, as
I pointed out, a lie can perpetuate, and it could create data integrity issues.
So we have to provide assurance to people so that they will in fact disclose
the correct data to you.
Providing incentives to data repositories, again, the data repositories are
the people that are going to be providing integrity over the long term. And
unless they are paid for what they’re doing, they’re going to wave
their hands and say, oh, yes, I do that, and not really do it. And, of course,
educating the data stewards, whoever they may be, whether that is a local
repository at a hospital or an original repository, they have to know what they
Availability is another issue, and, again, using standardized data sets
turns out to enhance the availability of data because that means that you do
not have to go through scanning text or, if you will, interpreting. But it
means the data is far more readily available. So this improves the time
dimension of availability remarkably.
Pre-recruiting data subjects and sources. This just basically means that, you
know, if you recruit ahead of time, and you know your population ahead of time,
you have a better chance of getting the data when you need it. Providing
assurance for data subjects, again, if the data subjects are not assured,
they’re likely to avoid even being asked questions and making their data
unavailable to you.
Providing incentives to data repositories, again, the data repositories
are not going to be automatic data-emitting machines without payment. So we
have to deal with that. And, of course, educating the data stewards, again,
helps with availability.
What you’re going to see is a pattern of certain of these mitigations
actually serve to provide multiple controls, and a lot of them are relatively
simple to do and non-technical.
Okay, now the subject of consents has to be brought up at this point. All
of us have been to physicians’ offices, and all of us have been asked to
sign the HIPAA Notice of Privacy, and it turns out that that isn’t a
consent. That’s just somebody’s told you what they’re going to
be doing with your data.
There has been a suggestion that we want to get much more active consent
from people, especially for secondary uses. However, I’ve seen these
consents, and they’re things that only lawyers could love. So first thing
is we want a standard form for all people, and that is that when you sign a
consent, you should recognize it as a consent form right off the bat, and it
should be unambiguous as to what it is that you’re doing. So there’s
a certain amount of standardization there.
Simple language in the subject’s native tongue. We have had, if you
will, newspaper reports of people who are not native speakers, maybe
uneducated, being recruited for medical tests, and they didn’t know what
they were signing because it was in lawyer’s English, and they spoke
Spanish. There’s a problem with that.
Okay, verbal and written form. This is important that you have to be able
to make sure, and usually it’s in a couple of ways, that the person has
given their knowing consent, and verbal and written usually are the acid tests
there. And it has to be limited meaning that the specific purpose has to be
noted and the duration of that purpose has to be noted, especially the
accountability, and this is something where we’re a little bit lax in the
current rules and regulations is that the person who is signing the consent
should be able to know who is accountable for enforcing that consent. Should
they appear violated, they shouldn’t have to go looking for somebody to be
And of course, no duress. This has a couple of things. You know, I’m
not talking about arm twisting, or paying somebody or what have you. I’m
talking about sticking a consent under somebody’s nose when they’re
in an office for treatment, and they feel their problem is urgent, and
they’re willing to sign anything to get that treatment. That’s
So these are some of the issues that are coming up, and I believe that if
we were to go after consent at this level that some of the consents that we get
would in fact be of high quality and dependable and would be immune from the
litigation threat, which by the way I didn’t list litigation as a threat,
but we all know it is.
Okay, confidentiality for data at rest. This gets into something – now
Lori’s going to get into this in far more detail. But we’re really
talking about standard anonymized de-identification which I’m defining as
a permanent redaction of identifying data to provide assurance of
confidentiality for the subject of healthcare information, re-identification
being highly unlikely.
In security, you never say something is impossible. You really are trying
to say that the level of assurance is very high, okay. And that – what
that really says is that there are certain data mining techniques that are
known to our national security agencies that will in fact reveal our
subject’s identifiers, and we just have to realize that those techniques
are known to bad guys as well.
Okay, standard pseudonymized de-identification – basically you’re
substituting identifying data with something else, and you’re providing an
assurance of confidentiality for the subject, okay. But that does mean
that it’s sort of a lesser assurance because re-identification can occur
under already pre-defined conditions. So, for example, let’s say I
substitute a patient’s identity with a number, and that number, for the
person who knows what that number indexes to, can serve as a link to
re-identify the data. But that means that the person who has agreed to it has
to go through a protocol to actually do all of that re-identification, and
there are technical aspects of that that Lori can get into.
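The example above, substituting a patient's identity with a number that only a designated party can map back, can be sketched as a small lookup-table registry. This is an illustration, not part of the testimony: the `PseudonymRegistry` class, the requester names, and the use of random tokens are all assumptions, and the authorization check is a stand-in for whatever re-identification protocol is actually agreed.

```python
import secrets


class PseudonymRegistry:
    """Hypothetical lookup table held by the party allowed to re-identify."""

    def __init__(self, authorized_requesters):
        self._by_patient = {}
        self._by_pseudonym = {}
        self._authorized = set(authorized_requesters)

    def pseudonymize(self, patient_id: str) -> str:
        # Reuse the existing pseudonym so the same patient always links up.
        if patient_id not in self._by_patient:
            pseudonym = secrets.token_hex(8)  # random, carries no identity
            self._by_patient[patient_id] = pseudonym
            self._by_pseudonym[pseudonym] = patient_id
        return self._by_patient[patient_id]

    def reidentify(self, pseudonym: str, requester: str) -> str:
        # Stand-in for the pre-defined conditions: only listed requesters pass.
        if requester not in self._authorized:
            raise PermissionError(f"{requester} may not re-identify records")
        return self._by_pseudonym[pseudonym]


registry = PseudonymRegistry(authorized_requesters={"public-health-officer"})
token = registry.pseudonymize("MRN-0001")
print(registry.reidentify(token, "public-health-officer"))
```

Without access to the registry, the token alone says nothing about the patient, which is the lesser but still real assurance being described.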
Okay, and then there’s one other thing which is aggregation. If all I
do is provide you aggregate data like how many cases of a certain type do I
have, it actually does serve to completely de-identify the data. But if you
have a small aggregation group, I can re-identify it by forensic techniques.
Aggregation exposes you to the risk of data availability if I need to drill
into the aggregate data to find out more facts about the details.
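The small-group risk mentioned above is often handled by suppressing counts below a minimum cell size. The sketch below illustrates that idea and is not part of the testimony: the function name, the sample diagnoses, and the threshold of 5 are assumptions chosen for the example, not a recommended value.

```python
from collections import Counter


def aggregate_counts(diagnoses, min_cell_size=5):
    """Count cases per diagnosis, suppressing groups that are too small.

    Suppressed cells are returned as None rather than their true count.
    The default threshold of 5 is an assumed value for illustration.
    """
    counts = Counter(diagnoses)
    return {
        dx: (n if n >= min_cell_size else None)
        for dx, n in counts.items()
    }


# Hypothetical case list: six influenza cases and two tuberculosis cases.
cases = ["influenza"] * 6 + ["tuberculosis"] * 2
print(aggregate_counts(cases))
```

Suppression trades away availability in exactly the way described: the detail behind a suppressed cell is no longer there to drill into.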
And, of course, it’s only useful for the aggregation derived purpose,
and oftentimes that also exposes a risk of misuse where people will take
aggregate data and apply meanings to it that it was never intended to mean, and
they’ll imply it by implication, and this is where you get conspiracy
In any case, that’s pretty much the realm of the proposed controls
that I would go through based upon what I see as a surface analysis of the
risk. I’m sure that more would pop out if we did a thorough risk analysis.
MR. REYNOLDS: We’ll let Lori go ahead and then we’ll answer
MS. REED-FOURQUET: As Glen indicated, I’m going to take more of a
technical dive into some of the specific techniques that we talked about for
privacy enhancement. Much of my perspective comes from my involvement in
standards, and the topic of pseudonymization and anonymization is one where
we’ve just generated a standard in ISO/TC 215, for which I serve as the vice
convener for WG5 on security.
When we look at considerations of privacy for secondary uses of healthcare
data, and in some cases throughout these slides much of the more informed work
comes from what we did last year in HITSP biosurveillance, and we haven’t
taken as deep a dive on that yet in quality, so just keep that in mind.
If you’re looking at privacy matters and these are in compliance with
HIPAA and protection of human subjects, we’ve got four options for
enabling the secondary use of the data. Using the personal data with consent or
non-objection for the data subject, as Glen just described in the consent
approach; obtaining an IRB approval where it’s determined that the risks
are minimal and/or acceptable; using the personal data without explicit consent
under some public interest mandate; or anonymizing or de-identifying that data
prior to use.
Many complications with relying on consents and authorizations for that
secondary use. That consent would be very difficult to track to its original
authorizations, and this is particularly the case for research subjects. How do
you go back and interpret the original authorization now to a broader context
of application of that data. And then if you need to go back and
retrospectively identify and contact those original data subjects to add the
additional explanation of how their data is used and extend that authorization,
that becomes very difficult, especially where those database projects may
contain tens of thousands or millions of records. How are you going to go back
and obtain truly informed consent from those data subjects retrospectively on a
There’s an additional risk of privacy for those patients. They may
object to being contacted down the road particularly if it’s been a
circumstance where their health or environmental issues have changed since
their original authorization or consent for using their data for research.
The other risk of privacy threat would be linking techniques to what might
have otherwise been de-identified data in order to find the contact information
for those individuals. So in trying to go back and capture their authorization,
in a sense you’re compromising their privacy.
Privacy enhancement technology – and notice I use the term
enhancement, although throughout here you may see the term protection. It
touches upon that statement that Glen just made that we are never in security
100 percent assured that we have protected it, but so we will talk about this
as an enhancement.
We’re not just talking about securing the information, but using the
technology to protect the information and personal privacy. Pseudonymization is
the technology we’ll drill down upon in more detail, which is essentially a
reversible de-identification technique, although it’s not always
reversible. It can be consistently traced throughout the information system. So
many of the current or historical research projects will be able to track
patients within an organization. But if you’re doing a community study or
otherwise, the linking of those records across that community or other cohort
becomes more difficult without some sort of identifier on top of that.
They can be under strict and defined controls and enable re-identification
of those data subjects in accordance with policy. Converting the identity into
a pseudo-identity for use within an information system is what we’re
trying to accomplish, and we would typically, if you’re going beyond the
walls of an organization, rely upon some sort of trusted third party service to
securely assign and manage those pseudo-identifiers across the domain of set of
Re-identification, again, can be restricted and defined to pre-authorized
rigorous procedures that would be fully implemented security controls, and that
needs to be specified in some sort of re-identification policy.
The standard that I referenced, ISO/TC 215 Health Informatics
Pseudonymization, was approved in March of this year. The technical
specification specifies principles, requirements and guidance for privacy
protection using pseudonymization services for the protection of personal
It was named by HITSP last year for privacy enhancement of biosurveillance
data, and it’s included in the HITSP quality requirements and design
document for privacy enhancement of quality data, although we have not since
completed the actual detail analysis of how much anonymization is going to be
required when we get to those data elements.
Some of the terms related to anonymization, and this is how we have defined
them in the standard: de-identification is the general term for any process
of removing the association between a set of identifying data and the data subject.
Anonymization, the process that removes the association between the
identifying data set and the data subject, is a subcategory of
de-identification. It does not provide a means to link back to that person
across multiple data records or information systems, and it does not allow the
re-identification of the anonymized data.
Pseudonymization is a type of anonymization that goes –
MR. REYNOLDS: Lori, hold on a second. I think Paul has a clarifying
DR. TANG: Lori, can you help me understand the difference between
de-identification and anonymization given your third bullet says you cannot
MS. REED-FOURQUET: The de-identification is really a generalized term which
includes both anonymization and pseudonymization and other techniques such as
removing the data identifiers. It’s, again, a general classification.
The distinction between anonymization and pseudonymization is that the
pseudonymization is going to allow you to link and potentially re-identify the
data subjects, and it’s going to allow you to link across the multiple
I’m going to go into more detail as well. If that’s still not
clear after I drill down, perhaps we can come back to that.
DR. TANG: Well, I think I’m having trouble right at the beginning here
because I believe I understand the difference between anonymization and
pseudonymization quite clearly. What I didn’t understand is what you said
that de-identification is –
DR. CARR: Anonymization is a subset of de-identification. What else is a
subset of de-identification. I mean, is there something else?
MS. REED-FOURQUET: Pseudonymization is also a type of de-identification.
DR. TANG: I think that is to me, it’s in conflict with your definition
of de-identification because that says the removing of association between a
set of identifying data and the data subject, and pseudonymization does not
completely remove the identifying data and the association between the
identifying data and the data subject.
MS. REED-FOURQUET: It does remove the identifying data, and it removes the
association with the data subject. You cannot get back to that data subject
unless you go back into the algorithm essentially that was used to provide the
pseudo identifier. So in the absence of, say, a key to unlock that identifier,
you have no means of identifying that patient.
DR. LOONSK: Yeah, I just wanted to comment that I think that there may be
different interpretations of how anonymization is being used and how
pseudonymization is being used in the context of if any of us had any
responsibility for reinvigorating the life of these terms. There were some in
public health who were seeking to remove direct identifiers from data being
provided to public health and provided to an authority where those data are
potentially managed and protected where that authority potentially has even
broader domain potentially to accessing those data, but where de-identified
data in the HIPAA context lose substantial portion of the meaning.
So, again, if you take de-identified data in the HIPAA context, you lose
localizing information that may be of importance in biosurveillance
circumstance. On the other hand, getting named data with the person’s
record number or the patient’s name was not always necessary for
biosurveillance, and there was a desire in public health to seek to not get
more data than necessary to support the need.
Pseudonymization had a particular role in the public
health context wherein, for example, local public health agencies may get named
data specifically to follow up on reportable diseases and didn’t want to
lose those named data or the ability to follow up on those cases as is part of
their authority and part of the requirement for their activity by law.
So the concept of pseudonymization and there are probably – there are
certainly a number of other contexts for it, but one is where an appropriate
authority can make a request in the case of, for example, a reportable disease
to try to ask the data provider to identify more specific information about the
patient so that case report can be completed and that can be followed up on if
it’s a case of communicable disease, for example.
DR. TANG: Okay, so I think I –
DR. STEINDEL: Lori, let me phrase the clarification question like this, and
this has to do with Glen’s comment that a lot of terms in this area are
overloaded. And if we talk about de-identification as it’s described in
the HIPAA regulation which is the way we usually think in this room about the
overloaded term de-identification, that would be a form of anonymization, is
MS. REED-FOURQUET: That is.
DR. STEINDEL: Because that is a process to remove the association between
the data and the identity.
MS. REED-FOURQUET: Yes, that is correct. However, I will step through the
process that we went through last year with biosurveillance where that is only
step one of a three-step process.
DR. STEINDEL: I’m just trying to clarify terminology.
MS. REED-FOURQUET: Okay.
DR. STEINDEL: And now with respect to de-identification as you have it on
this slide, that is a definitional statement with respect to just the ISO
publication. So it’s really different in de-identification from the HIPAA
MS. REED-FOURQUET: Yes, it is.
MR. REYNOLDS: Last comment on this, and then I’ll let –
DR. LOONSK: I would suggest that in the way that de-identification has a
fairly clear definition and it’s principally associated with HIPAA
de-identification which is very specific, pseudonymization has a specific
meaning which includes the association of some sort of data linker so that one
can get back. Anonymization is ambiguous in its meaning and has been used in
several different contexts, and whether it’s a super set or a subset of
de-identification is depending on the context and is not clear.
MR. REYNOLDS: All right. Lori, will you please continue. Have you got a
real problem on this –-
DR. STEINDEL: Yeah, I just got confused on this.
MR. REYNOLDS: No, that’s fine. Go ahead.
DR. STEINDEL: Because I thought the way I stated it was relatively clear
because we’re talking about the word de-identification and anonymization
and pseudonymization with respect to these definitions with respect to ISO. And
de-identification in the overloaded sense from a HIPAA point of view is very
specific. With respect to the ISO definition, it falls under anonymization
So anonymization with respect to this slide which is what we’re using
for clarification at this point in time is not an overloaded term. Now I agree
with what John says when we put it into the HIPAA world, we have the exact
opposite of that where de-identification is relatively clear and anonymization
So that’s the problem that we’re grappling with on this
committee, and I think it’s something that we’re going to have to
attack when we write a report.
MR. REYNOLDS: So summary is Lori’s presentation is Lori’s
presentation, and –
DR. STEINDEL: And we need to think about it with respect – we have to
think about it in the next hour or so with respect to Lori’s presentation.
MR. REYNOLDS: Go, Lori, you’ve got it.
MS. REED-FOURQUET: Okay, and just to follow up on that a little bit, these
terms were included in our final comments because the industry commenters at
STC groups and the vendor said that this is an area where there is a lot of
confusion, and that we needed to put down some specifics in definition. So
that’s where this ends up. Okay, a little bit more on semantic
overloading. But for the purposes of the technical specification, we did rely on
the definitions from the European Directives for personal data meaning any
information relating to an identified or identifiable natural person, and
that being the data subject. And we extend the term data subject in the
technical specification to also denote other entities such as devices,
processes, organizations, et cetera. But the other – let’s be careful
on further overloaded terms. Since there is other privacy legislation out
there that is using these terms, not only do we want to harmonize those terms
in the standard world, but let’s be careful moving forward on how we do
that in the legislative area as well.
So in the – the first thing we do when we look at the pseudonymization
technical specification was identify potential uses that we would be needing
this technology for specific to health care. There is an annex that describes
some use cases, and these include secondary research use of clinical data,
clinical trials, post-marketing surveillance, pseudonymous care – in other
words, if you’re a patient who wants to go to a website and keep their
record anonymously, they should be able to do that, or otherwise moving
laboratory data as de-identified samples. Patient identification systems where
we’ve had some discussion in the past on the voluntary patient identifiers
and how a patient might be able to be in control of their identity and perhaps
change that identifier if they felt it was compromised. So this technique
Public health monitoring and assessment, and we actually included our
biosurveillance use case as the specific example in the annex.
Confidential patient safety reporting such as adverse drug effects, and
this is one of the use cases where we may want to not only protect the
patient’s privacy, but also the provider’s, so that we do not discourage the
provider from making those reports.
Comparable quality indicator reporting, as we’re looking in the
quality use case now, other such peer review and consumer group. You’re
linking that data to some traffic data to health data. You may be supporting
some of the consumer organizations.
The concept of identification now. We have a set of data subjects and a set
of characteristics. So a data subject is identified within a set of data
subjects if it can be singled out among the others. So if you have an
information resource that has a certain number of subjects, they should be able
to be linked by those characteristics.
Some of the associations between the characteristics and the data subjects
are more permanent than others such as your social security number, your date
of birth, and others may not be as long-lived, such as your email.
When we go through the processing, we talk about a payload. So a payload
would be as we take the personal data and we split it up into two parts, the
payload is the information that is considered anonymous, non-identifying and
then the identifying information is separated out. The identifying information
contains a set of characteristics to allow the unique identification of that
subject. So you’re going to have your demographics, your social security
number, et cetera as being the identifiable.
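The split described here, separating a record into its identifying information and a non-identifying payload, can be sketched as follows. This is an illustration, not part of the testimony or the technical specification: the field names and the `IDENTIFYING_FIELDS` set are hypothetical, and deciding which fields count as identifying is the hard part in practice.

```python
# Hypothetical list of fields treated as identifying for this example.
IDENTIFYING_FIELDS = {"name", "ssn", "date_of_birth", "address", "email"}


def split_record(record: dict):
    """Split one record into (identifying data, payload).

    The payload is assumed to be already free of identifying content,
    for instance free text that has been cleansed or codified beforehand.
    """
    identifying = {k: v for k, v in record.items() if k in IDENTIFYING_FIELDS}
    payload = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    return identifying, payload


# Made-up record for illustration.
record = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "date_of_birth": "1970-01-01",
    "diagnosis_code": "A15.0",
    "encounter_year": 2007,
}
identifying, payload = split_record(record)
print(payload)
```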
DR. TANG: When you say contains anonymous data, do you really mean that
another word for payload cannot include any text that is not codified but could
be re-identified by classification.
MS. REED-FOURQUET: Yes.
DR. TANG: It truly means there’s absolutely nothing in there.
MS. REED-FOURQUET: Yes, because this is going to be the stage before you
finally publish it. So for this processing, we are making the assumption that
the payload is already anonymized. So you may have processing happening before
this stage. It may be this text processing. It may be other codifying – it
may be removing the last two digits of the zip code as we talked about for the
zip code’s small area of risk.
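The zip code step mentioned here, removing the last two digits before the payload is treated as anonymized, can be sketched like this. It is an illustration only: the function name, the masking character, and the handling of ZIP+4 input are assumptions.

```python
def generalize_zip(zip_code: str, keep_digits: int = 3) -> str:
    """Keep the leading digits of a five-digit ZIP code and mask the rest."""
    if not 0 <= keep_digits <= 5:
        raise ValueError("keep_digits must be between 0 and 5")
    digits = zip_code.strip()[:5]  # drops any ZIP+4 suffix
    if len(digits) != 5 or not digits.isdigit():
        raise ValueError(f"not a five-digit ZIP code: {zip_code!r}")
    return digits[:keep_digits] + "*" * (5 - keep_digits)


print(generalize_zip("20782"))
```

Whether three digits is coarse enough depends on how many people live in the resulting area, which is the small-area risk being referred to.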
DR. TANG: So, as an example if you were to pseudonymize this packet, then
it is true that if you wanted to re-identify, you actually might have to change
MS. REED-FOURQUET: To re-identify –
DR. TANG: No, I mean I would be allowed to have a different payload when I
choose to re-identify if I’m authorized to re-identify a pseudonymized
MS. REED-FOURQUET: The re-identification process would bring back in the
payload after you have your identifiers. You would have the payload information
repository with a pseudo-identifier. You would need to go back and locate the
tuberculosis patient or whoever that might be, and you would –
DR. TANG: Well, let me make it very precise like TB. Okay, I’ve
described everything about this person who went to Europe and back. So in the
text of my progress notes, when I sent that to you pseudonymized, I sent a
pseudonym, but I’ve also cleansed all the text. And when I went to
re-identify this package and its payload, it really requires repulling which
– do you see my question. Really I no longer have any value, and I
can’t create it by re-identifying the algorithm.
MS. REED-FOURQUET: Well, there are many implementation approaches to that
and techniques that can be used. So when you cleansed that free-text data,
hopefully first of all you codified it where you didn’t have information
loss. If you really needed to go back to the free text, there are other
techniques. So we could have encrypted, for instance, that text and made it
part of the payload. And maybe you were only authorized to decrypt it under the
same re-identification authorization processes.
Or, as you say, you may go back to the information source now that you do
have the true identifier and go to the clinical data that has much more depth.
DR. TANG: So I think in the process of asking the question and hearing your
response, I have a different understanding of pseudonymization than I had when
we first defined it which is that it’s not true that I can apply an
algorithm that I got permission to use to actually re-identify in toto and get
the full information content of the payload. So that’s missing.
MS. REED-FOURQUET: Correct. Correct. You’re only getting to the
identifier which will then enable you to further get to the payload.
DR. LOONSK: And a related and equally complicating issue, this implies that
potentially from the payload you may have excluded things like text like the
woman next door.
MS. REED-FOURQUET: Yes.
DR. LOONSK: Which natural language processing would not normally
identify because it doesn’t look like a name, but would be to some
extent identifying information that is eliminated from this payload.
MS. REED-FOURQUET: Yes. So it certainly is a balance because I suppose if
you were trying to see who – where the source of an infection came from,
that woman next door may be a key piece of information, but this isn’t the
only source of information. By getting the identifiers, you should be able to
go back to the information source as you would typically do today.
DR. LOONSK: Okay, Lori.
MS. REED-FOURQUET: Okay, anonymization is the process that removes the
association between the identifying data set and the data subject. So that
might be done by removing or transforming the characteristics of the associated
data or by increasing the population of the data sets so that that identifying
data is no longer unique. So basically filling it with dummy data, if you will,
so that it’s less identifying.
Pseudonymization now is a particular type of anonymization that after you
remove the association, you add an association of a pseudonym. Okay, so if
it’s irreversible – so pseudonymization allows for it to be
reversible, but you can also implement it in an irreversible mechanism, then
you do not have a method to derive the pseudonym. So pseudonymization is still
valuable in a one-way scenario by linking the subjects across the multiple
domains, but you may choose to never allow identification.
So in reversible pseudonymization, the model includes a way of
re-associating the data either from a derived payload or from a pseudonym in a
look up table. So typically today if it’s a local hospital, the hospital
may have a look up table, and that’s how the research organization would
come back to them as opposed to being able to recompute the identifying value.
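The look-up-table approach described here can be sketched roughly as follows. This is a minimal illustration only, not anything specified by HITSP: the class name `PseudonymTable`, its methods, and the sample identifier are hypothetical, and the pseudonym is a random token, so it can be reversed only by whoever holds the table.

```python
import secrets


class PseudonymTable:
    """Hypothetical look-up table held by the data source (e.g. a hospital)."""

    def __init__(self):
        self._by_identifier = {}  # true identifier -> pseudonym
        self._by_pseudonym = {}   # pseudonym -> true identifier

    def pseudonymize(self, identifier: str) -> str:
        # Reuse an existing pseudonym so the same subject always maps to the same value.
        if identifier not in self._by_identifier:
            pseudonym = secrets.token_hex(16)  # random, not derived from the identifier
            self._by_identifier[identifier] = pseudonym
            self._by_pseudonym[pseudonym] = identifier
        return self._by_identifier[identifier]

    def reidentify(self, pseudonym: str) -> str:
        # Only the holder of the table can do this; authorization checks are omitted.
        return self._by_pseudonym[pseudonym]


table = PseudonymTable()
p1 = table.pseudonymize("MRN-0001")
p2 = table.pseudonymize("MRN-0001")
```

Because the token is random rather than computed, a research organization holding only `p1` would have to come back to the table holder to re-identify, which matches the scenario in the testimony.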
So the pseudonymization processing, you would take identifying data and the
payload data, split them into two different parts, pass the identifying data
through a person identification service, and this is how we specified it in
HITSP, and then that pseudonymization service will turn the identifiers
consistently into the same pseudo-identifier typically especially if
there’s a trusted third party approach through a key cryptographic method
and then you would take that pseudonym plus the payload data and that would be
considered de-identified, and you can load that into whatever information
resource you’re trying to make available.
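The processing steps just described (split the record, turn the identifiers consistently into the same pseudo-identifier with a keyed cryptographic method, then rejoin the pseudonym with the payload) might look something like the sketch below. It assumes HMAC-SHA-256 as the keyed method, which the testimony does not specify; the function names, field names, and key are all hypothetical. Note that the payload passes through untouched here, so any anonymization of the payload itself would be a separate step.

```python
import hashlib
import hmac


def split_record(record: dict, identifying_fields: set) -> tuple:
    """Split a record into its identifying part and its payload part."""
    identifying = {k: v for k, v in record.items() if k in identifying_fields}
    payload = {k: v for k, v in record.items() if k not in identifying_fields}
    return identifying, payload


def pseudo_identifier(identifying: dict, key: bytes) -> str:
    """Keyed one-way transform: same identifiers and key always give the same pseudonym."""
    canonical = "|".join(f"{k}={identifying[k]}" for k in sorted(identifying))
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize(record: dict, identifying_fields: set, key: bytes) -> dict:
    identifying, payload = split_record(record, identifying_fields)
    return {"pseudonym": pseudo_identifier(identifying, key), **payload}


SERVICE_KEY = b"example-key-held-by-the-pseudonymization-service"  # placeholder only
ID_FIELDS = {"name", "mrn"}

visit_1 = {"name": "Jane Doe", "mrn": "0001", "diagnosis": "A15.0"}
visit_2 = {"name": "Jane Doe", "mrn": "0001", "diagnosis": "J10.1"}
out_1 = pseudonymize(visit_1, ID_FIELDS, SERVICE_KEY)
out_2 = pseudonymize(visit_2, ID_FIELDS, SERVICE_KEY)
```

Both visits receive the same pseudonym, so they can be linked without exposing the name or record number. An HMAC cannot be inverted, so re-identification under this sketch would still need a look-up kept by the key holder.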
I’m going to come back to privacy threats – yes?
MR. REYNOLDS: You will not be first on the other question. Go ahead.
DR. TANG: On the previous slide, so there I did not see a method used to
anonymize the payload data.
MS. REED-FOURQUET: I have not included a method in here for the
anonymization, and I’ll drill down a little more on the anonymization
process in a few slides further.
Identification or re-identification is one of the privacy threats. So our
concerns if we’re going to keep it a secondary use information resource,
one might be can the status of the subject be re-identified. So if I have a
data item, can I establish a link to that data subject. And if I have the
data subject, can I establish the data items that are associated with that data
subject. That’s a little more simple than the inference, and it’s the
inference issue that causes the more complex problem.
So if I have given the data subject, can I verify it against another set of
characteristics that I as an attacker have access to and then associate that
with the data subjects. So this is the linking it with some other data resource
whether it’s traffic information or anything else that I may have access
to that you’re not expecting to be linked with your information resource.
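The linking threat described here can be illustrated with a small sketch. Everything in it is made up for illustration: the records, the choice of zip code and birth year as the shared characteristics, and the function name `link`. It simply shows that an attacker holding a second data set can attach names to pseudonymized rows whenever the shared characteristics point to exactly one person.

```python
def link(deidentified: list, external: list, keys: tuple) -> list:
    """Join two data sets on shared characteristics; return unambiguous matches."""
    index = {}
    for row in external:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    matches = []
    for row in deidentified:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # exactly one candidate means the row is re-identified
            matches.append((candidates[0]["name"], row["diagnosis"]))
    return matches


deidentified = [
    {"pseudonym": "a91f", "zip": "20782", "birth_year": 1950, "diagnosis": "A15.0"},
    {"pseudonym": "77c2", "zip": "22030", "birth_year": 1982, "diagnosis": "J10.1"},
]
external = [  # e.g. a list the attacker already has access to
    {"name": "Jane Doe", "zip": "20782", "birth_year": 1950},
    {"name": "John Roe", "zip": "22030", "birth_year": 1982},
    {"name": "Jim Poe", "zip": "22030", "birth_year": 1982},
]
matches = link(deidentified, external, ("zip", "birth_year"))
```

The first row is re-identified because only one person in the external list shares its characteristics; the second is not, because two people do.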
Given the data subject, can I verify that the set of characteristics is not
associated. So I can as a hacker come in and start weeding through the data.
And, again, pointing out that that attacker may have access to additional
information resources – either authorized resources or unauthorized
So we want to be careful that we don’t consider the data that results
from anonymization or any of these de-identification processes to be fully
protected in and of themselves. If somebody is going to have data mining access
to them, they can certainly be a threat to privacy.
Refining the concept of identifiability and anonymity. We need to
account for all of the means that are likely, reasonable either by the
controller or by some other person to re-identify. And, again, this is a quote
that was taken from the European Data Protection, and that drove us to want to
in the standard refine the concept of identifiability and anonymity and take
into account in the threat model what means are likely and what those any other
person might be.
Okay. Levels and approaches for anonymization. So level one kind of coined
rules of thumb on data items. That is for the most part, that was the 18
variables from HIPAA. Let’s simply remove those.
Level two is a static data model, data flow, re-identification risk
analysis. If I have a diagnosis code and an admit date, I probably could
identify patients under certain circumstances.
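A level two check of this kind could be approximated by counting how many records share each combination of values, as in the sketch below. The field names and sample rows are hypothetical, and flagging combinations that occur exactly once is just one simple criterion; an actual risk analysis would set its own thresholds.

```python
from collections import Counter


def unique_combinations(rows: list, fields: tuple) -> list:
    """Return the field-value combinations that occur exactly once in the data set."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


rows = [
    {"diagnosis": "J10.1", "admit_date": "2007-01-03"},
    {"diagnosis": "J10.1", "admit_date": "2007-01-03"},
    {"diagnosis": "A15.0", "admit_date": "2007-05-24"},
]
risky = unique_combinations(rows, ("diagnosis", "admit_date"))
```

Here the single tuberculosis admission stands out, while the two influenza admissions on the same date do not.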
Level three is more of a dynamic populated repository. In theory,
we’ve created a repository of information that should be pseudonymous. It
should be privacy enhanced. But if I do an analysis against that repository
similar to an audit, if you will, I might be able to identify additional
outliers that were not considered in level two.
So back to level one anonymization, and this is the process that we went
through for the biosurveillance use case. We took the 18 HIPAA variables and
tried to identify where we might have a problem in generating biosurveillance
information resource with simply removing those identifiers.
The items that came up: fewer than 20,000 people in a zip code. A zip code is
a very important element for biosurveillance.
Dates, a birthdate is not a biosurveillance issue, but it certainly is
coming up in discussion in quality. I did highlight in this case because we
need to know specific ages that a patient is before applying to see whether or
not they’re qualified for a certain measure.
The admit and discharge dates, again, will have issue with the quality
Level two anonymization – we looked at the data set that was given to
us by the AHIC. Data variables that were likely to be in the form of
freeform text. We had suggested that those needed to be codified and/or
removed. Those data elements would be chief complaint, things such as nurse and
triage notes, and test interpretations. And in some of these cases, there are
other variables that were typically being mined today for detection. So
we’ve recommended that those be codified.
MR. REYNOLDS: Lori, can I interrupt you for a second.
MS. REED-FOURQUET: Yes.
MR. REYNOLDS: I know you’ve had a lot of questions. I noticed
you’re about halfway through your slides. I’d love to leave enough
time if we could for questions because I think there’s going to be a lot
MS. REED-FOURQUET: All right. Data variables, subject to re-identifications
from other fields would be putting such things as facility codes, diagnosis
codes, disease state, and laboratory results. You put those together, and you
may have some compromise, and then outlier variables. Really the diagnosis code
is the one that’s highest risk of being an outlier. The risk analysis,
though, is resource specific. So we have to do this analysis again when we get
our data set for quality.
The level three anonymization really is more of an audit process. It’s
continuous re-identification risk analysis of the live database. It would take
into account the content of the data, the outliers that might lead to indirect
identification, and we should be able to have a routine risk analysis criteria
specified by policy or by a service provider so that there’s an assurance
level that this is happening on a routine basis.
The re-identification risk analysis should be defined by policy. What does
it take, who can re-identify, what is the authorization process, and that is
very much going to depend on the risk model. And apart from the regular
re-assessment, these reviews may be triggered by events. So if you change the
data variables, if you add data variables, you should re-review your risk of
Identifying information that is necessary in secondary use may need to be
enabled. You may need to roll up to the three-digit zip codes. There are other
techniques that can be applied to further encrypt other data elements where you
may need to make those data elements accessible, and then they are all going to
need to be coupled with all of the other security controls that Glen had
discussed – access, physical controls, personal sanctions, et cetera.
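Rolling up to three-digit zip codes can be sketched as below. The sketch follows the HIPAA Safe Harbor convention of keeping the first three digits only when the combined three-digit area holds more than 20,000 people and substituting 000 otherwise. The population table is invented for illustration, and the function name is hypothetical.

```python
def generalize_zip(zip_code: str, population_by_zip3: dict, threshold: int = 20000) -> str:
    """Roll a five-digit zip code up to three digits, or to '000' for small areas."""
    zip3 = zip_code[:3]
    if population_by_zip3.get(zip3, 0) > threshold:
        return zip3
    return "000"


population_by_zip3 = {"207": 650000, "036": 12000}  # made-up counts for illustration

large_area = generalize_zip("20782", population_by_zip3)
small_area = generalize_zip("03601", population_by_zip3)
```

Areas missing from the table are treated as small, which errs on the side of suppressing the code.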
Just briefly on consent, if I could deviate slightly, the sensitivity
classes and functional roles are standard classifications that we are working
on in ISO and trying to work through the HITSP process. When you provide an
authorization, if you’re going to include consent and authorization to
access this information, it would be helpful if we can start getting into more
of a structured representation of that consent so that we can have machines be
able to interpret, read and act on that information. So which data is being
authorized, what sensitivity level is it, to who are you authorizing
disclosure, say, to their functional role and for what purpose, and does that
purpose include quality assessment. Are you going to ask for authorization for
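A structured, machine-readable consent of the kind described might be represented along these lines. This is only a sketch: the class, the field names, the numeric sensitivity scale, and the role and purpose labels are hypothetical and are not taken from the ISO or HITSP classifications mentioned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consent:
    """Hypothetical machine-readable consent; values are not a standard vocabulary."""
    data_categories: frozenset   # which data is being authorized
    max_sensitivity: int         # highest sensitivity level covered (higher = more sensitive)
    functional_roles: frozenset  # to whom disclosure is authorized
    purposes: frozenset          # for what purposes


def permits(consent: Consent, category: str, sensitivity: int, role: str, purpose: str) -> bool:
    """Return True only if every element of the request is covered by the consent."""
    return (
        category in consent.data_categories
        and sensitivity <= consent.max_sensitivity
        and role in consent.functional_roles
        and purpose in consent.purposes
    )


consent = Consent(
    data_categories=frozenset({"laboratory", "medication"}),
    max_sensitivity=2,
    functional_roles=frozenset({"quality_analyst"}),
    purposes=frozenset({"quality_assessment"}),
)
quality_ok = permits(consent, "laboratory", 1, "quality_analyst", "quality_assessment")
research_ok = permits(consent, "laboratory", 1, "quality_analyst", "research")
```

With the consent in this form, a machine can answer the questions listed in the testimony (which data, what sensitivity level, which functional role, what purpose) without a person reading the authorization.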
There is an example included in here of data that would be collected, say,
for a research project where you’ve gotten some identifying information,
name, date of birth, address. While address in this case is not necessary, it
can simply be removed, and then the remainder of the information at bottom is
used for research data.
The remaining identifying information, we go through an identifier
calculation, cryptography, be assigned an identification and the remaining
privacy protected risk management associated with anonymizing that payload gets
processed as well, and then at the end that data can be made available to a
Reasons for re-identification, and this should be included for audit
requirements specified by policy. Why would you re-identify, and these are
listed in our standards. You might want to verify or validate the data
integrity. You may be going back to check for suspected duplicate records. You
may be looking to enable requests for additional data. You might want to link
or supplement research for information variables. You might be looking for a
compliance audit, informing data subjects or their care provider awaiting
significant findings so that they can provide some follow-up care, or to
facilitate follow-up research, and that last one would be your public health
The identification requirements – there are a number of requirements
that I won’t go into. But there is a need to assure the health
consumer’s confidence. We need to specify what our re-identification
policies are in order to make the discloser of the source data comfortable.
We have a list of pseudonymization service standards for privacy policies.
We have not defined a standard policy, but we’re looking for at minimum
trustworthy services and what needs to be expressed in a policy in order to
enable these services and make them trustworthy.
We similarly have trustworthy practices that are minimally expected to be
sure that the underlying security is in place including things like this audit,
time-sensitive re-identification. This is specifically to accommodate the
health concerns. If a pseudonymization service is going to take three days to
re-identify, that will be unacceptable to a public health agent that needs to
act, say, within 10-15 minutes to have that source data. So that needs to be
specified and made clear to the subscriber.
International perspectives – this is not an exhaustive list, but I
noted there has been use in reference to this technology specifically to health
care in France, The Netherlands, Belgium, U.K., Canada, Australia and certainly
in our U.S. biosurveillance use case. And while I don’t have specifics,
the Japanese delegation was extremely active in here, and I believe it’s
in their files.
Some specific concerns about HIPAA – just with respect to enabling
this technology, within the de-identification, the privacy rule says that the
code is not derived from or related to information about an individual. The
concern with that statement is that pseudo-identifiers may be key encoded based
on those identifiable data, and we want to – I’ve had questions
saying is this compliant with HIPAA because it’s a cryptographic encoding
based on those identifiers, or is that going to be a problem.
And then cannot be translated to identify the individual, is stated in
there, and the concern is using a reversible key encoding approach, it might be
interpreted that reversible identification is not permitted.
So if there is any way of clarifying that information, that would be very,
very helpful. And policy recommendations to clarify those HIPAA rules go out
for pseudonymized resources to be enabled. We need to establish policy for
secondary uses. We need risk assessment, risk mitigation expectations so that
we can appropriately define mitigations for them.
And then establishing a policy and minimum requirements for
MR. REYNOLDS: Before we open up for questions, could you go back to the
HIPAA concerns log one more time and take us through, starting with the second
bullet, take us through the subsets of the second bullet there, and let’s
make sure we get that.
MS. REED-FOURQUET: Okay. So the privacy rule indicates that it permits
assignment and retention of a code or other record identification if that
code is not derived from or related to information about the individual. Now if
we use a technique that’s going to take your demographic information and
encrypt it using key encoding mechanism, is that or is that not derived from
that source information.
Now the intent behind that statement, I believe, was that you would not
have something like a look up table that once you identify your secret, you can
use it over and over. But the technology we’re talking about here
wouldn’t be subject to that risk.
Okay, the next one, it cannot be translated to identify the individual. So
if we’re allowing for re-identification, are we not translating back to
intentionally re-identify that individual, does that mean that we are not able
to use this under HIPAA.
DR. TANG: Unless Sue’s in the room – oh, Sue, I mean to me, both
of those are disallowed very explicitly.
DR. LOONSK: Can we just further specify the question because this is for
de-identified data in the context of, for example, for public release, just to
be clear. So there are other data uses specified in HIPAA that were not
MS. FARQUHAR: The first one probably is technically disallowed, whether or
not – I don’t think it was the intent. What we were looking for is
you can’t come up and scramble a social security number and claim that
you’ve come up with a re-identified.
MS. REED-FOURQUET: Exactly.
MS. FARQUHAR: The second point really is the question of who. I mean,
clearly we allow for re-identification. And so what we were looking for is, you
know, the code, the key has to be kept by the source of the information, the
source that pseudonymized it or de-identified it so that if the key goes with it
or if the ability – if the scrambling of the information that’s
derivative of the identity is so transparent that the recipient can translate
it, then that would not – I mean, the re-identification has to be done and
kept secret by the source. It can’t go with the data.
MR. REYNOLDS: Any comments from anybody on this? We will now officially
DR. TANG: I don’t think HIPAA said that it’s so obvious. I mean,
I think that qualifier’s not in there. So that’s why I think both two
is almost a derivative of one. You can’t have it where you can – that
was not in that particular language. Those are both disallowed.
MS. FARQUHAR: But it doesn’t, I mean, there aren’t – you
can’t have an anonymous key that goes with it that is identifiable only at
the source of the data.
MR. REYNOLDS: And all put together in this way.
DR. LOONSK: Can I – you said that’s not what they were talking
about, and was not an anonymous key part one of the pseudonymization
techniques. So that was one of the – there are at least two
pseudonymization techniques being described, one in which number one certainly
would be at issue because it is based on the name or medical record number,
another where it’s an anonymous key where there’s no – it’s
not a data manipulation of anything identifiable, but it’s just assigned
at the point of origin, and it sounds like that’s what Sue is indicating
may be allowable in this context.
DR. TANG: But I think their use – the whole reason for them doing
this, well, you will be able to link – you would be able to link
pseudonymized data from multiple sources to this same individual. That would
not be true based on what you just said.
DR. LOONSK: That is one function of pseudonymization. That’s not the
only function for which pseudonymization is sometimes referred to.
DR. TANG: Well, we just can’t get all of the benefits or the uses that
she talked about if you use the other.
MS. REED-FOURQUET: And I also want to point out, and maybe that changes
your interpretation of this first one. Typically, this would be a two-pass
encoding, so the source of the data would do one pass of encoding with their
key, and the third party service would do a second. And in doing that, you
would need to go through both parties essentially to re-identify. So would
that final one, being based on an encrypted identifier that was generated from
identifying information, be a problem?
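The two-pass arrangement can be sketched as follows, assuming a keyed one-way function (HMAC-SHA-256) for each pass. That choice, the key values, and the function name are assumptions made for illustration; the testimony does not say which cryptographic method is used.

```python
import hashlib
import hmac


def encode_pass(value: str, key: bytes) -> str:
    """One keyed, one-way encoding pass (HMAC-SHA-256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


SOURCE_KEY = b"key-held-only-by-the-data-source"       # placeholder
THIRD_PARTY_KEY = b"key-held-only-by-the-third-party"  # placeholder

# Pass 1 happens at the source, so the third party never sees the raw identifier.
first_pass = encode_pass("MRN-0001", SOURCE_KEY)
# Pass 2 happens at the third-party service; this is the pseudonym that is released.
released = encode_pass(first_pass, THIRD_PARTY_KEY)
```

Neither party alone can reproduce `released` from the raw identifier, which is the sense in which both would have to take part in any re-identification. Since HMAC is one-way, a truly reversible variant would need encryption or look-up tables at each party instead.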
MR. REYNOLDS: Simon, Glen did you want to comment first?
MR. MARSHALL: I’d like to point out that this entire discussion over
the last five minutes has made Lori’s point. The lack of clarity in HIPAA
and the lack of clarity in the interpretations, I could go to ten lawyers and
get eleven different answers. No, and it doesn’t mean that any of them are
correct. So that the lack of clarity and one of her recommendations is to deal
with that lack of clarity and lack of specificity.
I’d also point out that the current recommendations in terms of those
18 variables were produced without records and adequate risk analysis, okay. So
somebody came up with these because they thought it sounded good. But the fact
is that Lori has just gone through and shown you that they aren’t. So we need
to deal with those issues.
MR. REYNOLDS: Simon?
DR. COHN: Thank you very much, because I had been a little confused by what
was going on. Thank you for clarifying that the whole thing was to confuse us.
I’ll pass on this. I have a question on a completely different point.
Maybe others are still in the confused state I currently am. I don’t know
what question to ask at this point other than to say, yes, there needs to be
MR. REYNOLDS: John.
DR. LOONSK: The first item there where to follow up on Paul’s comment,
the potential to link data though it is pseudonymized has its attractions in
the quality context because it absolutely can go across encounters, go across
locations and do it.
It would be highly dependent on, though – and this is my question, on
a very meticulous specification of those data that in practice is probably not
found in most care situations. I mean, you’re using an algorithm to do a
consistent modification to data with the hope that you can relink it
afterwards. You’re highly dependent on the cleanliness of those data to
MS. REED-FOURQUET: Actually, the way that we specified in HITSP, it’s
leveraging the patient identity cross-reference manager which is already
assigning based on whatever underlying linking algorithm, a consistent
identifier. And so you’re not actually feeding input raw patient elements.
You’re feeding as input the link.
DR. OVERHAGE: But that’s a big assumption that that’s how they
all work. They don’t all work that way. There is not a consistent
identifier in all of them.
MS. REED-FOURQUET: Yes. There’s certainly architectural issues that
have to be addressed in any implementation of this.
MR. REYNOLDS: Okay, did you –
DR. LOONSK: I just wanted to add there’s one more concept that’s
out there that people talk about which is, and I think I may have heard it, but
it’s perturbing data to retain the value of the data for analytic purposes
through swapping data attributes so that the data cannot be identified. I just
wanted to put on the table because it is talked about as well.
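Perturbation by swapping can be illustrated with a very small sketch: the values of one attribute are shuffled across records, so the overall distribution is preserved for analysis but a given value no longer belongs to its original record. The function name, field names, and seed are hypothetical, and real data-swapping methods are usually more selective about which records exchange values.

```python
import random


def swap_attribute(rows: list, field: str, seed: int) -> list:
    """Return copies of the rows with the values of one field shuffled across records."""
    values = [row[field] for row in rows]
    rng = random.Random(seed)
    rng.shuffle(values)
    return [{**row, field: value} for row, value in zip(rows, values)]


rows = [
    {"id": 1, "zip3": "207", "age": 34},
    {"id": 2, "zip3": "220", "age": 61},
    {"id": 3, "zip3": "212", "age": 47},
]
swapped = swap_attribute(rows, "age", seed=7)
```

The set of ages is unchanged, so statistics such as the mean age are preserved, while the original rows are left untouched.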
MR. REYNOLDS: Okay, we usually have a questioning period, but you
won’t be able to tell the difference in this one. So I’ve got a
question, and then Simon and Paul and then Mark. So you can see it continues.
You said two or three things, and I’d just like a quick answer on it,
and both of you did an excellent job. And we knew we were going to really focus
on this issue with questions.
You mentioned a trusted third party. As we come up with our definitions,
trusted third party a lot of times fits in the eyes of the beholder because
who’s exchanging the data. So as we continue to think about the consumer
and others when we use some of these other terms like trusted third party, that
may or may not translate, you know, to each of these audiences.
So as we come up with any of these other definitions and everything
we’re doing as we put our reports and everything together, the eyes of the
beholder is always going to be a key subject that we have.
Second, you mentioned that pseudonymization – I’ll learn a new
term – equals protection. Again, only depending upon how strong the chain
of – so I’m not sure I can completely buy into that right upfront
because the strength of the chain of the data being passed really decides
whether or not pseudonymization is in fact a protection.
MS. REED-FOURQUET: It can’t be the only protection.
MR. REYNOLDS: Good, good.
DR. LOONSK: And there could be a number of issues, one being how many data
elements from a re-identification standpoint are eliminated so that the ability
to re-identify from other data sources and pseudonymized data is not clear, or
the way in which the linker is held, and both those are potentially –
MS. REED-FOURQUET: Yes, and that’s why surrounding this there is so
much focus on policy, definition and specifying it, clarifying it and going
back and continuously doing the risk analysis on the information resource and
information gathering process.
MR. REYNOLDS: And at some point, I want to make sure that we as a
committee, those of us that aren’t necessarily in the public sector, would
understand what public health really is by some kind of a definition because we
– no, a lot of people talk around the table that if it’s for public
health, this is okay, or if it’s for public health, that’s okay. But
a lot of us don’t have that definition. So a lot of people that we would
be sending this to may not be able to come up with that definition just
quickly. I would like to add this.
MR. MARSHALL: There’s a definition in each state.
MR. REYNOLDS: Thank you. That makes it even clearer. Okay, I got Simon,
Paul, Mike, and then Mark, and then Paul will be on question time until August.
DR. STEINDEL: Yes, I’ve noticed the same thing as representative of
CDC about a lot of the confusing statements about public health, and we’ve
asked Dr. Leonard who’s coming on as the director of National, and we
intend, assuming that we haven’t vetted the talk yet. But one of the
intentions is to put some clarifying statements around the various different
uses of clinical data for public health.
DR. COHN: Gosh, you know, I actually just wanted to test my level of
understanding, and I’m actually embarrassed because I’m probably
going to fall all over myself and look at Susan to maybe keep me straight here.
And this is maybe what’s in my level of confusion. Now first of all, going
back to something like HIPAA which gives me some basis, I’ve heard of a
variety of – I mean, HIPAA describes curee techniques that relate to some
of the secondary uses we’re talking about but doesn’t get very
restrictive in terms of public health uses of data in terms of security. And
so, therefore, layering on of additional pseudonymization or whatever is more
along the lines of additional things that CDC may choose to do to provide
additional protections. That is correct, right.
Similarly, quality depending on how we decide to define this and whether or
not this all fits in with TPO and if indeed it does fit with TPO, there’s
really once again not the need to de-identify or pseudonymize or anything else
for that matter, but we may choose to do that.
So this is once again – and if I’m wrong here, correct me. So
we’re sort of talking about this one, but there’s really – the
fundamental issue is protecting the data, not necessarily going through
elaborate security mechanisms of one sort or another or arguing about whether
HIPAA de-identification includes various aspects of pseudonymization unless
we’re actually expecting to fully publish the full data set of everything
that’s being dealt with for quality. Am I correct so far?
Okay, so some of this is like over-specification potentially of potentially
what we need. So I just want to make sure I was understanding.
MR. MARSHALL: That’s actually very close to one of the points I was
making as well as Lori that unless you do adequate risk analysis up front, you
do run a severe risk of over-specifying and incurring costs that are
unreasonable for the situation you find yourself in.
To paraphrase that, never under-estimate the value of an existing locked
DR. COHN: I just wanted to make sure that I was fundamentally understanding
the routine. Now having said that —
MR. REYNOLDS: Can I add one thing?
DR. COHN: Sure, and then I’ll ask my other questions.
MR. REYNOLDS: And so you mentioned TPO which I had listed. We mentioned
public health, and then there’s other whether that’s research, or
what is that. If you would add that to what your premise was, then I agree.
DR. COHN: Which was the third one?
MR. REYNOLDS: Other, and it may include research. It may include – but
are there any other things where the state is involved because I agree with
everything you’ve said, but it leaves out some segment over here.
DR. COHN: Well, I hadn’t come up with a comprehensive list.
MR. REYNOLDS: And I’m not saying you did, but – then I think
we’re all in agreement here. But that’s another category that
we’ve got to think about.
DR. CARR: Well, that was my question also. In terms of research that’s
overseen by an IRB, an IRB doesn’t necessarily obligate this level of
de-identification. So, again, just scoping where this would apply, not
necessarily in research, not necessarily – not in TPO necessarily.
DR. COHN: And I guess, once again, just trying to distill a lot of overhead
and a lot of information, what I’m hearing and I guess I must have not
really appreciated this morning was obviously your comment that HITSP is
recommending as a good practice pseudonymization for it sounds like both
quality data that appears to be going around outside of the organization as
well as biosurveillance, is that correct?
MS. REED-FOURQUET: Let me qualify that. If you read the HITSP language, it
says where it is required by jurisdiction because it’s recognized that
there may be jurisdictions or other agreements in place that may clearly allow
identified data to be communicated.
DR. COHN: Okay, so I guess I overread your slide then.
MS. REED-FOURQUET: Yes, I am not advocating that HITSP be the way that
everything has to go.
DR. COHN: Okay. Now, having said all that, not that I’m any smarter,
only less confused, Glen, where does HASH and HASH 256 – SHA – where
does this fit in —
MR. MARSHALL: Well, what that really does is deal with integrity issues.
The data can be corrupted in a variety of ways, including electronic or
somebody mishandling a value or deliberate corruption.
In any case, what happens is a hash is nothing more than a numerical
algorithm applied to a pile of bytes that results in a stream 256 bits long
in this case that is unique. If one byte of that pile of data changes, the hash
value changes, okay. And what that really says is that if I have a hash value
and I have the purported original data, what I do is I rehash it and I compare
the rehash that I just calculated with the original hash, and if those two
values are equal, I know that the data that’s just been handed to me is in
fact absolutely the data that originated. So it’s really a mathematical
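The rehash-and-compare check described here can be written in a few lines with Python's standard `hashlib` module. The sample messages are invented. SHA-256 produces a 256-bit (32-byte) digest, and a match means the data is unchanged with overwhelming probability rather than with mathematical certainty.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


original = b"patient 4711: culture positive"
stored_hash = sha256_hex(original)  # kept alongside or apart from the data

received_ok = b"patient 4711: culture positive"
received_bad = b"patient 4711: culture negative"

intact = sha256_hex(received_ok) == stored_hash      # rehash matches the original hash
tampered = sha256_hex(received_bad) != stored_hash   # any change alters the hash
digest_bits = hashlib.sha256().digest_size * 8
```

The hash says nothing about who the data is about and cannot be run backwards to recover the data; it only confirms whether what was handed over matches what was originally hashed.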
DR. COHN: I’m just listening to you, and I’m reflecting on
yesterday when one of our esteemed former members, Dr. Clem McDonald came in
and was talking about hash in relation to research data, and how they had been
MR. MARSHALL: Oh, that’s another overloaded term.
DR. COHN: How does one way hash relate to –
MR. MARSHALL: Well, 256 is a one-way hash. It basically, if you will,
it’s irreversible encryption which means you can from a hash you can never
recover the original data. All you can do is, given the original data, verify whether
it remains with integrity. Trust me, this is one of those things that security
geeks get and we – just pay me and I’ll do it for you.
MR. REYNOLDS: Okay, so we’ve got Paul, Mike, Mark and a break.
DR. TANG: So I’m going to attempt to simplify things again. So first
of all, thank you, presenters because it was very, very helpful, and I think we
all share the same goal. My bottom line partly goes off of what you asked and
what the two of you answered which is sort of we don’t — it’s a bit
of a technology look in search of a solution in search of a problem. So let me
say so, one, as far as this is concerned, I think Mike said it right, the law
is very clear. So I think basically, at least for one – we’re all
good about one. But the main thing is there are certain allowable things –
there are certain things that are allowable by HIPAA, and it does not require
hashing anything. And we should just figure out cleanly and clearly what that
is and do with appropriate protections.
There are certain things that are just plain impermissible about which the law is
very clear. Let me just try to be – and I’m not trying to be –
this is sort of pseudoconformance. I mean, Lori’s saying yes. And
there’s no reason to pseudoconform to anything because when we define it,
public health, research, that we are allowed to use identifiable data and
protect it as such.
But there are some things we just aren’t allowed to do, and
pseudoconforming doesn’t accomplish that. So in some sense whether
it’s technically like this provision or similar limitations, and I still
go back to the payload, one of the things you mentioned is well, you know, in
order to get that information out, you would (1) have to code it so we would
even know it’s there to get it out. Well, that doesn’t apply to text
data. I mean, there is no automated way to go into text, figure out what’s
identifiable and cleanse it short of the human process of coding it so that the
machine can figure this out. That’s why I call it pseudoconformance –
it really is pseudo. And I think we really confused it. It was really a good
chance for us to understand the technology, at least better. There was a gleam
of hope that, wow, there was a standard way, a way of standardizing method, et
cetera, but it may have been a pseudo hope.
But does that help clarify a little bit? I think this is not necessarily
– and I’ll look to them to – it’s going to be the answer to
all of our problems, or the answers to the hard problems that they’re
trying to solve using this technology, is that fair?
MR. REYNOLDS: All right, Mark Rothstein – Paul, you kicked over
MR. ROTHSTEIN: Well, I may be totally off base here. But I think, Paul,
your attempt to oversimplify it oversimplified it by suggesting that the
methods that were discussed here are purely academic. There is one area where I
think this is very important, and that is in the research context because if
you wanted to research on a data set but did not want to go back and — privacy
rule and the common rule, then you have to supply some method of removing
identifiers. I will not use any of the different terms. That is where this
comes in. Correct?
DR. TANG: When you use the word removing identifiers, you basically were in
search of something that would do that reliably. This does not remove the
identifiers. It encases them, but it does not in the HIPAA sense remove the
MR. ROTHSTEIN: Okay, I am not qualified to pass on this particular method,
but what I am saying is some method—some process—of satisfying HIPAA
standards and common rule standards to make sure that the provisions of HIPAA,
which would mean requiring authorization, and the requirements of the common
rule which would require higher approval in forms, et cetera, et cetera,
et cetera, do not kick in. Something has to be done, and that is where
this analytical framework, not this design so much, applies.
DR. OVERHAGE: I guess the case you are talking about is IRB or —
MR. ROTHSTEIN: Okay, so I want to do outcomes research on a whole data set,
right? If I am doing it on stuff that is identifiable in a sense, that is human
subjects research. However, if there are—if I can satisfy the separate
criteria that OHRP has published on what qualifies for anonymous research, then
I do not have to go and get—I do not have to re-contact these people. I do
not have to get their consent. I do not have to get IRB approval because now it
is anonymous research. The same thing in a slightly different way, applies
MR. REYNOLDS: What I would like to do is, that is on the table along with
the other comments. We need to continue discussion. I have Mike and then Marc
Overhage. We have some other people on the phone.
DR. FITZMAURICE: I want to follow up on what Paul was saying and also on
John’s concept of perturbing the data. I can see that an element in the
payload may have to be changed. You take the identifiers, but you may have to
change part of the payload. I would like to have those observations flagged so
that I might delete them from the observations. My dataset might be used to
link hemoglobin A1C with a person’s weight, and I also have dietary
information and other information in there. I have got the heaviest man in the
world in my study. Everybody knows who he is. I do not really know who he is.
He is 973 pounds. Somebody could say, well I am going to take that and put him
into the 300-pound class, or I am going to divide outliers by two. It destroys
the robustness and the statistical strength of my analysis. I would be better
served by knowing that there is a flag and having it removed, or given
instructions to the data supplier to me to get that out of there because the
payload itself identifies who the person is. I just want to strengthen
Paul’s comment about the payload can also be used to identify somebody.
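The flag-and-remove approach described in this comment can be illustrated with a minimal sketch. It assumes a simple median-based rule for deciding what counts as an extreme value; the function name, the threshold `k`, and the sample rows are hypothetical and are not taken from the testimony.

```python
from statistics import median

def flag_extreme_values(records, field, k=5.0):
    """Return copies of the records with an added 'outlier_flag' key.

    A value is flagged when it lies more than k median absolute
    deviations (MAD) from the median. The rule and the threshold k
    are assumptions made for this illustration.
    """
    values = [r[field] for r in records]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    flagged = []
    for r in records:
        is_outlier = mad > 0 and abs(r[field] - med) > k * mad
        flagged.append({**r, "outlier_flag": is_outlier})
    return flagged

# Hypothetical study rows: weight in pounds and hemoglobin A1C in percent.
study = [
    {"id": "p1", "weight_lb": 180, "a1c": 6.9},
    {"id": "p2", "weight_lb": 210, "a1c": 7.4},
    {"id": "p3", "weight_lb": 165, "a1c": 6.1},
    {"id": "p4", "weight_lb": 240, "a1c": 8.0},
    {"id": "p5", "weight_lb": 973, "a1c": 7.7},
]

flagged = flag_extreme_values(study, "weight_lb")

# The analyst drops flagged rows instead of receiving silently altered values.
analysis_set = [r for r in flagged if not r["outlier_flag"]]
```

The design choice is the one argued for above: the supplier marks the identifying observation and leaves the value untouched, so the analyst decides whether to exclude it and the remaining data keep their statistical meaning.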
MR. REYNOLDS: Okay, Marc?
DR. OVERHAGE: Well since we only have a few minutes, I will ask something
easy. It goes to Marc’s and Paul’s question. We talked a lot here
about pseudo-identifiers, and one of the identifiers that I have spent a lot of
time worrying about the last couple of years is location, where the patient
lives or works. It is not very subject to this kind of de-identification for
these kinds of purposes. You cannot use it if you mess it up too much. Any
thoughts or work or comments on how that—because obviously each of these
identifiers have their own unique issues it seems.
MS. REED-FOURQUET: Right, so you have two fields. One of them is your zip
code rolled up into the three-digit rather than the five- or nine-digit code,
available to most of the users of that resource. You can also have an encrypted
version of the full zip code available to authorized accessors of the data. There
are other lower level techniques within the information resource to allow you
to use those data elements as needed.
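The two-field arrangement described in this answer can be sketched as follows. This is only an illustration under stated assumptions: the field names `zip3` and `zip_full` and the function names are hypothetical, and a plain authorization check stands in for the encryption of the full zip code, which a real system would do with a vetted cryptographic library.

```python
def build_location_fields(zip_code):
    """Split a 5- or 9-digit zip code into a rolled-up and a full field."""
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) not in (5, 9):
        raise ValueError("expected a 5- or 9-digit zip code")
    return {"zip3": digits[:3], "zip_full": digits}

def view_for(record, authorized):
    """Return the fields a user may see.

    Most users get only the three-digit rollup. Authorized users also
    get the full code. In the arrangement described in the testimony the
    full code would be stored encrypted; this role check is a stand-in.
    """
    if authorized:
        return dict(record)
    return {k: v for k, v in record.items() if k != "zip_full"}

location = build_location_fields("22030-1234")  # hypothetical nine-digit code
general_view = view_for(location, authorized=False)
restricted_view = view_for(location, authorized=True)
```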
MR. MARSHALL: I will give you a slightly different answer. We have been
talking in the last few questions about analysis of comparative risk or
offsetting risks. You have the risks to the person’s individual privacy,
and then you have the risk to the research value of the data. There is a trade
off that we are talking about.
One way to resolve that trade off is to go to a more common point where
both things are in fact protected. So, it may be appropriate to rather than
say, de-identify individual patient records to supply an entire completely
identified patient record set, but provide protections at the set level so that
you do things like encryption of the set or other things along that line. Then
what you have to have is an absolute trust agreement of some sort between the
parties that basically says that if you violate this, if you look what the
conditions are, you are going to go to jail or some really serious stuff.
Basically, what you are doing is you want to decrease the likelihood that the
risks—both the risk of re-identification or identification of the data
outside of a proper use context and the research value of the data are in fact
So, it could be that one of the recommendations is, to heck with this
de-identification stuff, protect it at the dataset level because that seems to be
adequate, and it covers the risk of two competing stakeholders.
In other words, if you go through the risk analysis, you may come to that
conclusion, and that is a perfectly valid conclusion. In order to do it, it has
to be reflected in policy.
DR. FITZMAURICE: In the federal government, we have about nine statistical
agencies where that sort of thing happens. We happen to be sitting in one right
MR. MARSHALL: Oh, I just described your life, right?
DR. FITZMAURICE: The NCHS is such a statistical agency where things like
that can happen.
MR. REYNOLDS: Simon has a comment to kind of pull us all together, and then
we will break and we will be back at 3:10.
DR. COHN: Gosh, that is asking a lot for me to quickly put it all together.
John, did you raise your hand? Did you have a comment before I quickly close?
MR. LOONSK: Thank you, Simon. I feel like we have put our toe into some
technologies here and some approaches here, but that the committee overall has
not fully absorbed them. I do not think this problem is as simple as we may
have made it out to be though. If you look at public health for example, HIPAA
says—if you are looking at it from a regulatory perspective with existing
law and existing regulation, HIPAA says that public health should get the
minimum data necessary.
Now, would that include the names if they are not necessary in that
context? I think that is an open question. It is not quite as clear-cut. I
think that this is not necessarily just about meeting the letter of the
regulation either. This is about trust. These technologies bear more discussion
in the context of how we get to recommendations that help to ensure that trust
So, there are aspects of these, which are very esoteric, and
almost—just as a personal opinion—completely impractical in terms of
their application in the short term in terms of if you look at existing health
care and how it lays out.
On the other hand there are some very simple things that are in here that
are very practical, just not having names bandied about where they do not need
to be. That is a very simple solution, and it is pseudonymisation. I think that this
is worthy of further discussion as we move forward because I think that part of
this is on the agenda for what this workgroup needs to grapple with.
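The simple form of pseudonymisation mentioned here, keeping names out of places where they are not needed, can be sketched in a few lines. This is a minimal illustration, assuming a keyed hash (HMAC-SHA-256) as the pseudonym generator; the key, the `PSN-` prefix, and the record layout are hypothetical, and the panel did not specify any particular method.

```python
import hashlib
import hmac

def pseudonym(identifier, key):
    """Derive a stable pseudonym from an identifier with a secret key."""
    normalized = " ".join(identifier.lower().split())
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return "PSN-" + digest[:16]

def pseudonymise(record, key, name_field="name"):
    """Return a copy of the record with the name replaced by a pseudonym."""
    out = {k: v for k, v in record.items() if k != name_field}
    out["pseudonym"] = pseudonym(record[name_field], key)
    return out

# Hypothetical key held by the data supplier, never shared with recipients.
SECRET_KEY = b"example-key-for-illustration-only"

visit_1 = pseudonymise({"name": "Pat Example", "a1c": 7.2}, SECRET_KEY)
visit_2 = pseudonymise({"name": "pat  example", "a1c": 6.9}, SECRET_KEY)
other = pseudonymise({"name": "Lee Sample", "a1c": 8.1}, SECRET_KEY)
```

Records for the same person still link to each other through the shared pseudonym, while the name itself does not travel with the data. As noted in the discussion above, the payload can still identify someone, so this addresses only the direct identifier.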
DR. COHN: It is good that John talked because he said about 80 percent of
what I was going to say, which is that we talked a little earlier at the
beginning of the day about the issue of tools and technologies to help minimize
risk. I think we have heard now—Glen just reminded us about encryption,
but certainly pseudonymisation potentially whether it is required by HIPAA or
might just be a very good practice in relationship to some of the things that
we are pondering is really something we need to think about. These are those
issues of tools to minimize risks, which is sort of basically what John was
commenting on also.
With that, I agree with Harry. Let’s take a break.
MR. REYNOLDS: Okay, 3:10 because we have got three speakers for the next
MR. REYNOLDS: Okay, we are going to go ahead and start our next group so
that we can make sure we have plenty of time to cover this. We are going to go
in the order of Dr. Peterson and then Dr. Nugent should be joining us shortly
from the airport and then he will have to turn around immediately after
speaking and go back, and then we are going to have Jennifer Lundblad speak
DR. PETERSON: Just for clarification, did he fly in just to give testimony
and then he is going to fly back again?
MR. REYNOLDS: No, he is on his way from the airport to here. Okay, Dr.
Peterson, we appreciate you joining us, and you have the floor.
DR. PETERSON: Well, thank you very much. I am going to start out by saying
I am sorry I am not able to be there directly and be able to speak with you in
person. I do appreciate the opportunity to be able to address you over the
phone. I am going into a new area here, which is providers who capture and use
data. I will talk to you briefly about Practice-Based Research Networks. Someone
is going to be handling the slides there for me?
MR. REYNOLDS: Let her know whenever you want it switched, please.
DR. PETERSON: Since I was not able to be there, I thought I would put my
picture up there and you can kind of see who it is that is talking to you. I
will take the first slide then.
I wanted to start by making sure that everyone knows what a
Practice-Based Research Network is. What I am referring to here are
collaboratives, in this case of primary care physicians, that are really
committed to performing research in their own clinical practice, generally
research that is of relevance to primary care. These are groups of usually experienced
primary care clinicians that are in a community setting. They principally take
care of patients, but they also have an involvement in clinical research. They
are generally organized in regional or national groups and they do multiple
studies within a single network.
The reason that I think this has importance is that these Primary Care
Practice-Based Research Networks, sometimes called PBRNs, are basically the
clinical laboratories that primary care uses. Primary care uses them to do
clinical research and to disseminate information. It is important because most
visits by people in this country are to their community clinician, and mostly
to a primary care clinic. In fact, if we look at general medicine, pediatrics,
and family medicine, there are actually more visits to those
primary care physicians than there are in all of the other specialties
MR. REYNOLDS: What slide are you on now?
DR. PETERSON: That would be Distribution of Office Visits. So the next
slide would be the federation then. The Federation of Practice-Based Networks
is really a national organization of these networks. It was established in
1998. It currently has about 8,500 physicians in 57 different regional or
national networks, and in this case I want to say that we are talking about
primary care, not within a specific discipline. The groups that
have these Practice-Based Research Networks running include the American
Academy of Family Physicians, American Academy of Pediatrics, American College
of General Internal Medicine, American Academy of Nurse Practitioners—I
think they have two or three national networks actually. So, there are really
all of the areas of primary care. The patient population served by that group
is a little bit over 16 million with over 2,700 participating clinics
I do have on this slide that quote from 1999. I think Larry was a part of
that. It said that the most promising infrastructure then was related to the
Practice-Based Research Network.
Briefly, again making familiar with what is going on, these are the
locations of the headquarters.
MR. REYNOLDS: Dr Peterson? You are cutting in and out. I do not know if you
are using a speakerphone or if you could get maybe closer to what you are
using? We are starting to lose you in and out at times.
DR. PETERSON: Oh, sorry about that. I am not on a speakerphone, but is this
MR. REYNOLDS: Yes, thank you.
DR. PETERSON: Thank you very much for telling me. I was just going on to
the slide with the location of the headquarters for Practice-Based Networks.
Again, what I was saying here is that these are the headquarters of the sites
of the regional and national networks. In fact, the areas covered by the
networks include the entire country, but if we just look at the headquarters,
then it looks like they are pretty much reflecting the distribution of people.
So, what kind of data are we capturing and why are we capturing information
now? Really, the Practice-Based Research Network is a little different than
your usual clinical research network in the sense that we are really beginning
to look at data capture or information that is delivered within the setting of
usual care. Although primary care tends to have most of the Practice-Based
Networks, we now see these kinds of networks of
providers in dentistry, neurology, cardiology, oncology has had them for some
We are also seeing specialty based practice-based research. The advantage
really is that our patients that are part of that study then, regardless of the
area that we are looking at, really do demographically resemble the general
population much more than they would within a local research study that
occurs within an academic center. We also have accessibility to really a widely
diverse population because of the very wide distribution.
There is an advantage of having diverse locations for these and involvement
of multiple delivery systems. You will sometimes see networks that are a single
delivery system, and that is not what we are talking about. So, we are not
talking here about a managed care organization or a large organization of
clinics that tend to deliver care in a very uniform way.
The data that we collect from Practice-Based Networks cover a very wide
variety of areas. We often talk about the translation of research into
practice. From the perspective of a community clinician, that translation is
really both quality, that is, bringing their practice up to date, and
dissemination of new information into the community. It is a place for doing
both effectiveness and efficacy research. I think that we tend to see more
effectiveness work now being done because the sites are so generalizable. You
will also hear more about the information that is collected from the
perspective of quality. Some of those groups are regulatory bodies. What the
PBRNs have are really provider-generated performance measures. We have a number of tools
that are used to help providers evaluate their own practice. There are studies
in safety, of course, as we would suspect. There is quite a bit of work in
health disparities. There are clinical trials going on and community based
participatory research where the investigators are involving the community that
they are caring for in the identification of both the research question and of
the process that is going on in the research.
As I begin to look at the needs of this community, of the practice based
research network community, I would really begin to emphasize the
interoperability perspective. I know that you have had some tremendous experts
speak to you earlier about system requirements. He may have had something like
this up. Really, what we are talking about is the ability of these systems to
exchange information and predictably use that information that has been
exchanged. You will see that syntactic interoperability is what we refer to as
the ability to actually read information that is exported. The next build shows
semantic interoperability, which is really the ability for us to understand the
words that are used. In order for networks or primary care clinicians or
specialty clinicians to speak together, we really need to be able to exchange
information like this. We need to have both the ability to read
and the ability to understand each other’s information because it is only
in larger groups that the information that is provided becomes of value.
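The distinction drawn here between reading an export and understanding it can be shown with a small sketch. It assumes two clinics that export observations as JSON and name their tests differently; the clinic term maps and the `SHARED:` codes are hypothetical placeholders, not entries from any real vocabulary.

```python
import json

# Hypothetical maps from each clinic's local test names to shared codes.
CLINIC_A_TERMS = {"HgbA1c": "SHARED:HBA1C", "Wt": "SHARED:BODY_WEIGHT"}
CLINIC_B_TERMS = {"glycohemoglobin": "SHARED:HBA1C", "weight": "SHARED:BODY_WEIGHT"}

def read_export(text):
    """Syntactic step: the export must at least parse into a structure."""
    return json.loads(text)

def to_shared_terms(observations, term_map):
    """Semantic step: translate local test names into the shared vocabulary.

    Returns the translated observations and the local names that could
    not be mapped, so that gaps are visible instead of silently dropped.
    """
    translated, unmapped = [], []
    for obs in observations:
        code = term_map.get(obs["test"])
        if code is None:
            unmapped.append(obs["test"])
        else:
            translated.append({"code": code, "value": obs["value"]})
    return translated, unmapped

export_a = '[{"test": "HgbA1c", "value": 7.2}, {"test": "Wt", "value": 201}]'
export_b = '[{"test": "glycohemoglobin", "value": 6.8}, {"test": "BP", "value": 130}]'

shared_a, missing_a = to_shared_terms(read_export(export_a), CLINIC_A_TERMS)
shared_b, missing_b = to_shared_terms(read_export(export_b), CLINIC_B_TERMS)
```

Both exports parse, so both are readable. Only after the mapping do the two A1C results carry the same code and become comparable across clinics, and the unmapped blood pressure entry shows what is lost when a shared term is missing.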
Currently, the National Institutes of Health is funding an electronic
architecture for Primary Care Practice-Based Research Networks. The purpose of
what NIH is funding is the ability to facilitate clinical research in primary
care practices anywhere in the country and to help that translation, the rapid
integration of new research findings into primary care. This is really a
trans-institute involvement with funding coming at this time from several of
the institutes: the National Heart, Lung, and Blood Institute, the National
Center for Research Resources, the National Institute of Diabetes and Digestive
and Kidney Diseases, the
National Cancer Institute, as well as organizations such as AHRQ.
I can show you a little bit about what that looks like, but I am going to
put up this fairly simple cartoon of the interface that is being built
on what we call our local NIH gateway. It is a dataset that is at the clinic.
We can take information from laboratory and billing. We put that into a
registry and then export that using what we call a continuity of care record, or
a CCR. The next build, this works very well and takes in consideration as we
move forward. An Electronic Health Record does the same thing, and an
Electronic Health Record is able to export this kind of thing as a CCR. That is
able to populate that gateway. That gateway then does a number of things. Next
build shows that we are really able to generate a number of quality improvement
reports from multiple disease registries, clinical tools, from that gateway
while at the same time we are really able to see that that information is able
then to be shared in HIPAA-compliant ways through web services with, in
this case, the electronic research network. It is really another portal
I wanted to show you what that looked like. That gateway is a concept
really. I have summarized it here because it is really made up of a large
number of pieces. I guess the importance of this—for those of you who are
technically savvy, you can take a look at it and see what it looks like. The
importance really is that it is based on the new Internet2 web services grid
through Globus. It is really the creation of a grid. That is, we are making this
information at the local clinic available on a secure grid so that we do not
centralize information. We centralize the querying service, and then queries can
go out and run locally at servers that are located across the country. Each
clinic then maintains its own data, has the query run locally, and only the answer
then comes back out of the clinic. No data is ever moved or removed from the
clinic.
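The query pattern described in this paragraph, where the question travels to each clinic and only the answer comes back, can be sketched as follows. This is an in-process illustration only: the `ClinicNode` class and the sample data are hypothetical, and the grid middleware and web services mentioned in the testimony are not modeled.

```python
class ClinicNode:
    """Stands in for a server at one clinic that holds its own data."""

    def __init__(self, name, patients):
        self.name = name
        self._patients = patients  # stays inside this object

    def run_count(self, predicate):
        """Run the query locally and return only an aggregate count."""
        return sum(1 for p in self._patients if predicate(p))

def federated_count(nodes, predicate):
    """Central querying service: send the query out, collect the answers."""
    per_site = {node.name: node.run_count(predicate) for node in nodes}
    return per_site, sum(per_site.values())

def a1c_above_eight(patient):
    return patient["a1c"] > 8.0

# Hypothetical clinics and values.
nodes = [
    ClinicNode("clinic_north", [{"a1c": 7.1}, {"a1c": 8.4}, {"a1c": 9.0}]),
    ClinicNode("clinic_south", [{"a1c": 6.5}, {"a1c": 8.1}]),
]

per_site, total = federated_count(nodes, a1c_above_eight)
```

The central service never sees a patient row, only the counts each site chooses to return. A production system would also need authentication, query review, and rules about small counts, none of which are shown here.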
In order for that to provide an advantage for the local clinician, then
there are other benefits that we barter really. The local clinician has an
interest in research, but of course the main drive is clinical care. There are
a number of different tools that we also provide. This is one of the tools. It
looks like this. This is a patient-specific physician reminder. That is, it is a
patient-specific profile of someone with diabetes. It shows the summary of some of
the patient’s demographic information. It shows the graphic history of
previous test results. On the bottom, it has a little alert area, as we see in
some of the templates in Electronic Health Records, that really identifies that,
as of this time, this patient needs the following. Those are the kinds of
summaries that are provided. When a
person gets an Electronic Health Record, then those may be of greater or lesser
value depending on the EHR. We expected not to have much of—we thought
that once they got the EHR they would be able to do that, but that was not the
There are also a number of other tools that are applied that are clinically
usable. In this case, the name put on it is the Dusty Roads Clinic. It is
really a quality improvement device. In this case, it is focused on diabetes,
listing patients and their last A1C, a kind of summary of the clinical
results that might be useful to a clinician. As we do that, then there is
additional clinical buy in to some of what takes part in this NIH gateway.
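A registry report of the kind described here, listing patients with their last A1C, might be produced along these lines. This is a minimal sketch under stated assumptions: the record layout, the patient labels, and the 180-day overdue threshold are hypothetical and do not describe the actual tool shown on the slide.

```python
from datetime import date

def latest_a1c_report(results, as_of, max_age_days=180):
    """List each patient's most recent A1C and whether a new test is overdue."""
    latest = {}
    for r in results:
        current = latest.get(r["patient"])
        if current is None or r["date"] > current["date"]:
            latest[r["patient"]] = r
    report = []
    for patient in sorted(latest):
        r = latest[patient]
        age = (as_of - r["date"]).days
        report.append({
            "patient": patient,
            "last_a1c": r["value"],
            "days_since": age,
            "overdue": age > max_age_days,  # threshold is an assumption
        })
    return report

# Hypothetical laboratory results already loaded into the registry.
results = [
    {"patient": "A", "date": date(2007, 1, 10), "value": 7.9},
    {"patient": "A", "date": date(2007, 6, 1), "value": 7.2},
    {"patient": "B", "date": date(2006, 9, 15), "value": 8.6},
]

report = latest_a1c_report(results, as_of=date(2007, 7, 18))
```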
So that is kind of the way we bartered entry into some of the clinics. I
think that clinics are interested in the clinical tools. There is a great deal
of information that is coming out. I think that we are just at the beginning of
being able to do this. By the way, we will be working with some of the people
that will be talking to you a little bit later. The two things that I suggest
would make the biggest difference in facilitating our
use of data within a clinic, and I may have said this over and again, are
the establishment of a standard data structure and the establishment of
standard data elements.
The standard data structure that we have used, and that I think is being
used predominantly by groups of clinicians, not
necessarily across the country, but certainly in clinical care, is the
continuity of care record. The continuity of care record is a valid standard
from the standards development organization ASTM, one of the largest standards
development organizations in the world. It is a standardized XML schema. It
is written in what we call W3C XML Schema, which is standard XML. It is openly
readable. You do not need special expertise to get your information in
and out of this. That tends to be of value to those that are sharing. The next
build would show the continuity of care document, which is a compatible
document written by HL7, mostly compatible. It is a CDA document, a clinical
document architecture document, a little bit more like an envelope that we can
put information into.
Still, for the data structure that we would encourage, I prefer the CCR. We
could take the CCD, but we need a data structure. We need data elements too. I
think that across the world as we enter with this National Health Services in
England and across the United States.
SNOMED is probably the single most comprehensive global nomenclature that
we would use. It would be of value for us to have additional—facilitate
how I used it if we are going to include it. Now, those are some of the ideas
that it seems to me that I wanted to tell you about that would facilitate our
use of data. Having not been at the presentations at the previous day and this
morning, I do not know if I have answered the question that you would have, so
I thought I would open it up and say, if there are any questions, I would be
happy to answer.
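The earlier point that an openly readable XML record needs no special expertise to get information in and out can be illustrated with a short sketch using only a standard XML parser. The element names below are deliberately simplified and hypothetical; they are not the actual schema of the continuity of care record or the continuity of care document.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified record; not the real CCR or CCD layout.
SAMPLE = """<Record>
  <Patient id="example-001"/>
  <Results>
    <Result><Test>HbA1c</Test><Value unit="%">7.2</Value><Date>2007-06-01</Date></Result>
    <Result><Test>Weight</Test><Value unit="lb">201</Value><Date>2007-06-01</Date></Result>
  </Results>
</Record>"""

def read_results(xml_text):
    """Pull the result entries out of the record as plain dictionaries."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("./Results/Result"):
        value = result.find("Value")
        rows.append({
            "test": result.findtext("Test"),
            "value": float(value.text),
            "unit": value.get("unit"),
            "date": result.findtext("Date"),
        })
    return rows

rows = read_results(SAMPLE)
```

Reading the structure this way covers only the data structure half of the request made above. Agreeing on what the test names mean is the data element half, which is where a shared nomenclature comes in.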
MR. REYNOLDS: What we are going to do is we have the two other panelists. I
do not know if you are available to stay on the phone or not?
DR. PETERSON: Sure.
MR. REYNOLDS: Yes, we will go through the other two panelists and we will
open it to questions from the floor. Okay? Okay, thank you. I am going to go
ahead and have you go.
MS. LUNDBLAD: Hi, I am Jennifer Lundblad. I am President and CEO at Stratis
Health. We are a Minnesota based quality improvement organization that
collaborates with both providers and consumers to improve health care. When
describing the opportunities and uses of health data and health information
exchange, quality is often mentioned but it is pretty typically lower on the
list. So, I am really pleased that the committee has recognized the important
opportunities that health data and health data exchange present related to
quality and quality improvement, and so I am very glad to be here.
I hope to share from our perspective both generally the work that we do as
an external change agent. We are really trying to help translate research into
practice, and then I am going to draw on a couple of specific examples, both
Minnesota Community Measurement, that you heard Kevin Peterson mention just a
moment ago and the Quality Performance Improvement Alliance or QPIA.
We have been working with VHA Upper Midwest and the Medicare QIO organizations
in the five states of North and South Dakota, Minnesota, Illinois, and
Wisconsin over the past year and a half on an initiative where 40 hospitals in
our region who are already very high performing hospitals have stepped up at
the leadership level to say they want to achieve 100 percent performance. They
have a whole variety of policy and program and quality related reasons for
wanting to do this. But because this is a data driven initiative, I think it is
going to illustrate examples of responses to many of the questions that I hope
will be helpful to all of you.
So, if you can see here, I think these are often referred to as the flyover
states, but I think there is a lot of good work going on. First, you posed
questions to some of the panelists that were around setting the context in
health information exchange, in particular about enablers or restrictions or
policies. So, I want to comment a little on what is going on in Minnesota
because I think that will help you understand as I talk later about the
specific quality improvement initiatives.
One of the key remaining barriers to fully
gaining the benefits of electronic data is the lack of implementation of
standards for data transmission and sharing across providers using different
vendors—vendor systems simply do not talk to each other. This is
particularly problematic in a state like Minnesota, which is a strong
integrated health system state. We have situations where the best vendor
product choice for a group of hospitals that are part of a health system is in
fact a different choice than the primary care or the specialty clinics that are
part of that same system. So, they are forced to choose kind of the least
common denominator to be able to exchange data, but not meet their needs and
requirements, or meet their needs and requirements and not be able to exchange
data. So, that is true in our integrated health system environment, and then
you think of the QPIA example that I just gave. Five states trying to exchange
across the state borders, and you know that we have real problems around health
In addition, from our perspective in supporting quality improvement across
the continuum of care, which really includes hospitals, clinics, nursing homes,
home health, and some of the long term care agencies that are working in our
state, we really hope and expect that the next generation of quality
performance measures will include patient-centered measures across the
continuum of care that a patient experiences for any given episode or for any
given conditions. So, it will include kind of the silo based measures and
quality improvement work that we are currently doing, but it will also reflect
the need for multi-providers across the continuum of care that bring
accountability for the entire patient experience and allow identification of
improvement opportunities across this continuum. Given the current uses of
Electronic Health Records and the current capabilities of Health Information
Exchange, achieving this will be no small task.
We did hear Kevin also mention the continuity of care record, and we think
there is some hope and opportunity in that as a particular tool for looking
across the continuum.
You also asked in advance about the environment around current laws and
whether they provide sufficient privacy and security protection for
identifiable health information to be used in quality improvement. HIPAA is
really the floor for data privacy and security regulation across the country. I
know you have heard from many experts around HIPAA. Then there is kind of a
patchwork, as I think about it, of state-based regulations that come into play
above and beyond HIPAA that create a pretty fractured and confusing environment
for health care providers.
In Minnesota, we have a very strong tradition of very strong patient
consent laws. In our state, the combination of HIPAA and our state laws do give
sufficient privacy and security protections. Even with, or maybe because of the
strong patient privacy and consent laws, there are barriers that are perceived
in the ways that Minnesota laws and requirements have impeded the electronic
exchange of information: undefined terms, ambiguous concepts, difficulties in
determining application of the law as to an electronic environment, and really
the need to update where we are around patient consent.
As a result, that is what this next slide is and the attachment that you
have with my materials. We passed in this legislative session in May, the 2007
Health Record Act which seeks to provide solutions to many of the barriers that
have been perceived by providers. It includes everything from precision in
some of the definitions, things like what does a health record mean and what does
medical emergency mean, to looking at long-term care situations where there is
perhaps someone who is not physically or mentally capable of providing consent.
How do we address those situations? There is also a framework in there for
record locator services to support data exchange.
So, this Health Record Act resulted from the Minnesota Privacy and Security
Workgroup, which under funding from RTI was a broad multi-stakeholder group
that was able to work through a lot of these barriers and develop the solutions
that have now made it into this new law. I know you have heard a bit more in
detail at the national level about the RTI project yesterday.
These are important advances in health data privacy and security to support
health information exchange in Minnesota. If you think about the QPIA, the five
state initiatives that I just described to you, you can begin to see how this
patchwork of state-based initiatives can impede quality improvement. What we
passed in Minnesota is not the same as what is in any of our neighboring states,
so the health systems in Minnesota have to deal with that as their
systems cross the border. Then, this QPIA initiative, the Quality Performance
Improvement Alliance, which is trying to do collaborative improvement work
across five states really faces a complex crosswalk of regulations and laws
that they need to understand in order to be able to accomplish their
collaborative improvement work.
You also asked as part of context setting about the opportunities to use
the National Health Information Network and some of the national
infrastructure, and I just want to comment briefly on that. The challenge it
seems to me, from a quality improvement organization perspective in this arena
is striking the balance between national infrastructure and policies and local
innovation and control and needs. We want and need a national network and the
associated policies, but we recognize that innovation tends to arise from local
means and local opportunities. So, it strikes me that there are parallels to the
world I live in which is the national quality measurement world. Those of us
who are immersed in that world are all trying to strike a similar balance
between broad use of national consensus based measures, for example those that
come from the National Quality Forum and that are reported nationally, for
example through the Hospital Quality Alliance, while not inhibiting and finding
ways to in fact encourage innovation and new measures that develop up from
local providers and local initiatives. Health care delivery is local, having at
its core the relationship between the physician and the patient, and we strive
to measure quality in consistent and comparable ways, but we also want to
encourage that innovation in ways to assess quality. So, I think there are some
parallels as we think about quality measurements and as we think about Health
Information Networks at the national versus local and where that balance is.
You then asked about the specifics and you heard Kevin just speak to some
of the ways that are the sources and uses for data. I want to do a parallel
track here as well. I have listed for you in the slide some of the ways we, as
an external change agent, draw on sources of data for use and quality
improvement. Chart abstraction, of course the benefit is that it is the most
detailed, at-the-bedside data about patient care. The challenge is that it is
retrospective, although the opportunities for Electronic Health Records to
improve that kind of realtime data collection and analysis are real and in
front of us right now. There are some concerns, though, that Electronic Health
Records will diminish the details and the nuances of individual patient care.
There are opportunities as well in EHR and chart abstraction: by building
guidelines in as logic models and forcing functions, quality can be affected at
the bedside right during care delivery.
For electronic registries, the second item I have on my list, the benefit
is really the ability to view patient populations by conditions, for example
all of your diabetic patients or by treatment, but the challenge is that
electronic registries are not accessible by all of the providers who would find
the information useful. Internal repositories are an interesting source of
quality data, and right now—you heard me describe the very integrated
health system environment in Minnesota—most of our large health systems
are really challenged by getting data back out of their internal repositories.
They have made the move over the past few years to implement their Electronic
Health Records. We have great levels of adoption in our state, particularly by
the largest provider groups, and as we work with them, what they feel is data
rich and information poor. They cannot get the information out that is going to
help them look at their population, that is going to help improve their care
delivery. So, they have got a lot of data there, and it is not yet useful for
them in many ways for decision-making and for quality management and for
External warehouses, the fourth item on my list, data that are part of
local, regional, or national repositories, are really useful for
benchmarking purposes, but the data and reports in them often have significant
time lags, which means they are not quite as useful as they might otherwise be.
Lastly, I have included as a source of data for quality the administrative
data like claims data, patient satisfaction data, and other kinds of survey data.
I want to share with you a case example going
back to the QPIA project, again, more than 40 hospitals across five states.
Hospitals that participate in this initiative told us early on that, since
they are all already high performers, one of the most useful things for them
was going to be being able to see the data for all of their peers
in the topics that we are working on: heart failure, AMI, pneumonia, and
surgical care. They all already collect that. It already gets submitted through
their vendor to the National Data Repository. We, as the QIOs have access to
that repository and can bring that data back and have them be able to take a
look at that. It seems like it should be a fairly straightforward process, but
in fact it took us nine grueling months to work through the data policies that
exist right now that I think are very outmoded and outdated that allow that
kind of data sharing to happen. We had to go through data use agreements from
each of the hospitals. We had to go through data sharing agreements between
each of the five QIOs involved. You can imagine the spider web of paper that
was crossing paths, and then each of our organizations with VHA to be able to
share that. CMS was really supportive. They thought this was a great initiative
and a great project. It took us about a month of those nine months to actually
execute the agreements. It took us eight months for us to have clear
understanding with CMS about how that data could be shared. So again, CMS
thought it was a great project, wanted to be very supportive, but it took
us—I still feel a little bruised through the whole process of trying to
get through all of that. We have gotten through that, and it has been
enormously valuable for those hospitals. That is an example of some of the
barriers around data exchange and using those external repositories.
You also asked about not only sources, but uses of data for quality
improvement. I have articulated again the most common examples that we come
across in my organization as an external change agent. Internal quality
improvement and patient safety, comparing your own performance over time or
comparing one’s performance to that of one’s peers, whether that is in a
state or region or another breakdown. These comparisons can lead to
identification of opportunities for improvement.
Peer review, which I think often does not get mentioned and should in the
quality improvement realm, which is using individual cases to understand
sentinel events or near misses and then encouraging peer learning specifically
between and amongst physicians. Public and community health, local, regional or
Pay for performance, I think the next iteration of transparency is really
using the quality performance data by health plans, by employers, and by state
and federal governments that pay differentially for quality, whether that is
for reporting or achieving certain outcomes. Then research,
contributing—quality data can contribute to the research and evidence base
that we all should be using as we are doing quality improvement work.
Here, I want to share with you another case example from Minnesota. Again,
Kevin mentioned it in his comments as well. That is Minnesota Community
Measurement. This is a new organization that was formed a few years ago when in
our state we looked at the HEDIS data that is collected by health plans about
the care that is delivered to their members and is reported in the Health Plan
Accreditation Process. All of the work that is undertaken, all of the chart
abstraction and administrative data that goes into creating HEDIS was then
reported at the health plan level. That does a medical group or clinic no good
in terms of its ability to use that data for performance improvement and
The Minnesota Community Measurement had its origin in attempting to take
the HEDIS and attribute it to specific medical groups so that reports could be
generated, first internally and now in a very public way about the performance
of medical groups. We know that is an area that nationally has lagged behind
where hospital public reporting, nursing home public reporting, and home health
public reporting has been. So here, in Minnesota, we are making an attempt to
say that we want to report at the medical group level, and we are working our
way toward reporting at an individual clinic level and being able to have
that kind of data that comprehensively tells us how we are doing at those
levels using data that has already been collected and just attributed at the
specific group level.
Now, the Minnesota Community Measurement is piloting direct data submission
now that we have so many providers in Minnesota who are using Electronic Health
Records. So, we think it will be even a more rich and a more detailed data set
that will be reporting at the medical group level.
I have given you the website there. You can see how those data are reported
out of the medical groups in Minnesota. Again, these are ambulatory care measures
at this point. I also will let you know that Minnesota Community Measurement
was selected last year to be one of the six national pilot sites for the Better
Quality Information for Medicare Beneficiaries—BQI project. So Minnesota
is one of the six sites that is getting Medicare data, merging that with the
commercial data and the state public programs data to for the first time bring
all of those data sources together and report at a medical group, at a clinic
site, and potentially at an individual physician level. So, for the first time,
there will be that comprehensive all-payer source of data reporting in each of these
six pilot sites.
These six pilot sites are the precursors for what
you are probably all familiar with in the value driven healthcare
initiative from the Department of Health and Human Services, the value exchanges
that are currently being promoted by AHRQ. I believe the public comment period
just ended last week on the description of the value exchanges. That is what
the BQI projects, when they roll out and move beyond pilots, will evolve into:
these national value exchanges.
What we learned from the results in Minnesota is that we started reporting
with a focus on diabetes in the community measurement project, and moved to
reporting a composite measure of five different elements of care and whether the
patients were in line with the targets for their test results in
each of those five areas. The practice according to medical guidelines in this
composite diabetes measure increased from four percent in 2004, which was the
first year of the public reporting of the data, to ten percent in 2006. That
still seems really low to the general public in terms of whether we are giving
optimal diabetes care or not, but that is a remarkable improvement. Many of the
groups have been able to achieve very high improvement, and so we are trying to
learn from what those experiences are as we continue to improve quality.
As I step back from all of this, I think it is important to spend a minute
talking about transparency since that is a world that Stratis Health is
immersed in and is one of the strongest uses of health quality data around
public reporting and transparency, and to ask kind of three questions about whether
it is achieving the results that we intend.
The first goal of public reporting is to have it be effective at bringing
the attention of health care leaders to quality and patient safety and to
driving improvement. I give this one a resounding yes. I think very much the
move for transparency has accomplished this goal.
The second goal that is often talked about in transparency is to help
consumers be more informed decision makers and activated patients. I would
say we are not there yet on this one. I think we have a lot more work to do and
research to undertake to know how to share data meaningfully with consumers so
that they can use it to be informed decision makers. I think we have a ways to
go with this one.
A third goal, the biggest goal of all: is transparency effectively
helping us get where we want to be as a system, whether you define that as
better quality and safety, whether you define that as value, whether you define
that as the six aims from the IOM? I think it is too early to tell. I think we
do not know what those results will be. Again, I hope Minnesota can continue to
be a leader in this.
I want to leave you with two additional pieces that I think an organization
like mine may uniquely bring to this committee as you consider the uses of data
for quality purposes. First is the need for clarity and consistency in an
electronic record environment as to what constitutes the legal medical record.
We know that for external peer review, for utilization management, for
litigation, we request the chart as part of those activities. When we are in
an electronic environment, we are seeing as the QIO in Minnesota real
challenges, when we request a record from an entity that is using an electronic
medical record, about what constitutes that legal record. What is going to fully
describe the care that is going to allow us as a quality improvement
organization, but you can imagine other uses around utilization review and
perhaps litigation. What screens do you pull? What data do you pull? How do you
know what is the right information to include? I think we are lacking in
definitions and clarity around what constitutes the legal record in an
Secondly, a very different topic, Minnesota is a very rural state. My
organization has done a lot of national work in rural health improvement. Rural
health can be a model and a leader in many areas, but there are particular
challenges when it comes to uses of data and particularly when it comes to
transparency. Small provider organizations often do not have the capital and
resources to adopt technologies as quickly as their larger urban and suburban
counterparts, and they do not have an adequate number of cases to report in a way
that does not explicitly or implicitly identify patients. Yet rural providers are
committed to quality. They are committed to patient safety and transparency. I
think the last thing we want to end up with in this country is a two-tiered system.
I think we very much have to pay attention as we consider the uses of the data
for the rural environment.
I thank you all for your time, and I look forward to opportunities as Kevin
said to answer questions when we are all done.
MR. REYNOLDS: Thank you. Our third presenter is in the neighborhood
somewhere. It took us all a while to get here in the last few days. I would
like to go ahead and open the floor. Steve wants to ask a question, and then we
will keep asking questions.
DR. STEINDEL: Actually this is more of a comment than a question, and this
is to advise you that HL7 has a functional model for the Electronic Health
Record that has just been passed as a national standard. There is a
committee that is defining the medical record that is being chaired by some
people from AHIMA that is now undergoing final scrutiny to be put up for
ballot as a profile.
MS. LUNDBLAD: That is terrific news because we have really been challenged
by that. Do you know when that happens?
DR. STEINDEL: No, I actually do not because I have not been paying very
close attention to it, but it will actually enumerate those portions of the
functional model that will be balloted as a legal medical record. Once it
passes, it will be published as a standard.
MS. LUNDBLAD: Great.
DR. CARR: Well this is a question for both speakers, but just great work
and so clearly presented and very helpful. What makes something like this
happen? There are a lot of obstacles that you encountered and yet somehow there
is an alignment of motivation and incentives. What do you attribute that to?
MS. LUNDBLAD: I am seeing if Kevin is going to jump in first. I do think
there is something to be said for the way that the health system is
structured in Minnesota. I think it is not any one factor. I wish I could say
it was one simple thing and it is replicable and why cannot we all just do
this? Unfortunately it is not that simple. I do think the fact that we have
such an integrated health system state means that the largest players that are
trying to influence policy and trying to take action in our state are looking
fairly broadly across the care continuum because they are paying attention to
all parts of their organization, particularly hospital and ambulatory care. Our
integrated health systems are truly integrated and also reflect long-term care.
I think that is the context in which we do our work because of how the systems
have grown up in Minnesota.
Secondly, we are very fortunate to have ICSI, Institute for Clinical
Systems Improvement, which over the past fifteen years since they were formed
has really given Minnesota physicians and other clinicians the opportunity to
come together and shape by consensus what guidelines are for care delivery in
Minnesota. It is not that those replace what comes from any of the other
consensus building bodies or guideline building bodies, but they reflect in
Minnesota the standards that we want to hold ourselves to, and because they are
very good at engaging physicians in that process, which complements the integrated
health systems, we have good grounding and agreement in the guidelines and standards
Again, I am not sure what to trace it back to, but Minnesota is a very
collaborative improvement environment. It is also very competitive, but there
are certain topics and certain areas where most provider organizations agree to
set their competition at the door whether that is around agreeing to protocols
for surgical site marking or agreeing to immunization practices and common
messaging. Whatever it might be, there are enough topics around quality and
patient safety that they say that is not what we are going to compete on. We
are going to compete on all of the other things and all of the other services
but on quality and patient safety, we are going to take that as kind of a
common good or a public good and work together on those things. I quite
honestly cannot trace for you how we have gotten to that point. That is the
environment in many instances in which we are trying to do the quality
improvement work that we do.
DR. PETERSON: Jennifer, I would like to answer, and maybe my answer will be
a little more simplistic too. I think we cannot diminish the importance of the
system in Minnesota that has been really pushing this. I think that people have
always tried to do better. I have to say that part of this change in the
landscape, it seems to me, is also because of the very simple introduction of
substantially more electronics. You know we now have computers that can store
enough information. We have ones that can process fast enough. I think in many
ways what we are seeing then is not so much the motivation of people, but the
ability to do it technically and technologically and I think that pushes some
of these changes.
MR. REYNOLDS: Paul, I think you had a question?
DR. TANG: I was just going to say something. Minnesotans are nice people,
and they play together well. It sort of–
DR. CARR: Is this about Boston?
MR. REYNOLDS: Simon?
DR. COHN: First of all, my congratulations. I think these are two very good
examples of work. Jennifer, I want to dig in a little further into—you
were talking at a high level about quality improvement. You are the first BQI
that we will be having testimony from. I am anticipating in July that we will
be getting additional testimony. In some ways, you represent sort of an example
of, I think, what the vision of transparency and the quality use case is really
sort of pondering, this idea of 360-degree evaluation of providers, both hospitals
and physicians, and the development and somehow creation of quality metrics on a
variety of levels. I think there is some implicit data stewardship
perspective, which is something that we are going to need to hear more
about—we heard some previously and now and shall be digging into this one.
We talked earlier and you were here when we talked about all of these
interesting security practices and all of that. Can you walk me through a very
basic and maybe real level about how you relate to your physician and hospital
environments in terms of how all of this works. Let me just give you a couple
examples of what I am talking about. Do you sort of publish things that you
want, and then you get things back from the practices like 78 percent? Do you
get the results back or are people actually submitting data to the BQI to
enable you to sort of put all of this together? Is it personally identifiable
information? Is it pseudonymised information? Maybe you can explain to me how
this works in the real world environment.
MS. LUNDBLAD: I will try to answer all of those things. As I said,
Minnesota Community Measurement before it was a BQI site was first taking chart
abstracted data that was used originally for reporting HEDIS measures at the
health plan level and they developed an attribution methodology to then attach
those results to medical groups. So it was using chart-abstracted data that was
done for the purposes of HEDIS and then attributing those results to the
medical group. Those were the data that were originally reported first
internally and then publicly and have been available now for the past three or
four years in a public way. It is ambulatory only, not hospital. It is measures
of things like diabetes care, asthma, cardiovascular disease, depression,
your typical slate of outpatient measures, and it is reported at the group
level. It is not at the clinic site level and not at the physician level
because that is what the HEDIS data would allow to happen. Now that more and
more of the clinic sites in Minnesota have adopted and are using Electronic
Health Records, Minnesota Community Measurement is piloting a direct data
submission process so that the HEDIS data can still sit there as the base. If a
clinic site or a clinic system had electronic data for those same measures,
they could submit that and that data would then supersede what was there as the
HEDIS. We are going to be doing some studies to see how does that line up? If
you have the HEDIS derived data and you have the Electronic Health Record
direct submitted by provider data, are we going to get the same results, and if
we do not where are there going to be differences? That is in its pilot stage
That is what has been happening, and again, it has been at the group level
soon to be at the clinic site level, and we have some real debates about
whether the physician level is appropriate or not. Many people believe that
most of the practices that we are describing with these measures relate as much
to what the individual physician is doing as to what the clinic systems and
processes are, so that it is the site level that is appropriate for public
reporting and individual physicians for internal improvement purposes. That is a
pretty healthy debate we can have on this one. So that is what is going on.
Now we have the overlay of being selected as BQI site. Now, in addition, we
have the whole Medicare Part B claims data and some chart abstracted Medicare
data that will go through that same attribution process first at the group and
then at the site level to add Medicare data to be the comprehensive reporting
of what at this point has been commercial and state programs data. Does that
DR. PETERSON: Could I try to answer that too from a different perspective?
I think from the perspective of the group being measured, I think that there
are a couple of things that we have to be careful of. One of them is to the
extent that some measures reflect the patient population and not necessarily
the clinical practice. Jennifer, you mentioned the diabetes optimal care. I
will suggest that one. One of the five measures in the optimal care of diabetes
is smoking. Well, if the patient smokes then that is a point against that. It
does not make the optimal care measure. To the extent that the population then
varies in their rate of smoking between different populations, then some of
those characteristics can be reinforcing. A practice that is in an area that is
very poor and underserved may have problems with people affording their
medicine and not having very good control. What we then may begin to do is to
tie that to less reimbursement, so that for a group that is in an underserved area,
the measure is reflecting the underserved and they become underpaid as
well. I think we have to be careful of that.
I think one of the other concerns that shows up is that when you define
what you want, then you will get what you ask for. That seems to be what we
want to do, but I would encourage that it not always—sometimes you get too
much of that. One of the examples that comes to mind is the contract with the
National Health Service in the UK that came in last March. There were a few
measures in depression. If you do not measure depression, then you do not get
depression care. That is, when a person—if we measure what we feel is
quality, what will happen is that our physicians are going to focus on making
those measures. What we do not measure, they are not going to focus on. Be
aware that if we are not going to measure it, it might not get done. I
encourage us just to measure things that are important.
DR. COHN: Okay. Jennifer, I did just want to go down just a little more in
specificity. I understand all that you are saying. When you talk about HEDIS,
you either get the HEDIS results or you actually start looking at the records.
I was just trying to get a sense of—now you are getting all of the claims
data on top of it being sent to you. I was just wondering if there was anything
that you were doing? Are you getting patient identifiers and all of that on all
of the claims? You probably do not get it on the HEDIS information, probably
just getting the percentages coming out of the clinics or from the health
plans, but what parts of this are really patient specific data that you are
looking through? Is there anything going on to protect or make anonymous that
data prior to you actually looking at it? Is there anything going on there and
any concerns or otherwise?
MS. LUNDBLAD: Yes, I think there are a lot of concerns. The Minnesota
Community Measurement has just formed a quality audit committee and is trying
to address more and more now that they are deriving data from more and more
data sources what those needs are. I would say we are still very early on in
that. To date, we have really tried to not have identifiable data be what is
coming in to do that attribution for the medical group level, but as more and
more data sources are there that is going to happen. I think we are just in the
early stages of figuring that out.
MR. REYNOLDS: Okay, I would like to welcome William Nugent. You have that
look on your face of travel that all of us have been through. We are sorry
about that, and we look forward to your comments.
DR. NUGENT: I appreciate the opportunity to visit you. I have been asked to
come and talk about the Northern New England Cardiovascular Disease Study
Group. I have put together a brief presentation describing who we are and
giving you a little background in what we are trying to accomplish with our
cardiovascular population in Northern New England. Basically, I have no
commercial or financial affiliation with anything. I am representing NNE, the
Northern New England Cardiovascular Disease Study Group and there may be some
comments with respect to Dartmouth-Hitchcock Medical Center.
A rocket sled tour: back in 1987, HCFA decided to release institution
specific and ultimately clinician specific mortality data. This happened at a
time when Jack Wennberg had just published an interesting paper looking at the
variation in utilization of transurethral resection in patients with benign
prostatic hypertrophy suggesting that based on the population that you are
associated with, it was more important where your prostate was than how big it
was in determining whether it was going to be treated medically or surgically.
These two things coincided to suggest that we could in fact collaborate by
sharing in very sensitive outcomes data on a population based environment and
maybe learn from that in a way that would protect and improve the outcomes of
our patients. In 1987, the entity was formed. Basically a collaborative to
exchange information concerning the treatment of cardiovascular disease,
regional multi-disciplinary group consisting of clinicians, hospital
administrators, and health care research personnel with purposes to seek to
improve the quality, safety, effectiveness, and cost of medical interventions.
It is a hefty mission statement, but it has been consistent. We have upheld
it and we have stuck with the mission statement for the last twenty years.
The entity relationship to its member organizations, I think the most
important thing to stress here is that we are a regional organization and we
have really put a priority on maintaining its regional focus. Therefore,
membership is offered to all institutions performing any sort of cardiovascular
intervention in Maine, New Hampshire, and Vermont. We have been approached by
institutions outside of those three states, and we basically consider
membership on an individual basis. Once you have been accepted as a member of
the NNE, you basically sign a contract agreeing to pay the necessary dues, to
collect, provide, and allow validation of all data, and to attend and host
meetings. Our meetings are held periodically throughout the year rotating sites
within the region.
Our funding began, basically, for the first ten years out of our own back
pocket. We paid for our own travel and our own meals and actually funded all of
our research activities through grants. That has since matured to the point
that all registry activities are now supported by dues collected directly from
member organizations. The nice thing is we have become sufficiently central to
the quality improvement and the quality assessment of systems of all
organizations that we are now part of their budget basically in order to help
maintain this registry. All of our research activities are supported directly
through grants and typically they are either through the American Heart
Association or through the NIH. We support our own travel and member
institutions pay for travel and eating et cetera. Since we rotate meetings
around the region, basically each institution hosts and pays for each
meeting when their time comes up to have one.
It is hard for me to organize what we do in a way that could get through
this in fifteen minutes, so this is basically what we do. Our activities are to
develop and maintain a data registry. That really is our primary activity, and
I say with some emphasis that it is a registry. This is not a randomized
controlled trial through the registry. We provide outcomes data and reporting
system. I have brought some examples of that to show you today. We provide
clinical decision support tools. I have brought some examples to show you. I
think it is worth mentioning that if you are going to provide a population
based database and provide risk stratified outcomes data, you can turn that
data back on itself and provide risk-stratified predictions based on patient
individual characteristics. Besides telling us or informing the clinicians how
they are doing with respect to one another, we also are providing decision
support tools for practice enhancement to help them make decisions about
individual patient encounters. We organize quality improvement activities. I
will show you examples of those. And I think we generally provide services to
clinicians and institutions as needed and as are appropriate.
This is a big slide. I apologize for this, but this is probably the most
important slide and probably will generate the most questions down here. I
think it is worth stating that although the registry analyzes the data and does
house the data, all data is the property of individual institutions and it
remains the property of individual institutions. All data has been reviewed by
IRB and IRB has signed off at all institutions with respect to the collection
of data. Most of the IRBs have designated our registry a quality improvement
activity and therefore exempted because of that.
Data is protected therefore under a QA umbrella, and as I pass these things
around you will notice we put the disclaimer on the bottom. That has never to my
knowledge been challenged in any way, shape, or form. It is worth stopping
to pause for a minute just to mention that we have had at least three very
highly contested certificate of need applications within the region. We have
not used this data to either refute or to support it because it basically goes
against the tenor of what we are trying to do by promoting one institution over
another. I think that has been generally acknowledged and generally recognized
by all member institutions. I have wonderful anecdotes to tell you about how
important we feel it is to stay as owners of the data to stay in control of it
for appropriate reasons.
We have compiled a consent, sort of a general umbrella consent, for the use
of data for QA purposes, which is part of the admission process when you
sign into the hospital. When we get into the nitty gritty, institutions are
responsible for submitting data and allowing for validation. Now our validation
is primarily procedure and outcome, meaning mortality outcome. We have elected
to use in hospital mortality as our benchmark outcome because it is so easily
validated against procedure. This is as opposed to 30 day or procedural
mortality rates, which are harder to validate once the patient leaves the
hospital. We have done the due process to go back and run subset analyses,
which found that the in hospital mortality rate, which is easy to validate,
does not change the ranking when you compare it against 30 day or procedural
mortality rates.
Basically, the means of data collection is pretty much up to the
institution and we have examples of both paper and pencil submission as well as
electronic submission. We developed electronic entry for this thing and it is
now available free to all institutions. It is just a matter of when you decide
to implement it at your institution. We have done that the last two years at
Dartmouth. So we are electronic now. We validate it approximately every two
years. I think we are very proud of our validation procedures. The hospitals
work with us to provide administrative data to compare against our
registry data to make sure all patients who have had the intervention are
counted, and those that have died are counted, and those that have died twice
or three times et cetera et cetera. We actually go to the rigors of making sure
we find resolution in 100 percent of the cases. I will say we are tracking 100 percent of
the patients that are intervened upon. This is not a sample initiative. This is
a 100 percent tracking initiative in the region.
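As a rough illustration of the validation step described above, the following
Python sketch compares the case identifiers in a registry against those in a
hospital's administrative data and lists the cases that need resolution. The
function names and the sample identifiers are hypothetical and are not taken
from the study group's own tools; the sketch only assumes that both sources can
be reduced to a list of case identifiers.

```python
def reconcile(registry_ids, admin_ids):
    """Compare two collections of case identifiers and report mismatches."""
    registry, admin = set(registry_ids), set(admin_ids)
    return {
        # Cases the hospital billed for that never reached the registry.
        "missing_from_registry": sorted(admin - registry),
        # Cases in the registry that the administrative data cannot confirm.
        "missing_from_admin": sorted(registry - admin),
        # Cases found in both sources.
        "matched": len(registry & admin),
    }


def capture_rate(registry_ids, admin_ids):
    """Fraction of administrative cases that also appear in the registry."""
    admin = set(admin_ids)
    if not admin:
        raise ValueError("administrative data contains no cases")
    return len(set(registry_ids) & admin) / len(admin)


# Hypothetical identifiers, for illustration only.
registry_cases = ["c1", "c2", "c3"]
admin_cases = ["c2", "c3", "c4"]

report = reconcile(registry_cases, admin_cases)
rate = capture_rate(registry_cases, admin_cases)
```

A capture rate below 1.0, or any entry in either mismatch list, would mark
cases to be resolved by hand before the data is treated as complete.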
We do compare our data to the NDI for long-term outcomes every five years.
This gets into the sort of issue of patient identifiers. We have got to have
a patient identifier in order to compare against the NDI: social security
numbers. We do
have social security numbers. For a long time we were using Hogben numbers to
try to keep the patient identifier at the institution. It would have to
be—it would be figured out at the institution if you ever wanted to know
who Mrs. Jones actually was. We would be tracking her through what was called a
Hogben number, which was some combination of first names and last names and
initials and numbers or something. We have now, for the last number of years,
been using
social security numbers.
We have a way of vetting new variables. We meet three times a year, and
occasionally someone will suggest a new variable be applied to our dataset. It
typically goes into a flux space and we make sure it can be accurately tracked.
We make sure that we have a representative of successful tracking before it is
codified into the registry permanently.
What is the status of our registry right now? This is pretty up to date.
The PCI registry is about 100,000 patients. It began in January of 1990. We
really started this initiative with a surgical initiative and then grew out
into the PCI. Our CABG and Valve registry, which goes back as far as 1987, has
about 76,000, almost 77,000, consecutive patients in it over a twenty
year period. We have added cardiac anesthesia and perfusion. As I quickly go
through some of the work that we have done, we found it was important to get
into the meat of the operation and therefore expand it into the
anesthetic and perfusion variables. You can see the numbers that are included
there. Interestingly, these are maintained by the representatives of those
specialties. They are tied to the central dataset, but all of the ideas about
what to track in perfusion came out of the perfusionists, and it is
similar for the anesthesiologists. That is the beauty of a multi-disciplinary
group like this. You can really think beyond your own specialty. We now have a
relational database that allows us to cross interventions from medical to PCI
to surgical interventions. It represents about 133,000 patients. We have been
tracking long-term mortality rate as far back as 1987 using the NDIs as a
This is our website. It is just an example of what is on our website in
terms of looking at regional outcomes. We post this as well as provide
individual institution reports. I brought the report from Dartmouth. These are
produced biannually. All of this other stuff I can leave. Even in today’s
transparent world where you can get this on the Internet, this is still pretty
sensitive stuff, and it is my only copy. These I can leave and you can spend as
much time looking at them as you like. Anyway, you can see in terms of some of
the outcomes that we look at: in hospital mortality, stroke rate, bleeding rate,
mediastinitis (this is sentinel admission mediastinitis rate). This is one of
our weaker outcomes because it tends to miss patients who are readmitted for
the disease. I add that as a postscript because it has been bugging me since we
have been tracking for fifteen years. Postop renal failure or insufficiency,
that is a doubling of creatinine. Use of internal mammary arteries, mean
post operative length of stay, median and mean post operative length of stay,
and there are a lot of other outcomes that you will see in here, but this is a
sort of a thumbnail look at the region. This is what our PCI looks like in
terms of status and intervention, mortality, those going to emergent bypass,
and then vascular and stroke complications associated with the intervention.
I think this is one of the more interesting, and until we got into the
appropriateness work, one of the more appealing and sexier aspects of what we
do. In terms of decision support tools, we now provide pocket cards for cardiac
surgery, decision support for interventional cardiology and even an electronic
second opinion that is available that allows us to predict long term outcomes
based on whether they have surgical, PCI or medical therapies for their heart
disease. These are periodically updated. For example, the coronary
bypass card, which I will pass around, allows you to look at individual patient
variables, predict their mortality from CABG, their mortality from an aortic
valve and mortality for mitral valve surgery based on their individual
characteristics. You can look at the likelihood of stroke or mediastinitis for
these as well. This is helpful when you are trying to get informed consent. You
can try to be as accurate as possible in terms of assessing a patient and what
their chances are. This has come directly out of our dataset. These are what I
will pass around. I have a risk card for CABG and a risk card for vascular
complications following PCI. This other yellow one is a little narrow in its
scope, but it looks at the likelihood of low cardiac output. We went through a
period of time and are continuing to try to understand low cardiac output and
try to prevent that as an intermediate complication to death following
bypass surgery. This allows us to predict the likelihood of low output. The
hope is that you would take this and stratify care based on the likelihood of
developing low output from this card.
These are some of our quality improvement activities that are currently
underway. We have a readiness for revascularization. We decided that we could
do better particularly with transfers and just trying to meld guideline data to
our clinical practice. We have tried to codify steps that we think idealize
the patient prior to an intervention, whether it be a PCI or whether it be a
CABG. So, we have started a multi-disciplinary and regional look at basically
readiness for surgery. That is the one I was interested in. There is a similar
one under way for PCI. We are trying to improve our decision-making by trying
to improve our decision support tools. If you look at cardiac surgery, we have
a specific intervention right now trying to tie actual actions in the operating
room with the likelihood of embolization, what we consider
microembolization, which we think is a surrogate and a little bit more common
outcome for stroke and for some mental changes that occur following surgery.
So, we have an initiative going there. We have a prospective study. This is
actually one that requires full informed consent. We are trying to get as many
patients as we can to allow us to take a sample of their blood and freeze it
and at some point in time go back and look for biomarkers, whether they be
markers for CNS injury, markers for myocardial injury, et cetera. We are
looking closely at blood utilization. It is a really interesting and quite
variable aspect of coronary bypass surgery in terms of the pernicious
use, I should say the pernicious impact, of blood on the patient’s outcome
post-op and the detrimental effects of excessive anemia. We are sort of
weighing two bad things. It is bad to get anemic, and it is bad to give blood.
We are trying to sort out those two issues, primarily by reducing the amount of
blood that we are transfusing.
The services to the institutions are summed up
here. We provide, and you have this, a book that is given basically to every
clinician as well as the institution. We try to present our reports to the
institutions in JCAHO friendly formats so they can really just read this into
their QA environment and then apply it to their JCAHO accreditation environments.
We are willing to customize reports based on individual institutional needs. We
support regional presentations, and we support national meetings. We have about
80 peer reviewed publications that are available on our website up to three
years ago. You will see what our bibliography looks like in here. We also
support about 100 abstracts or more at national meetings and peer reviewed
publications. I think the beauty of this group is that it has finally found a
way to integrate the private and the academic sectors in a way that is really
meaningful in terms of creation of new science within these specialties. We have
published some seminal papers with basically private practice first authors. It
really provides a sense of pride and a sense of momentum to the organization
when you can do something like this. This is a beautiful melding of the private
sector and the academic sector.
I think it is worth mentioning that for all of our
papers, in order to be published, you have to have at least one author from
each institution named on that paper. We have a lot of authors on our papers,
but it means that every institution is represented, and there has been active
input by at least one individual in each institution. That has been a tradition
within the study group since its inception. You will see our recent studies
report, which I am passing around. We now have a website which you are welcome
to visit. We are almost fully transparent right now. Certainly the individual
institutions are moving quickly towards full transparency. All of our outcomes
short of mortality are fully transparent. There have been some legal glitches
trying to get that final mortality rate to be formally transparent at our
website level. It is transparent at Dartmouth, and I think it is transparent at
most individual institutions. I think within the next few months, we will be
fully transparent with respect to coronary bypass surgery. It will take a while
before the cardiologists are willing to do that I think. We periodically meet
and try to figure out where we are going. About every three years we have a
I guess the next question is, that is who we are, now where is the beef? In
other words, what have we really accomplished over the last few years? Maybe
many of you have heard about us. We sort of made the papers a number of years
ago by realizing that nobody knew how to take care of patients in Northern New
England better than Northern New England doctors. When we decided we would
learn from one another in our early formative stages, we spent an entire
meeting deciding whether we would go to the UABs and the Cleveland Clinics and
learn how they took care of their patients, or whether we would first learn
from our own back yard. We decided we would learn within our own region. We
went around, and teams visited all institutions and had the opportunity to look
under the rock at every single institution in the region. This was really a
seminal event for this organization in that we began therefore to gravitate our
care towards the mean, I might say. We all started realizing we put our pants
on one leg at a time just like everybody else. A lot of the fear and distrust
began to fall away.
There are some wonderful spin-offs that occur when that happens. Mainly,
you start trusting your competitors, and when you start trusting your
competitors, with trust comes a certain expectation. I think by getting to know
each other that well and that sort of intimately, our data got better. We began
to collect better data. We expected good data, and we expected to give good data.
This collaboration in the form of regional meetings and regional benchmarking
really did result in a spirit of trust among what was a considerably contentious
and uncomfortable environment.
We began to focus on moving from first order analysis, which is really your
traditional mortality outcome, and moving toward the second order analysis,
which is what really drives that mortality. We very quickly, while we reported
mortality, really dove into this concept of what it was that was driving the
mortality. Was it low cardiac output? Was it transfusion? Was it individual
surgeon technique? Though we reported outcomes, we quickly learned that the
meat was in understanding process. By that, these are four examples of process
variables that are all within the surgeon’s control and completely and
totally linked by logistic regression to a better outcome. That is preoperative
aspirin. That is the use of the IMA to the LAD. That is avoidance of
transfusion. That is preinduction heart rate, adequacy of beta blockade. By
getting over the mortality thing and moving upstream and trying to find out
what drove that mortality statistically in a large population, we could go to
meetings and say, give aspirin before surgery. Use the internal mammary artery.
Try not to transfuse the poor soul, and get their heart rate adequately blocked
before you take them to the operating room and chances are that patient will do
So what happened? Our mortality is as you see it. I have to think the 2004
thing was sort of reeling back from the coated stents arriving on the scene. We
have since adjusted and have gotten back to where we belong.
This basically looks at the five institutions
that started. I think the two things that we learned when we started this
initiative back in 1987 and published our data in 1991 was not so much that the
mortality rate was high but that there was significantly wide variation between
institutions practicing. You can see how that variation narrowed down over the
course of the years. Where it narrowed down is when we completed our regional
This is just an example and somewhat dated slide, but our work continues to
be this good. I think it is just worth pointing out when you look at the urgent,
elective, or emergent status of the patient and, stratifying, just look at the
elective population. You are looking at 1,500 patients throughout a three state region
where six people died in the course of the year. That is pretty good. That is a
0.4 percent mortality risk and suggests that our systems are sufficiently
robust to get the patients who are able to walk into the hospital carrying
their bag out of the hospital at least alive. I think when you begin to see
numbers like this, you begin to get to where I am going to come and sort of
conclude with this presentation. That is, you begin to become more concerned
about your regional rate than your individual rate and your institutional rate.
You just think of the occasions there as a physician.
So I am going to conclude by saying that clinical use of our data really
has begun to preclude all that we have heard about tracking outcomes and the
gaming strategies that are associated with this kind of work. It is important
work. There is no question, and even just monitoring is important. When you own
the data and control it, and actually use the data to change your practice,
then the gaming issue sort of becomes silly. It becomes ridiculous. Measurable
practice changes do and have occurred based on our data.
When I had a death, I was more concerned about screwing up Northern New England
than screwing up myself at Dartmouth-Hitchcock Medical Center. I had the
ability to stand in front of our entire Board of Trustees at one point and
recount the fact that I had just come back from an NNE meeting where the risk of
mortality, the adjusted risk of mortality, for anybody having coronary
bypass surgery in Northern New England is 1.7 percent and probably the lowest
in the entire country.
That simply meant that anybody in that room, which happened to be the
Board of Trustees for Dartmouth-Hitchcock Medical Center could go get chest
pain in any place in Northern New England with the same likelihood of survival
because there has been no statistical difference between institutions for the
preceding six years.
That to me was my first chance to take back the night as a clinician who
had been told what to do and who had been monitored for the preceding x number
of years. Because we had taken the initiative here, we were suddenly now
looking at a population of patients not because it is going to make
Dartmouth-Hitchcock a better place, but it will make Northern New England a
So I will conclude. I apologize for that acronym, NNECVDSG. The Northern
New England Cardiovascular Disease Study Group has shown that high quality population
based clinical data can be accurately collected and creatively used for the
purpose of improving patient outcome. For that, I traveled long and hard to
speak. Thank you very much.
MR. REYNOLDS: I would like to thank all of you. You actually followed our
instructions. You were all concise. Well done. I am going to open it now for
questions and Paul Tang?
DR. TANG: That was a very compelling and engaging presentation. Thank you
for the good work. I have two questions. One is, you mentioned at least three
times that I picked up on about the importance of maintaining data ownership. I
think the first time or the second time you promised there were stories behind
that. So, I am wondering what — we are talking about trust. We are talking
about protection. You made a big deal out of it. Basically, there was something
about the ownership that was enabling. I am trying to figure out–
DR. NUGENT: This is not rocket science. I bet you could answer the question
for me. There is no ambiguity in terms of what we are here for. There is
absolutely no ambiguity. I am terrified of having a high mortality rate and my
institution winding up on the front page of a newspaper. That does fine. That
is the big stick. That is the stick that keeps me in line. Okay? This data does
not scare me. This data helps me. That is because I trust the data and I trust
those that are analyzing the data to use it for the reasons that we created it.
Now the reason we are able to say that is because we have owned that data from
day one. We have never missed an opportunity to congratulate ourselves with
number one, how good it is and number two, what it is for.
So, I will give you a couple of examples. One of our institutions
decided they would put up a billboard. Well, we do not really have those big
billboards in Northern New England. They would send out a color
glossy advertisement that they were the best organization in Northern New
England when in fact they were not any better statistically than anybody
else. It was the surgeons at that institution who went to the consortium to
say this was not right because they were using the same data that was basically
going into the consortium to say they were somehow better than everybody else
and they were not. That brought the administrators to the table because they
came to us and they came to the registry and said, we need to be able to talk
to our third party payers. We also need to be proud about what we do. How can
we do that without jeopardizing the integrity of the organization? That allowed
us to sit down, and this is where you will notice we are tracking by 500 case
boluses, the last 500 cases. We have really given up the annual mortality
rate. Some hospitals have 150 cases, some hospitals have 1,200 cases. That sort
of blows any inanity out of that data. You really look at a backside that is
representative and are looking at the last 500 cases as our metric. It is a
nice metric to use.
We just worked out ways where you can advertise your data, but also provide
confidence intervals afterwards that allow anybody to realize whether or not
that data is statistically better than anybody else’s. My point is it was
the administrators that came in and said, we do not want to make you mad. We
just want to be able to use our own data. So, by owning it, we were able to do
that sort of thing.
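The "last 500 cases" metric with a confidence interval can be sketched in
Python as follows. This is only a minimal illustration under stated
assumptions: the testimony does not say which interval method the study group
uses, so the Wilson score interval here is a stand-in, and the function names
and the two outcome lists are hypothetical, invented for the example.

```python
import math


def wilson_interval(deaths, cases, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    p = deaths / cases
    denom = 1 + z * z / cases
    centre = (p + z * z / (2 * cases)) / denom
    half = z * math.sqrt(p * (1 - p) / cases + z * z / (4 * cases * cases)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def last_n_mortality(outcomes, n=500):
    """Mortality rate and interval over the most recent n cases.

    outcomes is a chronological list with 1 for a death and 0 for a survivor.
    """
    window = outcomes[-n:]
    if not window:
        raise ValueError("no cases to analyze")
    deaths = sum(window)
    low, high = wilson_interval(deaths, len(window))
    return deaths / len(window), low, high


def intervals_overlap(a, b):
    """True when two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical data: a small program with exactly 500 cases and a larger one
# whose older cases fall outside the 500 case window.
outcomes_a = [1] * 8 + [0] * 492
outcomes_b = [1] * 50 + [0] * 488 + [1] * 12

rate_a, low_a, high_a = last_n_mortality(outcomes_a)
rate_b, low_b, high_b = last_n_mortality(outcomes_b)
overlap = intervals_overlap((low_a, high_a), (low_b, high_b))
```

Using the same window size for every hospital puts a 150 case program and a
1,200 case program on the same footing, and overlapping intervals are a simple
signal that an advertised difference may not be statistically meaningful.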
DR. TANG: The second question is, I saw that variance that you talked about
and I saw the 2003, and I heard as a chronological fact the site visits. How do
we know whether that was a change in the way the data was handled, that there
was information
sharing about how to calculate data versus that all surgeons changed over
DR. NUGENT: I have been asked this so many times. Is this true, true and related
or true, true and unrelated, basically. In other words, was our group hug up
there, did it really have something to do with the outcome or not? If you look
at it, there was a paper by Eric Peterson interestingly enough. I can dig it
out if you want to see it, where he compared the 1993 mortality rate. We are
going back in time. It is still representative of basically the time we are
talking about. He looked at the 1993 Medicare mortality rate by state and he
included the NNE region as a state. He looked at how much it improved in the
preceding five years on an XY axis. He plotted it out which allowed you to have
four quadrants. Those with low mortality rate, low improvement; those with low
mortality rate, high improvement; those with high mortality rates—you see
what I am saying? Northern New England had the highest improvement and the
lowest mortality rate in 1993 after the improvement and the most improvement of
any place in the entire country now. There is nothing to suggest that this
would have changed by simply tracking, tracking, tracking, and tracking. There
was the intervention where we got together and watched each other practice and
compared notes and did flow diagrams and things like that. I never proved it
was a one to one relationship, but you will never convince me otherwise.
MR. REYNOLDS: Marc?
MR. ROTHSTEIN: Thank you. That was sort of my question. You are the first
person we heard from who described a sort of grand rounds. I am wondering
whether you think the success of that is attributable to the specialty, to the
region, or to other factors. How much stock should we put in that as a model
to recommend elsewhere?
DR. NUGENT: There is no question that the coronary population was the
poster child for this kind of work. It is not even in the coronary population.
It is disappearing. It was a beautiful homogeneous group of patients that could
be compared and sacred cows systematically slain in terms of my patients are
older and sicker, my patients are different. We were really working on the same
page throughout our region. There is no question that the specialty was the
ideal to start this kind of work because of the homogeneity and because of the
high cost and because of the burning platform. All of those things were in
place back in the early 90’s to get us together.
There is no question that the Northern New England environment, which is
competitive, and I know you guys do not
believe me. It is and it was competitive. I hated those guys in Manchester.
They scared me to death. I had recently taken up a leadership in Dartmouth.
There was this coronary mill that was going to eat my lunch down there in
Manchester. They were growing incredibly fast. I hated them. Okay? It got me to
get to know them, and get to know them on a personal and medical basis. I have
to say that we were competitive, but there is something nice about your nearest
competitor being 70 miles away. So, we were the ideal region to give this thing
a shot. Okay? It has been hard to duplicate since. Believe me, I have eaten a
lot of rubber chicken in the last 20 years trying to duplicate it. Getting
beyond the mortality, beyond the outcome to the process has been hard for
regional or any other groups to transition to. All right?
Third thing is the fact that we published. That has been an incredibly
important part of our momentum because it has given us incredible
credibility around the country. It is as good as going on CNN. It is better than
going on CNN because it gives you a longevity in our specialty that transcends
any kind of 15-second sound bite on the evening news.
Now, can it be duplicated? I maintain that the important quality
characteristic of this group, I think where the rubber really meets the road
from the clinician’s perspective, is the regionality of it. I think there
is something beautiful about knowing we are taking care of a corner of the
world, and we are taking care of that whole corner. All right? It is a large
enough population for us to understand and see relationships that are
impossible to see in an individual or even in an institutional basis. Okay? We
get rid of so much of the confounding and so much of the noise that you try to
tease out of your own practice or your institutional practice. You have a large
enough population to learn from these infrequent outcomes, but it is a small
enough group that I still have an identity. Every single surgeon up there has
its individual identity that if you really screwed up, you would see the
belief. You would find yourself up there. It is also this wonderful identity in
terms of who we are. To me, yes, I think it can be duplicated but there has to
be facilitation there. I think the key is the regionality of it. The Holy Grail
for whoever you are is to have this sort of national mark. I am going to put a
plug in for the fact that as much as I would like to compare myself to southern
California, I do not really care. I just want to make sure I am giving the best
care I can to my patients in Northern New England. I would put a pitch in. I
think that is an important attribute. I hope I answered your question.
MR. REYNOLDS: I am going to ask a quick question and then Kevin and then
Justine, but I may stop in the middle because I know Marc Overhage is leaving
and we may want a quick summary from you since you are not going to be with us
tomorrow. All of you have mentioned, and Jennifer you actually said the words
about the strong patient consent environment in Minnesota and so on. We are
talking about a lot of quality data, but nobody is talking about one of our
visions, which is, how do the patients trust how you are using that data? I have
not heard anybody actually say that. So, any of you that want to respond.
Jennifer, maybe you first since you said the words and kicked it off.
MS. LUNDBLAD: I would just turn your attention back to the document that I
attached to what you have on the PowerPoint slide, which is the Health Records
Act. This is our current vision in Minnesota about how we get to that balance
of appropriate protections and patient consents and patient privacy, but still
being able to do quality improvement at the individual provider level, at the
regional, at the state, and at the collaborative groups. So, I think it has in
here a lot of those pieces that specify what we think of as that notion. It is
the balance. We are trying to achieve the balance. There is no magic solution
to it because there is going to be—I heard people characterizing that we
all get along and everyone works well in Minnesota, and they do not believe we
have competition there either. We really do. Part of the reason we have a
tradition of strong patient consent is because we have some of the strongest
privacy advocacy groups that are so active at our state legislature, and it is
about striking that balance in a way that is going to work for what our state
regs are that are going to complement the HIPAA pieces. It is complicated. I do
not think this will be the end all. This is what our most recent hot off the
press actions have been.
MR. REYNOLDS: Dr. Nugent?
DR. NUGENT: This is not an issue for our patients. Honestly, the average
patient does not know the Northern New England Cardio Vascular Disease Study
Group exists. The average patient wants to know that they are getting high
quality care at their individual institution. Honestly, they want to make sure
that institution is as good as or better than competing and also available
institutions. We are working behind the scenes to do that. We have certainly
approached patients for specific studies, such as the biomarker study. It is
really our energy level and not the patient’s in terms of whether they are
going to sign up or not because it is 15 cc’s of blood. In all fairness,
patients are really not aware that this is going on. It is not that we are
hiding it from them. It is not the sort of thing that makes the evening news or
the regional press unless you screw up.
MR. REYNOLDS: Marc, why do not you make your—if you have any comments
or summaries from today before you take off out of here.
DR. OVERHAGE: Well, I guess that the major thing that strikes me and I will
be interested to see if this bears out over the next week is that what I am
hearing is that it seems to me the task you are said invest—what is the
barrier to you using these sorts of routinely collected clinical data for other
purposes and especially for quality? It seems to me that mostly what we have
heard today is A, you can do it. B, there are no legal barriers to doing it.
There are a whole bunch of policy process uncertainty barriers to moving
forward. There are probably some technical things that would make life easier
for everybody. Those may be one of the places where it seems to me we need to
keep drilling down including the issue that we touched on a couple of times
with anemia discussions about commercial use and some of these other things. I
think we still have not really wrestled with that too much. The good news is
that while we could make things cleaner, easier, or simpler, the framework
legally and regulatory wise is permissive anyway for these uses. We have great
examples of successes of doing that. We just have to figure out half way in
MR. REYNOLDS: Kevin, your next question?
MR. VIGILANTE: Just a couple of quick questions. What risk adjustment
algorithm—did you develop your own or did it come from somewhere else? The
other is, is there any interest in extending this to noncardiovascular cases?
Say, like the ACS NSQIP adapted program. And three, if you publish so
regularly, does anybody say you need to start submitting to the IRB? You do not
need IRB approval, but we do.
DR. NUGENT: All are IRB reviewed. Every time we sign a contract each year,
the IRB steps in.
MR. VIGILANTE: I must have misunderstood when you said that.
DR. NUGENT: Most IRBs have passed it on to the QA and have allowed the
registry work to be QA. Most of our papers are purely descriptive. Whether or
not that is different than research, I do not know, but it is purely describing
populations. For the most part, that is what we publish. The risk
stratification tool, that is why it took forever to get the mitral valve
in, because it took forever even for a region to have enough numbers to provide
a viable and valid tool. Of course, any of these risk stratification tools are
used like any other clinical tool like a hemoglobin or chest x-ray. It is just
a conglomeration of data that spits out a number. It is not prescriptive. It is
just descriptive. These are all our own data. They are transferable. We have at
least tested that out on a couple of other risk stratification environments
back in the 90’s.
MR. VIGILANTE: Let me go to another question. What if somebody came to you,
say, a stent manufacturer and said, we would like to buy your data to look at
drug eluting stents versus bare metal stents? How would that complicate your
life? What would your response be? What would the barriers be?
DR. NUGENT: That actually happened. In the mid 90’s the Apache system
wanted to use our data to build their stratification tool for their software.
After our data was collected and analyzed, we did give them batches of that in
part of our early days. We have not been approached and that has since gone
away. How would that complicate things? I basically do not think it would.
Would I have to put a disclaimer up here?
MR. VIGILANTE: In terms of, would you feel – it is sort of uncharted
territory, but this notion that it is not a really visible patient who does not
even know this is being used. In this case, your registry data is—if that
were the case, you would have to disclose it or–
DR. NUGENT: No, in this case we certainly disclose it to ourselves. In this
case there were no patient identifiers. That was stripped of any of the data.
It left us as bland data without any patient identifiers whatsoever. You would
have to tell me. I do not know. It did not complicate things, and it was part
of our environment for a period of years. It helped fund us for a period of
years. It did not tie us to any organization. Basically, they bought it as a
MR. VIGILANTE: Let me just say something interesting. Obviously this
stuff—doing what you are doing is not free. It costs something. The fact
that you are able to get all the revenue through to help sustain it is an
important consideration of when you think about sustaining these kinds of
activities on a broader scale.
DR. NUGENT: Right, I think that is valid. I think our last dues are roughly
60,000 dollars today just to give you an idea.
MR. REYNOLDS: Justine, last question?
DR. CARR: Thank you very much. I just want to go back to saying that we
keep hearing themes repeated, certainly the trust theme. But also going back to
what we heard yesterday how the continuum from clinical care and quality
measurement is blended because it is all one and the same as we heard from early
yesterday. Secondly, the continuum between research and quality. You cannot
draw the line here. It begins with quality and what they learned became
publications, but under quality. So, trying to dissect in between at what point
something changed. Then, I think what is especially great about this is the
fact that it is not just the reporting out. There is an absolute feedback loop
that resulted in tangible clinical decision support and shared learning. I just
think this is kind of a great place that we would want to be. The challenge of
one of the things that this group is trying to do is looking at all of these
regulatory things and the potential what if’s and protect them, but not
losing something that seems to work so well and began 20 years ago. It is long
DR. NUGENT: I will just comment on the research. My comment, when you say
this tie between quality improvement and research, I think, we can do that. It
does give so much credibility to what is a very soft sign. It is beginning to
structure and legitimize that work. I think to somehow divorce it because it
has got some sort of a negative connotation would be an incredible step
backwards. We need to legitimize and to structure and to formalize a lot of the
quality work that is being done. The peer review process is the best way I know
of to do it.
MR. REYNOLDS: Well to play off your comment, and for the rest of you, we
did not know who you were until today either.
DR. NUGENT: You do now.
MR. REYNOLDS: I say that all three of you have made a difference. We will
try to make that same difference for you. You all made a big difference for us.
That concludes our panel, but as a committee we are going to spend a few
Paul, if you have any comments about what you saw in the last day or so, go
DR. TANG: I certainly want to echo what has been said about trust. I think
that is the primary consideration here. I think one of the things we need to
recognize is that the public does trust for many of these allowable uses such
as research and policy and a blended version of those. I think a sense that is
not a problem.
The other I think we heard from Glen and others and what Marc just
reiterated is that the barriers are really actually quite low to conduct those
things that we consider research and quality activities. I think where we get
in the trouble or concern is in the data world. I think it arises because Dr.
Nugent’s way of saying it is ownership. It is the same thing with
patients. It is ownership and control, knowing where my data is going to be
used and for what purpose. When there is a surprise factor where something they
find out about does not meet their expectation then I think that is something
that we need to weigh in on, on the patient or public’s behalf.
I think it really focuses our attention in to this particular area where
its data to the extent that we can delineate the various types of that as you
were saying where they drove down in that commercial—what they call
commercials. I think that would serve us and our constituents as well.
DR. SCANLON: I think I am very much where Marc is in terms of I think we
have heard some very good applications, and they are successful applications in
the current world. Things are feasible. For me, I would have to go back to
yesterday to old world versus new world in the way that data are gathered and
used. I think I would like to see a lot more new world which is take advantage
of the Electronic Health Record, be able to sort of—we have examples today
in terms of demonstration of where people are using Electronic Health Records
and then sitting down and manually abstracting information. This is where we do
not want to be down the road. The question is, what other way can we facilitate
that, and what are the barriers to prevent that?
I think that I came into this the least informed
of anybody in this group. What I have heard is that there might be
barriers, but there certainly is uncertainty. I think for the nervous among the
population, uncertainty often creates inaction or inhibits action. Therefore,
if there are things that can be done to improve the certainty. That is
something that we should think about. At the same time, recognizing how complex
a task it was to get to where we are with respect to things like HIPAA. You
should not let the perfect be the enemy of the good. You should not be thinking
about, if I pull on this string here, things are going to be fine because that
unravels much too much. I am reserving a lot of judgment for what I hear as we
move along because, again, I feel like I came in here with the least amount of
experience. I think that the successful application suggests that things are
very feasible and the question is now sort of what is the best strategy for
facilitating improvements in them, and that is where I am reserving the
MR. REYNOLDS: I take a little different view of what you have to say. As
you say, the people have trust. I think a lot of what we heard today is that
they did not know. I think a lot of these studies and everything, they did not
necessarily know that their data was going—as was mentioned, but that
still might be all right. As long as there are frameworks of controls and
frameworks of structure and the PPOs. So, I am not as sure I can go as far
having listened to that that the people automatically trust. That is the words
I heard. I am not saying that is where you are going. But I think that agreeing
with everybody that someone said—I think there are structures, and there
are environments, and there are regulations and rules in place that allow those
things to go on. I am not quite sure I can jump that far.
DR. TANG: You may have misunderstood me because what I was doing was citing
the surveys of consumers that say, would it be okay if your data went for
public health or went for clinical research? Like the 80’s percent were
saying, yes. What I mean is that is something they would trust, that kind of
use. I am not saying where data is going at all, no. That is not their fear.
Again, Carol summarized that the fear is it going to these other things which
would not fit with their expectation of what happens when they go to their
doctor and data is collected about them.
MR. REYNOLDS: Other comments from the committee on anything you have heard
or anything else? If not, we will see everybody at 8:30.
[Whereupon at 5:06 p.m. the subcommittee meeting was adjourned.]