[This Transcript is Unedited]
DEPARTMENT OF HEALTH AND HUMAN SERVICES
THE NATIONAL COMMITTEE ON VITAL AND HEALTH STATISTICS
September 21, 2012
Hubert H. Humphrey Building
200 Independence Ave., SW
CASET Associates, Ltd.
Fairfax, Virginia 22030
TABLE OF CONTENTS
- Call to Order, Review Agenda – Dr. Justine Carr
- Standards: Administrative Simplification Letter – Action; Outline of HIPAA Report – Dr. Walter Suarez
- Privacy Community Health Data Report – Action – Ms. Linda Kloss
- NCVHS – Summary Steps and Future Directions – Dr. Larry Green
- De-identification Methods for Open Health Data – Jonathan Gluck and Khaled El Emam
P R O C E E D I N G S (10:08 a.m.)
DR. CARR: Welcome to day two of the National Committee on Vital and Health
Statistics. I am Justine Carr, Lame Duck Chair of the Committee, Steward Health
Care, alive and well, and no conflicts.
DR. TANG: Paul Tang, Palo Alto Medical Foundation, member of the committee, no conflicts.
DR. FRANCIS: Leslie Francis, University of Utah, member of the committee, no conflicts.
MR. QUINN: Matt Quinn, NIST, staff of the Quality Subcommittee.
MS. KLOSS: Linda Kloss, member of the committee, no conflicts.
MR. BURKE: Jack Burke, Harvard Pilgrim Health Care, member of the committee, no conflicts.
DR. SCANLON: Bill Scanlon, National Health Policy Forum, member of the
committee, no conflicts.
DR. COHEN: Bruce Cohen, Massachusetts Department of Public Health, member of
the committee, no conflicts.
DR. CHANDERRAJ: Raj Chanderraj, private cardiologist in Las Vegas, member of
the committee, no conflicts.
DR. SUAREZ: Walter Suarez with Kaiser Permanente, member of the committee, no conflicts.
DR. FITZMAURICE: Michael Fitzmaurice, Agency for Healthcare Research and
Quality, liaison to the full committee, staff to the Subcommittee on Quality
and the Subcommittee on Standards.
DR. WALKER: Jim Walker, Geisinger Health System, no conflicts.
DR. WARREN: Judy Warren, University of Kansas School of Nursing, member of
the committee, no conflicts.
MR. SOONTHORNSIMA: Ob Soonthornsima, Blue Cross Blue Shield of Louisiana,
member of the committee, no conflicts.
MS. MILAM: Sally Milam, West Virginia Health Care Authority, member of the
committee, no conflicts.
DR. GREEN: Larry Green, University of Colorado, member of the committee, no conflicts.
MS. GREENBERG: Marjorie Greenberg, National Center for Health Statistics,
CDC, and Executive Secretary to the committee.
MR. SCANLON: Jim Scanlon, Deputy Assistant Secretary for Planning at HHS,
and Executive Director of the full committee.
DR. MAYS: Vickie Mays, University of California, Los Angeles, member of the
full committee, no conflicts.
(Introductions of staff and guests)
DR. CARR: Jim, I would like to turn it over to you. You have an
MR. SCANLON: Let me give the committee an update on a couple of things.
About eight weeks ago, we made a set of recommendations for reappointments and
new members for the full committee. I am pleased to say that late yesterday (it was
at the end of the day, so we couldn’t really announce anything during the
meeting) the Secretary approved all of our recommendations. What I can announce
today are the reappointments. For new members, we were asked to wait until we
received formal confirmation from the new members. We should be getting
four members who will fit in very nicely with the committee and will bring us
up to the full complement of 18 members.
Number one, I am pleased to say that Sally and Walter have been reappointed
for a 15-year term, 4-year times. They negotiated a 7 percent increase in pay,
but we had to take away their health insurance. Then, let’s see, we have
Justine, our esteemed chair for quite a while now, a member of the committee.
She has served two terms and so has reached the limit, we can’t renew her. She
will be leaving the committee, but staying on as the chair of our working group
on data access and use.
The Secretary has asked Larry to serve as the chair. In a moment of
weakness, Larry agreed. I think what we can do is, as of the end of the day
today, Justine can probably turn over the gavel to Larry, and Larry will be
chair from the next meeting on.
MS. GREENBERG: I just wanted to say that we have to wait for the four new
members to accept their appointments. We, of course, hope that by then they will
have done that and completed their paperwork, so we can bring them in for the
November meeting. We would also be prepared to bring in, to make the transition
complete, those, I guess it is just the two of you right now, Justine and Judy,
for that meeting, as well.
Now, you would be coming in any event because of the working group. You
wouldn’t be voting members at that point, assuming that they are voting
members. Sometimes, they haven’t gotten all of their paperwork in, so don’t
think that those dates are free now on your calendar. On the other hand, I
don’t want to put undue pressure on you, but I just wanted to clarify that.
DR. CARR: Actually, I want to take a moment to share a few things.
Everything I needed to know, I learned at NCVHS. As a member, I want to share
these things. We’ll have to pass this along to new members, but it is relevant
to all current members. Just a couple of things, and then a couple of things for
Larry as the incoming chair.
As a member, I urge you to ask dumb questions, to make grids to understand
complex issues (very helpful), and to express your thoughts once, clearly; less
is more. Turn off your mic if you are prone to sidebar comments. Read
transcripts; your wisecracks may not be as good as you thought. Don’t multitask;
you’ll be sorry about what you missed. Bring snacks to meetings; no public
dollars can be spent on food. Go to all the dinners and kick back.
Linda, this will be for you, bring a calculator and do the math for Marjorie
at the end of the day. Now, for Larry, start on time, end on time, set timely
deadlines and meet them. Ask for PowerPoints before developing a letter. Keep
track of themes. When concepts are muddy, go around the room.
When discord emerges, embrace it as an opportunity to learn. Appreciate and
recognize colleagues. Edit letters for one voice. Don’t sign anything you don’t
understand. For the members and the chair, recognize the honor bestowed on you
and live up to it.
One other request, can we do a class photo? It would be fun.
(Pause for class photo)
DR. CARR: Okay, we are ready for the standards, Walter and Judy.
DR. SUAREZ: Thank you, it will be a tough act to follow. In the spirit of
the theme today, and to reflect on our leader, this letter truly reflects some
of the things that you just said, Justine, we hope. When you sign it, we hope
that everything will be clear. More importantly, I think it always comes back
to all of the things that you always told us to think about. Why are we doing
this, who is being affected by it, and why is this good? I think those four
questions that you always ask us to follow are truly reflected in the work
that was done in this letter.
We presented this letter yesterday to the committee and we worked through
the comments. Thank you again to everyone who provided us with those comments
yesterday. We came back with this final draft. Yesterday, the subcommittee
reviewed it and edited it, and is submitting it for your consideration.
The first change, I think, was down in the introductory part. There are a
couple of places. Yes, on ICD-10, I think we added a couple of points. One
was this point about ensuring that, through this transition and through this
adoption of ICD-10, we minimize disruptions in the business of delivering care.
We heard testimony, and even specific examples, of one entity that has
basically gone out of business. More importantly, we need to be mindful of this
as the Secretary and HHS, through the recommendations, begin to develop a
strategy for the transition that minimizes disruption to the business of
delivering care. We inserted that as one statement.
We also inserted another bullet below this, which is basically the last
bullet, that highlights the importance, and we also heard this during the
testimony, promoting the establishment of testing areas and test methods,
including sample data, and to allow for some innovative opportunities for
testing during the transition period, so we wanted to insert that.
All of this is to ensure a couple of things. First, clinical consistency, so
that when we move from ICD-9 to ICD-10, whatever we express in ICD-10 is
clinically consistent with what we were expressing before in ICD-9. Second,
financial neutrality, a very important point about not seeing any deviation
from what we were doing before with ICD-9. Those two elements were added to the
section here.
Those were the only comments on the first part of the letter; most of the
comments were on the recommendations, so we will go down to the
recommendations. I think the first comment was on recommendation three, I
believe. Yes, that’s right. We added a clarification point on the dental codes
to address some concerns that the way this was phrased might suggest some direct
relationship or oversight from NCHS to ADA. We added that it was suggested by
testifiers that NCVHS look into the maintenance process of all HIPAA-named
standard code sets. It wasn’t just to look at one or at a specific situation,
but to look at all of them.
In addition to that, I think now we get into the recommendations. The
first change was really in the recommendation on testing. Testing, as we
mentioned yesterday, was one of the most consistent and important themes. We
originally had one recommendation. It was actually a long paragraph, so we
broke that paragraph into three recommendations. The first one did not
change. In the second one, we changed the wording to industry working session,
to ensure that it is different from the other listening session that we are
recommending in the letter that CMS convene. I think we added recommendation 3C,
which is specific to ICD-10, and it highlights a couple of points: again, the
point about establishing testing areas and test methods, and then the
expectation that NCVHS would look forward to receiving regular reports on the
status of ICD-10 testing.
DR. COHEN: We had talked about adding the word, quickly, or expeditiously or
DR. SUAREZ: Yes, putting some prioritization, I think that was another point
that I don’t think we captured here. We should.
DR. COHEN: CMS should, I don’t know whether quickly promote or expeditiously
promote the establishment. There needs to be more of a sense of urgency for
this to happen.
DR. SUAREZ: Expeditiously?
DR. COHEN: I would really like to see that.
DR. SUAREZ: Timely, expeditiously, pick one. That was the main addition to
this recommendation. Any comments or additional questions on this? Okay, then
the next one, on education and outreach. I think we also added the importance
of two things: targeting safety net providers and small entities with limited
resources in this outreach effort. That was the point that was mentioned, and I
think that was basically it: the concept that we need CMS to prioritize, given
the constraints on resources, so it should target safety net providers and
small providers and entities in this outreach effort.
DR. FITZMAURICE: Walter, would you want to consider saying the outreach should
be targeted to help and educate safety net providers?
DR. SUAREZ: Target to?
DR. FITZMAURICE: Just to help safety net providers.
DR. SUAREZ: Help safety, sure, health safety net providers and small health
care. Oh, help, not health.
DR. WARREN: Isn’t that redundant? We are talking already about education.
DR. CARR: Do I have a motion?
DR. SUAREZ: We still have, I think, a couple more. The only other one is
recommendation seven: we added the word sample, so it now reads, conduct an
adequate sample of compliance audits, so that it did not give the impression
that it was compliance audits across the board, for everyone, but a sample of
compliance audits, which is something that CMS is —
DR. FITZMAURICE: To conduct an adequate sample?
DR. SUAREZ: Yes.
DR. FITZMAURICE: Thank you.
DR. SUAREZ: Those were the main changes to the letter.
DR. FITZMAURICE: I would take out – and put its.
DR. SUAREZ: Thank you.
DR. CARR: That is now where you want to go.
MS. GREENBERG: I assure you, after we finalize these letters, I read through
all of them but I always appreciate an extra set of eyes.
DR. SUAREZ: Those were the changes. The word industry adoption, or the term
industry adoption, changed. We had before industry absorption, we changed
absorption to adoption.
DR. CARR: Do I have a motion to approve?
DR. CHANDERRAJ: Move.
DR. COHEN: Second.
DR. CARR: All in favor?
Any opposed? None. Any abstentions? None. Well done.
DR. SUAREZ: We want to next probably just bring up the additional point
about the HIPAA report. As you all know, every year, we had been providing a
HIPAA report to Congress. Actually, it hasn’t happened really every year since
HIPAA was passed, because we are in the 17th year. Last year we
submitted our 10th report to Congress. We don’t do it every year,
and the subcommittee had a discussion about the timing and the appropriateness
of preparing a HIPAA report to Congress this year, in light of a couple of
considerations.
The first one was our very extensive detailed report was delivered late last
year, basically almost early this year, to Congress, the 10th HIPAA
report to Congress, in which we had a number of things identified, discussed,
and proposed to be worked on into the future.
The second major consideration was the fact that there have been a number of
changes in the administrative simplification activities, including the
adoption of the new version of standards, starting this year in January of
2012, the adoption of operating rules, which will start in January of 2013, the
delay of ICD-10, and the adoption of a health plan ID.
All of these are new things that are going to be changing and shifting the
way we continue to attempt to improve administrative processes. The
subcommittee really felt that it was too early to provide any type of report about
those changes, because in reality, some of them just started to happen, and
some of them are to happen in the next few months.
The decision, and that is the recommendation of the subcommittee, is to not
prepare and submit a report this year to Congress, but actually wait until next
year, when we would have the experience of at least a year and a half or more
of implementation of the new standards, the new version of the new standards,
as well as the implementation of some of the new operating rules and some of
the other processes.
We also, in the letter that we just approved, recommend that CMS begin to
consider putting some resources into making a more deliberate and complete
assessment of the implementation of these standards and of all of the
regulations related to administrative simplification. In light of that,
we are recommending that we not report this year, or not prepare a report to
Congress this year, but wait until next year.
I will turn it back to you, Justine, to see if there are any questions.
MS. JACKSON: Just in terms of dissemination, and Linda’s suggestion about making
sure that our documents get out right to the people who want them. Every time
this report goes out, as you can see, I’ve got a note that this is a final
copy. People really enjoy this report, so I’m kind of happy that we have
another year of being able to disseminate this.
As you go to your meetings that can use it, please just send me an email.
Let me know how many copies, we do have plenty. I am sure when the next report
comes out, we will have the excitement of an Apple 5. We want to make sure that
this gets out and that we use the time we have this year while we can.
DR. CARR: I think the work that was done last year was tremendous, and I
think that when we do these reports, they deserve that amount of attention.
It’s too soon to do it again, so I think I agree, accept, endorse your
recommendation.
I have to say Walter, you are an extraordinary powerhouse. You, and actually
Judy and Ob, what you have accomplished, since I’ve been here, is nothing short
of extraordinary. I thank you, on behalf of the committee, and also for myself,
because you do such great thorough work. Thank you very much.
We will now move to the Privacy Community Health Data Report. I think a good
way to tee this up is to say, what were the takeaways
from yesterday and the actions that you did, and then we can go into the
MS. KLOSS: We gave you both a markup, as well as a clean copy, because it’s
hard to encapsulate what we did to it. We did a lot to it. We did combine
principle one and three, so we reduced the number of principles from ten to
nine. We took out musts, and made them shoulds.
We addressed all of the other points we believed that came out in the
discussion yesterday. I think we might do best by going through this, paragraph
by paragraph, but not worrying about wordsmithing or editing or typos. We
didn’t have the luxury of time to do a final polish on any of that. We just
wanted to make sure that we’ve captured the issues that were most compelling.
DR. FRANCIS: We have got the clean copy up, but if you want to see what got
changed, you have a markup copy.
MS. KLOSS: On the first page, we changed a must to should, you will see in
the second paragraph. We also added reference to the original Code of Fair
Information Practice, produced in 1973, to underscore the historical nature of
the evolution of these principles. Maya suggested that and added that, and I
think that does strengthen it.
DR. FITZMAURICE: Could you give an example of maybe one way the communities
are using this digital data? I am interested in, how did they get it, as well
as what are they using it for?
MS. KLOSS: Well, the statement that communities today are using digital data
to tackle important health issues is just a general statement that is then
supported on the next page by a specific reference to the committee’s report on
community health data initiatives.
DR. FITZMAURICE: I am saying are they looking at registry for say, influenza
vaccines, and then sending out the police to knock on doors saying, you didn’t
get your vaccine.
MS. KLOSS: Are you suggesting we should put an example right here?
DR. FITZMAURICE: No, I am just trying to understand it. I am not suggesting
any change, I am just trying to understand.
DR. CARR: You are right, it is a bold statement to say, digital data to
tackle important issues in ways never imagined.
DR. FITZMAURICE: I’m trying to imagine that they get the data and use it
without violating HIPAA.
DR. FRANCIS: We can take digital out, if that would help. Basically, that
was meant to be a summary that came from the community health data report.
DR. FITZMAURICE: I don’t mind digital, just what do they do with the data
and how do they get it?
DR. CARR: You are right, it is the opening kind of introduction, so look
forward to achieving the promise of that sentence.
MS. KLOSS: We could footnote the report right here, if that would help.
We also looked at ways to flip anything that seemed like it was on track
to be regulatory into more of a positive. You see that at the top of page two:
the topic of the letter is a stewardship framework enabling communities to use
data to improve health in a manner that fosters trust. Again, we positioned
these statements in a positive way.
DR. CARR: Just in the second paragraph, these developments should be
encouraged. Yet, they should be coupled. I am just wondering whether we want to
set it up. These developments should be cultivated and guided by data
stewardship. I also don’t know what the difference is between appropriate data
stewardship and data stewardship, or data stewardship practices, or data
I think we just need to decide, data stewardship is a frame of mind that is
carried out in some practices. I would take out appropriate and I would take
out practices, and I would make it and guided by, or perhaps by guided by
building the trust through data stewardship or something like that.
DR. FRANCIS: When I tucked in appropriate there, my reasoning was that we
didn’t want to suggest that there was only a specific set. That was the reason
for that extra word, but it is fine to take it out.
DR. SCANLON: I like appropriate. I mean, there is the risk if somebody
claims that they are doing data stewardship and they are not doing the right
DR. CARR: I would say it is appropriate data practices then. I think
stewardship, but we will see who is signing this letter.
DR. FRANCIS: We have got an appropriate later on, so we could just take it
out. Go ahead and take it out.
MS. KLOSS: The sentence will read, these developments should be cultivated
and guided by data stewardship practices. I am missing the appropriate in
there, because I mean the custodianship of data doesn’t mean it is good
custodianship. That was the point, I think, right?
MS. BERNSTEIN: Did you want to take out the word encouraged, or just say
encouraged and guided by? You didn’t want to remove the word encouraged, did
DR. CARR: I would say it cultivated, I would even say, and guided by data
DR. FRANCIS: Then, appropriately comes in at the end of that sentence,
because we combine two sentences, yes.
MS. KLOSS: Okay, page two. This is where we inserted the definition of
community. There’s two new sentences there.
DR. MAYS: I want to go back before you start the community definition. The
very last part of the sentence at the top of page two, yes, right there.
The topic of the letter is a stewardship framework enabling communities to use data
to improve health in a manner that fosters trust. I am not as comfortable with
in a manner that fosters trust, because I don’t think we provide enough of a
case for that. Instead, enabling communities to use data to improve the health
of their communities, I would suggest is a friendly amendment.
MS. KLOSS: I think we were trying to very directly link stewardship to
fostering trust, and so that was the point.
MR. SOONTHORNSIMA: Trust is the operative word here, I think.
MS. GREENBERG: The way we modified that sentence, still has the kind of
negative aspect by the if. If you change the if, I think it is these
developments should be cultivated and guided by stewardship practices, so that,
I think I would say. Then, since you mentioned trust there, do you need to
mention it again?
DR. CARR: Where you go down to data stewardship, comma, if individuals and
communities are to trust. I would suggest that we say rather these developments
should be cultivated and guided by data stewardship, building the trust of
individuals and communities.
DR. FRANCIS: How about changing if to so? So individuals and communities can
trust. The reason for making sure trust is there is that that is what was called
out in the CHIP report as the next step.
DR. CARR: I think the subtle things we are saying is that we don’t want to
say we have an emergency, it is broken, we have got to jump in. We want to say,
we have a continuum of practice that is building. As we grow in our use of it,
we grow in our understanding.
DR. COHEN: I don’t know that we need may. Why don’t we just say trust, so
that individuals and communities trust that their information.
MS. KLOSS: Then, the lead-in to the sentence you’re questioning, Vickie, the
bottom of that page, these frameworks however have not been developed to attend
to the topic of this letter, of stewardship framework enabling communities to
use data to improve health in a manner that fosters trust.
DR. FRANCIS: The reason for that sentence was to indicate that although
there are a lot of data stewardship frameworks out there, they are not directed
to this circumstance.
DR. MAYS: Now, I understand the trust stuff. I am just feeling it
overpromises what is to come. What is to come is focused less on the process about
trust, and more about the process of the use of data. It is just a little bit
switching it in some way. That was why I kind of dropped it at the end, but let
me see if I can edit it differently. I think that it is on the data and the
community, and not the process of trust.
DR. TANG: Isn’t the sentence building on the Fair Information Practices, and
what we are saying then is that the framework exists. We are building on the
framework in order to accommodate the broader sharing or dissemination of data.
That is the uncovered thing that wasn’t present in 1974. I think maybe what we
are trying to express is we are building on instead of that wasn’t covered.
DR. FRANCIS: Let’s change that sentence to use your language, to say
something like this letter.
DR. TANG: This letter builds on these frameworks to account for the broader
use and sharing of data. I mean, that’s what community creates. It builds
on previous frameworks to accommodate the wider use and dissemination of data.
MS. KLOSS: Well, we might want to reference the specific use of communities
to use data, to improve health.
DR. FRANCIS: To encompass rather than to accommodate?
MS. KLOSS: The middle paragraph on page two inserts the definition from the
report: many different types of communities today are using data to improve
health. The report described some of these efforts. The 2011 report defined a
community as an interdependent group of people who share a set of characteristics
and are joined over time by a sense that what happens to one member affects many
or all of the others.
Then, we go on to say, and reinforce that communities are diverse, although
stewardship matters for all, application of stewardship principles may differ,
depending on community characteristics.
DR. CARR: I don’t understand what that means.
DR. TANG: The stewardship principles are uniform and universal, but because
communities are diverse and have diverse uses and applications, we need to
apply the principles somewhat differently.
MS. KLOSS: Principles are not processes.
MS. GREENBERG: An example would be, we heard from a number of Indian tribes.
In their case, they have a very structured process of approving anything, any
research or data that is collected, a whole governance structure because of
their tribal relationships and their position. They are going to have a
different, there’s going to be a group you have to go to, et cetera, to
manifest some of these principles, whereas other communities wouldn’t have
these structures already set up.
DR. CARR: Is it application of the principles or implementation of the
principles. Because I think if we believe these are the principles, we apply
them, how we implement them.
MS. KLOSS: This says application.
DR. CARR: I’m saying, I don’t think that’s the right word. I think
implementing, in other words, this makes me think that you have 10, I can
DR. TANG: Okay, so it is like Marjorie said. We said that we need to consult
with the community. Well, when you implement that in the Indian tribes, that
means you go to the tribal council.
MS. GREENBERG: I think we all agree, but Justine is suggesting that
implementation is what you are talking about.
MS. BERNSTEIN: Implementation strikes me as something you do with an actual
rule or a procedure.
MS. KLOSS: I think we were trying to be a little more general, and reference
how they are applied, rather than getting into specific practices. I think that
is why we use application.
MS. GREENBERG: Maybe you could say, although the principles, not just
stewardship, but although the principles matter for all, their application may
differ, depending on community characteristics. Make it clear that the principles are
relevant to all.
DR. WALKER: I have what I hope is a substantive question. This document has
in view the governance of data stewards, correct, not the governance of
DR. TANG: Correct.
DR. WALKER: I think that could be clearer, partly because in some instances,
the community is the data steward. In other instances, it is definitely not. I
think there are places here where it’s not clear whether we are governing the
Then, the question that comes out of that, so say there is a data steward
that serves 12 communities. Which communities’ characteristics does it match
its data stewardship to, if they are different? I don’t think we have
addressed that. When we say it should be matched to the community, I get the point.
In practice, that will be a little weird, in some cases, I would guess in many
DR. FRANCIS: One way to deal with this, because actually I think we can
envision that, although stewardship matters, there may be cases in which a
particular stewardship principle isn’t applicable at all. We wanted to keep
this very general, so I understand you.
DR. WALKER: What if there is one data steward, there are 12 communities,
this principle is appropriate to this community, but not to that community.
Data steward has two different stewardship responses?
DR. FRANCIS: They might have different responsibilities to different
communities. Suppose what we do is just say, application of stewardship
principles may differ, and then drop the depending on community
characteristics, because they might differ, depending on a number. That is why
I was trying to address that.
DR. TANG: Might we be emphasizing the wrong thing? We want to emphasize
actually the universality of the principles across diverse communities. I think
it is a bit of an exercise to the reader, in terms of how do you do it with the
consortium that you happen to be dealing with. Our main point is that the
principles apply universally, despite the diversity of the
DR. COHEN: I agree with both Jim and Paul. I think they apply universally,
but it’s not a function of the characteristics of the community. It is a
function of the characteristics of the data holder, because stewardship
principles are very different for a local government than they are for an NGO
or a community coalition, or a provider in a community.
MS. BERNSTEIN: That is true, but I think if you look at all of the
principles, that doesn’t universally apply. For example, one of the
principles is essentially transparency and notice. It doesn’t really have to do
with holding the data, but with whom you are communicating and how you are
going to go about giving that notice.
It matters more about the characteristics of the community, its size, the
way that you communicate with people effectively, and not on the fact that I am
holding the data, I am going to communicate with everybody exactly the same
way. Depending on which principle you are talking about, that might be right or
it might not be. Maybe it is more than just community characteristics, but it
is not only the characteristics of the steward.
DR. SCANLON: I think what you are saying is there is a set of principles,
and they are universal. They need to be adhered to and their application
varies, all kind of circumstances, data holder, community, some other external
factor, et cetera. The principle, transparency is the principle. You just stick
with a set of principles.
DR. CARR: This gets back to my PowerPoint recommendation, we are at the end
of this thing, we are proposing these 10 things. We are asking the Secretary to
accept that these are the 10 things, and we are asking the Secretary to do
research to further understand these things.
If I could, what is it, this is a letter to the Secretary, and we are asking
what? We have recommendations, but we are talking about a lot of things in the
beginning that don’t follow through in the recommendations. I just want to take
and tie what we are recommending.
DR. COHEN: Are we recommending, should this last sentence be, this letter is
to help HHS facilitate the stewardship of community health information?
MS. KLOSS: Yes.
DR. COHEN: So should that be the last statement, this letter is to help HHS?
MR. SCANLON: Isn’t this guidance to communities as well?
MR. SCANLON: We don’t necessarily want to do something to them.
DR. WALKER: I think that is a good point, and I would say there are two
parts of one letter, or two letters. One is what are the responsibilities of
data stewards, whoever they are. Then, another would be quite a different
thing, as sort of what are the things a community should look for, almost a
consumer’s guide to data stewardship, that is the flip side of that.
Then, I think the two need to be very clearly distinguished. I think you are
right, we need both. Both need to be provided, but if they are not clearly
distinguished, it gets unclear who is being governed, who is doing the
governing, whose responsibilities are.
MS. BERNSTEIN: Governing is too strong a word, right? We are not governing
DR. WALKER: We use it a great deal in the letter.
DR. FRANCIS: Could we take out facilitate the stewardship? What we are
trying to do is lay out an affirmative framework, that we hope will facilitate,
that will help HHS. Letters have different functions. Some letters say,
Secretary, do X. Other letters can say, Secretary, here is an understanding of
the territory, which we see some specific things that you might do. The most
important thing is to understand that there is a territory here, and that is
helpful. That is our goal in this letter.
DR. MAYS: Can I just suggest that rather than health information, it is
usually health data. The focus is really on the data. Information is a little
too broad, I think.
DR. FRANCIS: I would also take out local, because I think that is too
narrowing, too. Where it says, although stewardship principles are universal,
application of stewardship principles may vary. The communities whose efforts,
because that is the point you want to make.
MS. BERNSTEIN: I am having trouble with the switch from information to data
for the following reason. I think of data as the raw material that we collect
and manage, and that information is what we discern from it and what is
reported out after. This is supposed to deal with the whole life cycle of what
we do, what we collect, how we use data, how we manage it, and eventually, how
it gets disseminated.
At the point of dissemination, it is not data anymore. In some cases, it is
really information. It is results and analysis and so forth.
DR. CARR: I need to interrupt for a second. We have got some parallel
processes going on here. This letter, clearly, you have done a tremendous
amount of work. We are seeing it for the first time, and new themes are
One question is, is it realistic to finish this in the allotted time. The
second is that Ed Sondik is strongly encouraging us to do a field trip down to
the NHANES trailers. We spend a lot of our time talking about surveys and data
collection, and none of us, most of us have never seen it. The proposal on the
table right now is to defer, and to give the members an opportunity to read
this and set up a call, or some input onto this letter, and use the remaining
time to tour the NHANES trailers.
I am just saying this is what was asked. I am going to ask for a vote of
hands as to what we can do. This is the only time that we can do it, so go
DR. FRANCIS: All I wanted to say is that we have gone through the hard part.
There is another probably five or 10 minutes of looking at it. We really have
done the hard part. This is where all of the changes were. I, for one, would
feel very sad if we’ve put all this amount of work and we are so close, and to
have what feels like —
MS. GREENBERG: Obviously the invitation to tour the MEC has to take a
backseat to the work of the committee. My concern is, is that there are a lot,
maybe we haven’t, you know, gone through the hard part.
I think you guys have done a lot of work, but I look through this marked up,
and there’s just a lot of changes. It is very hard to personally, I feel, I
mean, maybe the committee is more agile than I am, but there have been so many
more comments now, just on these first few pages, and I don’t even think we
have reached consensus. I am not even sure we have reached consensus on the
purpose of the letter.
I think the letter is very important. I think it is important for the
committee to issue it. I don’t know that it is critical that it be issued at
the end of this meeting. I particularly don’t want the committee to be doing
this under such a time pressure, that then you have to go back and sort of
revisit it. That is my concern, but I defer obviously to the members.
MS. KLOSS: May I suggest though that even if we aren’t ready to approve it
at this meeting, having everybody here wrestling with these issues is
invaluable. We will lose that.
MS. GREENBERG: I concur with that.
MS. KLOSS: Then, who will be back here in November, because we have got some
MS. GREENBERG: We may not be ready to take a vote.
MS. KLOSS: Let us walk you through what changes, so that you can read it.
DR. WALKER: I think it is unfair to the subcommittee to not give
them our full attention. I think it would be a mistake to send them off into a
vacuum, and to work some more and have this happen again next time. I think we
ought to take what time we need and get this done.
MS. KLOSS: Definition.
DR. WARREN: Are we still on this data information question that Maya was on?
MS. BERNSTEIN: I said we will drop it.
MS. KLOSS: The takeaway here for me was that we need a little more
discussion about this last sentence. This letter sets out a framework to do
what? I mean, we really need to be crystal clear here, up front. As Leslie
said, our goal was laying out the case why more work needs to be done on this,
not laying out a definitive set of principles ready to be implemented. We hope
that has come through.
DR. FRANCIS: If anyone thinks that the purpose of facilitating and
supporting effective stewardship is not what should be the focus, it would be
great to tell us that.
DR. WALKER: We might say this letter sets out principles for stewardship of
health data, because I think we all agree that is what it does.
DR. FRANCIS: It should be a framework in principles, not a framework in
recommendations. That is great.
DR. WALKER: Is it a framework?
DR. MAYS: You had said something about two letters. I was just curious.
DR. WALKER: I was just following on what Jim said, that there are two
communication tasks. One is to say to data stewards, this is the universal set
of principles that apply to data stewards. There is another communication task
that would say to communities, highly resourced and not very resourced at all,
this is what you can expect when someone holds your information, so that it
would be the same thing, just the obverse or whatever you call that.
If we want this ecology to really work, part of what we need to do is enable
communities, to make sure that we know what these principles are, and have a
way to assess whether they are being followed in their case. The baby blood
spots are a perfect example, where the community found out belatedly that those
principles weren’t being observed. How do we help just make sure we do the
other side of it? Often we kind of forget the last mile.
MS. KLOSS: The early thinking was that the next step from this might be a
revision of a stewardship primer, directed at the community. I think you are
right. Are we ready to move to page three?
Here, the major change was to delete references to HIPAA, and generalize
that into just a discussion of current structures for data protection and
ethical use, such as individual informed consent for identifiable data,
de-identification. We tried to generalize that because the discussion yesterday
was concerned that this was setting up these principles to be an addendum or an
extension of HIPAA, and that wasn’t our intent at all. Once we read it, we
didn’t really feel we would lose anything by just deleting that reference.
Then, we go on to add one sentence, that is inserted. These approaches may
not always be adequate or practicable for community health data uses.
Communities need good stewardship principles to use data to improve their
DR. CARR: Just to clarify that thought, so in other words, there are times
where community data is governed by informed consent or the common rule?
DR. MAYS: I was just going to ask, maybe we can say these approaches may not
be practicable for community health data usage. I would like the not always be
adequate taken out, because I get concerned that it makes the procedures that we use
seem like, oh, they don’t protect me as much. We do a lot of work with the
communities, to help them with some of that first part. If we could just drop,
I don’t think we need it, we can just drop, may not be adequate.
DR. CARR: Is it that it is practical or that it is relevant?
MS. BERNSTEIN: Getting individual consent, for example, for a very large
population is very hard to manage. In some cases, we go directly to records or
we waive consent through an IRB, or we do other things. That is just an
administrative problem, for example.
DR. FRANCIS: We might want to change adequate to applicable. That would
capture Justine’s point that sometimes they do apply. We are not denying that.
MS. GREENBERG: The common rule includes waiving, or you have waiving, so you
have covered it.
DR. FITZMAURICE: I notice you have an example here of the common rule.
Aren’t they more likely to be governed and be abiding by the HIPAA privacy
rule, and maybe security rule, than they are the common rule? How many clinical
trials do they face, but they are going to face a lot more uses, which will put
them up against the —
MS. BERNSTEIN: The common rule covers more than clinical trials, it covers
all federal grants that involve human subjects, including surveys that are not.
DR. FITZMAURICE: Don’t deny it.
MS. BERNSTEIN: Many of these communities are not covered entities and most
of them are not covered by the HIPAA privacy rule.
DR. FITZMAURICE: How do they get protected health information if they are
not covered entities?
MS. BERNSTEIN: You don’t have to be a covered entity to
receive the information. For example, a public health entity can receive
information, as long as the covered entity is disclosing it properly under the
HIPAA rule. Once it is disclosed, it is not covered.
DR. FITZMAURICE: Agreed, so we are talking mostly about public health
MR. SCANLON: I think in almost every case, the product will be
de-identified. We want it to be.
DR. CARR: Communities have names of people who were found to have certain
diseases, identifiable in detail.
DR. FITZMAURICE: It is strange to take out the reference to HIPAA, if you
are talking about HIPAA de-identified data.
MS. BERNSTEIN: We are not talking HIPAA de-identified data, and that is why
we took it out. We didn’t want to connect the concept of de-identification
specifically with the definition of the HIPAA rule.
DR. FITZMAURICE: Their definition of de-identification.
MS. BERNSTEIN: De-identification existed before HIPAA existed. We used these
kinds of things before HIPAA. HIPAA only covers specific kinds of coverage.
DR. FITZMAURICE: I understand, I am just wondering what the definition of
de-identified data is then, but let’s move on.
MS. KLOSS: Page four, we emphasize in the first paragraph that communities
differ in many ways, different governance structures, needs, values and
successful stewardship must address these differences in a flexible manner. We
just tweaked that paragraph.
DR. COHEN: Again, it is not only the communities, it is the entities that
hold the data. I think that needs to be added.
DR. WALKER: Just as a general thing, I think we should use data stewards
over and over and over again, to make it clear. That sentence, communities,
researchers, data users, consumers, I take it that some of those are data
stewards and some are not, but they are all in that sentence together. That is
what I mean about just being crystal clear about who is being protected and who
is being whatever, governed or whatever, to protect them.
DR. FRANCIS: Just say guidance is needed for data stewards.
DR. WALKER: I think what this sentence is saying is, and the purported
beneficiaries of data stewards, the potential victims, too. That is what I
mean, I think this idea that there are two fundamental groups, data stewards
and the people for whom they hold the data, are two different groups. I think
our position is one that needs to be protected, the other needs to be managed
or something, some word like that. They are very different.
It is confusing because some data stewards are the communities. One entity
can be in both groups, but it is just everything we can do to keep those two
groups separate, data stewards and whatever the other is, communities roughly
DR. FRANCIS: Maybe just tuck in, after desired by, data stewards, comma,
communities, researchers, data users and consumers. We heard from all of those
MS. BERNSTEIN: I thought it was trying to get at the comment that you were
making before, that the principles that apply to data stewards also help consumers
or users or whatever understand what can be expected. The guidance is
beneficial to all of those different groups at different times.
DR. WALKER: I think of it as car makers and car buyers. There is a framework
that controls car makers. They have to have seatbelts and there are a bunch of
things they have to do. The recipients of those protections, the consumers,
have a different perspective. They need to be informed about their rights. If
you don’t keep those two roles separate, then when you read this, it
just gets hard to tell whether we are talking about the consumers, just say it
that way, or communities, or we are talking about the data stewards.
MS. BERNSTEIN: I understand that. The way I look at that, with your
metaphor, is if I am a consumer and I am aware that car manufacturers are
required to put seatbelts, if I see a car without seatbelts, I am going to be
worried. If I know what those rules are that apply, then that helps me to
understand what is safe and appropriate for me, even though I am not the person
who has to implement that.
DR. WALKER: If it is a good safety regimen, I am allowed to be ignorant of
all that and still buy a safe car.
MS. BERNSTEIN: I don’t think that we want communities to be ignorant of all
DR. WALKER: I don’t think we want them to be ignorant, but it is not their
Constitutional obligation to be educated, either.
DR. MAYS: Since I think what we are doing is advisory, I just want to say
that I think this notion of trying to see the letter. This is why I was asking
about the two different letters. I think I am now coming to a different
convergence, and that is trying to make this distinction.
When I edited an earlier version of the letter, that is what I was really
struggling with. I would see some things as if I were in role A, which would be
I am the data steward, I should do it this way. If I am in the other role of
being the person whose data is being used, I kind of felt like I had a little,
there was a different nuance to that.
To the extent possible, as the revisions are done, either having a
reorganization where there is a separate section that talks about kind of each
person’s responsibilities or what they have to gain, I would suggest that. Or
to the extent possible, that we go through and make sure that those two things
are separate. I am more worried about the community. I just don’t think the
community has the same responsibilities as the data steward, and the data
steward is where we really are trying to push for the change, then, for the
community to be informed.
MS. KLOSS: We will clarify that, because when we are using community, we are
using it as data steward. We will be more deliberate in doing that, because we
are addressing this to the organizations that came before the committee. And
they said, here are all the wonderful things we are doing, but we are worried
about trust of data. I think we can fix that.
MR. SCANLON: It is directed at the sponsors of these community health data
initiatives, maybe stewards who are probably the stewards, should be stewards.
MS. KLOSS: Right.
MR. SCANLON: They are clearly the sponsors. They may be stewards.
MS. KLOSS: We will be very explicit in terms of who it is addressed to.
MR. SCANLON: Just say community health data, because when we say community
health initiatives, it takes care of this.
DR. CARR: Just a question on the next sentence, the community recommends
that HHS develop guidance on stewardship practices for use of, would this be
community health data then that we are talking about? Is that the same
recommendation that appears at the back?
DR. FRANCIS: Justine, I think part of what we need to do is combine those
last two sentences. The committee recommends that HHS should develop guiding
principles and resources to enable data users and data subjects to understand
the chain of trust required to be effective stewards. That captures both sides.
It doesn’t look like we are making guidance documents, which is what we were
trying to avoid. Develop guiding principles, take out the develop guidance,
develop guiding principles and resources. Take out from develop principles and
resources, okay, to enable.
DR. CARR: We have incorporated here a starter set, I guess, that is missing
in the recommendations. We also recommend that those principles address the
following 10 things. That doesn’t make it through, I don’t think, to the
DR. COHEN: Can you explain to me the difference between the principles and
the framework, or how they work together? I am just unclear on that.
DR. CARR: I go back to the data definitions, who is the steward, what is the
framework, what is the guiding principle.
DR. COHEN: I see the principles, but I don’t see the framework.
DR. CARR: Let’s hear from the authors, what is your concept of a framework,
and Bruce or Judy, what is your concept of a framework?
DR. FRANCIS: The reason I think that we use the idea of a framework was that
we did not want to imply that this was a set of principles that are necessarily
set in stone. They are a starting architecture that should frame the way we
think about stewardship. There may well be other principles. I mean, these are
principles that fall under a stewardship framework.
DR. COHEN: I see those principles, but a framework implies a structure to me
about how you integrate them, how you follow them, and how you apply them. I
don’t see that framework in this letter, I just see the principles.
DR. FRANCIS: We didn’t tell you we are doing the full framework. We said we
are starting on one.
DR. CARR: I think the word has different meanings to different people. We
either need a data definition in the beginning to set expectations, or we need
to stay away from framework.
MS. KLOSS: We get a little closer to the framework, as you are describing
it, Bruce, in the appendix, which does differentiate between individual
responsibilities and community responsibilities. I think that we saw that that
is where we needed to head.
I will say up front that we did not take out some of the musts in the
appendix yet, so that part is still needing to be tweaked. That clearly is
moving in the direction of a framework.
MS. GREENBERG: Our primer uses the word framework, doesn’t it? It is a
stewardship framework? I don’t know if it uses framework or not.
MS. MILAM: It is the way it is used when you look at any other stewardship or
privacy framework nationally, as well as internationally. You have principles
making up the different components of your framework.
DR. WARREN: The problem I have is all I see and hear are principles. I don’t
see any components of a framework. If you are telling me the components of the
framework are in the appendix, then my question is, why not put them in the
letter. Your letter is a stewardship framework for the use of health data. If
you put the development of the framework in the appendix, to me, that says it
is not that good.
DR. COHEN: I don’t see the table in appendix A as being actual framework. I
just see it as a list of responsibilities for different parties, with respect
to the principle.
DR. FITZMAURICE: Suppose we just say stewardship principles for the use of
community health data?
DR. WARREN: Then, I think what you can do is at the end, say this is step
one of developing a stewardship framework. That tees us up for the next big
MS. BERNSTEIN: What else would be in that framework, that makes it a
framework and not a set of principles? We have talked about how we really can’t
place a governance structure, because the Indian communities, some have
governance structures, some don’t. That was the point of this, is that the
governance structure that is appropriate to the community or developed by the
community is what they should use, right, which we say.
DR. FRANCIS: That actually argues for just talking about principles here. I
am fine. We will rewrite this so it talks about principles, not framework.
DR. COHEN: Maya, you need to apply the principles in some kind of ordered
fashion to have a framework, and that is what is missing. That is the
distinction I would make.
DR. WALKER: I think the consensus is that principles is a perfectly
brilliant place to start. It is not a problem.
MS. KLOSS: We aren’t ready for a framework, as you’re defining it.
DR. WALKER: Can I ask another framing question, I am sorry. We say community
data stewards, does this exclude other data stewards? We don’t mean to exclude
other data stewards from this set of principles, right? I think the fact that,
we have enunciated these principles in response to a community data steward
request. That is different from implying somehow that this doesn’t touch other
data stewards. Every time we say community health data steward, I am just
thinking we ought to say health data steward, and that applies to everyone,
communities and researchers and all of the others.
MS. BERNSTEIN: I think we were just talking about stewards of community
data, community data stewards and not community data stewards. I don’t think it
was more than that.
MS. KLOSS: I think we were being cautious to have this more specific and
narrower audience in mind, but realizing at the same time that this set of
principles are more universal.
DR. WALKER: Then we ought to take the Texas Department of Health, because
see the issue there is that this data steward betrayed these principles. Then,
it is not relevant if we are really talking about. I would, by the way, say
stewards of community data, then nobody could misinterpret it.
MS. KLOSS: That is perfect. See, we couldn’t have done this over the
telephone or on SharePoint. Let’s just take a few minutes now to sail through
principles themselves, because we worked hard to take the musts and the
prescriptive language out of this, and just lay them out as topics. Principle
one, openness and transparency, there we moved what had been in three, which
was the communications principle, we moved that into this one.
MS. BERNSTEIN: We made more specific connection between the blood spots,
which was having to do with outreach in particular and communication.
MS. KLOSS: Number two, purpose specification and use limitation. You can
give us feedback on the stuff within that, in the next iteration, but let’s
make sure we have got the titles right. Three is gone, so a new three is
involving communities in decision-making.
DR. WARREN: I have a question on two. One of the things that we are learning
about health data is that later on, with new knowledge and science, we may want
to repurpose the use of the data. How is that handled, because you have got
limitation? You have purpose specification and use limitation. We can say we
DR. FRANCIS: No, we just say you reevaluate it. That is what the second
MS. GREENBERG: I would still just say purpose specification and use, and
then talk about it. That limitation right away is a red flag.
MS. KLOSS: How about purpose and use specification?
MS. GREENBERG: Yes, but I would take limitation out of the title.
DR. FRANCIS: That actually comes from the original Fair Information Practice
MS. BERNSTEIN: That would be a red flag on the other side, to people in the
privacy community and other kinds of advocacy community, because that is sort
of how it is stated. A basic principle of privacy is that information collected
for one purpose should not be used for another purpose, without going back to
the data subjects. Now, we do that in research sometimes, but that is by the
DR. WALKER: I think what Maya is saying, if I understand it, is if we take
it out of the title, that will be read as a message. I think what Maya is
saying is from the privacy community’s standpoint, it won’t be seen as a good
message. It will be seen as a betrayal of privacy. It is just something to be
aware of. I am not saying we can’t do it.
DR. FRANCIS: Actually in FIPS, typically purpose specification and use
limitation are two separate FIPS. It wouldn’t look unusual to just have purpose
specification. We do, in the body of this, say that if there are changes, that
it should be reevaluated. That is not saying that you can’t do it, it is saying
that you have to think about whether, as a good steward, this is a change.
MS. GREENBERG: I think that is reasonable to say.
MS. MILAM: At the same time, when you look at every other framework that is
out there, and there are dozens, not having this, and as I think Maya said, it
is usually a standalone principle. Not having it in its entirety will be a red
flag to the privacy community.
DR. CARR: The way I look at two and three is that today’s reality is we
collect data, and now we make connections where that same data can inform,
enhance, improve health. Even raising the question of, should we go back to the
others, I don’t see how we can go back to the original people and now say, oh,
we now discovered something else. I think it ties more into involving the
community. There you say, this data now can answer that question, and how do we
engage the community around that repurposing of that data.
MS. BERNSTEIN: That is going back to the people. That is one way to go back.
DR. CARR: I guess I read this as if you have to go back and talk to the
original people who were in that cohort, to get their opinion. They may have
been long gone.
DR. FRANCIS: We don’t intend to have it be read that way. What we do intend,
one way that, I will just speak personally, that I see as deeply problematic
from the point of view of public health, is that there has been a lot of
insistence on going back. Part of the whole point of some of the earlier
framework stuff is that this is not a good model, the one you just described,
for many public health circumstances. That doesn’t mean anything goes, so that
is why we wanted to say reevaluate under these stewardship principles. Maybe
the best way to do it would be to say specification of purposes and uses.
DR. CARR: Maybe what needs to be explicit is in this new world, data will be
repurposed. Maybe that is the message that people need to get when the data is
used, and not the expectation that it will never be used. I think it is all
being reused, that broadens the purpose.
MS. BERNSTEIN: This doesn’t say it will never be reused. This says, if you
make a significant change, you have to reevaluate how to do that. That is all
this says, but you don’t want to get into the Havasupai case, right, where you
have got something, where you collected information for one purpose, used it
for something completely different, and you have got a lawsuit on your hands
because the researchers went off on some track that the community completely
didn’t expect, and objects to.
DR. CARR: I guess researchers are guided by their IRB common rule, whatever
that kind of stuff. I had thought we were talking about the kind of public
census information that was used to define a community. Now, we can take it and
marry it up with something else, and tell a new story. When I gave census
information, I thought they just wanted to know how many people lived in my
house. Now, I find that it is being repurposed for some other thing. That is
what I think we are talking about. Anything that is guided by a consent, an
IRB, privacy research, that has a whole separate set of rules. I think maybe
that is where we are getting confused, for me.
MS. BERNSTEIN: Also, things that are also in the original consent form may
be more narrow than future uses. I think this is Judy’s point that, in the
future, someone may look at the data and go, you know, I can tell a different
story with this data that the original consent form did not anticipate. Now,
what do I do? Even if I had an IRB at the time, I mean, we may not.
DR. CARR: These are two separate tracks, and I think trying to create a
middle ground is what is making it difficult. Anything covered by consent needs
a separate thing. Data that is in the public domain, maybe collected, you have
a heel stick because you wanted to know if my child had homocystinuria.
I don’t even know if that is the right answer, but things that are in the
public domain already, that have been repurposed, I think people should know
when data is collected about them in the public domain, it likely will be
repurposed. That is all of what we are talking about today, of matching stuff
MR. SCANLON: I think this is really sort of getting at the crux of this,
plus who exactly does this. We are mixing up publicly available data that is
identified and often statistical, that is meant to be used for the vital
statistics rates, the infant mortality rates. Who cares how many people use it
over and over again for counties or cities? That is meant to be an indicator.
We are not revealing anything about individuals or causes of death, it is a
The whole other area, where you have done community research, you may or may
not involve HIPAA, it could be re-identified, or its publication would result
in harm to the community. Those are things that are almost a different set of
guidance for the data developers, I would think. Mixing them together, I think
you are scaring people here with a level of governance and regulatory
framework, even though we are not saying it, it is publicly available data.
MS. BERNSTEIN: There’s no governance here, there’s no regulatory framework,
and these principles apply to both of the things you are just talking about.
MR. SCANLON: You have to make this distinction or no one will know what to
DR. WALKER: I wonder, it sounds to me like what we are talking about is
something like accountable reuse. If you are going to reuse data, there is an
expectation that you justify that to yourself, to others, in some kind of way,
to be specified further, and maybe undoubtedly situation specific. It is
accountable reuse, and something like that, I think, is what we are trying to
Yes, information will be reused. Some of it was even designed to be reused.
Whatever the case is, when you do that, there should be a set of questions you
ask and answer, and record publicly, probably, that prevent the public from feeling
blindsided, or communities feeling like they have to sue.
DR. COHEN: This discussion just needs to be expanded.
DR. CARR: I want to hear from Glen and then Vickie and then Bruce.
DR. NICHOLS: Just very briefly, thank you, I just wanted to have everybody
go back and read this sentence. It just says stewards also need guidance about
when types of data might be considered. It is just saying be mindful of it and
that we ought to develop standards to govern the reuse. There is nothing, I
mean, I am as nervous about having avenues to research cut off as anybody. I
don’t think that is what is happening here. I think what is happening is a
recommendation that we actually think about what it means to design a principle
around which reuse would occur. I think that is all they are saying.
DR. CARR: It is the first sentence, the purpose of data collection and use
should be explicit. Today, I am explicitly collecting these data for X, Y, Z.
What do I do tomorrow?
DR. NICHOLS: It is saying we should develop guidance. It is not saying you
can’t do it. It is saying we need to think about how to govern that situation.
DR. CARR: I am saying that there is certain data that, by definition, is for
reuse, and that is what we need to say.
DR. NICHOLS: That would be part of the guidance, I think.
MS. BERNSTEIN: That is an explicit purpose you can specify up front, if you
know up front. This is a question of what happens when you don’t know up front.
How do you deal with that situation, because if you know up front, Justine, you
can give that notice.
DR. FRANCIS: Then, actually what it is trying to say is, when you have the
new purpose, you should state it. You should think about, not that you are
limited, but you should think about under these principles whether the new use
is okay. That is all it is saying. If we are not clear about that, that’s an
DR. CARR: Vickie and then Bruce and then Marjorie.
DR. MAYS: If these sentences stay, the purpose of the data one, the
significant changes, then there needs to be at an earlier place in the
document, a longer discussion about these different types of data and I am
going to tell you why. When you say significant changes from the original
purpose should be reevaluated and the principles of community health data
stewardship, the person in the community is not going to understand the
difference between my NIH grant, in which I had an IRB that had community
members on it, that allowed me to have brought a different use later in my
Then, I can see I will be out of the meeting and they will say, then, we
have to do these principles of community health data stewardship. It is like,
no, we already had consent to do what we did. It feels like then that, on the
subject of, I did something that wasn’t honest or something.
There are too many different types of data, and I think the notion of what
we are really about to experience, which is what I think we want to deal with,
it is almost like our group that is coming later actually is going to start
trying to figure out ways to connect all kinds of data together, to give it to
It is going to be repurposed, beyond what I think any of us are sitting here
imagining. We want to be ahead of the curve. I think you actually will make an
incredible contribution to help the community understand the notion of
repurposing, and how then technology, as well as the departments like open use
and trying to facilitate use of the data more, for them to think about those
things would be great. I mean, it would be exciting to the community, I think,
to hear it that way.
This way, there is another side that, unless we tell them, well, you were
given the chance. We did tell you that we would come back and do this. I think
we are at a place where we can make an incredible contribution, that would be
great for the community.
DR. FRANCIS: Could I just ask a question, because this could be written in a
way that says, with research data, we have the following regime that applies.
With data originally collected for public health, we don’t have this regime
that applies. There are still questions about repurposing. There are also
questions about repurposing research data. They look just different in
We could write that. It is at least three paragraphs to write, and the
letter is really long if we do that. I guess one of the things we were trying
to do was strike a balance between length and raising the questions. If having it
be this short is seriously misleading, then we struck the balance the wrong way. I
just want to raise that for reactions, because at least my sense is that in
order to be responsive to a lot of the kinds of comments, this is going to have
to be a longer letter.
DR. WALKER: Vickie, I would have thought that when you are in that public
meeting, you just would have said, this is the original purpose that this
information was collected for, and here is the process we went through. It is
probably a teachable moment for you.
I would have thought the first sentence here would have taken care of that,
because what you are talking about is not reuse, it is just someone
misinterpreting your original specified carefully-worked out use, as reuse. You
just need to explain to them, no, the reason this data was collected was for
this exact purpose.
DR. CARR: I do have Bruce and Marjorie.
DR. COHEN: A couple of things. I think you are correct. I think in your
attempt to be parsimonious in your words, you lost the richness of the context.
Some of these basic declarative statements need more context, so everybody
reads them and understands them the same way. That is what is happening. It
is clear to you what you meant when you wrote them, but it is not clear to the
reader. I think the letter needs to be longer, to explain some of these things.
That is my first point.
My second point is, you say stewards also need guidance. Guidance from whom?
Is the intent there from the community, from some other body? I don’t know who
is going to provide stewards guidance. Again, that is another example of your
attempt to be discreet, but it raises more issues than it clarifies.
MS. KLOSS: It does relate back to the discussion of where this came from,
from the testimony of the communities, that indicated a need for guidance. We
were referencing that back.
DR. COHEN: The intent here is that HHS is going to provide that guidance to
DR. FRANCIS: Or facilitate someone else doing it.
DR. COHEN: Okay, whatever the answer is, it just needs to be here, because
it raises more questions to me not knowing who is going to provide that
MS. GREENBERG: This has been a very interesting and rich discussion. I think
there is no doubt that there is need for something like this. There also is no
doubt that some additional work, I think, needs to be done by the committee
before we can ask anyone else to do additional work, like the department or
I think what you said about needing context is obvious. I loved the letter
when I read it the first time because it was clear, it was crisp, it wasn’t
bogged down. All right, maybe lack of context, too. I think where you suggested
a few times, Linda, that really what this might be leading to is an updating of
the stewardship document, the primer, which could really spell out these
different types of data, different types of uses, different types of issues,
either in several appendices or something else. I think that that is probably
what is really needed. Whether the committee has the bandwidth to do it, we
have already started it, so maybe you do, at least to take it to a certain
I just want to challenge the concept that data will always be reused, you
just have to be clear about that. I think there are cases where certain
repurposing is inappropriate, and as Maya said, could get you in a lot of
trouble if you have not gotten either, whether it was informed consent or the
waiver, or whatever understanding under which you collected the data, really
doesn’t permit that.
I don’t think we want to go on record saying, we just want to be clear that
that is going to happen. It shouldn’t happen sometimes without either going
back to the IRB, going back to the individuals, doing something. Exactly what,
it depends on the circumstances. This is maybe kind of a wordsmithing, except
that I think when you are talking about number two, I would rather you said the
purposes and uses should be explicit or should be clear.
It isn’t always, you don’t have to only have one purpose of data collection
or one use. It is just that, if you have multiple purposes, you should make it
clear. It is like you said, if purposes are allowed, I mean, we collect some
data through HANES where we tell people, this is the way it is going to be
used. It might be used in these other ways, too. The point is that you have to
really be transparent and open.
I think these nuances need to be, if not in this document, in that updating
of the primer. I agree with Vickie that it will be very useful. It obviously
needs some additional work. Now, the question is whether that is where you want
to go next, or whether you want to start with this letter and say that you are
going to do that, and then that would be your next product, one or the other.
DR. CARR: If I refer to our agenda, we are actually at the juncture where we
say summary steps and future directions. I think that is a good way for us to
wrap up this conversation, this very rich discussion, and then also we turn to
the discussion we had from 8:00 to 10:00 this morning, and decide on next
Larry, I am looking to you a little bit, to put your perspective and how you
would like to see next steps, because you will be sitting here.
MS. BERNSTEIN: Before we move on, I could just ask that we wrap up this
letter by asking members of the committee, if they have further comments, one
of the comments was that we hadn’t seen the letter, that we continue to work on
this, that they really make an effort to read it carefully and make comments,
send them to me.
MS. GREENBERG: You want to make changes to it, based on this discussion,
DR. FRANCIS: We will make changes and send a new version around. A couple of
things, before I do that, I really want to be sure of, which is that it is okay
to do a letter because earlier on, we actually asked this question, whether it
should be a letter or a primer, and we got the sense of the committee that it
should be a letter first, so we wrote a letter. It should be understood that we
are going to start with a letter. If people really think we should start with a
primer, we should know that.
The second thing really early on is that it will be longer because it is
going to need to give context. If anybody has any trouble with that, we ought
to know that now, too.
DR. CARR: I think the whole discussion of letter, primer, length is
secondary to what is it that you want. What is the ask, what is going to move
this forward? If there are things we need from the Secretary, we want to
articulate them and move that forward. If there isn’t an ask, it is a
reflection on what we heard, then it shouldn’t be a letter. I don’t want to get
locked in stone. I think the dialogue has been very rich, and I think the
sensitivity about the importance of this type of document has helped us be very
meticulous in saying what we mean.
As difficult as it is to kind of hammer through this, this is really when we
do our best work. I think it is because this deliberative group, community,
committee really does deliberate, because no one else will. If it feels hard,
it is because it is hard. Every contribution to this is making this sharper and
more helpful. Take it as an affirmation we are on the right track, and it
should be a letter if there is an ask.
DR. FRANCIS: That is something I have never understood because couldn’t a
letter inform the Secretary? In order to take the form of a letter, does a
letter have to say, Secretary, we need you to do A, B and C?
MS. KLOSS: I think there is an ask here that isn’t something that we need a
new taskforce for or something like that. It is an ask that underscores the
dynamic of what is going on in community health data use, and raises this as
an issue that needs attention.
I think there are a lot of different ways that can be carried out by the
Secretary, through new thoughtful provisions perhaps in granting and other
varied ways. I think that we were seeing this as thinking that needs to
permeate a lot of things, not being one single sort of project to be done.
MS. GREENBERG: Let me just say that that kind of letter, the committee has
done that in other cases. This is where we are in thinking about this issue. I
remember with the PRMI standards. Then, it gets it out to the broader
community, the health industry, the communities, whatever, at large, too, so
there is some opportunity, if it works right, for people within the
department, outside of the department, et cetera, to communicate back to the
committee and say, we think this is going in the wrong direction and this is
going in the right direction, whatever. I think there are purposes for letters
that go beyond adopt this standard for this transaction.
MR. SCANLON: The purpose here is to inform, somewhat persuade. If the
committee said, we have become aware of this, this represents what we heard in
our thinking to date, we will continue to look at ways of approaching this. I
don’t think we have something that HHS can do much with at the moment, other
than do that. Even explaining it to our data holders, I am not sure they would
I think you have identified an issue that is an emerging issue, and I think
the letter should be an informing letter, and say you are doing it and you are
looking into it and so on.
DR. TANG: Maybe I might summarize it a little bit, in terms of the ask. We
are pointing out a need for universal protection. Because it is universal, that
is the federal government kind of responsibility.
The suggestion or ask for the Secretary is it would be wonderful if we had
uniform guidance about how to be a good steward of community health data. That
is voluntary at this point. If it is widely abused, then it should be
mandatory. It is still front and center. When it is privacy, it would be
wonderful, we could do things voluntarily. Because we are state-based, we would
like to have some uniformity. They ask us for uniform guidance, that could
address that thing which really is a universal right of citizens in this
DR. CARR: Also, we would like to seek input from the working group on data
access and use this afternoon. I don’t know if we will have an update on the
letter, or at least maybe we will take a look at the ten principles and provide
you their feedback, as well. The plan will be that we work what we have,
circulate it to the full committee with the timeline for their feedback,
incorporate that. Then, it will go to the executive subcommittee in preparation
for the November meeting.
DR. MAYS: Can I just ask for clarity, because I hear two different things?
There is going to be an ask in it or not an ask. Maybe at the very beginning,
tell us that, so that as we read it, we will know what we need to help with.
DR. CARR: I think that will be helpful, because there are rich things in
there, and I think that just calling them out, deciding on the ask is good.
DR. FRANCIS: Jim put the point of the letter, I thought, very well.
DR. CARR: That brings us to our third theme. That reminds me, there may be a
little more work to do to tighten up the document that we discussed yesterday,
which was actually just the minutes of our meeting. At some point, I think we
may want to revisit that document, and perhaps seek input on that, in terms of
our guiding principles and our work.
MS. GREENBERG: Were you thinking of revising? People do it all the time, I
was going to say we don’t want to rewrite history. History is in the view of
the writer obviously or the historian. There is one thing that just documents
what the executive subcommittee discussed. At the same time, I think there was
some useful and good input yesterday. You just have to decide how you want to
DR. CARR: Kind of having it, I think someone said it yesterday, Jim said it,
dynamic document, because it is a moment in time. The principles are things
that we have learned along the way, we may revise, et cetera, and the focus.
MS. GREENBERG: We can even just introduce them as a summary, as modified by
discussion with the full committee, so we have a single document.
DR. CARR: All right. Then, that brings us back to our discussion from 8:00
this morning. Then, Larry, I will turn it to you, in terms of how you would
like to see next steps on that work.
DR. GREEN: First of all, Sally and Paul and I all want to thank you for
coming to the 8:00 a.m. discussion. I think our first next step is we get a
nice summary of that discussion, sculpted toward a conceptual framework and
also narrowing that framework down to some particular focused work. I think the
next step is a written document that allows us to say, yes, it is the same
meeting I went to.
I think the second step is we need clarification of the federal players that
are relevant to care about and involved in this theme, what we were talking
about. To make that specific, I studied the minutes from last time, and we
heard from the ONC about issues that are pertinent to this.
We heard about the Center for Consumer Information and Insurance Oversight.
There is the new workgroup and I will be engaged in the workgroup in this area.
Bryan Sivak is the new CTO for HHS. This is extremely pertinent to his charge
and his work. We have the CMS Office of Information Products and Data Analysis,
that is pertinent. It goes on, and I am frankly befuddled by all of that. We
need some clarification, I think, probably from Jim and Marjorie. When we start
chewing on this theme, these are the folks that have to come along. I am sure
our liaisons can help us with that, too.
MS. GREENBERG: In fact, in that regard, I just wanted to mention that Seth
Foldy, he had a two-year appointment at CDC, and his two years have ended. He
has gone back to I don’t know if it will be the private or the public sector,
maybe both. He obviously will no longer be the CDC liaison, although he was
working to get a new one.
I think first of all, I would like to propose that we write him a letter,
thanking him for his liaison function during the time that he was liaison to
the committee. Also, that this project or this theme is the one that I think we
could use the most help from CDC, to say as we discussed this morning. This is
an area that they work in and have done work and are very involved with, so
that we could actually rather than just asking for a liaison, we could ask for
a liaison particularly who could maybe help support and bring CDC expertise to
this project, and help define it for that matter.
DR. GREEN: A key thing that I have heard mentioned at least four or five
times in the conversations is we want to not do something redundant. We want to
step into a space that needs to be stepped into, and where we are positioned to
be the right group to be doing it. It seems to me, as part of the next step, we
had better get clear about that, so that a year from now, we don’t have a
discussion saying, why the heck did we get here.
MR. SCANLON: I was at a meeting the other day where everyone talked about
being agile and lean, and not necessarily have a biblical outline before we
start a project. Basically, take it step-by-step. Here, I think we need a fair
amount of exploration before we know what it is.
I am wondering if we should start with some facts like what exactly does CDC
have in the nature of community, health information, products and services, and
maybe CMS and others, just so we have a better sense. I think there may be
organizations, I think the folks know, that do this, as well. There is the
whole community health indicators project that we were doing.
I don’t know. It is probably maybe a subcommittee hearing or just a meeting.
I just think that we really haven’t done a good environmental scan. You are
exactly right, Larry, and I don’t want to commit to some specific report when
it probably was true that someone else had a lot better.
DR. GREEN: Staying pretty operational, I think the co-chairs of the
executive subcommittee are going to have to work with Marjorie to have a
manager approach for ourselves. Now that we have got the three themes
consolidated, and we have a new workgroup and we are about to be 18, and we are
going to suddenly have people sitting around the table who have no idea what the
heck we are talking about.
I am thinking that the next step is to do as much preparation as we can in
orienting them, so that we do our best to put them in a position where they can
start being effective now, rather than two years from now.
I am going to hang myself again. You don’t often have opportunities to hang
yourself in public twice for the same crime. The last meeting we said, we want
one of the Susans. It appears to me that we have lost one of the Susans, I
don’t want to lose the other one. Susan Queen is working with another group, so
for our theme, we want the other Susan.
The point here is that, I bet we have got unanimous opinion. We are putting
pressure on our staff. We cannot pursue this community as a learning health
system theme. After the conversation I just heard, I am pretty sure that to be
successful, we are going to have to really tighten up our staffing, and people
assess the implications for that.
Right now, at the most, the population subcommittee has no lead staff. The
next step is resolving that, there has got to be lead staff here, I think. I
want to invite both Sally and Paul to add other next steps. I have one personal
one, and this one I am going to go off the ranch from reporting about
communities learning health system theme, and pretend that I am about to become
the chairman of the whole committee.
These are work assignments for you guys. Pretend you are in fifth grade and
I am your teacher. This is a homework assignment, okay? Think about it that
way. Read before you get here, read it all. I am absolutely convinced from the
conversations I have heard last night over dinner and here this morning, that
some of us arrive here not knowing really what has already been done. We
haven’t digested it, and we learn about it as we go. That slows us down.
In the instance of the communities learning health system, please read our
report, from cover to cover. If you can spare a little more time, read the
appendix, because so many of the issues that come rolling out on the table,
they are already there. They have been debated by those whose shoulders we are
standing on, they have been expressed. Then, if you read it and you say, this
is wrong, bring it to the table immediately, please. Consider that a homework
The second one, if you have never read anything from or about the Folsom
report in 1967, go online and track something down about that, to the point
that you can come back next time and know what a community solution is. What is
a community solution? It was the key idea, produced by the American Public
Health Association in the 1960s, after doing a bucket load of work. Once we do
that, I think it will provide us with some common understanding where we won’t
have to have some of the discussions that we seem to have to have right now.
Thirdly, would you take about 10 to 15 minutes, and go on the web and Google
NIH community engagement. There are six years of work funded by the CTSAs in 60
locations in the country that have various levels of community engagement stuff
that often say what I have heard you say maybe five times this morning. It
seems to me that we could build a common understanding of where the country is
around this community engagement stuff, and why this letter from the privacy
committee is so important.
I am going to end with this. The discussion so far, Justine has reminded me
of two things. One is that Obama video, after he was elected, but before he was
in office and the economy collapsed. Our first thought was, can we get a
recount? After that discussion, I wonder if I can reconsider.
DR. CARR: I want to then take it back. We talked about this letter and
standards. Walter, you updated us, was there anything more in the work on
DR. GREEN: Before you say that, can I say one more thing and then go to
Walter and then I will quit. I will be done, I won’t have to do anything else.
The other thing is I had this quick exchange with Matt at the break. He was
telling me about a sergeant that he had a discussion with, about how they could
put a nuclear weapon on a shell in a Howitzer and shoot it off to distances as
far as 28 miles away. They developed a work plan for that. It was going to be
all right because they are going to wear special suits.
Where we are with this theme of the communities of learning health system is
that everything has changed, except the way we think about it. That is why we
have got hard, difficult discussions to have, because the committee’s role and
position for these communities of learning health system, is to step into this
space where there is a missing infrastructure. They don’t even know what a data
steward is. There is no place for the data steward to live. There is no one
that will pay the data steward.
This is a momentous shift. What is happening at a community level around the
country, the change is unleashed and it is rolling right along like crazy.
There is urgency for this framework that this committee is calling out for.
There is going to be trouble here. There is already trouble here. You want to
predict that because it is already here.
We have serious work to do on this. I want to ask each of you, as members of
the committee, to do two things. One is, help me help you. I am going to shift
roles after this meeting. My number one job here is to help you get these
themes explored. You can help me by doing your homework assignments, coming
prepared. I am going to get Justine to write down and send me that list of
things she blurted out really fast. We may pass that out every meeting, we may
pass it out in the middle of every meeting. I am going to try to become a
manager of the process. This is, I know, basically impossible. Please, I would
ask you to just see me that way. I want you to know that I need your help to
manage this group.
Secondly, I want you to know the following. I have come to admire each of
you. Because of that, I am really quite confident. I can hardly wait to meet
our new members, because I don’t know them all. It is a great group, and never,
ever think that I don’t have respect for you. Never think that what you have to
say won’t matter to me. It does, it has and it will.
When you see me get frustrated, I will be frustrated because we seem to be
inextricably stalled, going around in circles, making the same points again. I
will try to unstick us. If I unstick us in a way that offends you, you will
tell me and I will try to do damage repair and that sort of stuff.
I will quit being a strong advocate for the communities of learning health
system in 30 seconds. I will become a strong advocate for this committee,
making progress in all three themes. You will have to help me, and know that is
my intense desire. I will do my homework, you do yours.
DR. CARR: Walter, was there anything you wanted to add?
DR. SUAREZ: Why do I get to follow all of these difficult? Just to build up
on what you just said, Larry, I think standards is a reflection of a lot of the
things that are happening in the market. It needs to be and it needs to support
what is happening in the market.
As Larry said, a lot of things are changing and a lot of the things that we
focus on in the standards world are changing significantly. The administrative
transactions of the past are now being looked upon and saying, are they still
the right ones. In light of all of the transformation of the health care
system, the experience, should we be looking at that.
Since a lot of things are changing, and we haven’t changed the way we think
about them, I think that is what we, in the standards community, are going to
begin to do, is change the way we think about things, based on the changes that
we see are going to happen.
We are already talking about, for example, how we were asked to identify
standards for attachments. The word attachments and the word claim attachments
are probably a relic of this old view of how health care is being done. We are
not thinking or going to begin to not think about that, that way, but more the
importance of the need for information exchanges that are happening already,
that are going to expand and are going to expand not just between providers,
but in the content of the exchange itself and the substance.
Our task, I think, into the future, and we already started talking about it
at the standards committee, is to really chart our course with that in mind,
the changes that are coming into the future, how we need to really transform
the way we see those, and how we need to think about the changes and the
standards that need to support those transformations.
MS. GREENBERG: I want to just ask regarding the third theme. This morning,
at least I came to the thinking that, our discussion this morning was very much
around both the first theme and the third theme. We kept talking about
convergence, we talked about all. My original suggestion that we spend maybe a
half a day on the 15th or whatever, trying to address that third
theme, now I am thinking that it would be different, more theme one than theme
We could try to do some work with the chair and the relevant subcommittee
chairs, et cetera, with Susan Kanaan, to look at past work, look at related
work, a short sort of environmental scan, prior to that meeting. Is that
something that you are interested that we would at least poll for, to see how
many people could stay over for a half day on the 15th? Or maybe we
reorganize the two days that we have. We still would want to, I think, have a
half day for the working group, right?
DR. CARR: I really think the 8:00 to 10:00 timeframe today was terrific. I
would frame it as committee time, working on these themes. It is not
populations, it is not quality standards, it is everyone. I think continuing to
create, within the time we are here, to work on those things. That is one part
I think the other thing, a little bit of housekeeping, we need to identify
who is on the executive subcommittee, because we have got co-chairs that become
chairs, we have chairs that have become. You may want to do that offline.
MS. GREENBERG: We are in this transition period, but I think Sally and
Larry, who are the co-chairs of population, except now in five hours or
something, Larry will be the chair of the full committee. He can’t be the
co-chair of that. They have asked someone else to serve as the co-chair. Did
you want to mention that?
MS. MILAM: I would like to let everyone know that Bruce Cohen will be the
new co-chair of population health with me, and I am really excited to work with
MS. GREENBERG: We have Paul, who is at least two people. At the same time, I
think this is part of what the executive subcommittees. Obviously, Paul is on
the executive subcommittee, Bruce, Larry, Sally and Ob and Walter, and Leslie
and Linda, so I think that continues. I think we will need to have a call of
that group. I think that is clear.
What we will have to discuss is, it is very possible that one of the new
members would be a good co-chair for quality, except if quality is not going to
have a separate agenda. That needs to be discussed. Obviously, at this point,
we are not prepared to name a new co-chair, I assume, of quality.
I would suggest a call of the executive subcommittee as soon as possible.
What I would hope is it could be in the next two weeks, because then I am going
to be going on some international travel, and we come back and there is a
meeting. We will poll for that, and with Susan, who will be involved, as well.
DR. SUAREZ: One quick item from the standards committee I forgot to
mention. During the standards subcommittee meeting, we actually talked about,
well, we are going to be having conference calls monthly, but convening a half
day hearing on the 15th. We thought the day before, that is Monday
the 12th, which is a holiday. The 15th is the day that we
were targeting for.
MS. GREENBERG: That is why the executive subcommittee needs to have a call,
as I said, in the next two weeks. Talk about the November meeting, talk about
the 15th, maybe you want to use a half day, and this other activity
be the other half day, or how we are going to structure.
I think right now, as I understand it, we have one action item for the
November meeting, and that is a letter, the stewardship letter, yes. Do we have
any other action items? If not, we can move as much of the full committee
meeting as possible for some of this continuation of this morning, et cetera.
Maybe then just cede the half day to you all. I would like to have that call,
so that we can all agree on that.
DR. SUAREZ: The other one that we already started to plan is in February. It
is not too early to do it. In February, we will also need at least a half day,
if not a full day, of hearings before the committee meeting. I think it is
Wednesday, February 27.
DR. CARR: Let’s take it offline, because I realize we have 30 minutes for
lunch, and our speakers are calling in at 1:00. We really do need to be
here in place, ready to listen at 1:00.
(Recess for lunch)
A F T E R N O O N S E S S I O N
DR. CARR: Welcome to the afternoon session of NCVHS. Do we have our speakers
on the line?
MS. GREENBERG: Do we have Jonathan and Khaled on the phone? We are just
reconvening here from lunch.
DR. CARR: Well, let’s bring this meeting to order. Thank you very much to
our speakers, Jonathan Gluck and Khaled El Aman. We are very grateful to you for
making yourselves available to us, on the topic of de-identification methods for
open health data.
As you know, we have a working group now of the NCVHS that is focused on
data access and use. This is a topic of particular interest to that group, as
well as to the NCVHS in an ongoing fashion, going back to our report on
secondary uses four or five years ago.
I will open it up to you. Do we have slides, or is there anything we need in
order to follow this presentation?
MR. GLUCK: The first half of the presentation, there are no slides. For the
second half, which is Khaled’s, Khaled does have some slides.
DR. CARR: We will open it to you, Jonathan, thank you.
MR. GLUCK: Good afternoon. My name is Jonathan Gluck and I am a counselor
for Heritage Provider Network. I also do other special projects for Heritage,
such as managing the Heritage Health Prize.
Initially, I want to thank you for giving me the opportunity to speak to you
today. I apologize for having to do this over the phone, but I just simply
couldn’t get away for two days.
I think it is important to start off by giving you some background into who
we are, why we created the prize, and describe how the privacy issues drove
many of the decisions we made about the structure of the prize. Khaled will get
into a more detailed discussion of the de-identification methods that were
used. I think it is very important to understand the business decisions behind
the prize, and how they were impacted by the privacy issues.
To start with a brief description of Heritage Provider Network:
Heritage is a fully integrated physician network that was founded by Dr.
Richard Merkin about 30 years ago. Heritage is spread throughout Southern
California, from San Luis Obispo in the north and west, to San Diego in
the south, and Bakersfield and Palm Springs in the east.
In Southern California, we have approximately 35 physical clinic locations,
which range from 100,000-square-foot facilities that are almost mini hospitals,
to small offices that might only have 10 doctors. We employ approximately 400
doctors at these locations, and then contract with an additional 3000 primary
care doctors, 30,000 specialists and 100 hospitals to provide care to the members.
In the industry, we are what’s known as the clinical model with a
wraparound IPA. We also have operations in Arizona, as well as in New York, in
the five boroughs and Long Island. We have approximately 700,000 members
for whom we care.
Heritage is a full-risk, fully capitated medical group. By fully capitated I
mean that we are fully at risk for both professional and hospital claims. As a
full-risk group in California, we have a limited Knox-Keene license, which is
the license required by the state to take hospital risk. It has far more
stringent tangible equity and reserve requirements than apply to an
average medical group, because the state wants to make sure that the licensee
has the wherewithal to pay expensive hospital claims.
Because we are at risk for those hospital claims, controlling hospital costs
and reducing unnecessary utilization of the hospital is critical. To that end,
we have lots of programs which aim to reduce hospitalization. We have, for
example, chronic disease case management, where we risk stratify our population
to provide them different case management techniques, depending on the severity
and type of the illness. We have programs for diabetic, COPD and CHF patients.
We have pharmacists that will go to the home to do medication
reconciliation post-discharge, and home-visiting doctors that will visit a patient
who can’t get out of the house to get to the doctor, because we know that the
alternative may be to dial 911. We want to prevent that unnecessary
hospitalization, which can be prevented simply by a doctor going to the house.
Then, we have 24/7, 365-day nurse/doctor hotlines the patients can call, all in an
effort to prevent hospitalization.
In addition to these programs, we wanted to see what we were missing and
what other component we could create that might add to what we are already
doing, and specifically to do something through the use of data that would
allow us to find new ways to attack this ongoing problem of unnecessary
utilization of the hospital.
Dr. Merkin, who is the founder of our group, is also on the board of X
Prize. You are probably familiar with X Prize; they are the ones who created
the Ansari X Prize, which awarded $10 million to the first group that sent an
individual 100 kilometers into space and returned them safely to earth.
We wanted to do a prize that involved health care. Dr. Merkin is a
mathematician by training, and for a long time has believed that the use of
data in health care has lagged somewhat behind some of the other industries,
such as the tech industry and possibly the finance and hedge fund industries.
We also wanted, through the use of a prize, to open up what we considered to
be some of the best young minds in the country to the possibilities that
exist in the health care field that they may not realize. Typically, when we
speak to these types of individuals, as when I have spoken about the prize at
the Strata Conference or elsewhere, these individuals are really thinking about
tech or finance or some other industries, and don’t really think about
health care. We wanted to use the prize to open them up to the
possibilities in health care.
We began discussing a prize to predict hospitalization, which we believe
would solve a real world problem, and do so through the use of readily
available data. Now, the goal behind predicting hospitalization is simple. We
know that unnecessary hospital utilization in the United States is a $40
billion a year problem. We also know that you are not going to be able to
prevent every hospitalization, nor should you.
We also know that many hospital visits can be prevented through the use of
preventive care measures; indeed, many of them are the types of measures we
have used for a long time. We wanted to do a better job of identifying those
members who would benefit from the preventive measures. We began discussing
the creation of a data prize to find these individuals, and predict which
individuals would benefit from the care protocols.
When we discussed the prize, there were really two critical components that
stood out above everything else. Number one, we wanted to make sure that the
prize was real world usable and had real world results. We do a lot of
analytics work today, just as do most of the other larger health care
companies. We attempt to use the data to risk stratify the population, to
decide who would benefit from which care management protocols.
This work, however, relies largely on physicians, using their years of
experience to place the patients in the risk bands I discussed. We wanted to
make sure that the winning algorithm would do a better job than simply the
human beings that had previously been working on the problem, or would add
additional knowledge that could then be used in the real world to give us
The second, and I must say equally critical, component as we were
discussing this was the need to make sure that the data was de-identified.
First, we obviously had to be HIPAA-compliant. I mean, that was a no-brainer,
it goes without saying.
On a more mundane level, we could not take the risk that the data would be
re-identified. This was one of the first times that such a large and detailed
data set had been made generally available online. As a for-profit business,
even if the data was HIPAA-compliant, re-identification of the data would have
been a public relations nightmare. That de-identification privacy issue was in
our minds just as much as the real world usability issue was.
We quickly realized that doing the de-identification in-house was going to
be challenging, to say the least. We are obviously HIPAA-compliant, but we are
not data de-identification experts. We asked around and quickly were led to
Khaled, who came on board to do the de-identification process on our data.
Now, as a full-risk medical group, we have claims data, encounter data,
pharmacy data and lab data on our patient population. The original intent was
to provide the competitors in the prize with the full data set, each of those
components, to each of the competitors. We knew, in speaking with people who
have run these types of prizes a number of times, the richer the data set you
can provide, the better solution you are going to get. The more information you
have to pull out, the weaker the solution is going to be.
However, after discussing with Khaled, it quickly became apparent that we
were not going to be able to provide all of the data we had wanted to provide
without running into too great a risk of re-identification. Khaled is going
to discuss the details and specifics as to what we had to pull back and why.
We had to make a number of revisions to what we intended to release, to
ensure that the data remained de-identified. This did not allow us to provide
as rich a data set as we had originally intended.
In addition to having to pull back data from release in order to ensure that
it was not re-identified, we also tried to create as strict a legal structure
around the release of the data as we possibly could. We made entrants enter
into what many of them considered a very restrictive agreement, because many
of the people we were dealing with think that data should be for everyone, and
data that is released should be used as anyone wants.
They thought our legal structure was very onerous, but we made everyone
agree to keep the data private, not attempt to re-identify the data, as well as
certain other legal hurdles we made people jump over if they wanted to
participate in the prize. I don’t know if the legal structure we had to put
around it deterred anyone from participating. However, we are very happy with
the participation we did get. I could not say whether there are people who did not
participate because of the legal structure. We do believe the legal structure
has acted somewhat as a deterrent, because we have not really heard of
people attempting re-identification, as was attempted with certain other
prior data prizes.
Finally, we also commissioned an adversarial attack on the data before
release, to determine whether it was de-identified stringently enough. We
hired an individual at Stanford University who had actually done the
re-identification of the Netflix Prize data set, to see if they could
re-identify the Heritage Health Prize data set.
The conclusion was it would be extremely difficult to re-identify the
Heritage Health Prize data set. However, we also realized, through having to
hold back some of the data that we wanted to release, that the solution we will
ultimately get is not the optimal solution, or likely will not be the optimal
Where we sit today in the competition, with approximately eight months to go,
we don’t know how much more robust the solution we are hoping to get will be
than what we can already do with our doctors and the private protocol that we
have used to identify the population. We are hopeful. We have approximately
seven months to go, but we simply don’t know yet how much better this result is
going to be than what we have always done in the past.
Now, speaking as a business, we are looking for a business solution to a
real world problem. I wanted to leave you, before we get to Khaled, with a
few key takeaways related to the de-identification issues. We have run other
prizes before. This is the first data prize we have run, but we have run other
Prizes and competitions derive their benefit from the numerous individuals,
from all walks of life, that participate and try to solve the issue. We have
realized, as others who have run prizes have, that you just don’t know where your best
solution is going to come from. This is why you don’t want to hire five people and
have them try to solve the problem. If we had done so here, they undoubtedly would
not have done as well as the competitors in the prize competition.
This has been borne out in the year and approximately four months during
which the prize has been ongoing. Most of the leading solutions have not been
created by people who are working in the medical space. Indeed, they are
mathematicians, they are hedge fund managers, they are people from all other
types of industries who happen to have a gift for doing data work.
We certainly could not have hired all of them. Even if we could have
afforded them, they probably would not have all wanted to come work for us.
That is the benefit of doing a prize model.
On the other hand, data prizes require the release of data sets. There has
to be a way to balance the individual’s privacy interest with the greater good
to society that would be achieved by solving society’s bigger problems through
crowd sourcing and prizes. Clearly, the problem we are trying to solve is a
We spend $40 billion a year, as I mentioned, on unnecessary hospital
utilization in the United States, and we have a health care crisis on our
hands. The general use that we are trying to make of the data has larger
implications for the general attempt to move from a post-disease provision of
care model to a pre-disease prediction, prevention and cure model.
I am going to have to, however, leave it to people smarter than myself to
figure out where the balance between the two lies. Thank you for letting me
address you today, and now I will turn it over to Khaled, who is going to
discuss much more specifically the de-identification of the Heritage data set.
MR. EL AMAN: Thank you, Jonathan. I have sent you some slides. I am not sure
if they made it, but I will still talk to the main points on the slide.
MS. QUEEN: Khaled, this is Susan. I just sent you an email a few minutes ago
about a different person to send the slides to. Have you received that?
MR. EL AMAN: I am Khaled El Aman. I am the CEO of a company focusing on
data de-identification, which was contracted by Heritage Provider Network to
do the data de-identification work.
I will give you an overview of the technical issues that we faced while
doing de-identification for the Heritage prize. I am just going to start off
with a number of general observations. First of all, at the time, which
was around 18 months ago, we used the best de-identification practices that were
available at that point in time. There have since been a number of
improvements in methodologies, metrics and algorithms, both ones we have developed
and ones others have. This is a very active area of research. I think more could be
done if we were to start again. I will discuss some of the improvements as we
The other point I would like to make is that re-identification attacks are
hard to do. They take a lot of skill and resources to do successfully. This
should also be kept in mind. We did a review of publicly documented attacks
that was published in PLoS ONE at the end of last year. There were 13 attacks,
six of them on health data, and only two of those were on data sets that were
properly de-identified. We used the HIPAA standard as the basis for the definition
of properly de-identified.
In these two cases, the hit rate was quite low. I think a lot of the
conclusions were that if you de-identify a data set properly, using contemporary
standards, the probability of a successful re-identification attack is low. The
stories we hear about people re-identifying data sets stem largely from the
fact that these data sets were not properly de-identified when they were
released. The systematic review, I think, makes that point clearer when you
look at all of the evidence on one page.
This next point is about the reasonableness criterion, which is the way this
issue is addressed in HIPAA. I am going to read from the regs the definition of
identifiable health information: Health information that does not identify an
individual, and with respect to which there is no reasonable basis to believe
that the information can be used to identify an individual, is not individually
identifiable health information.
There is the no reasonable basis term in the privacy rule. Also, the statistical
method requires that the risk is very small that the information could be used,
alone or in combination with other reasonably available information, to identify
an individual. Again, the reasonableness criterion is used in the privacy rule.
Here, I am talking about the statistical standard for de-identification in HIPAA.
We are not striving for perfection; we are striving for something that would pass
the reasonableness test. Of course, we have to figure out what very small risk means.
I will describe how we approached that here. In terms of the data sets, we
started off with an original longitudinal data set that had information on
175,000 patients or members over a three-year period. That data set included
claims, which have diagnoses and procedures, as well as drug data and lab
information. We had the three domains in the original data set.
What we ended up releasing was a three-year longitudinal data set with data
on 113,000 patients. It was a subsample from the original data set. It included
the claims data and some drug information, and no lab information. A decision
was made not to release lab information and to truncate the drug information.
Again, I will describe the reasoning behind that as we go through this.
Another important point is that there are a lot of missing values in the
original data set, for example, length of stay. This is normal for data sets
that come out of clinical information systems. If you look at the Heritage
Health Prize data and notice a lot of missing data, much of it was missing in the
original data as well. It was not necessarily a function of the
The data set has information on some basic demographics, information about
the specialty of the provider, place of service, CPT codes, ICD-9 codes, length
of stay, pseudonyms for provider and vendor, and information about
payment. The drug data also had information about the number of drugs
dispensed to the patient over a certain period of time.
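The released fields included pseudonyms for providers and vendors. The talk does not say how those pseudonyms were generated; a common approach is a keyed hash (HMAC), sketched below purely as an assumption. The helper name `pseudonymize`, the key and the sample IDs are hypothetical. Stable pseudonyms keep all claims for one provider linked, but on their own they do not stop inferences drawn from a provider's patient mix.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes, length: int = 12) -> str:
    """Hypothetical helper: map an identifier to a stable keyed pseudonym.

    The same identifier always yields the same pseudonym under one key, so a
    provider's claims stay linked, but without the key nobody can recompute
    the mapping by hashing candidate provider IDs.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


KEY = b"replace-with-a-randomly-generated-secret"  # assumption: never released

claims = [
    {"provider_id": "PRV-001", "vendor_id": "VND-77", "cpt": "99213"},
    {"provider_id": "PRV-001", "vendor_id": "VND-12", "cpt": "93000"},
    {"provider_id": "PRV-045", "vendor_id": "VND-77", "cpt": "99214"},
]

# Replace the direct identifiers, keep the other fields unchanged.
released = [
    {**c,
     "provider_id": pseudonymize(c["provider_id"], KEY),
     "vendor_id": pseudonymize(c["vendor_id"], KEY)}
    for c in claims
]

for row in released:
    print(row)
```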
In terms of the technical issues that we addressed, the first issue was: what is
the definition of very small, according to the statistical method in the HIPAA
privacy rule? We chose a probability of 0.05 for re-identification of a single
record, that is, a maximum of 0.05 for the re-identification of a single record, and
that was our definition of very small. The reason we chose that was we erred on
the conservative side.
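One simple way to make the "maximum of 0.05 per record" definition concrete is to treat each record's re-identification probability as 1 divided by the size of its equivalence class, meaning the number of records that share its quasi-identifier values. The sketch below assumes that model and uses made-up fields; the risk model described in this talk was more elaborate (longitudinal claims, adversary power), so this only illustrates the threshold itself.

```python
from collections import Counter

THRESHOLD = 0.05  # maximum acceptable re-identification probability for any record


def record_risks(records, quasi_identifiers):
    """Per-record re-identification probability, assumed here to be
    1 / (number of records sharing the same quasi-identifier values)."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return [1 / class_sizes[k] for k in keys]


def passes_max_risk(records, quasi_identifiers, threshold=THRESHOLD):
    """True if even the riskiest record is at or below the threshold.
    Under this model a maximum of 0.05 means every class has at least 20 records."""
    return max(record_risks(records, quasi_identifiers)) <= threshold


# Hypothetical toy data: two large classes plus one unique elderly member.
QI = ("age_band", "sex")
records = (
    [{"age_band": "40-49", "sex": "F"} for _ in range(25)]
    + [{"age_band": "40-49", "sex": "M"} for _ in range(20)]
    + [{"age_band": "90+", "sex": "M"}]
)

print(passes_max_risk(records, QI))       # False: the unique record has risk 1.0
print(passes_max_risk(records[:-1], QI))  # True: the smallest class has 20 records
```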
I think throughout the whole project, there was a general sense that it was
necessary to err on the conservative side, because of the volume of data, the
visibility of the competition and also the potential consequences of a
successful re-identification should it happen. We used the maximum probability
of 0.05. That is a little bit lower than the threshold that was used by CMS
recently to release their claims data. They used a maximum probability of 0.1,
so we are more conservative than them. One of the reasons was the
longitudinal nature of the data, but also that our data were more
detailed than the CMS data that was released.
The 0.05 threshold is consistent with other public releases of data; we
are involved with other agencies that release data publicly, and they use a
probability of 0.05. So it is not inconsistent. However, it is lower than
the threshold used in the more recent CMS release of claims data sets.
Also, the fact that we used the maximum probability, rather than the average
probability, is important. Again, that is erring on the conservative side. It
meant that we took the worst-case scenario and managed the risk for it, which
minimized the risk in the worst case, rather than looking at the
average risk across all of the records in the data set.
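The difference between managing maximum and average risk shows up clearly on a toy set of quasi-identifier keys. Under the same assumed 1/(class size) model, the average risk equals the number of equivalence classes divided by the number of records, so a single unique record dominates the maximum but barely moves the average. The keys below are hypothetical.

```python
from collections import Counter
from statistics import mean


def class_based_risks(keys):
    """Per-record risk = 1 / number of records sharing the same quasi-identifier key."""
    sizes = Counter(keys)
    return [1 / sizes[k] for k in keys]


# Hypothetical keys: 99 records in two large classes and one unique record.
keys = ["A"] * 50 + ["B"] * 49 + ["C"]
risks = class_based_risks(keys)

max_risk = max(risks)   # worst case: driven entirely by the single unique record
avg_risk = mean(risks)  # equals 3 classes / 100 records

print(f"max={max_risk:.3f} avg={avg_risk:.3f}")
# With a 0.05 threshold, the average passes while the maximum fails.
```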
One of the other factors we looked at: we looked at two types of attacks.
One was an adversary who may know a member of HPN and will try to re-identify
that member. It could be a nosy neighbor scenario, or it could be a famous
person who is an HPN member, and you may have a member of the press, for
example, trying to re-identify them.
The second type of attack we looked at was matching against external
databases. The two databases we considered were the voter registration list for
California, and the state inpatient database for California, for the
three years that were covered by the data set. We did some matching experiments
with the state inpatient database.
We did some estimations and simulations for the voter registration list, using
Census data to estimate the probability of a successful match with those
databases as a potential attack. Strictly speaking, this was not really
necessary, because when we managed the risk of a single record being
re-identified, we can show mathematically that that manages the risk from
matching with the other databases. We did it anyway, just for the sake of
completeness, and to see how much buffer we had. We wanted to leave a bit of
contingency with a 0.05 threshold for that data.
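The claim that managing single-record risk also manages matching risk can be illustrated, under simplifying assumptions, with the expected hit rate of a linkage attack. Each external person whose quasi-identifiers match a release class of size f is re-identified with probability 1/f, each such term is at most the maximum per-record risk, and so their average is too. The keys and the function name are hypothetical, and this is not the Census-based simulation described in the talk.

```python
from collections import Counter


def expected_match_rate(release_keys, external_keys):
    """Expected fraction of people in an external database (e.g. a voter list)
    correctly re-identified by linking on quasi-identifiers.

    Worst case for the data custodian: anyone whose key appears in the release is
    assumed to really be in it, and the attacker picks uniformly among the f
    matching records, succeeding with probability 1/f."""
    if not external_keys:
        return 0.0
    sizes = Counter(release_keys)
    return sum(1 / sizes[k] for k in external_keys if k in sizes) / len(external_keys)


# Hypothetical keys combining sex and age band.
release = ["F|40-49"] * 20 + ["M|40-49"] * 25 + ["F|50-59"] * 40
external = ["F|40-49", "M|40-49", "F|50-59", "M|70-79"]  # the last person matches nothing

max_risk = max(1 / n for n in Counter(release).values())
rate = expected_match_rate(release, external)
print(f"max per-record risk={max_risk:.3f}, expected match rate={rate:.4f}")
```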
Also, at the outset, we removed the patients that had sensitive diagnoses.
The NCVHS actually has published a report on definitions of sensitive
information, so the definitions we used were certainly consistent with that,
plus the previous work on rare and visible ICD-9 codes. We also used common
sense, of course. In the paper that we published, we list those diagnoses,
procedures and types of visits that were removed.
So in theory, the de-identification we did would have reduced the risks for
those individuals with sensitive information. These are
members with HIV, substance abuse, certain types of mental health diagnoses and
so on. De-identification would have principally protected those
individuals. But we were concerned about inferences.
The data set was so rich, and as we know, health medical records tend to
have multiple domains that are strongly correlated with each other. The concern
was if we remove only certain pieces of information, we would also have to test
that that information could not be inferred from other information we were
disclosing or releasing.
Doing that for all of the different types of sensitive information would have
been, at the time, not possible, so the decision was just to remove those
individuals from the data set. This is also consistent with practices from
other agencies that release data. They would just remove those with the most
The matching experiments we did with the voter registration list and the
state inpatient database show that the probability of a successful match for any
of the three years was at most 1.7 percent, for certain combinations of
variables: age, length of stay, sex, condition groupings, procedure codes and CPT
codes. The hit rate was lower than our threshold for all individual years and
combined years. Also, the numbers were very small for the estimated match rate,
should someone try to match with the voter registration list.
Now, coming back to the correlation issue, because I think that is quite
important. We were concerned that if we reduced some of the details in the
claims data, and also provided drug information and lab information, an
adversary could use the drug and/or lab information to predict the diagnoses.
We did a number of experiments with pharmacists, where we wanted to get a
number. We know that if you give a pharmacist the drug information, and ask
them to essentially reverse engineer the diagnosis, they can do this.
The question was, how accurately can this be done on this particular data set?
If the accuracy was low, then that inference channel would not be a concern
for us. If it was very high, then of course it would be a concern for us. We
did a number of experiments or empirical studies with pharmacists, where we
gave them incomplete medical records, and asked them to fill in the gaps.
We found that the success rate varied from about 30 percent to 60 or 65
percent, depending on the level of detail we asked them to predict. We didn’t
find much of an experience effect. We felt that that rate was sufficiently
high, and that was a driver for curtailing the amount of drug information that
was disclosed in this data set. We had to release the claims data, but we
didn’t want the adversary to use the drug data to enhance or create more
information in the claims data set that would increase our risk level with these
Then, for the same reasoning, we recommended that we not release the lab
data. For the lab data, we actually built a number of models. Also for the drug
data, there were a number of machine-learning models that would predict one
domain from the other. It turns out that even simple models, such as naïve
Bayes, had remarkably high accuracy for predicting values that we tried to
generalize or suppress.
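To give a feel for how one domain can be predicted from another, here is a from-scratch Bernoulli naive Bayes classifier that guesses a diagnosis group from a set of dispensed drugs. The training pairs are invented toy data (not clinical guidance and not the Heritage data), and the model is a generic illustration of naive Bayes, not the models built for this project.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples, alpha=1.0):
    """examples: list of (set_of_drug_codes, diagnosis_label).
    Bernoulli naive Bayes: each drug is a present/absent feature, Laplace-smoothed."""
    vocab = set().union(*(drugs for drugs, _ in examples))
    label_counts = Counter(label for _, label in examples)
    drug_counts = defaultdict(Counter)  # label -> drug -> number of examples containing it
    for drugs, label in examples:
        drug_counts[label].update(drugs)
    return vocab, label_counts, drug_counts, alpha


def predict(model, drugs):
    """Return the label with the highest log posterior for the given drug set."""
    vocab, label_counts, drug_counts, alpha = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        for d in vocab:
            p = (drug_counts[label][d] + alpha) / (n + 2 * alpha)  # P(drug present | label)
            score += math.log(p if d in drugs else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training data.
train = [
    ({"metformin", "lisinopril"}, "diabetes"),
    ({"metformin", "glipizide"}, "diabetes"),
    ({"metformin"}, "diabetes"),
    ({"tiotropium", "albuterol"}, "copd"),
    ({"albuterol"}, "copd"),
    ({"furosemide", "carvedilol", "lisinopril"}, "chf"),
    ({"furosemide", "carvedilol"}, "chf"),
]
model = train_naive_bayes(train)

print(predict(model, {"metformin"}))                 # diabetes
print(predict(model, {"albuterol", "tiotropium"}))   # copd
print(predict(model, {"furosemide", "carvedilol"}))  # chf
```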
Again, the more information you had, drug and lab, the more accurate the models
were. If you only had drug data to predict diagnosis or procedure codes, the
models were less accurate, but still had very high F scores. That drove some of
the decisions around how much drug data to release and not releasing the lab data.
The other issue was the number of claims. Some members had a very large
number of claims. They really do stand out. We removed or truncated claims to
essentially cut off the tail of that distribution, because some of the members
were really extreme outliers just by the number of claims that they had. If you
had some basic demographics like age and gender, and then looked at just the
number of claims, they were quite unique.
We developed methods to do this claim truncation, so that these
individuals would not stand out as extreme outliers. We did it so that
only the claims that were most unique in the data were truncated; that
way, we wanted to minimize the impact on the data set. The argument being that
extreme outliers would, in many data analyses, be removed anyway. We
tried to be as careful as possible to minimize the number of claims
truncated, but also to focus on the ones that were really outliers on all the
variables that I mentioned.
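Claim truncation can be sketched as capping each patient's number of claims and, for patients above the cap, dropping the claims whose codes are rarest across the data set, since those are the most identifying. The percentile cap, the single-code claim representation and the function name are assumptions for illustration; the truncation methods described here considered all of the claim variables.

```python
from collections import Counter


def truncate_claims(patients, percentile=0.95):
    """patients: dict patient_id -> list of claim codes (hypothetical format).

    Cap every patient's claim count at the given percentile of the claim-count
    distribution. For patients over the cap, keep the claims whose codes are most
    common across the whole data set and drop the rarest (most identifying) ones."""
    counts = sorted(len(c) for c in patients.values())
    cap = counts[min(len(counts) - 1, int(percentile * len(counts)))]
    code_freq = Counter(code for claims in patients.values() for code in claims)

    truncated = {}
    for pid, claims in patients.items():
        if len(claims) <= cap:
            truncated[pid] = list(claims)
            continue
        # Rank claim positions from most to least common code (stable for ties).
        ranked = sorted(range(len(claims)), key=lambda i: -code_freq[claims[i]])
        keep = sorted(ranked[:cap])  # restore the original (chronological) order
        truncated[pid] = [claims[i] for i in keep]
    return truncated, cap


# Hypothetical data: 20 ordinary patients with 1 to 4 claims, plus one extreme outlier.
patients = {f"p{i}": ["99213"] * (1 + i % 4) for i in range(20)}
patients["outlier"] = ["99213", "93000", "J1234", "93000", "G0008", "99213",
                       "80061", "93000", "71020", "A0428", "99214", "99215"]

truncated, cap = truncate_claims(patients)
print(cap, truncated["outlier"])
```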
Another important concept was that of adversary power. When you have a
patient with a hundred claims, normal risk assessments would, at least
historically, assume an adversary who would know what is in these hundred
claims. That is quite an implausible assumption. Nobody would know what is in a
hundred claims about anybody. Even the patients themselves don’t have that much
detail about themselves. With each claim having about seven or eight variables,
that is 700 or 800 pieces of information. It just didn’t seem plausible.
There is the concept of the adversary power, where we assume that an
adversary would only have information about a limited number of claims. For
example, if we say an adversary would have background information they can use
for re-identification on five claims, then we can assess the risk on that
basis. That will, of course, have a dramatic impact on risk, allowing the
release of more data, using a very plausible, a very reasonable assumption.
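The adversary power idea can be sketched by brute force: assume the adversary knows a patient's demographics plus at most `power` of that patient's claims, try every such combination, and take the patient's risk as 1 divided by the smallest number of patients consistent with any combination. The data and names are hypothetical, and full enumeration is only feasible for small claim counts; with 100 claims and a power of five there are over 75 million combinations to check.

```python
from collections import Counter
from itertools import combinations


def risk_with_adversary_power(patients, power):
    """patients: dict id -> (demographics tuple, list of claim codes).

    Assume the adversary knows the demographics plus at most `power` of the
    target's claims. For each patient, try every combination of that many claims
    and count how many patients are consistent with that knowledge; the patient's
    risk is 1 / the smallest such count (the worst-case combination)."""
    profiles = {pid: (demo, Counter(claims)) for pid, (demo, claims) in patients.items()}
    risks = {}
    for pid, (demo, claims) in patients.items():
        k = min(power, len(claims))
        smallest_match = len(patients)
        for combo in set(combinations(sorted(claims), k)):
            known = Counter(combo)
            matches = sum(
                1
                for other_demo, other_claims in profiles.values()
                if other_demo == demo
                and all(other_claims[c] >= n for c, n in known.items())
            )
            smallest_match = min(smallest_match, matches)  # always >= 1 (the patient)
        risks[pid] = 1 / smallest_match
    return risks


# Hypothetical toy data.
patients = {
    "a": (("F", "40-49"), ["99213", "93000", "80061"]),
    "b": (("F", "40-49"), ["99213", "93000", "71020"]),
    "c": (("F", "40-49"), ["99213", "80061", "71020"]),
    "d": (("M", "60-69"), ["99214"]),
}

for power in (1, 2):
    print(power, risk_with_adversary_power(patients, power))
```

Raising the assumed power from 1 to 2 makes patient "a" unique, which is the kind of sensitivity to the power assumption discussed later in the talk.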
The other problem is, if you have 50 or 100 claims, which combination of five
claims do you assess? It is a combinatorial problem. We have various methods to
assess risk, taking into account this adversary power concept. This is a
concept that has existed in computational disclosure control for some
years, though not applied to longitudinal health data, but I think it provides a
reasonable way to evaluate risk for data sets where you have multiple
instances per person: transactional data sets, claims data, visits data,
Another important concept was that of patient diversity. Some patients who
have chronic conditions, if you know the information in one or two claims, you
can predict the remainder, or a lot of the remaining claims throughout the rest
of the year or subsequent years. A good example would be a patient receiving
dialysis. It is a recurring pattern for those patients. If you know part of
that pattern, you can fill in the gaps.
Then you have patients where they have a lot more diversity in their claims.
They have a number of acute incidents that are not directly related to each
other. Then in computing what an adversary would know, we took that into
account. The kidney dialysis patient, for example, if an adversary knew one
thing, they could predict a long trail of claims for that patient. Whereas the
second type with high diversity, the information content in each claim was
smaller because you can’t use it to predict other claims. The claims were very
diverse and they didn’t have a pattern.
We developed a number of diversity metrics and used those as well. In
deciding how to compute the adversary power, we considered the diversity
of each patient. For patients that had low diversity, we would give the
adversary a lot of power; for those with high diversity, we gave them less
power. That way, we essentially tried to customize or adjust the risk for each
single patient, so that we could release more data in a defensible way.
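Patient diversity can be quantified in several ways. The sketch below assumes normalized Shannon entropy of the claim codes (0 for a fully repetitive pattern such as recurring dialysis, 1 when every claim differs) and a hypothetical linear mapping from diversity to adversary power. The diversity metrics and the mapping actually used in the project are not described in the talk, so the constants here are placeholders.

```python
import math
from collections import Counter


def claim_diversity(claims):
    """Normalized Shannon entropy of a patient's claim codes (assumed metric):
    0.0 when every claim has the same code, 1.0 when every claim is different."""
    if len(claims) < 2:
        return 1.0  # a single claim has no repeating pattern to exploit
    counts = Counter(claims)
    n = len(claims)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(n)  # log(n) is the entropy of n distinct codes


def adversary_power(claims, min_power=3, max_power=10):
    """Hypothetical mapping: the less diverse the claims, the more of them an
    adversary is assumed to know, capped at the number of claims."""
    d = claim_diversity(claims)
    power = round(min_power + (max_power - min_power) * (1 - d))
    return min(power, len(claims))


dialysis = ["90999"] * 12  # hypothetical recurring pattern
acute = ["99283", "73610", "12001", "99213", "71020", "87880",
         "99284", "29515", "99212", "70450", "81001", "36415"]  # all different

print(adversary_power(dialysis), adversary_power(acute))
```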
For generalization and suppression, we used an algorithm called optimal lattice
anonymization, or OLA. This is a globally optimal generalization and suppression
method that we had developed a few years prior, and we used that for this
data set. I think the article references some material on that.
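OLA searches a lattice of generalization levels for a globally optimal solution. The sketch below is not OLA: it brute-forces every lattice node for two hypothetical quasi-identifiers (age and a three-digit ZIP prefix), applies generalization only (no suppression), and uses total generalization height as a crude stand-in for information loss, keeping the lowest node whose maximum risk is within 0.05. OLA itself avoids visiting every node by exploiting the lattice's structure.

```python
from collections import Counter
from itertools import product


def generalize_age(age, level):
    """Hypothetical hierarchy: exact age, then 5-, 10- and 20-year bands."""
    if level == 0:
        return str(age)
    width = (5, 10, 20)[level - 1]
    low = age // width * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip3, level):
    """Hypothetical hierarchy: 3-digit prefix, 2-digit prefix, fully masked."""
    return zip3 if level == 0 else (zip3[:2] + "*" if level == 1 else "*")


# quasi-identifier -> (generalization function, number of levels)
HIERARCHIES = {"age": (generalize_age, 4), "zip3": (generalize_zip, 3)}


def max_risk(records, levels):
    """Maximum per-record risk, 1 / smallest equivalence class, at a lattice node."""
    keys = [tuple(HIERARCHIES[q][0](r[q], lvl) for q, lvl in levels.items()) for r in records]
    return 1 / min(Counter(keys).values())


def best_generalization(records, threshold=0.05):
    """Visit every lattice node (brute force) and return the node with the lowest
    total generalization height whose maximum risk is within the threshold."""
    names = list(HIERARCHIES)
    best = None
    for node in product(*(range(HIERARCHIES[q][1]) for q in names)):
        levels = dict(zip(names, node))
        if max_risk(records, levels) <= threshold and (
            best is None or sum(node) < sum(best.values())
        ):
            best = levels
    return best


# Hypothetical toy data: every (age, zip3) pair is unique before generalization.
records = [{"age": a, "zip3": z} for a in range(30, 60) for z in ("100", "101")]
print(best_generalization(records))  # {'age': 2, 'zip3': 1}
```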
As I mentioned before, we subsampled the data set to add another layer of
protection. Subsampling, of course, increases the uncertainty, and that is
accounted for in the risk metrics. We released 113,000 out of the 175,000
patients, again to allow for a little bit of buffer, in case any of our
assumptions were violated in the future. We assumed a power of five, but what if
there was an adversary with a power of 10? Would that increase the risk beyond
the threshold? We did some sensitivity analysis, and we found that our assumptions
would have to be violated quite a bit for the risk to go above the 0.05 threshold.
We needed that buffer, which was achieved partially through the subsampling, in
order to maintain this insensitivity.
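Subsampling helps because an adversary who knows a target is in the full population usually cannot know whether that target was sampled. A standard way to express this, assumed here rather than taken from the talk, is to compare prosecutor risk, 1/(class size in the release), with journalist risk, 1/(class size in the population). The toy population is scaled down to 175 records with 113 sampled, mirroring the 175,000 and 113,000 figures; the class structure is invented.

```python
import random
from collections import Counter


def sample_risks(population_keys, sample_indices):
    """Maximum risk over released records under two adversary assumptions.

    prosecutor: the adversary knows the target is in the release,
                so the risk is 1 / (class size in the sample).
    journalist: the adversary does not know whether the target was sampled,
                so the risk is 1 / (class size in the population)."""
    sample_keys = [population_keys[i] for i in sample_indices]
    f = Counter(sample_keys)       # equivalence class sizes in the release
    F = Counter(population_keys)   # equivalence class sizes in the population
    prosecutor = max(1 / f[k] for k in sample_keys)
    journalist = max(1 / F[k] for k in sample_keys)
    return prosecutor, journalist


population = [i % 7 for i in range(175)]  # 7 hypothetical classes of 25 records each
rng = random.Random(0)
sample = rng.sample(range(175), 113)

prosecutor, journalist = sample_risks(population, sample)
print(f"prosecutor={prosecutor:.3f} journalist={journalist:.3f}")
```

Because every population class is at least as large as its sampled counterpart, the journalist risk can never exceed the prosecutor risk; the gap is the buffer that subsampling buys.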
Then, the final thing, in terms of data modification, is that we linked
them(?) to protect provider identity, because we had provider IDs, we also had
vendor IDs and information about place of service and so on. The adversarial
attack that Jonathan mentioned used information about provider
IDs and so on in order to draw some inferences.
This is not necessarily a privacy issue, per se, or patient privacy issue,
per se. It was more on the provider confidentiality. It was possible to figure
out the identity of a provider from the information here, by looking at the
pattern of patients that an individual was seeing.
You can determine, to some extent, which facility people were going to, by
looking at the age of the patients and how many patients go there per year, if
it is pediatric or adult. If it is pediatric, you can look at how many visits
to determine which is the bigger facility and which is the smaller facility.
There are paths that you can walk down, where you can draw inferences about the
provider. We made some additional modifications to thwart such attempts, from
the perspective of protecting provider confidentiality.
For the de-identification methods that we used, in terms of lessons learned,
what would we do differently if we were to start this again? I think we would
look more at the average risk, rather than the maximum risk. I think a good case
can be made that this can be a reasonable compromise between data quality and
the protection of privacy.
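The difference between maximum and average risk can be shown with a small sketch. It assumes the common equivalence-class view, in which a record's re-identification risk is 1 divided by the number of records that share its quasi-identifier values; the quasi-identifiers and counts below are hypothetical.

```python
from collections import Counter


def risk_summary(quasi_ids: list[tuple]) -> tuple[float, float]:
    """Return (maximum risk, average risk) for a list of quasi-identifier tuples.

    Each record's risk is 1 / (size of its equivalence class).
    """
    class_sizes = Counter(quasi_ids)
    max_risk = 1.0 / min(class_sizes.values())
    # Averaging 1/size over every record gives (number of classes) / (number of
    # records), because each class of size s contributes s * (1/s) = 1 to the sum.
    avg_risk = len(class_sizes) / len(quasi_ids)
    return max_risk, avg_risk


rows = [("40-49", "F")] * 4 + [("50-59", "M")] * 5 + [("60-69", "F")]
print(risk_summary(rows))  # (1.0, 0.3)
```

A single unique record drives the maximum risk to 1.0, while the average risk is only 0.3. A policy judged on average risk tolerates a few small classes and so usually needs less generalization or suppression, which is where the data-quality side of the compromise comes from.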
In general, I don’t think it is necessary to do these matching experiments,
because by managing the risk of an individual being re-identified, you
essentially manage the risk of matching with external databases. That risk will
always be low. In principle, these matching experiments may not be necessary for
future de-identification efforts.
The issue of correlations within the data set, especially across domains such
as diagnoses, procedures, et cetera, is complex and requires careful
consideration, especially if you are trying to release a very detailed data set.
Then, in terms of improvements in algorithms and so on, an active area of work
has been improved claim truncation(?) and algorithms to compute adversary power.
There have been a lot of advances in that over the last 18 months, which can
allow more data to be released, simply because the optimizations are much more
effective than before.
I think that would be it. Those are all of my comments. That gives us some time
to answer some questions. Thank you.
DR. CARR: This is a really fascinating analysis. I will open it up to
questions. I believe Leslie Francis has the first question.
DR. FRANCIS: I have a very simple set of questions for Jonathan. One is,
would you be willing to share a copy of the contract that you ask everybody to
sign, or at least some parts of it, with us? The other is, do you have any way
of following up, so that if person number one gets a data set, and somehow they
were to use it to re-identify, could you figure out that it was the release of
the data to person one, rather than to person 53, that had been the source of
the data breach, or the source of the effort to re-identify?
MR. GLUCK: With respect to your first question, I would be happy to share
with you a copy of the contract. Again, we had outside counsel who works
specifically on prize rules and contracts for companies such as McDonald’s,
when they do their buy-your-Big-Mac-and-get-your-game-card thing, I guess. I
did not even know that such a practice existed, but it does. I would be happy
to share the contract with you.
If someone, for example, downloaded the data and then shared it ten ways down
the line, I don’t think we would necessarily be able to identify the individual
who breached the agreement. Khaled, do you think differently? I don’t know that
we would be able to do that.
MR. EL AMAN: As we were doing this, we did have a discussion of watermarking
the data sets, so that if a breach occurred, you would be able to determine from
this invisible watermark which account downloaded that version of the data set.
It was deemed to be quite complex, because it would have generated variations in
the data set each time. The watermark would have to be embedded within the data,
so different versions of the data set would have to be generated dynamically,
and it was a very large data set.
Then, the second issue, of course, was the concern that if different entrants
got different data sets, would that be fair competition?
MR. GLUCK: One of our concerns throughout this entire competition was that,
given the magnitude of the final prize, we had to be very careful not to wind up
with anyone suing us because, like Khaled said, someone got a slightly
different data set, which they thought prejudiced their ability to win.
If someone could send me the information on where I should send that
contract, I should be able to get that to you.
MR. SCANLON: Jonathan and Khaled, thanks again. This is very interesting. It
sounds like the model you are describing is not an open health data set in the
sense that you just de-identify it and put it out there. It sounds like it is
more for restricted use, where you sort of chose whom you would release it to,
under the protection of a data use agreement and contract. This is not
something you would simply put out.
MR. GLUCK: I think it is kind of a hybrid, because while it is limited to
people who have agreed to sign up and abide by the rules of the competition, I
believe we clearly have over 5,000 competitors. It is not like we chose or
handpicked people who could compete. As long as they were willing to agree to
the rules, and they didn’t live in certain countries which were excluded, they
were able to download the data.
We tried to put some very, very broad guardrails around it, but generally,
it is pretty open.
DR. GREEN: You may have covered this, and I just didn’t digest it. I am
interested in what part of an address, if any part, of the individuals whose
data are in the data set is included in the released data set.
MR. EL AMAN: There is no address information. There is no ZIP Code
information included. I think the decision was made at the very outset not to
include the ZIP Code information. If someone really tried very hard, they may be
able to infer the facility where treatment was received, by looking at the size
of the facilities and just focusing on the largest and the smallest. I think
that would still be hard to do. In terms of geography, there was little there.
DR. COHEN: Many of the open data sets that we work with use county as the
lowest level of geographic identifier. How sensitive do you think your
de-identification method would be to the inclusion of county as an identifier?
MR. GLUCK: I would like to begin the answer, and then Khaled can follow up on
my answer. One of the issues for us specifically is that we are in 11 counties
throughout Southern California, some of which are much more sparsely populated.
Including county, together with certain conditions, might get us too close to
making re-identification possible. If we were talking, for example, only about
L.A. and Orange County, I don’t think it would be a big deal. But we were
including in our data set individuals who lived in these much more sparsely
populated counties.
Khaled, do you want to follow up on that?
MR. EL AMAN: I think it is also an empirical question. We could have added
the county variable and then done a risk assessment when we measured the
de-identification. We would have been able to measure the risk with county
information included and then determine whether that was a problem or not. I
think, for this particular data set or any particular data set, the exact
answer would require including the variable and measuring the risk on that
data set.
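The empirical approach described here, adding the candidate variable and re-measuring the risk, can be sketched as follows. The rows, the column names, and the use of maximum risk against a 0.05 threshold are illustrative assumptions; "County A" and "County B" are placeholders, not counties from the prize data.

```python
from collections import Counter

THRESHOLD = 0.05  # maximum acceptable re-identification probability (assumed)


def max_risk(rows: list[dict], columns: list[str]) -> float:
    """Maximum risk (1 / smallest equivalence class) over the given quasi-identifiers."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    return 1.0 / min(Counter(keys).values())


rows = (
    [{"age": "40-49", "sex": "F", "county": "County A"}] * 3
    + [{"age": "40-49", "sex": "F", "county": "County B"}]
    + [{"age": "50-59", "sex": "M", "county": "County A"}] * 2
)

without_county = max_risk(rows, ["age", "sex"])
with_county = max_risk(rows, ["age", "sex", "county"])
print(without_county, with_county)  # 0.5 1.0
print(with_county <= THRESHOLD)     # False
```

In this toy example, the single patient from the sparsely populated "County B" becomes unique once county is added, which mirrors the concern about sparsely populated counties raised in the previous answer. A real assessment would run the project's own risk metric on the full data set.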
DR. GREEN: In the rules of the game, were the contestants allowed to make
use of any other data set they wanted to, besides yours?
MR. GLUCK: They were allowed to use certain publicly available data sets.
If they used a data set, it had to be something that everyone could use.
DR. GREEN: They could go to Healthdata.gov and use any data that they found
there in the contest?
MR. GLUCK: As long as it was publicly available. Khaled, I don’t know if you
covered it completely or if you want to address it, but one of the things we
did have to take into account as we were doing the de-identification, and this
arose somewhat at the last minute, was the realization that cross-referencing a
different data set with ours would have raised the re-identification risk, and
we had to account for that. Yes, if it was a publicly available data set,
people could use it.
MR. EL AMAN: As I mentioned, we included an explicit analysis of the State
Inpatient Database for California, which covers hospital discharges: we did a
data matching experiment with a few years’ worth of that data, and we also
looked at voter registration lists. The discharge data covered a lot of the
variables in the claims, including diagnoses, procedures, demographics, length
of stay and so on. That was a good data set to match against, because a lot of
the fields that were included in the prize data sets were matchable to that
discharge data. It gave us a good sense of what the risks were. Those results
were also taken into account in the de-identification.
MR. SCANLON: That is the question I was interested in. There are folks who
tell us that, from motor vehicle registration lists, voter registration lists
and publicly available health care data sets, they can often re-identify
people. You actually did this in-house to see what the probability and
likelihood were that re-identification could take place.
MR. EL AMAN: Right. For the voter registration list, we estimated it. We
used a number of estimators to compute, with the help of some census data,
what the match rate would be if we got the voter registration list. In Southern
California, you are not allowed to get the voter registration list for a purpose
unrelated to an election, so we couldn’t get it and use it for a
re-identification exercise.
We were able to estimate the risk, and the match rate was quite low. Then,
we obtained the data and did actual matching experiments for the three years
of discharge data. Again, I think that the metrics that we used would have
anticipated the results of those matching experiments.
When we manage the maximum probability of re-identifying a single record, if
we ensure that probability is at most 0.05, then the proportion of successfully
matched records would also be no more than 5 percent for any database that
overlaps with this data set. I think matching experiments are good for
assurance, but it would be unlikely for them to reveal something completely
surprising after you have managed the original type of risk.
To answer your question, if you do this well, the matching with these
external databases would not be an issue.
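The argument that controlling the maximum risk also bounds the outcome of a matching experiment can be checked with a few lines of arithmetic. This sketch assumes that each overlapping record falls in an equivalence class of size k in the released data and that the adversary picks uniformly among those k candidates, so a correct match happens with probability 1/k. The class sizes are made up.

```python
def expected_match_rate(class_sizes: list[int]) -> float:
    """Expected fraction of overlapping records matched correctly.

    Each record in an equivalence class of size k is matched correctly with
    probability 1/k (the adversary guesses uniformly among the k candidates).
    """
    return sum(1.0 / k for k in class_sizes) / len(class_sizes)


def max_risk(class_sizes: list[int]) -> float:
    """Largest per-record re-identification probability."""
    return 1.0 / min(class_sizes)


sizes = [20, 25, 40, 100]  # smallest class of 20 gives a maximum risk of 0.05
print(max_risk(sizes))             # 0.05
print(expected_match_rate(sizes))  # about 0.031, below the 0.05 bound
```

An average can never exceed its largest term. So keeping every record's probability at or below 0.05 keeps the expected proportion of correct matches at or below 5 percent, whichever overlapping database is used.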
MR. SCANLON: Depending on the geography, I guess, and the detail. It sounds
like you curtailed the information on diagnoses, dates of service and
procedures, or did you not have to?
MR. EL AMAN: We did, to some extent, yes.
MR. SCANLON: That is traditional.
DR. CARR: This is Justine Carr; maybe I will ask the last question. Do we
glean from this that there is a generalizable lesson here? Are we
to learn from this that, at the end of the day, labs are dicey and ZIP Codes
are dicey? Or are we to learn that, given your own data set, you have to put it
through these maneuvers to arrive at your own measurement of de-identification?
MR. EL AMAN: I think there is a general process that you have to follow,
because the answer will depend on the data set. We have tried to spell out the
steps in the article. I also covered them in the presentations. If you think
through all of these issues, then I think you can produce a data set where you
can maintain good utility, and then also have strong guarantees at the end.
MR. SCANLON: The contest is still under way, right?
DR. CARR: Wouldn’t that be the measure of the utility of the data set?
DR. FRANCIS: Do you plan to recapture the data after the end of the
prize? The reason I ask is that the data sets that are available now may not
be the same in five years.
MR. GLUCK: I am not sure what you mean by recapture the data set.
DR. FRANCIS: Is one of your requirements that people give it back at the end
of the prize time period, without retaining copies or having sent copies to
anyone else? The reason I am interested in that is that if the data hang around
for 15 years, the other data sets that are available, and the landscape of what
somebody might reasonably be anticipated to get hold of, will have really
changed.
MR. GLUCK: The rules require that the data only be used for purposes of the
prize. It is not to be used for anything else without special permission. We
have actually had a few research institutions that were unable to get other
data specifically ask for the ability to use it for other research purposes. We
have typically granted those, if the institutions are reputable.
No one is allowed, under the rules, to use it for any other purposes.
Because it is data and they have downloaded it onto their computers, at some
level we are going to have to, I guess, trust people that they are not going to
use it for other purposes. I don’t know that a requirement that it be returned
or destroyed would add that much to the requirement that it not be used for any
other purpose.
DR. CARR: Your work is very stimulating, and as evidenced by that, we have
two more last questions.
MR. QUINN: This is Matt Quinn. This seems like something, de-identification
validation and re-identifiability, that NIST could provide technical guidance
on, if they haven’t already. My takeaway is that, as opposed to everybody
reinventing this for every contest and everything, that guidance does exist
today. I will talk to Kevin Stein and Matt Shoal(?) at NIST to see about that.
It seems like a great joint project with HHS and NIST.
MR. SOONTHORNSIMA: You talked about a lesson learned in terms of balancing
the trade-offs, because along the way, you talked about truncating data, claims
data, promissory data and so forth. Because of re-identification risk, you start
taking away pieces of information, pieces of data. Therefore, the richness of
the data, and the ability to stratify it for more useful purposes, may have
diminished as a result. I guess, what is your reaction to that comment?
MR. GLUCK: I would agree. Again, I think that is why the balance has to be
struck. Sitting where I am now, given my job, I guess, that is a difficult
question for me. I have to leave it to policymakers to figure out where that
balance should be. I agree wholeheartedly with the comment.
DR. CARR: Thank you very, very much. We really appreciate you taking the
time, and it is very exciting, very thoughtful work. We are looking forward to
seeing who wins the prize. With that, I believe it is almost time to conclude
the full committee meeting. Before we do, I want to again express my gratitude
to all of the people that I have worked with as chair of the committee,
particularly, obviously, Jim and Marjorie, who taught me so much, and our
incredible staff, Catherine and Debbie, Marietta, Janine and Nicole, Susan, and also to John and
Shanda for helping us with our acoustics. Of course, to the very able staff, to
Matt, Lorraine, Maya, all of you, it has been really my privilege to serve as
chair of this committee. With that, I will entertain a motion to adjourn.
(Whereupon, at 2:02 p.m., the meeting was adjourned.)