[This Transcript is Unedited]
Department of Health and Human Services
Working Group on Data Access and Use
March 1, 2013
Hubert H. Humphrey Building
200 Independence Ave., SW
Washington, D.C. 20201
CASET Associates, Ltd.
Fairfax, Virginia 22030
TABLE OF CONTENTS
- June 3-4, 2013 Datapalooza: WG Presentation
Integrating Themes of Year 1
- Discussion of the “Demand” Side of Health
P R O C E E D I N G S (1:00 p.m.)
DR. CARR: I would like to welcome everybody to the Workgroup on Data Access and
Use. We have an actually very interesting array of presentations this afternoon
and therefore I’m really going to make sure that we keep to our timeline. So
let’s start by introductions. I’m Justine Carr from Steward Health Care and I’m
Chair of the Workgroup on Data Access and Use.
(Intro around table)
MS. GREENBERG: I might say actually as the Executive Secretary that those of
you who are not on the Full Committee did not need to fill out the 450s that
special government employees need to fill out and so probably, you may have
But you don’t have to actually disclose them except that I do, officially
because we want to know whether there are going to be conflicts and we make you
fill it all out so that if you don’t remember we do. That’s part of my job.
But I do think because this is an open advisory committee environment and
all of that that it is good, it’s important still that if you’re actually
working in some space or feel that it could be a conflict, you can still
participate but that you acknowledge that.
But I just wanted you to understand that it’s a little different than with
the full committee members. Now the full committee members who are also members
they’re still on the hook.
DR. CARR: So welcome everyone. We are really riding the wave of a very
exciting momentum coming out of the NCVHS meeting. I want to let you know that
on April 30, May 1, and May 2, half a day, there will be a workshop on
Community Data Needs that we are not only invited, but encouraged to attend,
because a very integral part of how communities use data is their access to data
and what the data is.
So I think from our work charge it becomes very important for us to hear
what the communities have to say and I think as we do, we integrate it into the
information that we, our reactor role back to HHS. So again, and Marjorie, I
guess if members of the Workgroup have an interest in attending even part of
that workshop that would be great and actually, do we have the final agenda? Is
DR. FRANCIS: We have the agenda structure but we don’t have the list of
DR. CARR: Okay, so what I would like to do is if we can get that out to the
workgroup members and you can look at what is the theme for each day and if you
can only come for one day that will help you.
DR. FRANCIS: It will help from an informative perspective but this is
deliberately designed to have people participate in discussions, not just to
listen, and so the longer people can stay, the more valuable because that way
they’ll be able to actually participate in the deliberations.
DR. CARR: Totally understood. I’m just saying three days is sometimes more
than some people can offer. So our contribution, particularly workgroup members’
contributions, we have a couple additional folks who want to introduce
(Intro new arrivals)
MS. GREENBERG: Regarding the upcoming hearing, I think we should actually,
if the subcommittee has pretty much agreed on the structure and it can always
say tentative, even though we don’t have any names associated with it, I would
like to get it posted on the website, not just in the SharePoint and shared
within the committee. Is that okay?
DR. CARR: Okay, so what I would like to do is briefly revisit that work that
we talked about this morning and now enhanced by some of the further discussion
about where we are, where we’re headed. Then, as you know from the agenda we
want to focus a little bit on the Datapalooza for the first half of the meeting
and then in the second half we really want to go on a much deeper dive on
demand. We have talked about supply side a lot.
Today is the day to talk about demand and Lea and Chris have put some things
together. A couple of things actually we welcome Lilly Bradley, as a staff to
the committee and she has put together the information we’ve been asking for on
a number of visits to the website, et cetera.
So first step, let me take you through the big picture that we talked about
today. Just a reminder, our workgroup charge, to monitor and identify issues
and opportunities, to make recommendations to HHS on improving data access and
innovative use, including content, technology, media and audiences. And advise
HHS on promoting and facilitating communication to the public about HHS data
and thirdly, facilitate HHS access to expert opinion and public input regarding
policies, procedures, infrastructure to improve data use.
This is a slide I put together to try to capture how our thought process
has been coming together. As you can see on the top, we’ve spent time talking
about the supply side. So that’s the HHS data, what happened, there was an
inventory of the data, it was configured to be available for the public and it
And when we get down to the lower half, the demand side, the NCVHS has done
already a lot of work on understanding what communities need and what their
data needs are. But what we need to understand is how do they define their need
and then how do they translate that need into specific data. What data is
needed? How do they find it and how do they use it?
And with those two themes going on, as you can see in parallel, there’s a
gap in between. There’s a gap in terms of how the data that’s currently
available is known to these groups, how usable it is and similarly we need to
know more about what the demand is in communities so that there may be the data
but it’s making it available in a way that is useful.
So our current state as we’ve talked about, we’ve gone through the datasets,
the presentations last summer. Susan’s put together a nice inventory of the
datasets and their uses and we’ve identified a number of supply side issues,
frequency which we will hear this morning, how often people actually access it
in its native state.
Secondly, the timeliness of it, the timeframes that the data is from and
then usability both technical and content so how usable is it for developers
and Josh gave us a nice overview on issues such as taxonomy, et cetera. Also,
for unsophisticated users we talked about last time what helps them to use
data browsers or social enabled open source opportunities.
A third issue around usability is what can we learn from current use and we
heard this morning from the Populations group, Susan Kanaan showed us a number
of ways that data is already being used, and then was it Paul, who said
meaningful standardization, no that was Walter, and sort of putting that as an
overlay on some of this.
And then on the demand side, that’s what we’ll talk about today. What are
the needs? Who uses the data? How to navigate the resources, and we heard about
the similar datasets at different sites and the marketability or no, the
e-commerce, to say if you like this, you might also like … And then how to
assess the appropriateness of the available datasets.
So the next steps then are really to continue to define the gap between
supply and demand sides of the data. As I said we’ll have the learning session
April 30th. And then defining strategies to bridge the gap, and this is where I
think we really want to kind of come away with some very specific ideas today.
We’ve certainly heard a number of them.
We heard today, is there one place, Bill I think said, that you can go and
find what everything is and I guess, defining everything is the real question.
And then Bill had also put forward, and we’ll talk about this today, I think
that was sent out to members of the workgroup and actually we might have
paper copies of Bill’s – we have them – so that when the data is there, it’s
one thing to say there’s data but it’s another thing to say this one you have
to pay for, this one is local, this one is state, this is federal, to begin to
make more of a roadmap to that data.
And we talked about e-commerce, trying to borrow from helping people to get
to what they need, stimulating innovative uses, and we’re going to see some of
that from Josh shortly, I think.
And then the important thing, marketing, really broader diffusion, what are
some strategies for broader diffusion of available resources. This is why we
want to make sure we have a presence at the Datapalooza. And just marketing,
when you Google, we were Googling the measurement dashboard today and it was
number seven on what comes up, so what are the things that we can do to make
our data come at the top of the list so people can find it.
And then finally we heard about feet on the street, that was Paul, that
we’re up here thinking about what people need, and a lot of times as we’ve
learned from Meaningful Use and the extension centers, getting right down side
by side, you learn more about what people’s needs are and you can help them be
And we talked about participation. And then Ed actually has given me a
little outline. Ed had hosted a fabulous session last year and was inviting us
and the Full Committee to participate in that as well this year.
And then these were three takeaways from Susan Kanaan, three points of
understanding why communities need data. Also the point that the communities
need information on data assets as well as problems, and how to foster a
So I’m going to stop there and that’s kind of the overview to set the stage
for today. Bruce.
DR. COHEN: I just want to make sure that when we talk about demand or use
of data, there’s the end user which is the community and then there’s the
intermediate user which is the developer so I think really when we talk we need
to be explicit about what user we’re focusing on. I think they’re both very
important audiences for our work but I think we need to draw that distinction
DR. CARR: I think I have that in my slides, that there really are, there are
the sophisticated developers who need to know all of the mapping of all the
data and then there are the community folks who just need a kind of frontend to
get information as opposed to building out. Questions.
DR. BLEWETT: When you say community, do you mean substate, so you’re not
talking state, substate geography.
DR. CARR: That is an excellent question. We use the term community all the
time and we probably mean all of the above. More granular than federal and
state, and beyond that I don’t know that we have a lot of precision. But you’re
asking probably with a specific thought in mind?
DR. BLEWETT: Well, whether state was included in that?
DR. CARR: State and substate. We are thinking about substate.
DR. COHEN: We’ve gone through this whether it’s place or affinity group, so
there are different ways to define communities. Some communities can certainly
cross states and if they are for instance a particular race, ethnicity, if
we’re targeting among people for data collection. But when I think of community
I think of substate geographies primarily, but again if the National Health
Interview Survey right now is only available at the regional level, we could
think of states as a target community for expansion of NHIS. So we’re
DR. BLEWETT: I would like that very much. I’ve been working on that for a
MR. SCANLON: But I think community, you’re right though, there are really
different communities and it’s not just geography and it’s not just population,
it’s also a group that has a common focus on a specific issue. It could be
patients. It could be cancer patients. It could be providers. It could be
plans. It could be … So I don’t know if we’re only focusing on the geography
or the subpopulation. I think for the purposes of this initiative, we’re
probably focusing on all of those audiences.
DR. CARR: For the April 30th event are you thinking geographically on that.
DR. COHEN: I think primarily, place defined, and secondarily, affinity group
defined. So we hope to have folks who have data needs for their specific group
that isn’t focused on one place.
DR. CARR: So let’s move now to some nice work that Lilly has pulled together
about our – I’ll just let you –
MS. BRADLEY: So I jumped into this about two and a half weeks ago, so I just
want to let you know I’m not sure exactly what everyone is familiar with, not a
curse of knowledge so much as a lot of assumptions and if there’s anything I
need to back up to familiarize you with – I don’t know how much time you’ve
spent on healthdata.gov or you’ve visited it at all but you should. And user
feedback is much appreciated.
So I think that at an earlier point Joshua Rosenthal has sort of presented
the problem as we’re partnering and publicizing our data. We’re monitoring site
and resource data and tool usage and this is going to help us to allocate
resources and develop data products based on value and usage going forward.
The presentation that I put together today is using Google Analytics. So
again, with the background and objectives for this particular presentation,
DHHS developed a catalog of data assets on healthdata.gov, that’s abbreviated
as HD.gov throughout the rest of the presentation, and this is supposed to be a
landing page I think if you want to understand all of the data resources that
you could access.
The catalog is such that it links you to the original data files, which has
somewhat to do with where the data wants to be housed according to the
different agencies. These slides supply some info about the use of the website
since its launch in June 2012 and I hope that the working group might help us
consider where this website should fit in to our overall goal of expanding data
access and innovative data use. Help us determine how we should be managing or
measuring progress and success. And hopefully also helping us to understand how
we should be best addressing filling gaps in information.
Since its launch healthdata.gov has had 43,000 unique visitors. There have
been 64,000 total visits. Over 50 percent of these tend to be new.
So if you look at that top graph, across the top we compare, so there’s a
double Y axis or a secondary Y axis. You can see that the usage fluctuates. It
actually fluctuates by the day of the week so the low usage is on the weekend.
Here is some preliminary information. It just sort of lets you know that folks
spend about 5 minutes on average when they visit the website. Visitors look at
about five pages which may also correspond with how many tabs we have.
So how do visitors come to the website? I can basically take you through a
user experience. So how do you get there? Well, 42 percent of people come
directly by typing in www.healthdata.gov; 35 percent are
referred in. So the top referring websites, which is the most of the referrals,
come from HHS.gov, data.gov, Twitter – so these would be Twitter posts that
link back to our website and then innovations.cms.gov.
Search traffic also sends about 23 percent; 87 percent of these searchers
use Google. So we should pay attention to Google’s algorithms for selecting
who, which hits come up first.
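As a rough illustration of the arithmetic behind these figures, the sketch below applies the quoted traffic-source shares to the 64,000 total visits mentioned earlier. Treating the percentages as shares of total visits is an assumption (the speaker says "percent of people"), and the names TOTAL_VISITS, SOURCE_SHARE, GOOGLE_SHARE_OF_SEARCH and estimated_visits are illustrative only, not part of the presentation.

```python
# Figures quoted in the presentation (healthdata.gov, since launch).
TOTAL_VISITS = 64_000

# Share of traffic by source, as quoted; assumed here to apply to visits.
SOURCE_SHARE = {"direct": 0.42, "referral": 0.35, "search": 0.23}

# Share of search traffic that arrives via Google, as quoted.
GOOGLE_SHARE_OF_SEARCH = 0.87


def estimated_visits(total, shares):
    """Return an approximate visit count per traffic source."""
    return {source: round(total * share) for source, share in shares.items()}


counts = estimated_visits(TOTAL_VISITS, SOURCE_SHARE)
google_visits = round(counts["search"] * GOOGLE_SHARE_OF_SEARCH)

for source, visits in counts.items():
    print(f"{source:>8}: ~{visits:,} visits")
print(f"  google: ~{google_visits:,} visits (subset of search)")
```

Under that assumption, search would account for roughly 14,700 visits, of which about 12,800 would have come through Google.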
Understanding the referral patterns could help us improve promotion. This
tail is really long, so I presented approximately the top 20. These are all the
government referrals, where you can see that a lot of the absolute numbers are
quite high, Twitter being the next one.
Now, this is rough data, so I’ve color coded here in the orange social
networking websites. You can kind of see them as a group. This is interesting,
blog.visual.ly is a designer community and they create infographics. They are
looking for fun kinds of data to help send a message to people. CDC has
actually got some stuff on their website on how to access their data and I
think it’s a great avenue to pursue. I’ve already talked to somebody there
about helping to advertise our websites.
So developers here, so ProgrammableWeb, these are resources for them,
software device, I didn’t go through all but there are catalogs of public
databases so we should be checking who can promote us there. And then of course
there’s academia talking about us.
There is some information about landing page but there’s also this general –
most people will at least look at our home page and then this is the main
dataset tab. And then I think this overall usage shows you that folks are
clicking on every tab at the top.
Down here you can see that 3200 folks were trying to use search. We are
investigating what terms they are searching on to help us I think understand
what are the tags, what are they typing in, how do they want to organize the
data, and we don’t have that information at this moment.
This is interesting, 3000 people go to the starter kit but then less than 10
percent end up going to actually download it or go to SlideShare which I’ll
show in a moment where that information is. So we might consider improving
I think we have to first figure out why they are not clicking, is it because
they are being redirected?
In-page analytics over time will help us understand how users are
navigating our website. At this time, it’s pretty preliminary. These numbers
which you can’t really see are very small, so 33 percent of people here clicked
on the data tab and then all of the others were pretty much one percent or
We can also test out how different displays of information may be more
attractive or engage our audience more.
So then what do visitors do when they leave? Many click through to valuable
datasets. The outbound traffic is around 41,000 events. And a handful of these
top destinations are includable. So a lot of people close the window, about
half of them, but a lot of people go on. I can give you a laundry list of 50
destinations that they’ve gone to.
But I think it’s better to think about what kind of data it is. This was one
way of looking at it. So here’s roster data. I know coming from the private
sector that roster data can be really helpful because we’re trying to look for
targeting and this Meaningful Use accreditation file includes information like
exactly which hospital is using exactly which competitor. So if I were
launching these HIT technologies, I would be using that to help figure out who
else I should go to and who my biggest competitors are.
And then of course you want to know where doctors are, you want to know
where hospitals are so I think that’s what this flat file is, I couldn’t
actually open it.
Community stacks, so a very dirty analysis, quick analysis, I think
suggested that it’s about a 2 to 1 ratio of folks who are coming to the website
to maybe do more public health use versus the one that would be doing more like
a commercial use, trying to do market sizing. Like we used SEER data a lot
to understand where to invest in future oncology products because we wanted to
know who had what. It’s hard to distinguish at this point looking at the
numbers. You just kind of have to know from your personal experience why people
would be accessing it.
But that’s something we’d like to think about over time and I don’t know
what the right mix is. I don’t know if the committee has an opinion or if it
really matters but as far as if you wanted more of the private side to be using
it, perhaps we should be looking at creating more data basically with roster
data or what data would help them in getting out useful technologies, or
helping hospitals perform better.
Are there questions?
MR. SCANLON: Very nice presentation. I am curious about the analytics for
healthcare.gov also, but that’s the health reform website but we’ll do that. I
think the previous workgroup meeting, the thought was that the way we enter
data files or tools on healthdata.gov now that with some additional
meta-tagging approaches, I forgot, there were some other things, that it would
make it more meaningful to the innovation community and others because they
wouldn’t be looking at it necessarily this way, they would be looking at it
through the meta-tagging idea. I’m forgetting, maybe Bill or maybe Josh
mentioned that previously.
So what sounded like, and maybe we could look at that down the road, are
there relatively minor, not minor but some tweaks to healthdata.gov that would
bring us into these other communities and you land there from Google or from
other websites more easily.
DR. ROSENTHAL: This is Google Public Data Explorer and some of those things
and we did some frequency counts showing how people are already using HHS and
some of the AHRQ data in there at pretty significant magnitudes and so one of
the ideas we had floated around was actually not only allowing them to do that
because it’s public – they are already doing it – but actually encouraging them
to do that because the use count was so high. And the Tableau Public did
something with ReadWriteWeb and it was 16 year old with diabetic comorbidities.
So there’s kind of one approach where you build the destination site and you
do – this is fantastic right, this is unbelievable. You do the basic analysis
for the demand side and see what’s going on and that’s great.
And then there’s another approach we heard in the first session, how do we
master engineer multiple sites with different things – and you know what?
That’s really tough to do. It doesn’t work in other verticals and it’s doubtful
it’s going to work in health care and so another approach is build your own
destination site, make it work and then send your data out to the wind and
allow the browsers to reconcile it.
MS. BRADLEY: We will also be looking at the meta-tags. One instinct I have
when I go to a website is if the categories are created such that I don’t find
them very useful, like this had a lot of things, like based on year. I go into
the search box and I try to search it the way I want to think about it and
categorize it and I think when we have that data that will help us. I think
also implementing the e-commerce style of people who looked at this also looked
at this and then we’re going to have a later discussion that Bill Davenhall’s
going to lead on trying to do some rating to help people also navigate the
DR. ROSENTHAL: One way around that taxonomy in the metadata piece, Google is
a browser where you put it out there versus building a site. Then there’s a
discussion about tags and who creates the tags and one of the things we talked
about last time was you can have authoritative tags as a type of tag but then
you can also have user-generated or community tags and that’s one way to accelerate
those dynamics very easily in kind of an 80/20 or all as well.
DR. CARR: Leslie and then Bill.
DR. FRANCIS: You may have answered this and I didn’t understand it but can
you tell the number of times people have actually downloaded the data, for
example, into an Excel spreadsheet the way it was described by Susan earlier
today that people could.
MS. BRADLEY: I am investigating this very question and part of what needs to
happen – so some of these are actually direct links to a file that’s housed on
another website so that first one is on the ONC website but it’s an Excel
sheet so when someone clicks on it, it is an automatic download. Others we have
to actually go back to each website, the owners of them. So I’ll have to get
the analytics from each one of these to tell what they downloaded but we’re
very interested in that information.
MR. DAVENHALL: I have a question that relates to your outbound traffic as to
whether you analyze the .edu’s versus the … is there any analysis?
MS. BRADLEY: I have not done that yet although I think if you really go – do
you have suggestions .edu would be interesting?
MR. DAVENHALL: It’s only helpful in identifying the personas of the people –
MS. BRADLEY: – right, sort of like this but on the backend, right? That
sounds good. I’ll look into that and send it out.
MR. CROWLEY: Nice presentation. Do we have a sense if people are finding
what they are looking for? From this data it’s probably a little difficult but
we might want to think about other ways we can embed some sort of two way
information traffic, maybe not on every search, but did you find what you’re
looking for? Or what were you looking for? What you’re looking for feeds into
some master list and there’s sort of ways the community can say oh, if you’re
looking for this, I know –
PARTICIPANT: So anecdotally I’ve heard it’s hard to find.
DR. CARR: Actually I was a little bit amused because I was also getting hung
up on the same problem Susan had and then I looked at the questions people sent
in and it was like – my thing crashed, I got hung up on it and someone wrote
back, it’s working now. That was not helpful. It’s kind of like very cut and
dry but I think it was an important thing because Susan’s site, and actually
this one too, data.gov, I went to some of those apps and I got stopped. So I
don’t know if it’s the building. But it was interesting because even
understanding that, even like what browsers do you need, if I’m saying this
right, to get into that.
MS. BRADLEY: It is about 40 percent IE, it’s about another 40 percent
Mozilla, and I was surprised to see Safari was down the list there at 14
percent, which maybe just helps me understand that health services researchers
are using IE still. Also maybe the Microsoft folks and I’m coming from a
heavily Bay area perspective at this point.
DR. CARR: Speaking of IE as we’re thinking about who is looking at it
besides .gov, .ie would be also interesting because I think Leo was saying that
yesterday, that China is looking at our health statistics.
MS. BRADLEY: By IE I meant Internet Explorer. I was actually looking to see
where under Google Trends who types in the term health data and Kenya is
actually on the rise.
DR. CARR: This is something that we’ve been looking forward to getting and I
think you put it together in a very valuable way and so valuable that you’ll
now have a lot of other assignments out of this.
DR. SONDIK: On the last slide where you showed the click throughs, that was
right after they left, this is where they went?
MS. BRADLEY: That is correct, they log in Google Analytics as an event and
there’s about 40 percent of people who are just closing their browsers but then
other folks have clicked through or, I don’t know, it’s called an event.
DR. SONDIK: What percentage or can you estimate what percentage of users
actually do that, click to something else that seems meaningful, they don’t go
MS. BRADLEY: Yes, they are clicking links on our website. I believe it’s
around 55 percent of them click through but I can’t tell what they do once they
are there. Do they continue to look around? We actually have to circulate back
to CDC’s website, if you know anyone I should talk to who could tell me about
the Google Analytics at CDC that would be great.
DR. CARR: I think this could be a standing feature because I think that you
have, getting the further drilldown would be very valuable. Let me turn it over
to Mo to kind of give us an overview. Mo and I are both on the Datapalooza
DR. KAUSHAL: Before I jump in, I would just love to hear people’s reactions
on those numbers, so the 40 to 60K range just seems to me really low. So I
think there’s huge room for improvement there.
MR. CROWLEY: What happened in October 2012? That’s where it spiked?
MR. SCANLON: I think it was a blog that brought us the attention there from
a lot of our users.
DR. CARR: Those are the kinds of things that are really the marketing piece
that we need to understand.
DR. KAUSHAL: A great segue into the demand side, which I think is the next
topic of conversation. So first of all I fully agree with the earlier comments
that Datapalooza is just one strategy I think. Demand side is just imminent
which is why we’re focusing time and attention on it but it’s not a panacea by
And to a previous point, it’s where the intermediaries of the users of this
data will be. Credit to Greg and the rest of the team, it’s just become a great
community and ecosystem for pushing out data. It’s not only public health
individuals, there are innovators, there are investors, there are large
companies. So it’s a great place for us to showcase.
So to that end we were thinking of two key messages or ideas for us to push
forward. The first being a little bit more descriptive, just essentially
describing what are we doing, the purpose of our committee and then maybe
highlighting, at a very simple level, the types of data that we’re pushing to
The more powerful piece though is a second area of focus which is ideally to
highlight a company or a piece of software or a value generating application
that has taken this data and created something very meaningful with it.
So to that end, we’re talking to Healthwise to see if they can do something
with it. To be candid, it’s going a little bit slow. And then we have Josh,
who’s already in our committee and he’s actually created something which I
think is very cool and no obvious conflict, I’m not an investor – I think what
he’s created is actually really interesting.
So what Josh is going to do now is first of all tell us the data sources
that he has used. He’s going to demo what he has created and I think the frame
for us is, is this the type of thing we would like to showcase at Datapalooza
in terms of our committee’s work.
DR. GIBBONS: Just one question, the discussion that you reference with
Healthwise, are you asking companies to do things and you’re supporting them or
do they have to fund it on their own nickel?
DR. KAUSHAL: So what we’ve done – we’ve pushed out the data sources to them.
The ideal win-win is a company that’s already done stuff with this data and
then we could just use that as a highlight. Healthwise has come back and said,
look we haven’t really used much of this data, we’ll think about if there are
ways that we can use it. They have not asked for funding and to be candid
again, there is no funding from this side.
DR. COHEN: When you say this data, are we talking about the data that
appears on – all HHS data, these data is a huge –
DR. KAUSHAL: So there has been no bias on our end at this stage to say one
piece of datasets versus the other. The one comment we have made is that it has
to be government data, so ideally something on healthdata.gov but we’re at the
early stages of figuring out who is doing the most interesting things that we
DR. BLEWETT: I have a question before we start, just a little context, did
you give him an assignment? Did you give him a dataset?
DR. KAUSHAL: We were talking about this as an issue and then Josh put his
hand up and said actually I’ve already done something.
DR. ROSENTHAL: Just to clarify like two things. So I did a couple HDI
sessions last year. One of them was on education and one of them – and by the
way, on the seven other things just that we had talked about previously, I
think some people in the room know this but we had talked a lot about synthetic
datasets and CMS released one and it’s fantastic so that’s had markedly good
adoption with entrepreneurs in this space. That was one of the things where we
had the security experts come in and CMS and the fine friends at Norton did it
and it’s wonderful.
So I had done a couple sessions at HDI last year around education let’s call
it basically and so we played some games, we did speed dating and madlibs and
the upshot was how do you take this government data and how do you turn it into
something meaningful for a startup or for a legacy entity in the health care
And so to that event, we brought in 30 people from private equity to venture
capital to accelerator guys like Mo as well as C suites of major payers,
hospital systems, analytic companies and people who had started and founded
These are some folks up here, you can’t see Greg in the hat because he’s off
in the distance, but that’s a CIO of Aetna, that guy there is John Birkmeyer,
who was chair of the Board of American Medical Journal for a while. The guy
over on the right, so on and so forth – and they all had to wear these huge
hats and there’s white hats – so the point of this story is since then one of
the other things we had done in the committee is say, we need to further
And so synthetic file for data access was interesting, furthering education
I have put together an educational packet building on last year’s HDI scenario
and had done that at Harvard with some of the premed folks and Hopkins and
elsewhere and it’s been very well received. And it’s a lot of fun.
It basically tells you how to go through and sort of figure out the health care
market and what data sources there are.
And that’s when we hit some of the websites, that’s the feedback I handed
over today as well as Google and some other things. If anyone in the group
wants access to that, what are the sources that I do in these workshops and how
do you read them and how do you figure out what’s going on in the health care
market, I’m more than happy to make that available.
And so the point of the story is I’ll be doing that again at this year’s
HDI, committee members and workgroup members are more than invited. I can do
that again and pull you in.
And then as part of the committee, Mo had basically said, look, we should
have a session where someone actually uses the data to do something meaningful
and we had approached some folks, Healthwise and others, and with no
compensation why would they do it, of course, it’s a significant investment.
And we’re having some trouble getting some traction on doing that.
And personally my philosophical bias is I really, really don’t believe that
you need teams of developers and years of development to use this HHS data. I
just strongly don’t believe that. That said that’s actually what I do. So let’s
talk about conflict.
So, I was very loath to raise my hand and do that but Mo talked me into it
and so I’m more than happy to show it if you want me to and we’ll put on a fun
find session. When I did it last year I invited different folks to come up and
talk about things and basically I framed up the issue very, very quickly saying
the government’s releasing this data, it’s worked in weather, blah, blah, blah,
here’s how you think about the market, public social good, health care has a
bunch of issues, haha haha haha, build a silly app, so here’s a summary.
So we’re putting out a lot of data, don’t build this, that never really works
and there’s a bunch of perverse issues in this space and we’re going to play
some games and I brought up five companies real quickly, 5 minutes a pop and
said, look at them do a quick presentation, answer what business problem are
they solving, what data are they using and what is the output and to prime the
pump, and so this was Aetna and WellPoint and Advisory Board and so to prime
the pump I said, let me just show you what we did and this is us just kind of
playing around. I don’t even want to be in health care, I want to go back into
gambling and other vice-related industries but nonetheless, here we are.
So we framed this up around a business, we said, hey, how could we help
payers and hospital systems actually grow organically from Medicare Advantage.
How could they improve their star metrics across the entire continuum and how
could they improve their benefit configuration? How could they give more people
more bang for their buck?
And so we used a bunch of data, back then it was just 15 file sets from HHS
and CMS, today it’s over 36 including private stuff, and build benchmarks and
indices all on the fly. You know what you saw today where you couldn’t drill in
and do this. With this thing you actually can. You can drill in down to county
and contract and up and down, against benchmarks from psychographic to clinical
to food access, all on the fly. It’s recalculating that in a dynamic structure.
DR. KAUSHAL: So maybe for the group highlight, a specific problem that a
payer might have that you can help solve.
DR. ROSENTHAL: So how do I improve my Medicare Advantage population? How do
I get more lives? They’re worth $10,000 a piece, per member, per year. They
want more of them. They are making an M and A land grab. How do you do that?
So once you have those lives you have to do something with them because
otherwise bad things will happen. So assuming you get lives, how do you become
a five star plan? How do you move bone density, diabetes, renal failure, eye
exam, how do you move your star scores up? And CMS has done such a wonderful
job across chronic, wellness, customer service and customer satisfaction and
So we basically built that for plans and for providers and we’re up in
operation. This isn’t a startup thing, we have plans and integrated networks
nationally. And I hate kind of showing this because I don’t believe you need
teams of developers spending years doing this but if you want to see something
nifty, that’s sort of how we do it. And how do they price? And how do they do
their benefit and their contracting, et cetera, et cetera.
You do disease management, you do it live, you do it integrated. You do
some community outreach stuff. One little plan we work with sends a bone
density truck into the boroughs in New York and it works like a charm.
DR. CARR: Because I am a business application naïve person, I head up
an ACO, right, so if I wanted to do all those things are you saying I could
build this or I could buy what you have?
DR. ROSENTHAL: The ACOs are fun and fine but it’s more of a population or a
national level so you can look at community. Basically, you’re a payer or
you’re a hospital system or maybe you’re a provider and for the Medicare
Advantage business, this is all we’re looking at right now, so we use CMS and a
bunch of other broader HHS, food access, a bunch of stuff, and some social
Google, Yahoo, Twitter stuff for psychographic stuff and we basically – I’ll
take you into the app – how do I get a bigger population? How do I improve my
metrics, whether it’s customer service or clinical metrics? And then how do I
make money and how do I give people better bang for their buck? How do I make
it a win-win? Or when someone buys a service or a procedure or an MA
population, health plan from me versus someone else, how do I give them
So that’s three very specific things we do, and we only do three things,
mostly for payers, only for Medicare Advantage until K comes online with these
star metrics. How do I get a bigger population – how do I bring more people
from traditional into MA, then once they’re in MA, how do I get a bigger piece
of market share? How do I capture more age-ins on a daily basis?
Then performance, how do I across all 52 of these metrics and Part B, how do
I improve my clinical, so it’s chief medical officers top 10 payers basically,
how do I work my interventions?
I have disease management. I have case management. I have nurses. I’m doing
calls. I’m doing compliance, et cetera, et cetera. And then for the CFOs, how
do I run my benefit config? How do I set up a plan and price and how do I
contract with the hospitals? Just three things.
So real quickly that’s what we built.
DR. GIBBONS: So just one question, on the second question of how do I know
what to do to maximize these? You’re really on one side putting in what we
already do and then trying to say do we do enough of it or do we need to change
the mixability? You’re not really saying, oh, this needs to be done in this
population, it’s not being done.
DR. ROSENTHAL: We do the latter. Let me show you an example. So you don’t
know this Mr. Payer. Maricopa County in Arizona what’s driving five of your 50
star metrics, where you’re getting killed, it’s food access. We get that from
one of the HHS datasets. It’s really food access. So you know what you need to
do? You send out a fricken vegetable truck and drive around and when they do
that, their metrics go up. Or it might be bone density for someone else.
So they have anywhere between 30 and 50 interventions they are currently
running for 10s, if not 100s of millions of dollars. So one question is how do
I reallocate my resources but much more fundamentally is where do I spend them
geographically. And what’s driving it? Is it sentiment? Is it food access? Is
it disparity? Is it clinical care? Is it comorbidities? I’ll show you all of
that. So a bunch of different datasets. What did we build?
Just real quickly so everyone understands, because you asked this. We built
a SaaS platform, meaning it’s software, it sits up there. I don’t use it, not
consultants, other people use it. It works on iPads, works everywhere else but
in this room at HHS.
DR. QUEEN: Do you mean SAS or –
DR. ROSENTHAL: No, not the stat pack, software as a service. That’s a really
good question. Not the statistical analysis tool. And so just real quickly,
how did we go about doing this. This has no IT. This is all public data.
This is everything we’ve taken directly from you guys, or scraped or
reconstructed or done other things, and so literally – I’m going to show you
data that Aetna doesn’t have or I can show you hospital system or anything that
they don’t have that we’ve pulled together and I know they don’t have it.
Let’s not say it’s Aetna because I don’t want to violate anything. Let’s say
it’s a large payer, several large payers and their folks say actually we don’t
have that data, where is it, is it at CMS? And part of it is, I was kind of
talking about taxonomy, it’s not about size and speed, we have to take 18
different files and put it in a metadata structure and print it on the fly to
be able to say your age-ins month to month went up or down basically because that
file doesn’t exist, blah, blah, blah.
So I can’t show it to you live or on the iPad because A, my computer won’t
work on the projector because there’s not HDMI, but even if it did, there’s no
signal in here. The signal on this is completely firewalled. So what I do have,
having done this –
DR. CARR: Could we go out to the parking lot?
DR. ROSENTHAL: We could, we could actually, if you guys want to do that.
DR. SONDIK: When you talked about all those interventions, who has those
interventions, the county?
DR. ROSENTHAL: No, the payer or the plan typically has them or there’s
county or demonstration –
DR. SONDIK: Is it the payer that actually has food trucks?
DR. ROSENTHAL: Oh absolutely, they’re doing salsa – Does anyone have a
computer that works? So I’m going to play a video, if I can actually get it up
and running I’ll show you the thing live. And Greg’s seen this before so he’s
sort of bored with it. He can just doze off, don’t mind him but for the rest of
So what I did having done demos and different dog and pony shows at health
care things to kind of consumer things like South by Southwest, design and
music interaction stuff to comp and techy kind of things, I think I just broke
your computer. One of the lessons of doing that is always if you go into any
hostile place, always have the demo recorded and so I’m going to play a
recording for you.
(Demonstrating a show)
DR. CARR: How do we land the plane? So what would this look like in terms of
Datapalooza while he’s getting that up.
DR. KAUSHAL: We have been offered a slot, correct? Or we have the
DR. CARR: I saw it on one agenda, that we were among the list. –
DR. KAUSHAL: I think that’s the most important thing to do, confirm that we
have a slot. The format would be some verbiage to highlight what we’re doing,
the types of datasets that we are releasing, that have been released and I
think maybe that for a quarter of the time and then the other three quarters,
if technology works, to showcase an application that has used it in a very
DR. FRANCIS: I was just curious about if you use the way Josh is, multiple
datasets, you aren’t actually linking the data but you’re crunching the data
separately, that’s what I’m curious about. So for example, let’s assume that
you have one dataset that is the number of grocery stores in Maricopa County
and in other places and then you have another dataset which may well not be
from healthdata.gov but that gives the number of uninsured in each county so
you can compare Maricopa – but you’re not linking the datasets in any – I want
to know how you’re linking them.
DR. ROSENTHAL: So I am linking them on the fly. So this is actually a
10-year-old kid when we were kicking up our offices and here is the app on a
40-year-old touch screen and he basically went into some of the big payers and
said, how do I improve clinical interventions for Medicare Advantage for these
populations and he touched the biggest bubble and then he touched the green dot
and then he touched the little piggy and it was fantastic so if the 10-year-old
kid can do it, that’s sort of what we designed it for and I’m from Louisville,
Kentucky, so there’s the –
DR. CARR: Will you be having a follow on for 55-year-old kids?
DR. ROSENTHAL: So real quickly, so here’s the question, what problems are we
trying to solve? And again, if you guys don’t want to show this, I’m more than
happy not to do this but since Mo is very persistent and persuasive.
So essentially what you see at the top of it is basically four tabs. So I’ve
pulled up Aetna but I can pull up anybody who I think is interesting and this
is without permission, this is all the public data and other things I put
together and what I’m doing right now is I’m analyzing. I’m not acting on
interventions. I’m not doing any evaluation stuff. I’m just analyzing.
And when you look at it, I can look at any entity’s profile or I can sort of
explore and the analysis, we basically know people don’t know how to do
analysis, so we did it for them. So you can always export it out if you want
So we said, hey, remember how I said there’s three things it does, it helps
you figure out how to grow your business, helps you figure out how to improve
the performance, helps you figure out how to do your benefit config and your
marketing to actually create value and then interact with the providers.
That’s what that analysis thing is … and the big black screen was actually
very tough to implement but now that we have that going everything will work
The fun part about it is in the old disease management or kind of cost
savings world you had to go on site and kind of dig through their claims in the
HR and PHR data and now you can just pull it all up instantly.
DR. FRANCIS: How do you get their claims data?
DR. ROSENTHAL: So it’s not their claims data, we’re actually just looking at
the Medicare performance rates. So we basically are just bypassing claims and
the reason behind that is claims for them is a proxy to move the rate and you
guys have already published the rate.
So here real quickly what I was going to show you is we’re just in analysis,
we’re not actually in any of the action which is kind of like sales for us and
under analysis I can either just look at profiles or benchmarks or I can
And there are three things people want to do. They want to grow, they want
to improve performance and they want to increase their value. And then this is
called grain. So remember how we’re always talking about data grain, how you go
up and down the grains, so this is a parent or I can look at a contract or a
county or a state or an MLR, anything I want to look at, I drag it in there and
then we bake it up for them.
So I’ll get into performance because that’s what the medical folks here know
best but just pretend you’re a business person because you need the lives
before you improve the performance. So we say, hey, there are three sources of
organic growth. You can improve your growth funnel which is MA penetration over
eligible. You can take market share, which is your market share. Or you can
capture age-ins. And so we have to frame up the business context. So the
question from the group is always how do people use this data? You use this data to
do these three things.
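(Illustrative sketch, not part of the demonstration. It assumes the growth funnel reduces to two ratios per county: MA penetration as MA enrollees over eligibles, and market share as one payer's enrollees over all MA enrollees. The class, field names and counts below are hypothetical.)

```python
from dataclasses import dataclass


@dataclass
class CountyEnrollment:
    county: str
    eligibles: int       # people eligible for Medicare in the county (hypothetical)
    ma_enrollees: int    # of those, enrolled in any Medicare Advantage plan
    plan_enrollees: int  # of those, enrolled in this payer's MA plans


def ma_penetration(c: CountyEnrollment) -> float:
    """MA enrollees divided by eligibles; 0.0 when there are no eligibles."""
    return c.ma_enrollees / c.eligibles if c.eligibles else 0.0


def market_share(c: CountyEnrollment) -> float:
    """This payer's MA enrollees divided by all MA enrollees in the county."""
    return c.plan_enrollees / c.ma_enrollees if c.ma_enrollees else 0.0


# Made-up counts, for illustration only.
rows = [
    CountyEnrollment("County A", 100_000, 40_000, 10_000),
    CountyEnrollment("County B", 50_000, 10_000, 6_000),
]

for row in rows:
    print(row.county, round(ma_penetration(row), 3), round(market_share(row), 3))
```

Tracking the two ratios side by side per county, and over time, is one way to see whether growth is coming from the funnel or from share.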
Or it might come from inorganic growth. So this is every single county,
every single contract, every single area in the entire United States or I can
view the growth topology. So this is old Aetna and every one of these bubbles
that will be moving, I’m just going to play, and it works really slick on
Every one of these bubbles is a state and there’s Ohio, so I’m getting a
sense of where they are and as I click on Ohio, now I’m looking at their
counties and I see their growth funnel changing, this is what you call an
inverted funnel meaning they should have a specific marketing strategy around
it, which means they have more people in penetration than market share and
there’s trend over time, et cetera, et cetera. Or I can look at a profile and
so these bubbles are sort of flexing and I can send it to a PowerPoint or I can
actually do something with it. Where are they actually operating?
So everybody’s seen heat maps. These aren’t a big deal but how do you
actually connect it to the chief meaningful business question for these guys,
right. And so I’m going back and forth. Now the kicker is this. Drivers. And I
want to stop here.
So we’re using Google and Yahoo and Twitter for sentiment and sentiment is
psychographic stuff. But we’ve baked up the drivers: chronic performance star,
customer satisfaction performance, provider performance, wellness, et cetera,
et cetera. Then we have our indices we built on top of it, a performance index,
a growth index, a sentiment index and a value index. And then there’s specific
other sets from Dartmouth Atlas or Unwarranted Variation and some of the
hospital readmission type stuff.
DR. SONDIK: What is the sentiment?
DR. ROSENTHAL: So this is our proprietary stuff we put on top of it. So
sentiment predicts very much like a net promoter score if you know the –
So in the olden days we built risk stratification models and that predicted
how likely, really how much someone is likely to cost over a time horizon. This
sort of index predicts do people give a damn about things, to put it in the
vernacular. We scrape their sources and scrape Twitter and we put – and by the
way when we’ve done this in previous lives, we’ve outpredicted RAND with this
sort of stuff so it’s not whogepugee.
Scrape is, so there’s different ways to get data into things like this. One
way is to download a file but that’s really tough and time consuming and it’s
just a pain to do it. If you build the core infrastructure the right way, you
don’t have to do that. If you build a flexible taxonomy, you can just suck
stuff down from different places and restructure it on the fly.
So on the web analysis that you were looking at, we would show up as having
hit the site but we would never show up as having downloaded a file. And so for
the benefit config, for what a plan or any of that stuff, we’re just scraping
And so we do the same thing with Google and Twitter. So I can show you
examples. We go through Google and we basically do a little surveying on the
fly and say show me 50,000 people who are over 65 and show me which ones and we
ask a question, are you in Medicare Advantage, are you not in Medicare
Advantage, what do you think about it? What do you think about these topics
which are directly related to the star metrics as well as the brands and then
we sift through the Twitter feeds, the qualitative type stuff and put it in a
qualitative quantitative matrix.
DR. SONDIK: How do you ask the question?
DR. ROSENTHAL: So Google has some formats. They’re actually pretty good at
doing that sort of stuff so we’re using their P value and predictive values.
DR. SONDIK: How do you get the people?
DR. ROSENTHAL: So Google does a lot of really interesting stuff. This is
some of the original 101 educational stuff I had shared with the committee. So
Google actually has a product, so we scrape Google searches, so every time you
search something, we see that. And when you –
DR. FRANCIS: Do you know it is me searching? Or do you just know that there
is an –
DR. ROSENTHAL: With you, we know it’s your IP address.
DR. FRANCIS: So you know it by IP address.
DR. ROSENTHAL: Although it depends, if it’s a survey, it’s anonymized. If it’s
Twitter we know it’s you with a zip code and some profile attributes like
gender and location.
DR. CARR: I want to make sure that we get the concept here, we make a
decision. What I think we are all responding to is we thought you took some
data and made it show up and it turns out there’s all this knowledge management
that is highly sophisticated that could take hours unto itself to explore and
understand but it is a lot about the innovation so I think that we’re actually
doing three things. One is you could start with that data, number two is you
can do all this knowledge management and number three is you can display
something that is driving change. But this is sophisticated.
DR. ROSENTHAL: So here’s a couple things to know. This is HHS data and CMS
data and even broader .gov data and social data in the market right now
operational at large scale with major payers who are solving three specific
problems around what we have incentivized them or what you have incentivized
them to do with your policy.
So if the question is what does the data group do and do people use it?
Here’s an answer to it. Here’s how they’re using it in these specific ways.
DR. CARR: So on the continuum, thinking where we started today, about
communities want to know how many people smoke, this is now the nth degree of
sophistication. So while you’re saying the putting together isn’t
sophisticated, the transformation of data elements and the repurposing of data
elements is really the very sophisticated piece.
DR. ROSENTHAL: So when you saw the earlier stuff and they said you can’t do
that because you have to restructure it and it is just light years ahead,
that’s what this does. So when I started addressing the committee and saying,
let’s look at Google data explore and some really basic stuff like that and I
personally don’t believe you need teams of developers and large scale 50-year
contracts, but if you want to see what you can do, just give me 30 seconds to
show you a couple other things.
DR. CARR: Go ahead, finish up and let’s hold the questions and let’s finish.
DR. ROSENTHAL: So here I want to see what is actually driving growth of this
product. I can look and see what’s driving performance across clinical metrics
and what do I want to look at? Is it our indices? Is it psychographic indices?
Is it clinical care, unwarranted variation? Is it market type? Is it healthy
behavior? The indicators warehouse is fantastic. That’s how we use that by the
way. Is it physical environment? Is it food access? So we’re actually taking
your stuff and we’re saying what’s driving this sort of stuff and so what’s
really interesting, if you’re a big national entity and maybe you’re private,
maybe you’re public or nonprofit, so I’m clicking on some stuff.
And what’s going to happen right now is I’m going to take different things,
do you want to look at clinical care? Do you want to look at morbidity, blah,
blah, blah, what’s going to happen right now is I’m going to select something
and I’m going to restructure all of their data on the fly instantly. And kind
of make groups against our benchmarks as to where the areas are that are high,
medium and low. So it takes just a minute, partially because it’s like Netflix
when you stream a bunch of data, that’s why I was having trouble on the
computer because I’m streaming a lot of data.
And so here are high, chronic performance counties. This is by your
standards, by the way, and so here’s where they are for Aetna, where they
operate. Now that’s pretty basic stuff. That’s not terribly sophisticated.
Where it really gets interesting, and I can drill into it, so here’s low
Rockingham, Virginia, you know what’s going on there, and by the way, their
growth is worse. So here’s an interesting, positive, non-perverse market
dynamic for these guys for one of the rare occasions. If they actually move the
metric, if they improve the quality of care, like it’s highly correlated with
But now I’m looking at and seeing how does this relate to is it sentiment or
value. So now I’m restructuring the data on the fly very, very quickly and
I’m going to break some groups of high performance, high sentiment, low
performance, high sentiment. Where are people having good chronic metrics and
where do they give a damn? Where are they having poor chronic high sentiment,
high chronic performance?
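(Illustrative sketch, not part of the demonstration. It assumes each county has one performance score and one sentiment score, and that "high" and "low" are split at the median of each; the actual cut points and indices are not described in the session. County names and values are hypothetical.)

```python
from collections import defaultdict
from statistics import median


def quadrant_groups(counties):
    """Group counties into (performance, sentiment) high/low quadrants.

    The median split is an assumption made for this sketch.
    """
    perf_cut = median(c["performance"] for c in counties)
    sent_cut = median(c["sentiment"] for c in counties)
    groups = defaultdict(list)
    for c in counties:
        perf = "high" if c["performance"] >= perf_cut else "low"
        sent = "high" if c["sentiment"] >= sent_cut else "low"
        groups[(perf, sent)].append(c["county"])
    return dict(groups)


# Made-up scores, for illustration only.
counties = [
    {"county": "County A", "performance": 4.5, "sentiment": 0.8},
    {"county": "County B", "performance": 2.0, "sentiment": 0.9},
    {"county": "County C", "performance": 4.0, "sentiment": 0.2},
    {"county": "County D", "performance": 1.5, "sentiment": 0.1},
]

groups = quadrant_groups(counties)
for key, names in sorted(groups.items()):
    print(key, names)
```

The low performance, high sentiment group is the one singled out later in the session as the place to look for drivers such as food access.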
MS. BRADLEY: Could you pretend like I’m a 5-year-old and you’re talking to a
DR. CARR: Here is what I am going to say. This is not possible. But let’s go
through it because we want to conclude where on the continuum do we want to –
DR. ROSENTHAL: Here is from Aetna, but we can pull up the entire US health
system if I feel like it. And what I want to look at is where are the areas where
people actually give a damn about the things you’re trying to get them to give
a damn about, by our sentiment index and where are those same areas that have
high chronic performance, these counties, by your star metrics.
So here, this isn’t the US health system although I could pull it up if I
felt like it, this is just Aetna and so here are the areas where Aetna
operates, these counties and communities where people give a damn about the
stuff you want them to give a damn about, you can drill down into the specifics
and they are actually hitting their star metrics. And here’s the growth of the
product around that which is your market incentive to make Aetna give a damn
and so we can go further and further.
And then there’s other areas where they don’t give a damn and they are doing
well. Well, that’s really interesting. You might want to focus your marketing
campaigns and your public health services very differently. Here’s a really
wacky thing. Here’s where they are doing really poorly and they give a damn.
And guess what, there’s things going on there. There are disparities in food
access. That’s a food truck by the way. The bone density, et cetera, et cetera.
And so this is just on growth. I won’t bore you with too many of the details
but let me take you through just a second.
So age-ins are really important. Every day 10,000 people age into Medicare
Advantage and the way they think about it is if you capture an age-in rather
than having to take them from traditional Medicare inside-out, depending upon
how you do the ROI calculations, they are worth 10 to 1 and 10 to 1 on that and
so you want to keep track of that. This is a file that the market is absolutely
demanding. That’s the first thing they start with.
This is a file that HHS and CMS does not produce. And so we essentially
capture this in their areas. We pull together some census files on a monthly as
well as some of your CMS and health indicators warehouse and you see old Aetna,
they have the year-to-date for the entire US health care system is 4 percent
up. They are 3 percent down. Something bad happened here in these particular
So the point is, this is just all growth. This isn’t the sexy clinical stuff
which I’ll take you into.
DR. CARR: Remember 5 minutes because we’ve got two folks that are leaving us
shortly and we have important things to discuss.
DR. ROSENTHAL: All right 5 minutes. It’s always these challenges when you’re
asked to produce sophisticated stuff and show it to people and do it in a
limited timeframe but I’ll give it my damnedest. So here is growth metrics. If
you’re interested in the US population. Here is the number of total eligibles
versus number of total MA versus MA penetration, versus market but every major
payer flipping across where Aetna operates and so you’ve seen heat maps.
When you go to HDI people make these colorful maps, right. And the question
the committee has wrestled with is what do you do with it and why do I care?
Well actually here is why you care. Here is their best opportunity to improve
their population size by a bit. Compared to their competitors, et cetera, et
So we’re using some pretty sophisticated technology that works better on
touch type stuff but it’s around a very, very specific business metric and I
want to be sensitive to the time so let me drop down into the performance
because I think that’s where most of the docs get much more interested, at
least the chief medical officers are sort of interested in that thing.
So here I want to flip up performance and I can check performance across the
entire continuum of care, whether I want to look at county or hospital or
provider or HRR – so here I want to pull up at the parent level, let me go down
to the metrics.
So these are your star metrics for Part C and Part D. These are 2013. If I click
on a dot it highlights it which not only shows me absolute value in details but
shows me relative distribution and so if I’m breaking down against a trend, and
the kicker is, it does it by the people in your market and this is really
important for your policy, like thinking for whatever it’s worth, I’m able to
toggle back and forth between their market and the reconstructed peer group.
And what that means is let’s say I’m a little tiny integrated network that
really deals in dual eligibles in New York. I’m in New York, I have certain
scores right and when I compare them to the other bigs in New York, like United
and some of the big hospitals I get crushed. But when I compare them to some of
my peer groups, my tiny little people focusing in duals I have very different
And what this allows them to do and what you can allow anyone to do is to
disambiguate between if I have a poor score, is it because of my organization’s
type, is it because I’m tiny, because I have poor people, because I don’t have
good PCP supply or is it because of the market I’m actually serving, meaning is
it a characteristic of the market. So they can literally say why are our scores
the way they are. Plus some distribution, et cetera, et cetera, et cetera.
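(Illustrative sketch, not part of the demonstration. It assumes the toggle between market and peer group amounts to comparing one plan's score with the average of the other plans in the same market, and then with the average of the other plans of the same organization type. Plan names, types and scores are hypothetical.)

```python
from statistics import mean


def gap_vs_benchmark(plan, plans, key):
    """Plan score minus the mean score of other plans sharing plan[key].

    Returns None when no other plan shares that value.
    """
    others = [p["score"] for p in plans if p is not plan and p[key] == plan[key]]
    return plan["score"] - mean(others) if others else None


# Made-up plans and star scores, for illustration only.
plans = [
    {"name": "Tiny Duals Plan", "market": "New York", "type": "small dual-eligible", "score": 3.0},
    {"name": "Big National A", "market": "New York", "type": "large national", "score": 4.5},
    {"name": "Big National B", "market": "New York", "type": "large national", "score": 4.0},
    {"name": "Small Duals X", "market": "Florida", "type": "small dual-eligible", "score": 2.5},
    {"name": "Small Duals Y", "market": "Texas", "type": "small dual-eligible", "score": 3.0},
]

tiny = plans[0]
market_gap = gap_vs_benchmark(tiny, plans, "market")  # versus the bigs in the same market
peer_gap = gap_vs_benchmark(tiny, plans, "type")      # versus similar small duals plans
print(market_gap, peer_gap)
```

A negative gap against the market alongside a positive gap against peers would point to organization type rather than the market served.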
I guess I should show you hospitals.
DR. CARR: I want to say you are brilliant. You are amazing. And we’re
dazzled by this. I would like to open it up to questions now and we do have a
break coming up so we can look at this some more.
DR. FRANCIS: When you get Twitter information, okay, you get the sentiment
information let’s say off of Twitter. Do people know that that’s going to be
fed back to their health insurer on a per contract basis?
DR. ROSENTHAL: Oh, absolutely not. And by the way, that’s all your stuff and
there’s dozens of other things, nor do they know it’s going to your cell phone
to target your advertising and so as we’ve had all of these conversations up
until now, like the committee and the workgroup is kind of talking about data
and security in this way and that way – now notice that it isn’t people,
there’s no PHI in here, just by definition it’s at a county level so I just want
to be really, really clear about that. There’s no PHI, it’s all population
DR. ROSENTHAL: But they don’t know that it’s being linked.
DR. FRANCIS: No, no, no.
DR. CARR: And so where are you going, Leslie, with that?
DR. FRANCIS: Quite clearly if I answer a Twitter question, like did you buy
veggies in – if I’m asked do I like, on Facebook, veggies or not, and that’s
put into his sentiment –
DR. ROSENTHAL: It is not even doing that, you say hey I stopped by Starbucks
and I had some coffee.
DR. FRANCIS: Right, if I don’t know that’s being linked up to –
DR. ROSENTHAL: You do, it’s public so –
DR. FRANCIS: I know it’s being made public but I don’t ever anticipate the
significance of that and, in particular, what I don’t think about is that it
might affect next year’s insurance premium and –
DR. CARR: So you’re right, but is it in scope for this working group on HHS
data, that’s the question. I recognize that it’s an issue but we need to kind
of regroup here.
MR. SCANLON: I think we want to get some closure to the organization with
this. We’ve got several options I think though. I just don’t think we need to
go into it any further.
DR. CARR: Josh, it is dazzling.
DR. MAYS: I just want to agree with Justine, that you’re absolutely
brilliant. Let me just say a little bit about how I think it fits in because
about 2 weeks ago I brought in one of the Obama pollsters. And he was able to
do something very similar. He was teaching us how to use the techniques used in
the social media, data social media and getting people to move to action. So I
think if I can kind – it was Cornell Belcher. So I’m trying to kind of help
translate what he was saying.
MR. SCANLON: I think this came up in the context of applications that we
want to highlight.
DR. KAUSHAL: Let me just put forward a framework. Please push back if it’s
not correct. So we are living in a world of extreme policy inflection points
which are redefining business models and in my opinion for the right outcomes.
We have the data release piece which is, again correct me if I’m wrong, once we
enable this whole new set of ecosystem and tools which help enable payers and
providers to get paid in the new world. I think this is a great example that
highlights that. I think our decision is do we like this enough to showcase it?
I’m again very biased and non-conflicted and I think it’s an amazing tool based
on everything else I’ve scoured.
DR. ROSENTHAL: Can I say one thing before you guys decide? We had wanted to
release this elsewhere at a major tech and consumer conference but like
whatever you decide, I’m totally cool with, but I think it’s very, very
important that we show something that Healthwise can pull it together and show
something that –
DR. COHEN: My strategy for Datapalooza would be two tiered. One this is a
very sophisticated application that people who get it will be blown away with
and, at the same time, if we could do a more simplistic application, let’s say
I’m Nabisco and I want to market low-fat products, show me the public data
sources from the federal government combined with the social media that would
allow me to target my marketing strategy. So this is very sophisticated. People
will love it and then there will be people who might not get it because of its
level of complexity. If there was a way to show the potential range of things
that could be done.
DR. CARR: I agree with you. The part to you Josh that is like we do this,
this, and this and Ed’s asking like what is that? Most people need to
understand that data transformation and I think that hits upon the innovative –
when we talk about this group talking about innovation, that’s where the
innovation is. That anybody who thinks they are going to just download smoking
cessation in Manhasset Long Island is probably not going to go very far and I
think what we really need is to go in slowmo, maybe have Vicky be your
spokesperson but to say if you have a data element, what are the different ways
you can use it.
You can use it in its native state or you can blend it with Twitter or you
can interface it with something else. But I think that the majority, and I
don’t mean to take away from this, but if we were to really educate people, the
important thing would be to say, each element is good in its native state if
you put it with one other, it’s better; if you put it with four others, it’s
If you turn it into an index you can do that. So here’s five ways to
transform data, everybody got that. Now once you’ve transformed here, here’s
five new questions that you were not able to answer before. So I think that’s
it. And I think we could use this but on the other hand Josh has an audience
that are highly sophisticated that will pick this up in a second but I think if
we were to play a role demonstrating how the transformation of data elements,
or juxtaposition or linking or whatever, can bring you to a new place. Ed, were
you going to say something?
DR. SONDIK: There are several things in this that I think are really
relevant. One is the data sources that you actually use, okay, the data sources
that you actually used, that’s extremely important for us. Secondly, the data
sources that did not come from here. And differentiating that. The manipulation
of those, that’s sort of another story. But that’s extremely important.
The third point is that many people sitting around this table including me
look at something like this and we actually start with it from a different
point of view, which Aetna and others may start from but may not, which is how
good is this. So when you say sentiment, that’s my immediate – I immediately
glom on to how do you know that this traffic that you’re manipulating, whether
it means anything at all and how it relates and actually its relationship to
the health care system in total I can buy. What I have a question about is how
it relates to Aetna. Because how you do that, make that leap from what I gather
is generalized traffic that you’re not instituting.
PARTICIPANT: It’s specific to Aetna as well.
DR. SONDIK: How you do that I don’t know but we don’t have to discuss that
here. I only know about asking people questions and –
DR. TANG: How do you know –
DR. ROSENTHAL: Because we’re scraping it with Aetna as a keyword at the
taxonomy and then we actually –
DR. CARR: – but I think the point that’s being made is that we’re moving
from hard and fast. This is the discharge of this hospital to assumptions, sort
of that – crowd sourcing in a way. If this word and Aetna always appears all
the time, this sentiment data –
DR. SONDIK: So that’s what I was getting at. That’s another aspect in
talking about this to this at Datapalooza, a very diverse group, and you were
making that point. That sophisticated people, it may not be that they are so
sophisticated but they start at a different point. I mean people in health are
really focused on truth, you see.
DR. ROSENTHAL: No, no, no. This is just to frame it up very quickly. My
background is PhD Fulbright for Advanced Study Quantifying Qualitative Data, so
I have some academic roots. The point of this was that how do you take
something and how do you create market value? So while the data elements are
interesting and we sort of are happy about that, the real power behind it, so
we’re not asking for a double-blind control study what sentiment is doing, what
we’re actually showing is this group of people who have a high sentiment here
have high growth and they are worth money to you. And guess what, when the
customer satisfaction star score comes out next year, you see that strongly
correlated. And guess what, when you spend marketing dollars on them, they
respond 10 to 1, 100 to 1 against the other pieces.
But specifically what I was trying to do was saying very, very specifically
how do you frame it up for the business is what we’re most proud around, so how
do you take these different data pieces and say hey, you know why you should
care about this HHS data, if you’re trying to incentivize the market, this is
not for researchers. That’s a very different skin and we may or may not have
something like that.
This is for decision makers actually in the market and so what we did is we
transformed the data to specific things. How do I make these market decisions?
So I totally get what you’re saying and it’s not as if we don’t understand
that, it’s built for something completely different.
DR. CARR: I have questions from Lyn, Larry and Chris.
DR. BLEWETT: I think it is a good idea to do your two-stage, kind of simpler
and I would encourage whoever is making this decision to do a public health
application as opposed to a business application, like obesity or smoking or –
DR. CARR: I think what we have – I mean Josh has an application – but we
could deconstruct this and do PowerPoints without doing –
DR. ROSENTHAL: An interesting thing about why we’re even in health care
because now the payers actually care about that – so now all of a sudden Aetna
is making rebate from you guys to actually improve that obesity metric, right.
So they are using this data as outsourced public health, if you will.
DR. CARR: But you’re right, the same obesity data has three different
customers or 10 different customers so where you take it or who you are
wouldn’t matter but okay, so that it –
DR. GREEN: I want to go back and frame a question around what you just said
a moment ago Justine about five and five. Lying beneath this is a set of
ordering principles. And ordering principles is the operating definition of a
classification. We care a lot about classification.
What is not clear to me, I’m just not fast enough to keep up with it, are
the ordering principles of the data underlying this. And I think for most
individuals for it to be believable and perhaps to be understandable, the
presentation might want to lay out classic Justine Carr diagram of some sort
about the ordering principles of the data. One of the ordering principles is
what’s the source? But there are some others that are implied. If those can be
DR. ROSENTHAL: Just also so you know, because it was really intentional. So
this is CMS star taxonomy. So everything we lift the taxonomy is straight from
CMS and then we build second order stuff which we don’t share with anybody
because that’s our IP and in the business world that’s really important. So if
you want the raw CMS data in a usable format and if you want to use the
taxonomy, here it is, you’re looking at it. The little sentiment thing, we
don’t tell you how we do that.
DR. GREEN: Another ordering principle that we went by was public, private,
government, enterprise – that map that leads to this demonstration would be
DR. CARR: I agree. We are really learning a lot Josh, even if we’re not
learning as fast as you do.
DR. ROSENTHAL: So this is government performance data, government provider
data, government socio-demographic data, web consumer and then we have details
DR. SONDIK: I think that kind of outline is extremely important.
DR. CARR: So I want to hear from Chris and then Vicky and then Lilly and
then Mo and I need to have time to hear from Bill, who leaves at 3:00 and
you’re leaving in 2 minutes. So let’s hear from you because you’re leaving in 2
minutes, and we can continue this.
DR. KAUSHAL: So I don’t know if we’re going to reach a decision today but I
need some guidance from the committee. I could find an app which shows where
food deserts are, there’s a dime a dozen of those. And no commercial use
whatsoever, no adoption whatsoever, interesting interface based on the data
that’s released. I’m happy to show that – unless you think that’s a waste
because we’ve done that for 3 years and Greg’s in the back and he can give us
more info on his background and what he thinks or if you want to show something
which is out in the market and creating revenue and value based on policy
changes to the data, this is one example, and again one of the best examples
I’ve seen and that’s where my bias is but I’m open to what the group wants to
DR. CARR: And I think it is the middleware that we don’t want to leave out.
I think this is brilliant and Josh could have a session that would have people
who understood all the underlying assumptions but thinking about our role as a
working group, I think we could add value if we could take people from,
remember last year when we showed you here’s where the food is, that’s one
thing you can do and then maybe incrementally say but look what else you can do
with hard data so you overlay two things. Now you take three things, something
from HHS, and now you take this sentiment data that is squishy, that is not the
one for one kind of stuff, but that adds value and then we build up to what
Josh has and Josh you probably would want to – I think you already have a
session to demo this?
DR. ROSENTHAL: Just to be clear, I wasn’t going to show, we’re in stealth, I
haven’t shown this to anybody.
DR. CARR: It is a different crowd that would be learning this but I think it
is the step, as we think about the gap of demand and supply, it’s a big insight
to the fact that you need to have some knowledge management and some
classification and some incrementalism to get to a place like that. That is the
future state and I think Mo is right. We’re trying to talk about where can this
take us, that’s where we want to go.
DR. ROSENTHAL: It is sort of ahead of the market. You heard some of the
other people that we had on that said we don’t know how to use this HHS data
and even some other people we’re talking to – we don’t know what to do with it
and blah, blah, blah, so it’s a little bit ahead of the market.
MR. SCANLON: Do we think we’d have enough time at our session – well, the
first thing is you want to introduce what the workgroup is doing. Secondly, you
want to show, say, that as examples of third generation users of HHS data, here’s
an example. If there’s a public health application of lesser complexity, we
probably want to do that too if we have time. But I think we want to then hear
from, if there’s time at session, we’d like a two-way sort of what do people
think we should be doing.
DR. CARR: So that’s the general direction. We will spend, we still have
several hours to go, and thank you for getting that started, don’t go away. But
what I want to do now is change direction a little bit and it’s actually not
about the Datapalooza and we’ll come back to the Datapalooza because we have
more to talk about, Ed has a session as well. Bill, can you take us through the
health grading system. So this is getting back into the basics of there’s a lot
of datasets out there and there are a lot of unsophisticated people who are
trying to do simple tasks and it’s hard to do that. So, Bill.
MR. DAVENHALL: So maybe the segue here is, I represent the unsophisticated
people. And I would say, listening to Josh, I would say data is on a continuum.
It goes from data that people can be mildly curious to the kind of data that
you’re going to serve a warrant on and do search and rescue.
And all these people are consumers of the data that you folks use and we’re
talking about here every day. So you have to be really inclusive when you think
about what’s the underlying data, like Larry said. I can tell you from a file
that I’m going to talk about here later that if I told you there’s only 33,000
residential zip codes in the United States where people live and yet Medicare
puts out a file that has 43,000 zip codes, what’s running through your mind
right away. Whoops, what’s going on here? We don’t know what’s going on here.
I want to say that in regard to the sophisticated use of the data, it’s no
better than that minuscule little piece of data that you started off with. So a
lesson on geography 101, is zip codes change every day. They are not
historically consistent so if you compare zip codes over two periods of time,
you immediately have a methodological issue.
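[Editorial illustration, not part of the spoken record: a minimal Python sketch of the zip code comparison problem described above. It assumes two lists of zip codes taken from two releases of the same file; the function name and the sample zip codes are made up for illustration and do not come from any HHS dataset.]

```python
def compare_zip_sets(earlier, later):
    """Split zip codes into those dropped, added, or present in both periods."""
    earlier, later = set(earlier), set(later)
    return {
        "dropped": sorted(earlier - later),
        "added": sorted(later - earlier),
        "stable": sorted(earlier & later),
    }


# Made-up zip code lists for two release years of the same file.
period_1 = ["20201", "22030", "44301"]
period_2 = ["20201", "22030", "44399"]

changes = compare_zip_sets(period_1, period_2)
print(changes)
```

[Only the "stable" codes can be compared across the two periods without further work; the dropped and added codes are where the methodological issue arises.]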
Census geography is the only geography that is historically consistent. So
you start to take a look at the data that you produce and love and like and say
if it can’t be put into standard geographical units that the Census Bureau does,
then everything else related to that, census data, population data, is kind of
like, well, you know, it’s a good guess.
So having said that is what I focused on was identifying organizations that
use some of the files that everybody likes, MedPAR files, hospital service area
files, National Center for Health Statistics files. And what I discovered were
like 13 companies, they are organizations, of which I know about because they
tend to focus on displaying their data in very precise geographical manner.
So I went to these people and I said what kind of data do you use? Where do
you get this data? And identify the dataset and would you be willing to comment
at some future time about its quality, its usefulness and so forth.
Well, they all were excited about this opportunity because they said they
basically – there was never a forum for this. Many of these organizations spend
hundreds of thousands, if not millions of dollars cleaning up data in order to put into
their products or services whether they are free or not, and if they’re not
charging for the service, somebody’s given them a grant in which a large
portion of their dollars are being consumed by cleaning up this data.
There’s a little matrix we put together and we’re looking for nominations of
people that you know who have touched DHHS data and can allow us to go talk to
these people and find out exactly what they are doing with the data and also
then recruit them to be willing to talk about their experience with data.
Some of them probably won’t talk about it because they have created
proprietary cleanup procedures which add value to their data and therefore,
they are not going to explain to you guys how to clean it up because they are
making profit off it. So that’s another issue.
So we’ve started that as a workgroup and we’re just going to keep adding
that to the list so that we have a group of people that actually can come here
and tell us they’ve got their hands dirty with our data and can speak to it.
The other thing is that – Susan provided that list about 90 data items and
it’s paralyzing, it’s overwhelming and when you take a look at the scope of it
you realize that most of it deals with data that has to go through the privacy
board and I would say at some point we’ve got to decide that there’s bodies of
data which everybody has to use and wants to use and then there’s a lot of data
that I don’t think anybody really cares about, it’s either too old, it was done
for a specific piece of geography, and even though we’ve inventoried it, it
doesn’t – back to the prioritization, it doesn’t warrant our common effort to
really spend a lot of time with it because it’s not leverageable. And I would
recommend to this committee that we only look at data that’s leverageable on a
national level and I mean nationwide but maybe at local levels.
But let’s not start working with a dataset and begin to show it off that
only is relevant to Akron, Ohio. It would be kind of wasteful in my opinion.
So in doing that we created this idea of the data grade. So everybody’s into
grading everything from hotels to – how many people own more than one
universal remote? So there’s many ways to skin this cat. I largely work as a
company – we work with people who are trying to develop applications regardless
of the data, forget the data. They are people who love data and they just grab
data and work with it. So when you get in the health space, you have a number
of datasets that people have chronic conditions about. About its quality and
And so one of the ones I picked was this hospital service area file to show
you an example because I have personal knowledge of it and I have developers
who are working with this file.
And so it’s not meant to be critical of the agencies that are producing it
but hopefully instructive of what I mean when I say you got to make your data
accessible and then you have to make sure people understand how to use it
appropriately because if you have people starting to use the data for purposes
for which the data was never intended in the first place, I think Ed would
agree with this hopefully, you’re headed into some shaky ground.
So part of it is to say is there a way that we could say what makes a
dataset five star. And you know from my opinion, if you can’t immediately map a
dataset, it would never get a five star but that’s kind of self-serving. But
I’m saying if you take a look at it, many people call us and say, can you point
us to all the data that’s available at the zip code level. Well, that’s not an
easy task to do to go find that in the system we’ve got today.
So this idea was how could we take some of the more popular files that
people use like the MedPAR file or maybe – well, here’s a good example, try
this yourself at home. Try to find the master hospital database for the United
States. I’m not talking about Medicare hospitals. I’m talking about all
hospitals. Because you know there are some hospitals that don’t take Medicare.
So where is that master list?
And secondly, who should be the organizations that talk about its quality.
Should it be the various trade associations that manage hospitals? Should it be
CMS? Should it be HHS? Should it be Homeland Security? Should it be the Census
Bureau? Should it be the Department of Defense? And I can tell you, all the
ones I named have databases on hospitals, and they’re all different.
So you could start there and begin to say, well, what if we did this to that
file and said, what are the files that are out there and do they meet this grading
system. So you can see one of the things we need to argue about is the criteria
for rating, what’s appropriate and what’s not appropriate and you see what
we’ve put there. Cost, frequency of agency update, downloadable and machine
format, geographical level of detail, and whether it has national coverage and
scope. There could be more, there could be less or some of those could be
The whole idea was to be more intentional about the datasets that we need to
work on that we know will have the greatest utility and versatility across this
spectrum. As a governmental agency, I don’t know why you’d want Josh to be
making a million bucks because he knows how to clean up your dirty data.
I’m saying with a little bit of extra work, and I would say not a lot of
hard work but smart work, you could say, well, let’s tackle this problem, let’s
find out why there’s x-thousands of missing data holes in this database before
we send it out. Or if we send it out that way, tell people ahead of time that
this is one of the limitations they are going to have with this file. So the
idea was – I’m sorry, you had a question.
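[Editorial illustration, not part of the spoken record: a minimal Python sketch of a five-criterion grade, using the criteria named above (cost, frequency of agency update, downloadable machine format, geographical level of detail, national coverage). The one-star-per-criterion scoring, the thresholds behind each yes/no field, and the example file are assumptions made for illustration, not the workgroup’s agreed method.]

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    """Yes/no answers for each of the five proposed grading criteria."""
    name: str
    is_free: bool                      # cost (assumed: star only if free)
    updated_at_least_annually: bool    # frequency of agency update (assumed threshold)
    machine_readable_download: bool    # downloadable and machine format
    has_zip_level_geography: bool      # geographical level of detail (assumed threshold)
    national_coverage: bool            # national coverage and scope


def star_grade(profile):
    """Return 0 to 5 stars, one star per criterion met (assumed scoring rule)."""
    criteria = [
        profile.is_free,
        profile.updated_at_least_annually,
        profile.machine_readable_download,
        profile.has_zip_level_geography,
        profile.national_coverage,
    ]
    return sum(criteria)


# Hypothetical file: meets every criterion except cost.
example = DatasetProfile(
    name="Hypothetical service area file",
    is_free=False,
    updated_at_least_annually=True,
    machine_readable_download=True,
    has_zip_level_geography=True,
    national_coverage=True,
)

stars = star_grade(example)
print(f"{example.name}: {stars} of 5 stars")
```

[A yes/no rule like this cannot tell a $5 file from a $50,000 file, and it says nothing about content quality; both would need separate criteria.]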
DR. SONDIK: The five points there – I don’t see anything about quality.
MR. DAVENHALL: Well, it does have about data – it’s not talking about
content quality now. It’s talking about the physical quality of the data.
DR. SONDIK: I know, I’m talking about the content. Because you give the
example of the hospital files, there are aspects of it of completeness but
there’s also aspects of the actual quality of the data about how precise it
is, whether it is right, even when it says it’s updated annually or more
frequently, how do we know that that’s well done?
MR. DAVENHALL: We actually don’t but if you go to the next little page there
where it actually picks the file it has places, imagine this being a webpage,
you would have a page set up and you have developers who touch this data that
may comment about this data.
You have end users who’ve tried to use it and have run into problems and so
I try to give a couple examples there because I’m the one that touched this and
I said, well, it’s got some problems, some limitations.
It’s meant to be how could we put this all in one place because I think part
of the problem of the stickiness, that’s a technical term in web language, how
long do you stay on the page before you move off it. Well, I would say some of
those people will stay 5 minutes, 50 percent are leaving.
It’s probably because they either concluded that that wasn’t the right piece
of data for them or they were no longer interested. And yes, some of it didn’t
work. So part of that quality issue has to be contextual. You have to have
people who are willing to tell you what they found and what they discovered.
There could be more stringent requirements. I mean you can make ones on the
accuracy of geographic, in other words, how many zip codes are in here that
there’s no data. So why do have – so that was the whole idea. How could we
grapple with this because if you put grades on this then you’re going to
encourage its use, increase the frequency it’s going to be used.
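[Editorial illustration, not part of the spoken record: a minimal Python sketch of the stricter check mentioned above, counting how many zip codes in a file carry no data. It assumes the file has been read into a mapping from zip code to a value, with None standing for a missing value; the function name and the sample figures are made up.]

```python
def zips_without_data(values_by_zip):
    """Return the zip codes whose value is missing (None), in sorted order."""
    return sorted(z for z, v in values_by_zip.items() if v is None)


# Made-up sample: zip code -> count, with None marking a missing value.
sample = {"20201": 120, "22030": None, "44301": 45, "44399": None}

missing = zips_without_data(sample)
share_missing = len(missing) / len(sample)
print(f"{len(missing)} of {len(sample)} zip codes have no data")
```

[A count like this could be published alongside the file as one of its stated limitations.]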
DR. CARR: How you grade it depends on what your need is actually. Another
way to do it would be if you only need aggregate directionally dadada, if you
need meticulous specific zip code, you need these. And so I think there’s
different ways to do it.
MR. DAVENHALL: If I told you that I know that there are people using your
data to demarket needed health care services, what do you think? They are using
your data for that, aren’t they Josh? So you really can’t even sort of guess as
to how people are going to use this data but you can make qualities about it,
statements of its quality of technical robustness for any purpose and it’s that
end user who has to determine whether that’s appropriate.
DR. CARR: Leslie has a comment and I want to stay to our time table which we
are a minute over. We said we’d take a break now. But Leslie, let’s hear from you
and then I want to just frame up how we’re going to spend the rest of the
afternoon after our break.
DR. FRANCIS: Just a simple question about five star rating system? I’m
assuming that your methodology is for each of these five you get one star. So a
$5 dataset and a $50,000 dataset each get no star on the free.
MR. DAVENHALL: No it actually says that you only get a star for cost if it’s
DR. FRANCIS: But it doesn’t distinguish between a $5 and $10,000 or $50,000
and my other little question about it is sometimes with a star rating system
you care – a comparison between a two star and a four star makes sense because
you think all four of the stars are pointing in the same direction and I’m not
at all clear that this is comparing apples –
DR. CARR: So I think we’ve taken it a little too granular. I think the
concept is what we like. That there are datasets and datasets and they may have
the same title but their data completeness, integrity, value, privacy, costs
vary and that’s a tremendous concept. How we define value, whether it’s
constant or whether it’s according to the use remains to be seen but I think
that Bill has put forward a valuable contribution that I’d like to talk about
DR. ROSENTHAL: If I could just say, if you do seven things to the site, some
are easier and less difficult versus authoritative tags that will be the
context stuff you’re talking about, the second is user generated tags, what are
using directional or aggregate, that’s easy, that’s free, that doesn’t take any
time. And like-like, if you like this, you like that, that’s a basic similarity
Your rank authoritative is what you’re doing regardless of the grain, that’s
helpful. Having users rank them, just one to five, free, easy, quick to
implement and then finally user comments, if you do those seven things, some
easier than others, you’re up and running and can answer –
DR. CARR: You know how often you get something and on the front it’s in
English and on the back it’s in Spanish or some other language. I think that’s
what we’re going to do. We’ll have Bill on the front and Josh on the back and
depending on who you are just refer to that –
MS. BRADLEY: Can I also explain that the sentiment data is actually
observable data. It’s observational data and so it’s similar to claims data
analysis. You’re observing people’s behaviors and what they’ve done on the
Internet and so it’s not totally fictitious.
DR. CARR: Okay, this is very exciting. I want to adhere to our break. Let’s
take a 14-minute break and then we’ll come back. I want to hear from Leah and
Chris about demand side and I want to leave a good amount of time to come back
and integrate what we’ve heard today and where we go from here and then what it
would look like at the Datapalooza.
DR. CARR: We want to keep moving along. When Larry suggested for Ed’s
presentation that Linda, myself and Ed together, but I know that Ed is looking
for some state person who uses data. I was trying to think who that might be.
MS. BRADLEY: I had a question for the members about whether or not you have
ever clicked one of those buttons that said, did this answer your question, if
you are web searching. If we added that, do any of you? No?
DR. COHEN: I only do it when it doesn’t answer my question. If it does, I
MS. BRADLEY: I give a lot of feedback on websites. I actually do click those
buttons, but I just wonder. That is the best way to get feedback for improving.
DR. COHEN: Every time I do a web transaction, I automatically get a survey.
Actually, I have considered not doing certain transactions because I don’t want
that other survey.
MS. BRADLEY: Anything on the website, there are restrictions.
DR. CARR: Okay, take it away.
DR. VAUGHAN: Good afternoon, everybody. I am going to talk a little bit
about the slight charge to talk about the universe of data. That seemed a bit
large, and so the way I approached it was to take Justine’s questions about,
where is the demand for data, what data is useful to whom, what is the problem
we are trying to solve, and how can HHS’s data be a part of those solutions, to
address those needs. To do a broad survey and to talk with some of the users,
and to ask them about what they are doing, to give you a few examples of how
people are actually using it to solve their organizations’ issues, needs and
How can an open health data ecosystem empower many communities of practice
and interest, support the adoption of these interoperable standards, link to
open access initiatives, which has been put forward by the White House and
longstanding organizations, including Public Library of Science, and to strengthen
the convergence of the process and goals.
Who are the users? This is a very preliminary survey. There was not the kind
of hard we would usually like to use or put forward. Talking with users was
very helpful, as kind of a serial focus group, and engaging innovation partners
in all sectors. Not just federal users, not just app creators, but local
government, non-profits and folks like that.
I knew we were going to have a presentation about HealthData.gov, and there
weren’t a lot of numbers available directly through the site. I decided to
approach it by taking a quick peek at Data.gov, where there were some metrics
available. I hope you can see some of this.
It was pretty interesting. This was the monthly visitor statistics. There
were more than a million and a half, from February through January of 2013. You
can see, starting in February, running out through January, the cyclic nature
of that. Those millions of visitors, that is how many pages they looked at. In
each of those instances, they were clicking through. I couldn’t give you more
of a breakout than that. This was in Google Analytics, so it should be
Just taking a peek, you can also see the weekend piece, and how it cycles
through from New Years, this is just January. It gives you a little bit of
idea, kind of bouncing around the same range of numbers. No big surges, no big
What states are visiting? California is number one, followed by Virginia,
which is kind of interesting, then New York, then the District of Columbia,
Texas, Florida, Maryland, go Maryland, Illinois I think is in the house,
Pennsylvania and Massachusetts.
DR. COHEN: We need to adjust this by the number of Federal employees.
DR. VAUGHAN: No, not part of this analysis. One of the things that was really
interesting, though, and I just thought I would mention it, is they had the top
ten visiting countries. This is just for January. India, China, Canada, the UK,
Japan, South Korea, Spain, Germany, France and Taiwan, and I thought those
first two were really interesting, and prompted some other questions. I don’t
know that they are necessarily the users we were going for.
DR. CARR: Are you open to taking questions?
DR. BLEWETT: Is this the same as what was presented earlier?
MS. BRADLEY: All Federal data, HealthData.gov, yes, which used to be on the
same site, but moved off and they are about to relaunch, and AlphaData.gov.
DR. CARR: Does this include or exclude Health Data?
DR. VAUGHAN: It includes it. It still includes it, this is last year. The
links, though, or the tags are incomplete, so it is not capturing both. That is
an easy fix that could be leveraging what we are doing. If somebody just
happens to land on Data.gov as a first place, by however means they searched,
and they say, I want to see HealthData, it is not necessarily going to pull
them through. That is a pretty straightforward tech fix.
Speaking of which, this is the health section out of AlphaData.gov, which is
a secondary site that was pulled together by especially one of the Presidential
Innovation Fellows out of the first class. He has pulled out a few of the
examples, that are especially high-value data sets, and where, in some cases,
there have been some products created. That is a growing site, it is a growing
landing page. It has got a little bit different feel. It will be interesting to
see how it unfolds.
MS. BRADLEY: If you have feedback, they are still piloting. We can get that
feedback to the right people. If you would rather something else pop up, for
instance, we could change that.
DR. VAUGHAN: I was trying to get just a sense of the thing. What many local
governments and some of the federal sites use is Socrata. It is a way of
organizing data sets and publishing, depending on your subscription, different
components of it. Their top 10 data sets for the last 12 months are crime data,
restaurant inspections, permits, 311 data, which is calling for information or
reporting, zoning, property taxes, public facilities, business, financial
transparency and performance. There are a couple of health or health-related
items in that, but I thought that was interesting. That is pulling across all
types of governments.
The showcase, Challenge.gov, we are going to use challenges to help
encourage people to use the data. When I originally went to the site to look,
because I have not only used the site, but have been fortunate enough to be
honored in a couple of instances, it was with the expectation of seeing now, more
than a year out, what apps were still there, were they still even working.
I found something a little bit different. This is Challenge.gov. It is the
shared space for public challenges, inviting people in to create applications,
videos, a number of different products around Federal data questions and
issues. What I found was a couple of interesting things.
If you go to Challenge.Data.gov and you look for health, you find that there
were 88 challenges for health. Of those, 57 originated with Health and Human
Services. The top challenge, it is just barely on there, was actually a video
challenge. That had a popularity of 5,600 followers, as it is called, a modest
prize. It was about bullying, and the respondents, it is quite moving.
I looked to see how much data it was linking to in each instance of these
favorite ones. For the overall challenges, 39, Let’s Move, 5,600 in the case of
stop bullying. There weren’t necessarily data linkages, and that seemed like an
incredibly missed opportunity. If we are saying, develop an application, our
data is important, but it is not important enough for us to say, this is the
data. Then, I think that may be sending out the message that we intend for it.
Too often, in every instance, they also weren’t linking to really rich
resources that have been developed within CDC and other parts of HHS, which
could inform and carry people through, just if they found interest and weren’t
going to do an application. In the case of stop bullying, it did link to the
bullying site, which is super well-done.
Does it make a difference how big the prize is? Well, none of the top three
highest prizes were among the favorites. I thought that was really interesting.
Half a million dollars and that wasn’t one of the favorites. I am sure it was a
favorite for the person who won the half million dollars. It is not a direct
correlation, so there are other factors about engaging people and bringing them
in that are, I think, important to start to begin to understand.
One of the other ways that the data gets deployed are we call codeathons or
DR. SONDIK: Challenges that are more specific, at least in an area, a topic,
are more successful than challenges in general that, say, use the data. We
really value great imagination and so forth. Cancer focuses the mind, and
there’s lots and lots of data sources that are related to cancer. You know what
I am asking. Is there a way of focusing these challenges, that can make them
more successful, for some more than others?
DR. VAUGHAN: I think so. I wouldn’t want to draw the conclusion because I
have not done a complete analysis. This is just looking in a survey kind of
way. Certainly, it seems to be the case. It also was the case that a number of
the challenges were specifically engaging social media. They would have
webinars, there were YouTube videos, there are links to Twitter.
They are missed opportunities, easy to do, easy to drop in, to invite people
in, and then, carry them through, again, to these more fully developed
programs. February is Go Red for Women and heart health, then why wouldn’t you
leverage that? Bill has talked about some of those types of things. Why
wouldn’t you leverage that to say, we are going to do our healthy heart
challenge? We have got these programs, we have this data, and then we have this
follow-through, and to kind of expand and use what we have already, into better
With the codeathons and hackathons, they are all different levels. I thought
I would talk about how some of those folks are using Federal data, or at least
as part of their solution. Recently, here this last weekend, was the
International Open Data Hackathon. It was people from all around the world,
using open data in their countries, including the US.
In San Francisco, I thought I would talk about a group of civic innovators,
called Code for America. They are folks that work at a modest salary, for a
period of a year, in different cities around the US. They are participating,
calling it Code Across America. In San Francisco, they partner with the human
service agency, so-called.
They have just started in the last six weeks or so. They are wonderful. They
showed up at my taskforce, so of course I think they are wonderful. They have
been very careful and thoughtful about sitting down and talking with people,
who are actually frontline folks, about this is what we are here for. What are
your problems, what are your issues, what are your concerns?
This is at Code for America. You can see, basically all of these people are
heads or deputy heads of public health agencies. This lady, with the beautiful
orange scarf, is the woman who is in charge of implementing health reform in
San Francisco. Her name is Tangerine, like her scarf, and she is awesome.
There is Healthy San Francisco, which Mayor Newsom was a real leader in.
That has gotten us out of the business, with a modest charge, of having people
have to choose between their blood pressure medicine and food. It is not really
comprehensive. It is this piece, which is a huge and important prevention
piece. This next roll-out is really interesting. Tangerine and her team are
very thoughtful about how they are approaching it.
This is California’s portal, to covering California families. It has a
countdown of days, hours and minutes until health reform is implemented. They
have got some information at the bottom of what is it going to cost you. That
is all on the website.
What her pitch was, what she stood up to ask developers, designers,
community people who are all in attendance at this thing, to help her with, was
how, given all these languages, Arabic, Armenian, Russian and more, how do we
approach people at community meetings, on the street, wherever we are going to
try and get them to sign up for it, how do we create a form that is
HIPAA-compliant and gets them in the system, because we are missing a lot of
That was a bit of a theme that I heard throughout, where I can’t imagine
that counties all around the country aren’t having that same issue or concern.
Maybe there is a broader opportunity for implementation of practice, make it
easy, engage these amazing front-line folks, to get their ideas of what is
going to work and not work.
That was her pitch, and she got some good ideas. She didn’t win, but she got
some great ideas and it was fun. It really links back to this amazing document
that Vickie sent out and that I commend to you, talking about the digital
divide. These data sets, combined with others, are an opportunity to
start addressing some of those issues.
Larry has talked about the public health workforce, and can we leverage
these data sets, combined with the state and local, to mitigate some of these
shortages. The non-profits are also looking at some of these issues. Pew
Internet has got tons of interesting open data sets, but they are tracking who
is actually using this stuff and how, which is kind of beyond the scope of
this. Again, I commend their site to you for further consideration.
The mobile economy is growing, but it is also stagnant. One of the other
challenges in communities that have not been necessarily considered right out
of the gate was a parallel event that was going on at Stanford, which right in
the heart of Silicon Valley, has all sorts of innovators, including an
innovator start-up space for undergraduates, which is really quite stunning.
They reached out in this instance to the medical school, in the medical
school community. There were some engineers and designers and developers there,
but the people who were participating were some undergraduates, but mostly
medical students, residents and physicians. They came with their ideas for how
they wanted to use data, to answer their questions and issues and concerns.
There were 56 people who pitched different products for a weekend.
There is a lot of energy there, that we are missing. One of the things that
I loved about this, though, it was one of the few instances where HHS reached
out to them. There was one presenter, but using social media. Here are our data
sets, here are, and they tagged it to Twitter. If you have any questions, ask
us on Twitter and we will get back to you.
People mostly didn’t know what to make of it, but they started talking about
it, and I thought that was really important. Some people use the data
specifically because they were invited to. I think it is important to
understand that a lot of people don’t feel like they have been invited. This is
a little bit of a close-up of the data sets that use a couple of the API
directories and the invitation on Twitter.
One of the other open data things was called Eco Hack. That has been held in
multiple cities. This was San Francisco for the International Open Data Day. It
had a number of really great products and ideas pitched, including by graduate
students from UC Berkeley, in the School of Public Health.
There has been an ongoing project, to try to understand the impact of
environmental pollution on cancer disparities, in particular. The air monitoring
system from EPA has been part of that analysis on an ongoing basis. It is a
little horse that is important. It is part of the conversation, along with
weather data and all these other things.
What the researchers at Cal set up was BEACON, which is local monitoring
for urban air quality and climate change. They partnered with other great folks
at Chabot Observatory, the Oakland high schools, and the Exploratorium, which
is a science kids museum, and for grownup kids, too. It is a great place.
They had specific goals. They wanted to meet more people who are interested
in the same issues, to start charting maps on their data, and to just
brainstorm, and they did that. The yellow dots are where they have all of their
sensors. What they created was a little bit of what you can see at the bottom,
is a heat map of those sensors and some of the real-time readings that they
were pulling down from the high schools, and the other sites of operation.
Philanthropy and non-profits are also areas for partnerships and engagement.
Philanthropy kind of gets smushed over a little bit. Last year, there were
about 10 billion in grants, almost 20,000 grant makers and recipients, and
represent overall 11 percent of the economy, in terms of employment, so,
This map is a little bit tough to read. Many foundations, private
philanthropy, work with the Foundation Center to publish some portion of their
data in a consistent way. What they have started doing through philanthropy
insight, and an initiative for transparency called Glass Pockets, is to start
publishing more of that data in multiple formats.
All these little pins, green pins are grant makers and the brown pins are
recipients. The cross-hatching is overlaid over the percentage of the
employment sector, which I apologize is hard to read in this slide, but
Maybe some examples of who has brought some of these loose ends together. At
the Health Datapalooza a few years ago, there was a little booth called
Patients Like Me. They are a patient engagement empowerment. Within that
ecosystem, patients can choose to publish and share their own information, as
much or as little as they care to, and to track that. They use
ClinicalTrials.gov as part of their core stream in important ways. They kind of
rework it, so it is more usable and more accessible.
They are pulling down, this is clinical trials, so they transformed from
that and the developer portal, to create their own product. Just here in the
last few weeks, one of the interesting landmarks, in terms of monetizing this,
what is the impact, is that they were funded by the Robert Wood Johnson
Foundation for $1.9 million, starting with free data. $1.9 million starting
from free data is interesting.
MR. CROWLEY: They have actually run some effectiveness studies now that have
done clinical trials, showing the effectiveness of certain drugs for ALS
patients, at probably 1/100th of the cost of a clinical trial.
DR. VAUGHAN: The cofounders of this company had a sibling who had ALS. That
is part of what drove them to this place, so very interesting. One of the other
examples, again, a company built, this is a California company, who won best
community health app at the Datapalooza, it is called Healthy Communities
Institute. They are pulling health indicators and under factors, and aligning
it with county health data. They have got a very vibrant US presence, and
starting for international. I can’t give you the numbers, because I don’t know
them finally, but they are doing well.
One of the examples locally here is there are actually these little
indicators up top, from hot to cold, or cold to hot. Community dashboard for
Montgomery County, they are the people who are powering that, using those
health indicators. It is a fun site. It is easy to use, and if you are local, I
certainly commend that to you, and the site in general. There are indicators
for diabetes, colorectal cancer, motor vehicle accidents, people in poverty,
education, social indicators of health, aligned with some of these other health
DR. COHEN: They use vitals and BRFSS, mainly. If states have hospital
discharge databases, they will use some of that. Most of them don’t use claims
DR. SONDIK: They have another component of it, where they talk about
programs, as well. They also encourage communication among the various people
who are using this kind of structure. A very interesting question for me is
what difference does it make? I hope that doesn’t sound cynical or negative,
but I would like to know what its impact is.
DR. GIBBONS: Are you talking about the Patients Like Me model?
DR. SONDIK: No. PatientsLikeMe is really intriguing for a lot of reasons. I
am talking about essentially the dashboard approach. They do it very well. It
is extremely attractive, and it brings different communities together. The
political community, because the board in Montgomery County, necessarily has to
pay attention to it, the public health component.
Then, there is a background to it all, which is from public health. There
are public health experts who drive, in a sense. I don’t know of any efforts to
really say, okay, here is where we were. We put it in and now, here is where we
are. They may have.
MS. BRADLEY: They are tracking it. You will be able to see on that dashboard
whether or not it makes a difference. You won’t be able to attribute it.
DR. SONDIK: I won’t know whether it makes a difference. I shouldn’t get into
that, because now I am being compulsive on trying to look at evaluation. It
would be really interesting to know who actually uses the community resources
I think it is terrific for the public health people to have something like
this, and a tool which they can actually use. I think it is important for some
of the people sitting around this table, in HHS, to know how it is being used,
and then maybe how it can be made even more useful, if it is not being used all
DR. FRANCIS: Unless I am wrong, I think it was some of the communities from
the Healthy Communities that Susan Kanaan originally talked with, when we did
the report two years ago. I think Sonoma County.
DR. VAUGHAN: Sonoma is another one of the counties.
DR. SONDIK: That is the one where I first saw it. It is very big in Hawaii.
Just about all of Hawaii, I think, is on it. They have got a lot of counties in
Arizona, I think. It is important.
DR. VAUGHAN: It is a hugely important question. I think my inclination is to
say yes, because it is not just the technology alone. It is technology
partnering with the local partners, linking to the programs, linking to local
resources, what they have decided out of the portfolio they are going to need
and use. I think that my guess is that it is, but I don’t know that they are
measuring it in that way. I think that would be important and helpful to know
DR. COHEN: It is trying, I think, to move beyond data, and overlay metadata and
success stories. There are always links to other program interventions.
Essentially, it is moving to helping communities make decisions. This is more
of a decision-making approach than a data provision.
DR. VAUGHAN: No, it is based on data.
DR. COHEN: Yes, it is based on data, but data is an input into a broader
decision-making process. As opposed to state-based web query systems that
minimize the metadata, or the environmental public health tracking system is
an effort to sort of have a blended approach, where you provide some contextual
information around data.
My problem with these is that they end up being very expensive, as opposed
to value added from freer systems that states or the feds provide.
DR. VAUGHAN: I think their response would be that it is less expensive than
hiring an employee to do it, especially now. They can justify a contract, they
couldn’t necessarily justify hiring.
DR. GIBBONS: I would also say that there is another side. This question of
whether it is effective, the real question is effective at what. One of the
things that is easy to forget is that a major, important thing that needs to happen
with data is to communicate information, outside of whatever else you do.
The appropriate question may not be what health parameter has changed, as a
result of a dashboard being put in place. It may be what information was more
effectively communicated to whoever it needs to be communicated to than was
before the dashboard was put in place. I will stop there.
DR. COHEN: Have you helped communities figure out what they want to do?
DR. GIBBONS: Whether they figure out or not is a separate question that
could be tied to it. Are you communicating valuable information to the people
who want that, is a bona fide question in and of its own, for which data is
DR. CARR: You have to define the problem before you solve the problem.
DR. GIBBONS: I am going to come back to it, because I think it relates to a
lot, at least what I have heard.
DR. VAUGHAN: Two of my favorite examples of using data and part of the
federal data stream, and local data, are two projects in New Orleans. One is
part of the non-profit Knowledge Works, which has been around since long before
Katrina, the greater New Orleans community data initiative. They are basically
literally walking people within the community, regular working folks. This is
what data means, this is what it means for you and this is how you use it to go
advocate for what your community needs and wants.
The way they have also used it is to secure more money in federal funding,
by improving the capacity of local people to correct the data, to inform that
data, and thereby justify additional resources coming their way, and for
communities like New Orleans or Orleans County, $15 million more is a big
difference, especially right now.
One of my other favorite projects is called the Louisiana Bucket Brigade.
They are an environmental justice group. They are matching up federal data,
local data, publishing their match-ups and publishing maps, and encouraging
local engagement and reporting in that way. It is a combined approach, an
engagement approach, a larger capacity-building approach, more grassroots up
than top-down, but very interesting and effective.
In talking with companies, and going to hackathons, I came across these
folks. I didn’t expect the conversation that followed, and I thought I would
mention it in particular. Food Essentials is a company based in St. Louis. They
are publishing information, which is food label data and retail data. The tag
line was that we have got this great FDA data. We make it available to you and
invite developers to use the derived product to make more money.
My approach was, hey, this is great. I am looking for examples of people
using federal open data, be my friend. They said, well, sure we will be your
friend, but we are actually providing the data, which was not what I expected
Apparently, what they are doing is that their product is to pull all the
requirements from the food labels and other things, create their own specialty
product that makes it far more searchable and usable. They sell that back to
MR. SCANLON: FDA was planning how do we do this? So it looks like we found a
DR. VAUGHAN: Part of the legacy contract that they inherited was that it
could be searchable for FDA’s website, but not published as an open data set.
However, apparently behind the firewall at FDA, it is completely published and
accessed, which the cofounder of the company let me use.
It was kind of an interesting instance that had been anticipated. However,
for the purposes of the hackathon, it was very interesting. I ran California’s
WIC-approved foods through their database, just to see how close it came and
what I could find. I found that most of the brand name products were covered
in there. Most of the local products were not, so it was about half and half.
They have a system, basically a red-green-yellow, like a stoplight, whether
it is compliant, kind of compliant or alert. I found for the very top level
things that WIC looks at, calories, protein, broadly defined, it was
more or less in agreement, not completely.
When you go down to what is also required to be reported, in terms of
allergens and food additives, it was all full of red lights. Here, we are
saying, okay, we are doing this great thing for the babies of California, the
future of California. It turns out we are giving them a whole bunch of
additives and allergens. I just thought that was interesting.
MR. SCANLON: Was it a red light to the consumer?
DR. VAUGHAN: It is behind the firewall. If you develop a product to it, you
could. One of the apps is for gluten. If you have a gluten sensitivity, and you
use the app product, then yes, you will find it. If you are a developer, you
can expose it. If you know to search for it specifically, you will find it.
Basically, they made it so much easier to search, that it is much easier to
find using their product than otherwise. It was interesting, and not what I
expected to find.
The other interesting thing from them is because they have been a
Chicago-based company. I found in the instance of St. Louis, in some of the
other international partners, that they are incenting innovators to come.
Basically, they pay for their offices, they set them up. They move their
company from Chicago to St. Louis, basically getting a pretty clear ride for a
couple of years. That is money that they are not spending to go, and they are
still a national-international company. I thought that was interesting.
Public health users, San Francisco, particularly under Mayor Newsom, was a
big publisher of open data, and that includes the environmental health, food
safety program and restaurant safety scores, like many local departments. At
the bottom, you can see the open data portal and Socrata and the searchable
form on the environmental health website. All of the current reports are there.
However, what they decided to do, and this has been really interesting, is
to partner with a San Francisco company, which you all may know, called Yelp.
Most people think of Yelp as where can I find an activity, what is close to me,
a location-based service. What they did was they reached an agreement to link
to all of the restaurants the health score.
That has added value to the city and county of San Francisco, in that it
tends to be a lot more attention getting, and encouraging compliance and public
safety and public health, if it turns up in the Yelp score, where people can
find it really easily. I thought that was a really interesting partnership.
They are looking at starting it in other cities, including New York. I thought
it was an interesting partnership to take.
This is public-facing data out of the open data catalog that does get
reported out and aggregated from state and federal, kind of in the other
direction of how can we start to rethink some of these partnerships, so that we
are partnering to get those data out in a way that is useable, that is
important. Yes, I want to know if it is a failing health score. There are some
interesting opportunities for other partnerships in this ecosystem.
What are some of the next steps and opportunities? I have spoken about it.
We can set those targets and implement policies, but without the observations,
without those data, we can’t know what is working or where. Part of that is to
get a better handle on what data do we need, what do we know.
Part of my informal conversations, talking with the different tech
companies, the health department, some of the non-profits, are just asking them
and listening. What is success to you? What are your pain points? What are
your, if I just had more, this would be wonderful? What are those
opportunities, and to invite them in. What do you think? What, in your
business, in your product, do you measure? What is your API that goes, how are
you measuring that?
Food Essentials, last year, they got something like 8.8 million pulls of
their data. I haven’t heard anything approaching 8.8 million in a year around
this table so far. There are some other partners and people who are interested
in coming to the table, that have not even been invited, that are ready and willing
to come down and have those conversations. I thought that was wonderful to know
A lot of it is ads, some of it is subscriptions. For the APIs, some of them,
after a certain point, they will charge back. Some of it is ads, it depends on
the company. Some of them will sell it outright, like Healthy Communities.
DR. GIBBONS: I am thinking about new or emerging opportunities for the data,
some of which may or may not be obvious. One way is to look at who is using,
but another is to look at who is not using. Who is not on this list, and then
think about why and what we can do to get them. I think there are a lot of sort
of usual subjects or suspects that aren’t there.
One of the things that I think about, when I think about getting people to
use, now, in the design world, when you are designing technology and you are
designing software, too, but it all starts with a use case. It doesn’t start
with, well, let’s design some software. It starts with a real use case. I think
what has been happening with some of these challenges, you talk about broad
challenges and otherwise, is there is no use case. You just want people to use
What I have come to learn, now, 90 percent of the work that I do at Hopkins
is all around technology in health care. I have absolutely no background in
writing code. In the beginning, 10 years ago when I was starting this, I
thought this was going to be a real problem. I had a challenge. Yes, I was
interested in this stuff.
I had a very seasoned faculty member, who had been working in the same, many
of you know him. He actually said to me something. I thought he was being nice
at first, but it is really true. He said, look, first of all, that actually is
not a problem at all in this world. The reason is because most of the people
writing code are not health care people. Most of the people in health care
don’t know how to write code. You can’t just say coders do it, but you need
somebody in the middle who may not write, that can help the coders see what to
do. That is the use case.
Some coders can do that, but I think the approach that is being used is a
small one, for those coders who are able to do that. What we need to generate
is not more places where coders come to the table and hear our data sets. We
need more use cases. We need to talk to whoever about what is the problem you
are having that you need fixed, and then think about, do we have a data set
that can solve that problem, then link developers around those issues with
those use cases.
Going to the example of Josh, I think we make a mistake if we think his
innovation is his piece of software. His software is cool, don’t get me wrong.
What he did was made a use case. I think why that thing is so valuable is
because he understood what matters to the people running the plan.
It doesn’t matter what you and I think about it, honestly. It matters what
they think about it, and he made a data set to answer that question. Most of
us, to some degree or another, are methodologists in here. If we don’t
understand the methodology, we don’t want to accept it. That is of no relevance
to the people who are making the decisions about those things. I think that is
where it hinders us, because unless we can see the methods and understand them,
we don’t want to accept it. Whereas, the people who are willing to pay real
dollars for it, just want the answer.
Now, it is important to know, is it really working? Do we have to understand
all of that? No, what we need is, if the goal is to get more people to use the
data, one way that I think that we are not capitalizing on is helping to create
use cases, and then link people together. The idea of presenting this tool at
the Datapalooza, I think is fine and we should do it.
There is a bigger way, I think, that happens more over a long term. The
examples Josh used, that is just one example of how to use this tool. There are
a million examples of how you could use this tool to do things from the
bedside, from the community, to real estate, using Health.gov data.
I thought of some use cases here. Actually, I didn’t come up with them
today. I have actually been thinking about it. I was telling Josh, I have been
publishing and writing about this stuff for 10 years now. I don’t have the
skills to do it, so I am always waiting for somebody or trying to talk. He has
got the skills to do it, but not calling it what we have been, so it is
We are running out of time, so I won’t talk about them, but I think that is
another way of sort of thinking about how do we get people to use the data. It
is really focusing on the use cases, because there are a lot of them.
DR. VAUGHAN: One of the main things I have found in asking people, what do
you think about the data, is that they just don’t know about it. They just
don’t know that it is there. Part of it is very much like many of the software
companies, to have a so-called developer evangelist, somebody who is actually
reaching out and engaging those communities.
I think, maybe from a Silicon Valley perspective, everybody codes, what do
you mean? It is different in different parts of the country, different parts of
the world. What I found exciting about Stanford Medical School is that it was
the residents and the docs, and the students, who were actually coming to the
table and coding.
What I found exciting about the Code for America is that it was people in
the programs, not so much coding, although some of them could, was with their
ideas. I had this problem, this is my idea for a solution, this is where the
data is. I need you to help me with this part, and let’s work together. I
thought there are more opportunities for partnerships out there than we
realize, and that those are all ultimately about the social determinants of
health and communities, but that we have got a lot of folks who could be
invited in, and should be.
DR. COHEN: I have been in government at all levels, Federal, local and
state, for my entire life. One of the biggest gaps is our understanding of how
to be entrepreneurial. I mean that not in the sense of making money, but
thinking about we are so mired in collecting the data and making sure we are
getting it right, we lose sight of how these data are really going to be used.
Initially, it wasn’t our focus or scope. Now that we have gotten more
sophisticated in being able to generate the data, we have time to begin
thinking about these questions. I feel like we have moved to another level, and
we need another paradigm for those who generate data, thinking about its use
more directly. If we could add value at all levels of government, it would be
to find people who can make that connection between the data geeks who generate
the data, and the folks who ultimately can use those data and apply it in a
variety of settings.
DR. GIBBONS: The other side of that is, when you think about it that way,
then you say immediately, there is no way in the world 20 people could think up
all the possible use cases. It is just not possible. If you start asking
people, we could be the collector of use cases, and then have a system to
prioritize them, and then a system to link them with people who could do those
DR. COHEN: I think everybody who is like me, who generates data, has the
pile of slips in their lower left-hand drawer of great ideas that somebody
says, why don’t you just do this or do that? We never have the resources to
wrap around. That should be an office in every health department, the bottom
left-hand drawer folks.
DR. VAUGHAN: Well, the National Health Service in the UK actually is doing
that. Maybe we should be minding some of what they are doing, as well. I think
part of what I also heard back from some of these folks, and my takeaway was to
not be prescriptive about what form the data should be in, that if you need
it as an API, we have got that for you. If you need it as a CSV, we have got
that for you. If you need it as a PDF, go forth.
As long as it is the same thing, apples to apples and oranges to oranges, to
not game the system by saying, it will be this or so, but to put it out there,
to promote it in more ways that are more engaging, that invite more people in
and to let them do it, to get out of the way. A little bit of it is, I think,
we have not invited people in and we are also not getting out of the way often
DR. FRANCIS: This is an impression that I am going to pose as a question, to
ask you whether it is an accurate impression. When I was listening to you
speak, it sounded to me like many of the use case examples, I will characterize
the examples you were giving as use cases, really were much more like the kind
that were presented at the Datapalooza last year and the year before, that Mo
was sort of thinking of as old hat.
Where there was identifying information, it was about restaurants, not
people. That is why I wanted to understand. It seems to me that use cases in
which, regardless of how the information was originally shared by people, but
what the app developers or whatever want, is individually identifiable
information. I think they may raise different kinds of questions, that is all.
What I wanted to ask was, a lot of the examples you gave were not examples,
except PatientsLikeMe, where that is a very upfront, everybody knows what you
are sharing the information for. Things like ClinicalTrials.gov, there is no
identifiable information there, the results of the clinical trials and what are
the open trials. I wonder if you could comment on how accurate that impression
DR. VAUGHAN: I would say that is not accurate in a couple of ways. Number
one, the question and the opportunity is how do we grow the community of users?
That the community in California or the community in Michigan or the community
in Florida decides to step forward and put stuff out there, in their way, for
their community, just because maybe we want to know that other communities have
done it successfully.
Because they are doing it for their own community in their own way, and then
pulling on the data, it doesn’t mean that it is old hat. It means that we have
brought some new users to the table. I think to characterize the users as
wanting one kind of data stream underestimates the power and possibility of
using what we have much better.
DR. FRANCIS: Did you find examples where the users wanted identifiable data?
Could you give us a couple of additional ones that look like?
DR. CARR: Without question, it is a given that there is this tension between
wanting granular data that approximates identifiable, and that is true. I think
that we can opine on that, but not sort of as the hypothetical, but not as sort
of the overarching. When we get to a specific, we should incorporate that. The
answer will always be yes, does this push the limit. I am just trying to figure
out where this is taking us.
DR. FRANCIS: Where it was taking it for me is, if most of the kinds of
releases that people want don’t involve identifiable data, the questions of
protection look very different from if the requests, and the more interesting
uses, do. That is all.
DR. VAUGHAN: Interesting to a county is if you can get $15 million more in
this economy. I think that is pretty interesting, if you are a county health
department right now.
DR. CARR: I think Leslie said it well yesterday. It comes down to trust. At
the end of the day, where that middle point is, or that level of acceptability,
it does very much reflect the community, and what the shared vision, values, et
DR. VAUGHAN: I think identifiable data is a whole different set of issues,
and that is beyond the scope of what we are talking about today.
MR. SCANLON: I think all we are trying to do here is get a sort of state of
the science or state of the art, and what are we seeing, what are we
developing. I think the use case, my own thought, Chris, was it a question of
what was the objective of this, which is the use case. Sometimes, it is just to
get the information out there.
I still think, and that is probably why we formed this group, a lot has to
take place between what we put on a public use federal website or something
like that, and somebody actually taking it beyond just reading it. That is what
we are looking at, how do we promote that process and make these products
easier. I think you had a couple of ideas there, in terms of the format and
DR. GIBBONS: I think recognizing where our biases are limiting us, and
health care, and public health largely, but not as much of healthcare, has been
driven on this one-on-one notion. I fix the patient in front of me. There is
actually an exquisite amount of detail that I, and anybody else, can know about
you from public data sets, without even knowing you, without knowing your name.
You would be surprised. They can tell you what kind of toothpaste you are
likely to buy.
The idea, maybe not your opinion, but the assumption or maybe the underlying
idea that anonymous data can’t get me to a level of granularity that is really
helpful to me, is really fallacious, it is not right. We have to help educate
people and help people understand that.
DR. CARR: We want to come back to where do we go from here.
DR. MAYS: The issue of use case, I think, is probably very important for the
group. One of the things I wanted to really comment about, it is too bad that
Bob Kaplan is not here, to watch how it is that he is actually developing this
for mHealth through NIH.
I went and did this whole week-long training, but the way that he is doing
it is he is bringing together the people that need with the people that can. We
have to actually spend a week with engineers, and they have to understand us
and we have to understand them. It was probably the best thing that I could do.
That is how I know how to do this stuff, so I now know how to work with
One of the things that maybe is a possibility is to attach a day onto the
data user’s meeting, your NCHS data user’s meeting. That is one of the places
in which you could move people who are beginning to have an interest to even do
things different with your data, because there, you would get a lot more
bump-up, I think.
DR. CARR: It is like Paul said today with the extension centers, who said
that there was more meaningful users. This is with ONC, HIT, meaningful use,
the folks that participate in the extension centers had a much greater uptake
of meaningful use than those who did not. I think that is the same story.
DR. VAUGHAN: I would also say just briefly that there is a new ecosystem
within Salesforce, AT&T, mHealth Group, Microsoft developer group, and some
smaller companies like Tigsy, which are basically meant to let it be possible
to create applications and platforms without knowing how to code, basically
drag and drop, and make your own, and kind of cutting the whole engineer piece
out of it. It is a market.
DR. COHEN: I don’t think government has invested in these kinds of trainings
and opportunities. Putting online training and making available is a perfect
way to try to encourage folks thinking about use cases. There are some
rudimentary trainings attached, for using Federal data or state data. There are
the help files that explain what the variables are. To give, whether they are
YouTube videos or interactive training about here is how somebody has leveraged
these data to create value.
DR. VAUGHAN: There are some examples in the open courseware suite. Stanford
has done some things, Hopkins has had some wonderful things.
MS. BRADLEY: We are working with Codecademy to develop some —
DR. COHEN: People should be aware of these kinds of things.
DR. VAUGHAN: Codecademy’s educator is a person who is from San Francisco,
who is the cofounder of the Women’s Tech Group.
DR. CARR: Let’s regroup now and think – Datapalooza is an opportunity to
achieve a number of things, it’s visibility, it’s kind of a roadmap of how do you
get from here to there, and if there are educational opportunities emerging or
even getting the feedback. I think we should think about that.
This is where I think we are. One, if we have an NCVHS workgroup session,
what we would like to do is travel the road from where a person like I am, to a
person like Josh. Not so much that anybody would come out of the room, like
Josh, but that people understand that we are not just doing static pictures of
food stores, but rather how this knowledge management evolves, and when you can
develop sophistication in that, the kinds of big questions that can be
addressed. I think that is one thing.
Just to say, if you have this, you add this, now you can do this. Now, if
you add this, and then the kind of crowd sourcing data and so on, how it
enhances. Then, the things that we are hearing are once you know how to do
that, you could do that, but how would you be successful, so the use case
critically, the education, the presentation, just the things that we have been
hearing, to wrap that in.
DR. VAUGHAN: The success is different, so obviously having a $1.9 million
funding stream is successful. There is also success, is my program working
better, am I getting better value of what I have already got, reusing it, in a
way that makes it easier to use for a non-profit, for a local health
department. I think those are values also that have a bottom line, but are also
users that should be invited in and learned from.
MR. SCANLON: The objective could be simply to just put the information out
there and have the community more informed.
DR. CARR: I think Chris also raised an important issue of who is using the
data and who is not. I think that kind of thing, who is not at the table here,
because that also informs the outreach.
DR. VAUGHAN: Vickie, I referenced that report you sent out and zero divides.
I think it may even be on the SharePoint now, but I commend it to you,
thoughtful about the digital divide and e-divide. One thing I think, even
though it could be seen as outside, but I think it is inside the scope of this
group is alluding to what was said earlier, fostering opportunities for people
of different types of backgrounds, not just data geeks and developers, to come
together. In some ways, sometimes focusing on the app becomes the distraction.
I personally hate the term mHealth, I hate it. It is like have a phone, oh,
we must use it in healthcare. That is the wrong way to develop tools. They
start with what is the problem that needs to be fixed, and then, is there a
technology solution that fits that problem.
I am not totally against them, because they can be valuable. Use cases
around problem challenges, something like that, because one app can be used for
many different kinds of things. If the focus is on building an app, we can
always build 100 more apps of the same thing.
DR. GIBBONS: You don’t want the tool to determine the limits.
MS. BRADLEY: Did you say you were tracking the use cases?
DR. VAUGHAN: It was a convenience sample. Tracking it which way?
MS. BRADLEY: If I created an email account for you to send these types of
things, like in your daily life, idea/use cases, with a one sentence, I will
find an intern who will find out more about these cases, that would be helpful
to continue that dialogue, even when you are not here.
My other question is just to understand the audience at Datapalooza. I know
Justine is on the steering committee. I am not familiar with who will be in the
DR. CARR: It runs the gamut. As I said, Ed’s group was a very kind of more
academic community-based, what are the potential. Josh’s group, very different.
MS. BRADLEY: Who is NCVHS, our session, are we speaking to everyone?
DR. CARR: You are right, who is the audience.
DR. VAUGHAN: For the GIS presentations, it was mostly GIS people.
DR. GIBBONS: Another way to look at that, and this is coming from my
perspective, is while that idea has generated a lot of steam, because I have
heard it a lot, it also can push some people away. I have never been, I will
admit that. Part of the reason I haven’t been is because it sounds like it is
for tech geeks or data geeks, of which I am probably neither. I am being a
little tongue in cheek, because I do see my relevance in going. I just haven’t
had an opportunity to go.
MS. BRADLEY: My point is just I would like to tailor whatever it is we
DR. CARR: Yes, I think you are right. The other term that I heard, that I
think is very important, and impacts us as developer evangelists, to identify
opportunity and partnerships. At the end of this session, the goals might
be one, to open people’s eyes to the potential that there is much more here and
it is sustainable. Second would be creating developer evangelists, or
evangelist, of that.
DR. VAUGHAN: The developer evangelists are remarkable. They go and meet with
the community, this is our product, this is how you use it. They are available
for at least some period of time, to let people get their hands on it, ask
questions, make mistakes. It is open, these are the possibilities, this is what
you might do it for, what are your ideas. It is a real direct and intentional
process of engagement.
People on the for-profit side, that is how they get more customers. That is
how they bring people in to create products, to get even more customers. That is
also how it can propagate through all the other sectors, for whatever your
customer might be.
DR. CARR: It gets back to what Paul said this morning, feet on the street.
DR. VAUGHAN: That is exactly it.
MR. CROWLEY: Through that engagement at the Palooza, how do we want to stay
engaged with these members, who have showed up, who have already expressed an
interest? These are the people that we are going to try to get feedback from.
As we continue to make decisions or think about ways forward, how do we sort of
bring them into the conversation? These, too, sort of logically become people
who become part of this learning community and this health data social group,
where we start implementing these other ideas that have been expressed here.
DR. GIBBONS: That is a really good point, thinking about mentors, for the
ones who have been involved in the past, reengaging them in the future.
MS. GREENBERG: I am still stuck, not really, I have made a little progress
this afternoon, with Lily’s slides. I felt the same way, Mo, when I looked at
them this morning. Forty-five thousand people, oh, that is so pathetic. On the
other hand, have I ever influenced 45,000 people? No, but everything is
The untapped potential is mind-boggling. The use case, I mean, I am thinking
everybody who has ever known anyone, who has had any kind of cancer, has
probably gone to that NCI website. I think it is the best source of
information. It must be millions of people.
Some of those types of websites, let’s see what the numbers are. It would be
just interesting to contrast. It seems it must be 45,000 reporters in the
United States, if you look at all the little towns. Maybe they don’t do things
on their own anymore. They get everything from the AP or something.
What I am thinking is we have these opportunities. First of all, I think you
should definitely do the session. I think you probably should do it, in
conjunction with Ed’s session. I gather Ed is still planning to do this, even
though he will have retired by then.
MR. SCANLON: He took a somewhat different tactic.
DR. CARR: I think he is looking for something that would really draw more
upon the hearing that we are having and sprinkled with some data and a data
user. I think this session could stand on its own.
MS. GREENBERG: He does attract people to there. It is a good question, who
is your audience, and there are so many different types of people who come to
the Palooza. You don’t want to compete, and that happened last year, against a
session, if you can, in which the same people who you think would come to yours are
going to that one instead, and maybe they might have bigger names.
I am thinking that both at your session and at the hearing in the end of
April, these are great opportunities to start collecting some of these use
cases. Then, in kind of a snowball effect, once you have some good ones, to
start thinking about YouTube, webinars, partnering with people on webinars,
partnering with people on educational offerings, because just partnering with
schools of public health.
There are so many opportunities, but I think you can’t just expect them to
come. It isn’t if you build it, they will come. It doesn’t work that way.
Forty-five thousand came, I don’t know, I have never built anything that that
many have come to.
DR. GIBBONS: We undersell ourselves, and we have to be careful with the
numbers. Let’s say for example it was 45,000 developers, and then, they all go
off and make things that now millions of people are using. You say, oh, that is
nothing. It is one statistic, but you look at it.
MS. GREENBERG: I think the education component is so important.
MR. SCANLON: I think it is sort of the background and activities of the
committee, why we are doing all of this, for the use case for this group. Then,
maybe two examples of sort of generation three applications. Then, I guess I
was hoping to have part of it be a two-way discussion of what do they think.
Again, this is probably an area where in some things, the Federal government
will help, and in some ways, it will get in the way.
If we just ask them what could the Federal government do, it is the wrong
way to go. What could we do in terms of the data. Our contribution, among other
things, is the data that the taxpayers have paid for. You have already paid for
it. If you don’t even know about it, and very few people are using it, then
obviously we are not doing a good job.
Fairly good with the researchers and the public health community, if you
have gaps there. For this broader community, it is the difference between
20,000 hits and eight million hits. I hope we can leave time.
MS. GREENBERG: That is what I am suggesting, use it to get some feedback.
MR. SCANLON: It is a two-way. Do they have ideas? Is there a place to follow
up, particularly in terms of what would make it easier. In addition to that,
that is for the Datapalooza, those are the same questions we are asking the
subcommittee to help us with, even before that. What could we do, simple and
more complicated steps, for the data we already have and are already trying to
Make it available in ways that developers would find it easier or use it
easier. If you are not aware of it, you will never think of it. Putting the
boys and the girls together, and everybody stays on two sides of the room, is
DR. CARR: I think Bill’s suggestion is helpful, sort of introductory, what
is out there, if I am this kind of person, what I am looking for.
DR. VAUGHAN: In the interim, one of the most wonderful pieces of just going
around and talking with people is, again, in different parts of the tech
community, of organizations coming together for different reasons, they are all
interested in offering their ideas for how it could be better. It would be
great to engage them and bring them forward.
MR. SCANLON: I think from that point of view, besides faster, better,
cheaper, we need something more.
DR. FRANCIS: A good way to get a sense of the audience actually is I believe
they publish each year a list of the exhibitors. You could go back. At each of
the Datapaloozas actually, and it is a great source of use cases. You could go
back and look at who the exhibitors were at the different Datapaloozas, and
DR. CARR: I want to circle back now to Josh, because the other thing that
actually I thought about, is as you get to this more sophisticated part that
seems like way out there, although maybe not so much, when you think about if
you are an ACO, you talk about Aetna, but you are an ACO, you are trying to get
your services to your people, all those kinds of techniques. The ACO session
was pouring out into the hallways last year, everybody was there.
If we could build something, we could build toward this is what you have,
but we could also carry that over, that okay, now, you are not Aetna, but you
are Steward Health Care, and you are trying to manage your high risk.
DR. GIBBONS: You are describing a use case. Use cases that are relevant to
ACOs, that I am sure can be done —
DR. ROSENTHAL: Sort of a way we did this last year, when we had a couple of
hundred people and they were spilling out, as well, literally, what we did was
we said, you guys come up with the use cases. They filled out these Mad Libs
and they literally said, I am interested in this.
PARTICIPANT: What is that?
DR. ROSENTHAL: Like the hackathons and the knock on those is that, 99
percent of them, vastly more than any other space, fail. If you look at a
funnel, and Greg has his internal crunch phase, so why did they fail. It is
people starting with the data, rather than the use case, and they don’t
navigate perversity, so they are essentially walking into Chevron saying, I
invented a water-powered engine.
The way we handled this last year, when we had a couple of hundred people,
is we had not me doing it, but we had five other people walk through, the exact
same thing, use cases. Here is the problem I am solving, here is the data I am
using, here is the output of it and here is what it looked like.
Then, we had everyone in there fill out, this is what I am trying to do, and
they filled out a format. They had to use public data as source and public data
as measurement. We had a couple of hundred of those things and it was very
interesting. It was very interesting framing around it.
DR. CARR: At that meeting, 500 people made a use case?
DR. ROSENTHAL: Yes, so they filled out these little cards, a couple of
hundred people. Literally, I can show it to you, I brought it in last time. It
said, here is my idea, it is for a blank. It solves the blank business need for
this market in this segment in this way, using this data as measurement, using
this data as whatever. Rather than starting with the data, it actually forced
them to create a use case around it. I have a stack of those, and I am doing
the same thing.
DR. CARR: This was a list of this is what I wish I had, or this is what I
know how to do?
DR. ROSENTHAL: Think of it as a hackathon, where you say, I have this idea,
right? All these guys come in with these ideas. The only difference is, it
actually forced them to say, this is my idea, what I want to build, what I know
how to do, whatever. It forced them to put the business wrapper, public good
wrapper, social good wrapper around it. I sent that to your work group a couple
of times, and I can send it again.
DR. CARR: I guess I will ask it again.
DR. COHEN: Did you send all of the completed cards or did you just send the
DR. ROSENTHAL: I sent my summary notes, just like I did with the site you
guys did on other things.
DR. COHEN: We could potentially pull some use cases out of there.
DR. ROSENTHAL: Yes, or you could look at the historic stuff. If you want to
make it interactive, I thought I heard you guys say you want to make it
interactive, you can have them create their own use cases in real-time. Say,
here is an example, here is one way to use something high-end maybe, maybe not,
but it is in the market solving these things. You guys all have great ideas,
the MIT guys are coming up with all of these apps, and they all fail like
markedly, one after another, in terms of -.
MS. BRADLEY: You need to be talking to the Harvard MBAs.
DR. ROSENTHAL: Exactly. I mean, any of these things, you look at that is
cool, I am speaking at TED. Do you have users, do you have revenue? What are
the metrics to even evaluate is a business question in that market.
MS. BRADLEY: All he is saying is that people need to have the thing that
says, I am solving this problem. They start with like, I want people to stop
DR. CARR: I am trying to picture the thing. They have a card that says I
want people to stop smoking, and then what? Josh solves it?
MS. BRADLEY: Then you help them pick. You are like, this idea works.
DR. ROSENTHAL: They have a Mad Lib card, and it basically is this formula. I
walk them through it. I say, this is how I did it, this is how — Mad Libs are
the games you play where it has the sentence and it has a blank. It is like
this game basically. They take these little cards, and they look at your
sources, and we had a couple hundred of them come out. Rather than trying to
retrofit or guess what the use cases would be, everybody works in these groups.
They basically create the use cases in real-time. It might be a project that
they are working on, to answer your question, it might be a business they have,
it might be a crazy idea. That other link I sent was one of the Pew research
scholars with her kind of idea, saying this was the only session where we
actually tied it to like the business and market reality behind it.
DR. CARR: At the end of this, you will have shown them your problem can be
DR. ROSENTHAL: They identify the problem and they create their own solution.
They walk through that framework, so I don’t show them anything. They build it.
This is what I did at Harvard and other places. The groups work together and
use their data to solve a specific problem. They come up with their own use
Rather than serving as the collector of use cases, you get them in a room.
You have them say what is important to you, in terms of a problem, what sort of
data might help you, how do you go about solving it, and then you glean that.
That is what we did last year, that is what we are going to do this year. You
are invited to do it. You can do it at the session I will be doing it, or you
can do it in your own session, we can do it together. That was highly
MS. GREENBERG: Did you get into, given the problem they want to solve,
whether there are data or there aren’t data, or you just assumed there were?
DR. ROSENTHAL: The mechanics of the game, I can send it again or I can bring
it in, if you want to look at it. They had a master card with a formulation,
don’t call it a business case, just kind of fill in the blank. Then, they had a
series of little tiny playing cards. Some of them were, what is a business
need, lowering costs, improving growth, what is a specific clinical need, XYZ,
what are barriers, what are competitive advantages.
Then, the public data, both for source and for measurement. The requirement
was they had to use at least one HHS data source for measurement and for
source. They went through and did that.
MS. BRADLEY: Can I translate it back into something other folks might be
familiar with? I like Mad Libs, I remember that. Another example might be, it
is like those public preparedness exercises. These are like group learning
exercises, where you want people to get together. You have them maybe start.
This is when they come up with the use case, the Mad Lib card.
Then, maybe you get together with five people, and you say, okay, let’s pick
one to work on. Okay, now, let’s go through and think about this some more.
Kind of the way in which you pick ideas, you are like, which one do we think
might work. You are learning from each other about why one might work.
Then, you can sift them through, but the whole thing is that they are kind
DR. CARR: What if the data is not there?
MS. BRADLEY: Then they learned from the exercise about how to go through the
process of thinking the data is not there.
DR. CARR: It is like restating your problem. I can’t solve how many da da
da, but if I knew how many of B, it doesn’t matter if I don’t know A, but B
would be close enough for me to take action.
DR. GIBBONS: We are talking really about binding to a goal and many ways to
do it. I have thought about ways where you are doing exactly what Josh is
saying. But the people doing it don’t have a clue about the databases. You have
articulated that it is a problem, so the process could be to educate. Oh, these
are the problems, because that is what you need, the scenarios. Do we even have
databases that can solve those? Some will be yes, some will be no.
The other way I was saying is to collect these ideas, because in the
beginning, it is going to be rough for people who haven’t done it before. These
things become examples to others. As you do more of it, it actually gets a
DR. ROSENTHAL: That was a speed dating part. They had to do three of them
without thinking about it. If you know the data, that is great, and if you
don’t, you don’t.
MS. BRADLEY: I think the other point would be that we want to train them to
start with the problem, and then say, okay, so I don’t have the data. Now, you
might want to go out and actually ask people who know about data. What if it
turns out we did have the data, and it was CDC data. We are trying to open up
their dialogue and think about, okay, who would I go to then to find out about
the right data. Don’t just start with the data, don’t just start with a pretty
MS. GREENBERG: Who was on the committee, besides — You don’t start with a
standard. Who cares about the standards, if they are not answering some
DR. ROSENTHAL: That was out of those couple of hundred cards, one of them
was data access, the data doesn’t exist. You get a sense of what are the
business problems they are working on or interested in, what data are they
using, and you can track it that way.
DR. CARR: A couple of different threads going on, there is Datapalooza and
what this group will do. There is actually, Josh, what you are doing is sort of
102, we are doing 101, and maybe that intersection. I don’t know, we ought to
think about that.
Then, I want to hear from Ed, and then, the third thing is just not
forgetting about what Bill Davenhall suggested, although that is not going to
be tied to the Datapalooza, so we will come back. Ed, did you want to say a
little bit more about your session and how we might intersect with that?
DR. SONDIK: I am going to run down this outline I handed out. Does everybody
have one of these? I was asked to do a session on community health data. We did
this last year and it was kind of a double session. I thought this year to
divide it into two parts. The first hour and a half focused on people who could
talk, practically speaking, because their hands were dirty with it, innovation
and insights that they have drawn from this area of using data to try to
address health problems.
I thought it would be great to get, and I just list off the top of my head,
it came from my random word generator, my brain, it would be great to get a
legislator at the state level who really has done something, been involved with
an issue, whatever. Somebody representing a healthcare delivery organization,
you can fill in something along those lines, a lobbyist or an advocate, who has
actually used data to come up with arguments. An insurer and how they use it,
perhaps Aetna for all we know, and a public health official. That was the
To have them, as I wrote here in my deathless prose, have a great story to
tell on how local or national or international data make a difference in
decision-making. The point that was made before is that data is good to inform.
There is not necessarily a specific decision or the like. I think it is a
really good point, and that could also be part of this. You still want to know,
I think, how it actually gets used. It is great to inform, but toward what end?
If you can show that it makes a difference, I think that is great. That is sort
of where we are.
I thought the second part could focus on the future and recommendations for
the future. To do that, I was thinking of starting this hour and a half with
something on the order of 20 minutes from one speaker, who could look more
broadly at this field, from new sources of information. I wrote here,
electronic health records and crowd sourcing, to a variety of issues, I said
here, enrolling everyone in drug follow-up monitoring or surveillance, which
would give information.
Then, the issues that this raises, now, there is someone sitting here who
always goes to privacy and confidentiality. I can’t imagine where she is at the
moment. This always comes up. Just have this person be able to bring that into
the story. I always bring up the quality of the data. Don’t ask me how to
measure the quality, that is another story. It is always an issue. People want
to know about the veracity, the reliability, the validity and the usual kind of
The person who would do this would be a really good speaker. Someone who
would really show the promise of this. Kind of pick up from the first session,
where we are hearing short talks from people who have actually gotten their
hands dirty, and they are not the data people, these are the users.
Then, we would have perhaps a data person, a practitioner, whatever it is,
who has really good vision and can bring things together. The question then is,
okay, where do we go, though? What recommendations can we make for HHS,
recommendations for the GSA? They are still over Data.gov, right? What these
panelists would recommend to the federal government, but also it can be other
governments, as well.
I thought, now, who would be good to do that? I thought, number one, because
it is a little issue with me, the ABCS. The ABCS is a White House program. If
you don’t know about it, that is my issue. It is disturbing to me that it is
not publicized enough. It is a program, very simple to get people to use
aspirin, have their blood pressure checked and controlled, know what their
cholesterol levels are and do something about them, not necessarily with
pillage or plunder, but with exercise, diet and the like. The S is smoking.
I thought somebody representing that program, and the local issue, what are
the ABCS in a local area and what is being done to work with the medical
community and whatever. What recommendations would that person give in that
area? It is the Million Hearts thing. We, connected to Dr. Frieden, call it the
ABCS, because who is counting a million. Prevent a million heart attacks.
Healthy People implementation at the county level, again, what
recommendations? Remember we talked about small area data? No data are small,
only the areas, I suppose. It is late in the day.
Infectious diseases and epidemiology, critical public health function, what
recommendations there? States complain that there are too many programs they
are reporting. They are not linked together in some ways. Bruce could talk
about this. Panelists representing the schools, what is needed to know there,
what would they recommend in that area. Understanding big data at the small
level, I am not sure what I meant by that. Research and data users of the
future, for example, the self-monitoring, move toward self-monitoring, what
recommendations could we make in that area. That was my thought.
I put it in this matrix that says A and B here. When we first started
talking about the linkage between this activity, and particularly Josh’s
activity, I thought, well, this could fit in here. Now, having gone through
this, I don’t really see that it could fit in, except perhaps maybe it could be
a version of the 20 minute talk. We would have to have slow talkers.
It should be something that is really stimulating, people really feel that
this is an exciting area. The evidence presented in the first session will
buttress that, the information there.
I just, off the top of my head, wrote down some possibilities here. I
don’t know any legislators, so I didn’t put anybody in there. In organizations,
I thought somebody from the major ones was possible.
MS. GREENBERG: AHRQ, I think they still do, they used to have seminars with
legislators. Some of it is about using data for policy, so they might be
able to help you with that.
DR. SONDIK: The first one, in the second session, in the 20 minute, Don
Detmer came to mind. My horizons have expanded today, and my vision is
expanded, I think. It could be anybody. I just wanted to put something in here,
because I was going to share this with the Datapalooza executives. That is my
story on this.
I would be grateful for any suggestions on people. Also to say, I think that
is a lousy idea, what I would suggest is the following. I need the suggestions
right away. I really have to get working on getting this thing fleshed out.
MR. SCANLON: The other thing is we had Jon Bon Jovi at one point in sessions
last year. I think we should raise our standards to maybe Beyoncé.
DR. SONDIK: It is very interesting that you brought that up. I want you to
know that this is a PG rated session. We are not going any further than that.
DR. CARR: This has been a very rich, exciting session. I think we have
broken through, from where we have been to where we are headed. It sounds like
we want to have an NCVHS session. Lily and I will talk maybe on Monday, and
Then, we need to get back to Ed. I just think we need to kind of mull it
over. I think everybody looks spent.
MS. BRADLEY: I also think we should pick who we want our audience to be.
Let’s pick them and tell them, this is who we are targeting, and we will craft
it for them.
MR. SCANLON: This is probably a concurrent session, not a plenary?
DR. SONDIK: It is a concurrent. I don’t know what day or what time, I just
wrote this in, because I like to see the detail, whether it is right or wrong.
Any questions? I think some of the stuff, you know, I have seen stuff, for
example, at Maryland that could be really exciting here. I think, to me, it is
really on the demand side, I think, the use side, as opposed to the supply. I
think the more that we can emphasize that there are really meaningful uses
here, PatientsLikeMe is just a wonderful example.
You know, I don’t see that as sort of the, well, maybe it is mainstream, I
don’t know. It is compelling in the fact that it can generate new information
in really different ways. It may be possible to bring that kind of idea in
here, with the right people and apps that really generate new information, as
opposed to use the information, in conjunction to using the information we
DR. CARR: Do I have a motion to adjourn? We have a lot more thinking and
follow-up to do. I thank everybody for this very rich session, and I wish you
safe travels on your way home.
MS. GREENBERG: This is very exciting, because we have been talking in this
committee, ever since I have been involved with it, and Susan Kanaan can
validate this, too, about how can we educate people more, or if not educate,
just sort of communicate more with people about the data, and how it can be
In many ways, it is the best kept secret and everything. We now have this
working group that is really helping us do that. I am very gratified by that.
Thanks to all of you. I am so excited about Lily.
(Whereupon, at 4:48 p.m., the meeting was adjourned.)