[This Transcript is Unedited]
Department of Health and Human Services
National Committee on Vital and Health Statistics
Working Group on Data Access and Use
November 14, 2013
National Center for Health Statistics
3311 Toledo Road
Hyattsville, MD 20782
Proceedings by:
CASET Associates, Ltd.
Fairfax, Virginia 22030
caset@caset.net
P R O C E E D I N G S (1:04 p.m.)
Agenda Item: Welcome
DR. CARR: Welcome everyone. We will start by going around the room and announcing who we are.
I am Justine Carr, Steward Health Care System in Boston. I am chair of the Working Group on HHS Data Access and Use.
MS. BRADLEY: Lily Bradley, staff to the committee and the Working Group.
MR. DAVENHALL: Bill Davenhall, ESRI.
DR. CARR: Mo will be joining us by phone today so we want to be sure to speak into the microphone.
DR. ROSENTHAL: Josh Rosenthal, RowdMap.
DR. CROWLEY: Kenyon Crowley, University of Maryland, Center for Health Information Decision Systems, no conflicts.
DR. VAUGHAN: Leah Vaughan, Health Policy Group. No conflicts.
DR. TANG: Paul Tang, Palo Alto Medical Foundation. No conflicts.
DR. MAYS: Vickie Mays, member of the Full Committee, no conflicts.
DR. GREEN: Larry Green, member of the NCVHS Committee and voyeur at this work group.
DR. FRANCIS: Leslie Francis, member of NCVHS and member of the Working Group and no conflicts.
(Introductions around the room.)
DR. CARR: Thank you everyone. It has been a somewhat circuitous route to today because of the furlough, but I want to thank folks who have worked independently and pulled things together. Lily, thank you for working around the furlough.
What I’d like to do today is go back to start with our charge, remind ourselves of what our charge is, and then Bill has done a very nice integration of where have we been in relation to the charge and actually where are we going would be another good part, and then we can move on through there. I won’t go through the whole agenda. Let’s just start with a reminder of the charge. In your folder you’ll find something, November 14 workgroup charge, but before we do that, Susan and Bruce, do you want to introduce yourselves?
DR. COHEN: My name is Bruce Cohen. I’m from the Massachusetts Department of Public Health. I’m a member of the National Committee and of this work group.
MS. KANAAN: I’m Susan Kanaan, writer for the committee.
DR. CARR: Alright, just a quick review. I took some high level things here. The work group charge is to review the current portfolio of HHS data resources and the current policies, mechanisms, and approaches for promoting access and innovative use and applications of HHS data to improve health and healthcare. Number two, identify and monitor trends and capabilities and traditional and new information dissemination and data access strategies, developments, technologies, including social media and their application by the data technology innovation community, and advise HHS on opportunities. You all are the innovation community.
Number three is identify and monitor the types of data and information needed by all participants in the health system including consumers, patients, providers, plans, payers, communities, state and local governments, and the policy research, public health and other stakeholder communities with attention to the content quality, technology, and audience issues.
Four is identify and study areas of opportunity to improve data access and application and associated privacy technology and other data policy issues; serve as a forum for promoting and facilitating creative communication to the public, key stakeholders, and the technology and innovator community about the availability of HHS data and opportunities for use. Five is facilitate HHS access to expert opinion and public input regarding policies, procedures, infrastructure to improve data access and innovative use. Six, advise HHS in understanding and evaluation of how HHS data is being applied and the value that it is generating.
That’s our charge, and Bill sent along a very helpful summary last week, and I’ve mixed and matched pieces of it, Bill, so feel free to help us with this. I think you really captured these four– in four, federal data leadership requires objective assessment of the following, these four things, and I’ve taken some of the examples. I hope I’ve not undermined your intent, but the four things are what data already exist along with the qualitative usefulness assessment?
If we think back to our early meetings, that’s where we began. We had a lot of presentations on the various data sets. We had a lot of discussion around what would make them better and how we could make them better. I want to come back to that, but I also want to put– and Larry, maybe this is for you– there’s been a lot of rich observations in this group around the data sets and things that can be done to make it better. Just how we channel that information back to HHS in a proper fashion so that it can be acted on. When we were in the building, folks at HHS could be at the meetings, but since we’re up here it’s harder for folks to get up here. Of course we have Lily as our liaison, but I know the full committee has a very deliberative process of letter recommendations. I think Jim and Todd were not feeling the need to follow that deliberative process on some of the things we’d opine on. We’ve not figured out the vehicle to get those ideas across. I’m sure you’ll come up with a plan.
DR. TANG: I have a question about the charge. When Larry went over on the table, he said what are the three words and stuff, one of the ones I quoted was Jim Collins’ “who then what”. If I look at this, it looks like we’re doing a lot of “what”. What data, what sources, what is being done versus who could benefit from it, and what do they need? It just struck me as we reviewed this. I wonder if that’s part– as you know, we’ve struggled a little bit with how do we create the value, and maybe we are looking– we’re starting first with “what” and not understanding who’s the beneficiary and what do they need.
DR. CARR: I’m interested in what others think, but I just assumed it was HHS has put this data out there and they’re asking for input from this group on what would make it better.
DR. TANG: So I think that’s putting the “what” before the “who”. I think we might need to find our true north first, and then what of these data are useful to solving the problem that’s really out there. It doesn’t mean we’re ignoring the data that’s out there, but I think we have to be more selective on what’s out there, and more importantly what are the gaps. If we want to look at novel sources, which is part of our charge, we’ve got to see which ones even pertain. Just by being novel doesn’t mean it’s valuable.
DR. FRANCIS: This is just a process point. This is a workgroup of NCVHS, so a fairly typical thing that a workgroup would or could do is make recommendations to the committee of which it is a workgroup. This workgroup could make recommendations to NCVHS about what– if there were a desire for a more formal letter or something of that sort, this group could suggest saying to NCVHS just as the Tiger Team, for example, suggests things to the ONC policy committee.
MS. GREENBERG: This is the last time I’m going to introduce myself as Marjorie Greenberg from the National Center for Health Statistics, CDC, and executive secretary to the committee. It’s been a great privilege to be able to do that all these years.
DR. CARR: The question was when this, the working group, was first set up by Todd and Jim et al, there were two directions that I heard. One is that this is a reactor panel that could just give feedback. When I think about the things, as I was just talking about, the charge, and Bill has put together some of the things we talked about in the early stages about how HHS data could be made better, the question on the table is when we have those reactions or observations, should we be putting that into a formal letter of recommendation through the committee that gets voted on in February?
MS. GREENBERG: I think if you have recommendations that you think rise to that level, now that you’ve been meeting for some time and discussing these issues, absolutely. That was always expected. The reactor is like hey we’re doing this, what do you think about it, and the kinds of discussions we’ve had about certain things, but it was always expected, and I think it’s spelled out there, that you could make recommendations but through the full committee.
DR. CARR: So what I would like to do is just maybe ask Bill to speak to the great summary that you put together. You had given some example in terms of what data already exists along with qualitative and usefulness assessment.
MR. DAVENHALL: I would like to say I will qualify some of these comments by saying that I personally believe that there is more than enough data. It’s not an issue of there are some issues over the new data that none of us are experienced yet with, but for the data that we are roughly familiar with, there is lots of that data that is going unused because we’re looking at it in not the most modern way possible. Part of my comments are directed to– this isn’t like inventing new data and not really struggling with who the data is for. There’s another document provided that would tell you if we’re interested in the health of babies, well, we could drill into each one of these and come up with who exactly would benefit from this data.
It’s a chicken and egg thing. We have a pile of data, a mountain of data, that we refuse for a variety of political and economic reasons to mine, to get busy and exploit for positive purposes, and then we also have all these social and health problems we’re trying to solve in which the answer always is we never have the right kind of data yet. My argument here in showing you some of this is we have all the data we could ever hope to use probably for the next five years that would be tremendously useful for community health activities.
For a variety of reasons, which we as a committee need to help the federal government identify what those barriers are so that we can help them understand how they might get around some of those, that’s what I always come at this from. We’re not plowing our fields as well as we already know how, and yet we sit around and spend a lot of time talking about how we want new tractors, and how we want to plant new crops. It’s that kind of analogy.
I do have a slide that I’d like to show. It’s going to be a little lesson on geography. It’s one of those that says United States geographic granularity. What I wanted to do here was make sure we were all well grounded in why I keep talking about how we don’t use the data we already have. Look at the column that says “geographical units”. Think about birth rates. It is potentially possible to have a birth rate per every one of those geographical units–
DR. COHEN: There are actually 57 birth rate units. There are 57 registration districts in the United States. It’s just a minor aside.
MR. DAVENHALL: There you have the number of counties. When we talk about data that we have the greatest amount of at the county level in population health in this country, we only have 3,248 specific geographical entities in which we can represent that data. We can only generalize for the 3,200. You go down the list there. Now all these, let’s get down to the big one where there’s a thing called “block” that’s a creation of the census bureau that they collect our census data on. There are seven million of those. You can see from a geographical perspective the greatest granularity is going to be that. I would suspect there’s very little data that can be used publicly at the Block level that deals with a whole lot of issues in health because your cell size is going to get so small you’re going to be able to identify people.
Somewhere between seven million and 3,248 lies our greatest opportunity. These are all– they’re not in concrete, but I’m saying they’re well-vetted, they’re well understood, the statistical communities that worry about confidentiality have had a lot of experience parsing and looking at data and reporting on data at these levels.
Now, go over to the column that says “data sets”. I’m trying to inventory how many data sets do we have out there today at those various pieces of geography. You can see the winner is zip codes. Right now in health we have more data that’s available by zip code than any other single category and then not much below zip codes, almost non-existent in the rest of them, in the geographical granularity that’s available to us in our work. It’s a painful elaboration of the obvious.
The National Center makes none of that data below– it’s really county– available. In fact, it’s even worse than that, it’s regional. It’s the census regions, which there’s only four of them. There are a lot of issues that we could get into there. The other slide is the score card–
DR. ROSENTHAL: How are you thinking about HRR and things like that?
MR. DAVENHALL: They’re another one of the administrative boundaries. The hospital referral regions, the hospital service areas are reconstructions. Like for example, Dartmouth gets to those HRRs and CMS by reconstructing zip codes. Their unit of analysis is this HRR, but how they construct is through aggregating zip codes.
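The aggregation described above, building a region-level figure by summing the zip codes assigned to that region, can be sketched roughly as follows. This is a minimal illustration only, not the method Dartmouth or CMS actually uses: the crosswalk, the zip codes, the region names, and the counts are all made up for the example, and `aggregate_to_regions` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical ZIP-to-region crosswalk. The ZIP codes and region
# names are invented for illustration, not real HRR assignments.
ZIP_TO_HRR = {
    "00001": "HRR-A",
    "00002": "HRR-A",
    "00003": "HRR-B",
}


def aggregate_to_regions(counts_by_zip, crosswalk):
    """Sum ZIP-level counts into the region each ZIP code maps to.

    Returns the regional totals plus the list of ZIP codes that had
    no entry in the crosswalk, so gaps are visible rather than dropped.
    """
    totals = defaultdict(int)
    unmatched = []
    for zip_code, count in counts_by_zip.items():
        region = crosswalk.get(zip_code)
        if region is None:
            unmatched.append(zip_code)
            continue
        totals[region] += count
    return dict(totals), unmatched


# Invented ZIP-level counts (for example, admissions).
admissions_by_zip = {"00001": 120, "00002": 80, "00003": 45, "00009": 10}
region_totals, unmatched_zips = aggregate_to_regions(admissions_by_zip, ZIP_TO_HRR)
```

The unit of analysis becomes the region, but the input stays at the zip level, which is why data that is only published by county or census region cannot be rebuilt into these areas.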
DR. COHEN: I’ve got a quick comment. I think this is very provocative, and thank you for doing this. I think it’s very important to provide the context. I would say with respect to census tract, I think virtually every state geocodes its vital statistics, its cancer registry data to generate census-tract level information– I think there might be another column that tries to identify– yes, in most cases through a variety of methods, it can become publicly available and useful to researchers, and there are web-based data query systems that are at a census-tract level for aggregated data, but not at the individual level.
MR. DAVENHALL: That doesn’t mean– what he’s saying is that behind the firewall inside the public health authorities and agencies, the state governments have the opportunity to look at this data at those lower levels of geography.
DR. COHEN: And the public. There are web-based query systems where you can go online now and generate information by census tract for the number of births and deaths for several states. Those data are routinely available for a variety of uses, either aggregated across years or aggregated across census tracts. For instance, we aggregate into neighborhoods for large cities in Massachusetts to create viable geo-political units. This is really a very useful exercise to understand the depth and context of the issue.
DR. ROSENTHAL: If you are going to do any reconstruction around HRR, some cross-walking into Dartmouth or even STAR scores are at a geographical level of contract you need county, basically. County is the great cross walk to any of the provider overlays as well. I’ll just throw that out there. It’s even more important than sheer numbers. That’s the way you take population health and connect it and supply it.
MR. DAVENHALL: I went to healthdata.gov and did a statistical analysis that’s at that site. You can do the very same thing. In fact, it’s a very productive site. It has some– I will give you some suggestions as to how it could be improved, but for example, there you can look at each of the agencies and see how many data sets they have out there and how many are at the zip level. You see the big winner is CMS, but only 13 percent of their files are available at that level. What it’s basically saying is if you’re going to be a public entity and you want to use this information for the public good you’re going to really struggle mightily.
When we encourage organizations, community-based organizations to get busy doing this community-health planning, unless they have a serious partner like Bruce at their right arm who can lead them to all the little pockets and silos of this enriched uranium data, they’re not going to find it. I can’t even find it on the healthdata.gov site, but I do believe that by putting it on that site we’ll be able to do more of this. A year from now, if you go back and see how those numbers changed and what percentage might be able to be at a lower piece of geography.
DR. TANG: Can I ask another clarifying question? The fact that it’s not easy is a good point, but the fact that it’s possible– is that what Bruce is saying? It’s possible to get down to the census tract, which is your block? Is that true?
DR. COHEN: Some states do, down to the block or block group; the census tract is the basic building unit for aggregating to develop population data.
DR. CARR: So availability is one category. Access is another.
DR. TANG: It is not access. It’s maybe visibility. It’s not easy, but it’s accessible and it’s possible.
DR. CARR: I think what we’re hearing is that they have it at a granular level, but it’s not available–
DR. TANG: No Bruce says available–
DR. CARR: It’s variably available. The availability is a separate issue from can you get it– does it exist, yes, who can have it is the issue. There’s two separate issues.
DR. TANG: It’s really important to know what we can do.
MR. DAVENHALL: Let’s talk about birth rates. I delved into birth rates, birth data because I wanted to know where all the places are in the United States at the zip code level where there’s low birth weight babies being born. Three states have that data, and they make it available publicly. Only one makes it available in an Excel spreadsheet. That’s California. Missouri and Oregon provide it, but they only give you PDF tables, so there’s an extra step.
This is the part of our assignment that talks about how accessible is the data. Do you have to go through three or four steps to actually employ its use? You begin to realize while it is available to people who are knowledgeable about where they go find it and the processes under which they have to use this data and acquire this data. From a public standpoint if you wanted low birth weight babies and you wanted to zero in and come up with targeted interventions, there’s only one state I can direct you to and that would be California because it’s publicly available.
The only restriction is they don’t publish any data for a zip code that has fewer than five births. What that demonstrated to me is that the data is there. The data is being collected and being paid for by the National Center for Health Statistics. The question I want to ask as a committee member is why don’t we have 49 or 56 other geographical entities at that level creating that publicly available data like California has done it? I would say you could spend a lifetime looking at each one of these categorical areas we’re looking at. You could do it by agency, too. The drill down gets very tedious.
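The restriction mentioned above, withholding any zip code with fewer than five births, is a small-cell suppression rule. A minimal sketch of how such a rule could be applied before publication follows, assuming the only rule is the threshold of five stated above; the zip codes and counts are invented, and `suppress_small_cells` is a hypothetical helper, not any state's actual procedure.

```python
# Threshold taken from the rule described above: counts below five are withheld.
SUPPRESSION_THRESHOLD = 5


def suppress_small_cells(births_by_zip, threshold=SUPPRESSION_THRESHOLD):
    """Return a publishable copy of ZIP-level counts.

    Any count below the threshold is replaced with None so the cell
    still appears in the output but its value is not disclosed.
    """
    published = {}
    for zip_code, births in births_by_zip.items():
        published[zip_code] = births if births >= threshold else None
    return published


# Invented ZIP codes and birth counts, for illustration only.
raw_counts = {"00001": 42, "00002": 4, "00003": 5}
publishable = suppress_small_cells(raw_counts)
```

A count of exactly five is published under this reading of "fewer than five"; a real release policy may add further rules, for example to stop a suppressed cell being recovered from published totals.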
DR. CARR: This is excellent. I think what I would like to think about how we structure today, it will pack a lot into it, but as we go through these questions that we’ve kind of mulled over, we’ll formally configure them. We’ll make a letter. We’ll have a phone call in January to review the letter, and we’ll have it ready for the executive sub-committee and the committee in February. Go on.
MR. DAVENHALL: Just one last comment, there’s one more slide. I said if I have to suggest to the committee where one would start, this is my suggestion. I tried to think of data sets that you couldn’t argue that they’re needed more on the West Coast than the East Coast. These are what I call national data sets in which anybody who’s going to tackle what I would call the determinants of health issues and the “who” would want to have access to this information. It’s all about do we give our kids a good start?
What are people dying from, and Medicare admissions and Medicaid admissions in which we stand a good possibility that we could get the organizations who manage this data to agree that they could drop down a couple of grains of granularity so that people could actually benefit from this in local communities. Unfortunately, I tried hard and these are all the ones I could come up with. Everything beyond that I could think of is fraught with a lot of other issues, tangential issues, HIPAA issues, and the consistency of the data collection. In deaths and births we have a great system in place that has been made sure by virtue of the work of this committee over decades to make sure that data is accurate. The unfortunate part is it’s locked up and to the extent that you know where it is and can get it.
DR. COHEN: The question here is who owns those data and what are the regulations around release? NCHS does not have the authority– does not own the data and does not have the authority to release it. It resides in those 57 jurisdictions. Some of them have conflicting regulations that allow and don’t allow access at different levels of aggregation. For instance in Massachusetts we have a publicly available mortality file. You could write a letter to our registry and get the addresses of everyone who died in Massachusetts and zip codes, census tract it, and use it at whatever level you want.
We’re one of the few open states with respect to mortality data. For birth data, if you wanted those data, you would have to put in a data request that would be reviewed. We could make it available. Some states, depending upon your need might not even make that level of data available. I think you’re right. If we’re talking about liberating data, what are strategies that will help promote data owners and increase their confidence that providing data is to their benefit and to the community’s benefit.
We haven’t really dealt with– the tumor registries again are all state-based tumor registries and have very different laws regulating those data release, and certainly not at the individual level or even at the aggregate level. I think you’re onto something. These are the fundamental questions. If we want to liberate the data, what are the impediments to liberating the data and what incentives can we provide to the data holders who aren’t necessarily federal agencies to provide greater access to that information?
DR. FRANCIS: Could I add to that there are two different kinds of state law frameworks just to know that are there. One state law framework is all about data so there will be health data authorities in different states and what applies to that, and then there will be state freedom of information act laws that again vary from state to state. So, you could, for example, find there are some states where there have been journalists seeking data, and there’s been litigation under a state freedom of information act.
There’s a well-known case in Illinois where a journalist investigating a cancer cluster wanted to get access to state tumor registry data and was turned down originally by the state because the “n” was fewer than five, and actually the journalist went to court and won. That was under the state freedom of information act. There’s a whole lot of state law out there, and it differs.
MR. DAVENHALL: I wasn’t suggesting that the federal government would have to solve this, but they could come up with some leadership about incentives, like for example I can’t imagine why anybody would oppose being able to target low birth weight babies because the cascading of this is if you get into it you discover it’s costing us a lot of money because we’re not able to get upstream fast enough to actually prevent and intervene–
DR. FRANCIS: What I was saying is that states may actually without legislative action not be able to respond to incentives, so it isn’t just a matter of some money. It may be a matter of state by state statutory change. It’s complicated.
DR. MAYS: I agree because when we did that thing on eliminating health disparities when we went through what we noted is that privacy and confidentiality laws varied so much by state it was very hard to come up with a specific. My question really was– because I think this is an excellent exercise– and the question is in terms of some other categories whether you want to have categories that allow you to see who we can use the data for.
One of the problems we have, and I can probably say California’s pretty good, but when we start to try and drill down we can’t often get gender and race at the same time. It gets to a point then where if it’s a query system it won’t pick it up or the data is not such that we can.
We might be looking at low birth weight data and the population we need it for the most we can’t quite get it enough. The question would be whether you want to talk about here are some model states, and here are some model things that can be done to really highlight those things that would help the community with the data but that isn’t available. That would be things about getting down to race, ethnicity, age, and language. Those are usually the things that we struggle to be able to get all together in order to help the community.
DR. CARR: Thank you, Vickie. Marjorie?
MS. GREENBERG: I just went upstairs because I felt that this– I am not sure where this workgroup charge came from. Was this a document you prepared? You just excerpted it, but this is the full charge and apropos of your previous question it specifically says that among other things the working group will also advise HHS on promoting and facilitating communications to the public about HHS data and will facilitate– all right, specifically the working group will assist and advise HHS through the full committee on recommendations to promote and expand access to and innovative uses and applications of HHS data to improve health and healthcare based upon the following considerations, which are these six, and then at the end it says the working group will provide its analysis and recommendations to the full committee.
DR. CARR: Larry?
DR. GREEN: I want to do three things. One is I want to say I really like this. The fact that it’s particular and stops talking about concepts and starts talking about particular data sets and it looks to me like it is dead on to the charge of the group, particularly number five, but it overlaps with some of the others. It looks like an opportunity to me, Justine, for the workgroup to flex some muscle here and do number five and prepare some sort of a report. I could just follow Bill’s outline here to some extent.
I just listened rather than reading for a while there, and he says we ought to be doing a better job with what we got. We don’t even know what we’ve got and its nature and where it is and who has it. He’s dived into this, and you can see implications from the descriptive work that he’s done.
We’ve got four people on the committee that are steeped at the moment in what the committee is up to for the next stretch of the road. Here’s another example of convergent opportunity. If this workgroup were to say– it’s sort of a push and pull opportunity as I see it. This workgroup could push in front of the full committee given its 2014 agenda going forward some answers to the questions it was tasked to provide expert opinions about.
Paul, you came back this morning at least four times, what do communities need? When I look at Bill’s stuff, jeez, it really sharpens the thinking about what communities can do when you start seeing the gaps and see how other sources really might fit in. You guys have a lot of imagination about this. If you use this as a framework, just make up something you want to propose should be done to make the data that we do have more available to communities and go as far as you can.
If you think about focusing on this, this, and this– that last statement that Leslie just made and also what Bruce was saying, who owns these data? I’ve done two of three things I wanted to do at this point. The third one is I heard loud and clear that it’s no one’s job to make these data usable by a community. It is no one’s job to make these data usable by a community.
MS. BRADLEY: What about the CHNA, the hospitals– the community health needs assessments that hospitals have to do?
DR. GREEN: It’s nobody’s job to see that taxpayer-funded data from vital statistics and health surveys ever get used at the community level. That’s the thing I’m asserting. Bill said it three times at least, at least three times that if you have an expert data geek, an analyst that’s been struggling with this for the last 20 years, they’ll know the ins and outs of how you loop through this, and if you don’t have them and you’re a community organizer, you’re a mayor, you’re someone that’s trying to do the right thing by your community, you have no idea what to do about this. When you start approaching this, it’s too formidable. This is a really sweet spot, Justine.
DR. CARR: I don’t disagree. I think we’re onto something, and I appreciate Bill’s input.
DR. ROSENTHAL: When I look at this, I think this is great. It basically says it should be a patchwork of state regulation, fine. There should be greater granularity, agreed. This is a great demonstration of that. Back on 9/21/2012 when the committee first met, I actually pulled up seven recommendations, and we had a great debate about whether this group got to make recommendations to the committee. I said, if we do, here are seven things out of the gate I would instantly do. One of them was taxonomy, which this is an illustrative example of, taxonomy.
I was asking the Dartmouth question. If you care about connecting anything from CMS Star rates, and you have a contract, you have to use geography, specifically county, so not just greater specificity but a very specific type of geographic specificity. That falls under the category of if we have a bunch of data but no one can use it. Part of it is accessibility, people publishing things in PDFs, which is just insane, your point that it’s nobody’s job, the current state reflects the incentives.
This is a market, HHS. No one has incentivized usage, so of course no one’s going to use it, and you’re going to publish things in PDF. That’s very basic, kind of straightforward. To actually tie into the geography and figure out what is useable, that’s really around taxonomy. The original kind of thing we were talking about. Anyway, there are seven recommendations from taxonomy, synthetic soft files, which CMS has done well. One was data browsers. Instead of just putting data out there if you’re actually interested in people without technical skills using it.
Remember the Google data browser and the Tableau data browser, et cetera? Then also social data, we might not have a use case, and in fact that’s from our exploration. We’ve found that communities really aren’t using social media in any meaningful way outside of a little bit of PR here or there, or some basic profiling.
Nonetheless, we see it coming down the pipeline and as part of the charter to have some intelligibility around that. I would suggest I would be happy to kind of return to the original seven things I proposed.
One of these would be taxonomy, which Bill has given a good example of. I’d extend it out to say, what’s the purpose of it; greater specificity for some very purposeful exercises that we went through about a year ago, and then also incorporating social media into that same taxonomy. You can call it a framework. You can call it whatever you want to, just a shared mental map around it.
DR. CARR: I like that. We have been learning and we haven’t been productive, and it’s time for us to be productive.
DR. ROSENTHAL: One of those seven, taxonomy and learning center, some business or public social good in all the various challenges that we’ve put money behind, synthetic files, which CMS has done a good job putting out a couple of PUF files, data browsers, products and partnerships, which we’ll have to come out of this other thing we’re talking about, and then opt-in, call it Blue Button if you like. I want to share my data. It’s my data, can I do that? Will you allow me to share my data? Just exploring that concept, which people have played around with as well, those would be the quick seven.
DR. CARR: So I guess the question would be it sounds like to prepare for this, there’s work that we need to do to get best practices around the country in different states. Bruce, you know at least one, maybe others, and in California the same thing. Make a recommendation, give an example of someone who’s already done it would be one strategy. Josh, back to your presentation in September last year, have any of those things been addressed?
DR. ROSENTHAL: CMS has done a little bit around synthetic files. Otherwise this is an example of taxonomy. The social media thing we’ve cycled around. Lily last meeting brought up some specific examples of taxonomy. If you want to look at best practices for disseminating public data, I’d suggest you look outside healthcare. The second meeting of this workgroup, we had Jim Scanlon in here talking about when I say “taxonomy” I mean publish an ERD– entity relationship diagram, the data model.
There’s nothing around that. Then we had privacy swirling around as we wrestled with what an ERD is and what that means. The point is there’s been a series of recommendations. We’re sketching out along one or two of those additional lines, and Bill’s done very good work around that. I’d suggest we revisit that and see what we’d like to recommend.
DR. CARR: Maybe what we need to do is if we can get your slides from that September meeting, because you had a PowerPoint, and let’s see if we can get that from that meeting so we can pass it around. I think that the idea of saying this would be something good and here is a state who’s doing it, here’s an agency who’s doing it for this synthetic– I think for each of those things to have a letter be meaningful and potentially also “this is what might get better”, it doesn’t have to be everything, but it could be one thing to make it tangible as to why that would be important.
As we think about this, if we said– we have a lot of data. It could be better used if we had these things, and here are examples of people already doing it and here’s an example of the benefit, is that sufficient? Is there more that has to go with that? If we have examples that are already being done, I guess that’s CMS that has the synthetic files, and I’m assuming they’ve addressed all the concerns they have. Are we naive to say here are good things, or is the fact of the matter that it’s actually states’ rights that are interrupting this, and back to what Larry said, we need to have one person whose job it is to ensure the use of these data?
DR. FRANCIS: There are two separate questions. The first set of questions is about what kinds of policies apply to federally possessed data, and there is a series of kinds of policies that apply, including the Privacy Act. That’s one set of questions.
The second set of questions, however, is all the state law questions. The point that we were making is if you look up their birth and death registries, tumor registries, immunization registries, that’s all information that’s collected at the state level. On the other hand, Medicare is a federal data set, and so you’re going to have– it’s just– the only point, I’m not making any kind of judgment about it, it’s simply that this committee isn’t going to be able to, other than saying it’s really too bad this is happening at the state level, this committee can’t change the states or make recommendations to the states in the same way it could to HHS.
Some of what I think is– just let me give you an example about taxonomy. Depending on what information is included in the categorizations when you release data, there are Privacy Act questions as I understand it. This committee isn’t going to be able to change the Privacy Act, although it could comment that it thinks that there are concerns here through NCVHS.
DR. MAYS: One of the things that the committee can think about is the notion that data gets collected out of state. The data is often collected with lots of different fields. It gets sent to the feds, and then it gets sent back for use and sometimes it’s condensed. The issue isn’t necessarily always that it’s a privacy issue. Some of this is that it’s actually a statistical methodology issue, and that hasn’t been developed well enough.
Models of imputation, for example, that say I can give you all this data– imputation model means that I take the data, use people but I disguise them by using a way in which that disguise allows me to give you the data back versus my not giving you the data at all because I’m afraid that what I’m going to do is reveal somebody’s identity. There are models, and the problem is that NCHS, for example, had resources a long time ago to do these kinds of statistics and to train people. That dropped out. We don’t really have enough of the science that really says how to release the data back and give us the maximal use. What we do is we have to worry more about privacy, because we don’t have a guidance about things like using imputation.
DR. ROSENTHAL: I have just sent you guys the PDF, but in a nutshell, the original set of recommendations were within the scope of HHS, and even basic taxonomy to say, hey, is it at county level? What Bill did, he did by hand, hunting and pecking and probably using an old calculator. That’s not on the website. You can’t find that.
If I wanted to say show me birth weight. Show me something by gender, race, what have you, I have to reconstruct that through some shoddy imputation because there’s no metadata around it. I have to look at a census set, reconstruct it, et cetera.
That sort of stuff is just with current HHS assets, to say nothing of basic taxonomy. You’ll see in examples, like what I mean with an ERD, you’re looking at a couple of example files. If I see a wacky Excel file, what does this thing mean here? What is that, et cetera? Basic master definition of data elements like you’d find in any other space outside healthcare. That is something that there are resources within HHS, which are dedicated for that infrastructure by the way.
Then you can’t change legislation, but if you actually make that apparent and make it searchable and have a lot of users using that just like they do outside of healthcare, that becomes a pretty good blueprint. How about if HHS were actually the blueprint for the states or communities?
MR. DAVENHALL: I’ve never been accused of overthinking anything. The whole question of ownership of the data is a big gray area. I’m not an attorney, but I would say to you the states don’t really own birth data. I’ll tell you who owns it in America today, it’s the hospitals because 99.9 percent of all births occur in a hospital environment.
Hospitals already have this data. They have every piece of data that got sent to the state. The state has decided they’re going to slow down the machinery to send it to the national center who has told me their job isn’t to see the data gets used for any practical purposes except for statistical purposes required by law.
They’re not really that keen on using that data to do early intervention. There are some flaws in this model we’ve built.
What would be easier is what is already happening in this country where companies get paid to aggregate. Like all these health systems grow up, and they get 10 hospitals, 25 hospitals, 50 hospitals, 100 hospitals, and they all go to Rand, or some big research organization, and they will contribute all their data. They’ll contribute all the birth records of all the babies that were born in their facilities. What has happened here is they have duplicated something that already existed. The only reason to duplicate it was the system didn’t provide it any other way.
There are many routes to this. The association of this thing where we now can get a prescription in any state. You can show up and the states recognize the prescription– are you aware of that? That happens, but the federal government said they couldn’t solve that problem because it was the state’s right. All the states had separate boards to do that. I forget the name of the organization that got put together. It was a non-profit organization that went around the states.
It only took them seven years to do it. They put their own teams of attorneys and met with state legislators state by state by state until they got to the point where they could get every state to agree to a model set of laws that will allow the transportation of a prescription from one state to another state so you could pick up a prescription in Florida written by a California physician in the system. I would say there are many work-arounds to this problem, and I would say the federal government doesn’t have to solve it.
They either need to stand back in the Senate and say we expect states to do this. NAPHSIS should be interested in birth registration and helping– part of the problem is no one has told America what’s the most significant problem we all ought to be working on. By the way, here’s the data we all think in our best judgment you need to have at your disposal to work on that problem. If obesity is our biggest problem, we don’t have much obesity data below the county level.
We don’t have any at zip code level or block group level, that sort of thing. Part of it is the federal government could do some really useful thing like pointing to us and saying this is what we believe is America’s salvation in the healthcare field, and if you get busy, folks out in your communities working on this, we’ve got some things that will help and assist you.
DR. CARR: Another thing we can do is link it to the HHS strategy and blueprint, because some of this does align. If you want to do that, you will need this information. Here’s where it is and hope we’ll be able to get it.
DR. COHEN: I agree with you that the federal government can have a really important role in this. It’s never really defined this as its role, to focus on the uses of the data, particularly at the sub-state level or even at the state level for that matter, or the level of community, which is where we want to push the use of these data, but there are models we can suggest. CDC has the WONDER system, which is a web-based data query system that can generate mortality data by cause, by age, race, sex, county over time.
If the federal government developed these kinds of systems or promoted the ubiquitous development of these systems by states, by creating essentially web-based data query templates, states could go a long way in filling out these templates at very low levels of detail to provide to communities. States would figure out how to deal with their data release and data suppression issues on a state by state basis, where the feds can essentially create master web-based data query systems.
It’s all really doable, but you’re right. Nobody has actually said this is my responsibility. This is my role. I think the vital statistics community would be really happy to participate, not as an impediment but as a way to figure out how to more easily share and use our data because we collect it to be used. There’s nothing worse than not having the resources available to use our data.
DR. ROSENTHAL: We looked at an example of that in New York, with not just a query but a visualization of private companies, Google and some others. Some stuff was getting out there even unbeknownst to us.
DR. COHEN: The federal government actually under the assessment initiative at CDC for a while did have money, and it funded and promoted the development of web-based query systems. Some were geographically based with GIS interfaces. Some of them were tabular, but essentially the whole idea was to generate data at the community level for use.
In Massachusetts our web-based data query system has 39 different data systems in it. Some of them have the four basics, births, deaths, cancer registry, and behavioral risk factor surveillance data. We have census data, employment data, education data, graduation data, and transportation data. There’s no end to what the federal government could do to promote the development of these systems for use, simple use and access at the community level. I think we have a very strategic role.
MR. CROWLEY: I think a lot of what Bruce said even before that I’m going to echo now. Seriously, I think that one of the comments that really struck me today and has been resonating is this concept around usability. How do we make the data usable by these different groups? There are some known things we can do such as the metadata, the entity diagram that Josh was talking about.
In the data development world these are basic platforms, precursors to being able to do anything. The absence of having a well-defined structure for that across HHS right now, that would be a logical place to start investing some resources. There’s already some of this in place. Healthdata.gov has metadata associated with most of their files. How that metadata is structured and how usable it is for people of different roles in the community or academia or other science driven areas I don’t think we’re completely sure of, but we want to make it more usable, so I think that could be a real focus for us.
DR. CARR: Just clarifying, what you were saying is, what is the platform again?
MR. CROWLEY: There’s a platform for using data, whether it’s making apps or doing research around it, it’s having an understanding of the data whether it’s the metadata that’s associated with it, the entity relationship diagram of the data that’s included in whatever data set you may need to use to create new things.
DR. TANG: I’m following up on something Bill mentioned about prescriptions that can cross state areas. Were you aware of this? One of our challenges is of course state laws and the state data sets. I don’t know whether we can learn something about this, but I wasn’t aware of this at all.
MR. DAVENHALL: I will get you the information and lead you to the association that crafted that. What it was, it has to do with whether your prescription can be filled in another state.
DR. TANG: Right now, you can fill it at the discretion of the pharmacist. I’m not aware of something that basically makes it a requirement that some other state pharmacist fill it.
DR. CARR: I think we’ll get the details on that. I think your point is the government didn’t have to do it. They had a group that went out and did it, and we’ll hear it specifically —
DR. TANG: If that’s possible– I would like to know how that’s done, how that’s possible, and whether it’s legal. Maybe it can apply here, but that’s the first I’ve ever heard of something like that.
DR. GREEN: I want to ask Kenyon a question, or actually any of you. What are your ideas about the structure of pre-usability at the community level? Bill’s argument is that we’ve got data that we aren’t using that could be used, and Kenyon said there’s no structure for helping those things to be used, and I suspect your imagination has ideas about what that might look like. What would a structure that would help communities use these data look like?
MR. CROWLEY: I think some of these ideas we’ve talked about before, but it bears repeating: the best person to understand “usable” is the person who’s using the data. If you want to make something usable or judge its usability, get feedback from those that are using the data. I think it’s one of these recommendations around this learning center, this learning system.
As data is coming into the pipeline and being used, having a mechanism to let people a) share their experiences both good and bad with the data so you know which pieces are good and bad, b) so you can share what’s working and what’s not working with the data, c) also understand what additional needs surround that data– one idea that came up earlier that I think would be interesting even looking at the query request that came into healthdata.gov for types of data and for how to use certain types of data.
I’ve sent several different requests into healthdata.gov over the years saying it would be good if I had this data or if we used this or this might be available. I think the feedback loop there could probably use some work.
DR. ROSENTHAL: I’ll give you seven specifics with documentation for all of them. Specifically when we say taxonomy, and in this slide deck from last year there are some specific examples around that. There are also follow-up presentations for each of these. What do I mean by “data browser”? How do we see New York’s data being used? How do we see HHS’s data being used without HHS expenditure to build the system, even visualized? Just to give you a high level, one would be taxonomy, publishing the ERD diagrams. That’s, as Kenyon was saying, what you find anywhere else outside of healthcare.
It’s the first thing you look for and the first thing you see– there’s additional presentations I’ve distributed, and I can go through it again. Here’s a very high level one. I open up a file from CMS, and what is this stuff? What is plan type description? What is that thing? There’s no metadata for it anywhere on the site. What might that be? Legal entity name, what’s that thing? How are contract or legal entity– how do they relate to each other? What are the different data elements– what’s an MAPD indicator, and is it an attribute of a plan or an attribute of a contract, an attribute of both?
I can give you boatloads of examples on any of the behavioral risk factors surveillance stuff on it. This is not a technical diagram. This is just an illustrative example for non-technical people, but there’s this thing called an ERD diagram, and when you go somewhere, bam, you see it. It shows you what all the data is. It defines it, and it also shows you the relationships. In this, you can see that parent org has a child org, a contract belongs to an org, a plan belongs to a contract.
These are all geographic units that make no sense without counties, so if we’re going to start talking geographic taxonomy as one domain, another is accessibility. Is it PDF? Is it CSV? Is it an API, et cetera? Is it machine-readable? The other things I found in this flat file that have no definition I’m able to restructure and say contract number is an attribute of contract, which belongs to org. Plan type belongs to plan. Plan ID belongs to contract. I can start to make sense of what is this thing when I open it up.
We’ve heard the problem described very well. Unless I have a boatload of expertise and have done this for 30 years, I don’t know how to do it. The concrete way of fixing that, and one example, is to publish the relationship. The other is to publish the metadata. As Kenyon is describing, we went through a separate presentation where we said how do we use user feedback to determine what is a good set versus what is not a good set? There are things around frequency, file-type dissemination– is it PDF, is it something else? Learning center was the formal manifestation.
There’s a document that reflects that recommendation in detail, biz or public good, social, around the challenges we do. Right now, HHS puts a lot of dollars around here’s some cool data. Make something cool, and maybe it’s used, maybe it’s not. One of the recommendations is actually have a problem-solving aspect tied onto that. We did an environmental scan, and there’s not a lot being done. In fact, we don’t even have a shared mental model.
Chicago is doing really cool stuff. They’re using Twitter as a phone tree. They’re not using Twitter as data. How do we even have a meaningful discussion around that, and how can you share with Chicago or other communities that do that? Synthetic files, we talked a lot about privacy. Here’s a nifty professor that we brought in some of her testimony with some of the JPL guys who talked about synthetic file.
We heard people on the committee say it was technically not possible. CMS published it this year, presented on it at AcademyHealth. That was a reasonable success, et cetera. This is all available for you. You’ve had it since last year. You can look at it in detail. There’s additional files with all of that. Data browsers, so rather than putting the data out and having Lily try to code or me try to code, there’s a boatload of users. We looked at examples of a teenage kid who was looking at co-morbidity for diabetes using HHS data sets that we didn’t even know were out there, put it out there. Partnerships, you don’t have to build it yourself with HHS dollars. You can partner.
It’s already being used by the way. That taxonomy that we’re saying HHS should develop internally in a meaningful way is actually being used by private companies outside of it. That’s how they’re able to connect census and World Bank data and life expectancy throughout the world. They’re doing that work for ourselves, so we should be not only aware of it but encouraging and encourage the states.
Finally opt-in, that was that crazy little thing, not Blue Button but a green button alternative. This is obviously heterodoxy at best, but to say I have a family member who has cancer, I would like to be able to contribute my data, am I able to do that, where I waive my rights for this, that, or the other, and to build up a data set with additional rights for research for public usage? Those were the initial seven. After a year of talking around, I’d also say some sort of– instead of just reacting to the data coming in, some sort of proactive position or framework around social media data, and privacy as well as imputation characteristics and other things would be the eighth recommendation.
DR. CARR: So I have Lily, Kenyon, and now Mo is on the line as well.
MS. BRADLEY: A question I’d have for the working group members is understanding the black box. I think sometimes we feel this way about health economics. The technology aspect of this seems to be a black box where say you– let’s say that someone like me knows these adjustments need to be made, and then I carry it to someone who might seem to be in charge of a website but then it enters a black box. Where is this coder? Where is this developer? How do I submit requests in a real way?
The GAO just came out with a report last week about the abilities of CIOs across the federal government to have influence over information policy. I do think there is some question. I’m not even exactly sure what I’m asking, but if you make a lot of these recommendations it’s not even clear to me how they end up being carried out and if there were lessons learned in the private sector or in other institutions that would be quite useful.
DR. CARR: Just to clarify, are you saying that for this data– are you saying to have a front end on the data that makes it easy to use who’s the person that maps the data to get it into that easy to use format?
MS. BRADLEY: If you want to make changes to a website, who do you have to talk to?
DR. CARR: If you want to make changes to a website, is that to the data?
MS. BRADLEY: Right, like building the metadata.
DR. ROSENTHAL: Are you asking is it the website person’s responsibility or the data owner’s responsibility to have the metadata to define the relationship? Typically, it’s the data owner and then they’ll work in conjunction with the website owner to publish that.
MS. BRADLEY: It is more like the work process.
DR. GREEN: Isn’t this distinction you guys are making the difference between a data custodian and who can make changes on the website and the customized browsers? It’s something else.
DR. ROSENTHAL: The data owner– and we had people in here, and this is when CMS presented their data and CDC, and we said what do you guys think about publishing the ERD? Some people said they might not have them, and then when we asked the data owners– just so you know, outside healthcare you don’t do anything before you do an ERD. That’s a blueprint to even have a conversation, and they had them. They’re laying around, and the question is: is there a reason they couldn’t be public? That would typically be the data custodian.
To your point earlier, typically you’d share that information because you’d want people to use the data, but if you don’t have that incentive, that will tend not to happen. But if we’re trying to figure out what’s a very basic recommendation that could help people use the data in a meaningful way, it’s sharing that thing that’s sitting on their desktop.
DR. MAYS: I was going to suggest that what we think about is– and I’m not sure– but is this in the bucket for standards to talk about as the data is released that this is a standard thing we want to see and here’s the format for it to be released in? It could even help if for example, and this is one of Larry’s favorites, even if you think about NIH and all the data sets we’re getting ready to release, it could cause us to have to have this ERD for every kind of data set that’s federally funded as the way to do it. I wouldn’t even say HHS. I’d say federally funded because then you get even more data.
DR. ROSENTHAL: Typically, when you say “machine readable”, this stuff goes in it. It just didn’t happen for some reason, but typically “machine readable” is code for having these pieces in there as well.
DR. CARR: I was going to make the observation, Jim Scanlon distributed the HHS strategic plan draft for public and congressional consultation yesterday to look at some of the things: goal one, strengthen health care, and emphasize primary and preventive care; to your point, you have to know how many low birth weights you have, vulnerable populations; and then advance scientific knowledge and innovation, foster and apply innovative solutions to health, public health, increase our understanding of what works in public health and human service practices, promote the safety and well-being and healthy development of children and youth, low birth weight babies, and then promote economic and social well-being for individuals, families, and communities. I would say this sits very nicely as infrastructure to very strategic initiatives.
To recap, we revisited– thank you Bill, Josh, and everyone for getting us back on there, and Marjorie for putting our charge clearly in front of us– we’ve had a lot of discussion, a lot of development, a lot of detail, and I think what we can do is Lily and I will coordinate with input from all volunteers to put together a letter that succinctly identifies opportunities and examples where they’ve already been done. I think that’s compelling, to make it clear we are not at the starting line.
We’re halfway there. We’ll get that draft together and potentially with attachments because Josh has a lot of nice illustrative documents and circulate that to the group. That’s what I’m hearing the group say, and I want to know is there anybody who thinks differently? Is there something we’ve not yet covered that we ought to be thinking about?
MR. CROWLEY: One point on something to think about covering and a quick anecdote that I think illustrates some of the things that we’re trying to accomplish, Chicago has come up a couple of times just this week in dealing with their health department. They were looking at ways to develop a predictive model to help improve food security and food safety in their community. I sent a note to a data mining class, I’ve also had two students who would love to do that for their class project, go to the Chicago open data portal, pull down a data set, looking at some other data it had linked through some state databases and some other databases.
I think I now have a group of students working on a predictive model for community food security and safety. Now that same kind of process could be repeated all over the country thousands of times, but one of the things we may want to think about, and this harkens to charge number four, looking at opportunities to improve access to data. How are we going to engage with all the student groups and universities with health programs, whether that’s public health or health services to get them on board that– them and their professors– this data is available, it’s for use, and how is it being used by these groups?
I think a lot of innovation and people who have these types of skills are being developed right now throughout universities, so maybe a specific university outreach strategy or providing certain tools or other resources for them may be something we want to think about pursuing.
DR. MAYS: I think I sent an e-mail about Code for America because they actually can assign interns. The question would be whether or not you can get them to work with HHS. They’re doing– it’s pretty much the same kind of job that they’re doing, and they’re very good at it. You could have this consistently because the interns are for two years.
DR. CARR: So it sounds like what we’re hearing are gaps, gaps and examples of the gaps within the framework of the importance and then solutions, which would be practices already in place or tapping resources for problem solving.
DR. COHEN: So I think that is a great idea. What would help, speaking from a state health department perspective, and state health departments control, or are the locus of a lot of these data, if we can create a data set and a challenge and have app developers or designers or students develop web based data query systems, that essentially can be adapted by templates that all state health departments could essentially use and drop their data into.
We will have made an incredible impact because states don’t have the resources to do this, first not to develop or secondly not to maintain, but if those templates were there, they certainly could be used.
DR. ROSENTHAL: Just as a quick example of this, one of the things we talked about way back when last year was this idea of the learning center, and different components into it. We talked about state threads and some of these issues came up for even university or intern threads, and a couple things to keep in mind. One, we’re right now in the infancy of using data in healthcare. I don’t know how else to describe it.
We have this idea that people have to use the data, and then we’ll have people code, and then somehow they’ll build something and it historically hasn’t turned into anything, but maybe it will this time. That’s possible, but in terms of a learning center for how to connect people doing things to something meaningful there are various tried and true practices around that from outside healthcare, and then also we don’t actually have to make apps. You can actually drop– you can actually put some of these files if you want in UDS, ship them off to Google– they’ve been doing that, so one thread for community.
This might be state-level community. Hey New York, how’s your stuff showing up in Google Public Data Explorer and five other places. You don’t even have to incentivize kids to build stuff. You just put it in a format where anyone under the sun can take it, kids, et cetera, and to your point Bruce, you know that by states sharing this information inside this learning center. So learning in a technical sense, this is metadata, this is ERD, this is how we’re using things to solve public, social, even business market, accelerating dynamic problems, and then here’s how we disseminate data and make it more useful, not just putting it out, but the great missing piece to date in healthcare more than anything else is you’re requiring people in these challenges to code.
That just doesn’t happen anymore. You can put it in pre-existing systems and let them do analysis. That read-write web challenge we talked about was analyzing and interpreting information rather than coding something, which drastically lowers the bar and gets back to the workforce discussion I’ve had several times. That’s par for the course in pure-play tech.
DR. COHEN: I just want to respond to that. The state health departments are very old-school and slow to change. I’m being generous here. They want to control their data. So putting their data out for others to use is a lot scarier than if somebody says here’s a cookbook, drop your data into this and it will be available to everybody. That’s just my experience. I hope things are changing.
DR. ROSENTHAL: That’s why I said maybe HHS could lead. If we did this with HHS, HHS is having the same question right now. We sponsor a bunch of challenges and somebody codes them up, maybe it works, maybe it doesn’t, but when we’re all focusing on this over here, all of a sudden it’s being used in very interesting meaningful ways, these pre-existing systems, so maybe we should encourage that.
Then the states can adopt and see if a national body is doing it, et cetera– but just in answer to all the stuff, we’ve cycled through this. It’d be good to pull it together and make some sharp recommendations, especially with the social media part. That was such a great surprise to your idea. I expected things to be different than they were, so adding that would be great.
MR. DAVENHALL: I would like to comment that it really raises the stakes for this committee’s work and I would say the National Committee on Vital and Health Statistics. We’re not an island. I’ve been starting to write down all of the groups I know that are doing about the same things we’re sitting around talking about. The first thing I’d have to say is everybody stop writing and start reading.
For example, I know we have Healthy People 2020. I very rarely hear anybody talk about how we enable that plan. I don’t mean here, but in all the readings. We have our own plan. We have a roadmap. CMS has developed their own roadmap data plan, which a lot of this learning center thing is creeping in, so we don’t have to invent it. It’s cooking out there now. There’s going to be a major solicitation by CMS over the next month that’s going to require much more analysis of data, and they’re talking about unheard of things. They want turn-arounds of a year for these projects and the proposed innovations and so forth, nothing like I’ve ever seen before.
The Health Data Consortium is building their own data roadmap, and they even have this thing called a Data Bill of Rights that they’ve been talking about. I’m just saying part of me wants to say that we need to have Susan Kanaan help us take some of this material that we already have, and help us synthesize it because we just have this tendency to want to go off on our own directions and so much of it is duplicative.
Now, Code for America is deeply involved in the Health Data Consortium, because I was with them the other day, and I know they’re on board and present with HHS, but I don’t know what that means other than that.
I know they’re all over it. It seems like it’d be easy for us to plug into that. As a company, we’re in search of anybody that wants to build free templates to do something, but you’d be surprised how hard it is to get organizations to sit down long enough and tell you what they really need and want. It takes a resource commitment on the part of FEs or groups. It’s like a lot of these resources are available to us, and where the federal leadership has to come in is help knock down some barriers when they can and then also say there’s a lot of material here. How does this all dovetail?
DR. GREEN: Two questions, one particularly for you Bill. This chart here, the one about critical data sets and zip code granularity. That bottom line for the sources for et al, names are too many to list, could you just say some things about your thinking about all those other, too many to list, data sets and how they fit here?
It was labeled critical data sets at zip code granularity required for community health assessments. You had states, CMS, states, and then you got to the federal data sets, or the HHS data sets such as US Census et al. There are hundreds of those, right? What’s your thinking about how they fit into this conversation?
MR. DAVENHALL: These are other data sets that are often co-mingled with the basic health data. Many of them provide the denominator information that’s needed to calculate use rates and rates and that sort of thing, so a lot of that data has to be pulled in by any analyst who’s going to touch any of the specific data. They also have their camps by the way, and their roadmaps, and their planning going on about their own data sets, but it’s voluminous. Some of these agencies have had much more experience in generating what I’ve called public use files of publicly accessible data, but the resources do help.
DR. ROSENTHAL: I think synthesizing a lot of the stuff that’s been done is fantastic and is what we should be doing, but also providing a greater degree of specificity, especially around some of the HHS sets, and so here’s what we want to do, but in order to do that for taxonomy, that at least would mean county, because you’re going to want to crosswalk it over to different things, so when we talk about taxonomy, county if possible, if there’s not a mosaic effect for privacy reasons, if not clearly label it, et cetera.
My only point in saying this is that I think it dovetails very nicely with the aspirational stuff, but there are some very specific things I think we can apply to it. Contract doesn’t do us anything, sub-region doesn’t do anything. If you want to walk to provider depending on population, it’s going to have to be county. If you don’t want to do that, that’s fine, for privacy reasons, but at least know this is what you’re trading off and this is what you’re giving up. We went through some of this last year as well, very specifically around how to label that, and how to have the community reinforce it.
DR. GREEN: Can I ask one more of anyone on the work group here? What is your thinking about the need for, value of a parsimonious set of measures of things such as healthcare, health, cost of care, to guide the creation of a data platform for usability of public data?
DR. COHEN: There has been an enormous amount of work already done on that, and there are probably a dozen or two dozen parsimonious sets of data that people have promoted. I’m sorry Pat Remington is not here. His county health rankings–
DR. GREEN: Do they matter here in this discussion, or is that just a side issue, or is that a driver? Is that a pull thing? Is that a focal point? Does it have any use or do you think it’s irrelevant?
DR. COHEN: I think it makes the theoretical exercise practical and usable, so I would say, yes. It’s less so trying to find the ultimate best set of parsimonious data, but it should cover a variety of domains that we’re interested in. Essentially, if you go on the county health rankings, you’ll see all of these data at the county level already. They exist and they’re used. That’s the issue for me. It’s not at the county level, but if we really want to influence communities, it’s really sub-county level data that we’re after. That’s where the rub is. The feds, as Bill clearly demonstrated, have never focused on that issue and states only focus on sub-county data for their states, so there’s never been a generalized effort to be in that space.
DR. CARR: I think this has been a great recap of the work of the year, and the work now is to get it into a draft and get some work on it. I think what we’ll do is take a quick break, there are some snacks here, and then come back to talk more about the framework that had been circulated.
DR. FRANCIS: Can I interrupt because I’m going to have to leave at quarter to three? I apologize for the formatting, but basically what you’ve got, and I sent to Justine in a better formatted range, what the actual privacy policies are of the social media sites that you’re interested in. They vary.
I guess there are two sets of things to make sure you think about. One set of questions is what information does the site allow people to control? So, Twitter, Google, Facebook, they all have different– and you may find, for example, that a lot of people are behaving in ways in which they’re willing to share certain things with Google plus or certain things on Facebook but not more generally. So if I were to search you on Facebook and I’m not one of your friends, I couldn’t get information about you.
That of course limits what people can get searching from outside off of social media sites and might easily bias the data that you would get. The second thing is what kind of information could be gotten by going directly to the social media site under their privacy policies without any kind of user consent? It’s relevant to know that they say some things about public safety and law enforcement, but they typically don’t say anything about health.
If I’m Homeland Security, you’ve seen the discussions about whether or not envelope information or lists of friends of friends and so on– there was a release, for example, of tweets and who had been tweeted and so on by the guy in Norway and of Twitter activity and so on to researchers to look to see what might be predictive of national security events, but there isn’t anything that says that without users’ consent Google would let the health department know identifiable data about who’d been searching for what or that Facebook would let somebody know who’d been talking about cancers with friends on Facebook or anything like that.
That’s just– that’s one set. Those are two sets of questions. What can users reveal as they control, and then the other is what would the site release under its published privacy policies? The final thing to be aware of is that when the state or the feds release information that they’re releasing as a public use data set in such a way that no one thought could be identified, the understanding in which those are typically released is that you won’t use it then in ways that re-identify.
There may be constraints on the data that come out that restrict what could be done even with what’s available on Facebook. Those are just the points to know about the social networks.
DR. CARR: I’d like to actually, since you’re here until three and Paul– let’s take 15 minutes to hear from you guys before we break because this is important. To frame this, remember back in the summer we talked about creating a solve-a-thon about how HHS data could be enhanced by social media data of various types.
One of the things we said was to say what is the privacy policy? We found that we didn’t have takers for the solve-a-thon because there’s more to it than that, and where we have evolved to, again a little bit interrupted by the furlough, but through some work back and forth would be to rather identify what has been done and in what way. This is the beginning of that conversation. Is it possible Josh or Kenyon or Mo to get the data– maybe I’m not remembering correctly, but one of the new terms I learned since I met you is scraping data.
DR. ROSENTHAL: This gets more technically– two things I would say, number one, there’s this other thing, which most people don’t think about when they think about social media, that’s de-identified by nature, but you’re reliant on a company to actually do the algorithmic integrity and predictive modeling.
So a good example of that would be Google consumer survey. I would go in and say, are you a Medicare member, do you have a heart condition, and get a representative sample back regardless of channel using Google’s algorithms, which can be pretty good, under no identification whatsoever.
The other thing is that’s kind of been the social sites saying for government we’ll cut you a set or whatever and that will fall under laws and what have you, but you can actually build a scraper really easily, and that is a little spider that runs along and sucks stuff off into your own database as you go along. It’s a different way of data acquisition, and a lot of that is done pretty frequently. There’s how are you acquiring the data, and they have different policies regarding different scraping activities.
It gets more technically complicated pretty quickly, but there is that other category if you wanted to use social media that was de-identified by nature, relying on them to do the statistical work. There’s a new category of that, which is coming on pretty strong as well. An example would be Google consumer survey. That’s a pretty good one.
They basically have templates of market feedback instantly where you’re using their algorithms. They maintain your personal profile, and they have all the relevant information, and they’re using their sampling techniques on that. That could be by image or what have you, but they have it on a– they have qualifying questions, they have scale questions and all sorts of interesting things.
DR. COHEN: So if I work for the community, and I want to find out what’s more important, heart disease or teen suicide I could just employ this and everybody in my community– you tell people to go to–
DR. ROSENTHAL: Actually you don’t have to tell them anything. They just do sampling by the users without you having to do any PR around it. It comes back as an Excel sheet, or it comes back in-site. It will say women who are making $30-45,000 think teen suicide is much more important. They’ll basically do the statistical anomalies–
DR. COHEN: If I wanted it just for folks in my community, they would have zip code ranges for–
DR. ROSENTHAL: The public right now– it used to be state. Now it’s down to– they’re getting it down to county as of this month. It’s going down progressively. Depending on who you are, you get different access to grain. They have that, though. If you want them to–
DR. COHEN: Say I work for Montgomery County, whether people are more interested in teen suicide or heart disease?
DR. ROSENTHAL: I run some of the behavioral risk stuff just using this nationally, and it’s wacky what you see pop out of it longitudinally month over month. You’re using their p-values and their r-values. It’s like $100, so it’s just pennies. The point is it’s not member level type stuff.
DR. FRANCIS: If it were released at a level of granularity that either alone or in combination with other data sets allowed essentially identifiable information about individuals to be figured out, it would be a clear violation of privacy policies that are up there and would actually subject any of these entities to Federal Trade Commission enforcement, which has been taken against Facebook.
DR. ROSENTHAL: That is my point. It’s worth noting that as they do this, yes, you have a new industry. You’re going to have this stuff, people working it out, but my only point is here’s another class of something, which you may not be aware of where they’re actually using– if they were violating the law, they’d be violating the law, absolutely, but they actually have baked into their offering not to do that. There’s a question of are they successful or not, but it’s just worth noting you’re seeing the next generation of this right before your eyes.
DR. TANG: I had a follow up to Larry’s question about the parsimonious set and Bruce’s answer– there’s so many to choose from, but those sets are rates of set measures, right? Rates of obesity and rates of clinicians?
DR. COHEN: A lot of them include social determinants and community resources as well. Some of them look at the density of food banks as well as other measures.
DR. TANG: It is sort of like a rates of– the way I would extend Larry’s question is, is that all we’re interested– the answer could be that’s all we’re interested in right now. Are there other questions that we’re trying to get at? I would say yes, and the question is, is this just a timing or a scope?
I think what I heard about the scope is it’s probably scope. We’re looking at what we have now and making the best of it versus what we would like to know about our communities to engage them.
DR. COHEN: Paul, I have got to thank you for keeping our eyes looking forward rather than where we are right now. The kind of information that intrigued me about social media data has to do more with community values and community perceptions, which is the filter that you need to take all of this other data and use that lens for communities to make the decisions that are important to them. Those are the data that communities haven’t figured out how to collect. We haven’t provided any guidance in any kind of meaningful way that you really need as part of the equation to improve the quality of community life.
DR. TANG: One of the questions, just to make one up, the aging of the world is one of those things, right, and probably one of the top concerns is transportation, and I don’t think we have that in any of our data sets public or otherwise, and it’s intriguing. It’s possible we could get it from social media, but how do we go tackle the ones that are going to be the emerging things that people don’t even know– people who are already seniors know this.
The country doesn’t know this because most of us aren’t there yet, but so many of us will be very soon. That’s one of those things we’d like to get ahead of. It’s sort of a scope, and I don’t know whether it’s a timing scope or just a scope. I’m going to ask my question, and I think the answer I got from this afternoon is it’s scope.
MR. DAVENHALL: People are defining community differently now. Forget geography. PatientsLikeMe takes patients with ALS and developed a global community of specific groups of people. Sort of like it screws up the data if you’ve collected it geographically now, but the social media people are going to be exploiting, in a positive way, data like that. They already have 400,000 people in that database, Leslie. They’ve all opted in.
DR. FRANCIS: That set, you’ve actually got a copy of their privacy policies. It’s 400,000 people out of how many million in the United States.
MR. DAVENHALL: It is a large number of the people with ALS.
DR. TANG: It still goes back to my scope question. We talked about this earlier, Bill, and we do think there is a new community that may be more relevant and more engaging. That’s not going to be in our data sets as they currently exist either. That’s why I’m raising the scope question. Is the scope limiting us or only limiting us in time?
DR. CARR: That’s a great question. When you say– which scope? What do you mean when you say “scope”?
DR. TANG: What I am hearing about the scope is take the existing data set and see what you can do with it, and how do you make it accessible and what you can do with it. That’s pretty limiting and if that’s– it could be a thing we do today, but if it’s not done I think we have to spend less of our time discussing how to figure ideas out of the current data set and figure out what questions to answer and what data is required and how to get at those data.
DR. COHEN: One possibility is essentially take one community defined by geography and another community defined by affinity, and look at what a parsimonious data set would look like for each of those and then see what data are available and what the gaps are, for example.
DR. CARR: But I think what we’re hearing is the definition of community. We can define it by geography, zip code, county, or state, but social media can define it by self-designated members of a community.
DR. ROSENTHAL: I don’t think it is limited in scope. I think we’re trying to do both. There are two separate questions. The first is we have a bunch of stuff, and it’s not in order. It’s not usable. It’s not structured. If you try to enter the new stuff before you get in that, the complexity explodes. You have geography and self-designation and affinity.
Step one is get the stuff you currently have in order. If there are things like granularity, like sub-county data, to make it more applicable, amen. There’s this other stuff coming on that I think you have to get the structure and the plan down for the current stuff before you can expand on that stuff.
DR. TANG: But you’re ignoring the other big stuff that’s already a lot structured, which is the clinical data. I don’t know that we have to go from ancient to unexplored in one step and throw away all the–
DR. ROSENTHAL: I am saying do you have a taxonomy for the– I’m not talking SNOMED– not this kind of a taxonomy, not SNOMED.
DR. TANG: The trouble is, even if you did all this, it’s a finite problem to do this for administrative data. It is close to intractable. It sort of feels like it’s close to intractable– to do this for clinical. It isn’t actually about knowing what each of those fields means. It’s unfortunately knowing how they were collected in each entity, by each person.
That’s where we’re getting wrapped around in terms of this so called interoperability in health information, but that’s where a lot of very high leverage data exists about health and healthcare. Yet, there’s more. It just feels funny, but it actually is written in this– well, only or mostly to deal with the existing data sets–
DR. CARR: I think it is “both/and”. It’s not either/or.
DR. TANG: I would suggest we portion our time more towards some of the high leverage stuff and spend less time– not that it isn’t a problem– but less time on figuring out what’s available in the administrative data sets.
DR. CARR: I think we’ve concluded the discussion. We’ve had this discussion before. We revisited it now. We’ll put it together. It will be valuable, and it will enhance what’s useable. I think our main focus for today is what’s next? What else?
DR. TANG: The only question I’m concerned about what next is why leap over to the unexplored and hard to characterize, i.e. social media?
DR. ROSENTHAL: Part of it is it’s mandated in the charter, and I personally say because it is a black hole. There are characteristics around some of this in terms of taxonomy and collection bias and some of the other things that we’re talking about, but partially because current HHS data is in the charter, and social media is in the charter. I imagine the people who made the charter did that intentionally. I personally have only been around a little while, but I have seen it’s very high leverage. It also tends to be very black hole.
DR. CARR: I think your input is very valuable. Don’t lose sight of the in between. You’re saying we have baby boomers becoming senior citizens, transportation for them that is going to be important. Have we planned for it, functional status getting back to Marjorie?
DR. TANG: What is the ADL for our community? That will impact so many things. Everything from getting your food– I’m not even talking about a food desert. Can you get regular food in your mouth, and can you have a rich life? Then it gets into care and nursing homes. There’s so much that can be– from one question that doesn’t exist. Why would we chase all these other things before we get into these high leverage questions?
DR. CARR: I am glad you are at the meeting today, Paul. We’ve had this conversation before. Each person adds something great to the table, whether you call in from the cab or not, you made a great contribution. Thank you. It’s great. Larry and I talked about this. This is the most fluid group that ever was because it is all very new and we’re learning how to work together, how to understand each other and then how to be effective in pulling that together. Your contribution has not gone unnoticed, not just for us but for the full committee. I know folks have to leave. I have two comments and then we will take a break.
DR. MAYS: This is just to pick up on something that Paul said. There was an Institute of Medicine committee, which I sat on, and they were told by the White House to figure out for HIV/AIDS how to actually link all the different data together. There’s a model there where what we had to do was to go through and figure out how to do the clinical data, the epi data. And after that we wanted to see what were the gaps of what’s missing. I think that’s one thing to look at. The second is if we take a social determinants perspective about the health of the nation, it might be useful then to think about what are the areas that HHS needs to link its data to?
This has come up before in terms of talking about transportation. It’s come up before in terms of talking about housing. There is some work under way. Again, it’s coming from HIV/AIDS where the secretary is starting to say she wants the health data to be linked to housing, education, and something else. I don’t remember what it is, but there’s some work under way so that may be a thing to explore about the link.
It’s almost exactly what Paul has said. We’re doing social media because we have to, but at the same time we already know that those are some of the linkages that both we and the community would want to see happen and to recommend to HHS to explore that would not be unusual because they’ve already been told to do that for HIV/AIDS.
DR. ROSENTHAL: That is what I am trying to say in terms of taxonomy. In the example we looked at, what’s so great is HHS doesn’t have to do it. It’s being done by Google right now. We literally pulled up a bunch of HHS data and looked at it by birth weight, which we’re not supposed to be able to do, looked at it by race, by crazy stuff in there and set off the data set. Everything under the sun they basically have a master taxonomy. I’m not saying we have to use that, but there are public companies who are doing this right now. There’s nutty stuff in there.
DR. CARR: Alright, let’s take a 15 minute break. We’ll start back at ten after three.
(Break)
DR. CARR: Thank you everyone. What I would like to do at this point is to– you have in your blue folder the PowerPoint, and Mo, you have this as well, entitled Navigating Change: a framework for traditional HHS intervention and new data. I think as we said in the email, we didn’t get a lot of takers coming to tell us how they’re using social media data to enhance HHS data. I think we did a little bit of exploration, but I think what we came away with is actually there are bits and pieces being done and not end to end work.
What we thought we would do, instead of trying to have a solve-a-thon, is to brainstorm about examples where we have– others have used social media and how they’ve used it. If we could begin to put together a framework of examples we might then have a hearing where we invite folks in who have done work that is illustrative and innovative and effective and learn from them the “what” and the “how”, and then take away from that what worked, what didn’t, why didn’t it. What are the issues related to privacy or standards or data integrity, usability, and sustainability?
I think one of the things that I learned as we looked at this, I was first very enamored of the Google and CDC data on flu, and then further reading educated me that when the flu came later one year the whole relationship fell apart. I think too, we were talking about this earlier in the meeting that Facebook used to be people who were in .edu email, and then it spread to more people, and now it appears it’s only middle-aged women that are using Facebook. That’s what I’ve heard anyway.
Then we heard today about a very interesting thing, Snapchat, which is something that you send a picture and it self-destructs quickly. It’s just my thought, but there’s not a paper trail. Facebook wanted to buy it, but they didn’t want to sell.
We were discussing this morning some issues around privacy, and I thought Paul had a very incisive comment, which was that maybe this is the new privacy. In other words, if you sent something on Twitter or Facebook it lives in perpetuity, but if you send it this way and no one takes a picture of it, it goes away.
It was a fascinating observation. I think there’s still a story here that the wild and crazy population that will tell everything about their life on the internet does think about privacy, and this is their solution. It’s written by a 23 year old and a 25 year old, billionaires. Let me ask Josh to frame this a little bit, frame the discussion, and then let’s go into the detail.
DR. ROSENTHAL: This is a really good transition. This morning we talked about basic recommendations we can make, and we circled around in various recommendations. There were seven from way back when, and even more, and various illustrations of this. Since then we basically explored in depth social media and how that might be applicable for various communities in various ways. In the true entrepreneurial spirit of getting something out fast rather than good and having something to react to, it is a true straw-man.
This is a little write up of what’s been happening. Over the past few weeks we explored various communities and took a look in reasonable detail at what they’re doing, and our expectation going in was that there was going to be a lot of usage of social media for solving problems in a meaningful way among the communities. What we found was that really wasn’t so much the case. I’ll give you some examples of that, and even more troubling, and making life more difficult, was that no one really had a shared mental model for what we even meant by social media and how we might be using it in different ways. We’d say hey, someone is doing something over here, so what’s a shared mental map that we can use to evaluate how people are using social media for what purpose?
At the last meeting we sketched out a little idea of having a framework with three basic buckets, one profiling, one intervention, and one outcomes. I’m hesitant to put any typology in front of academics because it always degenerates into complete solecism, and there are exceptions and we can expand it, but just for the sake of what we talked about last time, profiling, looking at your targets, looking at communities, looking at populations or cohorts, what do they look like? What are their characteristics? Then an intervention broadly defined, not just in the clinical sense but not just excluding it either, I want to do something to a population, and then outcomes, did that work or what were unintended consequences, et cetera.
We’d seen in the committee, this was a framework we had chatted about. I turned it into a slide. It’s profiling interventions and outcomes, and then we had gone through HHS data, intervention data and social media data. You can obviously extend that in to additional data–
DR. COHEN: What do you mean by intervention data here?
DR. ROSENTHAL: This was a call in the last meeting, the Seattle EMT work, the intervention– they were going to help with a particular condition and do it by using an electronic medical record, I believe was the intervention, to have mobile electronic medical records. During that encounter they’re calling an intervention they would gather additional data. In Louisville we had to explore some of the things they were doing, food access through some of the root cellars, and they were using survey data for their intervention data, or just basic encounter data, so data generated by the thing you’re doing to solve the problem you care about.
It’s very broad, very abstract. This is me putting out a mental map so we can have a shared vocabulary and way of thinking about it. If this doesn’t work, whatever we want to use is fantastic. I think we should come up with a shared image, so profiling intervention. The Chicago flu work, to walk through an example, that’s great. What they’re doing that made a lot of headlines, they’re using Twitter as an extended phone tree.
They would scan through Twitter and then respond to somebody when they would see someone mention they weren’t feeling well, they would basically say we have flu resources in the city of Chicago. They were using social media data, but it wasn’t around profiling a population. It wasn’t around outcomes assessment. It was in terms of some sort of interaction. The gist of this is that a lot of communities are using social media for PR work for marketing, for getting the word out about an intervention. You’ll occasionally see some analytic work around profiling.
Louisville is doing some of that, Chicago is doing some of that where they say, hey, let’s look at likes for hospitals, or let’s look at this, that, or the other thing, and how populations compare. No one’s really doing it for outcomes yet that we’ve seen in terms of outcomes assessment. Yes, that’s tricky. There’s a little bit of academic literature out there. There’s a journal article, which offers a spiritual exchange.
The problem with that is the sample size is too small because they’re trying to tie it to geography around hospital referral region, even the authors admitted that, but this was sketching out an idea to say when we think about social media we need to think about it in terms of HHS data because it’s part of the charter. Obviously you could extend that. If it’s actually trying to solve something or a community is trying to do something, I would propose, and we could change it obviously, I’m thinking about profiling a group, an intervention broadly defined is doing something, and then outcomes, did the thing work basically.
When we do our environmental scans and we see someone doing something cool at HTI or Louisville or a journal article or Seattle or Chicago doing these different things we can ask the question, okay how are they using this data? Are they using this in conjunction with other data? Are they using it for profiling the population, or a target population, or are they using it for actually an exchange, an outreach, bringing someone to a resource or care or something else more broadly defined, or are they doing it for outcomes assessment?
That’s sketching out a basic model. If you wanted to work with that, we’d talk about basically– we’d talked about doing an environmental scan, and this is my very rough draft of putting together a framework that we can use for that environmental scan so we can have some interpretive thread. Then you might want to look through and identify best practices for each cell, identify road blocks, these tips, tools, techniques, resources privacy concerns, cell by cell.
Social media outcomes are going to be really difficult. Social media profile was Google flu– we saw some difficulties in there. There are three points we’ve already seen. You can find some very good profiling work with some of the HTI things. Scan the environment, find best in class, and you can either look at examples that are working, or you can basically say as we scan the environment, there’s a lot of people using social media data for PR intervention stuff.
There’s a few doing profiling, not so many people on outcomes. I’m going to try to find examples, filling it in cell by cell. We don’t have to use this, but this is my humble attempt to say let’s come up with some sort of framework so we have a shared mental model when we’re doing any sort of environmental scan. Rather than doing a solve-a-thon or coming up with recommendations, I think this is the first step so we’re even speaking the same language. That’s my learning from this year.
DR. CARR: I guess step one would be who is our audience for this.
DR. ROSENTHAL: The audience was just kind of internal group even learning what’s going on and having a shared mental model, and then two would be both HHS and communities were the two distinct audiences as angles. So to say to HHS, hey, if you’re thinking about incorporating this, or you’ve heard about some mosaic effect privacy concerns, these are the things to think about.
HHS needs social media data in the profiling component rather than in the outcomes assessment. Community, if you want to do this, Chicago is doing it around an intervention piece, Louisville around the profiling piece. HHS, here’s how communities are using social media data in conjunction with HHS data, and other communities, here are examples you might want to emulate or extend it into the other categories, even into actual intervention or outcomes.
DR. CARR: So feedback around the room?
DR. ROSENTHAL: Feel free to come up with something else, this is just me putting it out to engender conversation.
DR. CARR: Taking how this fits in, I think because it is kind of out there, I think a real question is, is it a flash in the pan or is it something that changes so much so often what it means today might not mean the same thing next year, or is there something that maybe isn’t related to content as much as to engagement? This came up, Vickie and Bruce and I, and actually Pat Remington, were on a panel on the public health hearings last week, or a meeting, and we heard from a person in Wooster who has done great work with the community around their various health needs. Her number one point was that data is of no use unless you have engagement.
It struck me that as we try to think about data use, this is an interesting way potentially of measuring engagement. Paul also mentioned over lunch as they’re trying to deal with population patients who are disengaged and lonely– we had talked about would this using social media engagement or something be a measure of moving out of loneliness or an outcome, actually? It’s quite uncharted territory. I think this is why we didn’t get a resounding windfall of folks coming to our doorstep to solve-a-thon.
I do think there’s a value in putting some structure and some examples together. My hope was that we could come up with some examples, whether from the Datapalooza or from other groups that you know– Josh has got some here– where we could think about how is this working.
Actually what you were saying, too, Josh about the Google survey thing. I think we ought to include that. What are we talking about when we say social media is one thing. It might be Twitter, but I think the Google survey is a very interesting thing as we try to answer some questions quickly, or even back to transportation. We can look at what’s available, but we can ask people how do you get to your doctor appointment?
DR. ROSENTHAL: This is just putting out a structure. Last time we said since it’s part of the charter to comment on social media intelligently and how it can be used to not only solve problems for communities but also in conjunction with HHS data, we should do some exploring. We found, number one, not a lot of people doing anything more than very ancillary things, things that would be very embryonic compared to other verticals. Two, we don’t have a shared mental model or map, even how to go about discussing it. Whatever we use is fine as long as we have some sort of structure where when we say you’re using social media data for profiling versus outcomes, we all know what we’re talking about.
DR. COHEN: Thanks for doing this, and I understand what you’re trying to do. The real bang for the buck here is in the third row, what you’re saying. We don’t know how to talk about social media data, so we need some typology or taxonomy to classify the use of social media data with respect to public health. I’m less concerned about filling in rows one and two here.
We can lose those because I think we know them fairly well, and the question is how do we– what buckets do we put social media data in so that we can come up with an evaluation of how they’ve been used well or where we think they could be used. For profiling intervention and outcomes seems like a good start, but it might be just as interesting to try to gather all of the uses of social media data that we know, throw them up on the wall and go through an exercise of what buckets we would put them in.
DR. CARR: Mo, I want to give you a chance. Do you have any comments at this juncture?
DR. KAUSHAL: Sorry you’re breaking in and out, so I’m getting every third or fourth word, but I’ve got some ideas around the framework piece, but I could email you that separately, around people doing really interesting things with mashing different forms of data together.
DR. CARR: We’re not hearing you too clearly either. So Mo, are you able to hear me clearly?
DR. KAUSHAL: In and out.
DR. CARR: If you can Mo put your volume for your phone on medium, not too high, not too low, that may improve this a little bit. Did you want to add something?
MR. CROWLEY: I was going to add I like this concept to do something. I shared the novel data partner’s idea with Montgomery County, which is our big suburban county here in Maryland, and they also shared it with what’s called the Long Branch Health Enterprise Zone, which is a group of about 20,000 citizens that came together– they originally came together in pursuit of some grants from the state, but they didn’t get it, but they do have this whole task force for and between different community partners and public officials and others, and they’re really excited to try to do something innovative and novel.
It brought Google in the other day to do their initial meeting with their local public health computing staff, so we’re in the preliminary stages of trying to find something to do along these lines. I just wanted to update that, but as those conversations develop then we can share that within this group to see what may work.
MR. DAVENHALL: My comment is on the environmental scan comments. First of all, I did have a page on that, and I think they were included. The environmental scan often doesn’t imply that you have to do what you read in the environmental scan. It just alerts you to what people are actually doing out there and experimenting with it. Some of them are failures. Some of them will be failures. Some of them will be great successes.
The environmental scan is like looking off into that distance. It’s like Paul was saying. You have to keep one leg in what’s driving the business today, and the next one is like what’s on the horizon. This is a good framework to capture the stuff that’s going to be happening. In March, every non-profit hospital has to start submitting their community plans, assessment plans of what they think is right and wrong with their communities.
I don’t know what in the world that will look like because as far as I can tell, there’s little guidance that anybody has provided. We ought to invite the IRS to come in after that period of time and give us a review of what people sent in. The other thing is that there are people out there — do you know what the X PRIZE is? The X PRIZE had an X PRIZE on public health that didn’t go anywhere, but it just got new money, and it’s going to be funded by Esther Dyson. She hasn’t revealed how much money is in it, but she’s looking for five communities that have at least 100,000 people that are away from a metropolitan area who have an engaged citizenry who have a public health authority that wants to do something that has an empowered health system that wants to help, and they’re going to work with communities over a five year period and do whatever it takes to get them to a healthier state by doing simple things. This is her words, simple things.
I have no idea what that is right now. What will happen is these environmental scans have to be– this isn’t like a one-shot thing. This is like we’ve got to create something that makes it easy for people to contribute to that environmental scan. While the anecdotal relationships that we developed individually are okay, we’ve got to figure out something that’s broader than that. We don’t want just the well communities to be talking about it. We want the sick communities, too. I don’t even know what I’m talking about in terms of a sick community. Bruce might know what a sick community is, but I’ve had trouble defining what a sick community is.
In some ways I think the environmental scan is not a substitute for some of the other things we’ve got to do, it’s in addition to it saying this would be like the turnaround– it’s like when you turn around the binoculars. You shouldn’t do this when you drive, but if you turn around those binoculars you get that long view, but it’s actually looking and scanning the horizon, trying to pick out things that you might recognize. I would say it’s something we’ve got to figure out how to do, I just don’t know exactly how we curate that information that’s going to come into it. Someone has to do that. I don’t see that’s an automatic–
DR. CARR: I like what Bruce was saying just in terms of building blocks, fundamentals. When we’re talking about social media, what are we talking about? I’ll contribute Twitter, Facebook, and the Google survey for three.
DR. COHEN: I am not only thinking of the modes or the media, but the uses. Here’s an example, like Josh gave a couple using Google survey or the free analytics or whatever, the Facebook likes article about hospitals. We’ve used these three examples– there’s got to be more. If we solve all of these we will be able to develop a classification system for how they fit in terms of public health.
DR. CARR: I circulated also, this other article by Mark Silverberg, a PowerPoint, Social Media Data as a Public Data Resource. One of the takeaways I had from that is, as I just said, it may not be the content. It may be the volume, the number of tweets, the number of followers, the number of responses. That may be a tremendous measure of engagement or something like that so it doesn’t have to get into flu symptoms but rather– not so much the content but the quantity.
DR. ROSENTHAL: Maybe it’s worth taking a cut and doing a little taxonomy. There is no other way to do it to be quite honest. It gets complicated really quickly. If we’re going to talk Twitter, that’s one percent geography in that. You may have a million tweets in general. Is that engagement on its own? If they’ve mentioned a health topic or a non-health topic, you have to have it in a semantic construct, which you can do or you can buy, but there’s a lot of stuff around this.
We can take another step and drill into that in more detail, but I just wanted to sketch out originally with this framework, does this even make sense? A lot of the stuff we’re seeing in terms of scan or whatever else you want to look at is just using it as a channel, like a bull horn shouting at the public, or a new kind of telephone. You’ll see some in profiling. Like what you saw in Louisville, do you see some sort of dimensions make sense in terms of health dimensions, as well. Is there correlation?
If you want to focus on outcomes, then it gets down to taxonomy pretty quickly. There’s a geographic taxonomy. There’s a general taxonomy. There’s domain content taxonomy. There are different layers of interaction. Whatever that is, you can abstract it out. Is it something, is it a like, is it a follow, et cetera? There are layers of engagement around that. We can take a crack and sketch some of this stuff out or we can do a scan and pick a couple of examples you guys think are interesting. Whatever you want to do, I’m happy to do it.
DR. MAYS: I think it would help the committee to do both, because I think what would be great to see is how it’s being used. There’s text4baby, so there are a lot of things that are used that make a great case to the Department to think about using it. Then there’s also the other side, which is to try and figure out the taxonomy of it.
DR. CARR: I think beginning with the examples– actually Lea had brought that up about the text4baby back in September of 2012, I believe. Are there other examples you’re familiar with?
DR. VAUGHAN: Specifically, no.
DR. CARR: Right now we just have a very broad title called “Social Media” so anything you can think of that might fit in that.
DR. VAUGHAN: There are lots of examples of things going on, certainly in humanitarian space. In terms of mHealth and text messaging, text4baby, HHS has promulgated a number of interesting applications around quit smoking and healthy heart initiatives. There’s a group at Humphrey Building that’s been doing analysis and metrics around that, engaging with the consortium of universities.
DR. CARR: So the mHealth– just assume I know nothing. That will be a very fair assumption.
DR. VAUGHAN: mHealth is a broad collection umbrella of mobile health initiatives. It especially includes things around using text messaging to convey information, in this case around health. It’s far more developed for the southern hemisphere, interestingly enough because it’s less expensive. People don’t have broadband access, but for a North American example, text4baby is superb, very widely adopted, and very popular. There’s actually some good data around it.
DR. CARR: What would be the text and then what is the analysis that is around it?
DR. VAUGHAN: text4baby is a little bit more proprietary. The text libraries for quit smoking around pregnancy, around different ages for women are actually part of healthdata.gov already. They’re free to use and free to reuse. Then within CDC and NIH they also already have some applications available to share– pretty much on Android and on the Apple platform as well. There was some interesting concern about whether it was just being– text4baby in particular– just a great talking point, or whether it was actually having an impact, and so there’s a group down at Humphrey that looked at that in partnership with NIH and a number of universities.
DR. CARR: Have they made a conclusion?
DR. VAUGHAN: They’ve published some work on it already. They are quite pleased, and how could that be extended? How could that be improved? Certainly linguistically there were issues, and that’s being improved. In general, most of them, the libraries, the free to use libraries, are available in English. Some are in Spanish as well.
The southern hemisphere types of resources, aside from volume of users there’s not been a particularly validated body of work around how usable it is, although there’s certainly hope that it is, and it seems to be by the popularity.
DR. CARR: So back to the fundamentals on text4baby, or something like that, a message to quit smoking, who initiates the message and how do they know who it goes to? Then what was the measurement that allowed them to declare victory?
DR. VAUGHAN: There are a number of possibilities with that. In general, it’s opt-in because even though text messages are quite ubiquitous and low cost, they do have a cost. So women have to choose to, or whoever’s using it, has to choose to use those. Some of the companies, our two main companies in the US and there are others in other countries, differently charge for that, whether it’s a for-profit or non-profit.
They have their own series of analysis, analytics about where the call and subscription originates from and where it’s going. Some of that is more open than others. You can choose to follow that along. I’m not recalling, otherwise I’d give you a citation, there was another paper that came out of HHS around it. That was Dr. Atienza– I’m mispronouncing his name I’m sure. He’s a wonderful researcher that came out of NCI, and he’s downtown. He’s been usually the lead on that.
DR. CARR: That would be great to get that because that does sound like an outcome.
DR. VAUGHAN: He has been following that along and is a really elegant behavioral scientist. He very much has a strong foothold in qualitative and quantitative analysis.
DR. CARR: Cool. That’s got to fit in one of these boxes. Other examples?
DR. MAYS: The person you want to talk to is Bob Kaplan because in the Office of Behavioral and Social Science Research, they run the mHealth training, which I took, but Bob’s office is actually funding a lot of this work. He can lay out for you a case for use. The two places at NIH, exactly what she’s saying, is NCI has actually been a leader in this.
There’s a lot of programs that are on the shelf. We can take them off and use them and modify them. Part of what– CDC also outside of NIH. CDC has been doing the HIV texting programs, so they funded them. The way they use them, the first place they did it is up in San Francisco. It was funded by a CDC grant. I can’t remember the name of the person’s company, but what happens is that people were not– especially young men were not coming back for their HIV results.
Instead what happens is you go, you get a test, and they text you your results, and part of what happens is you text back about things. You can ask questions. You can get a menu where you can go off and find where you want to go to get a counselor to talk to you.
There’s a lot within CDC, within HIV, they now have a lot of text programs, and the way that they’ve been measuring it is, is it read, how many times is it read, other metrics, or do you respond to the things that we suggest or ask you to do? Do you go to another site as a function of it? It’s been really successful.
They increased the testing rate – getting test results in these young gay men incredibly by using a text program. The other place they came up with it was the Chicago Department of Public Health, which has also used texting for people to learn HIV information.
There’s a lot of cases that have good solid data. The NCI guy is really very good. He’s doing the smoking cessation stuff. So there’s a lot of scientific data right now.
MR. DAVENHALL: I am an old school thinker, but every once in a while I get jolted into something else. I went to a program called Future Med. Qualcomm has a phone that has six biometric functions on it, so data will be flowing somewhere without the benefit of a physician, a nurse, a health department, or the National Center for Health Statistics. It will be within your lifetime.
All I’m saying is that we’ve got to be — those binoculars have to be big. What we’re going to see is the proliferation of data. It’s going to be exponential. It’s going to be out there, and essentially what you keep calling social media, it’s beyond social media. It’s ubiquitous information. Companies are going to make fortunes in sending us text messages that are– no one in the health system today in America knows where Bill Davenhall is at this moment.
In five years from now, the health systems we’re involved in are going to know where you are and what your risks are, your exposures, your resources.
This isn’t Buck Rogers stuff. This is stuff that’s already built, being deployed, and when companies like Qualcomm make this statement they said, we are looking for an organization who knows how to manage all this data. We’re not going to get into that, but the data is going to come like–
DR. ROSENTHAL: I think it’s good to have some definitions around this. Social media, let’s define it however we want to, but it shouldn’t be biometric. It shouldn’t be sensor. It shouldn’t be device or geo-location exclusively. There should be some social network hanging on there, whether it’s self identification or being a member of some sort of network with other members, would probably be one way to restrict it. There’s a whole other train around biometrics and device-based stuff and sensor data.
DR. CARR: I think somewhere we use the term “new data”. I think biometrics is done. It may not be the one we’re talking about today, social media we’ll carve out differently but, I think the idea–
MR. DAVENHALL: The same device you have in your hand is social–
DR. ROSENTHAL: Those are two in my mind of looking at things. Those are two very distinct problems we’re looking at. One is, I’m sharing qualitative content, or I’m looking at a “like” about a hospital on Facebook. Another is I’m actually looking at a real time blood glucose. In terms of outcomes, one is rather straightforward, there’s frequency and latency.
Another is not straightforward at all. How does a relative weighted average of likes by population, correspond to expected over actual readmissions or readmissions for heart failure with a CMS data set? That’s not clear to me at all.
My blood glucose, whether it’s monthly or whether it’s real time, that’s rather straightforward. We can do either, but I want to sharpen the definition. What tends to happen in this is it tends to become an amorphous blob, basically. I think we can do an environmental scan and define what’s social media, what’s additional and new data, whether it’s device, biometrics, even genomic if you want to do that, and then find some examples around it, put it in a typology, look at where it’s worked and where it’s not, and be able to at least say here’s a blueprint for it. Here are some best practices. Here are some places where it’s flubbed up. Google potentially flubbed up around the semantic interchange around quantifying qualitative content. That would be an example of that. It’s very different than other stuff.
DR. CARR: So we have texting, and we have Google, and we have Twitter, and we have Facebook. In terms of defining it, how else would you define it?
DR. ROSENTHAL: You can think about it in terms of channel, basically. There’s mobile versus however you’re going to act on it. I’d actually say a social network is a group of individuals within a network and with a self-identified manifestation. Pick the top five that people think are social networks rather than defining it, and go with that. Twitter, Facebook, let’s do something with Google circles, and what else do you want to use?
LinkedIn has very interesting stuff there, too. If you want to know what people in the healthcare space making over $100,000 think about this, that, or the other thing? Maybe something else that’s non-specific and kind of new, like Google consumer surveys or something like that. I’m just throwing stuff out there.
MR. CROWLEY: I haven’t really looked into it yet, but there has been a lot of buzz about this DocGraph thing. I don’t know if anybody in this room has any opinions on that. It looks at all the different referral patterns across different physician groups.
I realize it was driven from some HHS data, given that it’s getting some buzz and there are some interesting models coming from it. There’s a core platform, but now I believe they’re doing a “freemium”, where you get a little bit, you pay some, and you can get some more. It’s HHS data actually disclosing some interesting relationships. That might be something to put on the list.
MS. BRADLEY: Part D, they stripped it down, and I think ProPublica got it under the journalism license to do an analysis. All they got was who was the original physician and then whom did they send them to?
DR. ROSENTHAL: Are you talking about Fred Trotter stuff, basically, which is Medicare fee-for-service? You can actually pull up against the Medicare metrics and say show me hospitals that are high readmit, or AMI, basically with overspending on cost-ratio, and then show me referral patterns, who’s driving that basically, doc by doc.
MS. BRADLEY: There is somewhat– on Salesforce they have that– have you seen those influence scores? That’s sort of how did we want to think of interactions, networks, and referral patterns based on real behaviors.
MR. CROWLEY: If there are certain key terms in social media that we want to follow, we could do a Radian6 extraction to see what’s being talked about in the social media sphere. I know at our institution we have people who are doing some NSF related projects that are looking– there are some new futuristic type algorithms to predict the future using social media trends that might be also interesting to look at. Some of those new analytical techniques, while some of those techniques are new, they could be applied to some existing data sets that we have in the catalog.
DR. CARR: Say more about that.
MR. CROWLEY: Essentially within the Internet, across different social media places, Twitter, Facebook, the blogosphere, with certain technical tools from web-scraping to being able to actually download the data that’s being made available by these tools, you’re able to apply new predictive models and other types of analytical techniques that I can’t speak deeply about, but I know that they’re being done. Some of the work DARPA funded over the past couple of years, and that’s resulting in some new ways to use this data really in interesting ways. There’s another area that could be worth exploring for our scan.
DR. COHEN: What would they predict? Give me an example. Are they trying to identify individuals who might engage in certain behaviors?
MR. CROWLEY: One of the things that we’re looking at now is based upon some conversations about the Affordable Care Act in different communities, you may be able to predict what enrollment or what types of problems are happening for those types of enrollees and be able to mediate that problem or create more effective outreach strategies.
DR. ROSENTHAL: It is a kind of a laundry list of stuff. Some people do and there are boutique folks that do this in the private market where they actually scrape through Twitter, Facebook, some other things and look at words, positive, negative, and they basically do it by geography or they look for individual brands. It might be an insurer. It might be a hospital, as well as condition, and they have it in a taxonomy. They do essentially a health net promoter score, which predicts will you respond to an intervention? Are you likely to respond to a disease management program or something like that? There are other folks who look for articles about straight correlation basically, do likes on a Facebook hospital have anything to do with correlating of care, not necessarily CAHPS, but other metrics specifically?
DR. GREEN: I am not sure this fits, but this word “gamification” as it relates to healthcare and health education, does this belong in this realm?
DR. CARR: Can you say more about that, “gamification”?
DR. GREEN: In teaching stewardship of resources and the way a physician manages a patient, there are now games on iPhones on rounds where, using the charge master for the training institution, each time a resident orders something it builds a tally of what they’re spending. That can be done as a boring table, and you can then tally it at the end. You can also have a point where it explodes– some cartoon character explodes and says you continue to practice medicine this way and you’ll never be able to see a repaired bridge again in your life– that’s the idea of turning it into a game.
MS. BRADLEY: I presented on something like that yesterday on risk assessment within small practices.
DR. ROSENTHAL: I would suggest is if we like this idea of looking at social media or other data, profiling to intervention and then to outcomes, we kind of go off and take some time and kind of think about it, and look back through your HTI stuff. There’s a lot of articles swirling around and basically say where does that fill in and where do you want to drill into?
DR. CARR: That’s a theme that we’ve had today, just getting a literature review or landscape.
MR. CROWLEY: One more example, this is something that FDA is interested in right now, using social media to be able to better identify patient preferences about different risk profiles of drugs. Traditionally to get that type of information it takes studies. It can take a year or two years, but now because of the discussions that people are having on PatientsLikeMe and other blogs and communities, you can use some of these tools to get better understanding of how broad patient populations are using certain drugs and how those drugs are affecting the patients in different ways that were otherwise not well understood.
DR. CARR: How do they do that? By what mechanisms does the FDA do that?
MR. CROWLEY: They are trying to figure it out, but essentially if you have a conversation around a specific drug name, then you text mine across different discussions of that, and then you pull out and start doing some analysis to see if there are certain key words around certain adverse events or certain types of side effects that might be occurring on drugs. By having a broad enough population in conversations that are feeding into this analysis, you can start getting a little better in terms of validity.
DR. CARR: So these are really great examples. If we pause that knowing that we need to do a little deeper dive and get some additional examples, let’s say we come up with an array of examples. Then what? Again, getting back to the charge, some of these are an intersection between HHS data or an agenda, et cetera, and if we can look at the effectiveness– I’m just trying to think. We get that information and then we categorize it and say, here are things that are being done. It goes back to our framework of profiling in a way– I’m trying to get my head around this, too– profiling, interventions, and outcomes.
It seems like what Lea said about the mHealth and smoking, let’s say, we’ll get the data on that. I don’t know if there’s a component of profiling that goes– but it’s essentially an intervention that actually has an outcome. For the FDA model, it’s profiling and potentially leading to an intervention. This framework I think may begin to work because I think what we want to get to is when we think of social media A, this is what social media is, B, these are the things you can do with it or you can expect from it or you can– ways in which it has been applied with a good outcome.
DR. ROSENTHAL: If the charge is, how can you use social media in conjunction with HHS data or additional new data, I think we can do a quick environmental survey. We just have to have some sort of framework around it and say here are the ways that it tends to be used. A lot of it tends to be used around intervention. A lot of it tends to be used about profiling, pure social media around profiling, biometric stuff around intervention, mobile around intervention, not so much around outcomes– here’s the state of the state right now. It’s embryonic.
Within outcomes what we find is, here are the best practices, here are examples of people that are actually achieving these things. Here’s what they’re achieving. Here’s what the best practices are. If you’re going to do this, you need a reasonable sample size, you have to look at geography, blah blah, here are some liabilities.
Additional privacy concerns that we haven’t thought of, sample size immediately, et cetera. When we sketch it out within each of these cells, what are we talking about? Is it a social media network that’s dedicated to a community, condition, or being sick, or is it a general network where you’re just looking at people who may be sick or may not, or is it the quantitative, qualitative stuff we’re talking about.
Here’s some basic contours of how people do it. Within social media profiling, there’s different ways. There’s restricted health, restricted community, talking about stuff. There’s general community talking about content, or there’s this general open community. Here’s an example. Chicago is doing this in an open community around a health topic, around an intervention, starting to lay out some of this.
They’re doing well in this area. They’ve had these results. Here’s a best practice, someone else who’s doing it except having an army of volunteers manually reading stuff. They actually set up a little search bot, basically, as a way to expedite it.
DR. COHEN: Your matrix would be more around the purposes as one dimension, and the other dimension would be– I’m not exactly sure what word, but it would be whether it’s targeting general population, specific population, whether it’s a mHealth personal population use. It would be a schema of what the social media or social network interaction is intended for.
DR. ROSENTHAL: That is one way. I tried to do very high level.
DR. COHEN: There are so many things floating around, it’s difficult for me to–
DR. ROSENTHAL: If I were to do it, I’d basically have that cell, and I’d basically have a list of three features. I’d have channel. I’d have inclusive. I’d have qualitative, non-qualitative, and I’d just go through it one by one.
DR. CARR: I think that one of the things is that let’s say we invite some of these folks, because it sounds like a bunch of them are local, and we invite them. What are the questions we want to ask them? Of course we want to hear about the success, but we also really want to hear about the challenges. What were the prototypes that didn’t work and why did they not work? I think that’s very important.
Here is what I’m thinking: for our next meeting we can invite a couple of these folks here and get some structured inquiries and get a sense of what do we want to learn and hear from them what has been their experience and where do they see opportunity.
MR. DAVENHALL: The evaluation part of our brains is going to kick into high gear. I think that’s what Bruce is struggling with. At the end of the day, businesses who have spent tens of millions, hundreds of millions of dollars on social media still have not produced an effective ROI for a board of trustees or directors.
The jury is out as to what this is actually contributing. There are some industries that have done really well. People who sell servers, network routers, devices, they’ve done really well from this, but we’re unsure as to what it really does. We know that most businesses can’t afford to be in this now because it’s one of those things, do we advertise or not? If we don’t believe in advertising, do you have enough gall that you can stop all the advertising? The answer in business is no.
DR. ROSENTHAL: Actually, I fundamentally disagree with that. I think there are very clear cases of ROI. Like I run an ad on Facebook for a product, and I actually generate more revenue based on those ads, and I can see which users are coming through those ads.
MR. DAVENHALL: But a full cost accounting, what it costs to put those ads out.
DR. ROSENTHAL: That’s what I mean by it. The big businesses that have those ads on Facebook and Google Plus, they don’t do that without ROI, very specifically. I fundamentally disagree with that. I think actually social media is a very effective channel that actually has data that generates hard ROI, and that’s one of the fantastic reasons that people are increasingly using it for those sales.
MR. DAVENHALL: I disagree with that but we can argue that.
DR. ROSENTHAL: That’s worth knowing. Is it a new-fangled thing and all these businesses are doing it without reason, or are they actually tracking your data and sending you out a different ad than someone else with an ROI calculation on the back of it. By the way, these are public companies, so you can see that ROI.
MR. DAVENHALL: That is a business process. We could have done that without social media. Companies could send reminders. All those ads you see about UPS, all that can be replicated. We have to tackle the work flows.
If we talk about ourselves, and we talk about healthcare work flows, the same thing applies. We could already do a lot of what we’re talking about here. It’s just that we’ve been usurped by something called social media all of a sudden. We should benefit from that, too, but I’m saying that’s not a good enough excuse to just be doing it. Nobody in the healthcare system in America knows where I am today. What does that suggest?
DR. CARR: Why are you saying that nobody in the healthcare system knows where you are today?
MR. DAVENHALL: I have no e-mail messages, I have no alerts that being in this particular county could be dangerous to my health. Maybe this world is perfect, but you realize that we are not plugged into the same thing Josh is talking about. We’re very far removed from it.
DR. ROSENTHAL: I think geo-location and social media are two different things. One thing I would think about why are we doing it, a) because you’re right, it has leapfrogged ahead of us. It’s happening, and HHS does not have a reasonably informed position on this, and even on the discussions in the workgroup we’re not even using the same– we have fundamentally different perceptions about what’s going on to be kind about it. I think part of it– the easy answer is it’s part of the charter, so that’s why we do it.
MR. DAVENHALL: Todd came here and told us that what he wants is a NOAA, he wants DHHS to be the NOAA– if you’re familiar with NOAA. He wants this industry to become a NOAA. We’re talking about this massive co-mingling of situational health-relevant information for the benefit of us, not the statistical research community of the world, but for individual patients. The acid test is, even when you get in social media, what has it done for you today in terms of your personal health. That’s the question.
DR. CARR: Should we talk Blue Button–
MR. DAVENHALL: It helps me go to take it to my physicians who don’t know how to use electronic health records and to substitute and be a surrogate of electronic medical records. I use that as a poor man’s way of getting information in front of my doctor.
DR. CARR: Again, just trying to get our hands around– it’s hard to get our hands around this. There is federal data in there, your CMS data is in there. It’s using a mobile app. You enter data yourself. Then does Humetrix do anything with that aggregated data?
MR. DAVENHALL: No, they just make it look prettier, and it’s more accessible to you.
DR. CARR: In a way, it’s not exactly social media. It’s really HHS data packaged in an effective fashion. It gets back to this framework, and on re-reading the charge, it may be, and this was your point, too, and Paul’s as well, this whole transportation thing, there’s a lot of data there, but it’s the imagination and the insights that have to say it’s all there, static.
What are the important questions to answer and how do we not only aggregate the data but push it in a way that is easily understandable to the person to whom it matters the most?
DR. GREEN: Justine, I want to go back to your question about what kind of questions we want to ask people if we got the right people in the room. From the point of view of the full committee, I would propose four.
One is what is the stability of the devices and the system that you’re using to collect data? Think about new data infrastructures for the country using this kind of thing. If they come and go every nine months, their utility for our purposes is limited. That gets to the discussion and disagreement about the viability of business models.
The other three are really mundane and boring except for people who care about statistics, but one is do they know anything about the error rates for the data they’re using? Do they know anything about the biases in the data that they’re using? What do they use for a denominator for the data that they are using to generate rates?
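The denominator question can be made concrete with a small worked sketch. All counts below are hypothetical and chosen only for illustration; the point is that the same number of mentions yields very different rates depending on which population is used as the denominator.

```python
def rate_per_1000(events, denominator):
    """Return events per 1,000 units of the chosen denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 1000 * events / denominator


# Hypothetical counts, for illustration only.
side_effect_mentions = 30      # posts reporting a given side effect
people_who_posted = 200        # members who posted about the drug at all
people_taking_drug = 10_000    # assumed size of the population on the drug

# Same numerator, two candidate denominators.
rate_among_posters = rate_per_1000(side_effect_mentions, people_who_posted)
rate_among_all = rate_per_1000(side_effect_mentions, people_taking_drug)
print(rate_among_posters, rate_among_all)
```

Neither figure is necessarily right: people who post may differ systematically from people who do not, which is the bias question asked alongside the denominator question.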
DR. CARR: I am sensing in the room a kind of restlessness. It’s hard. This is a hard topic. We’re not going to go to the full time until 5:00. I think what we need to do in the next 10-15 minutes is really get a clear step by step of what we’ll do next, and when we’ve done that what is it we would learn? Are we in agreement that it would be of interest to bring some of these folks into the next meeting?
DR. MAYS: Bruce, very early on in the life of this group, gave an example that would be great to use. What he said was that it would be great if somebody got diagnosed with something and they could walk out of that and be able to go to the HHS data and say, what is my likelihood if I had this and I had that surgery, this would be the outcome? What he was talking about was trying to take the HHS data and let a person be able to use it in order to accomplish something about their health.
You’re saying, well, what good is it doing me, to some extent. I think the app or whatever would be– okay, I’m just told I’m pregnant, so I want to go on and see if I was told I had a high risk pregnancy what is the likelihood of x or y happening. Maybe that could be the case.
Then from there to ask people things about can you build something like this, what kind of things would you use to be able to allow a person to get this data? How easy or difficult is this?
Then to start saying what are some of the other sources that you want to translate that through? Do you want to send them over to Google to find that something? Do you want to connect them up with something else? There’s some basic fact data that you’d want to get? Then you’d want to send them out beyond that to start learning, like they would get a patient community. That’s a big use online of people going and finding a community.
DR. CARR: So you are saying sort of categorized what is happening. I’m just thinking about the thing from this week, thinking about the statins and drugs, so what’s your LDL? Now, here’s the rest of the story of what your risk is. That might be a good example.
DR. MAYS: It is like taking the case of the HHS data tells you something about your health and then from that, because I think we’re getting it too narrow, from that it then can take you into some of these other kinds of data, social media if you want to tweet to somebody about I have this, what did you do? You can really help them by trying to give them examples of here is the fact data, and then here’s the social connectedness data around the disorder or disease.
DR. CARR: I will go back to the framework that Josh had, because even though we said we know all about one and two, in a way, you get your profiling from your CMS data, and you do an intervention, which gives you an interpretation of what that data has told you and then the intervention may be not so much profiling, but your intervention may actually be in social media, that you now push them to somewhere to learn about it or a PatientsLikeMe community.
MR. DAVENHALL: You know how you go on an airline, and they ask you whether you’re a senior or not. You put in the destination of where you want to go. What should happen next is after I put that in there, I want to go to Rochester, Minnesota and I said I’m a senior. The next piece of information that social media should provide, or whatever it is you want to call it, comes to me to let me know that if I go to this part of the country I stand a very good chance of not being over medicated by a physician who hasn’t read about the hundred drugs that Medicare has said are no longer effective and are actually harmful for seniors released three weeks ago by Dartmouth.
You can go right now to Dartmouth, and you can find out depending on where you live what are your chances as a senior to be medicated with a prescription drug that’s on the banned list for Medicare. I say that is a really great use of– if you think about all that data and all the people that handled it, now it’s becoming useful. If the airlines sent me a message and said don’t go to Houma, Louisiana because 70 percent of the time you’re going to get one of those drugs. That thing is already available to us. It’s just nobody has figured out how to get it to us.
DR. CARR: This is another thing we’ve talked about, which is end to end. We want information pushed to the people who need it, and I think the HHS data doesn’t have the dynamism, but the social media, broad definition, vehicles can provide that dynamic piece to link you to the next important piece of data.
DR. ROSENTHAL: I think we can probably collectively do the questions we’d ask folks, but I think what are you trying to accomplish and do a quick definition of profiling– are you trying to look at– you can crosswalk geography, a list of hospitals, a list of conditions, a list of negative words, by Dartmouth or whatever you want, or a list of positive words. That’s a profiling exercise.
Think of it as non-traditional variables or whatever, or are you trying to deliver an intervention? Is it a mobile thing, or is it a social network that’s actually designed for support and finding treatment or what have you, or is it something along outcomes? We can use some of those definitions, what are you trying to do, what would you have liked to do that the data didn’t support, what are you least confident in from a data perspective as well as from a measurement perspective? A question quickly about geography, some of the statistical denominator things you’ll see will turn into multiple dimensions, what did you try to do? What would you have liked to do? Where do you put yourself in a typology? Those things will be helpful.
MS. BRADLEY: Do you mean evaluation, or would that be another word for it?
DR. CARR: I think what we’re going to come away with is we think about Twitter overturning oppressive governments and the power of that, and we’re trying to think of the next thing that’s like that, but I think what we’re going to find is that it may not be a destination but a vehicle. I think that it’s– just by the fact of the incredibly sophisticated group here, we’re struggling to get our arms around what we’re talking about, says that if we don’t know it, there’s a lot of other people that don’t know it.
I think there’s a value in beginning to do groupings of ways in which social media enhances health. It may be profiling. It may be directing. It may be supporting. It may be any number of things. Let’s start with how it helps health. Then we want to do in approaching it this way, what are the risks? It goes beyond social media. Let’s start with how it helps and what it can do and how we can think about it. I actually think it is that missing link, if you think of these data sets.
DR. ROSENTHAL: You can maybe think about even for the purposes of this conversation, there’s a channel specifically, which can come over other channels, devices, or non-devices. There’s content, specific qualitative content, and then there’s activity around that regardless of the content. Then there’s the data set behind it, which may or may not have other variables. You’ll find Twitter has horrible geography. It’s two percent, all sorts of sample size issues.
LinkedIn has fantastic geography as long as you’re interested in metro area. Facebook typically has fantastic geography around zip code. We’ll start to get into all of that. I think asking them what they’re trying to do in that typology and where they ran into issues and what they thought they did well would give us–
DR. CARR: Here’s what I’m thinking based on this incredibly educational three hours. I always learn so much here. Could each person take as homework to find two examples to contribute to our collective knowledge of social media, as broadly as we’ve described it. It could be games, Twitter, anything, but examples of use? Unfortunately Lea you won’t be able to reuse the ones you gave us. They were great, but I’d love for you to find two more.
Let’s aggregate what we find, and then set the goal as– I think if we can line up some folks to come and educate us in deep dive on what they’re doing in February, that will be very rich and I think this is, even though we wanted a fast turnaround time and although it’s true the furlough put us back a little bit, this is a much more complicated story to tell, but I think it’s an exciting one. It’s going to take us a little more time.
DR. MAYS: Can I suggest that you actually make a call to Wendy Nilsen who’s at NIH? She organizes the mHealth, and she knows people both– she knows all the people that the proposals have gone in for, and she knows people nationally and internationally. She can give you some great– she is in Bob Kaplan’s group. She’s the direct person that does most of it. When we’re doing proposals and stuff, we call Wendy up and she can give you five people like that to tell you. She can identify people to come and give great talks.
DR. CARR: So are we in agreement that bringing some of these folks to the next meeting is something that we would like to do to get a deeper dive. I think sooner than that we’d like to get some stories or some examples. We will go to that, but if folks can think of two examples to contribute to this or resources, we can begin to pull that together.
DR. COHEN: We have been talking both about personal health and generalizable uses of these data, and to me they’re very distinct. One is targeting personal health decisions. The other is what kind of information is out there that can give us a sense of what’s going on in the community. I’d like to make that distinction clear because I think they lead down different paths, both important social media paths but different paths.
DR. CARR: I think if I learned anything today, I didn’t really know anything when I thought I did. There is much more to this than we thought. I suspect that there will be different cuts of it. The more we look at this, I liked it in the beginning. I like it even more now because I think you can navigate all around these quadrants. There are different models taking you to different places.
DR. ROSENTHAL: To Bruce’s point, I did this with an eye to population intervention.
DR. COHEN: Right, but for the last part of our discussion we’ve been talking about personal. We’ve switched over into–
DR. ROSENTHAL: There is a lot of stuff about that, but much less stuff from population intervention.
DR. COHEN: I agree from what little I know.
DR. ROSENTHAL: My recommendations– we have seven initial ones. You guys can take, like, abuse, disabuse, do whatever you want on those, and then the eighth one is some informed positioning recommendation around social media and new data, social media at least has to be required so we don’t skimp that. The first step is to get a survey and a way of talking about it, and we’ll find in each of those buckets very specific things on the population side.
DR. CARR: I think we are going to learn a lot of things because even– I get it that we want to do population, but I think there are some intriguing things emerging we don’t want to overlook. I think right now it’s just to bring whatever you’ve got. To summarize, first, how great it is to get together with all of you and exciting; second, we are going to put together the information we have in a letter. What we need to do actually is to set up a call for December and a call for January, a one-hour call with this full workgroup.
At a minimum the first call will be to review a draft of the letter that will circulate to get your feedback, and the second piece will be to see where we are with this social media and where we go next.
Again, in January, we’ll have a wrap-up of that. We want to be prepared for the February meeting to make sure we have everyone’s input on the letter and also that we’re moving this along. Then we can also begin to plan either in December or January whether we want to invite folks to our next meeting. Is there anything else? This is great. Thank you everyone.
(Whereupon, the meeting adjourned at 4:30 p.m.)