[This Transcript is Unedited]
Department of Health and Human Services
National Committee on Vital and Health Statistics (NCVHS)
Work Group on Data Access and Use
February 18, 2016
Hubert H. Humphrey Building
200 Independence Ave., SW
- Welcome – Vickie Mays
- HHS Updates, Requests to WG, Request for Comment on WG Priority Setting – James Scanlon
- Framework, Content and Process for HHS Website Review for Committee Charge – Helga Rippen, Josh Rosenthal, Kenyon Crowley, Chris Boone, Vickie Mays
- Work Group Plan 2016-2017 – Vickie Mays
P R O C E E D I N G S (1:10 p.m.)
DR. MAYS: Good afternoon, everybody. What we are going to do is actually get started with the NCVHS Workgroup on Data Access and Use. Welcome to everyone. What we will do is our typical. We will go around the table in terms of introductions. Then what we will do is ask people online to introduce themselves. Then I will also check in, in terms of online to see who we are going to actually have on webcam. All right. Let’s start.
(Intro around table)
DR. MAYS: Thank you very much. We are going to get started while they take care of getting people who are online on. Thank you for those of you who are there bearing with us as we get more and more proficient with the use of technology. Josh, I am so used to you being here, so I don’t know. Change is hard.
Okay, let me get started. What I would like to do before we ask Jim to give his remarks to us is just to do a few introductory comments because there have been a few changes to the agenda. I want to share that. Then I want to share a little bit about what happened in terms of talking to the whole NCVHS committee.
As you know, part of what we are really focused on is developing our work plan for 2016. As part of that work plan, what we have talked about because it has been a request to us by Jim Scanlon’s office is that we come up with a conceptual approach or framework or whatever we are going to call it in reviewing the HHS agency website. Our responsibility is to focus on increasing access to data and increasing use of the data. I think that is a very critical issue for us. Part of what we want to do over time is to make sure that as we pay a lot of attention to data, that we’re actually getting that data into the right hands.
For us, what we have talked about is the issue of use cases. We have talked about those use cases as trying to look at these websites to determine whether or not this data is available to whom and helping agencies get some feedback on not just how usable is it, but who is using it, and maybe more important, who is not using it. As we work on our frame for this, we will talk a little bit about that.
Today what I did was I shared with the whole NCVHS committee a bit about where we are. We have shared with them. I know everybody should have received it by now. It is a frame, I think it was Kenyon and Lilly and Josh worked on, to get us started with comments that came from the last meeting. Of course, those comments involved everybody’s comments. It was just the people who worked in between. Many of these comments actually come from others in the room.
The other thing I shared with them was the issue of the products that we want. As we talk about this frame, we also need to talk about what it is we want to do with it, what is the process for engaging with agencies. Before we have talked about the development of a questionnaire to go to an agency before we do a consult with them to find out information about the data set. Then it would be the actual review once we had access to their website. Then the third piece would be coming up with feedback for them. That is what we also want to think about. I just want to give you kind of these big picture things.
Then the last thing we talked about with NCVHS is where our convergence is. I think at that point, we got a lot of feedback. That particular feedback, I think I want to take a little more systematically after we get our work done and we talk about our work plan. I will also draw Walter directly into that as we begin to think about the things that we can be helpful with NCVHS on and talk about even the training or presentation we talked about doing at the next meeting.
I am going to leave us there as the introduction. Ask anybody on line if you have any questions or comments. Hearing none, anybody in the room, questions or comments? If not, let me turn it over to Jim.
MR. SCANLON: Thank you, Vickie. I want to thank everyone, number one, for agreeing to serve on the workgroup and giving HHS the best advice on how we can make our data, in quotation marks, number one, more easy to find, more accessible and easier to use, and make available to improve health and health care throughout the US.
Let me update you on a couple of things that are going on at HHS. Then I want to describe how we implemented some of your previous recommendations regarding access to web-based data. Then talk about some of the uses of the data.
Within HHS, as you all know, we place the priority on making the data that we develop either as a byproduct of program operations, which can be very valuable data when used for broader purposes, as well as the research and statistics and the surveys and evaluation data that we develop. In some cases, we often have access to directory-type information, like where are the substance abuse clinics located, where are the community health centers located, things like that. In many cases, it is not really viewed as data. But on the other hand, it actually is very useful, not so much for statistical analysis, but for providing guidance and information to consumers.
The more you can help us take that broad concept of data liberation and data access and making it easier to use and find, and begin some sort of segmenting or differentiation among who the different users are, that will help us. I think one of our first emphases, you will remember when we formed and when we created the Open Data Initiative was let’s get it out there. Let’s find it and get it out there.
Number two, we used some of your principles. The data has to be available in machine-readable form. We are pretty much implementing that throughout. Whenever we publish data, we have a route to get the data available in machine-readable form.
Then you gave us advice on sort of, from the developer point of view, how can we better describe and tag the data, so that people who are not aficionados and deep subject matter experts could actually find this more easily. We have implemented some of those tagging principles, the machine-readable form principles and others. As Damon will tell you, we have a group that continues to look at the data sets, tools and reports that we have available and coming online.
With the view towards making it available, always think of how you can make it available to broader audiences, make it more generally available. Again, I think the next stage of sophistication will be available, but who is more likely to use it than others? This is communications 101. Actually is probably 501. It is probably the graduate course. The idea of thinking of who do we intend the data to be used for, some sort of market segmentation.
There is clearly some of our data that is meant for consumers, consumer use. There is other data that is really meant for fairly sophisticated communities, the developer community, the technology community. There is other data that will only make sense to researchers and public health experts and other quantitative analysts and so on. Then there is other data that is probably better for community, non-governmental organizations or for community government organizations and others. That is just about a half a dozen of the different audiences. If not audiences, probably the users of the data. I think we have to think of the portfolio that we are making available, who the intended user is.
As you all know in communications and public relations and so on, in marketing this is the first thing you do. What is the audience, and what is the message and so on? We could probably use your help down that line, as well. We are thinking about that internally, about how to segment that kind of intended users, as well.
Within HHS, we are actually, and Damon will tell you more about open data and Data.gov and Healthdata.gov. More on the statistical and data policy side, we are continuing some of our efforts within HHS to kind of move forward, if not in a coordinated fashion, at least we are marching in the same direction. One of the principles here, it is a principle that HHS and the Data Council have held forth about how we look at data streams.
The National Committee has adopted it as well as an overall framework for how we proceed. It is the principle of alignment. In essence, we should look at the various data streams that we have, whether it is from the research setting, the public health surveillance, the statistical data sources. We have programmatic data and so on. Administration data, in that case the claims data from Medicare and Medicaid have been very useful.
Really what we probably will be hoping to see from electronic health records and from that whole area of clinical and clinically-related data. We sort of view them as whatever their original purpose, and we want to do the best to meet those needs, that we view them also for their data value. We view them all, whether it is on the health side, the public health side, the human services, the child welfare side.
To think of them as broader resources for data that can help people make better choices, help us understand how well the health and human services systems are doing. Basically to improve health and health care, there are a number of uses. So that when policies are made or data collection vehicles are created or changes are made in program systems, or in EHRs, that we view them all as ultimately aligning and converging. Not necessarily standardizing, though that is conceivable down the road, but at least heading in the direction.
There is a view that how the data that will be made available from those various streams be made available for analysis, obviously under the right privacy protections and so on, if it is identifiable. When we are designing systems and modifying systems in the portfolio, that we look at how they can be related in a way, so that the data can be comparable and so on. We have taken that principle and sort of in the data collection portfolio, we are looking at, number one, don’t do harm. Don’t create problems and make it worse to be able to do that.
Number two, to actually take steps to understand how you relate to the other data systems and to the health ecosystem generally. Number three, to actually try to coordinate and make comparable and potentially even standardize. Although standards, you have to know when it is the right time to do so or when you are actually losing good information by doing so.
Remember we develop standards for demographic kinds of information as the Affordable Care Act asked us to do. We developed standards that all the agencies are using in our surveys. That was for race and ethnicity, a fairly granular set of standards. For disability, one overall measure of disability that the census uses. For primary language and for sex, biological sex and so on.
Those standards are up on our website. They are now being implemented in all of our population-based surveys. As a second step, administrative data is a little harder, as you know, because every program has its own rules about what data you need and so on. Even there, there is potential for some sort of standardization.
What we did was some of our program agencies, community health centers and substance abuse grantees and so on, were asking about how could they know how people are getting new insurance, for example, under their Affordable Care Marketplaces and others who really are many of the clients they often see on free or sliding scales. They might have insurance now. To be able to analyze the data they had and look at what their clientele would look like and what the trends might be.
We were asked to provide them. I would call it more guidance than standardization, but they were looking for some demographic descriptors, age and sex, race, ethnicity and so on as primary language. Then some measure of income. We did that in the way of income is a tough item to get on these sorts of things. What we did was give them guidance about how they could ask cut-offs in income that correspond to the federal poverty guidelines and principles.
Most of the eligibility for subsidies in the Affordable Care Act remember were related to income. We provided that to the agencies. They are now including it. It wasn’t a hard mandate. It was this is what you need to be able to understand the clients you have, what you may be seeing. They have used that, as well. Again, these were more or less the same standards we used in surveys and research.
Finally, we were asked to look at how vulnerable populations who were always at greater risk of disadvantage and poorer health and so on, how can we be sure that we are monitoring and assessing their status as we move forward. We began looking at additional groups to provide some standards or at least some guidance. Now most of our surveys are now adding questions about LGBT-related characteristics. We were asked to look at the rural population and the disabled population, as well. We will be looking at those down the road.
Again, this is to sort of place to have our major data systems and data resources be able to at least be comparable and have some core similarity and comparability. Then more recently, our leadership asked the data council to look at in very high priority policy areas, can we begin to look at standardization of questions and surveys and about preferred authoritative data sources in these areas.
There were four areas they asked us to look at. One was, as you can imagine, on health insurance coverage and sort of look at the portfolio of largely surveys, household-based surveys that we support along with census, and see if we could agree on. Number one, look at the variability and how we are asking these questions now, and the variability among the data estimates.
There used to be a lot of variability in health insurance coverage estimates, what proportion of the population was uninsured between the census bureau and HHS and even within. It was the nature of the questions. Well, it was really the concepts. It wasn’t just the questions. It was the concepts themselves. You always have to go back to the concepts.
What we managed to do over a while was to kind of standardize. If we have differences, and it really is a different concept, to say what it is. Now we are much closer. The MEPS and the Health Interview Survey are much closer. We had a workgroup of the data council look at all of those surveys.
When I say all of those surveys, there are probably 12. Given the size of the GDP and health spending, this is a pretty small amount. This is about $400 million for all of the major surveys in HHS.
We had a workgroup of experts look at all the surveys. They looked at health insurance coverage, how do we measure, what kinds of questions do we use, what estimates are preferred as the more valid or appropriate ones? We looked at the very tough area, mental health and substance abuse, which we have some very good surveys in this area. These are big. It is hard to measure some of these concepts. The whole field of epidemiology and mental health and, as Vickie knows, substance abuse, it is really more challenging, much more than some of our other surveys were.
We did have some recommendations there that we were trying to at least give guidance to folks, so that we won’t have one-offs created all the time. A third area was in the area of tobacco use. Again, if one of our top public health problems remains cigarette smoking and its impact on health and so on, there are a number of other efforts in HHS to begin some prevention efforts there.
I think the actual smoking prevalence is actually down to its lowest point in a number of years now. But every year, there are particularly young folks picking up cigarettes again and initiating smoking. Now we have electronic cigarettes, too. I don’t think there is a consensus yet on whether that is good or bad, whether it is reducing harm or whether it is just adding.
Again, we have to collect information. We are a science-based, evidence-based agency for the most part. We like to have evidence for decisions we make. Then the fourth area, I think, was in the area of LGBT. This is an area that probably is the most recent area in civil rights and civil liberties, and a number of policy directions are underway. HHS was asked to see. It is hard to make policy if you don’t have the data, the population itself.
We have now begun adding those more or less standard questions to most of our major surveys. Again, nobody terminates the interview because they are asked that question. Income still remains the most sensitive question in our surveys. That is why it is usually the last question.
At any rate, we are still in the process of looking at and perfecting those recommendations. We hope to have guidance for our agencies to begin implementing. Some of them have already begun to standardize. This is again on the front end on the data collection, so that when we do make it available on the backend, we understand what we are measuring. We are trying to view this more as this is this survey, this is that survey. We are trying to put this together more as a general way of knowledgeable information.
Let’s see, I did want to mention we have an interesting area for the workgroup probably down the road. We have been asked, as we usually are in HHS, to help with the drinking water emergency in Flint, Michigan, which really looks like it is not going to be, as the folks in preparedness say, it is going to be a marathon to address this, not a sprint. We have a number of HHS agencies working with Flint in terms of providing their area of expertise and so on.
Interestingly, one of the first questions that comes up is what do we know about Flint? What do we know about these areas? What do we know about the insurance rate? They’re testing the kids for lead levels in blood now and all of that. Some of this is off the shelf information, the population information. But some of it has to be reconstructed for this effort specifically. We have sophisticated visualization and mapping available to our first responders and planners. We know where all medical facilities are. We know where folks who are reliant on the power and so on. We have been able to add those variables to our MedMap, it is called.
But it just seems to be an issue each time. We are scrambling for it. Maybe that is the nature of preparedness, that the situation, as much as you prepare, your plan is the first casualty when the event occurs. I don’t want to ask them just yet, Vickie, but this can be an area down the road where maybe the workgroup could tell us if there are resources and how we can make it available to help down the road.
I wanted to say one last thing again on some data improvements. Since we are monitoring the health insurance more closely now, we are doing it pretty much on a quarterly basis now. The department provided some support for the National Center for Health Statistics to speed up the availability of the health insurance data that is available. This again was one of the recommendations I remember the workgroup discussed earlier. The sooner we can make this data available, obviously the better it is. If it is stale or historical, it is still useful, but it is not as helpful. Then other one-offs start to arise where the information can be misleading.
The NCHS, using the resources we had, was able to speed up the data dissemination from the Health Interview Survey. It is available on a quarterly basis now. It used to be available in a six-month lag. Now we are probably down to a three- or four-month lag. We will probably keep improving.
We were able to disseminate the findings from the first three quarters of calendar 2015 last week. You can get a much better picture of was the health insurance coverage getting better, where was it getting better by state and so on and a more sophisticated look at who is still uninsured. I think we are seeing now probably the lowest uninsurance rates since we began measuring, as someone said, on the health interview survey of 1957, which was way before our time of course.
At any rate, it is nice to see the trends coming down. Some states are very low down to like 4 percent or so, even less than that. But there are clearly groups. Again, this shows you how valuable data can be. Otherwise you are flying blind. Who exactly is insured and not insured? There are probably half a dozen different segments of the uninsured population that are very different from each other. One strategy would really lead you quite astray. Again, this shows you how the data can help you think in advance. Some folks, it is an affordability problem even with subsidies.
Again to tie this all back, what good is the data if no one else sees it? What we do with that enrollment data and health interview survey data is make it available in a public use form. The enrollment data is administrative data. We make that available in machine-readable form. I don’t mean the individual enrollment records or the identifiable information, but maybe a ZIP code area or state or whatever and some of the demographics. It is made available in machine-readable form, so anyone else can look at it. Hopefully they will find things that we didn’t see. That is one use.
Then the Health Interview Survey is largely besides the reports themselves, you otherwise have to be an aficionado to be able to use the public use files or the restricted use files. It is not the sort of thing that will ever be available on its own like that for consumers and general use. The information from that, of course, could be turned into reports and authoritative health guidance and so on that could be used. Maybe I will stop there and see if there are questions.
DR. MAYS: Lilly, are you seeing the icons that will tell you who has their hands raised?
MS. BRADLEY: Yes.
DR. MAYS: So I am going to rely on you to get the online people. Then I will do here. Questions here?
DR. RIPPEN: Knowing how much thought and fun and pleasure there is to creating recommended standards as far as data elements, and again great work on the whole challenge of race, ethnicity, disability and so on. Then also some of the areas that is actually the area of focus, the four topics that you highlighted.
I would like everyone to consider maybe that when these gifts, I would call them, of standards are provided, that there might be a mechanism to reinforce their use in the private sector and non-federal sector. There are many different ways to consider doing that.
There is obviously the way for grants. When you have a grant, and someone is collecting data, that they use the standard. Then also kind of somehow reinforce its use because if everyone uses similar standards, not just federal, without a dictate, but really saying a lot of thought went into it, and even having maybe a little background as far as why those items were actually chosen because everybody likes to have their own argument, what we might be able to do is accelerate their use.
I would just like to kind of highlight that. A lot of people are collecting data. I know it would be nice if we can use wisdom in doing so.
MR. SCANLON: I should mention, Vickie, that again we asked the committee when we started the workgroup to give us advice. I think we have the workgroup looking at different data and portals within HHS. I think we started you off with the HIS, and we got some preliminary. Then I think we asked if the workgroup would take a look at SAMHSA, which has a nice data portal.
It is interesting. Every agency has different ways. I think we will hear about that this afternoon, maybe some reactions. Then I have a couple of candidates in mind for what the next ones might be. We will wait until after SAMHSA to do that, as well.
DR. MAYS: Lilly, let me know. Can we just clarify a couple of things so that I have a sense for the workgroup kind of what is possible? I like what Helga was just saying in a sense of, well, it is one thing that it is just the HHS datasets. If we take HHS more broadly, it becomes a lot bigger than just your datasets. As much as do we think about NIH? Do we think about having the Institute of Medicine, for example, come up and talk about something like what would be a core set of measures that one would use in demographics? And whether or not there are ways in which then to get others to use the same core set.
For example, with the NIH data, if you spend over a certain amount per year, if it is funded a certain amount per year, then that data has to, at some point, be made public. I am wondering whether thinking about like how far out to push that might be something that we should do. Are there ways to get the foundations to do it as they give money, ways to get NIH to do it?
MR. SCANLON: We have been at this demographic data standardization for a long time. To be honest, when we looked at our major surveys with census, in fact, I think it was the NCVHS that actually had a workshop on this, we were actually very close on many items. There were some items where we were different for major demographics.
The committee here has done some work already. Really a while back, we try to standardize. Race and ethnicity were standardized almost by regulation. Then age and sex and how you ask about age and so on, I think we more or less standardized those across HHS.
What we did for the Affordable Care Act, we could only apply them. The secretary had to adopt them. They had to be used by the way the language was worded was that in all federal surveys. The lawyer said we can’t tell other agencies what to do. I think if it is a good practice, and it seems to be working, there are ways to advertise that it is available without making. I would not ask the IOM, to be honest. They are not the best thinkers in this area. They have a sort of one-size fits all. There are parts of the Academy that know a little bit more about this.
I think the idea here is rather than mandating that we hope that the positive force of having things that work, and which allow you to compare your sample study group with what is happening in the HIS or other big surveys, which is of immense value to a survey rather than just know a one-off sample somewhere and you don’t know how. If you have those similar questions, you have a very powerful way of comparing.
In the second instance, Vickie, I think you are right. We might advertise that we have adopted these questions. Other agencies in the federal government have already begun asking us. We would like to ask about smoking. We would like to ask about sexual orientation or something. What should we do? Again, if you saw some of the questions they were making up sometimes, it would make your hair stand up. It is really one-off.
I can’t imagine that people think you just sit there and make up a question and it is going to work. In many cases, the best solution was to dissuade them from asking their way. In other cases, we could give them something positive. Like VA, education and so on will be using the sexual orientation questions.
But I think the idea of advertising best practices or at least good practices for others to use. Again, if no one knows about them, only a limited group is benefiting. I think that is a good idea. Again, I think the group here knows more about it. The way much of this data production works is that our agencies, we do what we do. The agencies collect the data, either primarily or as a byproduct of administrative processes or support research.
Then we make available what we have in the way we think and the way we know, which is not necessarily the way the internal community. It is just one of the things we learned earlier from the group here. It looks at this. It is really a different perspective about what you make available, how you make it available, what the expectation is? That is where I think the workgroup has been especially helpful.
DR. MAYS: Let me just ask a question. One of the things we are starting, and we haven’t launched it yet, is a blog. Maybe there is someone who might want to do a little thing on the blog about kind of what you have put together. We will talk, Josh, about that.
MR. SCANLON: Now that we have so much activity in social media and other areas, and folks are beginning to look at that, we talked about this here in terms of measuring sentiment and other characteristics, and in many cases it is qualitative, but there are methods. Down the road maybe, if the workgroup could think of when there is some bandwidth and time, we have some experts in HHS. That whole area of analytic techniques for social media is an area that we could benefit from some of the better thinking.
I mean, we know about word clouds and all of that. We know about Google Analytics and all that. There are actually analytical software and other programs that people are using to make much more sophisticated sense of some of this largely qualitative social media information. I think it would greatly benefit HHS to hear about some of those. I don’t want to add another requirement or ask just yet. But down the road, if we could think about it.
MR. SAVAGE: So a question came up for me for the framework that we are going to get to. It sounds like it might be appropriate to ask here, too, about standards for language access on the HHS website. There are standards for accessibility. Are there standards for language across the website? I will ask that, and then I have some thoughts to share, as well.
MR. SCANLON: I am not the expert there. Maybe Damon knows better. Clearly we have the famous 508 standard for visual accessibility and hearing. It is a good question. This came up when we were developing the standards. Should we have a component that looks at what language?
There are clearly requirements for the health care setting under our civil rights laws and other places and CMS for access to language services. But for the website, I would have to check unless Damon knows.
MR. SAVAGE: So to just throw out a few facts, there are about, according to the census, 60 million people in the country who have some form of a disability. Also about 60 million who speak a language other than English at home. The need is equally great.
There may be some added imperative here with section 1557 of the Affordable Care Act on non-discrimination for where there is federal financial assistance in the health care setting. The proposed regulations from OCR talk about the top 15 languages. There is a question about whether that is by nation, you know national or by region. It is very important in the health care settings. Since we are trying to make this data accessible across the country, I raise it here.
MR. SCANLON: We issued a proposed rule. Work is underway to finalize that rule. We got a lot of comments on the language accessibility provisions. But for the website, you raise a good question. I don’t even know what the agencies have as policies. Like NCHS, I am not sure. Kate, do we alternate language when we publish anything? I am guessing not, but maybe Damon has a better sense. I am just not an expert.
It is a good issue, though. I hear you exactly when we were developing standards for granularity and ethnicity, for example. We looked at all of the detail in terms of the populations and the languages spoken. We sort of had to set off a cut-off of what the groups to use, the subgroups to use, under the granularity. We are very familiar with that. We were even proposing that this was the core, and then everyone can ask additional questions relating to the ethnic makeup of the area in the terms of disabilities, as well.
We did include the disability question. That now should be part of every one of our surveys. You could get an estimate now of the population in that area. You could actually compare it with the national estimate. But it is a good question. I am not sure about the requirements for language.
DR. RIPPEN: Just kind of following up on the data sets that kind of are at least thought through and recommended that you are using. I always think of data clearinghouses. We all think about, okay, we can go to a place and get the data. But how about a place that says here are the data elements?
If you think about LOINC having questions, again the question of whether or not someone like LOINC or another entity, could actually be a place where people could use downloadable questions along with answers, as a way to accelerate. In general, if you are developing software systems, it is the lowest bar as far as what you can get quickly. The question of what data do you use, what question do you ask, it might really be an opportunity.
MR. SCANLON: Actually on the EHR side, and I think on the HIPAA side, Walter knows more than I do, there actually is a set of demographic data standards, I think. Is it HL7 maybe? I think the committee actually helped work on that.
It is not quick, though. Nobody does anything quickly in the standards realm. For electronic health records, where we have to use some different vehicles to get the standards like the certification standards and so on, there are always folks at HHS that are trying to get the regular demographic standards, the HL7 I think they are and then some of the others are based on. I think in that world, folks like to have a standard that has been approved by an SDO already.
For example, in sexual orientation and gender identity, we actually gave them questions to use. I think the policy committee looked at them. But then they went back to something like the SNOMED or DSM. It wasn’t exactly where we wanted to go because we looked at it as a demographic, not as a diagnosis. That is where you go wrong because we didn’t really want to consider this to be a diagnosis. I think they adopted something from SNOMED maybe.
DR. RIPPEN: That is why I didn’t really want to say LOINC because it implied the road to EHRs. There are many organizations collecting data from a community level.
DR. MAYS: I am wondering if that is one of those spaces where we try and get some of the data entrepreneurs to think about that, as well. That is the kind of thing of being at Datapalooza or someplace where you start saying, what are the top five things that we would love to see some data entrepreneur do? We were just talking about what would we do at a Datapalooza. That is the kind of thing to put out in that space, and to see whether or not they would develop it. It may be something that we would think about.
PARTICIPANT: I tell you honestly demographics would be a very —
DR. SUAREZ: This is Walter. I was just going to comment that as Jim was describing this topic of demographics across the spectrum of standards and the standard development world, generally most of the standards refer to a standard body outside if there is one. For example, the administrative transactions: when you are using a claim or doing an enrollment, most of the demographic data required in the standard refers out to an external standards code set, if it exists.
To give you an example, for Zip codes, we referred to the US Postal Service for the coding. I am thinking of some of the demographics like, for example, gender. It doesn’t have any outside body that maintains a recognized standard. There are many. That is a good thing. There are many.
For example, on race and ethnicity, yes, there is an OMB standard. There is a census standard, those kind of things. It would be valuable to map. I was just thinking that maybe these already exist, and I haven’t looked at it. How each of these elements, like race, how does it map between the administrative transactions, the electronic standard transactions like EHRs and those kind of things. Hopefully we will find that there is a lot of alignment. But I am sure we are going to find that there are some important differences.
When I send a claim, I codify race and ethnicity using a particular code set. When I am capturing it in an EHR, it might be a different level. That would be an important difference to highlight.
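The kind of mapping described here, comparing how one element is coded in an administrative transaction and in an EHR, can be sketched roughly as follows. This is only an illustration: the codes, labels, and the names CLAIM_CODES, EHR_CODES, and crosswalk are invented for the example and are not drawn from any actual HIPAA, HL7, OMB, or census code set.

```python
# Hypothetical code sets. Every code and label below is invented for illustration.
# Coarse code set, as a claim might carry it: code -> category label.
CLAIM_CODES = {
    "A": "Asian",
    "B": "Black or African American",
    "W": "White",
}

# More granular code set, as an EHR might carry it: code -> (category, subgroup).
EHR_CODES = {
    "E1": ("Asian", "Asian Indian"),
    "E2": ("Asian", "Chinese"),
    "E3": ("White", None),
    "E4": ("Other", None),
}


def crosswalk(coarse, detailed):
    """Group detailed codes under the coarse code that shares their category.

    Returns (mapping, unmapped): mapping lists the detailed codes that roll up
    to each coarse code; unmapped lists detailed codes with no coarse match.
    """
    code_by_label = {label: code for code, label in coarse.items()}
    mapping = {code: [] for code in coarse}
    unmapped = []
    for detailed_code, (category, _subgroup) in detailed.items():
        if category in code_by_label:
            mapping[code_by_label[category]].append(detailed_code)
        else:
            unmapped.append(detailed_code)
    return mapping, unmapped


mapping, unmapped = crosswalk(CLAIM_CODES, EHR_CODES)
print(mapping)   # which detailed codes collapse into each coarse code
print(unmapped)  # detailed codes the coarse set cannot represent
```

Coarse codes that absorb several detailed codes, and detailed codes left unmapped, are the differences in level that such a comparison would highlight.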
MR. SCANLON: An attempt was made, Walter, you will remember, where there was a federal standard or at least a regulatory standard, to use that one and propose it, like for the HIPAA claims. If you were going to use race and ethnicity, it was largely the OMB one. It is because federal agencies don’t have a choice. The federal agencies have to use those. I think there were for some other items, as well. Actually the committee, I think, and NCHS worked on some of those to try to have them be related. Again, it is hard to get agreement on standards.
DR. SUAREZ: USHIK was a resource. USHIK, the US Health Information Knowledgebase, I remember they were capturing those kind of components. For certain elements, which are the various places where they are used and code sets that are used? I think that might be a resource to start with.
DR. MAYS: Let me take Jim, and then I will get Josh.
DR. SORACE: The NIH is starting to develop a common data elements repository. I can’t tell you how it differs from other efforts. But they are trying to standardize data elements across the various institutes, NCI and those things. It does actually include some of the elements that you find in more health services research and the disease phenotyping from the patient point of view. That is something that you might want to give some thought to.
DR. MAYS: So is NIH doing it in terms of the data they collect? They each have their own.
DR. SORACE: I am new to this. I discovered it about four weeks ago. It is an effort within the NIH. I think it is being driven by their Big Data to Knowledge programs to make it compatible. I can send the link on.
MR. SCANLON: They also have, on the research side, Vickie you probably know, PROMIS, which is an attempt to pull together terminology and questions and so on from various areas there.
DR. MAYS: It actually works. What happens is, on some of the websites, you see people going back and forth asking questions about PROMIS. Then they use it. Josh?
DR. ROSENTHAL: I was just going to go back to just a couple of things you said, Jim, and I will be real brief about it. Part of the conversation is always can we create standards. That is great, and that is really meaningful. There is a lot of data up there that actually doesn’t necessarily fall under standards.
It would actually be really helpful to have published whatever was used in the first place. Specifically like entity relationship diagrams or schemas, like rather than creating a standard and enforcing the current data into a standard or describing it. Actually just explain whatever was used to create the thing in the first place, the meta data, the entity relationship, as well as the attribute, right? Going through and doing that would be immensely helpful.
It is kind of a bottom-up approach. When we are talking about how can you get people to do it and enforce it, that was going to be the first step. Take what we have and display the thing that was actually used to create it. If you are talking data entrepreneurs or even very sophisticated folks, our first step is always to try to recreate it and say, what went into the ERD? What is the nature of it? That seems like that would be a huge help as far as option, rather than kind of a standards top-down approach to explain and describe what we already have. This bookmarks that.
Real quickly, you had mentioned social and user feedback. I think we will go through this a little bit later today or, if not, at the next session. We also might want to think about like hierarchies of distribution. You have enclaves, you have data centers. That is great. That has a higher bar.
Then you move into destination sites like (indiscernible). Then you have meta sites, like linking and distributing, like Data.gov. There are also other things that we have talked about previously, like data browsers, right? There are a lot of HHS data, Google for instance, in the public data browser. You have a ton of usage on that, that you are not able to track with internal metrics.
Or you have additional distribution mechanisms like ProPublica. We just did something with U.S. News & World Report. Now they are using Part A, Part B, Part D referral hand (indiscernible), right? That is going out to millions of people. Thinking about a broad ecosystem and how to capture that and look at it would be highly helpful.
I will just say one other quick thing on Health Datapalooza. I am organizing that with Paul Wallace and Jon Blum. If there are entrepreneurs or you are thinking about things, we are building those tracks right now. Please do send them to us. I should say the difference we are doing this year at Health Datapalooza is instead of just saying, hey, this would be something cool to build, we are really highly keyed on a value proposition, why should someone do it.
I am happy to have that conversation, but definitely want to have that piece of that. Why wouldn’t entrepreneurs or anyone want to do something? There is a very good reason. They have mentioned that a lot of this public data is being used specifically by commercial, not just startups, the big players, which probably is all outside the scope of view of what we have been looking at today. I kept it brief.
MR. SCANLON: I agree completely, Josh. I think in terms of standards, you are exactly right. There is the question of standardization versus technology. I think most of the various sets we have made up, and this came up in HIPAA. It always comes up. Is it too soon? Why don’t we let technology solve that problem for us?
Knowing when not to standardize, I think, is important. As you said, most of the datasets and tools that we probably have up available now are that second instance that you described. It is not necessarily standardized, and it might be implicit, or some of it might be comparable. But it is largely whatever we collected. The question there would be are we describing what we have there accurately?
That is when the advice you gave us about tagging, I think, was very helpful about how we have used. We are very much interested. I really think in HHS, we still don’t have a lot of experts who understand that secondary use that you are describing. The ways of multiplying and communicating the data further, without even any intervention by HHS. Any kind of ideas along those lines that we could share with the agencies would be very helpful.
DR. MAYS: Let me see if there are any other questions online, comments, questions? Okay. Let me just make a final one before we kind of wrap up on this part. You asked about Flint and talking about what we might be able to do a little further down the road. One of the things to even think about now is in the area of mental health. One of the things during disasters is really monitoring the Twitter feed. They actually get a sense of what it is that people are complaining about, what it is that people need. It is like real-time. Even now in Flint, I don’t know if they are looking to see what people are tweeting out. It gives you a sense of people’s needs as well as kind of when people are asking for help or what they are complaining about.
MR. SCANLON: The same thing comes up, Vickie, in every flu season or Zika or Ebola or whatever the emerging disease will be or whatever other worry will come up. There may be a group looking at that. I don’t think it is HHS. I would guess that it is fairly rudimentary. I don’t know if there is any advice the workgroup could give us.
I don’t think it is proprietary, but maybe down the road I can arrange a briefing for the workgroup, a web briefing, on the MedMap application, the visualization that we had.
MS. BRETT: I am Kate Brett. I am a team commander of an ASPR asset. There is a cell in ASPR called the fusion cell. That is what they do. In fact, one person 24/7 is doing social media. They are on top of that. They are in the operation center.
MR. SCANLON: I think our preparedness response has gotten much more sophisticated. All of those are being used, then. Again every event is different, but there are some similarities to every event. We have protocols. We have concept of operations and so on. There is always specialized information that is needed.
I mean there is no substitute for actually going in and measuring the blood levels, for example. There are probably some principles about what is useful in this situation that can be prepared in advance.
DR. MAYS: So I think what we will do is that will be on our back burner, but we will definitely take a look at it. Okay, if there are no online comments or questions, what I want to do is move us ahead. We are actually pretty close to being on time because Damon wasn’t able to join us today. Part of what I want to do is introduce the schedule changes.
As many of you know, when we started talking about looking at the websites, Chris Fulcher, Chris, are you online? Chris was able to connect us to Dr. Neely Curran, who actually is at the University of Missouri, right? She had volunteered to actually look at one of the websites that we wanted to take a look at, which was the SAMHSA website, and give us kind of some rubric of how her group approaches doing this work, which in her lab they do quite frequently.
When we looked at the rubric, we said this looks great. Let us add in some of the kind of data elements that we are interested in, as well. I just got an email yesterday from her. They have actually done their examination of the SAMHSA website from a perspective of a user.
She is online. I told her to check in with us around 2:00. What I would like to do is two things. One is make sure that Lilly has her slides. Lilly said there is a slide deck that she has. If we can get that up, that would be great. I also distributed things to the staff. Those of you who are online, if you want to look online in the event that there is any problem with getting it up there, you will actually have it. Dr. Curran, are you on? Welcome and thank you very much for being a good citizen and agreeing to do this, to give us some insights in terms of reviewing the website, the kinds of things that you found.
If you could start one by introducing yourself in terms of anything else that you would like the group to know. Then two, to talk a bit about the framework that you use to approach this, and then to tell us about your results.
MS. CURRAN: Thank you for having me today. Just to clarify, I have a master’s degree, so I am not a doctor. But I have over 10 years of experience working in usability and user-centered design. We have a research director in our lab who oversees some of the research.
In addition to myself as a research director, we have a team of doctoral students who are trained in information science and learning technologies. Within our lab, we train them to evaluate websites. I use this as a learning opportunity for our students, as well as to help maybe provide you some initial feedback about the website.
There are, of course, many approaches to understanding the functionality and use of a website. For our study, what we did is we had several researchers independently evaluate the SAMHSA website, focusing on some key features of usability. What we provided is a rather lengthy report. I don’t know if it is showing on your screen there. It would probably be better reviewed after your meeting today.
MR. FULCHER: Vickie, I know she sent the report to Lilly. I am just wondering are you able to get it up on this screen?
MS. BRADLEY: Just give me two more minutes. I had to change the settings.
MR. FULCHER: You bet. I understand. Just as far as setting context with Neely, I wanted to add the existing websites that she does evaluations with her team. We are working with them on a Dell Foundation project, as well as really getting on the front end before a website is even created, is really working on this UX design process, user experience. Neely, you are in midstream with the usability testing with something already in place. I didn’t know, while we are waiting for that document to pop up, did you want to comment on that briefly?
MS. CURRAN: I think that would be kind of one of our main recommendations really: if you are considering redesigning or changing some of the design of this site, really focus on that iterative design. So iterative design testing means that you test early and test often. As you have developers developing prototypes or wireframes, you want to bring in the end users.
Vickie sent that spreadsheet of the rubric. I saw you were working on some additional sort of criteria that could help sort of in a way gauge the site’s value or usability. I see that you identified some of the key users, the researchers, entrepreneurs and consumers and community users.
What we would recommend is as you are designing this site is bring in those users and ask them questions. Have them navigate through the site. Understand what their struggles are, what their needs are. In that way, in a really very kind of effective way be able to understand what their needs are and see how they are actually using your site. That is the process that we would recommend in terms of moving forward. Of course, that is part of our work is to sort of bring the voice and experience of the users into the design process.
In terms of the report, you can see there that page three kind of describes who we are and the process. Some of the summary is that there is just a lot on the site. I am sure that you are aware of it. Just the overabundance of information contributes to users being overloaded with too much visual stimulation and not being aware of how to achieve the things that they might want to accomplish on the site. Again, understanding the users’ needs can help you better design the site.
DR. MAYS: Let me just make sure that everyone knows you all use the SAMHSA website. Did you go to a particular — I mean because the SAMHSA website is big. It has lots of things. SAMHSA website is used for everything from people who have had some event occur, and they want to find out a little bit about issues to where the actual data is stored that you can use.
Just to help us a bit, if you could say exactly on SAMHSA where you went and what part you were looking at, I think that would be helpful. Then in the interim, I will ask the staff. We don’t need to see each other, so I am okay if, during this part, you put the report up so that it is a bigger part of the screen.
MS. CURRAN: For this site, we focused on SAMHSA.gov/data. We didn’t address the overall site. We looked at the data section. Looking at page four, sorry, I know that’s not the most digestible form for everybody to read on the screen there. I think for the purpose of this short discussion, I will just focus on those few first points on the initial recommendations. Then I am happy to take additional comments or questions offline.
That first recommendation was first to consider that iterative testing and the redesign of the site. Again, that involves interviewing the users, understanding their needs, watching and observing how they interact with the site. Maybe you pose a question related to a dataset of their interest. Are they able to use and manipulate the data in a way that they need for their research or entrepreneurship, whatever it may be? That process we have had, as I said, over a decade of experience incorporating those users and their needs into the design of the site.
I think maybe as I joined, it seemed that you were touching on this, as well. I heard Josh Rosenthal bring that up about the data visualization and manipulation. You were talking about standards of that and having it open and flexible. I am sure you are aware of this is that much of the data — we did some sort of competitive analysis to understand how other agencies or non-profits, and see how they are offering the sophisticated data visualization.
From our perspective of the short time that we spent exploring the SAMHSA site, it seems that the data is not as accessible and usable as it could be. I was looking at some of the documents. I just happened to pull up several here today. It was a 178-page document of a dataset on your site. I just questioned, while it might be useful, whether there are other ways to present the data that can allow researchers and your end users to manipulate it.
Actually I had recently talked with Chris Fulcher and saw his work on the Community Commons. I put that up there as an example because I think the site allows for you to query the data and do some very sophisticated analyses. I would just suggest that.
(Cross talk) not just about the data itself. There are stories around the data. It provides some examples and meaning to the data.
DR. MAYS: Let me let Josh ask you a question because it may be in the data visualization since that was something he commented on. Josh?
DR. ROSENTHAL: I was just going to sort of ask a question, but sort of also frame up what we are talking about ever so briefly. On the previous meeting and the previous phone call, we talked about this in layers. I sent over some material too late, and I am fleshing it out right now.
The basic idea is when we are looking at these sites, there are kind of four layers in terms of does it work. There is the data, and Walter, that is like your wheelhouse, right? Is it usable? Is it useful? Is it recent? Is it complete?
Then on top of that, there is this thing called information architecture, IA, right? That is literally how does someone find information. That is what I was referencing in terms of a meta site. If you go over to Data.gov, you will see them pointing into SAMHSA. You will see them with a schema, and you will see them with bread crumb trails, and you will see them with tags by source and like clear minimalistic layout. That is one way to approach it. That gets into the metadata and the tagging that, Jim, you were talking about.
Then on top of that, and once you get those few things solved, then there is another thing on top of it. That is the UI UX, right? How does someone look through it? That is what you are talking about is just to scope out the nature of this part of the conversation.
You can either do a top-down checklist, which is great. It is also worth knowing that HHS has done like a good job on that in terms of like look at Data.gov. There is a separate issue that like Damon brought up, how do we get the individual sites matching with budgeting and blah blah blah. Nonetheless, there is a model for that.
Then on top of that, there is an ecosystem interaction, so we push it out from a destination site to a browser to a partner to a channel, et cetera, which is the other piece. Part of it is a question. Part of it is also when you are doing this evaluation, you are looking at the UI UX stuff. I just wanted to clarify like what I am talking about is all of these layers basically.
This is great. But the IA stuff is different, as is the data stuff and as is the ecosystem stuff. The responses from SAMHSA, we have to do this, we have to do this for these reasons. Nonetheless, you can solve the objection by fixing the things in the other layers. You don’t have to compress all the interaction of the UI and UX. You can solve it by getting the foundation and the distribution right. I just wanted to throw that out as a clarification. Also because we keep having the same piece of the conversation that it is data, then IA, then UI UX and eco system. This is UI UX, right?
MS. CURRAN: Right. I think that you are an expert there in offering some very excellent advice in terms of understanding the layering of the overall experience of the users and building a site and interaction. In a way, some of these initial recommendations cover that, some of the information architecture.
We mapped out the information architecture of this site, just so it could be seen in a visual way, so that is on page 20. Just so you can see the information architecture of the SAMHSA site. Again, it is useful to see the underlying structure of the site. There are actually ways to test whether people are able to find things, whether the information structure best serves the users. There are many approaches to being able to maybe adjust the information architecture and address it again at that underlying level that the user doesn’t see on the UX, the visual layer.
DR. ROSENTHAL: This is the IA on like the site itself, right? That is fantastic. This is brilliant, like awesome. When we are talking kind of about the data, so like this metadata or this ERD or being able to look at it by a tag, the nature of not just a successful site, but a successful site where it is essentially a data repository, so when can the user navigate through here. But can the user navigate in a way based on like what is actually in the data? That is like the ultimate brass ring and being able to get kind of smoking hot. It is being able to take the metadata from the site and put that into the metadata like in the IA, like in a shared taxonomy.
For instance, if you go to Data.gov for a second, I see data type, right? That is an element of the site. That is really helpful for me. But now I need to push it into the IA. It is sort of like a little bit of a parallel conversation to sketch out for everybody that there is a question of like what is in the data in the site, and what is the IA of that? Then how do you build within the metadata of the data, and then how do you push that into the site?
DR. MAYS: So it is sounding like what we are talking about are almost two levels here. The first one is the user comes in, and it is the site itself. Then we want to deal with the data itself. You may get there, and it may be great in terms of the access. Then if we don’t also take care of what is happening with the data, how the data is set up, the ease by which to use the data. The site itself may have great bells and whistles, but it may be the data is just plunked down without a lot of sophistication.
DR. ROSENTHAL: Exactly. The linchpin is in that tag or the meta layer, right? You can have minimalistic font. You can have wide margins. You can have a good, clear site map. All that is great. What they did at Data.gov, I think, is a pretty good model to work from. When you go there, that points into SAMHSA, and it has great UI and UX. It also has a first cut, a very rough first cut, but a piece of it.
Not is the data complete, but what are the elements? Is it part of a geography that I am interested in? Is it a machine-readable file? Is it an Excel sheet? Is it a topic basically, that sort of stuff? I won’t hijack the conversation, but to say that is exactly right. There are two different levels. The way to connect them and make them successful is by the elements that are in the data, pulling those out into the elements that you search and filter and navigate through in the IA.
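The connection described above, pulling elements out of each dataset's metadata and using them as the things a visitor can search and filter by, can be sketched as follows. The catalog records, field names, and function names are hypothetical and do not describe how Data.gov or SAMHSA actually store or index their metadata.

```python
from collections import defaultdict

# Hypothetical catalog records; titles, fields, and values are illustrative only.
CATALOG = [
    {"title": "Survey A", "format": "CSV", "geography": "state", "topic": "substance use"},
    {"title": "Report B", "format": "PDF", "geography": "national", "topic": "mental health"},
    {"title": "Survey C", "format": "XLSX", "geography": "state", "topic": "mental health"},
]

# Metadata elements promoted into the navigation layer as filters.
FACET_FIELDS = ("format", "geography", "topic")


def build_facets(records, fields=FACET_FIELDS):
    """Count how many datasets carry each value of each metadata field."""
    facets = {field: defaultdict(int) for field in fields}
    for record in records:
        for field in fields:
            if field in record:
                facets[field][record[field]] += 1
    return {field: dict(counts) for field, counts in facets.items()}


def filter_records(records, **selected):
    """Keep only the datasets whose metadata matches every selected facet value."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in selected.items())
    ]


facets = build_facets(CATALOG)
matches = filter_records(CATALOG, geography="state", topic="mental health")
print(facets["geography"])
print([record["title"] for record in matches])
```

The point of the sketch is that the filters are derived from what is in the data's own metadata rather than designed separately from it, so the site navigation and the data description stay in step.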
MR. FULCHER: It is two levels. Correct me if I am wrong. I think there is one around the navigation to help the user get oriented with masses of data or masses of actually content and links. It is just the navigation to get what you are talking about, Josh. What we have done with Community Commons around data and our data engine, and Neely you saw that again earlier this week, is really a much more simplified approach. Did you want to speak to that from a usability standpoint?
MS. CURRAN: In terms of the metadata, it is the data behind the data. Right now, what I am seeing from the SAMHSA site is there are a lot of PDFs that tend not to be very searchable and accessible. You can’t layer the data. You can’t extract it in a way that you can then manipulate. Also it is not very searchable. Adding that layer of metadata and the under layer separate from the information architecture allows it to just enhance the user’s experience and enables them to use it in a more meaningful way.
DR. MAYS: Let me take a question in the room. Then we will come back, and then we are going to see if we can wrap this up. Jim?
DR. SORACE: I am just curious. Are we sort of beginning to approach the issues of querying a federated data model? Maybe that is the wrong technical term to use. Basically you have this issue with SAMHSA, you have this issue with FDA, you have this issue with NIHS, this issue with CMS. All of them often use the same standards, although they may have varying levels of granularity of implementation. Do we need to have a data model powwow, so to speak, to start to get a feel for what is feasible or not feasible?
DR. ROSENTHAL: There are kind of two approaches. We are starting to get into, hey, you know what is really important? Having a site that you can navigate, has good IA and metadata. That is important so people can find what is going on. That is great.
Then we are also starting to get around, hey, putting the data out there. That is great. Now we need to figure out that metadata piece. There are two approaches to it. There are always two approaches. There is the top-down, create a standard, get a powwow, define it retroactively, try to implement and update all the current stuff. The top-down approach, that is good.
Then there is the bottom-up approach. Somebody has data models. When you produce a set, when you publish that thing, there is an ERD. Somebody has a model. Somebody knows what the entity is, what the relationship is, what the attributes are. That thing exists on somebody’s desktop. It is not PHI. If you publish that, like that is a bottom-up approach. Say hey, let’s make it transparent. We have liberated the data. How about liberating the information to make the data understandable.
The bottom-up approach would say, hey, liberate that stuff. Then you can actually see what is actually out there and have people have a hack at that. That makes it much easier to sort of go around and kind of figure. You can implement that through UI. They are showing you schema and top-down taxonomy. You can do it bottom-up. I don’t want to say one is better.
But the easiest thing to do that gives you real visibility into like sifting through it is by publishing the stuff that is already out there. Then once you get that stuff published and you see people using it, then you can do a top-down approach. The federated piece, I think, is absolutely huge. You have a bit of that with that meta site like Data.gov, like doing some of that. Essentially stripping out all of the UI and UX metadata work from SAMHSA and putting it at Data.gov.
It still isn’t published. If you were to take the bottom-up approach, you would publish what you already have, rather than kind of getting all the hidden documents together and doing a top-down approach. That just gives you speed, efficiency, and you are basically taking a hard look at what you have already put out there.
One other final thing, that allows the distributors to pick it up, right? When I go to Google public data browser, I see HHS’ taxonomy. Not the one that they had a data powwow about, but the one that I see a reflection of the data as it actually is being indexed by Google, how it is actually structured. There are a couple of different approaches to it.
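The bottom-up approach described above, publishing the entities, relationships, and attributes that already exist for a dataset, can be sketched as follows. This is a minimal illustration that assumes the data sit in a SQLite database; the tables facility and admission and the function describe_schema are invented for the example and do not reflect any actual HHS or SAMHSA data model.

```python
import json
import sqlite3


def describe_schema(conn):
    """Read the entities, attributes, and relationships straight from the database."""
    table_names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    schema = {}
    for table in table_names:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        attributes = [
            {"name": col[1], "type": col[2], "primary_key": bool(col[5])}
            for col in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        relationships = [
            {"column": fk[3], "references": f"{fk[2]}.{fk[4]}"}
            for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        schema[table] = {"attributes": attributes, "relationships": relationships}
    return schema


# Hypothetical two-table dataset, created in memory so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE facility (
        facility_id INTEGER PRIMARY KEY,
        state TEXT
    );
    CREATE TABLE admission (
        admission_id INTEGER PRIMARY KEY,
        facility_id INTEGER REFERENCES facility(facility_id),
        year INTEGER
    );
    """
)

# The JSON document is the publishable description; it contains no data rows.
print(json.dumps(describe_schema(conn), indent=2))
```

Because the output describes structure only and includes no records, it could in principle be released alongside a dataset without exchanging any of the underlying data.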
DR. SORACE: I actually think we would have to do bottom-up first simply because it is the only way we gain knowledge of our own systems, sort of as educational. It would be an interesting thing to attempt. Also, I think one thing we could do, and this actually is a feasibility thing here, I don’t think we would actually have to exchange much actual data to do it. In other words, you can do this without the controversies of actually exchanging any data and data use agreement.
DR. ROSENTHAL: It is a very healthcare thing to do without all the debate and standards. It is a very anti-ICD version of it. You can do that.
I am in Alexa right now. I am looking at SAMHSA. I am looking at the scheme. I am looking inbound and outbound by user category. You don’t have to put it on a contractor’s like bid cycle to do that. It is already out there. I am looking at it as we speak.
DR. SORACE: I guess maybe the third question to ask basically are there any known software tools or entities, academic or other entities, out there that have particular expertise in doing this kind of analysis?
MR. FULCHER: I am trying to follow along on the phone. I may not be catching everything. Are we getting into a two-level discussion here? I am not sure because of Neely’s time in terms of the report. Is that kind of the top level of what we are dealing with, with the SAMHSA site and the conversation that we are having is at the secondary level? I am just trying to follow along in terms of the flow here.
DR. MAYS: I think what happened was that as we began to discuss her findings, the issue of both the data site and the issue of the data started to emerge. I think the discussion right now is really about ways to make sure that. I think the site is fine. It was also making sure that the data also has as much in terms of bells and whistles.
Here is what I am going to suggest because we do want to get to a couple of other things. Why don’t we let Neely finish her presentation, and then I am going to hold any other questions and comments until the end of that. It will merge us quite nicely into the other work that we have done in which we were trying to merge our kind of data issue with some of the site issues.
MS. CURRAN: Thanks. I see that both are giving very important questions in terms of the data and looking at top-down and the federated searches. Just to wrap up again the work that we did in our lab, I kind of showed you some of the information architecture as it is presently.
Again this was a learning exercise for our students. What they do is they do a design review. They go to the site, and then they, just sort of as an example, they say page 16 on the report. They just highlight areas of visual concern that they think could be improved or addressed in terms of the visual design.
The report highlights just some examples of some possible recommendations about how to adjust the visual design of the site from more of the detail of the nitpicking of the site. If you were to consider an overall redesign, you would want to first again address those multi-layers that you have been speaking of, the data, the information architecture. Then this actually comes at a much later stage of addressing these visual designs.
In the report, just to give you an example of the kind of work that can happen and recommendations for a website through user testing or, in this case, it was a design review by our student researchers. Then the next page, page 17, the students’ research was to adjust a rubric. In essence, it is just to understand what are some key criteria for evaluating a website.
What we did was we had evaluators look at different rubrics and then combine them to create this one, again as a learning exercise. I wouldn’t necessarily recommend using a rubric to evaluate all the sites that are under your purview. I mean, the point of it is to understand some key concepts of usability, maybe generally rate the site just as a starting point to understand what some of the strengths and weaknesses of the site are.
Each of the evaluators rated the SAMHSA/data site for those categories of language, content, simplicity. Then we gave it an overall score because there were nine sections of the criteria, each from a zero to six scale. Then we gave an overall rating. It showed 25 of the 54 scale, so a satisfactory site.
Again trying to assign meaning to the rubric, again it just gives you a baseline, like where are we at. I would recommend focusing on the other issues, the data and the information architecture and that usability, looking at that much bigger picture. It is a process that can help you understand some of the usability needs and things that we address through our studies. Again, I think the priorities are those things that you are addressing, those multi-layered.
I will go ahead and wrap up because I am sure you have a full agenda. I am happy to answer questions regarding the report offline. Again, if you get to a point of doing user testing, we have worked with another large dataset, bringing in researchers, observing how they use the datasets and how they navigate through a site. That could be a point of engagement in the future, if you are interested in working with this in the future. Thank you for having me today. I appreciate being involved in the conversation.
DR. MAYS: Let me thank you and all of your students for the work that you did. This has been very insightful. We will very gladly receive this report and spend some time looking at it. I think it is very useful to us to see the approach that you use, kind of how you did your scoring and be able to learn from it and kind of do these kind of parallel tracks that we want to do about the site, as well as the data itself.
Just before we go, let me just make sure. Are there any questions for her before we let her go? Online?
DR. SUAREZ: One quick question, maybe you mentioned it and I missed it. What is the intent or the expectation or your actions that will be taken based on the results, I guess in terms of modifying, updating, changing? Is that something that you already are planning to do with respect to the SAMHSA database or data website?
DR. MAYS: Let me answer because she was doing it kind of for us to come up with if you did a data site, and you used the rubric that she used, what does it look like? What kind of results that we see? This was a preliminary. It was actually learning for the committee in terms of this is what her lab does on a regular basis as a consultant to agencies.
We wanted to get a sense of if you took one of these, what would it look like. What we initially had planned was to add our data questions in and to have our students do both. They went ahead and started this. It still gives us great feedback. It is going to help us do our design, what our assessment tool will look like. Thank you very much. We greatly appreciate Chris, you for introducing us, and Neely for the work that you put in. I think it helps to launch where we need to go in terms of next steps. Thank you.
MS. CURRAN: So to clarify, were you intending to use the rubric to measure these other sites?
DR. MAYS: What we are in the process of doing is actually putting together a tool to decide how we want to do these things.
MS. CURRAN: Again, I think that in terms of providing a score to a site, time may be better spent understanding the user needs and capturing that through some user testing, and then making those adjustments to the site, rather than giving it the score. I think again there are some other approaches that could really strengthen the sites.
Going at that four-layer approach that Josh brought up, I think, will give you much longer-lasting, better user experience. What I am trying to say is there are limitations to the rubric. Thank you for your time. Again, we are happy to engage in future points to understand the user’s needs and experiences.
DR. MAYS: Thank you. Lilly, you probably want to use the Excel version of the frame that we were starting with, so that you can actually maybe make changes as we talk. Can we get that one up? As we do, Kenyon since you are here with us, maybe I could get you to start by talking a little bit about that frame. Some of it is historical, I should say, in a sense of some of it were issues that we talked about for quite a bit. It has finally gotten put into a (phone interruption). Some of them are issues that we talked about at the last meeting.
MS. BRADLEY: You want the one that Kenyon sent around, the latest Excel with the module, right?
MR. CROWLEY: So in thinking about the rubric that we are discussing now, and taking a step towards how can we analyze the data, the data sites, the availability of that data, for lack of better terms, the goodness of that data and the fit of that data for different purposes. We thought about some of the feedback that we have gotten throughout the past year.
And more recently through the committee, and tried to boil some of those nuggets of advice and directions into sort of bite-sized chunks that might fit into these different categories along the same general format as Neely’s rubric on the website design sort of usability. But in terms of sort of data usability usefulness for different types of users of the HHS products.
In doing that, we ended up bucketing the primary constructs under usability, usefulness, resources and support, and community building and learning. Now under each of these different constructs, there is a number of sort of sub dimensions which match feedback that we have gotten through sort of experts and committee feedback. For example under usability, there is sub dimensions for a file format in terms of being machine readable or not, and then what way it is machine readable. For provisioning of data dictionaries, for having metadata, for the presence or absence of similarity indexing, how tagging is done.
And for these elements, tried to think about what would a needs improvement look like, what would an effective use of, for example, file format look like, what would an exemplary use of, for example, file format look like and for each of these sub dimensions. We took a shot at populating many of these. We left some blank that we could continue to work through.
Essentially, that is the structure of this. Again, starting with usability, usefulness, resources and support from data provider and community building and learning. As these sort of four items seem to be the synthesis of the many sort of other items such as access and availability of data that could be bucketed under these.
DR. MAYS: Let me just say a few more words, so that people understand. We were marrying what we had come up with, with what Neely had come up with. So at the top, across the top where you see needs improvement, effective, exemplary, it was the rating scheme that she had in her original rubric. The issue of whether we would actually use those numbers or not, if we wanted some numbers or what it is that we actually want to get out of that, I think is something that is still on the table. I am just kind of giving you the background of where this all came from.
As we talked in the meeting, it seemed that there is the issue of usability. Now what is coming up is that we need to think that there are two tracks to this. We need to think about the access point of getting into the website before you even get to the data of how easy is that and then the data.
The question will be whether or not we want to make sure that we are really clear as to what we are asking about, if we are asking about the site, if we are asking about the data. Some of it will speak for itself when you are talking about the data dictionary. I think that particularly because of the use cases that we are trying to do, which is consumers and community groups, that we need to make sure, and I think this is what Josh was saying, that we are asking about both aspects.
The question of how to do that, I think, is very important at this point. Do we say this is about the site? Or do we ask the question in which we are very clear as to for the particular variable that we are asking about, where it is we are asking about it. And whether or not we still do the same frame for both the site and the data in terms of usability, usefulness, resources and support. I think we should kind of talk about big picture.
MR. CROWLEY: At the highest level, it is like how do we make the dataset effective for these. Whether it is effective for them to use it is a combination of how usable it is. Usable has a number of elements in terms of how easy it is to learn, how quickly they can do things with it. If you use findability as a dimension of that, how do they find it and access it and understand what others are doing with it.
Then usefulness, one of the common definitions is how fit is it for a specific purpose. That is when we start thinking about the individual user class. Is the data granularity sufficient enough for a researcher to do a meaningful analysis and make recommendations? Or likewise, for a consumer, is the presentation of that data in a format that allows them to answer the questions that they are going to the site to answer about the health of their family or whatever it is.
For each of these, there is difference for presentation of data, different ways of providing, ways of making it understandable and accessible for different audiences, which will ultimately determine its effectiveness for those audiences.
DR. MAYS: To me, that is the big question. If we are trying to do this, most of these data sites were built for very specific and sometimes narrow audiences. They were built to be used by a researcher. They were built to be used by a consumer.
We are now asking for each of these groups should we, for example, be asking each of the groups whether or not this website is working for them. Are we putting in a set of questions that allows us to do that? Or are we going to do this differently? I think to me, that is the big question. The complexity of what we are attempting to do is such that we can end up with a ton of questions, or do we find an approach that allows us to understand the experience of these different users.
MR. SAVAGE: So I am going to pick up on your last comment, Vickie. I have a range of comments, but I will work with that one. When I go to a website, I don’t tend to think of it, whether it is meeting my need as a researcher or as a consumer or whatever category of individual I am. I am thinking more about does it have the information that I need. That is sort of what I am thinking of as useful.
When I looked at this category, I both looked at whether there were other kinds of individuals besides the four listed there. I did have some additional, like public health, providers, payers, vendors, but maybe that is entrepreneurs. I also thought of it in terms of some of the key issues that HHS is working on, that we are working on. I am sorry I wasn’t here at the last meeting, so maybe this was discussed or not. Things like interoperability, is this data useful information for some of the core goals. Or maybe actually to back up from that, a lot of stuff gets structured around the National Quality Strategy goals.
I sit on the Advanced Health Models and Meaningful Use Workgroup of the HIT Policy Committee. When we were looking at use cases to test, one of the ways we weighted them was to look at what they contributed to those goals, to those priorities because those are key national priorities as a way of sort of saying it. That is more or less useful given that we can’t do everything all at once.
Disparities is a national issue. How useful is the data to that? It is not to say that if it doesn’t address it, it is not useful. I guess I am raising that usefulness has many different cross-sections. Others besides just the kind of person occurred to me as I was looking at this. I will save some of my other comments for later.
MR. CROWLEY: This was an initial path to sort of seed these categories. As we think about providers, community members, even if there are specific subsets to consumers, maybe there are some things that are more appropriate for paternal, so that needs to be tagged in a certain way or done a certain way.
DR. ROSENTHAL: Vickie raises something like really important. In terms of every site can’t serve every community, right? So how do you go about not biting off an elephant? One way to do it is to say what is the nature and role of a site, a destination place, an enclave, a portal. SAMHSA has something you have to go in and get.
Well, there might be different ways to get out of doing it site by site, or to meet like a different community demand across everything, so that Data.gov isn’t a meta site that doesn’t have any data. It just points to other things.
It could be a different incarnation fleshed out to solve a lot of those issues for a certain audience. Or another form of distribution, this is at the very top layer ecosystem, like a browser. I don’t need to know data. I don’t need to know how to code. I can literally explore data, interactive and analyze by clicking simple visual buttons.
Or I am a consumer. My parents are never going to check this stuff out. Do you know what they do? You look at US News or Docfinder. Guess what? Those guys are using a bunch of Medicare data as of last week, same thing with ProPublica or whatever. Maybe that is a better way to do it.
Then the only other thing I was going to say is like I think it is Venn diagrams. There is site usability for sure, what we are talking about. And there is data usability, but it is like where those things overlap and the link between them. If I go to Amazon, and I want to look for a toaster, the site is really nice. I can see stuff. When I look at a toaster, I can see other toasters. That is site architecture.
But if I go to a meta site or a SAMHSA site and destination, and I want to look up the data. There is a question of is the data usable. The link between it is, if I want to search for something, I am not searching for show me toasters. I am searching for show me other stuff that have JSON. Or show me other stuff that are actually at a neighborhood or block level. Or show me other stuff that basically have blah blah blah. There are definitely two layers, and there are separate classrooms.
To your point of getting really specific, I personally think that the metadata out of the data becomes the way that you navigate through the site. I can say let me see a site to find all the JSON files, right? Or let me see a site that could find something that has to do with health that had a ZIP code. They are definitely distinct, but like there is a connective piece that I think largely defines success. You can either do that through destination by destination, or through a meta site, or maybe just say it is the wrong model and just do it through other distribution mechanisms.
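The idea raised here, that a dataset's own metadata becomes the way a user navigates to it, can be sketched in a few lines. The following is a minimal illustration only, not a description of how Data.gov or any agency site works: the `DatasetRecord` fields, the `find_datasets` helper, and every catalog entry are hypothetical names and values made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; the field names are assumptions."""
    title: str
    file_format: str          # e.g. "JSON", "CSV"
    geography: str            # finest geographic level, e.g. "zip", "county"
    topics: set = field(default_factory=set)


# Made-up catalog entries, for illustration only.
CATALOG = [
    DatasetRecord("County smoking prevalence", "JSON", "county", {"health", "tobacco"}),
    DatasetRecord("Treatment facility locator", "CSV", "zip", {"health", "behavioral health"}),
    DatasetRecord("Neighborhood air quality", "JSON", "block", {"environment"}),
    DatasetRecord("Health insurance coverage estimates", "CSV", "state", {"health", "insurance"}),
]


def find_datasets(catalog, file_format=None, geography=None, topic=None):
    """Return records matching every facet that was supplied.

    A facet left as None is ignored, so the same function answers
    "show me all the JSON files" and "show me health data with a ZIP code".
    """
    results = []
    for record in catalog:
        if file_format and record.file_format.lower() != file_format.lower():
            continue
        if geography and record.geography.lower() != geography.lower():
            continue
        if topic and topic.lower() not in record.topics:
            continue
        results.append(record)
    return results


json_titles = [r.title for r in find_datasets(CATALOG, file_format="json")]
health_zip_titles = [r.title for r in find_datasets(CATALOG, topic="health", geography="zip")]
```

The same filter would apply whether the records sit on one destination site or on a meta site, provided the agencies tag format, geography, and topic consistently.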
DR. MAYS: Remember there are three pieces that we should be thinking about. The assessment itself of the website, the questionnaire that would go to the individual whose website it is that we are looking at. It may give us the opportunity to say, who is it that is your primary customer? Who is it that you would like to be a new customer? One can take care of some of this that way.
Then the third piece is the feedback that we give to people. We may keep ourselves a set of running comments about what is the assessment that we want from the person owning the website as to what they think is important and what they hope to achieve. Let’s do Helga and then Jim.
DR. RIPPEN: I just want to build on that. This whole question of how do I find the information I want, and the nuance is who am I, too. When you think about taxonomies and ontologies and all those fun things, they get really complicated.
One approach could be relating to the who, what, when, where, how and why sort of approach to life where it is not only what the topic is, but which particular sector, the purpose, and the target, as a few examples. Because ultimately the question is I need information to do something. Again, it is really about who I am, what my needs are, and whether or not it is embedded into meta tags somewhere on a site. It has to be associated with the datasets in a way that can be actually leveraged.
If you look at the Data.gov, they try that with regards to the sector at least. If you start getting more granular than that, again it goes back to use and usefulness without making a judgment on it. It is the what sometimes, and the why.
DR. SORACE: One real simple categorization is basically do you need data from more than one agency. People who are researchers or app developers, they will need to go to some sort of metadata site that actually points them to the right sets of data sources. They will have more sophisticated searches. This is sort of the Data.gov approach, but maybe on steroids.
The basic user, like most of us will be when we go to sign up for Medicare, and we are just doing the basic administrative functions of the agencies. Those home pages will need to be optimized for the public more generally. Then as we understand the community that if you are a real wonk, you go over there. I actually think the first cut is pretty straightforward. Do you need to get more than one agency or not?
MS. HINES: But most people don’t even know what agency has the —
DR. SORACE: Then there has to be a search function that you begin with, if you are searching for federal government data. I think the steroid part might be that it gets beefed up a bit. I am not trying to disparage it at all. I am just saying it does that. That is where people should be directed when they are curious about the data as opposed to the function.
MR. SCANLON: That was the whole idea of Data.gov. I mean, we are sort of starting the pre-Data.gov days. Again, I think it is an oversimplified version. It is 1.0, but it is a place to start. The theory is that there is a lot of data out there for different audiences from different purposes and programs. You can’t really make the program meet all the needs that it was never intended for.
As Josh said, you have to take what is there. The revolution is not here yet. You have to take what is there. Data.gov and HealthData.gov were places to begin to consolidate that kind of information, make it available in machine-readable form. Tag it properly. It would be a place to start if you were starting. There are dozens, probably hundreds, of specific data resources on agency websites.
I want to be careful that we are not trying to boil the ocean here. I mean this is data. We are not trying to tell Medicare how to run its program. Our lane is the data resources that agencies have specifically said, posted and tried to make available, how to make that more accessible, how to make it reachable, how to make it easier to use and so on.
Again, the visuals are fine. But if the television doesn’t work or if I don’t like the programs, it doesn’t matter how nice the cabinet is or how wide the screen is. It is a combination of design features and usability features. But really ultimately, it is I came to this site because I need information about estimates of health insurance coverage in Massachusetts or wherever and trends and so on, or something else, the prevalence of smoking in my county or area or nationally and the health area.
Am I finding it? Honestly if we can’t do better than Google, which we often find we can’t do better than Google, when we try to organize the searches and so on. I think the workgroup gave us a lot of good principles to use in terms of how would someone know what you have and be able to access it and know how the data is structured and how it is measured and what the variables are. And to be able to find it when people are searching, which is good. I think those are the kinds of principles we are looking for.
I don’t know what the next generation of Data.gov would be, but I don’t want to make this task impossible. I think we are looking for very practical. Not that we are going to change Data.gov, but for Data.gov for example, when we think about the next generation, what would we do differently.
When NCHS is thinking about a next generation data site for distributing its information, what would you think about differently? You would have to go through this process of who are you trying to reach, what is the purpose? None of us are looking for overall guidance on agency, what HHS and federal policy for websites just now anyway. That is a different process.
DR. MAYS: I think that one of the things that we are probably thinking about is what are the basic principles. We started out with principles. What are some of the things that you need to do at a base level? I know when we were talking about tagging, for example, that is like the more complex or jargony your tagging is, then the less likely you are, for example, for people who aren’t in the group that uses it all the time to find things. Can you improve your tagging? I think that is some of what I think we were getting to.
MR. CROWLEY: So this actually kind of gets back to one of Helga’s points, but also building off which others were saying. This next generation of what is doable or simple and some basic principles of that, of understanding sort of who is doing the looking and what are they looking for and are they finding it.
One of the open questions we have is we really didn’t know exactly who is looking, what they are looking for, which sites they are going to, what they are pulling. Then what they are doing and how they are finding it. One is sort of a foundational concept. Is there a way that we can sort of add some degree of transparency or openness to the search that individuals are doing? Whether that is capturing, hey, if you have a question, enter your question. Catalog the question and have the ability for sort of social responses to that data question. Maybe that part of that answering of the response comes from just the data community at large.
But maybe part of that response also comes from sort of dedicated data navigators within the agency who know where these data are, who knows how to use it. Then if you make that open to the community, what you find is if one person has a question of trying to find out where the health insurance rate in Chicago was or whatever it is, and you are starting to collect that. There are platforms that do that now.
I mean, if you look at things like Google product forums or even going back years ago to some of these Yahoo answer sites. There are some architectures you can put in place that allow for natural language and for asking questions and cataloging that, that can form a community about finding these answers. Within that, you link out to the data.
Then you sort of have threads off of these questions, so you know what they are using it for, how they are using it, if there are products that have been created from that data or learnings that have been useful for some context for others. You can capture that. Maybe you are tagging that, as well. Tagging is great. But that is sort of the next level when you can use the community to sort of build an allegiance around these. Then build in both the experts at HHS who within these questions that are shared by many can point people to the right directions.
DR. MAYS: That is why I was big on a go-to frequently asked questions where if we did a blog, and people kept going to it, and people kept putting, this is where I found this and this is how I solved that problem.
MR. CROWLEY: What are you looking for? If you captured a year’s worth of what people are looking for at Data.gov, it would probably tell you a lot about what people are looking for.
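The suggestion above, capturing what people search for and seeing what it says about their needs, amounts to tallying a query log. The following is a rough sketch under stated assumptions: the `top_queries` helper is a hypothetical name, and the sample log entries are invented for illustration, not real Data.gov search data.

```python
from collections import Counter


def top_queries(raw_queries, n=3):
    """Count search queries after light normalization.

    Normalization here is only lowercasing and collapsing whitespace;
    a real log would likely need more (spelling, synonyms, privacy review).
    """
    normalized = (" ".join(q.lower().split()) for q in raw_queries)
    counts = Counter(q for q in normalized if q)
    return counts.most_common(n)


# Invented sample log, for illustration only.
SAMPLE_LOG = [
    "Health insurance rate Chicago",
    "health insurance rate  chicago",
    "smoking prevalence by county",
    "asthma datasets geocoded",
    "Smoking prevalence by county",
    "health insurance rate chicago",
]

most_common = top_queries(SAMPLE_LOG, n=2)
```

Even a tally this simple would show which questions recur, which is the kind of evidence about actual use cases that the group is asking for.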
DR. RIPPEN: Actually I want to build on that because it goes back to the classic design philosophy of user-based design. Again, this is a way to get that information that you kind of really need to understand what the use cases really are versus what we believe them to be.
I think that is always important because that will actually provide some interesting guidance. It will actually help address peoples’ actual needs. I think there is an implication because from a staffing perspective and that kind of thing, that will have an impact.
Again going back to if I am using something, use cases, again that is why I didn’t know if this was something to just kind of use as an example, a made-up example. Let’s say I am a researcher interested in either asthma or cancer. I wanted to know all of the datasets that are environmentally related that are geocoded to a certain county.
If you think about it, now that is an example of not just HHS. You have the toxic sites. That is the EPA. You have the forestry sort of thing because that actually has a lot of implications. Water aquifers with regards to potential issues, too, and air quality which is NOAA sometimes and other sources.
Again, figuring out how does one organize or allow information to be searched that have certain characteristics. In this case, it is environmental. It is cross-cutting across all the agencies. It is geocoded. That is just really one example.
MR. SCANLON: That is to some extent a tagging issue. It is self-characterizing on the part of the agency. What is the geographic detail and what is the content environment? It is a tagging issue. There are ways to do that. Otherwise, I don’t know how, unless you already knew what these answers were, you wouldn’t know where to go.
DR. RIPPEN: Again, one hopes one is wise in doing that because it does take effort. Again in context of early capability, early wins, and especially if you do it consistently, so whatever search capabilities or filtering allows you to support it.
I think what Kenyon was also recommending is important, if you really want people to optimize and use the information. What is your real experience? Then you will also go to different kinds of needs.
DR. MAYS: Let me just raise something that came up in the NCVHS meeting, and that is for the population. If Bruce were here, he would be smiling because this is exactly what the issue would be. That is actually the format for which to make that happen. Is there an argument to be made that is a business case for all those other agencies, other than HHS, to NOAA and to Forestry and et cetera.
To say, if you did this in terms of your tagging, and we did this in terms of your tagging, and there is the ability to geocode, we all win. I think that is something that needs to be thought about. I don’t know how doable that is, but I think that is something to take back to pop. To say before they go out to that group again, is there a business case to be made.
Before we leave, I want to go back to what Kenyon was suggesting. This is something to ask Damon, as well. Would it be feasible to collect this kind of thing on Data.gov just to see? I like what Helga was saying. It is like we think we know the use cases. When you said that, I was like, no, I know what the use cases are.
I realize that we may have carved out some. But then you came up with payers and providers and the public health. Somebody else may bring that. It may be interesting to see if what you are suggesting is actually doable and to monitor that for a while to see what would we get? I mean, I don’t know exactly how to get it started, if you had it up there.
MR. CROWLEY: There already is the search box on the site. There are probably some regulations. That is sort of a lightweight way of seeing what people are searching for.
MR. SCANLON: The tagging, Jim, you will remember actually when we got recommendations from the workgroup and there are federal standards, as well, for the tagging part of it, it is the originating agency that takes those instructions and tags when they enter a data set. We ended up doing that. You would have to have an agreed upon code set.
DR. SORACE: These are not easy. One user group for all of this is actually federal government researchers and analysts. I just want to keep coming back to that. To a certain extent, there is a design principle called, I think it is the non-flight version is Eat Your Own Dog Food. In other words, you produce some piece of software. You produce a solution, and then before anybody else uses it, you have to prove its utility to yourself.
There actually is a stage we could sort of think about going through in terms of how to generate that. I think that there is more interest in that and across all these other datasets, especially in the NIH community. They are broad, and they are influential, and they are an important group. There are others, but there are ways to get some feedback on this.
DR. MAYS: Let’s decide what we are going to do in terms of next steps with this. Let me make sure I hear from people online. What I want to do is get each person’s input, and then I want to make some decisions about what we are going to do in terms of next steps, so that we can wrap this part up and then make some assignments.
One of the things that I am going to do in the meeting is try and get things assigned and wrapped up because we are kind of struggling in terms of staff resources. When we agree to something here, I am going to at least try and have us decide what next steps are.
DR. ROSENTHAL: I actually don’t think this needs to be real rocket science. I think this could be really simple. Like the first cut at Data.gov. Honestly it is not just topics. They have format. They have geography. They have some basic objective top-down tags that you can do. This publishing what is out there and like a basic schema, put together a sample schema.
You can just look at the sources that you have around whatever agency and say what are basic common elements. Or you can go to a public data browser and look at a scraper and say what have they programmatically computationally scraped and identified as your element. You can do that pretty easy peasy, like unified tagging. It is that metadata stuff that I keep talking about. I think it would be hugely helpful. You already have like a meta site across that at Data.gov. It would be very easy to expand that out as a trial.
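(Illustration added for readers; not part of the spoken record. The approach Dr. Rosenthal describes, looking across agency listings for the basic common elements and then tagging on them, might be sketched roughly as follows in Python. The sample listings, the field names, and the functions common_elements and group_by_tag are all hypothetical and are not drawn from Data.gov or any agency catalog.)

```python
# Hypothetical listings from three agencies. Field names and values are
# illustrative assumptions only, not taken from any real catalog.
SAMPLE_LISTINGS = [
    {"title": "Survey A", "topic": "health interview", "format": "CSV",
     "geography": "national", "updated": "2015"},
    {"title": "Survey B", "topic": "behavioral health", "format": "CSV",
     "geography": "state", "contact": "help desk"},
    {"title": "Claims C", "topic": "payment", "format": "API",
     "geography": "county", "updated": ""},
]


def common_elements(listings):
    """Return the sorted field names that every listing actually fills in."""
    shared = None
    for listing in listings:
        # A field counts only if it is present and non-empty.
        filled = {name for name, value in listing.items() if value}
        shared = filled if shared is None else shared & filled
    return sorted(shared or [])


def group_by_tag(listings, tag):
    """Group listing titles by the value of one shared tag."""
    groups = {}
    for listing in listings:
        groups.setdefault(listing.get(tag, "untagged"), []).append(listing["title"])
    return groups


if __name__ == "__main__":
    print(common_elements(SAMPLE_LISTINGS))
    print(group_by_tag(SAMPLE_LISTINGS, "format"))
```

(In this sketch the fields that survive the intersection are the candidates for a small shared schema; anything only some agencies fill in would stay optional.)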
DR. MAYS: Chris?
MR. FULCHER: I agree with Josh. Simple is better to get further down the road. The one thing that I want to throw out there is that given the limited resources of time, how do we task these pieces and who do we task them to? That is the question I have based on and building on Josh’s comments.
DR. MAYS: After I kind of hear what everybody’s comments are in terms of what they think next steps are, that is exactly what we will do next. Who else is online? While we do that, we know Lilly is online. Lilly, let’s get to your comments, as well, while they tell me who else is on.
MS. BRADLEY: I have nothing to add here.
DR. MAYS: Jim? Any others?
MR. CROWLEY: Just adding to the thoughts that occurred to me under usability, repeating what I said earlier, I think language access is a function of usability, accessibility. We have heard about staleness. I don’t know if that is the kind of thing that would come up here and also literacy levels.
DR. MAYS: Can you just say what you mean by literacy levels, something like this?
MR. CROWLEY: In my work, we actually distinguish health literacy from HIT literacy, which is literacy with the technology. I was just trying to generalize.
DR. MAYS: All right. It sounds like simple is great here, so let’s talk about next steps. Helga?
DR. RIPPEN: I think I am kind of on the hook for the population health group to try to do a crosswalk on that fabulous fun-reading draft report on all of the different determinants and indices. I guess the question is there are a lot of different ways of moving a kind of an assessment forward. The first is based on prioritization.
We could use that to drive at least a test case to go to agencies to say, okay, to actually give us more details on their dataset. We are talking about usability. Like currency, which is kind of the freshness, whether geocoded or not. What are the data elements? Kind of whatever we think is simplistic information tags that might be useful to match the data to the individual, and maybe some other descriptors about it. So we could do kind of an environmental scan of who has what, but focused as opposed to give me the list of all the data you have and spend a lot of time.
If that is the case, I am willing to try to lead at least a development of the matrix. Now again, there are a lot of other things that are in this usability component to look at. Again, I am more like a keep it simple sort of thing. If you are asking people to take time, but that is just something I will offer.
DR. MAYS: I think we do because that is one of the things we have to remember is that when we make the request, and we call ourselves being helpful, and then we give some person three days of work, we are not going to be seen as being very helpful. How about if I ask you and Josh because Josh has been — Josh, how does that sound?
MS. BRADLEY: This is Lilly. Can I just clarify? Helga, are you offering to do a test case or to be in charge of the modules?
DR. RIPPEN: So I am offering to work with whomever because fun loves company of putting together a matrix that is guided on some of the work that has been done here and some of the conversations with a target based on some of the needs from population health. I am being balanced here. Because again, you have to start somewhere. If we need the information anyway, it is a way to start doing the bridge.
It would be the construction of the matrix, which then I would actually bring back to this group to review before anything is done. Then we can see, well, who wants to have the fun. At least the first step, if not the follow on, just to move it forward because we need to move it forward. Then even recommending the different agencies and the different datasets that we would like them to focus on based on that.
MS. BRADLEY: The matrix would be used to evaluate websites?
DR. RIPPEN: The matrix would be used to evaluate the datasets themselves. I think that there has been so much fabulous work as it relates to, I think, the website and some of the nuances. This actually is a very narrow sliver. It may not be enough, and or maybe we should expand it. I guess I am trying to do two for one. I don’t know how well it will work, so I guess I will defer to this group.
DR. MAYS: Let me just say I think our start point really is the data itself. When we were working with Neely, it was seeing how we can combine the two. It sounded like part of what her lab does is look at it from the point of view of coming into the website. I think as a start, what would help is for us in the lane we are the strongest in, which is about the data, is to start there. I think that is what Jim is interested in. I think that is what the agencies are interested in.
Then I think we don’t want to neglect the other part. But let us get this part started. Then because we haven’t had a chance to read her report, we just got it. It gives us the opportunity to look at what she has done and to figure out, do we want to marry these. Is the marriage of actually dealing with the website itself, something that HHS is interested in. If so, let’s build it onto the data issues.
DR. RIPPEN: Actually I would say it is kind of a natural progression. If we are wise in the development of the matrix, and we do have the kind of, I guess, categories, the tagging terms of however we want to call them, then I think going back to the website with regards to, well, how does that interface between the data availability and the ability to actually find it and use it and things like that marry? I think some might say it is reverse engineering. Some might say, well, okay, we can argue about which way is the better way to do it. I am just being biased in what I am interested in.
MR. SCANLON: I think the matrix, if I am visualizing, would be the priority domains for health and social determinants of health, so various things. It could be food, housing, and environment. Then the other side of the matrix could be these characteristics. Then it would be trying to fill in. We would assume that there are measures and metrics available where there would be. There may not be. It would be, for example, at the county area or race ethnicity granularity or whatever national kind of data. They would be the descriptors. Tagging is sort of one way of conveying that.
Then you could see, so what is needed to use these. Then the data availability, it could be websites or some other place. It sort of carries through, where is this available?
DR. RIPPEN: And then the nuance about what could it be used for. If it is not geocoded, or if it is not for statistical purposes, then what is it? Is it for other purposes? Given like your examples of health care systems, right? Again, working with whoever is interested, we would at least come up with what we think based actually on the chart that I guess we will get sent an updated version of. Then also the report to see if there is anything in there, too, going back to leveraging. If that makes sense. I am just offering it.
MR. SCANLON: The other thing is to tie it back into access. Can we ask the agency is it available on the website?
DR. RIPPEN: Even the nuance of you may have it available on a link, like I know that there are some CDC sites that you can do a link, but you can’t actually access the data. Is the data accessible or is the map? They let you do the visualization. It is kind of like what is available, or do you need to actually request special access for? It is the categorization of how hard is it to get. Can you see it or can you touch it? How deep can you touch? That is what I am thinking. Again, Josh may have some additional wisdom given that he has been working on this longer than I have.
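(Illustration added for readers; not part of the spoken record. The matrix as Mr. Scanlon and Dr. Rippen describe it, priority domains down one side and dataset characteristics across the other, with a separate note on how hard the data is to get, could be laid out roughly like this in Python. The three domains are the examples named in the discussion; the characteristic names, the access levels, and every function here are assumptions made for the sketch, not an agreed design.)

```python
# Rows: example domains mentioned in the discussion.
DOMAINS = ["food", "housing", "environment"]

# Columns: assumed names for the kinds of descriptors discussed.
CHARACTERISTICS = ["geocoded", "county_level", "race_ethnicity", "access"]

# Assumed categories for "can you see it or can you touch it".
ACCESS_LEVELS = ("view_only", "by_request", "download")


def empty_matrix(domains, characteristics):
    """Build a domain-by-characteristic grid with every cell unanswered."""
    return {d: {c: None for c in characteristics} for d in domains}


def set_access(matrix, domain, level):
    """Record how the data for a domain can be reached."""
    if level not in ACCESS_LEVELS:
        raise ValueError(f"unknown access level: {level}")
    matrix[domain]["access"] = level


def gaps(matrix):
    """List the (domain, characteristic) cells nobody has filled in yet."""
    return [(d, c) for d, row in matrix.items()
            for c, value in row.items() if value is None]


if __name__ == "__main__":
    matrix = empty_matrix(DOMAINS, CHARACTERISTICS)
    matrix["food"]["geocoded"] = False      # answered "no" is still an answer
    set_access(matrix, "housing", "view_only")
    print(len(gaps(matrix)), "cells still to fill in")
```

(Keeping unanswered cells distinct from cells answered "no" is the one design choice that matters here, since the point of the exercise is to see where measures may not exist at all.)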
DR. MAYS: I want to add one more layer. It is just a matter of when to do this. This is something that Kenyon has been working on, as well as I. That is the issue of what is the principle of this. Meaning for example, if in this matrix what you are saying is that there should be tagging, so what is the data stewardship issue like? Why are we tagging? What is kind of the best practice in terms of the tagging? Would you do tagging by wanting to hear from the users? It is that kind of thing.
I don’t want to lose that, but to say that part of after you get this matrix, then I think the issue of principles is probably a good next step. What you start giving is, when I am thinking about it, as we go through this, we look at what is happening in terms of what is happening. The principles eventually become what we send to the secretary about what we think will work well for HHS in general. These specifics in terms of the matrix are kind of how you work it. The matrix may end up going with it. I think in general, we need to establish some principles for best practices of displaying data to make it more usable and accessible.
DR. RIPPEN: Ultimately, that is the end goal. It is one thing how important is it and what do you have. But if you can’t deliver it, so yes.
DR. MAYS: All right. I think that we have done well in terms of wrapping that up. What I would like to do is take a break. We are doing good on time. It is about 3:30. Why don’t we take about a 15-minute break and come back at 3:45. We should be good.
Then what we are going to do is talk about our work plan. What we are going to bring up there are all the things that are on the table that we have talked about in terms of we have talked about blogs, we have talked about lots of things. We want to lay them all out, start putting some priorities around them, and then starting to decide whether we need to recruit additional people to help us.
MS. BRADLEY: On that last one, could we come back to it at the beginning of the next session? Just what the timeline vaguely, whatever information you would want moving forward about the timeline?
DR. MAYS: Okay. Thank you, Lilly. We will. All right.
DR. MAYS: We are going to reconvene. Lilly, are you there?
MS. BRADLEY: Yes. Which document do you want up?
DR. MAYS: A couple of comments first. You wanted us to go back to a timeline. What I would like to propose, Helga is not in the room, but let’s see if we can do this. Josh, are you there? Let’s see if we can have the matrix ready for our next meeting. Our next meeting is June, right? Okay.
What I would like to do is see if between now and June, the matrix is ready meaning the matrix has been developed and the matrix has been circulated among all of the workgroup members. And that we are able to come into the full committee and actually talk a bit about the matrix there. Then in the workgroup meeting, we can then make decisions about who will be the actual test cases. How is that? We don’t have Helga here who volunteered, and Josh is not hearing me.
MS. BRADLEY: What do you think about one to two weeks before? Would you want it circulated?
PARTICIPANT: Don’t you want to have time for input among the workgroup and all of that?
DR. MAYS: Yes. Okay. Thank you, Lilly, because this is exactly what I have been pushing us to do, and you are holding me to it. Let’s do the backwards calculation. The committee meeting is on June 14th.
MS. BRADLEY: I can look at the date. This is helpful. Do you think a week is good or two weeks or a month?
DR. MAYS: Wait, no. We are going to actually do it. How far in advance before something needs to be in the e-agenda book? We need to be finished by say June 2nd, so that we have time to fine-tune it. I don’t think we need to worry about having to discuss it at the executive committee.
MS. HINES: It sounds like you need the workgroup to vet it and then also the test cases. I am assuming are you having a monthly workgroup call between now and June?
DR. MAYS: I don’t know if it is going to be monthly, but we will have different calls at different times. I think that we want it finished by June 2nd. Lilly, what I would suggest is that what we try and do is to come up with four points of contact. Helga just came back in the room, so let’s ask her. We are coming up with a timeline.
What we said is that if the matrix is finished in two months, and so what we would do then is to circulate it, to get it vetted. To come up with if we can have some even small test case maybe, so that when it is final is by June 2nd. Which means it can go into the e-agenda book. We can discuss it at the full committee. Take that feedback, since you are going to use the pop as kind of the example. We can use that feedback, and then work through whatever feedback we get at the actual meeting, which would be the last day, was it June 16th I think you said? Oh, June 15th.
DR. RIPPEN: So the nuance is going to be we are going to have to rely on Jim and others to make sure that whoever we use as a preliminary test case can get it turned around in a given timeframe.
DR. MAYS: What I am going to probably suggest is that our test case may be somebody at NCHS. They have been pretty good about willing to be guinea pigs.
DR. RIPPEN: I would be worried to help identify whatever works out just to kind of facilitate. If we get it in, let’s see, where are we now, in April, and I don’t know how long this group will take to want to review it because reviewing is really important. You don’t want to just say, oh, any comments. I don’t know the next meeting —
DR. MAYS: If we can come up with about when in April we think it is ready, and the sooner we can tell people, say within a two-week period if they would do the review.
DR. RIPPEN: I would say if we get eight weeks from now and then two weeks from that point, have a phone call to review. Or I don’t know how much time people need.
DR. MAYS: I would say two weeks if we know ahead of time to put it on our calendar. The only problem is Datapalooza is very early this year. I think it was May 8th to something. If we hit their Datapalooza, they are really — May 8th to May 11th.
MS. HINES: Why don’t you try to stay right smack dab in the middle of April, something like 13th, 14th or 15th, when you have it ready?
DR. MAYS: So let’s say in the middle of April, somewhere between the 13th. Then what we do is we distribute it. Then what we try and do is have a call. Let’s give people ten days. That is probably good.
MS. HINES: Why don’t we just set up the call in the next week or two? That way, people will have it on their calendar.
DR. MAYS: Right, and the Datapalooza people can be real clear about what they can and can’t do. They get very busy. Okay, sounds good. Maybe the test case can be because they came to us before. Well, I was going to say it is NHIS to see whether they would be willing to do it again.
PARTICIPANT: Okay. Does that make sense having the Health Interview Survey?
DR. MAYS: Or you know what? We can also ask Jim because he may want SAMHSA. We just did all the other stuff for SAMHSA.
DR. RIPPEN: Just so I will know the timing, we will have a draft April 13th. Okay, ten days later or whatever, a call, will go through to say, you know, how do we fine-tune it. Then once we have fine-tuned it, we will send another version out to just make sure that everyone is comfortable that their comments were —
MS. HINES: That is not going to give us time to do a test case and have everything done by June 2nd.
DR. RIPPEN: So then everyone will be very trusting that we did actually incorporate the comments. We will send it out for the test case, right?
MS. HINES: We would have to have it lined up. People know, okay, May 1st, this is coming. We need it back by May 25th or whatever.
DR. MAYS: Also, in terms of the changes, the way we can do this is if we do it on video conferencing. We can actually see all the changes being made, and we can agree right then and there. If necessary, we are small enough that I can arrange a Google Hangout. Rebecca if we can do that time wise, that would be it. I appreciate scheduling it here.
MS. HINES: I think having Lilly write that up on that sheet she has got there would be good, too. I hope, Lilly, you have got — Helga has committed to having the matrix done by April 13th with Josh. We would have a meeting scheduled sometime within a week to two weeks after that.
Some people already will know it is coming and to block off some time to be ready for review and provide feedback, which then would immediately go to whoever the test case is, which we need to line up in the next month. Because if we say on April 30th, oh, we need you tomorrow to be our test case, that is not going to happen.
DR. MAYS: Again, we will ask Jim. That is a contact with Jim.
DR. RIPPEN: I think maybe we will get some preliminary ideas about the matrix, too, and maybe a summary so that people know what it is, too. Otherwise —
MS. HINES: I can use that summary to enlist someone then to agree on May 1st. Okay, you will have three weeks to give us this feedback.
DR. RIPPEN: And then what I will do is because it is important to leverage everyone’s expertise, and I know a lot of people thought about a lot of these things for a lot longer than I have, is any time there is any kind of summary or anything, I will send it to the group. Any draft levels, just for people that have the tolerance or the desire. If there is anybody else that would like to work with Josh and I, just let me know.
DR. MAYS: I think what we should probably plan on is let’s just use one of the ones because we have information either about NHIS or SAMHSA. We would then have some baseline information about it. I think it would help us more rather than picking a brand new one. I would say one of those two.
MS. HINES: So Lilly, we are going to April 13th is when the matrix will be ready. But Helga is going to send us a summary beforehand, which I will use to hopefully engage someone to agree to be a test case. Late April, there will be a workgroup call to review the matrix. We will have somebody, hopefully HIS or one of the SAMHSA systems, to do the test case for us in May and give us the feedback. Then have a workgroup call at the end of May or first week in June, so that we can have the thing for the eBook to Marietta by June 6th.
DR. MAYS: Then when we send something out to the workgroup, also solicit and see. Once we do the work plan and see what the other things are, what I will do is also send these items out to the workgroup. People can self-select into them. We have people not on the call who will do it.
Then I know Paul is coming on pretty soon. There are some other people. Don’t fear that I think it is just the two of you. Of course, I will help out without a doubt.
Okay. All right. Lilly, can you put up the work plan?
MS. BRADLEY: Yes, give me a minute. What version? I have multiple versions of that. Do you want the PowerPoint, or do you want your Word document? Did you even send me the final?
DR. MAYS: The final got sent out. If you don’t have it, yes. I am going to get started while you find the actual PowerPoint. What I would like to do is to start cataloging all of the things that were possible. This is where I need to find out what it is that was brought up, the things in the NCVHS meeting, the things that we have talked about in the past, and this is where I think some of us who, it is like, oh my gosh, that has been there a year. Can we do something about it? We want to bring those things up, as well.
Let me start the list and see where we are. The blog, so we have talked about the blog as kind of a — I am going to describe it because not everybody has been in on all the conversations. One of the things that we see happening often is that we will come up with great solutions. Josh will talk about this is how you fix that problem, or Kenyon will identify a great piece to read that is insightful about dealing with government websites.
We talk about it. Whoever is out in the ether listening to us learns about it. But we don’t get it any further. What we talked about was should we have a blog, and letting the blog be everybody who works with government datasets wants to come to this blog because it is chock full of information.
But it is not just us putting things on the blog. It becomes the go-to place. Other people, as they learn things, start putting things on this blog. We kind of talked about that. It can’t be an HHS blog at that point. It is Josh, somebody, somebody doing this blog. But we have to talk about if we do it, we have got to find a way to drive traffic to it. We have got to find a way so that it is not stale, that there is constant materials that are put on it.
So Josh has designed a prototype. We have seen it. Let’s put that on our list as a possible. Walter brought up the issue of having a blog that is an approved HHS blog. Those blogs tend to have kind of these longer articles in them.
That kind of blog would be selecting issues that come up, that we would then put on the blog. It would be approved. The workgroup could put up issues. When NCVHS had issues, it could put them up there, as well. So there is that. There are two different kinds of blogs we talked about. Kind of the one where anybody and everybody posts to it, and another one that is like — I guess ONC has a blog like that, right?
DR. SUAREZ: Yes. ONC has a blog. Of course, ONC is a federal agency. The people that post on the blogs are federal employees. They use the blog to get feedback. Sometimes I believe the members of the advisory committee, like the Health IT policy committee perhaps, have written something on those ONC blogs maybe. I am trying to remember.
DR. MAYS: I think this is where we need Damon. It also got said maybe we wouldn’t have a separate one, but could we use one already in existence, which is I think Damon’s group has some kind of blog.
DR. SUAREZ: That certainly would be an opportunity.
DR. ROSENTHAL: Real quick, this is Josh. There is a blog on Data.gov. You don’t have to be an employee, but there is an approval process. It is pretty widely used.
DR. SUAREZ: Health data or just data?
DR. ROSENTHAL: Health data.
DR. SUAREZ: Healthdata.gov?
DR. ROSENTHAL: Yes.
DR. FRANCIS: The more we can use something like that to drive traffic, the better. Otherwise it will be really hard to find.
DR. MAYS: So that would be three options.
MR. CROWLEY: Looking at even our own blog, one thing we could do as far as the blog marketing strategy is look at which other key blogs cross-post. Because I mean a lot of blogs repost other blogs. Whatever the top ten blogs are, in order to get the most hits, it might be worth reaching out to get some kind of cross-posting arrangement.
DR. MAYS: I think I will ask Chris about that for HDC. As a matter of fact, work with them around that. I think the next thing is for the June meeting, so that is quarter two, we just want to come back with the information. Something about each of these blogs, something about what the top ten are, and then we will make a decision as to whether or not we want to actually do blogging.
Helga, didn’t you have another item? For some reason, I remember from the last meeting, other than the matrix, there was another item that you had volunteered to do, no? No problem. I got my ask in, so I am good. Number two is we need to think about Datapalooza.
DR. SUAREZ: I assume most people, or many of you all, are going to that 2016 Datapalooza.
DR. MAYS: We got asked. Again, I need to deal with Damon. We got asked. I should deal with Damon about whether that ask is still there. I didn’t realize it was so early. I went on, and the secretary is presenting. It is an incredible list of people.
They may be early because they are now doing it with Academy Health. That means we get to the health services researchers and health services people really easily. Anyway, on Datapalooza, that would be in this quarter since it is in May. When Jim and I were chatting it would be is there something we could do at Datapalooza that is really about enticing the entrepreneurs to develop more things for us. I have to talk to Damon and find out what they want from us from Datapalooza and what it is that we can do. Josh, aren’t you organizing Datapalooza?
DR. ROSENTHAL: Yes, I am. I am doing a good chunk of it with John Blum, and then Paul Wallace from Optum Labs, who was the chair of the Board of Academy Health.
DR. MAYS: So I will probably either talk to you or to Damon to find out what kind of role the committee can or you want us to play. Okay.
DR. ROSENTHAL: We should have asked Damon to come up. He should have a mandate to get NCVHS to do something already.
DR. MAYS: He did ask, but I didn’t know what we —
DR. ROSENTHAL: It should be there. It should happen there.
DR. MAYS: Okay. I will circle around with Damon then. Lilly, you and I should come up with a time to talk with Damon. One, to bring him up to speed about what went on in the meeting, and then two, to deal with Datapalooza. Okay?
Other things in terms of our work plan. Let’s talk about short-term, long-term. Let’s actually talk about the things that Jim put on our list today and see where we are with those things. Making data more usable, one of the things he brought up was the issue of market segmentation. He would like suggestions for us about the issue of identifying audiences. In terms of those audiences, how to do market segmentation to get them, I guess, to drive them to and get them to use the data. Let’s put that as market segmentation is one of them.
The other thing that he brought up is the issue of standards. Walter, do you have any sense of what it is that we might do in that? He is talking about standards for the data. It was kind of we have this demographic. Indeed, that is something that he talked about. How would we fit into that picture?
MS. HINES: To answer or address what need.
DR. MAYS: What I have that he talked about is to think about, for example, these issues of the data systems. He talked about how they had talked about how they had worked on standards and all of this. The potential for standardization of measures is what he talked about.
MS. HINES: He was just saying that is an activity that is already underway at HHS.
DR. MAYS: I thought he was asking us whether or not that is one of the things that we wanted to do in terms of websites. I think suggestions came up about do we want to say that data on websites, try and encourage them to think about these standard approaches to using these demographics. If you don’t see anything there —
MS. HINES: I don’t because I know enough about all of this. He was basically saying that the surveys, I mean unless Jim Sorace, you have some other insight understanding, he was explaining that the HHS did a lot of work to make sure that all of the surveys were using the standards for the basic race, ethnicity, gender, primary language.
DR. SORACE: They were developed by ASPE, I think, in response to I forget what law.
MS. HINES: Section 4302.
DR. MAYS: I thought what he was saying is for us, in terms of either technology or something, to drive people to, here is where the standards are. What we were trying to do was get people to use them.
DR. SORACE: I think people should really look at these common data elements. There is a lot of work that we might want to do internally. I am going to put something in perspective. We actually publish an NTTAA report annually dealing with the department’s standards activity, under the National Technology Transfer and Advancement Act.
We have about 830 plus employees working at about 160 different organizations on standards. That is the size of the standards.
MS. HINES: I am not getting the link between making the request to us. All I was hearing was him explaining the progress that was made on that front. Then frankly, they haven’t made a ton of progress, although there has been lots of cross-agency discussion on tobacco, mental health, health insurance estimates and LGBT to see if internally HHS can say, okay. You go to this survey for this and that survey for that. So many agencies are collecting all of this data. I didn’t hear him making a request of the workgroup.
DR. MAYS: Okay because I thought there were suggestions that came up. Believe me, I am willing to let go.
MS. HINES: He did that same report for the committee, too.
DR. MAYS: I know, but what I thought he asked us to do, and again maybe I just didn’t hear, is we had the discussion about well, suppose NIH published what they did. I thought what he was saying was with us can we drive people to some place on websites where it is greater information about the fact that there are these standards that are being used for the way they collect data and what it would be. Then we talked about non-federal sources doing that, NIH doing that, so there were those suggestions.
DR. FRANCIS: What was said I think for us was the role of the private sector in part because so much data gets collected in so many different kinds of ways. With the NIH proposal that was at least floating, and I don’t know if we have a role in that, but it was to suggest that when research is funded, the data collection standard be either strongly suggested or required.
MS. HINES: So that is it. It was nothing internal to HHS, but what could we do to take what HHS has done and encourage and promote outside of the government to use those. There we go. That is what it was.
DR. FRANCIS: The knowledge, making it available, that is at least how I understood it. I might be wrong.
MS. HINES: I don’t know what this workgroup would do in that respect. I mean, if we are aware of — I don’t know.
MS. BRADLEY: I think that what Jim talked to, he first was talking about how they had implemented the recommendations from the workgroup. He was then making one request, which really was he is calling it market segmentation, but he is really just pointing to the fact that the desire by the workgroup to look at use cases would be very helpful. He interprets use cases as market segmentation, which in some ways it is. It is just a different name for it.
Then he really just went on to provide an update to the workgroup about some of the things that are going on. The discussion from there expanded to how you further extend efforts like standardization, the kinds of things they were doing to bring alignment around how you capture demographic elements or insurance. He did not make a specific request there. There were ideas that the workgroup exchanged with each other, and he also commented on them, but there was no request.
MS. HINES: Right, and that is because that work is still ongoing via the data council.
DR. MAYS: Okay, so maybe the only thing we would do here is think about blogging about it and putting it out in some kind of way. It is something that I think is not in the June meeting. It is something that would be for the third quarter, to think about it in September.
MS. HINES: Again, this doesn’t seem really complicated. It is more like who in this workgroup knows how to reach that audience and come up with a strategy for making them aware that these standards have been developed. So HHS surveys are using these standards. How do we get the rest of the world to use them? To me, that is like tasking two people, give them three months and come back and report on it.
MR. SAVAGE: I would just point out a couple of possible intersections. There is a federal health IT strategic plan for five years. There is a 10-year nationwide interoperability roadmap, both of which have components about pushing standards, public-private. It is more particularized. I think we are talking about broader kinds of data.
The strategic plan is actually 39 agencies agreeing that is going to be a vision. Now implementation is another question. The interoperability roadmap is being coordinated by the national coordinator, but there are a lot of calls for action of private sector folks. I am not really saying this is the answer. It is occurring to me that these are big long-term announcements, efforts, to drive standards, and maybe there is something to bootstrap or something to look at as a model.
DR. SUAREZ: I guess I am trying to understand the question around this, whether it is standardization of the data content inside each of these databases, you know, federal agency databases. Like demographics like Jim was talking about or identifying what are the differences.
Or are we talking about other types of standardization? For example, and I think I mentioned it before, and there is work done already around the metadata about the databases that exist. There is another level of standardization. I think there is work already done around that.
Certainly there is a third level perhaps of standardization. We heard it today and talked quite a bit about APIs, the Application Programming Interface capabilities. Is there something like that considered for federal agency databases?
There is a standardization in the interaction with the database including basically API-type technology and other types. There is a standardization in the metadata that describes the database. There is standardization in the data content.
MS. HINES: That is what Jim was talking about earlier: they have actually standardized the questions that get at race, ethnicity, gender, disability and primary language. That is what HHS has finally accomplished. How do we get everyone else who collects those same data to do the same thing?
DR. MAYS: Here is what I am going to suggest. I would like to put this on the agenda for our November meeting. What I would like is Paul is not on, but Paul would be a great person to talk about some of this. I think Paul would be good to talk about the metadata about the datasets. I think that Paul would be good to talk about the standardization about the datasets.
If we wanted to do this, what would it consist of? Then Paul, in terms of sitting on the ONC, could talk about exactly what you are saying. There are other places where we just put this information.
DR. SUAREZ: On the metadata, I know we heard from Damon about it. He knows, and he has of course good knowledge and understanding. I think he would be a perfect resource, but I am not sure how much Paul as a non-federal member knows about metadata on the databases. Anyway, there is already work done and materials written about the standards for metadata of databases.
On the standard for APIs, that is a very different scenario. I don’t understand about the data content. I think that is where we want to see. The USHIK effort, and the NLM and NIH effort that Jim mentioned, that you already received the link to. Thanks for that, Jim. There is that question about where do we want to focus on? There are these three areas, perhaps even more.
DR. MAYS: I think we need information before we jump into the space. It may be that Paul isn’t great for all three is what you are saying.
DR. SUAREZ: No, he is great for everything. I think it would be helpful to do other resources.
DR. RIPPEN: I guess what I am hearing is several different buckets of discussions. One is outreach. So how does one provide information about something, right? Either about the data that is available, the blogging, so that everyone could learn and share knowledge.
Then also even what HHS, for example, and what Jim said. Their use of this is how we ask questions can be reused. Having to come up with how many times or how many different ways can I ask about smoking is kind of a waste of time if one can leverage it. But then, one needs to find it. There is this whole question of accessibility.
The other one is then what about how do we actually make things possibly more usable in the sense of understanding what we have and how best to use it, right? That includes then the tags, the meta tags, the organization, et cetera.
Then I guess the third, which I think is not really here, and I don’t think it is necessarily the scope, which is where are the gaps?
DR. SORACE: There are three things and they are incredibly unrelated. The first one is that government standards policy requires that the government use non-government standards whenever possible. We are not to make government-unique standards. That is actually laid out by OMB. This is just any data. I don’t care whether it is standards for IT or standards for making a ladder or standards for cosmetics. It is laid out in Circular A-119. Just keep that in mind.
I am not sure that we know our own data models well enough to say that. I use LOINC and the CDC database. Somebody else uses NIH. Now I find all the federal government databases use LOINC and maybe drilled down on their codes. We have never coded or aligned these data models to the point that it will allow you to search on that. That is the second sort of unrelated thing.
The third unrelated thing is if you want people, one group that you really might want to talk to when it comes to actually doing data linkage and has done a lot of work with the NLM, but has also looked at environmental data and other data is actually the computing group at Oak Ridge National Labs. They have actually done a lot because they have a big supercomputer, and they can grab the data. Occasionally when they get through a data use agreement every ten years, they get to turn their toys on it. But they actually have more use. They actually have some experience in cross-linking and integrating databases.
I don’t know whether it would be worthwhile. I would be happy if you want me to try to drill down with them. But they may be very good to talk to about how to grapple with these metadata tidying issues. I will tell you the fourth thought that actually is related, and that is I have seen many HIT projects fail. The reason why they fail is they get over scoped, and you can’t do them. I call them complexity traps.
You have actually chosen to do something that is mathematically impossible. Look at the number of combinations in the number of fields that you have to start to link to one another. These projects blow out astronomically quickly. The road is covered with road kill out there. I am not anxious to see anybody become part of the stew.
They have to be very carefully scoped with that. You can do an awful lot with them. But knowing when to call it quits, having a sophisticated sense of laziness, is really crucial.
DR. MAYS: Here is what I would suggest, which is let’s just do data gathering, have this for the November meeting. Jim will be there. The determination is the priority. Or what I can do is ask him. For this one, I am going to push off. Part of that is I want greater clarity. If it turns out that we don’t really need to be doing this, then our other work is going to take precedence, which is the website. That is why I am pushing it off so far. If we have this much kind of lack of clarity, if it isn’t as high priority, then I want to make it kind of lower level priority. That is why I am going to push it to November.
DR. SORACE: I actually think finding out where many of the common standards like SNOMED and CPT and what not are actually used in our own data models might have enormous value.
DR. MAYS: That goes to then that group. Yes, that wouldn’t go to us. I would say that something like that would go to standards. This is part of why: give us some time to investigate and decide how much priority it is from Jim’s perspective and Damon’s. They are our customers. Decide whether it is us or it is the bigger group. That would be my suggestion. Let’s find out.
DR. SUAREZ: I would suggest we craft a message, some summary that describes these various items and share it with people to see, with Damon, with Jim, with others to see of all of these things that we have been talking about, data models and data content standards and APIs and other things. Which are the areas that you think would be most helpful for us.
MS. HINES: I would second that.
DR. MAYS: The data model, I think that is already decided by the full committee. Just in terms of the standardization issue, but not the data model.
MS. HINES: I agree. We should do this list, and then take it to HHS and say, what of these things would you like us to actually focus on and possibly do?
DR. SUAREZ: I would not take out data models. The reason I say it is because that will add to the importance and the value of if they say, yes, this is an important item, then we have more. Naturally we have even more reasons to do it.
MR. CROWLEY: This is probably a future thing, but just so it is on the list to be thinking about in the future. When we think about sort of useful tools for researchers, there are a lot of statistical briefs that are coming out of our federal government, HHS and their partners. If we had thought about taking the code that underlay the actual statistical analyses and made that available as part of the releases, that allows for reuse in a number of different ways.
Also as you are thinking, this sort of ties to the next part of this comment for maybe a future product, too, look at the data science education business. Right now, data science is one of the fastest-growing curriculum areas across most universities, multiple departments. There is the curriculum for that.
I mean, I haven’t seen much focused on open data and using government data. There might be an opportunity to create course packs or data bundles or codes or other things that could be directly reused as they are training the next generation of data scientists that may be a product worth exploring in the future.
DR. MAYS: When do we want to consider that by? I am in a position of like pushing things off a little later. We have a June, September and November meeting, right? I think that is November. I have a feeling that this other work that we are going to do is going to take up the next, yes, is the next two. Let’s do that.
The other thing that Jim talked about which was in the future was this issue of what kinds of things, this is the Flint project, are there resources to make available? Can we think about each time, he talked about there’s Medmap. He even said about kind of having a presentation on Medmap. Whether or not in terms of dealing with things like disasters, are there any things that we want to recommend or suggest? Any thought about that?
DR. RIPPEN: So I think again going back to who would be best suited to really answer this, so if the question is if there was an emergency, a public health emergency, what are the types of information that are critical to help address it. If that is the question, then I would actually ask the Office of Preparedness to say, okay. If something happens, what do you need?
It is different from, okay now, if there is something specific about a contaminant, an infectious disease, those kinds of things, again it can be very broad. Then the other kind of nuance, especially if you think about why Flint as an example, it is an infrastructure question, right? If you looked at one of the challenges was tracking the lead pipes. Then also just with regards to the busing routes for the schools as far as catchment areas, right? Now you have potential there on top of that.
I guess it just depends is it general or specific? Then there is a short-term and long-term. They are going to be having to monitor watersheds. They are going to have to be monitoring unfortunate scores, education scores.
DR. MAYS: I think what he was asking us is that whether there is anything we can do in terms of faster data availability and whether there are any kinds of resources that we could think of in order to make those available to disasters. That is when I came up with the issue about monitoring the Twitter and kind of what you learned from that. I think what he is trying to ask is are there other data? I took it to be actually social media data, and I may have made it too narrow.
Is there anything that we could do, either in terms of mapping? He talked about Medmap and wanting us to see Medmap. I think to me what he was asking us is are there any other tools that we are aware of that could help them in terms of data being made faster. Medmap they now do. He thinks that is helpful. Is there anything else we can think of? It may be no more than a discussion to actually try and think about it. Then to see whether he really wants us to interact with Nicole Lurie. It may be Nicole has — we come up with something, and it is like we do that.
MS. HINES: I was just going to say Kate let us know there is that center. I visited the CDC operations in Atlanta. They have like this entire war room with all the screens up. They must already have some sort of packaged approach to going into a community because they have got all that set up. The question is what would we do? I think we would have to talk to the people already doing the work and say, what is missing? We don’t even know what is the gap.
DR. MAYS: Yes, and he wants it from the point of view of the data.
DR. RIPPEN: Actually again going back, not to plug the health group again, but again, if we are really talking about social determinants of health, and if we are really talking about kind of thinking about communities in not just a health systems philosophy, and what data may or may not be available, I think using maybe Flint as a test case as far as what information would be necessary. That might be interesting to say what are you using and what is missing. It might be a way. I don’t know.
MS. HINES: He did mention some of it is very idiosyncratic because someone found the lead pipe data on 24,000 index cards. That is not going to work. Then they went to some other source to find out where the lead pipes were. That is very specific to that event. I think he was looking more broadly at quick availability of a particular area.
DR. SUAREZ: I have worked with CDC on projects like BioSense and syndromic surveillance reporting and situational awareness type of monitoring. There are a lot of things really going on. My understanding was more about identifying not so much necessarily gaps, but perhaps more alternative and non-traditional methods of gathering feedback that could create an alert. CDC has a lot of monitoring around a lot of things.
Well, just like that, there is also alert mechanisms within say the internet basically all these methods to identify messages or what is the topic of the day. You see all these big topics that are going on the internet and different social networks. I mean, that sort of type of potential mechanism to help alert about a health event. Of course, health events are a little more difficult.
DR. MAYS: That is why I thought about the social media. Let’s just say there was the Zika virus. You thought it was transmitted in terms of rain or something. Then you see people tweeting this out. You would know, oh, we need to get this message out.
DR. RIPPEN: If you think about in this case lead exposure was because a physician was seeing a lot of high lead. If you look at where was the indicator, it wasn’t necessarily through social media. Now this is very powerful with regards to education and outreach, and also for other things like flu and symptoms and that kind of stuff.
Unfortunately, having done work in other emergency disasters at the national level and response, even though people have data, that doesn’t mean that people have access to data. Especially as it relates to different departments that have different data sources.
Again as we know, things are interrelated to other things. During response, trying to figure out who has what isn’t always as easy. Everyone owns a piece of a response. Not all of the pieces are actually activated at the same time.
Again going back to the leveraging things like social media, leveraging things that are traditionally surveillance, which is changes in any kind of measure, lab test. Going to changes in patterns of buying things, and then going to different datasets that are more traditional.
I don’t know who it was, if it was you, Josh or not, with regards to all the different datasets that are available like through Google map, through all of these other kind of sources is also important, too. Just like the bicycle, the Fitbit and tracking bike trails.
DR. ROSENTHAL: Like maybe people aren’t even aware exactly some of the pieces that are going on. I am putting this in a slide that I will send to everybody. If you go to ProPublica, you can buy for $2000.00 a CMS data set which is brilliant. It is cleaned up, and with perfect metadata and everything. That thing has been used a lot, a lot more than any interaction on that site. Guess what? That is actually powering Yelp.
Even in my day job, like we used Part B data and referrals. Guess what? US News just inked a deal, so that is going into their doc finder and their hospital finder. Just to be totally clear, the core datasets we are talking about have distribution like beyond what you are going to be able to track incoming straight-lined distribution, pure data sets made usable with mainline media and just consumers making decisions on it.
DR. MAYS: Josh, so that you know because Jim is gone right now, but he asked if you and Lilly would put together a little memo or blurb for him about where these federal data sit in other places that people go to. He would like to get a sense of that and kind of what is available, and kind of how you get it. Kind of give also the websites, so that they can look at it.
It is a push-out that he would like people to know about that this federal data is in all of these other places. He doesn’t think that is as well-known as possible. He asked if I would talk with you and Lilly. I will just kind of put that on your plate for now. Then we can talk about it later.
DR. SUAREZ: When you say all of these other places, you mean non-traditional places because there are all these other places like research. I mean, of course CMS data, you get it for research purposes every day.
DR. MAYS: He knows about that. But like for example, being on ProPublica, he doesn’t know about that. Or knowing how, when you go on Google, you can actually find some of the federal data that is on Google.
We always think that you go to either directly to the agencies or directly to these websites. Part of what data entrepreneurs have done is actually taken the data, and they have made it more appealing. If it is open data, they take it, and they make it very data-ready.
PARTICIPANT: Was he looking for apps or the websites?
DR. ROSENTHAL: No. Just a couple of things really quickly. There are a couple of levels to this. One is ProPublica took CMS data. I am staring at it right now in the ProPublica data store. They cleaned it up, and they are selling it. That is exactly what you are talking about.
But then guess what? They feed it back into Yelp for the Yelp ranking. In my day job, I literally do that. We are live with 75 million members across every provider type. That is using like 200 different datasets until we made it available. But then guess what? We linked to US News. When you go on and look, find the best doctor and find the best hospital in America, they are using it.
Sometimes we are making it clean for other people to use. Sometimes we have actually sold it into US News or into Yelp or into other places like that. You are not going to know it is federal data unless you are following that.
DR. MAYS: Now see, I think that is what he wants to know. Whether or not he wants to know about apps that actually make the data available, I think he would probably appreciate that, too. If it is not in the traditional radar of how you get it through the federal government, I think that is what he is interested in learning more about.
DR. ROSENTHAL: One other thing real quickly. I am looking at Alexa right now. I see that 30,000 inbound. There are a few people going to Data.gov and a very few people going to Healthdata.gov. Yelp and US News using this dwarfs that basically. It is an order of magnitude more usability and more users attacking it than we are even aware of, like tenfold or a hundredfold or a thousandfold.
MS. HINES: I think you have just defined the issue that underlies this workgroup’s purpose. The question is what do we do about it? I think that answering the question, where do the federal data sit outside of the traditional going to the agency website is a fabulous research question that this workgroup ought to answer sooner rather than later.
DR. MAYS: Agree. No, I had a sense he wanted that soon. Let’s start with Josh and Lilly pulling that together. That I would like to have for the June meeting, so we can kind of go back and forth with that, and then share that.
Walter, what about the items that came up today in terms of NCVHS? One of them was to have a data visualization presentation at the June meeting during the time that normally the workgroup would present. We talked about helping. What it was is like helping the full committee get a sense of how to do things like infographics. Because I think we may have a person coming from Esri, so I can actually ask them about that. That is one of the things. That would be for the June meeting. What were the other things? Let me find my notes real quick here.
DR. SUAREZ: This will be a presentation to the workgroup or a presentation to the National Committee?
DR. MAYS: No, a presentation to the National Committee because we understand about infographics. That was the discussion where we were saying we can make you recommendations, but then having you really grasp them and do something about it sometimes doesn’t work.
When we were talking, I think it was Alex that brought it up. It is like, okay. We have reports. As thought leaders, we wanted to go further than just the report itself. How do we do that? That is when we were talking about the complexity of doing an infographic. We talked about introducing some of the issues and data visualization, and to kind of get the committee up to speed. So that in its reports, they can think about developing these things. Does that sound feasible still?
PARTICIPANT: Who would do the presentation?
DR. MAYS: Let me see. I think the best would be to get someone from Esri to do it. There is a person at Esri that I think is volunteering to join us.
DR. SUAREZ: The way I understand it, it is a presentation of our products and our sort of gathering of ideas into a report.
DR. MAYS: It would be kind of a how-to of when you have a report, how to go from a report to a visualization of what is in that report. It would really be about how do you do an infographic, so that you know what you need to do. I think it was you that was saying that even though you have a group at CDC, you can’t just go say, here is the report. Now produce an infographic. Instead, you have to give them greater detail. The discussion is do you want to start thinking about how to do that? In order to do that, it is like helping you have more information about what it entails.
DR. SUAREZ: That is something we can look into.
DR. MAYS: I think that we just need to make sure that on the agenda as we plan it, we have the time for it. To me, that is not critical. If it doesn’t work for the June meeting, then we can do it for another meeting. You need to look time wise.
DR. SUAREZ: I think that is what we need to plot. It is good that we have identified it as a topic to include it.
DR. MAYS: Exactly. That is why I am looking at Rebecca. Like as we plan, it may be sometimes the workgroup time gets a little tight. Anything else that we talked about? Anything else that you want that got talked about?
DR. SUAREZ: We talked about the data model part. We will continue that discussion. The relationship between the quantitatives around data modeling for the entire national committee and then the discussion that we just had about that applying to this particular workgroup. That is another one.
DR. MAYS: Having the framework people meet with Damon’s data leads. That is just something that we need to follow up on. It is kind of the pop and us. Damon has this group called data leads. They are very interested in a framework and roadmap. So we just need to hook them up. They don’t need us. That is the only other thing. I think Rebecca, you could do that with you.
Then our last thing is come up with tools for assessing sentiments about data. This was Jim’s point about, for example, how do people feel about health insurance? He wants to know more information about the tools that are in the background of data, often that tell you more about the context of the data, why people are answering the way the people answered. Are there algorithms in the background of collecting data that you can learn more about the sentiments of the data? That is for future times. I think that we can put off until either November or our first quarter of 2017 in terms of the plan. Anything else anybody? Josh, anything else? Hi, Paul.
DR. ROSENTHAL: Can I ask one other quick thing which we bring up all the time, and is totally silly? Nevertheless, I just will be remiss if I don’t do it. The green button idea, one of the original mandates way back in the old days from Jim was like, hey. Are there any other data sources we should be considering exploring? How could we deal with privacy, mosaic, blah blah blah?
We went through the idea of allowing users to opt in to share their data under different privacy restrictions. A little story, my sister has leukemia. I would very much like to contribute my data. I would waive my rights, blah blah blah. Maybe if there was a pool of other people who do that, we could actually pull the data stories out of that.
I know it is silly, but nonetheless, pretty innovative and different than what has been considered before.
DR. SUAREZ: That would be sharing it with whom?
DR. ROSENTHAL: I would contribute it to whatever target you wanted to, researchers only, deidentified. The basic idea is if I am an entrepreneur, and I want to find things on leukemia, I can go QE and I can dig in and basically go into an enclave. Then I have issues that those people may not have actually contributed access, blah blah blah.
Let’s just say I will share with a generic research school. I will share with HHS. I will share it with an unrestricted, deidentified thing that has different privacy rights where entrepreneurs or researchers or community people who don’t come under standard security protocols.
DR. MAYS: Let’s put that on for discussion because that is pretty broad. I understand how it falls under access and use. But I have a feeling that would fall back on the full committee.
DR. SUAREZ: Well, if it is about donating personal health information to federal agencies to have them put into a federal agency database, that is one issue. Maybe this group is part of that discussion. Or it should be the place where that discussion happens.
DR. ROSENTHAL: I would even opt in to allow it to be made public deidentified. This came out like way two years ago. This was originally when Todd came in, and he and I were talking about it. I am like before they actually kicked off the data group.
The basic question was could we make public data? If I am an entrepreneur, and I want to hit public data and I want to hit something interesting, and I have these limitations, right? Good limitations for good reasons, are there ways around that, to be blunt?
Also on the demand side, how do we engage consumers? Well, a consumer can get engaged by looking at his stock ranking. Maybe a consumer can get engaged by building an app.
Do you know what? If I want to help like leukemia research, I can go and do something with the research society. I can give it to the federal agency. I would love to make my data available and let a thousand flowers bloom to the entrepreneurs outside the agency and allow the agency to access about other public data.
MS. BRADLEY: Do you know John Wilbanks? I think he was working on that with the Apple ResearchKit.
DR. ROSENTHAL: There are a bunch of people toying around with it. There are private companies that do it privately. The point of the story is that this is something that pops up a number of times. We don’t have to dress it with ideas. Could we actually create a set design for public usage opting in on people who want to contribute? Maybe it doesn’t fall under it, but it is something that I would be remiss if I didn’t mention it again.
DR. MAYS: Okay. So it is on our agenda. It is either November or first quarter. Okay. I think we have covered a lot. I think that we need to kind of go through a little more systematically with the work plan. Lilly has captured everything. We will put it into place. Then we will look at it and reorder it if needed.
DR. SUAREZ: Lilly, I will send you and Vickie the template that we are using to plot all of these elements. There are three components of our agenda and work plan. One is topics and topic discussion points and discussion topics and things like that. The other one is activities, so if we have workshops or hearings planned. Then the other one is products. If we have a letter or a report to put out.
We have that organized by quarter over the next six quarters through the middle of next year. It would be helpful to plot into the items that we just discussed for the workgroup. We have them aligned and basically included in the packet that lists all the other subcommittees in the National Committee. I will send you that.
DR. MAYS: Okay. Anything else before I bang this gavel and let us run out the door? Anything else?
MS. BRADLEY: Vickie, did somebody volunteer to help us with the assignment on the blog? Or did you want to take care of that after?
DR. MAYS: Let me take care of that after because I need to talk to Damon and I need to talk to Josh. Josh is already doing something. We are going to send an email out with these tasks, as well. As we do, we want to get some other people signing onto it. That is also one of the things that we need to do.
Okay. I want to thank everybody for their time. I want to thank you very much for your ideas and staying and working through this. It is greatly appreciated on behalf of the workgroup. Hearing no other comments, I am going to bang the gavel, and we are free to go.
(Whereupon at 5:00 p.m., the meeting was adjourned.)