The development of large public databases by government ministries, departments, and agencies (MDAs) has been ongoing in earnest in many countries around the world since at least the 1990s. The most basic of these government data systems are registers, supporting a range of government services, such as health insurance, social security, vehicle and business registration, and census-taking among many others. These registers form the basis of numerous vital public services whether the services are delivered electronically or not. Other systems are layered on top of these registers in order to support decision-making, planning, and policy-related research. To function well, many of these systems reside behind rigid security and multi-level authentication and authorisation protocols as they regularly contain very sensitive personal information about citizens.
Data about health is often considered some of the most sensitive information collected and held by governments and institutions. Yet over the last decade, there have been a number of initiatives focused on open data in the health sector. Broad et al. describe open data as “data made available by governments, businesses, and individuals for anyone to access, use and share”.1 Clearly this should not apply to the detailed personal information within health registers. So, when it comes to open data and health, it is paramount to understand the particular data held within each system, to think carefully about the levels of access that different stakeholders may want or need, and to determine how, or whether, the data may be safely anonymised prior to publication as open data. To do this, it is useful to consider a data spectrum for health, to enumerate the different stakeholders creating and using data, and to consider the challenges they must overcome before open data in the health sector can evolve from being a minor sub-community and enter the mainstream.
The spectrum of stakeholders and data access
Keen et al. (2013) state that government MDAs and private firms coexist, often exhibiting a dichotomous relationship between public and private interests in the national health system and the data therein. The following broad categories of actors can be identified within the national health system: the state, private-sector firms, citizens/patients, doctors and other health professionals, researchers, and a broader group of interested parties, including health charities and journalists. All these actors, as illustrated in Figure 1, have the potential to generate data that could be accessed and used within the health sector, and all may also be users of data generated by other actors.
Different actors seek to use data for a variety of purposes. In particular, users seek the data from registers, for example, to access and update information about individuals. They also look for data to support operational requirements, such as organisational planning and decision-making, and to improve efficiency and effectiveness of services, as well as to analyse for research purposes to inform policy and practice development. Data may also be used by patients to locate and access health services.
By examining how different uses of data are currently regulated, it is possible to identify a spectrum of data openness ranging from closed data with highly restricted access through to data that is openly published in reusable formats. Between these two ends of the spectrum can be found planning and decision-support data with ranging levels of restriction on access and reusability as illustrated in Figure 2.
The level of openness of data, who it is shared with, and in what level of detail, should also vary according to circumstances. For example, during an outbreak of a deadly disease, it may become vital to know who the patients are and where they are located, but such detailed information may not be necessary when citizens engage civic leaders to mobilise resources to establish local health centres.
This chapter will examine how to ensure health data is effectively placed on this continuum depending on its intended use. While the focus of this chapter is on the open data end of the spectrum, where individual records are generally only available in anonymised or aggregated form as Figure 2 indicates, the potential uses for open data often overlap with the needs of stakeholders who might also have access to shared or even closed data. This is an important point to be made as it affects the politics behind the open publication of health data. The challenge of working out where certain datasets should fall on this data spectrum is further compounded by advances in computing technologies that could potentially enable the deanonymisation of sensitive data on individuals.
The state of (open) health data
Data availability and use: Laying the foundations
Information technology has been the key enabler of process automation in government, which has led to more and better data, and has dictated areas for integration in order to bolster efficiency in service delivery.2 E-government improves both the quantity and veracity of data. Examples of e-government in health services include, but are not limited to:
- National health insurance schemes.
- Health registries (births, deaths, treatments).
- Electronic health records (patient records inputted at facilities by medical personnel).
- Electronic prescriptions.
Progress toward implementation of these systems has varied, but in Estonia, for example, 99% of all prescriptions are now electronically issued by doctors,3 creating a potential wealth of data about prescribing practices.
The different political landscapes from country to country have an influence on which health programmes are prioritised by governments and the stage of development of the supporting data systems. For example, in Kenya, the health sector has been devolved so as to be able to offer more resources for better services to citizens at the subnational county level. Devolution does not, however, necessarily have to lead to poor data integration. In Kenya, the Health Data Collaborative,4 established in 2015, provides a framework that stipulates how partners (international agencies, the United Nations, governments, civil society organisations, philanthropies, donors, and academics) engage and align data initiatives with the common aim of improving health data. Similar health data collaboratives exist in Tanzania, Malawi, and Cameroon.
The World Health Organization (WHO) hosts the Global Health Observatory,5 which is a one-stop portal initiative where countries share both their health data and health priorities. Various countries are moving to implement their own national health data observatories or portals, and the scientific community is moving toward the adoption of common open data principles as evidenced by a number of platforms making clinical trials data available6 and scientific journals, such as the British Medical Journal, campaigning for more open data publication.7
In many countries, the health sector has seen significant investment in capacity building over several years. For instance, the District Health Information System 2 (DHIS2)8 is used in many countries as the national Health Management Information System (HMIS) to collect, manage, and analyse health data. At the time of writing, the open source DHIS2 software is used in over 40 countries in Africa, Asia, and Latin America, and countries that have adopted DHIS2 as their national HMIS software include Kenya, Tanzania, Uganda, Rwanda, Ghana, Liberia, and Bangladesh. The core development activities of the DHIS2 platform are coordinated by the Department of Informatics at the University of Oslo,9 supported by the Norwegian Agency for Development Cooperation (Norad); the President’s Emergency Plan for AIDS Relief (PEPFAR); the Global Fund to Fight AIDS, Tuberculosis and Malaria; the United Nations Children’s Fund (UNICEF); and the University of Oslo.
The introduction of HMIS software does not, however, automatically lead to good data quality. Processes often require data to be manually transcribed from paper into computer terminals at the health facility level before it can be captured and collated in the HMIS, where regional and national health management teams review data for quality. A review stage can impact the timeliness of data and its availability for operational decision support, with some delays of up to six months before data is made available at the local facilities where it originated.10
When data is keyed in by health workers solely for the purpose of reporting to administrative agencies, there may be limited local ownership of the data and, as a result, limited investment in its accuracy. There can be tension between the creation of systems that support doctors and clinicians in their day-to-day localised work and systems that emphasise centralised reporting. Arguably, a focus on open data availability can place extra emphasis on centralised reporting, with MDAs pushing healthcare providers to enter as much standardised information as possible. However, if system architectures do not give local stakeholders access to the information they need for planning and prioritisation, they can ultimately lead to expensive, error-prone, and patchy data.11 One remedy for this comes through the use of automated data collection systems, relying on data created at source from digital keypads, mobile devices, and user interfaces that eliminate the need to transcribe from paper in the first place.
In summary, initiatives at the international, national, and subnational levels are actively encouraging health programmes to improve data management. These initiatives cover not just the creation of data, but also focus on strengthening the use of data by targeting monitoring and evaluation processes. This suggests that, although there may be a long way to go in terms of data quality in some settings, the right steps are being taken toward a strategic approach to establish a conducive environment for leveraging data (UNECA et al., 2016) as evidenced by:
- Legislative and policy reforms that will allow for harnessing data.
- Significant investments in information technology, tools, and infrastructure.
- Greater collaboration and coordination among health stakeholders.
- Investments in administrative data collection and use at the subnational level.
- Supporting and resourcing national statistical offices as key facilitators and drivers of national data ecosystems in their respective countries.
However, much of the focus here is on data use within a single stakeholder group or the use of data shared securely between two particular stakeholders. When it comes to opening up data for wider use, a number of gaps and challenges emerge.
Available but not accessible: (Missed) opportunities
Data from the 2016 edition of the Open Data Barometer indicates that health sector performance statistics exist in 98% of countries surveyed and are available in some form (such as aggregate tables in print or via PDFs) in 85% of countries, but only 7% of countries had openly licensed and machine-readable datasets.12
To allow for the maximum range of use when datasets are made open, they should be disaggregated to the lowest levels of administrative geography possible and split by gender, age, income, disability, and other categories. Many governments have made commitments to opening up datasets via their own open data portals, often included in the National Action Plans submitted under their membership in the Open Government Partnership. However, data held in national HMIS often remains locked away in the countries where these systems are deployed, and few portals host statistical datasets on health that contain full details. When health data is published, it often does not meet the level of detail demanded or it is too outdated to meet the needs of users.13 Although platforms like DHIS2 could be configured to generate regular, anonymised exports of data by using application programming interfaces (APIs), it appears this is only rarely the case (Tanzania’s HMIS portal being an interesting exception).14 For example, while the DHIS2 demo shows the location of all health clinics in Sierra Leone, the national open data portal gives no clue that such data even exists, nor does it provide links to the regularly updated dataset.15
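To make the idea of a regular, anonymised export more concrete, the following is a minimal sketch of one possible aggregation step, not a description of how DHIS2 or any national HMIS actually works. It assumes hypothetical record fields (`district`, `sex`, `age_band`) and an illustrative suppression threshold of 5; it counts records per group and withholds the count for any group small enough to risk identifying individuals.

```python
from collections import Counter

# Assumption: groups with fewer than 5 records are withheld.
# The right threshold is a policy decision, not a technical constant.
SUPPRESSION_THRESHOLD = 5


def aggregate_for_release(records, keys=("district", "sex", "age_band"),
                          threshold=SUPPRESSION_THRESHOLD):
    """Count records per group and suppress counts below the threshold.

    `records` is a list of dicts with hypothetical field names; no
    individual-level fields are copied into the output.
    """
    counts = Counter(tuple(record[key] for key in keys) for record in records)
    rows = []
    for group, n in sorted(counts.items()):
        row = dict(zip(keys, group))
        suppressed = n < threshold
        row["count"] = None if suppressed else n
        row["suppressed"] = suppressed
        rows.append(row)
    return rows
```

The design choice here is a trade-off the chapter returns to later: disaggregating further gives users more useful detail, but it also produces more small groups that have to be suppressed.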
For academia, particularly in Africa, the use of data to generate scientific output has remained very low (overall scientific research output is less than 1% of global research), limiting key opportunities for locally driven research that could address key development challenges.16 Alongside the limited quantity of open data, the usability of open data platforms also limits discovery and the uptake of data. In the example of the Kenya Open Data Initiative Platform, usability experiments revealed that more than half of the users found it difficult to navigate and could not find the information they were looking for via the platform.17 Where data is found and used for research in Africa, there are further challenges related to the ecosystem for knowledge dissemination, with much of the research published in non-indexed journals or left in unpublished dissertations.18 Although there is more data being generated inside public and private health services than they can analyse themselves, the potential for external stakeholders to get involved in working with this data is currently almost entirely lost.
Increasingly, there is a push from data communities, including the open data community, to engage with policy-makers and other stakeholders to ensure that decision-making is driven by data and research. There have been successes in this regard; however, much remains to be done as evidence is often not a driving factor in decision-making. Many governments grapple with other considerations, such as budgets, politics, and development partner priorities when it comes to resource allocation,19 and these decisions can be as basic as, for example, “Do we buy SMS bundles to disseminate information to patients, pay our staff, or buy additional hospital beds?”
As already noted, the lack of a supply of fresh data, especially from government as the key source of official statistics and operational information, has led to limited progress in developing open data initiatives in health. To date, many seem to have fallen short on scalability and sustainability. This can be attributed in part to failures in identifying high-priority use cases for health data that are driven by demand from multiple stakeholders, which will serve to embed open data initiatives within the wider data ecosystem. The integrated approaches illustrated through the examples in the box on what happens when health data is open are, at present, the exception rather than the rule. As a result, projects have often failed to actualise value through visible results that could lead to continued investment and development.20 To make sure more opportunities related to open health data are realised, policy-makers, practitioners, and funders will need to address three key challenges.
What happens when health data is open?
The following examples illustrate the potential of open health data:
Maternal mortality in Mexico: working with the Government of Mexico, the Data Science for Social Good programme at the University of Chicago has explored how available datasets can be leveraged to support reductions in maternal mortality, a key target of the Sustainable Development Goals (SDGs). Researchers, working with a combination of open and shared data, explored how analysis at the regional level could present a more granular picture of how current interventions may be working.21
In Uruguay, A Tu Servicio has taken data on healthcare provider performance and made it accessible to citizens, helping them to make better decisions during the annual one-month window when Uruguayans can choose whether or not to switch healthcare providers.22 Data made accessible through the site has been used by politicians, media, and by over 35 000 citizens (more than 1% of Uruguay’s population).
During the Ebola outbreak in Sierra Leone, responders made use of HDX, the open data Humanitarian Data eXchange platform, to bring together up-to-the-minute data from different stakeholders, visualising the results through open mapping tools.23 The Ministry of Health and Sanitation released geocoded data on health facilities, while others released data on Ebola cases and current organisational responses. Multiple stakeholders used the data to identify the regions that most urgently needed medical supplies. Using an open data approach reduced the friction in data exchange during this crisis situation.
Ready for impact? Three key challenges
As explored in the previous sections, technology has been a key driver of e-government and has resulted in substantial growth in the amount of health data available. The coming decade could see further dramatic developments in the use of technology in healthcare, and, consequently, the rapid expansion of data availability, especially with the trend toward big-data enabled healthcare. Potential open data users must be prepared for this expansion, while also ready to address the critical need for information governance. Most importantly, the orientation of open data projects must move from analysis to action to create an evidence base that can reveal the different components needed to secure meaningful impact on the health system.
Working with big data
The potential for big data to improve health outcomes and create new revenue streams and complementary services has often been acknowledged.24 One of the trends emerging as the healthcare community recognises the potential value of the data generated by advanced medical equipment is “servitisation”. In commercial circles, servitisation describes the trend of companies moving from selling goods to selling “bundles” of goods, services, support, self-service, and knowledge. These hybrid product-services place the emphasis on the service component and have a much heavier reliance on data,25 creating new potential opportunities, including economic, social, and environmental efficiencies. In this new world, for example, expensive MRI scanners are constantly monitored and repaired by a service firm, while older models can be acquired by health systems with smaller budgets, such as MDAs in developing countries. Consumer technologies also now collect a wealth of data that may be of value to healthcare stakeholders, with mobile phones and fitness trackers recording countless data points every day.
However, before healthcare stakeholders can realise the benefits of big data (including large anonymised open datasets), there are a number of prerequisites:
- Infrastructure that can handle the required storage and analytics as managing large datasets can be complex and expensive. This infrastructure also needs to allow stakeholders to determine how and when data should be disposed of when it is no longer of value.
- Access to data for external stakeholders, recognising it is often not the government agency which collects it, but other stakeholders who have the skills and resources to create new value from data.
- Integration of data from multiple systems, including the ability to connect new streams of big data with systems that are still using brittle legacy architectures.
- Connectivity to high-capacity internet. This has a huge impact for the developmental potential of health data in environments with poor connectivity.
Even if open data approaches enable access to data that is generally more evenly distributed, the capacity to use it may not be. More attention must be given to who ultimately benefits and whether healthcare inequalities might be challenged or reinforced. As a result of servitisation and the other broad trends in the delivery of healthcare, private firms (hospitals, banks, insurance) and civil society organisations are increasingly in possession of data that can also contribute to national or government healthcare objectives, even though the data may not be of great utility to the organisations that have collected it.26 This draws attention to the non-state actors who are collecting important data that could be used to complement state data. Discussions about legal reforms that could allow privately generated data to contribute to official statistics have already begun, but most are still ongoing, and major advances have not yet been realised.27 However, some of the recent literature has expressed concerns that this kind of public–private data sharing may reinforce relationships between state and private sector actors and weaken the power and positions of both citizens/patients and professionals.28 Working out what should be shared beyond the private–state axis and how more data should be open to researchers and citizens to use remains a vital task. The success or failure of open data in health may largely depend on how the question of trust between organisations is addressed as big data flows continue to develop. This is ultimately a question of information governance.
Information governance and regulatory frameworks
Open data is not just about technology. It involves a mesh of people (with newer technologies implemented mostly in a piecemeal fashion), processes (policies and guidelines), culture (changes in attitudes, behaviours, and practices), and legacy systems (including existing IT infrastructures).29 This “ecosystem” produces complex dynamics around data. For example, published data does not remain static. It can change continuously as new fields are introduced or as it is integrated with other related datasets, including those from non-health sectors. This brings new challenges, namely the potential negative consequences of privacy breaches or unethical research.
Many health problems are highly personal and patients need to be confident that their conversations with doctors and other professionals are confidential. While the data is important for treating the patient (primary use at administrative or operational levels at the facilities), secondary uses, such as medical research or planning health services, may pose a challenge. Striking a balance between primary and secondary uses of data is increasingly difficult because modern technology makes it possible to combine data and identify individuals through statistical inference.30 This provides one of the regulatory paradoxes of open data in health: the more details a dataset contains, the more valuable it is (for example, to detect patterns of health inequality), but also the greater the likelihood of identifying individuals and disclosing sensitive personal information.
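One common way to put a number on this paradox is k-anonymity, a technique not named in this chapter and introduced here only as an illustration: a dataset is k-anonymous if every combination of potentially identifying attributes (quasi-identifiers) is shared by at least k records. The sketch below is a minimal check under that definition; the field names are hypothetical, and passing such a check does not by itself guarantee that individuals cannot be re-identified by linking to other datasets.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values for all listed quasi-identifiers (0 for no records)."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values appears in
    at least k records."""
    return smallest_group_size(records, quasi_identifiers) >= k


# Hypothetical records: each added attribute shrinks the smallest group.
sample = [
    {"district": "A", "age_band": "15-24", "sex": "F"},
    {"district": "A", "age_band": "15-24", "sex": "F"},
    {"district": "A", "age_band": "25-34", "sex": "F"},
    {"district": "A", "age_band": "25-34", "sex": "M"},
]
for fields in (["district"],
               ["district", "age_band"],
               ["district", "age_band", "sex"]):
    print(fields, smallest_group_size(sample, fields))
```

In the sample, publishing only the district leaves every record indistinguishable from three others, while adding age band and sex leaves some records unique, which is the trade-off between detail and disclosure risk described above.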
The European Union’s General Data Protection Regulation (GDPR) and the data provisions of the United States Health Insurance Portability and Accountability Act (HIPAA) try to provide frameworks to address the security and reuse of data on individuals, but many countries still lack suitable legal frameworks (see Chapter 23: Privacy), and questions still remain around the appropriate reuse of personal experimental data in research-like activities.31 When there is a lack of clarity between closed, shared, and open data, citizen trust may be undermined. This was evident when the Government of the United Kingdom proposed a data-sharing framework in 2013 for medical records from the National Health Service (care.data) using the language of “open data” even though the scheme would not have published individuals’ information under an open licence.32 After a backlash from citizens, the scheme was cancelled, and awareness and opinions about open data were also tainted.33, 34
From use to action
Even in the absence of the socio-technical infrastructures and governance frameworks needed to identify what and how increased health-related data can be made open to academic and citizen stakeholders, there have been, as noted above, cases where health data has been available, accessible, and used; however, these cases have not always led to long-term change.
There is a need to move from just data release to action. Although open health data may build transparency, if there is no real commitment and accountability for the use of evidence in decision-making within government, then effective adoption and use of data will not occur. For example, when citizens report on poor service delivery at a health facility and feedback is not acted upon, enthusiasm for data understandably wanes. The converse is true when data is visibly acted upon. In Swaziland, UNICEF’s U-Report platform is used by the quality assurance teams within government to perform customer satisfaction surveys using a free short message service (SMS). Given the cultural context, a client might not provide clear feedback in person on what the problem was with the services they obtained from a facility, but, with SMS, they are anonymous, and they might even mention names of those who have caused problems at the facility. Actions undertaken in response to this information are clearly evident to the client, and as a result, they are even willing to pay for the SMS.35 Getting from data use to action requires relationship building and the development of products that can scale and be adapted to different healthcare environments. As the Prescribing Analytics case shows (see box), it can be a long journey between discovering the potential for change in health services using open data and seeing that change realised at scale. At present, few initiatives outside of academia may have access to the funding needed to pursue these longer-term programmes. Expanding the number of stakeholders (funders, academia, technology innovators, medical charities, governments, etc.) who are able to invest the necessary resources, and work collaboratively to take open data initiatives from proof-of-concept to full implementation, is vital.
Case study: Prescribing Analytics
The Prescribing Analytics website36 was created by a group of open data enthusiasts, companies, and researchers at a 2012 “NHS Hack Day” event. The project used newly released prescribing data from doctors to look for potential cost savings from prescribing cheaper drugs, identifying GBP 27 million a month potential savings from changing the approach for one drug alone.37 Unsurprisingly, this single finding did not change doctor behaviour. Indeed, the problem of expensive drug use had been reported as early as 2006 using other data sources; however, the project team has gone on to develop the Evidence-Based Medicine DataLab38 at Oxford University, as part of the Open Prescribing project,39 which provides data, tools, and email alerts to doctors to help them find clinic-level cost savings and prescription improvements. This journey from idea to implementation of a platform tailored to the needs of key stakeholders highlights the movement from data release to impact and the need for longer-term research on the potential impacts of open data in health.
The International Open Data Conference (IODC) brings together a few thousand people every two years. Major healthcare conferences may have ten times that many attendees to discuss research, products, and innovations, most of which have a data component. Over the last decade, open data has made some inroads into the medical science community; however, concerns over privacy and infrastructure, together with the challenges of creating trust and sustainable projects based on open health data, have limited progress. Yet, there is much for the open data field to learn from the health sector as it forces continuous engagement with issues related to personal data, ethics, and the interaction of different stakeholder groups.
This chapter has started to sketch out distinctions between different stakeholders and the different approaches to data sharing, as well as to highlight challenges arising from a private–public nexus of data sharing that could exclude citizen access to data. However, much more needs to be done to bring clarity to the health and open data discussion. Lumping together administrative data for decision-making and longitudinal data for research purposes can frustrate progress. This is because the goals of the stakeholders are different: some are focused on health planning and policy improvements, whereas health facility managers are mostly interested in day-to-day patient management. Building infrastructure capacity will be an ongoing issue as the technical foundations to produce and use open data vary substantially around the world even if all regions are heading toward increasingly digitised healthcare.
Perhaps when we look back on open data and health in the next decade, we will have a much clearer framework available to understand the different potential applications from policy and epidemiological research through to enabling decision-making by patients. Ultimately, the search for innovation should continue with a broader view of real-use cases and examples of stakeholders that have been able to access health data, build services, or develop policy, and then make the impact sustainable.