The State of Open Data - Histories and Horizons - Conclusion and Recommendations


The State of Open Data project set out to explore how effective open data has been in addressing challenges related to social and economic development and to democratisation around the world. In this concluding chapter, we contemplate the impact that open data has had in addressing these challenges, reflect on the current strengths and weaknesses of the open data movement, and set out a series of recommendations with the goal of strengthening the future contribution of open data to sustainable and democratic development.

Meeting the challenges

Our look at the history and horizons for open data across sixteen sectors, seven regions, and the work of seven different stakeholder groups provides numerous examples of open data deployed as a tool for change. Open data has become a key element of the policy toolbox and proven its value in fields as diverse as agriculture, anti-corruption, and environmental research. Many pilots and prototypes have turned into ongoing projects and initiatives, working to establish new data infrastructures for corporate governance, transparent public procurement, or monitoring progress toward the Sustainable Development Goals (SDGs). In a number of cases, these initiatives can point to solid results, such as supporting environmental research, increasing access to healthcare, and enabling improved humanitarian coordination.

Yet, those seeking a quantitative measurement of effectiveness may still not be satisfied. To a substantial degree, this can be attributed to a lack of research that goes beyond ad hoc case studies. Few, if any, of the open data interventions described in this book have been subject to rigorous independent and longitudinal impact evaluation, although, increasingly, decision-makers seek this kind of robust evidence. Although it might be argued that it is still too early to talk in depth about measuring impact, the current evidence gaps, when set against early claims of large economic or social impact to be secured from open data, have undoubtedly fuelled perceptions that open data has not lived up to the hype.

We are more optimistic. Recognising Amara’s adage that “[w]e tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run”,[1] we see evidence that many open data communities have been putting solid foundations in place for future impact and are, in many cases, quietly creating substantive change with relatively meagre resources. In a number of instances, practitioners have made rapid progress in addressing the data components of a problem space but have been much slower at converting insights gathered from the use of data into policy and operational decision-making. For example, back in 2012, open data on doctors’ prescribing behaviour enabled new analysis which identified the potential savings to the United Kingdom’s health service from the better use of generic drugs, but it has taken another six years to develop the tools, processes, and communication approaches needed to present this information to decision-makers for action. When seeking to track the extent to which open data projects are achieving impact, we should keep in mind the complex contextual elements of effective open data interventions[2, 3] and the time needed to align all factors toward desired outcomes.

The increasingly sectoral focus of open data work also provides an opportunity to strengthen collaboration between open data generalists and domain specialists. There are few contemporary social, economic, or democratic challenges where the solutions will not have at least some data component. Although it may initially appear easier to forgo openness and use data to address problems inside organisational silos, evidence from across this collection suggests that the additional effort required to incorporate effective open data approaches can create a wide range of opportunities for innovation, collaboration, and added value. The question now for the broad open data movement is how to scale up and sustain the stakeholder engagement, infrastructure building, governance processes, capacity development, and cross-community networking that appear central to successful long-term open data initiatives, while not losing sight of the value of simply making data available and progressively enhancing its usability and usefulness over time.

Analysis and recommendations

The different lenses used across this book (sectors and communities, cross-cutting issues, stakeholders, and regions) provide a range of vantage points from which to assess the state of open data. They reveal sectors and communities at different stages of development: some sectors embedding open data ideas, others with a solid but marginal open data community, and still others where open data ideas are yet to move to the fore. They outline how each stakeholder group has a unique contribution to make, and they reveal some of the tensions that may characterise work in the coming years. One example is the tension between governments that, as they institutionalise reforms, look to prioritise engagement with larger, established civil society actors, and a civil society that is, at the same time, being encouraged to create space for smaller and more agile actors to engage with open data. The chapters reviewing cross-cutting issues reveal how open data movements have, over the last five years, engaged more and more with critical questions of data literacy, privacy, and gender equity, while also revealing unresolved questions of how open data initiatives should respond to calls for Indigenous data sovereignty. Finally, regional chapters show how the state of open data varies between and within regions, leaving us with a number of different narratives at play within the broad global open data movement, with different cultural and linguistic communities placing emphasis on different aspects of open data.

So where does the broad open data movement go from here? For all the differences across regions, stakeholders, and sectors, we believe there is a strong case for continued development of distinct work on open data as a key input into discussions of the future of data in our societies. We suggest a number of key areas for action relevant at this particular inflection point in the development of open data ideas and practice. The 12 recommendations below, three for each of the audiences identified in the introduction, are not exhaustive and are offered in addition to those put forward in particular chapters. Our hope is that they provide useful guidance in both strengthening existing work and identifying new avenues forward to secure the role of open data as a tool to meet sustainable development challenges.

Practitioners: Think politically and increase inclusion

Power is a recurring theme throughout the chapters in this book. Data is seen as a source of power, and advocacy for open data as a strategy to redistribute that power. There is clear evidence of this working in practice, whether enabling new market entrants to challenge established businesses providing geospatial data or supporting researchers to independently model extractives deals and reveal how countries could be securing higher revenues from their natural resources. However, there has also been growing recognition that, where access to the skills and resources needed to work with data is unequal, or where wider patterns of exclusion and disadvantage persist, opening up data may not always lead to desirable outcomes. Indigenous scholars in particular have outlined how the models through which open data is put into practice often ignore important considerations about community rights to privacy and the control of data, and fail to take into account the systematic biases built into existing government datasets.

It is evident from the work on corporate ownership and beneficial ownership disclosure that in some sectors, the open data community is at the heart of working out how to strike the balance between privacy and publicity, as well as between ad hoc data publication and the creation of high-quality datasets. This work is inherently political. While open data work has, for a long time, been able to maintain a broad-based appeal to actors across government, civil society, and the private sector, future projects will involve negotiations over the shape of data infrastructure that may make keeping everyone on board in a broad coalition much more challenging. We believe that unpacking the particular governance challenges in individual sectors, the specific interests and agendas of different stakeholders, and the range of dynamics in different regions is essential for open data actors to be able to think more politically in the future. Future work must recognise that, while some aspects of open data practice can be positioned as a general public good, others will require maintaining and communicating a clear position on the kind of development to be pursued in order to build support.

The politics of data is particularly evident in debates around urban governance and “smart cities”. Within the urban macrocosm, choices are playing out with regard to who will build, own, and operate data infrastructures, and how open they will be. As Landry explores in Chapter 16: Urban development, while open data is likely to play some part in the future urban landscape, there are other competing technological visions, focusing on centralised big data systems, corporate-owned sensor networks, and algorithmic decision-making. In general, open data discourse still reflects its origin in opposition to government data hoarding, yet the new landscape of data collection and use in 2019 calls for a re-articulation in opposition to emerging models of data power in our societies. In summary, practitioners need to:

Engage with the politics of data. Thinking politically involves considering the agency and agendas of those who create and use data and exploring the opportunities and constraints that affect their actions. It involves addressing collective action problems and building coalitions, recognising that there may be different ideologies and interests at play and that particular data-use projects may not please all stakeholders all of the time. Acting politically may, at times, involve looking beyond technical quick wins to the real change needed in the longer term, and may also require a greater diversity of people and organisations to solve a given problem.[4]

Understanding history and context is essential to support a more political approach, and this calls for deeper collaboration between open data practitioners and long-established civil society organisations.

Prioritise inclusion and equity. Data is a team sport.[5] If work on open data is going to help deliver on the SDGs, it is important to build diverse and inclusive teams. This may not always be a comfortable or easy process as it may involve questioning assumptions or creating new shared organisational cultures. Brandusescu and Nwakanma, in Chapter 20: Gender equity, explore how both equity and inclusion involve questioning whose reality is represented within available datasets and who undertakes the labour or directs how data is used. Ultimately, every practitioner is potentially capable of sharing or distributing the power of data and data analysis in some way by taking action to support inclusion.

Provide renewed leadership for openness. Communities and movements benefit from engaged leadership across all open data initiatives, from local open data projects to global sectoral collaborations. When, as Howard and Constantaras (Chapter 27: Journalists and the media) describe, democratic governance and openness are under threat on a number of fronts, the open data movement needs renewed leadership that can bring together technical and political agendas and ensure that the many individual open data projects working to address particular sustainable development challenges add up to more than the sum of their parts. The horizon for open data needs to represent more than just a narrow technical strategy. To avoid losing its transformative potential, the multipoint open data movement of the future requires diverse leadership that, rather than being distracted by the latest technology, revisits and reinforces the core values that underlie work on open knowledge and open data.

Policy-makers: Pick a problem to solve

Eaves, McGuire, and Carson, in Chapter 35: North America, Australia, and New Zealand, argue that open data has been central to the development of wider governmental discourse around data and analytics over the last decade. Although many policy-makers are now looking at a much wider spectrum of closed, shared, and open data, it is not unreasonable to suggest that without the last ten years of open data, policy and public discourse would be far less equipped to respond to the rise of algorithmic governance, AI, and concerns about invasion of privacy through the use of large private datasets. In taking government data outside of its own “black box” and giving both government officials and citizens greater awareness of what data can do, open data initiatives have helped many people to hone their technical intuition in important ways.

However, as we look to the decade ahead, emphasis is shifting from “open by default” to “publish with a purpose” and to the strategic use of open data. Policy-makers are starting to understand that their role is not just to release data, but also to play an active role in governing data infrastructure and use, making sure, as Ubaldi describes in Chapter 26: Governments, that distinct streams of work on regulation, technology, engagement, and value creation are aligned under a common vision and strategy. There is a risk, though, that further institutionalisation of open data policy will close down some of the generative space around open data, re-creating government as gatekeeper rather than as platform provider and engaged collaborator. Publishing with a purpose should be introduced alongside, rather than instead of, the default of “raw data now”, which can be improved over time (at least for datasets that do not affect privacy). This requires policy-makers to adopt parallel tracks of activity that:

Embed open data approaches in problem solving. Any time an organisation is commissioning a new data system, reviewing the data it collects, or seeking to carry out data analysis, it should be able to consider: (a) the existing data it may be able to access from others; (b) whether the data to be collected and shared could be provided openly and meet common standards; and (c) the outside actors who might be engaged as partners in working with that data. This may involve taking open data ideas further out into sectoral communities in the attempt to integrate open data into domain-specific expertise.

In short, open data should no longer be the sole responsibility of specific data or technology teams; instead, it needs to be considered a methodological element across different policy domains and initiatives. This should come with the recognition that, while effective open data approaches do require a level of technical skill, they also involve complementary work on strategy, governance, collaboration, and public communication, and frequently need vision and leadership that can bring together different stakeholders.

Maintain space for innovation and civic engagement. Many of the successful applications of open data explored in this volume, from transit apps to budget data websites and air quality monitoring platforms, were not conceived of as part of top-down policy initiatives. Instead, they emerged when interested parties identified a need and were able to discover open data that could help address it.

It remains central to the value proposition of open government data that the proactive publication of “open by default” data can enable outside parties in civil society, the private sector, and other government agencies, to find value in datasets in ways not foreseen or realised by the original data owner or steward. As data availability, quality, and more widespread data literacy develop, it will also be important to revisit and renew engagement models, such as hackathons, open data day events, and networking activities, to increase their inclusivity, while embedding the idea that open data is not only a tool for government-led problem solving, but can also create new civic spaces that can be used to connect different stakeholders and communities.

Explore openness as a golden thread across data policy. Open data is sometimes seen as competing with, or in opposition to, other important data agendas around privacy, machine learning, and data analytics. Yet each of these other areas of data policy can be approached with a focus on openness. For example, much open data and privacy activism has shared roots in a common concern about who has access to the power that data creates, and the open data field has already developed tools to strike a balance between open, shared, and closed data. Beyond this, open data may be a tool to bring greater transparency to questions of how data is being governed and who has access to certain sensitive datasets. In AI policy, governments have widely recognised the role of open data as a source for innovation, and there is space to further explore the role of open data in supporting the independent audit of algorithmic outcomes.

Ultimately, although some groundwork has been laid, the narratives that treat open data as a strategic tool for problem solving and that link open data with other areas of data policy remain underdeveloped. Policy-makers have a key role to play in promoting the integration of open data into other areas of work, while not losing sight of the simple idea that the data governments collect should, whenever possible, be shared as a resource for all citizens.

Researchers: Rebooting the research agenda

The global spread of open data has been tracked by studies such as the Open Data Barometer (ODB) and Open Data Index; however, both the ODB and this collection are highly reliant on case studies of open data use in order to generate an understanding of impact. This can lead to a degree of bias in the evidence base, capturing more about case study research that has been funded than producing a representative picture of impact on the ground. Although substantial work has taken place on input variables affecting open data projects (described as enabling conditions and disabling factors in Verhulst and Young’s useful periodic table/logic framework),[6] much less has been written about the range of impacts that open data has beyond those captured in a supply-demand-use framework or about the comparative benefits of adopting open data problem-solving strategies. Smith and Reilly (2013) describe openness as “a complex process, not a state”,[7] and the chapters in this volume illustrate that the complex processes that lead to data use also enable and support new relationships, collaborations, and strategies for sustainable development. These must be better captured in future research.

We also note that research on open data faces particular methodological challenges. Because open data is by definition available for anyone to reuse without permission, gaining an accurate sampling of all those affected by data release can be extremely challenging. Our hunch, albeit a hard one to confirm based on current research, is that case studies have substantially underestimated the impact of open data, since many beneficiaries will simply be unaware that the information they have used for research, advocacy, or decision-making is only accessible as a result of open data policies. Current case studies generally capture only projects conceived of as open data projects from the outset, where project architects have opted to frame their work within open data narratives. They miss work undertaken to address social, environmental, and political challenges that has benefitted, knowingly or otherwise, from open data interventions. We also note that, just as Van Schalkwyk describes research (see Chapter 30: Researchers) as an output that may be discovered and able to influence policy many years after it was first published, open data that at first appears to have limited value may end up having a much longer and more useful life.

As a result, we see the opportunity to reboot the current research agenda around open data, such that researchers:

Document the history of open data initiatives. Moving forward more strategically will benefit from a greater understanding of the past, particularly at the end of the first decade of open data. Studies of open data initiatives in their historical, cultural, and political contexts, that provide accessible documentation of the journey to date, can be instrumental to support reflection and learning and to enable new actors to become informed, involved, and empowered participants in the future of the open data movement.

Compare open and non-open models. In light of the broader data agenda that has developed over the last decade, instead of asking “What was the impact of open data?”, research should place more of a focus on “How do open data strategies compare to shared data or closed data approaches in addressing sustainable development challenges?” Methodologies should consider the potential added costs or added value of open data approaches in order to support better resource allocation decisions.

Improve quantitative evidence through natural experiments. For example, when historical data is also made available as part of open contracting interventions, it provides an opportunity for a natural experiment, comparing procurement outcomes before and after data was openly available. Although the robust quantification of outcomes from a particular open data intervention does not provide generalisable evidence of open data impacts, it is essential for testing the null hypothesis and establishing a level of confidence in open data theories of change.

Rebooting the research agenda also needs specific action to address the current disconnects within the research community that Van Schalkwyk diagnoses (see Chapter 30: Researchers) in order to build better interdisciplinary and cross-continental research collaborations. We note that the resources for much of this work may not, however, come in the form of open data funding per se, but instead from support for thematic and sectoral research agendas, where work to understand the role of data could and should pay particular attention to the openness of the data or the potential for open data to directly support project goals.

Funders: Mainstreaming, movement building, and data literacy

Funders have a pivotal role to play in shaping the next decade of a global open data movement. Bringing together different stakeholders around common data challenges and building data infrastructures to support the public good will not happen at scale without continued support from donors and investors. However, it is clear that open data has passed its peak on the hype cycle and that funders whose support was based on open data as an emerging technology are rapidly shifting their focus to other areas, such as AI. Although, in some areas, open data work has garnered support from a range of funding sources, including sector-specific funders or programmes, in other areas it depends on just one or two open data-focused funding schemes and is therefore more vulnerable to a complete loss of funding. There is already evidence to suggest that, if not well managed, competition for funding between open data organisations will drive further fragmentation and undermine more collaborative ways of working. A delicate balance must be struck between supporting sector-specific funding programmes to identify how open data can help deliver their project goals, and ensuring that the core of the open data movement does not become hollowed out, which would undermine continued knowledge sharing, innovation, and the development of open data techniques, ideas, and approaches.

While many funders have, in recent years, explored a shift from focusing on open data supply to looking at open data use, many chapters in this volume also call for an increased and broader focus on data literacy. Data literacy is not just about open data, but open data can be an invaluable asset for inclusive and empowering data literacy-building programmes. We suggest that underinvestment in data literacy has been a major factor in limiting both the quality of data supply and the uptake and use of open data over the last ten years. Investment in intermediaries, while valuable, does not obviate the need for the majority of organisations and individuals engaged in social change and development work to reach much higher levels of data literacy themselves.

In short, across the community of funders involved in work on open data, there is a clear need for funders to:

Continue to invest in a core open data movement and in shared open data infrastructures. The open data movement has yet to develop the kinds of professional associations or institutional structures that offer the potential for self-sustaining knowledge management, networking, and professional development. Even where sustainability mechanisms do emerge, developing and maintaining the inclusivity of the global open data community will require resources, as will public-good open data infrastructures that may not ever be self-funding. While funding may increasingly fall under wider “data rights” or “digital economy” headings, or be drawn from sectoral funding programmes, without enhanced donor coordination around open data initiatives and programmes, the full return on investments from the last decade may not be realised.

Integrate open data approaches within sectoral funding programmes. The idea of “mainstreaming” in international development funding has a mixed history. However, funding teams who have directly resourced open data work over the last decade will in future have a larger role in helping other thematic funding teams to identify the open data elements in their work. This does not mean trying to force open data into all projects, but instead should involve identifying where a project is already adopting a data-driven approach and exploring the extent to which an open data approach could enhance this. Much as governance or gender advisors have helped development-related donor programmes explore new dimensions of their work, data and open data specialists could have a major role to play in supporting funders in the coming decade.

Focus funding on (open) data literacy. Securing the benefits of open data, and mitigating risks associated with the abuse of data by powerful actors, requires much more widespread data literacy. Models for capacity building exist, but few have been tested at scale. Funders should set an ambitious vision for increased data literacy, both as a focus in its own right and as an element of other sustainable development projects.

Fundamentally, open data literacy is not just about technical skills. It involves a critical awareness of the right to know, the power of data, and how data can be explored and questioned. As Montes and Slater state in Chapter 19: Data literacy, the “focus should be placed on the value of openness in fighting inequality, versus focusing solely on the value of data analysis”.

While presented as recommendations for funders, implicit in the points above is a call for practitioners to also give long-term consideration to ways to self-fund networking and knowledge sharing, and to explore opportunities for open data approaches to be used within sector-specific projects, as well as to give more space to data literacy building in programme design.

Looking over the horizon

The history of open data over the last decade has seen open data become a well-established tool within the global policy toolbox with a wide community of supporters across many sectors. There are few development problems today that do not have a data dimension and, thus, will not have an open data dimension in the future. The risk that funders, policy-makers, and practitioners who were among the first to engage with open data as the “bright new thing” will shift their attention to emerging technologies in the next decade is real. But there is a strong foundation now in place for open data advocates to argue that open data must also continue to share the spotlight – as a complementary and, at times, corrective element of emerging public, private, and civil society data initiatives.

It could be argued that our conclusions and recommendations reflect the kind of “dogged optimism” that Wilson finds characteristic of civil society’s engagement with open data, always calling for a “fresh push in a long struggle that is just on the precipice of success” (Chapter 24: Civil society). Yet, this optimism is rooted in the evidence found across this collection that, over the last ten years, open data policy and practices have matured significantly. While shared learning and proven best practices from the past decade are not evenly distributed, they still offer the basis for a strategic way forward and substantial progress in the years ahead. The short essays in this volume are packed with examples, organisations, and innovative ideas that illustrate open data in practice, indicate the distance travelled, and highlight critical questions for the future. It is our hope that they inform and inspire new investment, research, and collaboration that will widen the effective use of open data.

As we look to the horizon, we cannot easily predict what the open data landscape will look like a decade from now. However, if there is a doubling-down on data literacy, an ongoing commitment to community and capacity building, and a deeper recognition of the politics and power dynamics around data, as well as of the vital strategic role of openness in protecting society from the darker side of data, then there is good reason to expect more opportunities to realise open data benefits for all.