The State of Open Data - Histories and Horizons - Issues in Open Data
The chapters in this section address a number of cross-cutting issues that shape the current state of open data.
In 2009, while giving a TED Talk on “The next web” and invoking an earlier talk by Hans Rosling[1] on the potential of data, Sir Tim Berners-Lee invited the audience to chant “Raw data, now”.[2] That message was highly influential in early open data work. Coupled with the argument that government-collected data has already been paid for by taxpayers and so should, by rights, be available for them to reuse, the mantra of “raw data, now” led to a focus on getting as much data as possible, as quickly as possible, on to open data portals. Notably, although the open government data movement initially sought to draw a clear line between data about individual people and non-personal government data, focusing only on the latter, Berners-Lee’s speech covered the whole spectrum of data from official government datasets to social network data, research data, and crowdsourced citizen-generated data. He argued that access to raw data was the first step toward building a web of interlinked data, noting that “there’s not an immediate return on the investment”, but that “it will only really pay off when everybody else has done it”.[3] “Raw data, now” was both a political call to remove the gatekeepers restricting the flow of data to skilled users and a strategic move, seeking to pre-emptively challenge the “data hugging” and organisational inertia that might prevent the potential benefits of data sharing from being realised.
In the decade that has followed, that idea of “raw data, now” has faced a number of critical issues and open data advocacy has had to adapt in response. First, the line between public and private data has turned out not to be so easily drawn. As early as 2010, governments attempted to balance open data and privacy concerns, although, as Chapter 23 (Privacy) explores, it was not until 2015 that privacy principles started to be widely incorporated into the international discourse on open data. As the chapter notes, considerable work has now gone into developing tools and resources to support governments and civil society in addressing privacy risks related to open data efforts. Because new technologies can transform older documents published by governments into searchable data, these risks relate to more than just the publication of new datasets. Although anecdotes of open data-related privacy breaches are relatively few and far between, it has become clear that with very different legal frameworks, cultural practices, and risk profiles across the world, providing raw data on demand may not always be possible. Instead, a more intentional approach has to be taken, weighing the potential public benefits of opening data against the privacy rights of individuals included in the data or even of third parties who may be affected by the data.
Chapter 23 also introduces the Open Data Institute’s data spectrum, situating open data on a continuum from closed to open. This, alongside a range of other conceptual innovations, such as “data stewardship”[4] and “responsible data”,[5] may serve to blur (artificially) the neat boundaries of open data, presenting open data programmes with the choice of maintaining a narrow focus on data availability or considering the consequences, positive and negative, of broader data accessibility and use.
This leads to the second critical issue. Data use requires users with the technical and analytical capacity to transform data into information, knowledge, and action. As Chapter 19 (Data literacy) notes, data literacy has moved up the agenda of funders, governments, and civil society networks, but capacity to make the most of open data remains scarce, and the training and capacity building delivered to date are woefully inadequate. Calls for more investment in this area are well warranted and should be matched by a continued shift in open data measurement work to look not only at the supply of, and the demand for, data but also at data use (Chapter 22: Measurement). However, when it comes to enabling open data use, there is also an interaction between the “rawness” of data and data literacy building. Chapter 19 describes how lower-quality data requires users with relatively advanced technical skills, not to mention relevant domain knowledge. The “rawer” the data, the higher the investment needed in data literacy to enable use, which brings us to the third critical issue: systematically improving data quality requires concurrent work to build better data infrastructures.
In Chapter 18 (Data infrastructure), the authors describe data infrastructure using a comparison to the physical infrastructure of road networks to draw attention to the problems caused by “potholes” in our data and the inconsistent quality of data infrastructures around the world. More specifically, they describe the need for shared identifiers, standards, and registers that, along with guidance, policies, and organisations, will support the realisation of joined-up data and help to meet the Sustainable Development Goals (SDGs). Governments have to shift from simply throwing raw data over the wall to working collaboratively to build shared spaces around open data. The most difficult challenge will be to find the resources, and funding models, to support a “step change in investment and in the level of effort put into creating data infrastructures” (Chapter 18). The current web of documents has broadly been funded by advertising and the investments of millions of individual publishers. Funding the more complex work to maintain (inclusive and open) infrastructures for a web of data requires new approaches.
Together, concerns about privacy, questions of data literacy, and visions of data infrastructure all contribute to a shift toward a new framing of open data, which was captured in the strategy announced by the Open Data Charter in 2018, signalling a shift from “open by default” to “publishing with purpose”.[6] At its best (and the way in which it is undoubtedly intended), this new mantra highlights the need to understand that all datasets, and the programmes of engagement around them, embed certain values and potentiality, and that responsible open data practices involve going beyond raw data to co-create datasets, data infrastructure, and opportunities for reuse with a wide range of stakeholders. However, at its worst, the idea of publishing with purpose may risk governments acting more as gatekeepers, making their own self-interested political and strategic decisions about what, or what not, to release, instead of acting primarily in a stewardship role to protect privacy, promote inclusion, and maximise data reusability. The three chapters (Privacy, Infrastructure, Data literacy) argue for embedding rights, inclusion, equity, and a culture of openness in any updated strategies for advancing open data, but the degree to which this may be achieved in the coming years is an outstanding question.
Chapters on gender equity (Chapter 20) and Indigenous data sovereignty (Chapter 21) also pick up on issues of inclusion and rights, calling for a focus on the ways in which government data (mis-)represents marginalised groups and on how data divides can reinforce marginalisation. Although principles of gender equity have been recognised in international law for decades and Indigenous community rights have risen in prominence, taking major steps forward with the UN Declaration on the Rights of Indigenous Peoples (UNDRIP) in 2007, both chapters note that these issues have only started to appear on the open data agenda since 2015.
In the case of gender equity, Chapter 20 describes a growing space for discussion of gender, as new groups have emerged and built strong women-led networks of support. The authors reflect on anecdotal evidence that suggests the stereotypical open data organisation is no longer exclusively young, urban, and male, but also note that there is little systematic evidence available to track progress on gender equality in open data. Furthermore, simply counting who is present in the field can hide disparities that require women to work much harder than their male colleagues to be heard and recognised. The chapter also identifies the potential for bias within datasets, highlighting how a lack of gender-disaggregation in data or the non-collection of data of relevance to women’s lives will frustrate efforts to monitor progress on the SDGs. As the authors state, “as long as gender data gaps persist, any open datasets created based on raw data that does not adequately represent women will have limited potential to support transformative action on gender equity” (Chapter 20: Gender equity).
A similar challenge exists for Indigenous communities. In Chapter 21 (Indigenous data sovereignty), the authors describe how national statistical systems often render Indigenous populations invisible or introduce biases that fail to reflect community needs, priorities, and self-conceptions. However, they go further than simply arguing for better representation of Indigenous communities within government datasets to promote Indigenous data sovereignty, staking the claim of Indigenous populations to collective ownership and self-governance of data resources generated by or about them. They critique “Eurocentric conceptualisations of privacy and licensing”, arguing that Indigenous communities need to be able to exert more control over how their data is used to protect against exploitation, misinterpretation, and misuse. Indigenous data sovereignty is offered as more than just a critical perspective. Through a series of case studies in the chapter, the authors illustrate how it offers practical principles and tools, and, ultimately, a source of deep wisdom that can direct more sensitive governance of data.
For some readers, these chapters may make for challenging reading. They call into question many issues that have been taken for granted in the past when planning open data projects or carrying out research into the dynamics of open data. They challenge a traditional reliance on schematic definitions of openness and question neo-colonial biases, encouraging readers to consider how far their own biases or positions of privilege may affect their perspectives and calling on them to confront a much more multi-layered world. These chapters call for both an individual and collective response, reflecting the position of the authors of Chapter 20 (Gender equity, p.296) that “This need for self-reflection is a part of a larger cultural challenge that needs to be addressed actively in both professional and personal capacities”.
At the more systemic level, Chapter 22 (Measurement) explores the role that the measurement of progress and impact plays in shaping the state of open data, noting that, here again, issues of inclusion are currently poorly addressed. Rethinking open data measurement tools to better consider gender equity and Indigenous data issues, as well as the complex realities of administrative geography, still needs attention. However, the potential consolidation of existing measurement tools in the coming years may provide opportunities to improve methodologies, as well as the communication of results.
The scoping of topics to cover in our examination of the state of open data began in mid-2017. Since then, a number of issues have started to take up more space within both policy and academic debates around data, offering potential new frontiers for open data. In the case of a number of these issues, such as the rise of data collaboratives (private–public partnerships for data sharing)[7] and data trusts (governance frameworks for managing data),[8] discussions remain at an early stage, and, therefore, the decision was made not to address them in this volume. In other cases, such as with blockchain and distributed ledger technologies, we chose to let coverage emerge from sectoral and regional chapters, rather than addressing them in specific cross-cutting issue chapters. However, the rapid growth of debates around algorithms and artificial intelligence (AI) led us to the decision to add a chapter in order to survey how this growing field might shape the future state of open data.
As Chapter 17 (Algorithms and artificial intelligence) explores, since 2015, an increasing number of governments have published AI strategies, frequently turning to open data as a “raw material” to fuel the growth of their AI industries. At the same time, machine-learning techniques are able to create new sources of data from unstructured inputs, changing the landscape of data availability, particularly for developing countries. However, as Chapter 17 argues, the rise of AI in the form of both enthusiasm and concerns should not be regarded simply as a new hook for selling the idea of open data, nor simply as a source of tools for data analysis. Instead, many forms of AI represent a distinct form of data power, tending toward centralisation of control and toward black box data analysis in ways that run counter to the decentralising logic of many open data initiatives. Lessons from the open data movement related to equity, inclusion, privacy, data infrastructure, and data literacy all have important contributions to make in shaping the future of AI.
One of the great successes of open data has been that it opened the black box of government data, bringing into view governments’ processes for data collection and use that were previously hidden from the public. In doing so, as many questions have been raised as answered, leading to the identification and exploration of a number of cross-cutting issues for open data. For those who view open data solely as a resource for innovation or for improving the efficiency of business processes, it may be possible to side-step some of these issues and simply reuse data when it is profitable and useful. For those who see open data as part of a wider social and political movement, the encouraging message from the chapters that follow is that there are many people already working to address these issues that will shape the future of open data, and solid progress is being made on the construction of a critically aware field of practice.
1: Rosling, H. (2006). The best stats you’ve ever seen. https://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen
2: Berners-Lee, T. (2009). The next Web. https://www.ted.com/talks/tim_berners_lee_on_the_next_web
3: Ibid.
6: Open Data Charter. (2019). Bringing power into the open – 2019 strategy. https://drive.google.com/file/d/1fY6EBrXfal1e289FzPIaqhJBaIZ6MPuU/view?usp=sharing&usp=embed_facebook