Over the last decade, many civil society and non-governmental organisations (NGOs) have advocated for public access to, and the use of, government data. They have drawn on arguments about the benefits of open data for society at large, based on what intuitively seems like sound logic. Data that is open to the public increases the transparency of the public institutions that routinely produce it. It can improve efficiency by interconnecting typically siloed datasets and, because open data is free to reuse, it can provide a resource for innovation and development. As Lawrence Lessig wrote in 2009 of the relationship between open data and greater transparency in governance: “How could anyone be against transparency? Its virtues and its utilities seem so crushingly obvious.”[1]
The origins of this logic can be traced to leading governments of Western democracies in the eighties and nineties.[2] In 2007, a defining meeting of open government advocates took place in Sebastopol, California, where the logic for open data was codified into what became known as the Sebastopol Principles.[3] Since then, governments, donor agencies, multilaterals, and civil society have fuelled interest in, and the implementation of, open data initiatives. However, there is a problem. The logic that these open data efforts have drawn upon, while intuitively appealing, is in most cases premised on a hunch, and one frequently based on experience in relatively few countries and contexts. Given the scope and depth of the impact that open data is claimed to have on the development of society, the responsible course is to seek evidence that tests whether open data is delivering on its promise. This is a key role of research in this domain: to produce knowledge that establishes empirically the benefits and limitations of open data. It is the role that researchers have been asked to take up in recent years.
Since 2007, researchers have responded to that demand, and the volume of research on open data has increased substantially. Figure 1 illustrates the increase between 2007 and 2016 in journal articles on open government data indexed in the Clarivate Web of Science.[4] Although reliable figures on the number of non-indexed articles and grey literature publications about open data are not available, it is reasonable to assume that these have also increased.
However, this increase alone does not allow us to conclude that questions about the impact or operation of open data have been resolved, or that the most critical knowledge gaps around open data have been filled.
Several factors may work against the assumption that more research invariably yields more valid and reliable knowledge about open data’s impacts and operations. For one, researchers do not constitute a singular or cohesive stakeholder group, and their different intents are reflected in their published articles. At best, researchers form a collective by virtue of their shared interest in the collection, synthesis, and use of evidence to advance particular objectives. While some applied researchers collect evidence primarily to inform and shape policy or practice, other, more theoretical, researchers are interested primarily in the potential of evidence to confirm or advance knowledge. How evidence is collected, what counts as evidence, how evidence is scrutinised, and how it is put to use will vary according to the objectives that bind particular communities of researchers together.
This should hardly come as a surprise. Sociologists studying organisations have argued that the behaviour of social actors (including researchers) is governed by different “institutional logics”,[7] and those studying social networks call on us to acknowledge the different “programmes” that shape the actions of those who belong to particular networks.[8][9][10] Yet during the early phases of the open data hype cycle, the agendas behind research too often went unexamined in the rush to find facts or figures that would bolster belief in open data’s potential. This chapter puts the emphasis on the factors shaping research in order to redress this imbalance.
When we focus on open government data specifically (i.e. not open research data or open data created and consumed only in the private sector), we can identify research and researchers in relation to two main global networks:
- The science, technology, and higher education network or, in short, the “science network”.
- The transnational NGOs and social movements network or, in short, the “development network”.[11]
Individual researchers’ affiliations and projects may cross over between these networks, as when a researcher in academia takes on consultancy-type research that leans more toward applying existing know-how than creating new knowledge, or when a researcher in a non-profit organisation produces new knowledge without any immediately apparent application. Nevertheless, the key question to ask when assessing researchers as stakeholders in the open data world is: “Which institutional or network logic are they connected to?” This chapter argues that we should not ignore these different logics, because they determine how research is conducted, which in turn determines both the relevance and the validity of the findings behind any conclusions about the impacts and operations of open government data.
Furthermore, if we want research to contribute critical knowledge that can advance efforts to realise the potential of open data, and that can sensitively illuminate where that potential has been overstated, we need to pay attention to the shape of research networks and consider the diversity of viewpoints they represent. Bias in research toward particular geographies, genders, organisational positions, and academic disciplines can shape the stock of knowledge that the open data field draws upon, with significant consequences.
The sections that follow explore how current research networks around open data have been shaped, challenge the idea that researchers constitute a single stakeholder group, and examine what this means for the current composition of research and its impacts.
Shaping networks of research
Although research is often presented as pure inquiry directed simply at answering a particular question, the power to shape a research field ultimately lies with those who provide the financial resources to conduct research and those who determine the rules and standards governing research practice. As influences, money and standards correspond to different types of networked power in the information age.[12] Money makes it possible to direct the focus of research and to create new spaces for it by funding particular research programmes, meetings, or events. Standards, whether in the form of practices required to secure publication or of the cultures of particular research networks and academic disciplines, establish the boundaries of research networks and allocate power within them to particular groups and institutions. In open data research, these forms of power have played out differently in the science network and in the parallel development network.
Funders in focus
Common to both the science and development research networks is the influence of economic capital: researchers in both networks rely on financial resources for their research activities. For predominantly university-based research in the science network, funding comes from government and/or student fees, but also from third-stream income in the form of grants from alumni, donors, and research funding agencies. Tenured and other scientists are relatively free to pursue their own research agendas, although there is increasing institutional pressure to seek third-stream income to fund research. In the development network, research is most often the remit of non-profit organisations and individual consultants, who rely almost exclusively on financial support from external funders. Dependency on donor funding is, therefore, more acute in the development research network, and, consequently, the type of research undertaken is more exposed to the vagaries of funders’ strategic priorities. It is also important to note that researchers who depend on funding operate in a competitive environment, which can dampen collaboration and the development of a shared research agenda.
In a 2017 analysis of a sample of publications on open government data indexed in the Clarivate Web of Science (which is biased toward the science network), Van Schalkwyk and Verhulst[13] found that the European Union (EU) was the single largest funder cited in publications on open data. Even so, EU funding accounted for only 20% of the publications that acknowledged financial support, making it clear that funding for open data research comes from a variety of sources. More striking is the finding that 64% of acknowledged funding came from national-level sources, perhaps reflecting a focus on national-level studies of open data. What Van Schalkwyk and Verhulst (2017) were not able to fully explain, however, is why 68% of the publications in their sample did not acknowledge funding. Researchers are known to underreport their funders, and acknowledgements, when provided, are not always indexed.[14] Nevertheless, the high proportion of publications without funding acknowledgements could indicate that researchers in the science network are able to determine and conduct their research without external funding. It may also reflect the relatively small size of open data research projects and the fact that, at least early in the rise of interest in open data, publication could be secured with relatively little input of resources.
It is not possible to use bibliometrics to provide a comparable analysis of publications produced by researchers in the development network because these publications are not indexed and are seldom assigned the ISBNs or DOIs needed for tracking and measurement. There is also no representative repository of publications produced by researchers in this network. However, we can look at funding of open data research across the development network using projects as a proxy in the absence of bibliometric data.
One of the first large-scale open data research projects was the Emerging Impacts of Open Data in Developing Countries (ODDC) project,[15] implemented by the World Wide Web Foundation as a two-phase research project from 2012 to 2015. ODDC was supported by the Open Data for Development (OD4D) programme, a partnership funded by Canada’s International Development Research Centre (IDRC), the World Bank, the United Kingdom’s (UK) Department for International Development (DFID), and Global Affairs Canada (GAC). In Latin America, the Latin American Open Data Initiative (ILDA)[16] has received funding for open data research from OD4D and its funders, as well as financial support from the Avina Foundation. In addition, research in support of the Africa Data Revolution Reports,[17][18] on the state of open data in Eastern Europe and Central Asia,[19] and on open data in the Caribbean[20] has also received grants from OD4D and its funding partners.
The GovLab, based at New York University but active within the development network,[21] received research funding from the Omidyar Network for its Open Data Impact project in 2015,[22] which was extended to cover developing countries with funding from FHI 360 and the United States Agency for International Development (USAID).[23] GovLab has also received funding from the John S. and James L. Knight Foundation for its Open Data 500 (OD500) research project, which focuses on the commercial reuse of open data.[24] Between 2013 and 2017, the Making All Voices Count (MAVC) project supported 61 research projects, several of which focused on open data; MAVC was funded by the Omidyar Network, the Swedish International Development Cooperation Agency, DFID, and USAID. The Global Open Data for Agriculture and Nutrition (GODAN) project,[25] while not exclusively a research project, has commissioned research on open data in agriculture. GODAN has received funding and in-kind support from various partners,[26] but its research activities appear to be funded primarily by its grant from DFID.
Although the list of projects above is not exhaustive (no complete database exists of research projects or of grey literature publications), the overview of open data research funding in the development network reveals the presence of a few key funders from the constellation of funders active in the broader open data space (see Chapter 25: Donors and investors). In particular, government funding through DFID, IDRC, the World Bank, and USAID, as well as private funding from the Omidyar Network, has played a significant role in directing research and in setting the open government data research agenda.
Standards, culture, and key influencers
Among open data advocates, there has often been talk of the importance of standards for making data interoperable and for tying together communities of practice around particular datasets. However, this focus, which some have termed the “magical thinking” of standards,[27] can ignore the way in which standards, as fixed practices, norms, and rules, also create barriers to entry and exclude those who may lack the capacity to meet the bar that has been set. As Davies (2014) writes: “Whilst our network society can make law, markets and norms more visible and contestable, by default, code, data and standards become an embedded part of the background, rarely subjected to scrutiny, and rarely open to be shaped by those who they affect.”[28] Whether for datasets or for a field of research, standards are typically set by early movers in more developed and well-resourced countries, and can then set unrealistic or unfeasible expectations for later adopters. It should not be assumed that standards for scientific research, or the datasets widely used by researchers, are apolitical. Their creation and application inevitably express the will of those who exercise influence over them, and their impact varies across social contexts.
In the science network, accepted standards and their associated practices (derived from the so-called “Mertonian norms” of science)[29] include requirements to seek approval from ethics review boards, to cite peers in written work, and to subject written work to peer review prior to publication, all of which affect who gets to contribute to the creation and dissemination of knowledge. Although these standards are, in theory, intended to be meritocratic, they may nevertheless preclude many individuals and groups from contributing to certain forms of open data research.
For example, Van Schalkwyk and Verhulst report that, of 205 publications on open government data indexed in the Web of Science, only 0.5% had a corresponding author affiliated with an NGO, 2.0% with a private firm, and 3.0% with government.[30] In other words, more than 90% of these publications had corresponding authors from within the science network, suggesting that relatively little progress has been made either in bringing practitioner perspectives into academic research or in bridging the academic and development networks.
Different norms exist in the development network, where rapid publication may be prioritised over peer review and there is a strong orientation toward case study research or participatory research projects such as the Open Data Index. Development stakeholders also often reference values of inclusion and transparency in their research. Notably, the development network has had its own early movers involved in setting standards that shape the research agenda: developing technical definitions of open data (e.g. efforts by the 2007 working group on open government data,[31] the Open Knowledge Foundation,[32] and the Exploring the Emerging Impacts of Open Data in Developing Countries (ODDC) project[33]); leading open data working groups and standards-setting bodies (e.g. the Open Data Working Group of the Open Government Partnership and sector-specific groups working on data standards in contracting, agriculture, etc.); and drafting and promoting the International Open Data Charter.[34] Instruments and assessment criteria for evaluating open data readiness, implementation, and impact have also played a significant role in shaping development network research. The World Bank’s Open Data Readiness Assessment,[35] the Web Foundation’s Open Data Barometer,[36] the Open Data Inventory (ODIN) from Open Data Watch,[37] and the Open Knowledge Foundation’s Global Open Data Index[38] have all generated data for secondary analysis and influenced the methodologies used by other researchers.
Although early projects emerging from the development network, such as OD4D’s ODDC programme, sought to foster an interdisciplinary research agenda and to contribute to academic as well as development literature, they appear to have resulted in few publications indexed in the Clarivate Web of Science. It is also worth noting that, while the Web of Science does index non-English journals, these are known to be underrepresented. The index itself, therefore, may mask and reinforce the exclusion of contributions from a segment of the research community.
[Un]welcome to science?
The Open Data Research Symposium (ODRS) is designed to be a space where researchers can share and advance knowledge exclusively on open government data. University academics, independent researchers, and researchers from NGOs are all encouraged to submit and present papers at the symposium.
For the inaugural ODRS, a policy manager from an international NGO active in the open government data space submitted a paper abstract. The abstract was reviewed and accepted for presentation as a full paper at the Symposium. After the Symposium, all authors of accepted abstracts were invited to submit their full papers to a special journal issue on open data, and submitted papers were put through a double-blind peer review process prior to publication. The reviewers subsequently recommended against publishing the policy manager’s paper.
The following exchange offers a small example of a misunderstanding arising from the differing norms of the science network (i.e. peer review and possible rejection) and the development network (i.e. participation and inclusion).
EDITORS: All the reviews on your paper entitled “XXX” that was submitted for consideration to the special issue on open data of the Journal of Community Informatics have now been received. Unfortunately, the reviewers have recommended that your paper not be accepted for publication. Once again, thank you for submitting your manuscript and for your patience as your paper went through the peer review process.
POLICY MANAGER: “What a welcoming community! Why was I invited in the first place?”
Fields and fora
We can see from the above that, in spite of some efforts, the different networks around open data research broadly operate in parallel and remain separate. The science network, in particular, spans a number of disciplines that have been more or less active at different points over the last decade, and the research on open data they produce has been shaped substantially by their different research cultures.
Initial research interest in open data emerged in computer science, where the emphasis was on data as an object rather than on socially embedded practices. This research was largely interested in the technical aspects of open government data, focusing on methods to link open data by means of formats, languages, and standards that could yield efficiency gains and better data use. It was, therefore, largely detached from the social world, treating government data as an input without studying the government processes that generated it. However, as open data initiatives matured and attention turned to the successes and failures of implementation, as well as to the terrain of policy and governance, research gradually shifted toward examining social factors as determinants of success or failure, bringing in a more diffuse set of researchers with backgrounds in geography, politics, economics, and business.[39] This shift may have been fuelled by civil society’s concomitant interest in the social benefits of open data (e.g. countering corruption to improve service delivery). Whatever the catalyst, social research into open data has become more established, although social research agendas are now arguably turning toward framings of data privacy,[40] data rights, data justice,[41] and social inclusion[42] rather than maintaining a core focus on open data.
It is not yet clear to what extent these new areas of enquiry are driven by increased media coverage or by the shifting interests and strategies of funders. It also remains to be seen whether this reflects a shift of interest into new fields or whether these new themes might be integrated into a coherent development network field of open data research.
Klein et al.’s analysis of where articles on open data have been published illustrates the current state of this shift. In 2017, they found that more articles had been published in Government Information Quarterly, a journal that invites submissions from a range of technical and social disciplines, than in any other journal, and that the remaining journals were split evenly (50/50) between those with a technical focus and those with a social focus.[43] By volume of publications, however, technical disciplines still predominate. Using the subject categories that the Web of Science assigns to articles, rather than journals, as a proxy for discipline, Van Schalkwyk and Verhulst found that the two most prominent subject categories in the literature published to date are both technical: computer science (27%) and information and library science (23%).[44] Overall, Klein et al.[45] found that academic social research into open data has been preoccupied with governance issues, such as transparency, participation, accountability, and collaboration, and, to a lesser extent, with the economic benefits of open government data, such as innovation and added value.
When it comes to fora where researchers can meet, the only regular conference that focuses exclusively on open government data is the ODRS, which is co-located with the International Open Data Conference and aims to bridge the academic and development research networks.[46] Three symposia have been held to date (Ottawa 2015, Madrid 2016, and Buenos Aires 2018). Funding for ODRS has been limited, consisting largely of in-kind support from the World Wide Web Foundation (ODRS 1 and 2), GovLab (ODRS 2 and 3), and the organisers of the International Open Data Conference. USAID has provided travel support for participants (ODRS 2016), and the OD4D network has provided funding for travel grants and logistics (ODRS 2018). Although a number of other conferences related to web science and e-government have included open data tracks, these appear to have been relatively short-lived and have neither led to the creation of distinct open data research sub-fields nor contributed to an overarching research agenda on open data.[47]
The state of research stakeholder networks: A snapshot
Factors such as funding and disciplinary standards may be difficult for individual researchers to influence. This section provides a brief snapshot of how these forces have shaped open data research and explores whether an inclusive or integrated research landscape is emerging.
- Gender. Using the names of corresponding authors of articles on open government data indexed in the Clarivate Web of Science, Van Schalkwyk and Verhulst[48] found that 30% (70 out of 209) were female. An analysis of attendees at the 2016 Open Data Research Symposium found that 34% were female; at the 2018 Symposium, the figure was 29%. These findings indicate an underrepresentation of women in open data research.[49]
- Geography. Van Schalkwyk and Verhulst indicate that 88% (189) of research publications indexed in the Clarivate Web of Science were published by authors in the Global North.[50] The trend data show an increase in the proportion of authors from the Global South, but the gap remains wide. Zhang et al. (2017) found that most authors of both theoretical and practical research on open data are from the UK and US, whereas in China, for example, most researchers focus on practical research.
- Collaboration. Bibliometric analysis of research published in the science network indicates that publications on open data tend to be co-authored. Seventy-nine per cent (170) of publications were authored by two or more researchers, and the average number of authors per publication was 3.29.[51] Of those authors who collaborated, 33% (71) did so with colleagues in the same organisation, 23% (49) with colleagues in the same country, and 11% (23) within a region (e.g. Europe, Africa). Only 3% (7) of collaborations were between regions within the same development classification (e.g. between authors in the US and Europe), while 8% (16) took place across development classifications (i.e. North–South collaboration).
- Impact. The impact of new knowledge produced by open data researchers can be measured either as impact on knowledge production or as impact on society, in terms of changes attributable to research outputs. Neither is easy to measure, although impact on knowledge production is typically gauged using citations as a proxy (the more frequently a publication is cited, the greater its impact in the science network). Van Schalkwyk and Verhulst (2017) report a marked increase in the number of citations, which is to be expected given the growing number of publications on open data, and find that each paper is cited 5.88 times on average.
Alternative metrics (or “altmetrics”)[52] can also serve as proxy indicators for the “impact” of research among stakeholder groups outside the science network. At the time of writing, the most highly cited academic open data paper according to Klein et al.[53] had been mentioned 17 times on Twitter and twice in policy documents.[54] A comparable paper from the development network had 77 mentions on Twitter, two in policy documents, and one in a blogpost. In both cases, the data indicate that it can take a number of years before publications are picked up and cited as sources.
The data above, although drawn primarily from the science network, suggest that open data research has a long way to go before it represents an inclusive stakeholder group. In particular, the lack of Global South representation and the limited international collaboration are cause for concern given the identified need for research on the role of open data in securing progress on the Sustainable Development Goals (SDGs). The altmetrics data also hint that uptake of research in the wider discourse around open data is limited.
Reshaping research: Challenges and opportunities
Although papers and publications abound, it is not unreasonable to ask why, after ten years, so little progress has been made toward findings that show whether or not open data is fulfilling its promise. Why is there still so much uncertainty about the actual potential of open government data? And why has the open data research stakeholder group not managed to place research at the heart of open data discourse?
One key reason is that researchers as a stakeholder group are fragmented and uncoordinated. Those in the science network are more focused on technical aspects of open data, while those in the development network have been trying to emphasise the importance of social dynamics if open government data is to be transformative. Connections between these networks are few and far between. Entities that can function as connectors, such as GovLab at New York University, the Singapore Internet Research Centre (SiRC) at Nanyang Technological University, and the AidData lab at the College of William & Mary, could play a larger role in this regard, provided they can maintain a balanced position between both networks.
Balance is also needed in other areas. If open data research continues to be dominated by those in the Global North and does not become more collaborative (across disciplines and regions), and if it does not address the dominance of male researchers, then the knowledge produced will be of limited relevance to those regions and communities that are often touted as being the main beneficiaries of open data. This requires action from those shaping research networks and greater consideration by researchers of the collaborations they enter into and the ways in which they can contribute to a more inclusive and interdisciplinary research field.
Another key challenge to building a research field, and to collating clear findings, has been the short duration of, and shrinking resources for, open data research. If, as suggested earlier in this chapter, research in the science network is mostly undertaken without direct support from external funding agencies, then it is likely that, aside from the institutional pressure to “publish or perish”, such research is not directly subject to external time pressures. However, the lack of external funding also suggests that independent science-led research projects are likely to be fairly small and not linked to large-scale empirical data collection. Even EU funding, such as that provided through Horizon 2020, which was identified above as the largest single funding source for open data research, is generally split between scientific outputs and more applied research and development activities. While the duration of these projects varies, a typical small- or medium-scale research project would generally last two to four years, and larger projects could run for three to five years.
In the development network, the time allowed to complete funded research is shorter and has been shrinking. Organisations commissioning research, such as GODAN, the World Bank, and the Open Government Partnership, have allocated as little as one to three months for research to be carried out. GovLab’s Impact of Open Data in Developing Countries research project was to be completed within one year. Researchers had two years to complete research for the first phase of the Emerging Impacts of Open Data in Developing Countries project, but for phase three they had just seven months. While this is not a comprehensive survey of all calls for open data research across the development network, it does indicate that the time afforded to researchers is likely to substantially constrain the kinds of studies that can be carried out and the extent to which impact studies can be conducted. Of course, not all research requires the same amount of time. Synthesis and desktop research can often be completed within shorter time frames, but empirical research requiring fieldwork and longitudinal data collection inevitably needs more time.
It may be that research in the development network has been afforded less time because of the needs and priorities of actors in that network, and because shorter time periods suit the kinds of research questions being explored. If so, shorter research timelines may not be problematic, particularly if complementary empirical studies are completed in the science network. But there are potential risks to consider. For example, if short-term research is done without reference to relevant empirical data and is presented as definitive, or if short-term research engages with “real” social issues while scientific empirical work remains abstract and esoteric, then our understanding of open data will continue to rest on weak evidence. An open conversation is greatly needed, not only about the open data research agenda but also about the kinds of resources needed to advance it.
As this chapter has shown, the roles that different research stakeholders play are being substantially shaped by funding, culture, and field building. This observation is not unique to researchers: other stakeholder groups have had their engagement with open data influenced by similar forces. Crucially, one reason we still have limited evidence on the impact of open data is that impact measurement in general is notoriously difficult; many widely adopted social policy interventions have a surprisingly shallow evidence base. In particular, it takes time for evidence of impact to become available, and effective measurement requires refined methodologies and the collection of extensive empirical data.
However, as open data work heads firmly into its second decade, the need to answer questions about its value becomes ever more critical. If short-term, project-based research that values relevance and application is favoured at the expense of programmatic research that incorporates robust empirical data and theory building, advocacy campaigns and open data policies will ultimately be built on shaky foundations.
This is not to say, however, that open data research enters the next decade starting from scratch. In fact, researchers in both the development and science networks have completed much groundwork to date and subjected some of it to critical scrutiny. It could be argued that open data research has created the space for new, emerging areas of enquiry, such as data justice, privacy, and rights. On the other hand, these emerging fields will need to guard against some of the challenges faced by earlier open data research, because they still lack the conceptual clarity needed to support open data studies. A reboot of open data research could offer the opportunity to keep and strengthen good foundations and to set aside the shakier outputs of earlier short-term projects. If the forces shaping networks of research align appropriately, open data researchers may yet provide the answers needed to fulfil the hopes of the Sebastopol “pioneers”.