How innovation in search engines needs renewing with open working and open indexes

Image: LA at Night, Wikimedia

Without the ability to build on top of existing search tools and indexes, innovation in search engines is being held back, letting down researchers and the public. The Open Access and Open Science movements have worked hard to make hundreds of thousands of publications freely available, but at the last mile search engines are failing to deliver on discovery. Public knowledge is hidden in plain sight, a phenomenon called “Dark Knowledge”. This article is a call for open infrastructural ‘ways of working’ to be adopted as ‘the new normal’ in software and interface development for scholarly search, in order to turn this situation around.


Open Knowledge Maps

In late 2013, an Ebola epidemic hit Western Africa that lasted for three years and cost ten thousand lives. One of the reasons for the severity of the epidemic was that public health officials in the affected countries were not prepared: to the best of their knowledge, the Ebola virus had not been observed in the region. This assumption proved to be wrong and, tragically, the evidence had been public knowledge for more than thirty years, as a New York Times investigation showed. Ebola had been found to be endemic in Western Africa in papers dating back to 1982. But while the knowledge was out there, it was not discoverable by the public health officials in Liberia, Sierra Leone and Guinea.

This phenomenon, in which public knowledge is hidden in plain sight, has been called “Dark Knowledge” (Jeschke et al. 2018).

Open science is seen as an antidote to Dark Knowledge. But while the open science revolution has dramatically increased the accessibility of scientific knowledge, discoverability is falling behind. With two and a half million papers published every year (Ware and Mabe 2015), and tens of thousands of new research projects launched every day, discovery becomes increasingly difficult. Traditional approaches, in which search engines return long, unstructured lists of scientific outputs, are not sufficient. We can also see this reflected in the numbers: the majority of publications and datasets are not reused (Lariviere, Gingras, and Archambault 2008; Peters et al. 2016), and even in application-oriented disciplines such as medicine, only a minority of results ever gets transferred to practice (Brownson et al. 2006).

This also means that we are missing out on much of the return on investment of funding programs such as the European Framework Programs. In many ways, we cannot cash the cheques written by the open science movement unless we dramatically increase the discoverability of research.

One of the reasons for the lack of innovation in discoverability is the legacy of closed and proprietary services. Take Google Scholar, for example. When Google Scholar launched about fifteen years ago, it was a groundbreaking literature search engine. The scientific literature, however, has doubled in the meantime, and Google has not invested enough to keep up with this growth. As a result, Google Scholar is of very limited use today and does a poor job of helping researchers find relevant papers for their information needs. This lack of innovation would not be a problem if other tools could build on top of Google Scholar. Unfortunately they can’t, because the Google Scholar index is not reusable. Innovators in this market first have to build their own index, a task made harder by the fact that Google has many special arrangements with content providers that the rest of the world does not have.

In addition, Google Scholar only provides unstructured lists of search results, ten at a time. This works well when you know what you are looking for. But if you want to get an overview of a research topic this way, it takes weeks, if not months, before you know the most important topics, publication venues, and authors. This is time that many people do not have, especially when disaster hits in the form of a public health crisis.

At Open Knowledge Maps we want to change that. We want to provide the missing link between accessibility and discoverability. Our goal is to illuminate the dark knowledge. Instead of lists, we use knowledge maps for discovery. Knowledge maps provide an instant overview of a field by showing the main areas of the field at a glance, and papers related to each area. This makes it possible to easily identify useful, pertinent information. Based on this idea and software, we offer an openly accessible service on our website that lets you create knowledge maps for research topics in any discipline.

In contrast to Google Scholar, we are developing Open Knowledge Maps as a truly open infrastructure. By that, we mean a reusable infrastructure, a public good. This means that all of our software is developed as open source under a permissive license. The knowledge maps themselves are licensed under a Creative Commons Attribution license and can be freely shared and modified, and the underlying data are in the public domain (CC0). We are also actively collaborating with the open science ecosystem. We are developing this infrastructure together with partners such as BASE, which is our main data provider; rOpenSci, who provide the data clients for easy access to data sources; and Hypothes.is, whose annotation client we have integrated into our maps.

As a result of our open approach, our software can be applied to other data sources and federated infrastructures such as the European Open Science Cloud (EOSC). One example of this is VIPER, the Visual Project Explorer. Using the same open source software, we created a system that provides a visual overview of the projects indexed in the EOSC via OpenAIRE. It enables funders, institutions, researchers, and other societal stakeholders to systematically explore a project’s output and to understand its reception in different areas.

But our open approach does not stop there: we are working towards participatory development. We aim to create a respectful, sustainable and inclusive space for everyone involved in exploration and discovery of scientific knowledge. To achieve this goal, we are seeking input from people all over the world who are passionate about better and more open solutions for discovery. To that end, we have started a community outreach program, the Enthusiasts program, and we couldn’t be happier with the results. Together, the enthusiasts have reached more than one hundred people in six cities on four continents. Not only have they helped to spread the word on open discovery and OKMaps, but they have also collected valuable feedback. This feedback, in combination with input from a variety of other sources, informs our roadmap.

All of this makes Open Knowledge Maps an infrastructure that is community-driven and community-owned. It also ensures that Open Knowledge Maps is developed according to user needs. As a result, an enthusiastic community has formed around Open Knowledge Maps. In the first two and a half years, we have had more than half a million users on our website, and more than one hundred thousand maps have been created. We hear many encouraging stories from people around the world who are now able to get an overview of research topics much faster than before and to discover new relationships and findings that were previously hidden from them.

We are now looking to combine the participatory approach with a sustainability model, similar to the Open Library of the Humanities (OLH): organizations become sustaining members of Open Knowledge Maps by contributing a yearly membership fee. In exchange, they can vote on features and data sources that should be integrated in Open Knowledge Maps.

In terms of drawbacks, funding is the sticking point for us. So far, we have gotten by on a tiny budget: for a team of eleven, we have funding that would usually be barely enough for a single person. We could create Open Knowledge Maps only thanks to thousands of hours put in by the awesome volunteers on the team. It goes without saying that this is not sustainable and that the things we can do on a pure volunteer basis are limited. Unfortunately, funding for nonprofit organisations is scarce, especially when it comes to open source services and frontends, even though these are the way researchers engage with open science. By leaving this market to proprietary and closed solutions, we are limiting innovation in how researchers, and the rest of the world, interact with scientific knowledge.

In addition, this approach leaves governance and ownership in the hands of commercial entities, which are driven by stakeholder value and do not always have the best interests of the academic community and society at large in mind. This situation sounds all too familiar to those involved in the struggle to move from closed to open access publications.

In an ecosystem of open infrastructures, innovation thrives because everyone can build on top of each other’s work. There are also none of the lock-in effects that we see with closed offerings: if an organisation does not work out in the way the community expects it to, the community can take the infrastructure somewhere else. For these reasons, truly open infrastructures are the strongest drivers of innovation in scholarly infrastructures today.

My call to funding agencies is therefore to invest in truly open infrastructures, especially interfaces and services. This is how we make the open science revolution a reality and turn the light on dark knowledge.



Citation format: The Chicago Manual of Style, 17th Edition

Kraker, Peter. “Illuminating Dark Knowledge,” 2018.


Peter Kraker is founder and chairman of Open Knowledge Maps.


Dahn, Bernice, Vera Mussah, and Cameron Nutt. ‘Opinion | Yes, We Were Warned About Ebola’. The New York Times, 21 December 2017, sec. Opinion.

Jeschke, Jonathan, Sophie Lokatis, Isabelle Bartram, and Klement Tockner. ‘Knowledge in the Dark: Scientific Challenges and Ways Forward’. Accessed 3 December 2018.

Ware, Mark, and Michael Mabe. ‘The STM Report’, March 2015.

Lariviere, Vincent, Yves Gingras, and Eric Archambault. ‘The Decline in the Concentration of Citations, 1900-2007’. ArXiv:0809.5250 [Physics], 30 September 2008.

Peters, Isabella, Peter Kraker, Elisabeth Lex, Christian Gumpenberger, and Juan Gorraiz. ‘Research Data Explored: An Extended Analysis of Citations and Altmetrics’. Accessed 3 December 2018.

Brownson, Ross C., Matthew W. Kreuter, Barbara A. Arrington, and William R. True. ‘From the Schools of Public Health’. Public Health Reports 121, no. 1 (1 January 2006): 97–103.

‘European Framework Programme: Archives | CORDIS | European Commission’. Accessed 27 November 2018.

‘About Google Scholar’. Accessed 27 November 2018.

Maps, Open Knowledge. ‘Open Knowledge Maps – A Visual Interface to the World’s Scientific Knowledge’. Open Knowledge Maps. Accessed 27 November 2018.

‘Open Knowledge Maps – Github’. Accessed 3 December 2018.

‘BASE (Bielefeld Academic Search Engine): Basic Search’. Accessed 27 November 2018.

‘ROpenSci’. Accessed 27 November 2018.

‘Hypothes.Is’. Hypothesis (blog). Accessed 27 November 2018.

Maps, Open Knowledge. ‘VIPER – The Visual Project Explorer’. VIPER. Accessed 27 November 2018.

‘European Open Science Cloud (EOSC) | Open Science – Research and Innovation – European Commission’. Accessed 27 November 2018.

‘OpenAIRE | Explore’. OpenAIRE – Explore. Accessed 27 November 2018.

‘Community – Open Knowledge Maps’. Accessed 27 November 2018.

‘Open Library of Humanities’. Accessed 27 November 2018.