{"id":2955,"date":"2020-06-23T12:36:38","date_gmt":"2020-06-23T10:36:38","guid":{"rendered":"https:\/\/genr.interpunct.dev\/?p=2955"},"modified":"2020-06-29T22:16:53","modified_gmt":"2020-06-29T20:16:53","slug":"openvirus-knowledge-in-the-hands-of-citizens","status":"publish","type":"post","link":"https:\/\/genr.interpunct.dev\/openvirus-knowledge-in-the-hands-of-citizens\/","title":{"rendered":"#openVirus \u2013 Knowledge in the Hands of Citizens"},"content":{"rendered":"\n

Image: Photo by Vincent Ghilione on Unsplash

openVirus is developing new types of search for research literature, using data mining technologies so that citizens can make use of scientific knowledge. The COVID-19 pandemic has created a variety of crises. Health and economic crises are the most obvious, but serious issues are also occurring in education, social cohesion, transport, manufacturing, and supply chains. The vast majority of citizens working in these areas are locked out of the scientific literature. For example, a doctor searching for ‘social distancing’ on a publisher’s site such as Taylor and Francis would find only 5% (21,919) of the research papers available as Open Access (Murray-Rust 2020); the rest (426,613) are paywalled.

It is worth noting that data mining of paywalled research is permitted under EU copyright directives [1], although publishers are hostile to upholding this legal right and are known to take punitive action to prevent it, such as completely disconnecting paying clients. With the COVID-19 crisis, Open Science is now on the public’s radar, and all stakeholders in scholarly communications will need to make themselves relevant to the situation as science faces its own coming crisis of ‘designing new systems’ (Thaney 2020).

openVirus works by rapidly downloading full-text papers from open repositories (EuropePMC, bioRxiv and medRxiv, DOAJ, EThOS, Redalyc (MX), etc.) at an average rate of fifty papers a second, then searching those papers on your local machine with ‘dictionaries’. You can build these dictionaries yourself or reuse ones built by others, and they are based on Wikidata’s 50 million items. Searches can be pinpointed on parts of a document, for example graphs or conclusions, and the indexing with Wikidata allows for semantic queries. For instance, if you had a question about COVID-19 infection rates and altitude, Wikidata can return all city names above 2000 meters with a population over 50,000. New types of search are important because they enable scientific knowledge to be put into action: someone outside academia, a citizen, can share research related to an idea they are working on with others, and that research remains linked to and identified with its source, say EuropePMC.
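The dictionary-based search described above can be sketched roughly as follows. This is a minimal illustration and not the ContentMine implementation: the `annotate` function, the `DICTIONARY` structure, the placeholder entry IDs, and the idea of a paper as a mapping from section names to text are all assumptions made for the example.

```python
# Hypothetical dictionary: each entry ID maps to the terms that identify it.
# The IDs are placeholders standing in for Wikidata item IDs.
DICTIONARY = {
    'Q-EXAMPLE-1': ['social distancing', 'physical distancing'],
    'Q-EXAMPLE-2': ['face mask'],
}


def annotate(sections, dictionary, only_sections=None):
    # Return sorted (section_name, entry_id) pairs for every dictionary
    # entry with at least one term found in a section of the paper.
    # Matching is a simple case-insensitive substring test.
    hits = set()
    for name, text in sections.items():
        # Optionally restrict the search to chosen parts of the document.
        if only_sections is not None and name not in only_sections:
            continue
        lowered = text.lower()
        for entry_id, terms in dictionary.items():
            if any(term.lower() in lowered for term in terms):
                hits.add((name, entry_id))
    return sorted(hits)


# A made-up paper, already split into named sections.
paper = {
    'abstract': 'We study social distancing in dense cities.',
    'conclusions': 'Face masks and physical distancing reduced transmission.',
}

print(annotate(paper, DICTIONARY))
print(annotate(paper, DICTIONARY, only_sections={'conclusions'}))
```

Restricting `only_sections` to `conclusions` mirrors the idea of pinpointing a search on one part of a document. Because every hit carries an entry ID rather than a bare string, the results could later be joined with other facts about that item.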

\"\"
Image: Schematic of ContentMine software

The project uses a software framework called ContentMine as a foundation for data mining, interfaces with Wikidata to add semantic enrichment, and uses a variety of other frameworks for dedicated tasks. The technology is being rapidly developed and is designed to be put in the hands of the public, but it also serves a variety of purposes for a wide set of communities, such as research repositories looking to service clients, or researchers needing to speed up scoping for literature reviews.

At the start of the COVID-19 pandemic openVirus sprang into action as an open research project on GitHub and Slack. There are currently thirty-eight members working globally, and there is an open invitation for anyone to get involved. In April openVirus took part in the #EUvsVirus global hackathon to look at innovation for the pandemic on healthcare issues. In the three days of the hackathon the team made significant developments, described in its hackathon submission (openVirus 2020): establishing openVirus as the first system to annotate the scientific literature corpus with Wikidata; bringing thesis analysis on board by working on full-text indexing of EThOS, the UK’s collection of PhD theses; adding the Ferret scraping system; interfacing with DOAJ and searching four million abstracts; and moving towards containerization of the system. The EUvsVirus hackathon was important for understanding the breadth of innovation challenges posed by the pandemic, and it made clear that Open Science systems need to accelerate innovation to meet these needs: in health, the lack of skilled caregivers; in business, efficient teamwork; in social cohesion, support for arts & entertainment; in remote education, e-learning methods & tools; in family life, remote working & education; and in digital finance, faster access to financial support.

Andy Jackson of the British Library Web Archive team published a blog post, ‘Searching eTheses for the openVirus project’ (Jackson 2020), about a contribution he made to openVirus. It responds to the observation that libraries may already hold knowledge that could be made available to help in the crisis. Andy took up this challenge and applied the UK Web Archiving software tools to analysing EThOS, the British Library’s holdings of over half a million UK theses. Legally these documents cannot be redistributed, but data mining to generate statistical summaries of their contents, for example word frequencies, is permitted, and such summaries show the likely relevance of a document. An API was made to access the theses and was encapsulated in a Jupyter Notebook.
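As a rough illustration of the kind of statistical summary mentioned above, the sketch below counts word frequencies in a piece of text. It is not the British Library code or API: the `word_frequencies` function, the small `STOPWORDS` set, and the sample sentence are all assumptions chosen for the example.

```python
from collections import Counter

# A deliberately tiny, illustrative stopword list.
STOPWORDS = {'the', 'of', 'and', 'a', 'in', 'to', 'is'}


def word_frequencies(text, top_n=5):
    # Return the top_n most frequent non-stopword words as (word, count)
    # pairs. Words are lowercased and stripped of surrounding punctuation.
    words = []
    for raw in text.lower().split():
        word = raw.strip('.,;:!?()[]')
        if word and word not in STOPWORDS:
            words.append(word)
    return Counter(words).most_common(top_n)


sample = 'The virus spreads. The virus mutates and the virus persists.'
print(word_frequencies(sample, top_n=3))
```

A summary like this reveals which terms dominate a document without reproducing its text, which is why it can hint at relevance even when the full document cannot be redistributed.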


Our digital libraries and archives may hold crucial clues and content about how to help with the #covid19 outbreak: particularly this is the case with scientific literature. Now is the time for institutional bravery around access! #DHgoesVIRAL 12/20 https://t.co/jop1qkj1kV (melissa terras, @melissaterras, April 2, 2020)