Making a ‘Pre-Publishing’ Research Workflow Open Source

Being open & transparent saves time & improves research

Image: Before – After. ‘Being open & transparent saves time & improves research: The Grackle Project’ and ‘Making a ‘Pre-Publishing’ Research Workflow Open Source’ modification of slide 25 from keynote presentation from the 13th Munin Conference (Tromsø, Norway) by Dr. Corina Logan, “We won’t be… ‘Bullied into Bad Science'”, 28.11.2018, https://osf.io/sy9f7/ | See section ‘Failing to make the complete workflow Open Source’ for annotations

At the Munin conference on scholarly publishing in Norway at the end of November 2018, Dr. Corina Logan presented the keynote “We won’t be… ‘Bullied into Bad Science’”. While following the livestream, GenR offered, as an experiment, to convert Dr. Logan’s ‘pre-publishing’ workflow to use only Open Source tools. Working collaboratively in the open using Cryptpad, we have been able to replace most of the tools and, for the exceptions, chart a way to make a totally Free and Open Source Software workflow. In this short experiment a recurring issue has once again been encountered: the basic provision of an infrastructural pillar of ‘modern research literacy’, namely Open Source software, has been overlooked, in this case the provision of ‘simple tools for authoring’. This is only the start of the work and you’re invited to chip in on the pad: EDITME!

How to make workflows Open Source

The researcher workflow we are addressing covers the stages of research from ‘hypotheses’ to ‘papers’, and we have chosen to strictly keep only the tasks listed in the original slide. This decision is for pragmatic reasons and to respond to the very clearly articulated ‘needs’ of the researcher group.

Some background to the keynote presentation “We won’t be… ‘Bullied into Bad Science’”: it gained its name from a campaign by a group of researchers at Cambridge University in the UK who failed in their lobbying for the university to support Open Science policies and to stop spending its budget on overpriced journal subscriptions. Out of these failures the ‘Bullied into Bad Science’ campaign was started to support early career researchers in adopting Open Scholarship research practices. One example of good practice is the slide GenR has picked out of the presentation, called ‘Being open & transparent saves time & improves research’, which shows ‘free to use’ tools for a ‘pre-publishing workflow’.

The Bullied Into Bad Science campaign is an initiative by early career researchers (ECRs) for early career researchers who aim for a fairer, more open and ethical research and publication environment. 

(‘Bullied Into Bad Science’ 2017)

Presentation Video


Video: Keynote presentation given at the 13th Munin Conference (Tromsø, Norway), Corina Logan, Max Planck Institute for Evolutionary Anthropology, Germany. https://mediasite.uit.no/Mediasite/Play/c50871d34ac44518a9bcf21a06a05e161d?playFrom=13000

The results of the GenR collaborative experiment to improve the workflow and make it Open Source (AKA FOSS, Free and Open Source Software) showed up two interesting and unexpected insights.

  • Firstly, that only one substantial tool stands in the way of making the workflow fully Open Source, and that’s Google Sheets. Even here a route to building a replacement was found, which could be modelled on two existing components: ‘Ethercalc’ and the ‘R’ Google Sheets connector ‘googlesheets’. Work on documenting what is needed to resolve this issue will start (soon) on GenR’s GitLab account.
  • Secondly, that research libraries and universities need to be the ones plugging the gap—providing and supporting these basic ‘modern research literacy’ infrastructure components. Not all the gaps need filling. But as we know if the researcher isn’t supplied with the right infrastructure then they’ll bring it along themselves by using anything and everything from the Net that gets the job done, with the result that research is lost and reproducibility inevitably drops. For institutions this means providing services and supporting niche development, for the likes of: GitLab, Mattermost, OwnCloud, data repositories like Zenodo clones, and Ethercalc + R (when it’s built), etc.

The new ‘all’ Open Source workflow slide

Copies of the slide are available on the GenR GitLab repository as ‘Open Document Presentation’ format, usable in LibreOffice or PowerPoint. https://gitlab.com/gen-r/research-workflow/


Image credit: ‘Making a ‘Pre-Publishing’ Research Workflow Open Source’ modification of slide 25 from the keynote presentation given at the 13th Munin Conference (Tromsø, Norway) by Dr. Corina Logan, “We won’t be… ‘Bullied into Bad Science'”, 28.11.2018, https://osf.io/sy9f7/ | See section ‘Failing to make the complete workflow Open Source’ for annotations

A new slide has been made for the ‘Grackle Project’ workflow which shows the replacement Open Source options. Not all parts of the workflow could be made fully Open Source, and the reasons for this are explained in detail below. It is also worth pointing out that all of the Open Source tools included were selected against quality criteria intended to ensure that the tools are accessible to the widest number of researchers. This means they are available to those outside of institutional support, and are aimed at a general skill level where little time is demanded for additional learning.

The Open Source tools

These are the Open Source tools that have been selected for the workflow. Many other options and combinations of tools could also be used. 

In addition, the tool set recommended needs further rounds of field testing to validate the quality mark we would like to achieve for being widely available, and easy to master.

Table of workflow, tasks, Open Source tools, and ‘readiness’ ranking

The full spreadsheet can be found here. WordPress formatting would not allow a large table display.

| Workflow Task | Tools – free to use | Tools – Open Source |
|---|---|---|
| preregister online | GitHub | GitLab (.com or CE) |
| | Markdown | Markdown; GitLab Markdown https://docs.gitlab.com/ee/user/markdown.html |
| peer review online | Peer Community in Ecology | built using: Web2py |
| collect data (on laptops) | Prim8 Software (free to use) | N/a. NB: Prim8 is free to use. Not FOSS! http://www.prim8software.com/ |
| online spreadsheet | Google Sheets | EtherCalc + API improvements https://github.com/audreyt/ethercalc/blob/master/API.mdn + R package – googlesheets https://cran.r-project.org/web/packages/googlesheets/ |
| offline spreadsheet | | LibreOffice/Calc https://www.libreoffice.org/discover/calc/ |
| Coordination (category header of sub-tasks) | | |
| scholarly online social file management/workspace | OSF | OSF or GitLab (.com or CE) |
| online image and photo collaborative management | Google Photos | OwnCloud – gallery |
| online shared calendar | Google cal | OwnCloud – calendar |
| chat and voice with file sharing | WhatsApp | Signal |
| task management online | Trello | GitLab (.com or CE) |
| online team/partners coordination/comms | Slack | Mattermost |
| 3D printing / Laser cutting | F*FreeCAD | F*FreeCAD https://www.freecadweb.org/ |
| | GitHub | GitLab Web https://gitlab.com; GitLab – Community Edition install https://about.gitlab.com/install/ |
| Code experiments | PsychoPy | PsychoPy http://www.psychopy.org/ |
| Backup data/ repository (long term preservation?) | KEEPER | Zenodo, or Software Heritage |
| PrePrint | bioRxiv | bioRxiv n/a ? status unknown |
| Out of scope – only existing tasks from original workflow being covered | Can be included in a later extension to workflow ‘pre-hypotheses’ | |
| Discovery | | Open Knowledge Maps; BASE https://base-search.net; CORE https://core.ac.uk |

Table: workflow, tasks, Open Source tools, and ‘readiness’ ranking

Failing to make the complete workflow Open Source 

Not all tasks in the workflow have a workable Open Source tool available (Dec 2018) covering the complete ‘pre-publishing researcher workflow’ for the generally skilled researcher. Interestingly, the three tools that cannot be replaced each show a different reason for Open Source options not being available.

The tools are (see also the * annotations on the slide):

  • Google Sheets – complexity. Real-time collaborative online spreadsheet editing with programmatic API access from R is a complex problem to solve and requires large amounts of resources that neither the public sector nor private finance has been willing to support.
  • Prim8 – specialization. Prim8 is a specialized piece of software for logging primate behavior in the field. Open Source software carries a burden of time and resources to make and maintain, and these are usually not available for specialized tools.
  • BioRxiv – bespoke platforms and software distribution. More often than not, web service platforms are single instances and not meant for replication, so Open Source licensing is not applied, as Open Source licensing is intended for software that will be distributed and reused. Under license terms like the GPL you can modify software and not release your code if the software is not distributed for others to use. Additionally, a scholarly discipline will tend to rely on a specific point of distribution, so setting up alternatives will not help the situation.

But the nature of Open Source means that failing to complete the workflow with all tools being Open Source is not the end of the story; instead it is a new beginning. Tools can be reverse engineered, or comparable tools can be built upon and modified. This is what is proposed below for the Google Sheets case. Other issues, like ‘specialization’ or ‘platforms’, are more institutional issues of infrastructural support, where libraries and universities as infrastructure providers should step in to finance long-term maintenance, become providers, or fund a move to Open Source, etc.

Google Sheets – Online collaborative spreadsheets: Extending Ethercalc to work with ‘R’

For this specific ‘research workflow’ the available Open Source ‘web-based collaborative spreadsheets’ do not meet our level of readiness to be recommended for use. This is not to say that Open Source tools like Ethercalc cannot be used for many other situations needing spreadsheet functionality.

Our specific workflow requirements are that such a spreadsheet tool can be programmatically accessed for use with ‘R’ statistical software. This means that an API is available and that a software package is available for ‘R’ to interact with the API.

The ‘web-based collaborative spreadsheet’ software that we identified as being the leader in its field and of a very high quality is called Ethercalc, which also has an API.
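The requirement of programmatic access can be illustrated with a minimal sketch, written here in Python rather than ‘R’ purely for illustration. It assumes that an Ethercalc server exports a sheet as CSV at the path /_/<room>/csv, as we understand the Ethercalc API documentation to describe; the helper names (csv_export_url, parse_sheet, fetch_sheet) and the sample room name and columns are hypothetical and not part of any existing package.

```python
import csv
import io
from urllib.parse import quote
from urllib.request import urlopen


def csv_export_url(base_url: str, room: str) -> str:
    """Build the CSV export URL for one sheet ("room").

    Assumption: the server follows the /_/<room>/csv convention.
    """
    return f"{base_url.rstrip('/')}/_/{quote(room, safe='')}/csv"


def parse_sheet(csv_text: str) -> list[dict[str, str]]:
    """Turn CSV text into one dict per data row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fetch_sheet(base_url: str, room: str) -> list[dict[str, str]]:
    """Download a sheet and parse it. Needs network access to the server."""
    with urlopen(csv_export_url(base_url, room)) as response:
        return parse_sheet(response.read().decode("utf-8"))


# Offline example with made-up data, so no server is required.
sample = "bird,trial\nA1,3\n"
print(parse_sheet(sample))
```

An ‘R’ package for Ethercalc would need to provide at least this read path, plus a way to write values back, which is the part the ‘googlesheets’ package already provides for Google Sheets.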

Issues of reproducibility should be considered. These could be addressed with the use of Git in the data workflow portion. A solution to consider could be Git and RStudio, with the addition of a way to interact with the spreadsheet. 

(Jon Tennant)

Recommendation

Online collaborative spreadsheet: Extending Ethercalc to work with ‘R’. Ethercalc should be evaluated as a candidate for extension so that it can be used computationally with ‘R’. Ethercalc: https://ethercalc.net/ 

The work would involve:

  1. An evaluation of Ethercalc’s functionality to perform as a collaborative online spreadsheet with ‘R’ data. Ethercalc: https://ethercalc.net/ ;
  2. Assess how an ‘R’ package such as ‘googlesheets’ could be adapted or used as a blueprint to provide the same functionality for Ethercalc;
  3. If the evaluation showed that a workable system could be made, then the full work should be carried out. See: googlesheets: https://cran.r-project.org/web/packages/googlesheets/ and https://github.com/jennybc/googlesheets/

Notes on technical pointers for using Ethercalc API https://twitter.com/gittaca/status/1073091616657813504

Definitions

Listed below are some considerations that helped frame this mini-experiment:

  • Researcher – Who are the researchers that this workflow is designed for? In terms of skills, the researcher should have an awareness of using data with R. We want to avoid any lengthy upskilling, and the workflow must be workable for researchers who do not have institutional support, meaning our Open Source workflow tools must be easily available to the public. 
  • Workflow – The original workflow of the ‘Grackle Project’ sets the limits for this round of trials. This means we only consider tasks described in slide 25. At a later stage it would be interesting to encode the workflow, for example in joints.js Bisage http://resources.jointjs.com/ – Business Process Model and Notation (BPMN). It is worth noting that this ‘pre-publishing’ workflow is very useful as it is a base-level workflow that many types of researcher use.
  • Readiness: Open Source / FOSS quality – We have looked to set a quality ranking for the Open Source tools being recommended. The approach taken is to rate the tools in the context of this specific workflow only. A ranking of 1-5 has been set, with 5 being the highest ranking. Our assessment for the ranking does not have a set of fixed criteria (yet), but instead looks to place the software as meeting our researcher requirements: accessible to all, and not having a steep learning curve. 
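As a rough illustration of what encoding the workflow and its readiness ranking could look like, here is a minimal Python sketch. The class and field names are hypothetical, only a few tasks from the table are included, and the readiness values are left unset because no rankings are given in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkflowTask:
    """One row of the workflow table (field names are illustrative)."""
    task: str
    free_tool: str
    open_source_tool: Optional[str]  # None where no FOSS replacement exists
    readiness: Optional[int] = None  # 1-5, 5 is highest; None = not yet ranked

    def __post_init__(self):
        # Enforce the 1-5 ranking scale described above.
        if self.readiness is not None and not 1 <= self.readiness <= 5:
            raise ValueError("readiness must be between 1 and 5")


# A few rows taken from the table; rankings deliberately left as None.
WORKFLOW = [
    WorkflowTask("preregister online", "GitHub", "GitLab"),
    WorkflowTask("collect data (on laptops)", "Prim8", None),
    WorkflowTask("online spreadsheet", "Google Sheets", "EtherCalc"),
    WorkflowTask("online team/partners coordination/comms", "Slack", "Mattermost"),
]


def gaps(tasks):
    """Return the tasks that have no Open Source tool listed."""
    return [t.task for t in tasks if t.open_source_tool is None]


print(gaps(WORKFLOW))
```

Keeping the ranking as an optional field means a tool can be listed before it has been field tested, which matches the current state of the assessment.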

Source material

Original material

Slides: https://osf.io/sy9f7/
DOI: https://doi.org/10.7557/5.4596
Video of presentation: https://mediasite.uit.no/Mediasite/Play/c50871d34ac44518a9bcf21a06a05e161d?playFrom=13000
Post presentation interview: https://twitter.com/ubtromso/status/1069546245549670400
Campaign: Bullied into Bad Science http://bulliedintobadscience.org/

GenR workflow

Collaborative document: https://pad.dyne.org/pad/#/1/edit/f1F8ONQkjg1HbGIgiO4D6g/1zZVhXslyZTFkeW2+UPEEVtb/
Repository: https://gitlab.com/gen-r/research-workflow/
Ethercalc + R plans: https://gitlab.com/gen-r/research-workflow/blob/master/Google-Sheets-replacement.md
Article: https://genr.eu/wp/making-research-workflow-open-source  

Authoring, contributors, acknowledgements and thanks

As of 13 Dec 2018

Note: This article is based on more detailed notes from an ongoing collaboration https://pad.dyne.org/pad/#/1/edit/f1F8ONQkjg1HbGIgiO4D6g/1zZVhXslyZTFkeW2+UPEEVtb/ 

Hence GenR is named as the lead contributor as the GenR editorial office is responsible for assertions made in the article. 

Lead contributor: GenR

Contributors: Peter Kraker; Katrin Leinweber; and Simon Worthington

Acknowledgements: Dr. Corina Logan; Jon Tennant; Danny Collin; Open Science MOOC; EtherCalc; Alex McLean; Patrick Bergel; Bianca Kramer; and others of the Twitter hive mind.

Thanks: Cryptpad and Dyne for providing the rich text collaborative editor. Free and Open Source by design!


DOI: 10.25815/HAAS-2F56

Citation format: The Chicago Manual of Style, 17th Edition

Generation R. “Making a ‘Pre-Publishing’ Research Workflow Open Source,” 2018. https://doi.org/10.25815/HAAS-2F56.

References

All as Zotero collection #RFOSS https://www.zotero.org/groups/1838445/generation_r/items/collectionKey/7DFDSISI/tag/RFOSS


Bosman, Jeroen, and Bianca Kramer. ‘Commons-Compliant Tools/Platforms – Worksheet’. Google Docs, October 2018. https://docs.google.com/spreadsheets/d/1h0Aq6NYIeVnLDw33vx1SGnv1jbE2B7widbHhU7tpiUw/edit?usp=embed_facebook.

———. ‘Force2018-On-the-Positive-Side’. Google Docs, October 2018. https://docs.google.com/presentation/d/18XtTIvpjk6j9yTQ7CcE-TcSp_KNSzqFiD2q_t17OOXA/edit?usp=embed_facebook.

‘Bullied Into Bad Science’. Accessed 13 December 2018. http://bulliedintobadscience.org/.

‘Contributing to the Project — PsychoPy v3.0’. Accessed 12 December 2018. http://www.psychopy.org/about/contributing.html.

‘FreeCAD: Your Own 3D Parametric Modeler’. Accessed 12 December 2018. https://www.freecadweb.org/.

Gallery App for OwnCloud. JavaScript. 2014. Reprint, ownCloud, 2018. https://github.com/owncloud/gallery.

Logan, Corina. ‘I Won’t Be #BulliedIntoBadScience’. Septentrio Conference Series 0, no. 1 (20 November 2018). https://doi.org/10/gfpgv5.

Logan, Corina, Laurent Gatto, Ross Mounce, Stephen Eglen, Adrian Currie, and Lauren Maggio. ‘We Won’t Be… “Bullied into Bad Science”’, 2018. https://doi.org/10/gfkzf8.

Morley, Alexander. Contribute to Alexmorley/Meta-Open-Database Development by Creating an Account on GitHub, 2018. https://github.com/alexmorley/meta-open-database.

‘PCI Ecology’. Accessed 12 December 2018. https://ecology.peercommunityin.org/.

‘Prim8 Software’. Accessed 12 December 2018. http://www.prim8software.com/?page_id=27.

‘Research Software Directory’. Accessed 11 December 2018. https://www.research-software.nl/.

‘Web2Py’. Accessed 12 December 2018. http://www.web2py.com/.

Gen R

Posted by Gen R

Generation R editorial büro
