Open science is linked with many values, such as accountability, integrity, trust, reproducibility and, of course, openness. From the beginning of the research process, publishing is key to informing the community and society about the findings and the winding road that led to them. Accessibility must therefore be mentioned as a necessary condition and another core value of open science.
To promote Open Science and its values, Hamburg University of Technology (TUHH) and Hamburg State and University Library (SUB) are developing an Open Access framework for digital publishing. We use open source tools that every scholar, library and publisher should be able to use and adapt to their needs. The joint project Modern Publishing is part of the Hamburg Open Science program.
The Scholarly Writing and Publishing Framework
In our team we trust in the Unix philosophy of “One job, one tool”. As in Unix/Linux best practice, we chain our tools together to build a modular pipeline that can be extended and configured as you like. Individual parts can be replaced and adapted to one’s needs. This modular architecture is open to participation and inspiration from the community. We call it a Scholarly Writing and Publishing Framework.
In short, the system works as follows: Authors write collaboratively in Markdown. A GitLab pipeline running with Docker converts the text with pandoc/pandoc-scholar to various output formats like PDF, HTML and XML. Reviewers annotate the article before submission using Hypothesis. The article is then submitted and published in an OJS instance, where it can again be annotated and discussed with Hypothesis.
Let’s have a look at these tools in detail:
Markdown. In many open science and open source ecosystems the Markdown syntax has already become the default for scientific writing. Markdown can be learned quickly and is found in many environments on the web. It makes single-source publishing possible, as it can be converted to many different formats with static site generators like Jekyll or Hugo, or with a document converter like pandoc.1
Pandoc. Pandoc is a converter for a large number of document types. It works on the command line and converts Markdown to LaTeX, PDF, HTML, XML, office formats and various wiki dialects. To get an impression of its concept, you can try it online. For our framework pandoc is the first choice, as it meets the requirements for scientific writing, including citations, footnotes, figures and reference management.
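To make the citation handling more concrete, here is a minimal sketch of a pandoc defaults file. It assumes a reasonably recent pandoc release (one that supports `--defaults` and the built-in `citeproc` option); the file names `defaults.yaml` and `references.bib` are hypothetical and not taken from our project.

```yaml
# defaults.yaml (hypothetical name); run with: pandoc --defaults defaults.yaml
from: markdown
input-files:
  - draft.md                    # the article source, as in the pipeline example
output-file: draft.html
standalone: true                # produce a complete HTML document, not a fragment
citeproc: true                  # resolve citations such as [@krewinkel2017]
bibliography: references.bib    # assumed BibTeX file with the cited works
```

Keeping the options in a file rather than on the command line makes the conversion easier to review and to reuse across articles.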
pandoc-scholar. For scholarly writing, the correct mention of authors and affiliations is crucial. To add this capability to pandoc, Albert Krewinkel and Robert Winkler wrote pandoc-scholar, which we also use in our project (Krewinkel & Winkler, 2017).2 Krewinkel and Winkler also intend to reduce the expense of article processing with pandoc-scholar, which can convert Markdown to PDF, LaTeX, HTML, JATS XML, EPUB and other formats. This way we can generate good-looking and machine-readable versions of scientific articles from a single source at the same time.
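As an illustration of how authors and affiliations can be expressed, the following sketch follows the metadata conventions described in the pandoc-scholar documentation as far as we can reproduce them here. All names, e-mail addresses and the affiliation keys `tuhh` and `sub` are placeholders chosen for this example.

```yaml
# Metadata block for an article (placeholder values throughout)
title: "An example article"
author:
  - Jane Doe:
      institute: tuhh             # refers to a key in the institute list below
      email: jane.doe@example.org
      correspondence: "yes"       # marks the corresponding author
  - John Roe:
      institute:
        - tuhh
        - sub                     # an author may have several affiliations
institute:
  - tuhh:
      name: Hamburg University of Technology
  - sub:
      name: Hamburg State and University Library
```

Such a block is typically placed at the top of the Markdown file between `---` lines, so that the same author data feeds every output format.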
GitLab. Many scientists already appreciate collaboration on GitHub, a sharing platform for software, data and text. GitLab is quite similar to GitHub, but it is open source software and free to host on a server of your choice. Besides the advantage of working together on code, data and text, GitLab excels at building digital artifacts with configurable pipelines. For the Scholarly Writing and Publishing Framework we make extensive use of these pipelines with Docker.
Docker is, in a nutshell, a technology that lets you build containers: isolated environments that are as small as possible and as powerful as necessary, and much lighter than full virtual machines. We use Docker to run pandoc/pandoc-scholar on Markdown files in GitLab pipelines, so that scientific articles are built on the server, not on the authors’ clients.
Hypothesis. For (open) peer review of articles before and potentially also after submission we use the annotation tool Hypothesis.
Open Journal Systems. An instance of Open Journal Systems (OJS) is run by Hamburg University Press at SUB. Our objective is to foster the foundation and migration of peer-reviewed scholarly journals hosted by Hamburg University Press. We support scholars in meeting state-of-the-art requirements for open access publishing, such as ORCID, Crossref DOIs, metrics, appropriate metadata, and information on publishing ethics and good scholarly practice, and we also offer services for indexing and sustainability. The API Magazine, a student journal, is the first one published this year.
Take a look at an example pipeline
In a file called .gitlab-ci.yml a GitLab pipeline can be configured. Here lies the source of power and creativity for the framework, as you can link software as you like to build what you need. To give you an example of a very simple pipeline configuration:
image: pandoc/latex

build:
  script:
    - pandoc draft.md -o draft.pdf
    - pandoc draft.md -o draft.html
  artifacts:
    paths:
      - "*.pdf"
      - "*.html"
The first line declares the Docker image to be used. Then, in the script part, any number of command lines can operate on the GitLab repository files. In this example, the file draft.md contains a scientific article that is converted by pandoc to PDF and HTML. The artifacts list enumerates all file types generated to be saved for further use before the Docker container is destroyed.
With a configuration like this, every change in draft.md triggers a new run of the pipeline to generate the desired artifacts. Thus, GitLab becomes a universal Content Management System (CMS) with great flexibility in what to build. The artifacts can then be published with OJS.
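A slightly extended variant of such a pipeline might look like the sketch below. It additionally writes JATS XML and restricts the job to commits that touch draft.md. Two points are assumptions on our side and should be checked against your setup: the entrypoint override, which some versions of the pandoc Docker images need so that GitLab can run a shell in the container, and the `rules: changes` clause, which requires a GitLab version that supports `rules`.

```yaml
image:
  name: pandoc/latex
  # Assumption: the image starts pandoc by default, so we switch to a shell
  entrypoint: ["/bin/sh", "-c"]

build:
  script:
    - pandoc draft.md -o draft.pdf
    - pandoc draft.md --standalone -o draft.html
    - pandoc draft.md --standalone --to jats -o draft.xml   # machine-readable XML
  artifacts:
    paths:
      - "*.pdf"
      - "*.html"
      - "*.xml"
  rules:
    - changes:
        - draft.md        # run the job only when the article source changes
```

The `--standalone` option makes pandoc emit complete documents with header and metadata instead of fragments, which is usually what a publishing system expects.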
Potentials and challenges of the framework
In agreement with Herrmann (2003) we understand the Scholarly Writing and Publishing Framework as a socio-technical system. This helps us to keep in mind that complex technology has to be seen in the context of “[…] organisational, technical, educational and cultural structures and interactions” (Herrmann, 2003, p. 60) and also points us to the needs and expectations of the users.
Speaking from a technical perspective, we see considerable potential in this system. Some of the more important points:
- We do not decide on the text editor to use. While Visual Studio Code3 is a good choice for scientific writing with Markdown, Zettlr could also be an editor you might like. A quick way to multi-author collaboration in a low-threshold Markdown writing environment can also be the browser tool HackMD and its FLOSS sister CodiMD.
- Depending on the Docker image used, the pipeline can be configured to generate more complex artifacts like books (article in German), websites or even web applications that make use of the Markdown text. For example, we also use the framework for our project blog, where we replace pandoc with Hugo in the pipeline.
- With a complete FLOSS stack and software that is based on healthy developer communities we stay independent, modular and open for change.
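To illustrate how one tool in the chain can be swapped for another, here is a minimal sketch of a pipeline that builds a website with Hugo instead of pandoc. It is modelled on the commonly used GitLab Pages setup and is not the configuration of our own blog; the image name is an assumption and any Docker image that provides the `hugo` command would do.

```yaml
# Assumption: an image that ships the hugo binary
image: registry.gitlab.com/pages/hugo:latest

pages:                  # GitLab Pages publishes the artifacts of a job with this name
  script:
    - hugo              # renders the Markdown content into the public/ directory
  artifacts:
    paths:
      - public
```

The structure is the same as in the pandoc example: an image, a script and a list of artifacts. Only the tool and the output directory differ.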
Speaking from a social perspective, we also see both potential and challenges in this system:
- Depending on your background, getting started with the framework means a more or less steep learning curve. We think the time is a worthwhile investment, as we chain together concepts and tools that are themselves helpful and valuable in other research and education contexts as well.
- Involving authors and editors in our work, we learn about the writing habits of authors and usability aspects of new publishing systems. The framework supports collaborative writing processes and open peer review, thus supporting open science in its core sense.
- We know that the characteristics of the framework relate heavily to software development culture. This makes it easy to onboard writers who come from a natural or computer science background. For scholars from other disciplines, such as the humanities and social sciences, we think that writing texts in a software development environment could also flatten the learning curve for these tools. Furthermore, it supports adapting to the new paradigm of opening up the research process.
That said, and following Herrmann, we assume that the development of the framework is subject to the characteristics described as “reciprocal indispensability”, “reciprocal forming” and “ubiquitous self-description” (Herrmann, 2003, p. 63). In short, this means that the social and the technical system are joined together like two sides of a piece of paper and influence each other in the process of development and usage.
The technical stack depicted here will be presented and released as a modular, open source bundle in the context of Open-Access-Days 2020. Until the end of 2020 we will concentrate on optimizing the JATS XML output of pandoc/pandoc-scholar to fit the requirements of OJS. We will also join the discussion on using or developing a metadata set for authors and affiliations. In addition, we are optimizing the templates, writing documentation so that others can understand what we are doing, and testing the framework with authors and editors.
- Modern Publishing project website
Grandesso, P. (2018, March 23). A pandoc-based layout workflow for scholarly journals [Personal blog]. http://pierog.it/en/2018/03/markdown-workflow/
Herrmann, T. (2003). Learning and Teaching in Socio-technical Environments. Informatics and the Digital Society: Social, Ethical and Cognitive Issues, 59–71. https://doi.org/10.1007/978-0-387-35663-1_6
Krewinkel, A., & Winkler, R. (2017). Formatting Open Science: Agilely Creating Multiple Document Formats for Academic Manuscripts with Pandoc Scholar. PeerJ Computer Science, 3, e112. https://doi.org/10.7717/peerj-cs.112
- Markdown is, for example, the way to write in Jupyter Notebooks or to document your projects on GitHub. Grandesso (2018) also points to the advantages of writing and publishing with Markdown and inspired our work.↩︎
- We are happy and lucky to have Albert Krewinkel in our team to further develop pandoc-scholar for the needs of the framework. All of his contributions will be open source. Public money, public code.↩︎
- In our team we prefer the unbranded and telemetry-free FLOSS version VSCodium. Check out the extensions Pandoc Citer and Markdown Preview Enhanced for writing conveniently with Markdown.↩︎