Open Science #Barcamp: Software Citation

Image: Barcamp Open Science, Berlin, March 2018. Photo credits: Bettina Ausserhofer. All photos are also published on Wikimedia Commons under the CC BY 4.0 license.

A report from the barcamp session on software citation at Barcamp Open Science, Berlin, March 2018.

The Barcamp Open Science, organized by the Leibniz Research Alliance Science 2.0 and hosted by Wikimedia Deutschland, was designed as a pre-event to the two-day Open Science Conference. The Barcamp offers a space for discussion, for developing new ideas, and for exchanging knowledge on experiences and best practices in Open Science among researchers and practitioners from various backgrounds, with an emphasis on bringing together novices and experts.

Research software is starting to be recognized as a valid research product and tool that calls for independent and proper attribution, and for credit within the academic merit system. Still, there is no consistent procedure, let alone a standard, for software citation that allows research software to be managed in line with the FAIR principles for research data. The question underlying the discourse is threefold: why to cite software, what exactly to cite, and how to cite it.

Work clarifying the first two aspects can be found in the FORCE11 Software Citation Working Group, which started in early 2015 and completed its work in late 2016 by publishing the Software Citation Principles (Smith AM et al. 2016). Separately, the Research Software Working Group within the Alliance of German Science Organisations was established in 2016. Its work resulted in the publication of the Recommendations on the Development, Use and Provision of Research Software (Katerbow & Feulner 2018). The most important software types covered by these recommendations are self-developed research software, software applications for research, and infrastructure software and services. After the FORCE11 Software Citation Working Group concluded, a follow-on group, the Software Citation Implementation Working Group, was set up to implement the Software Citation Principles by developing guidelines and testing implementation scenarios. It thereby tackles the question of how to cite software by facilitating the development of consistent procedures, tools, and suitable workflows.

The barcamp session on software citation was proposed to emphasize the importance of the topic in general and to introduce current developments in software citation. The questions under discussion were based on the Software Citation Principles, but with a stronger focus on the pragmatic implementation of software citation from a developer and user perspective. In addition, some existing tools were presented and discussed. The session was attended by around fifteen people affiliated with various research projects and institutions, such as Fraunhofer FOKUS, TIB, Core Unit Systemmedizin, Bielefeld University Library, Humboldt University Berlin, and others.

When asked about the role of software in their research, all participants reported that they often come into contact with software, either as users or as developers themselves. On the question of why to cite software, the overall consensus was that software citation improves research practice, integrity, and reproducibility on the one hand, and gives developers better credit for their work on the other, even if they do not publish a paper or article alongside the software. This coincides with the findings of the respective working groups of FORCE11 and of the Alliance of German Science Organisations.

The question of what exactly to cite, by contrast, led to more discussion. First, it depends on which software, software package, language, library, or other component is used in the research process and how the findings depend on it. Determining what to cite is difficult given the many dependencies inherent in software, which also raises the issue of transitive credit: how to attribute micro-contributions over time. Transitive credit, along with a first step towards a solution, has been addressed previously (Katz D.S. 2014; Katz D.S. & Smith A.M. 2015).
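To make the idea of transitive credit more concrete, here is a minimal Python sketch, assuming the notion of a credit map from Katz (2014): each product assigns fractional weights to its direct contributors and to the products it depends on, the weights sum to 1, and credit passed to a dependency is redistributed through that dependency's own map. The function transitive_credit, the product names, and the weights are all hypothetical, and the sketch assumes the dependency graph contains no cycles.

```python
from fractions import Fraction
from typing import Dict

# A credit map assigns fractional weights to a product's direct contributors
# (people) and to the products it depends on. Assumptions: the weights in each
# map sum to 1, and the dependency graph has no cycles.
CreditMap = Dict[str, Fraction]


def transitive_credit(product: str, maps: Dict[str, CreditMap]) -> Dict[str, Fraction]:
    """Return the share of `product`'s credit that ends up with each person.

    Names that are themselves keys of `maps` are treated as dependencies:
    their share is passed on through that product's own credit map.
    All other names are treated as people who receive credit directly.
    """
    credit_map = maps[product]
    if sum(credit_map.values()) != 1:
        raise ValueError(f"weights for {product!r} must sum to 1")
    totals: Dict[str, Fraction] = {}
    for name, weight in credit_map.items():
        if name in maps:
            # Dependency: scale every share it distributes by its weight here.
            for person, share in transitive_credit(name, maps).items():
                totals[person] = totals.get(person, Fraction(0)) + weight * share
        else:
            # Person: receives this weight directly.
            totals[name] = totals.get(name, Fraction(0)) + weight
    return totals


# Hypothetical example: a paper, the solver library it uses,
# and that library's own I/O dependency.
EXAMPLE_MAPS: Dict[str, CreditMap] = {
    "analysis-paper": {"alice": Fraction(1, 2), "solver-lib": Fraction(1, 2)},
    "solver-lib": {"bob": Fraction(3, 4), "io-lib": Fraction(1, 4)},
    "io-lib": {"carol": Fraction(1)},
}

if __name__ == "__main__":
    for person, share in transitive_credit("analysis-paper", EXAMPLE_MAPS).items():
        print(person, share)
```

In this made-up example, alice receives 1/2 of the paper's credit directly, bob receives 1/2 × 3/4 = 3/8 through the solver library, and carol receives 1/2 × 1/4 × 1 = 1/8 through the I/O library, so a micro-contribution two levels down still receives a traceable share. Exact fractions are used so that the shares provably add up to 1.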
Participants did, however, agree that it is most appropriate to cite the software itself rather than an associated article describing it. This raises the question of how to describe software, and specifically which metadata fields should be mandatory. In the session, the fields that prompted the most discussion were author, title, version information, and persistent identifier, as these were perceived as the minimal prerequisites for describing a piece of software. Overlapping with transitive credit, authorship was also discussed: whether contributors, testers, and managers alike should be credited is an issue that needs to be addressed further. Regarding persistent identifiers, some questions arose that could not be fully answered within the scope of the session: is a repository URL or a commit hash a sufficient identifier, and should an identifier be assigned to every release or to every commit?
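As a rough illustration of the four fields considered a minimal prerequisite, the following Python sketch defines a hypothetical SoftwareRecord type with author, title, version, and persistent identifier, checks that none of them is empty, and formats a plain citation string. The class name, the citation layout, and the example values are assumptions made for illustration; the sketch deliberately does not decide whether a repository URL or commit hash would count as a sufficient identifier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SoftwareRecord:
    """Hypothetical minimal software description with the four fields discussed."""

    authors: List[str]
    title: str
    version: str
    # Persistent identifier, e.g. a DOI. Whether a repository URL or a commit
    # hash would be sufficient here was left open in the session.
    identifier: str

    def missing_fields(self) -> List[str]:
        """Return the names of mandatory fields that are empty."""
        missing = [] if self.authors else ["authors"]
        for name in ("title", "version", "identifier"):
            if not getattr(self, name).strip():
                missing.append(name)
        return missing

    def citation(self) -> str:
        """Build a plain citation string; this layout is illustrative, not a standard."""
        missing = self.missing_fields()
        if missing:
            raise ValueError("missing mandatory fields: " + ", ".join(missing))
        return f"{'; '.join(self.authors)}. {self.title} (version {self.version}). {self.identifier}"
```

For example, SoftwareRecord(["Doe, Jane"], "example-tool", "1.2.0", "https://doi.org/10.1234/example-tool").citation() returns "Doe, Jane. example-tool (version 1.2.0). https://doi.org/10.1234/example-tool", where the author, tool name, and DOI are placeholders. Refusing to produce a citation when a field is missing mirrors the idea that these four fields are the minimum needed to identify a software entity.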

As for how to cite software, there have already been efforts to develop services that support researchers in citing software. During the session, Stephan Druskat introduced the Citation File Format in the context of disseminating software metadata. It offers a practical, user-centric way to store software metadata in a so-called CITATION file in the code's root directory. Because the file includes the mandatory fields, developers can pass their preferred way of citing the software directly on to users. Another service that was discussed more extensively was the coupling of GitHub with Zenodo. The integration the two services have built allows software developed on GitHub to be archived at the CERN-based Zenodo and to receive a DOI minted by Zenodo, which makes the repository citable. Having a minted DOI and providing a CITATION file is a sufficient way of citing software, independent of its specifics.

In addition, certain communities are working on making their specific language, software, or libraries citable. Most prominently, the R community provides a citation function for R and R packages, though this is not aligned with the Software Citation Principles, since the function only offers citation information for a manual rather than for the software itself. For Python, DueCredit offers a framework that facilitates citing Python and Python packages. From a user perspective, it may not be feasible to harvest citation information yourself, especially if several software entities are involved in the research process and their metadata are disseminated through different channels and formats. The CiteAs platform offers a service that performs web-based searches, starting from a URL, DOI, or arXiv ID, to create a citation string together with information on the provenance of the citation. The aim of CiteAs is to provide citation information for all kinds of research products.
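To illustrate how a CITATION file in the Citation File Format can carry a developer's preferred citation, the following Python sketch renders a minimal CITATION.cff file using only the standard library. It assumes the field names of CFF version 1.2.0 (cff-version, message, title, authors with family-names and given-names, version, doi), a version newer than the one available at the time of the barcamp. The function name citation_cff and all example values are hypothetical. The doi field is where a DOI minted by Zenodo for a GitHub release could be recorded.

```python
import json
from typing import List, Tuple


def _yaml_str(value: str) -> str:
    # For ordinary text, a JSON string literal is also a valid YAML
    # double-quoted scalar, so json.dumps gives safe quoting without a YAML library.
    return json.dumps(value, ensure_ascii=False)


def citation_cff(title: str, version: str, doi: str,
                 authors: List[Tuple[str, str]],
                 message: str = "If you use this software, please cite it as below.") -> str:
    """Render a minimal CITATION.cff (field names as in CFF 1.2.0).

    `authors` is a list of (family name, given name) pairs; `doi` is the bare
    DOI without the https://doi.org/ prefix.
    """
    lines = [
        "cff-version: 1.2.0",
        f"message: {_yaml_str(message)}",
        f"title: {_yaml_str(title)}",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f"  - family-names: {_yaml_str(family)}")
        lines.append(f"    given-names: {_yaml_str(given)}")
    lines.append(f"version: {_yaml_str(version)}")
    lines.append(f"doi: {_yaml_str(doi)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values; a real DOI would come from, e.g., a Zenodo release.
    print(citation_cff("example-tool", "1.2.0", "10.1234/example-tool", [("Doe", "Jane")]), end="")
```

Writing the result to a file named CITATION.cff in the repository root, for instance with pathlib.Path("CITATION.cff").write_text(...), would make it available to anyone who obtains the code. Building the YAML with plain string formatting rather than a YAML library is a simplification to keep the sketch dependency-free.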

The community is still working on a standard metadata schema, but for now the existing services provide first points of access and an improvement in software citation. So go on: cite the software you use and make your software citable. And if you are still not convinced, visit research-software.org and shouldacite.

References

Smith, A.M., Katz, D.S., Niemeyer, K.E., & FORCE11 Software Citation Working Group. (2016). Software Citation Principles. PeerJ Computer Science 2: e86. https://doi.org/10.7717/peerj-cs.86

Katerbow, M., & Feulner, G. (2018, March 16). Recommendations on the Development, Use and Provision of Research Software. Zenodo. http://doi.org/10.5281/zenodo.1172988

Katz, D.S. (2014). Transitive Credit as a Means to Address Social and Technological Concerns Stemming from Citation and Attribution of Digital Products. Journal of Open Research Software 2(1): e20. http://doi.org/10.5334/jors.be

Katz, D.S., & Smith, A.M. (2015). Transitive Credit and JSON-LD. Journal of Open Research Software 3(1): e7. http://doi.org/10.5334/jors.by


Podcast interview with Sophia Dörner from the barcamp session. Thanks to Open Science Radio.


DOI: 10.25815/2F2X-NS46

Citation format: The Chicago Manual of Style, 17th Edition

Dörner, Sophia. ‘Open Science #Barcamp: Software Citation’, 2018. https://doi.org/10.25815/2F2X-NS46.


Sophia Dörner

Posted by Sophia Dörner

Sophia is pursuing a bachelor's degree in library and information science, with a second degree in musicology. She currently works as a student assistant in the IT Department of the Cluster of Excellence Image Knowledge Gestaltung at Humboldt-Universität zu Berlin.
