Software is an important research product. It embeds knowledge. Thus, it should be cited like all other scientific products. But how do you cite the software that you use in your research?
Until a more suitable system for software citation exists, we need to leverage the one in place for papers and books. But finding the metadata needed for citation is usually harder for software than for papers. Therefore authors must provide it visibly and accessibly.
Providing software citation metadata
Robin Wilson has suggested one way of doing this in a blog post: Include a file containing the relevant information in the code repository. Name the file CITATION — in analogy to LICENSE and README — so that users can find it easily, and cite the software correctly.
The next step is to make the CITATION file machine-readable and, by using a standard format, to open it up for further use cases. This way, its contents can easily be used:
- in the software itself, e.g., in About dialogs, cite() commands, data output;
- in repositories, publishing platforms, software directories, e.g., to display a formatted reference;
- by indexers, publishers, reference managers, etc., for further processing.
But as one of the most important use cases for CITATION files is still discovery by users, it is necessary to preserve human-readability.
Using the correct software citation metadata
To unlock the full potential of software citation, including its role in enabling research reproducibility, the citation metadata must adhere to the software citation principles (Smith et al. 2016). According to these, the metadata available for a piece of software must include at least:
- A unique identifier
- The software name
- The author(s)
- The version number
- The release date
- The location or repository of the software
The Citation File Format
Following a lightning talk (Druskat 2017), the prerequisites of a useful format for CITATION files were discussed at the WSSSPE5.1 workshop, and the results were published in a blog post. The post suggests that such a format must fulfill at least two criteria:
- human- and machine-readability;
- adherence to the software citation principles.
The Citation File Format (CFF) fulfills these criteria. It is implemented as a key-value map in YAML, a “human friendly data serialization standard for all programming languages”. It supports the software citation principles by requiring the central keys (see above) and by allowing the fallback solutions also defined in Smith et al. (2016). Additionally, it is self-explanatory: a free-text message field states the purpose of the metadata and gives directions for its use.
A simple valid CITATION.cff file looks something like this:
    cff-version: 1.0.3
    message: "If you use this software, cite it with below metadata."
    authors:
      - family-names: Druskat
        given-names: Stephan
    title: My Research Tool
    version: 1.0.4
    doi: 10.5281/zenodo.1234
    date-released: 2017-12-18
    repository-code: https://github.com/sdruskat/my-research-tool
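To illustrate how such a file can serve both the software citation principles and the "formatted reference" use case, here is a minimal Python sketch. It assumes the file has already been parsed into a dictionary by a YAML library of your choice (parsing is left out). The mapping in `PRINCIPLE_KEYS`, the function names, and the reference style are assumptions made for this example and are not part of the Citation File Format itself.

```python
# The example file from above, written as the dictionary a YAML parser
# might return (the date is kept as a plain string in this sketch).
citation = {
    "cff-version": "1.0.3",
    "message": "If you use this software, cite it with below metadata.",
    "authors": [{"family-names": "Druskat", "given-names": "Stephan"}],
    "title": "My Research Tool",
    "version": "1.0.4",
    "doi": "10.5281/zenodo.1234",
    "date-released": "2017-12-18",
    "repository-code": "https://github.com/sdruskat/my-research-tool",
}

# Assumed mapping from the metadata items listed in the software citation
# principles to the keys used in the example file.
PRINCIPLE_KEYS = {
    "unique identifier": "doi",
    "software name": "title",
    "author(s)": "authors",
    "version number": "version",
    "release date": "date-released",
    "location or repository": "repository-code",
}


def missing_items(metadata):
    """Return the principle items for which the metadata has no value."""
    return [item for item, key in PRINCIPLE_KEYS.items() if not metadata.get(key)]


def format_reference(metadata):
    """Build a simple human-readable reference (ad hoc style, not a standard)."""
    authors = "; ".join(
        f"{a['family-names']}, {a['given-names']}" for a in metadata["authors"]
    )
    year = str(metadata["date-released"])[:4]  # first four characters: the year
    return (
        f"{authors} ({year}). {metadata['title']} "
        f"(version {metadata['version']}). https://doi.org/{metadata['doi']}"
    )


print(missing_items(citation))
print(format_reference(citation))
```

A function like `format_reference` could, for instance, back an About dialog or a cite() command, while `missing_items` could run as a check before a release.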
The Citation File Format also provides keys for other metadata, such as license, different repository types, keywords, contact information, etc. Additionally, it allows the specification of secondary references which users should also cite under specific circumstances. These references can be scoped to define when they should be cited.
    message: If you use My Research Tool (MRT), please cite the software AND the outline paper.
    ...
    references:
      - type: article
        scope: Cite this paper to reference the general concepts of the tool.
        authors:
          - family-names: Maus
            given-names: Ketty
        title: "My Research Tool: A 100% accuracy syntax parser for all languages"
        year: 2099
        journal: Journal of Hard Science Fiction
        volume: 42
        issue: "13"
        doi: 10.9999/hardscifi-lang.42132
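As a rough illustration of how a tool might present scoped references to users, the following Python sketch walks the `references` list and pairs each scope with a short reference string. It again assumes the file has been parsed into a dictionary beforehand. The function name `citation_hints`, the fallback text for references without a scope, and the short reference style are hypothetical choices for this example.

```python
# The example from above as a parsed dictionary; the keys elided by "..."
# in the example are left out here as well.
citation = {
    "message": (
        "If you use My Research Tool (MRT), please cite the software "
        "AND the outline paper."
    ),
    "references": [
        {
            "type": "article",
            "scope": "Cite this paper to reference the general concepts of the tool.",
            "authors": [{"family-names": "Maus", "given-names": "Ketty"}],
            "title": "My Research Tool: A 100% accuracy syntax parser for all languages",
            "year": 2099,
            "journal": "Journal of Hard Science Fiction",
            "volume": 42,
            "issue": "13",
            "doi": "10.9999/hardscifi-lang.42132",
        }
    ],
}


def citation_hints(metadata):
    """Yield (scope, short reference) pairs for each secondary reference."""
    for ref in metadata.get("references", []):
        names = ", ".join(a["family-names"] for a in ref.get("authors", []))
        short = f"{names} ({ref.get('year', 'n.d.')}). {ref['title']}."
        # Assumption of this sketch: a reference without a scope always applies.
        yield ref.get("scope", "Always cite this reference."), short


print(citation["message"])
for scope, short in citation_hints(citation):
    print(f"{scope}\n  -> {short}")
```

Keeping the scope as free text preserves human-readability; a tool can simply display it next to the reference rather than trying to interpret it.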
The Citation File Format as the first step in the software citation workflow
Given its human-friendly, yet machine-actionable, properties, the Citation File Format represents a suitable entry point to the software citation workflow, and depositing a CITATION.cff file in a repository should be the first step.
Further downstream in the software citation workflow, the metadata will have to be exchanged between different actors without human intervention, e.g., when it is harvested from long-term storage or publication platforms by indexing services. At this point, automated linking and resolving capabilities become more important than human-readability or fine-grained recording of secondary references. For these purposes, the metadata should be converted into a linked data exchange format: CodeMeta JSON-LD.
The Citation File Format is compatible with CodeMeta. To facilitate smooth transfer between the two formats, the Citation File Format community develops a number of software tools, among them a converter that transforms CITATION.cff files into CodeMeta JSON-LD. These tools can be used, for example in build and release processes, to make sure that the software citation metadata is preserved and reusable across the complete workflow.
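To give an idea of what such a conversion involves, here is a small Python sketch that maps a handful of keys from the example file to CodeMeta terms. It is not the community's converter: the function `cff_to_codemeta` is hypothetical, the choice of the CodeMeta 2.0 context and the selection of mapped keys are assumptions of this sketch, and the input is assumed to be an already parsed dictionary.

```python
import json

# Assumed JSON-LD context (CodeMeta 2.0).
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"


def cff_to_codemeta(cff):
    """Map a small subset of CFF keys to CodeMeta terms (illustrative only)."""
    codemeta = {
        "@context": CODEMETA_CONTEXT,
        "@type": "SoftwareSourceCode",
        "name": cff["title"],
        "version": cff["version"],
        "datePublished": str(cff["date-released"]),
        "author": [
            {
                "@type": "Person",
                "givenName": a["given-names"],
                "familyName": a["family-names"],
            }
            for a in cff["authors"]
        ],
    }
    # Optional keys are only carried over when present in the source.
    if "doi" in cff:
        codemeta["identifier"] = f"https://doi.org/{cff['doi']}"
    if "repository-code" in cff:
        codemeta["codeRepository"] = cff["repository-code"]
    return codemeta


cff = {
    "cff-version": "1.0.3",
    "authors": [{"family-names": "Druskat", "given-names": "Stephan"}],
    "title": "My Research Tool",
    "version": "1.0.4",
    "doi": "10.5281/zenodo.1234",
    "date-released": "2017-12-18",
    "repository-code": "https://github.com/sdruskat/my-research-tool",
}

print(json.dumps(cff_to_codemeta(cff), indent=2))
```

The human-oriented `message` key is deliberately dropped in this sketch, reflecting the shift from human-readability to automated linking further downstream.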
Development and outlook
The Citation File Format is supported by different projects. It is, for example, one of the source formats for CiteAs.org, a citation resolver for different metadata formats, and the Netherlands eScience Center has adopted it for its software projects. Its integration into the wider software citation workflow is also being worked on within the FORCE11 Software Citation Implementation WG.
The current version of the Citation File Format — the “Core Module” — is centred around providing the necessities for software citation. Development of further modules is planned for the future in order to support a wider range of software metadata for, e.g., data requirements and transitive credit.
To support developers, users, and integration efforts, and to foster uptake, software tools for CFF have been, and continue to be, developed by the community. They include validation schemas, a reader library, a multi-format converter, a DOI and a GitHub/GitLab resolver, and Ruby tooling. A web application is currently in development, based on work started during the SSI Collaborations Workshop 2018 Hack Day. On 5 September, the first ever Citation File Format Hack Day will take place in Birmingham, co-located with the 3rd Conference of Research Software Engineers.
The Citation File Format itself, as well as all tooling, is maintained openly on GitHub and welcomes contributions!
Druskat, Stephan. 2017. ‘Track 2 Lightning Talk: Should CITATION Files Be Standardized?’ In Proceedings of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1), edited by Neil Chue Hong, Stephan Druskat, Robert Haines, Caroline Jay, Daniel S. Katz, and Shoaib Sufi. figshare. https://doi.org/10.6084/m9.figshare.3827058
Smith, Arfon M., Daniel S. Katz, Kyle E. Niemeyer, and FORCE11 Software Citation Working Group. 2016. ‘Software Citation Principles’. PeerJ Computer Science 2 (e86). https://doi.org/10.7717/peerj-cs.86
Stephan Druskat is a founder and the project lead of the Citation File Format project.
Citation format: The Chicago Manual of Style, 17th Edition
Druskat, Stephan. ‘Software Citation for Humans: The Citation File Format’. Leibniz Research Alliance Science 2.0, 2018. https://doi.org/10.25815/P3DH-HZ85.