Software Citation for Humans: The Citation File Format

Image: This NeXT workstation was used by Tim Berners-Lee as the first Web server on the World Wide Web in 1991. It is shown here as displayed in 2005 at Microcosm, the public science museum at CERN. Creative Commons Attribution-Share Alike 3.0 Unported. Wikimedia Commons.

Software is an important research product: it embeds knowledge and should therefore be cited like any other scientific product. But how do you cite the software that you use in your research?

Until a more suitable system for software citation exists, we need to leverage the one in place for papers and books. But finding the metadata needed for citation is usually harder for software than for papers. Software authors must therefore provide it in a visible and accessible way.

Providing software citation metadata

Robin Wilson has suggested one way of doing this in a blog post: include a file containing the relevant information in the code repository. Name the file CITATION, by analogy with LICENSE and README, so that users can find it easily and cite the software correctly.

The next step is to make the CITATION file machine-readable by using a standard format, which opens it up for further use cases. Its contents can then easily be used:

  • in the software itself, e.g., in About dialogs, cite() commands, data output;
  • in repositories, publishing platforms, software directories, e.g., to display a formatted reference;
  • by indexers, publishers, reference managers, etc., for further processing.
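As an illustration of the first use case, the following Python sketch shows how a cite() command might turn citation metadata into a formatted reference. It is a minimal sketch under stated assumptions: the CITATION.cff file is assumed to be already parsed into a dictionary (for example by a YAML library), the key names follow the example file shown later in this post, and both `format_reference` and the reference style it produces are hypothetical rather than part of any CFF tooling.

```python
from datetime import date
from typing import Any


def format_reference(cff: dict[str, Any]) -> str:
    """Build a plain-text reference string from parsed CFF metadata.

    Hypothetical helper: assumes the CITATION.cff file has already been
    parsed into a dict that uses the key names from the example file.
    """
    # Render each author as "Family, Given" and join them with "; ".
    authors = "; ".join(
        f"{a['family-names']}, {a['given-names']}" for a in cff["authors"]
    )
    released = cff["date-released"]
    # YAML loaders may return a date object or a plain string.
    year = released.year if isinstance(released, date) else str(released)[:4]
    return (
        f"{authors} ({year}). {cff['title']} (version {cff['version']}). "
        f"https://doi.org/{cff['doi']}"
    )


if __name__ == "__main__":
    metadata = {
        "authors": [{"family-names": "Druskat", "given-names": "Stephan"}],
        "title": "My Research Tool",
        "version": "1.0.4",
        "doi": "10.5281/zenodo.1234",
        "date-released": date(2017, 12, 18),
    }
    # Druskat, Stephan (2017). My Research Tool (version 1.0.4). https://doi.org/10.5281/zenodo.1234
    print(format_reference(metadata))
```

A real tool would also need to handle missing keys and offer other reference styles; the sketch only covers the case where all keys are present.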

But as one of the most important use cases for CITATION files is still discovery by users, it is necessary to preserve human-readability.

Using the correct software citation metadata

To unlock the full potential of software citation, including its role in enabling research reproducibility, the citation metadata must adhere to a set of principles: the software citation principles (Smith et al. 2016). According to these, the metadata available for a piece of software must at least include:

  • A unique identifier
  • The software name
  • The author(s)
  • The version number
  • The release date
  • The location or repository of the software

The Citation File Format

Following a lightning talk (Druskat 2017), the prerequisites of a useful format for CITATION files were discussed at the WSSSPE5.1 workshop, and the results were published in a blog post. The post suggests that such a format must fulfill at least two criteria:

  1. human- and machine-readability;
  2. adherence to the software citation principles.

The Citation File Format (CFF) fulfills these criteria. It is implemented as a key-value map in YAML, a “human friendly data serialization standard for all programming languages”. It supports the software citation principles by requiring central keys (see above) and by allowing for the fallback solutions also defined in Smith et al. (2016). Additionally, it is self-explanatory: a free-text message field states the purpose of the metadata and gives directions for its use.

A simple valid CITATION.cff file looks something like this:

cff-version: 1.0.3
message: "If you use this software, cite it with below metadata."
authors:
  - family-names: Druskat
    given-names: Stephan
title: My Research Tool
version: 1.0.4
doi: 10.5281/zenodo.1234
date-released: 2017-12-18
repository-code: https://github.com/sdruskat/my-research-tool
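To make the link between the software citation principles and this file concrete, the Python sketch below checks whether each of the six minimum elements listed earlier has a value. The mapping from principle elements to CFF keys (e.g., treating `doi` as the unique identifier) is an assumption made for illustration, not the official CFF schema or its set of required keys, and `missing_elements` is a hypothetical helper operating on an already-parsed file.

```python
from typing import Any

# Hypothetical mapping from the minimum metadata named by the software
# citation principles to the CFF keys used in the example file. This is an
# illustration only, not the official CFF schema or its required keys.
PRINCIPLE_KEYS = {
    "unique identifier": "doi",
    "software name": "title",
    "author(s)": "authors",
    "version number": "version",
    "release date": "date-released",
    "location or repository": "repository-code",
}


def missing_elements(cff: dict[str, Any]) -> list[str]:
    """Return the principle elements that have no non-empty value in `cff`."""
    return [
        element
        for element, key in PRINCIPLE_KEYS.items()
        if not cff.get(key)  # absent, None, empty string or empty list
    ]


if __name__ == "__main__":
    example = {
        "cff-version": "1.0.3",
        "message": "If you use this software, cite it with below metadata.",
        "authors": [{"family-names": "Druskat", "given-names": "Stephan"}],
        "title": "My Research Tool",
        "version": "1.0.4",
        "doi": "10.5281/zenodo.1234",
        "date-released": "2017-12-18",
        "repository-code": "https://github.com/sdruskat/my-research-tool",
    }
    print(missing_elements(example))  # [] for the complete example
```

In practice, the CFF validation schemas are the authoritative check; this sketch also ignores the fallback solutions defined in Smith et al. (2016).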

The Citation File Format also provides keys for other metadata, such as license, different repository types, keywords, contact information, etc. Additionally, it allows the specification of secondary references which users should also cite under specific circumstances. These references can be scoped to define when they should be cited.

message: If you use My Research Tool (MRT), please cite the software AND the outline paper.
    ...
references:
  - type: article
    scope: Cite this paper to reference the general concepts of the tool.
    authors:
      - family-names: Maus
        given-names: Ketty
    title: "My Research Tool: A 100% accuracy syntax parser for all languages"
    year: 2099
    journal: Journal of Hard Science Fiction
    volume: 42
    issue: "13"
    doi: 10.9999/hardscifi-lang.42132
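A tool that reads such a file could present the secondary references together with their scopes, so that users can decide which ones apply to their work. The Python sketch below assumes the file has already been parsed into a dictionary; `scoped_references` and its output format are hypothetical.

```python
from typing import Any


def scoped_references(cff: dict[str, Any]) -> list[str]:
    """Describe each secondary reference in a parsed CITATION.cff dict.

    Hypothetical helper: `references` and each entry's `scope` are treated
    as optional, so missing values fall back to placeholder text.
    """
    lines = []
    for ref in cff.get("references", []):
        kind = ref.get("type", "generic")
        title = ref.get("title", "(untitled)")
        scope = ref.get("scope", "no scope given")
        lines.append(f"[{kind}] {title} (scope: {scope})")
    return lines


if __name__ == "__main__":
    example = {
        "message": "If you use My Research Tool (MRT), please cite the "
        "software AND the outline paper.",
        "references": [
            {
                "type": "article",
                "scope": "Cite this paper to reference the general concepts "
                "of the tool.",
                "title": "My Research Tool: A 100% accuracy syntax parser "
                "for all languages",
            }
        ],
    }
    print(example["message"])
    for line in scoped_references(example):
        print(line)
```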

The Citation File Format as the first step in the software citation workflow

Given its human-friendly, yet machine-actionable, properties, the Citation File Format represents a suitable entry point to the software citation workflow, and depositing a CITATION.cff file in a repository should be the first step.

Further downstream in the software citation workflow, the metadata will have to be exchanged between different actors without human intervention, e.g., when it is harvested from long-term storage or publication platforms by indexing services. At this point, automated linking and resolving capabilities become more important than human-readability or fine-grained recording of secondary references. For these purposes, the metadata should be converted into a linked data exchange format: CodeMeta JSON-LD.

The Citation File Format is compatible with CodeMeta. To facilitate smooth transfer between the two formats, the Citation File Format community is developing a number of software tools, among them a converter that transforms CITATION.cff files into CodeMeta JSON-LD. These tools can be used, for example, in build and release processes to make sure that the software citation metadata is preserved and reusable across the complete workflow.
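The following Python sketch shows the general idea of such a conversion for a handful of keys. It is not the community converter: the crosswalk (e.g., `title` to `name`, `doi` to `identifier`, `repository-code` to `codeRepository`) is a simplified assumption, `cff_to_codemeta` is a hypothetical name, and the input is assumed to be an already-parsed CITATION.cff file.

```python
import json
from datetime import date
from typing import Any

# CodeMeta 2.0 JSON-LD context.
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"


def cff_to_codemeta(cff: dict[str, Any]) -> dict[str, Any]:
    """Map a few keys of a parsed CITATION.cff dict to CodeMeta terms.

    Hypothetical, simplified crosswalk; real converters cover many more
    keys and edge cases.
    """
    codemeta: dict[str, Any] = {
        "@context": CODEMETA_CONTEXT,
        "@type": "SoftwareSourceCode",
        "name": cff["title"],
        "version": str(cff["version"]),
        "author": [
            {
                "@type": "Person",
                "givenName": a["given-names"],
                "familyName": a["family-names"],
            }
            for a in cff["authors"]
        ],
    }
    # Optional keys: only copy them across when present.
    if "doi" in cff:
        codemeta["identifier"] = f"https://doi.org/{cff['doi']}"
    if "date-released" in cff:
        # str() of a date object yields ISO 8601, e.g. "2017-12-18".
        codemeta["datePublished"] = str(cff["date-released"])
    if "repository-code" in cff:
        codemeta["codeRepository"] = cff["repository-code"]
    return codemeta


if __name__ == "__main__":
    example = {
        "authors": [{"family-names": "Druskat", "given-names": "Stephan"}],
        "title": "My Research Tool",
        "version": "1.0.4",
        "doi": "10.5281/zenodo.1234",
        "date-released": date(2017, 12, 18),
        "repository-code": "https://github.com/sdruskat/my-research-tool",
    }
    print(json.dumps(cff_to_codemeta(example), indent=2))
```

Keeping the conversion in a small, pure function like this makes it easy to call from a build or release script.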

Development and outlook

The Citation File Format is supported by different projects. For example, it is one of the source formats for CiteAs.org, a citation resolver for different metadata formats, and the Netherlands eScience Center has adopted it for its software projects. Its integration into the wider software citation workflow is also being worked on by the FORCE11 Software Citation Implementation Working Group.

The current version of the Citation File Format, the “Core Module”, is centred on providing the necessities for software citation. Further modules are planned for the future to support a wider range of software metadata, e.g., for data requirements and transitive credit.

To support developers, users, and integration efforts, and to foster uptake, the community has developed, and continues to develop, software tools for CFF. These include validation schemas, a reader library, a multi-format converter, a DOI resolver and a GitHub/GitLab resolver, and Ruby tooling. A web application is currently in development, based on work started during the SSI Collaborations Workshop 2018 Hack Day. And on 5 September, the first ever Citation File Format Hack Day will take place in Birmingham, co-located with the 3rd Conference of Research Software Engineers.

The Citation File Format itself, as well as all tooling, is maintained openly on GitHub and welcomes contributions!

References

Druskat, Stephan. 2017. ‘Track 2 Lightning Talk: Should CITATION Files Be Standardized?’ In Proceedings of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1), edited by Neil Chue Hong, Stephan Druskat, Robert Haines, Caroline Jay, Daniel S. Katz, and Shoaib Sufi. figshare. https://doi.org/10.6084/m9.figshare.3827058

Smith, Arfon M., Daniel S. Katz, Kyle E. Niemeyer, and FORCE11 Software Citation Working Group. 2016. ‘Software Citation Principles’. PeerJ Computer Science 2 (e86). https://doi.org/10.7717/peerj-cs.86

Disclosure

Stephan Druskat is a founder and the project lead of the Citation File Format project.


DOI: 10.25815/p3dh-hz85

Citation format: The Chicago Manual of Style, 17th Edition

Druskat, Stephan. ‘Software Citation for Humans: The Citation File Format’. Leibniz Research Alliance Science 2.0, 2018. https://doi.org/10.25815/P3DH-HZ85.


Stephan Druskat

Posted by Stephan Druskat

Stephan is a Research Software Engineer, working in Linguistics and Digital Humanities. He currently works as a researcher in the linguistic research project MelaTAMP in the Department of German Studies and Linguistics at Humboldt-Universität zu Berlin, where he creates software that enables contrastive research on the TAMP (tense, aspect, modality, polarity) systems of seven Oceanic languages.
