Replicability requires that the context, the procedures and the data of a scientific experiment be disclosed in sufficient detail for a third party to repeat that experiment in order to confirm or contest its findings.
There are a number of threats to replicability. Some of them are technical, some social. Among the technical threats, we can count:
- Insufficient information about procedures (e.g. missing information about temperature) will prevent precise replication.
- Withholding of raw data will prevent statistical analysis by peers.
- Withholding of computer code will prevent peers from checking whether discrepancies in findings might be related to bugs.
- Absence of explanatory text will not prevent replication in the strict sense, but it will make it hard for third parties to understand why a particular setup was chosen in the first place.
The social threats are more subtle. If an experiment is very nicely motivated, has a good description, and all raw data and computational procedures are available, but only behind a paywall, this will bar some groups from trying to replicate the findings. The publication ecosystem is thus also a component which influences replicability. A publication ecosystem which is built on making access to information scarce for financial gain (“reader-pays”, the traditional subscription model) is in itself an enemy of replicability.
Open Access, Open Data, and more broadly Open Science aim to overcome these legacy publication systems, whose core business model is the restriction of access. But which newer setups work, and which ones don’t? We are thus back to a new empirical question, this time of an entrepreneurial nature: how to
In a way, a publishing platform compares quite well with an experiment: you have a context (your subfield), some procedures (workflows, toolchains), and computer code/software. All this together yields (business) data. The logical step is to make these available for replication. This is what has been done in the OpenAire project “Full disclosure: replicable strategies for book publications supplemented with empirical data”, which was run by Language Science Press (LangSci). LangSci released the following items:
- their business model from 2015
- their business data from 2017 including expenditures, sales, downloads etc.
- a spreadsheet to calculate earnings and expenditures based on 100 variables such as cost of labour, length of books, time spent on typesetting a page, setup costs for print-on-demand, etc.
- a cookbook with best practices, lessons learned, and other insights gained in the course of the project since 2014
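To give a feel for what such a calculation involves, here is a minimal sketch in Python. It is not the LangSci spreadsheet: it uses only the four variables named above (cost of labour, length of the book, typesetting time per page, print-on-demand setup cost), and every field name and number is a hypothetical placeholder rather than released business data.

```python
from dataclasses import dataclass


@dataclass
class BookCostInputs:
    """Input variables for one book.

    All names and default values are hypothetical placeholders,
    not figures taken from the Language Science Press spreadsheet.
    """
    pages: int = 300                            # length of the book
    typesetting_minutes_per_page: float = 10.0  # time spent typesetting a page
    labour_cost_per_hour: float = 30.0          # cost of labour
    pod_setup_cost: float = 50.0                # print-on-demand setup cost


def typesetting_cost(inputs: BookCostInputs) -> float:
    """Labour cost of typesetting the whole book."""
    hours = inputs.pages * inputs.typesetting_minutes_per_page / 60
    return hours * inputs.labour_cost_per_hour


def total_cost(inputs: BookCostInputs) -> float:
    """Typesetting labour plus the one-off print-on-demand setup cost."""
    return typesetting_cost(inputs) + inputs.pod_setup_cost


def balance(inputs: BookCostInputs, revenue: float) -> float:
    """Revenue minus expenditures for one book (negative means a deficit)."""
    return revenue - total_cost(inputs)


# Example with the placeholder defaults and an assumed revenue figure.
default_book = BookCostInputs()
print(total_cost(default_book))
print(balance(default_book, revenue=2000.0))
```

Changing a single input, for instance the typesetting time per page, and recomputing the balance is the kind of adjustment the released spreadsheet makes possible across its full set of variables.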
The business model from 2015 analyses the publishing landscape and identifies four target groups:
- libraries, and
- research institutions.
These are matched with five revenue streams:
- print copies,
- author fees,
- institutional memberships,
- individual memberships, and
The 2015 business model contains some projections about the earnings to expect from each of those sources. The document released in 2018 contains annotations and evaluations of these projections. Basically, of the four streams, only institutional memberships met the expectations; the performance of the other revenue streams was way below par. We thus have an elaborate theoretical model, which made predictions, and we have the business data to evaluate those predictions. We furthermore have the computational tools to adjust the model: the spreadsheet. The context is given by the “cookbook”, which contains the “softer” environment variables, like community building, prestige, or dissemination strategies.
There is no need to replicate Language Science Press in the strict sense, since one community-run publisher per subfield should do. But the model, procedures, data, and tools could be replicated for other fields, from Archaeology to Zoology. Obviously, the context will be different, so we cannot expect identical results. But if those other projects also release their data, we will have a growing pool of empirical information about how to advance publication models which, unlike the legacy models, do not stand in the way of replicability: replicable publication models to ensure replicability in science.
Nordhoff, Sebastian. Cookbook for Open Access Books. Berlin: Language Science Press, 2018.
Language Science Press, ed. ‘Full Disclosure: Replicable Strategies for Book Publications Supplemented with Empirical Data’. OpenAire project, 11 June 2018. https://github.com/langsci/opendata.
Neylon, Cameron. ‘Principles for Open Scholarly Infrastructures’. Science in the Open (blog), 23 February 2015. http://cameronneylon.net/blog/principles-for-open-scholarly-infrastructures/.
Nordhoff, Sebastian. Language Science Press Business Model. Berlin: Language Science Press, 2018. https://doi.org/10.5281/zenodo.1286972.
Language Science Press, ed. ‘Business Data 2017: Language Science Press’. 2018. Reprint, Language Science Press, 12 July 2018. https://github.com/langsci/opendata.
Language Science Press. ‘Costs and Revenue of a Community-Based Publisher for 5 Years of Language Science Press’. 2018. Reprint, Language Science Press, 12 July 2018. https://github.com/langsci/opendata/tree/master/calculations.