How preprint is it?

Preprint servers come in a variety of shades. Some have argued that this shouldn’t be the case and that one size should fit all, but in my view a variety of offerings is an advantage, not a hindrance. After all, I have never heard anyone claim that there should be only one journal for every paper published, and the same logic applies to preprints. At the same time, with such an array of choices, how do you tell what any given preprint server offers? This is a large part of my motivation for setting up this website and blog. Authors and readers need good information about what preprints are and what they offer.

There have been discussions around setting minimum standards for preprint servers, such as from ASAPbio. Initially, it sounds like a good idea: set a standard that everyone has to follow and then you know what you get. The big drawback is that it stifles creativity and innovation, and would be expensive to police. Not all preprint servers look like one: take Zenodo or FigShare, for instance. Their primary purpose is to share data, not papers, but they also host a large number of unpublished research papers that are considered preprints. Overleaf and Authorea are online writing tools, but they host public drafts of research papers that can also be considered preprints. If requirements for preprint servers became too stringent, these platforms would either need to make modifications or fall outside the system. Small preprint servers and university repositories could find the cost of compliance prohibitive, and authors might find themselves having to post the same work in several places. There is still tremendous scope for innovating in how research is communicated beyond the PDF article, and any set of well-intentioned guidelines that assumes the status quo will hold back progress.

Then there is the question of who would implement any guidelines. This could quickly become a very complex matter. The implementing body is likely to be field-specific, so preprint servers that operate in more than one field could find that they need to comply with multiple sets of guidelines. In addition, where will the funding come from to verify that each preprint server complies, and who will handle complaints? This sounds like an expensive and error-prone process, or an ineffective one.

What is the alternative? First, there is the view that we are just taking this too seriously: after all, preprints are unvalidated work posted online at the authors’ whim. That view becomes problematic once you start to see preprints as part of the wider research infrastructure. For example, funders want guarantees about preprints if they are going to accept them in grant applications, as many preprint advocates wish and some funders have started to do. For preprints to be part of the trend towards open science, they need to include methods and make their data available.

I suggest something based on the journal initiative “How open is it?”. That initiative recognises that there is no black-and-white distinction between open and closed journals, but that certain characteristics contribute to openness, and it measures journals according to them. The approach doesn’t translate exactly to preprints, because there is no single goal to aim for (i.e. open/closed). However, we can identify characteristics of preprints that hold a strong interest for certain stakeholders, such as whether there is a screening process or whether certain file types, such as XML, are available. Are there any drawbacks to such a system? Of course: if the bar is set too high by an author’s funder, university, and publisher, they could end up with limited or contradictory options. Stakeholders should be encouraged to determine the absolute minimum they require.

Here are my suggestions for what could go into a “How preprint is it?” guide:

Licensing

  • Open license that permits unlimited sharing and reuse (CC BY or equivalent)
  • Allows licenses that are open with limitations
  • Allows licenses that are closed/read-only, or permits embargoes
  • Only offers closed/read-only licenses, may also allow embargoes
  • Articles are pay-walled

Long-term archiving

  • Has a robust long-term archiving strategy based on multiple copies
  • Articles are archived or mirrored external to the preprint server
  • Permits articles to be archived elsewhere
  • No archiving policy, or archiving not permitted

Formats

  • Submission and publication in any format permitted
  • Submission and publication in several defined formats
  • Publication only in one format (e.g. PDF)
  • Submission and publication only in PDF format

Screening of new submissions

  • Has a transparent, clearly defined screening process, including content quality
  • Carries out content quality screening on all submissions, but criteria are not clear
  • Screens only some articles for content quality
  • Basic checks only (e.g. author affiliation or formatting)
  • No screening takes place

Machine-readable versions

  • All papers have a machine-readable version in a standard format
  • Most papers have a machine-readable version, but formats/schemas vary
  • Some papers have machine-readable versions
  • No papers have machine-readable versions

Commercial status

  • Community-led and owned project
  • Run by a non-profit organisation with community consultation
  • Run by a for-profit organisation with community consultation
  • Run entirely by a non-profit entity
  • Run entirely by a for-profit entity
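
To make the idea concrete, here is a minimal sketch (in Python, and entirely hypothetical) of how such a guide could be encoded so that a server’s profile can be recorded or compared. The shortened level labels and the example profile simply abbreviate the lists above; none of it describes a real service.

    # A purely illustrative sketch of how the rubric above could be encoded.
    # Level labels are shortened paraphrases of the lists in this post;
    # the example profile at the bottom is entirely made up.

    RUBRIC = {
        "licensing": [
            "open (CC BY or equivalent)",
            "open with limitations",
            "allows closed/read-only or embargoes",
            "only closed/read-only, may allow embargoes",
            "pay-walled",
        ],
        "archiving": [
            "robust multi-copy strategy",
            "archived or mirrored externally",
            "archiving elsewhere permitted",
            "no policy or not permitted",
        ],
        "formats": [
            "any format",
            "several defined formats",
            "one publication format (e.g. PDF)",
            "PDF only",
        ],
        "screening": [
            "transparent content screening",
            "content screening, criteria unclear",
            "some articles screened for content",
            "basic checks only",
            "no screening",
        ],
        "machine_readable": [
            "all papers",
            "most papers, formats vary",
            "some papers",
            "none",
        ],
        "commercial_status": [
            "community-led and owned",
            "non-profit with community consultation",
            "for-profit with community consultation",
            "entirely non-profit",
            "entirely for-profit",
        ],
    }

    def describe(profile):
        """Print where a (hypothetical) server sits on each dimension (0 = top level)."""
        for dimension, level in profile.items():
            print(f"{dimension}: {RUBRIC[dimension][level]}")

    # Made-up example: open licences, external mirroring, PDF-only publication,
    # basic screening, no machine-readable versions, for-profit with consultation.
    describe({
        "licensing": 0,
        "archiving": 1,
        "formats": 3,
        "screening": 3,
        "machine_readable": 3,
        "commercial_status": 2,
    })

A funder or institution could then state the minimum level it requires on each dimension, rather than insisting on a single definition of what counts as a preprint server.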