Research Preprints

Discussing early research outputs

A short research project: (Where) do preprints fit in?

December 18, 2017 mrittman

How do preprints fit into the research ecosystem? This is the question motivating a short project I will carry out. I am particularly interested in how links work between preprints and other early stage outputs, such as code, data, pre-registrations, and blog posts.


The question of what preprints are and why they are important is a common one. It is mainly addressed from the point of view of two players: the authors and the funding bodies. Since they are the ones who typically decide whether a preprint should be posted, this is not a big surprise. However, a major beneficiary should be the research community at large.

Apart from setting behavioural norms, there isn’t much direct influence from the community about what does and doesn’t appear as a preprint. However, the process by which preprints are posted can have a major effect on how useful they are to the wider research community. If preprints are not visible, linked, sharable and reusable, the research community loses out.

Getting it right when posting preprints can bring huge benefits. The idea behind this project is that preprints can act as a hub for early research outputs. Early stage research outputs often only tell part of the story. There is also code, data or pre-registered analysis: in isolation they leave a large number of questions unanswered. Where does someone go to fit all of the parts together? A preprint can pull together all the aspects of early stage outputs into a hypothesis-driven investigation with a clear logical structure, and in a way that can be reproduced and understood by others.

The approach

To find how and whether preprints can link early stage outputs, I will come at the question from three different angles:

What can be done?

I will look at preprint server submission systems to see whether it is possible to add links to various types of early stage output.

What is being done?

I will check published preprints to see whether authors are making use of the options available for linking their research to other outputs, or whether they find workarounds when those options are not available.

What could be done?

I will survey those running preprint servers about their attitudes towards linking data, code and other outputs to preprints.

The method

These are the categories of early stage research output I will look at:

  • Supplementary data (published directly with the preprint)
  • Data
  • Computer code
  • Previous versions of the same work
  • Later versions of the same work
  • Registered controlled trials
  • Pre-registered methods/analysis
  • Database accession IDs (e.g. Protein Data Bank)
  • Website/blog
  • Social media accounts

The survey will consist of the following questions:

  • Name, email address (optional)
  • Preprint server
  • Main field(s) covered
  • Approximately how many preprints has the preprint server published in 2017?
  • When posting preprints, are authors able to add links to the following: [use the list above]
      • In the submission system
      • In the published version
      • No, we do not permit links to external material
      • By another method (please specify)
  • Further comments [text box]
  • What is your overall impression of how often authors use the options above: [Majority of preprints (>50%), some preprints (10-50%), rarely (<10%), never, n/a]
  • If you do not have the options above, are there any that you plan to add to your platform by the end of 2018? [Have already, planned, no plan]
  • Do you have an online policy about links to data and other early stage research output? [URL, comments]
  • Do you have any further comments about preprints and links to early stage research outputs? [text box]


The sample sizes will be small, so the outputs will mainly be qualitative. I will look at differences between fields, and between old and new preprint servers.

Considering the broad fields: biology, physics, chemistry, mathematics (including statistics and computing), social sciences, humanities, engineering, and earth sciences/geography, I will look at whether authors have an option for linking their preprint to each type of additional output.
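
One way the field-by-output comparison could be tallied is sketched below. This is only an illustration: the function name `tabulate`, the record layout and the example records are assumptions made here, not part of the project plan, and the records are placeholders rather than collected data.

```python
from collections import defaultdict


def tabulate(records):
    """Count, per field and output type, how many servers offer a link option.

    records: iterable of (field, output_type, offers_link) tuples.
    Returns {field: {output_type: (servers_offering, servers_checked)}}.
    """
    table = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for field, output_type, offers_link in records:
        cell = table[field][output_type]
        cell[1] += 1  # one more server checked for this cell
        if offers_link:
            cell[0] += 1  # this server offers the link option
    return {
        field: {output_type: tuple(counts) for output_type, counts in row.items()}
        for field, row in table.items()
    }


# Placeholder records, invented purely to show the shape of the input.
records = [
    ("biology", "Data", True),
    ("biology", "Data", False),
    ("biology", "Computer code", True),
    ("physics", "Data", False),
]

table = tabulate(records)
for field, row in table.items():
    for output_type, (offering, checked) in row.items():
        print(f"{field:10} {output_type:15} {offering}/{checked}")
```

With samples this small the counts are only a prompt for the qualitative discussion, not a statistic.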


Do you want to get involved? In true preprint fashion, I’m looking for feedback on the project plan, so please either email me or make a comment below. I’m also looking for collaborators, so get in touch if you are interested.
The results will be presented in a flash talk and poster at the Open Science Conference in Berlin, March 2018. All data collected will be made public and there will, of course, be a preprint.

How preprint is it?

October 2, 2017 mrittman

Preprint servers come in a variety of shades. Some have argued that this shouldn’t be the case and that one size should fit all, but in my view a variety of offerings is an advantage and not a hindrance. After all, I have never heard a claim that there should be only one journal for every paper published and the same should apply to preprints. At the same time, with such an array of choices, how do you tell what any given preprint server offers? This is a large part of my motivation for setting up this website and blog. Authors and readers need good information about what preprints are and what they offer.

There have been discussions around setting minimum standards for preprint servers, such as from ASAPbio. Initially, it sounds like a good idea: set a standard that everyone has to follow and then you know what you get. The big drawback is that it stifles creativity and innovation, and would be expensive to police. Not all preprint servers look like one: take Zenodo, for instance, or Figshare: their primary purpose is to share data, not papers, but they also host a large number of unpublished research papers that are considered preprints. Overleaf and Authorea are online writing tools, but host public drafts of research papers that can also be considered preprints. If requirements for preprint servers came in that were too stringent, either these platforms would need to make modifications or they would fall outside of the system. Small preprint servers and university repositories could find the cost of compliance prohibitive and authors might find themselves having to post the same work in several places. There is still tremendous scope for innovating in how research is communicated beyond the PDF article, and any set of well-intentioned guidelines that assumes the status quo will hold back progress.

Then there is the question of who is going to implement any guidelines. This could quickly become a very complex matter. It is likely to be a field-specific body, so preprint servers that operate in more than one field could find that they need to comply with multiple guidelines. In addition, where will funding come from to verify that each preprint server complies and who will handle complaints? This sounds like an expensive and error-prone process, or one that is ineffective.

What is the alternative? First, there is the view that we are just taking this too seriously. After all, preprints are unvalidated work that is posted online at the authors’ whim. This is problematic once you start to see preprints as part of the wider research infrastructure. For example, funders want guarantees about preprints if they are going to accept them in grant applications, as many preprint advocates wish and some funders have started to do. For preprints to be part of the trend towards open science, methods need to be included and data made available.

I suggest something based on the journal initiative “How open is it?”. Here there is no black and white in terms of what is or isn’t an open journal, but it recognises that there are certain characteristics that contribute to openness and measures journals according to them. The preprint context doesn’t exactly translate, in that there is no single goal to aim for (i.e. open/closed). However, we can identify characteristics of preprints that hold a strong interest for certain stakeholders, such as whether there is a screening process or if certain file types are available, such as XML. Are there any drawbacks to such a system? Of course: if the bar is set too high by an author’s funder, university and publisher, they could end up with limited or contradictory options. Stakeholders should be encouraged to determine the absolute minimum they require.

Here are my suggestions for what could go into a “How preprint is it?” guide:


Licensing

  • Open license that permits unlimited sharing and reuse (CC BY or equivalent)
  • Allows licenses that are open with limitations
  • Allows licenses that are closed/read only, or permits embargoes
  • Only offers closed/read-only licenses, may also allow embargoes
  • Articles are pay-walled

Long-term archiving

  • Has a robust long-term archiving strategy based on multiple copies
  • Articles are archived or mirrored external to the preprint server
  • Permits articles to be archived elsewhere
  • No archiving policy, or archiving not permitted


File formats

  • Submission and publication in any format permitted
  • Submission and publication from several defined formats
  • Publication only in one format (e.g. PDF)
  • Submission and publication only in PDF format

Screening of new submissions

  • Has a transparent, clearly defined screening process, including content quality
  • Carries out content quality screening on all submissions, but criteria are not clear
  • Screens only some articles for content quality
  • Basic checks only, e.g. author affiliation or formatting
  • No screening takes place

Machine-readable versions

  • All papers have a machine-readable version in a standard format
  • Most papers have machine-readable versions, but formats/schema vary
  • Some papers have machine-readable versions
  • No papers have machine-readable versions

Commercial status

  • Community-led and owned project
  • Run by a non-profit organisation with community consultation
  • Run by a for-profit organisation with community consultation
  • Run entirely by a non-profit entity
  • Run entirely by a for-profit entity
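
As a rough sketch of how such a guide could be put to work, the snippet below encodes two of the characteristics above as ordered lists and checks a server against a stakeholder's stated minimum. The names `GUIDE`, `profile` and `meets_minimum`, the 0-based positions and the example server are all assumptions made for illustration, and only characteristics whose levels run from most to least open are included.

```python
# Hypothetical encoding of part of the guide: levels are listed in the
# same order as in the post, so position 0 is the first bullet.
GUIDE = {
    "Long-term archiving": [
        "Has a robust long-term archiving strategy based on multiple copies",
        "Articles are archived or mirrored external to the preprint server",
        "Permits articles to be archived elsewhere",
        "No archiving policy, or archiving not permitted",
    ],
    "Machine-readable versions": [
        "All papers have a machine-readable version in a standard format",
        "Most papers have machine-readable versions, but formats/schema vary",
        "Some papers have machine-readable versions",
        "No papers have machine-readable versions",
    ],
}


def profile(server_levels):
    """Map each characteristic to (position, number of levels, description)."""
    result = {}
    for characteristic, index in server_levels.items():
        levels = GUIDE[characteristic]
        if not 0 <= index < len(levels):
            raise ValueError(f"no level {index} for {characteristic!r}")
        result[characteristic] = (index + 1, len(levels), levels[index])
    return result


def meets_minimum(server_levels, minimum):
    """True if the server sits no further down the list than each required level.

    A characteristic missing from server_levels is treated as the last level.
    """
    return all(
        server_levels.get(name, len(GUIDE[name]) - 1) <= required
        for name, required in minimum.items()
    )


# An invented server: mirrored externally, some machine-readable papers.
example_server = {"Long-term archiving": 1, "Machine-readable versions": 2}

for name, (position, total, text) in profile(example_server).items():
    print(f"{name}: level {position} of {total} ({text})")

# A stakeholder who only requires that archiving elsewhere is permitted.
print(meets_minimum(example_server, {"Long-term archiving": 2}))
```

The output is a profile rather than a single score, in keeping with the point that there is no single goal to aim for.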

Preprint ethics

September 4, 2017 mrittman

A few weeks ago there was some discussion in the Twittersphere about the ethical side of preprints. The focus was on what should happen when a preprint is found to be flawed and whether the same kind of retraction process as for articles can and should be applied.

Partly as a result of this, preprints were put on the agenda of the Committee on Publication Ethics (COPE), who hosted a discussion about preprint ethics and put out a public call for comments.

I was also recently invited to discuss the ethics of preprints at a meeting of the ISMTE along with Jennifer Lin from Crossref. I promised on Twitter that I would share the slides, but in the event it was a Q&A session, so you get this blog post instead.

What’s the fuss about preprint ethics?

Are preprint ethics just a discussion of ‘what-if’ scenarios or hypothetical musings? Not really. Preprints without methods sections have been published, and quickly updated. An author recently forwarded me an email about a preprint where a journal had rejected their article after peer review and production because it had appeared as a preprint: a frustrating and entirely avoidable scenario. I also see other requests to withdraw preprints for a variety of reasons, mainly from authors but occasionally also from third parties.

An approach to preprint ethics

To get a handle on preprint ethics, I find it useful to put issues into a few categories (this isn’t intended to be an exhaustive list):

1. Issues that are the same for preprints and journal articles: data manipulation, authorship, plagiarism (and self-plagiarism), copyright infringement, salami slicing, excessive self-citation, citation cartels…

2. Issues that are (more) specific to preprints: incomplete methods or data, multiple posting of preprints.

3. Issues specific to preprint servers: minimum standards of content/screening procedures, who should handle complaints, lack of universal ethical standards, limited resources to investigate complaints.

4. Issues where preprints meet articles: how to handle different versions, directing readers to the most reliable version, first reporting of results, citation of preprints in journal articles (how and when), copyright ownership and licensing.

There are just a couple of specific issues I’d like to briefly address here. I will likely cover some others in future posts: there is a lot to discuss and currently little consensus.


Removal of preprints once they are online is an area where there remain significant questions. Should they be removed at all? After all, they come with no guarantee of quality. Also, once online they are immediately picked up by various search engines and downloaded: removing them doesn’t make them go away.

In terms of removing preprints for quality issues, in my view it comes down to how authoritative a preprint is seen to be: a result of its standing in the community. It’s pretty easy to argue that misleading works considered to have some degree of authority should be removed. As preprints are in the grey area between discovery and validated knowledge, this is open to interpretation and likely to vary from discipline to discipline. Another argument is that, unlike journal articles, preprints can be rapidly updated or modified by the authors, so many identified problems can be fixed with a new version. Allowing comments on preprints can help in cases where the authors don’t wish to update: the reader can read the preprint and the comments and draw their own conclusions.

Once a preprint has been removed, there is the question of how it is done. Should it simply disappear? Crossref advises against this for preprints with a DOI. If preprints are meant to be citable, then there should at least be a statement saying what the preprint was and that it has been removed. Should a reason be given for removal? I don’t know of a preprint server that does so. I suspect the reasons vary: they just don’t think it is necessary for tentative work, an unwillingness to put out statements that could be questioned, a wish to protect the authors on whom they rely for submissions.


How, when and whether preprints should be cited has been covered elsewhere (one excellent post here). I just want to make the point that citations are a double-edged sword. Someone involved in the early stages of arXiv recalled to me that citations were one of the reasons that physicists started to use the service: extracting the bibliography and putting it on a forerunner of INSPIRE meant that by using a preprint service, physicists could improve their profile in the field. This, like most research reward systems, leads both to increased use and the possibility for gaming. The cross-subject equivalent is Google Scholar and I have seen preprints submitted with substantial numbers of self-citations which appear to be an effort to improve Google Scholar citations. If the research is good, should that be enough to reject a preprint? Where to set the barrier is currently far from clear, with individual preprint servers or repositories currently left to set their own policy.


There is a need for further discussion about some of the issues above. Some of it needs to be had between those running preprint servers and some within author communities, perhaps led by societies or funders. Publishers should expect authors to declare preprint versions of their article at submission and clearly note their policy for accepting preprints. I don’t think this is the last you will hear about these issues.

Reflections on Geneva: OAI10

July 7, 2017 mrittman

OAI10, hosted jointly by the University of Geneva and CERN, was a very thought-provoking conference. It gave me a great insight into how many in the scholarly ecosystem see open access and open science, especially librarians.

The most memorable session for me was the one on a transition to open access which, in reality, offered few solutions but outlined some of the principal ways of thinking about this issue and the drivers likely to determine the future transition to open access. My take-away was that any move away from APC-based gold open access is going to need a great deal of coordination between publishers, funders and universities. It also brought home how fundamental the transition of publishers from content owners to service providers is.

I was very pleased to have the opportunity to run an unconference session on preprints, ostensibly on how to integrate preprints into the research life cycle. The biggest benefit for me was to hear librarians speak on the subject of preprints. I hear a lot from funders and publishers, but very little from universities and libraries. The discussion went in a very different direction from what I expected. I realised that many librarians sit on both sides of the fence, as both consumers and producers of preprints. Institutional repositories, run by librarians, contain large numbers of preprints and working papers—collections managed by the university library. On the other hand, librarians must know when they can use preprints for staff and institutional evaluations and how to recommend them to readers.

There was a range of views expressed and there was by no means agreement on all topics, but the main subjects that came out of the discussion were:

  • What is a preprint? If we are going to develop policies for dealing with these objects we need some framework for deciding what counts.
  • More clarity is needed in terms of policies from funders, publishers and assessment bodies on how preprints can be used.
  • The prevailing view was that it is too early to adopt standards for preprints; however, clear governance, links to journal article versions, CC BY licensing and interoperability are desirable.
  • It is unclear how to give advice to non-scholars on the use of preprints.

The last point dovetailed nicely with a talk on the difficulties NGOs, journalists and others face in accessing peer reviewed literature, even via schemes designed specifically for that purpose. If non-specialists find a preprint version of an unavailable journal article, how can they rate its reliability or know whether it is similar to the published version?

Preprints were by no means the main theme of OAI10, although Jessica Polka’s excellent plenary talk in the last session sent everyone home with preprints on their minds. I still learned a great deal and am grateful to everyone who shared their views and experience with me.

Whither preprints?

June 15, 2017 mrittman

In recent times there has been a proliferation of preprint servers and a much larger uptake on the part of authors. Few objections have been raised to this trend and I wonder whether this is because almost everyone in the scholarly bubble sees preprints as fitting into their own world view: traditionalists see them as no threat to journals and as supportive of the editorial process. Those seeking change see the potential for preprints to replace journals, or at least greatly alter the status quo.

In this post I want to spell out some of the possible future scenarios. The most likely future, of course, is that no single one of these visions will dominate: different disciplines will come to their own conclusions and it should be up to research communities to decide.

The main issue in looking at different scenarios is how one moves a piece of research from the tentative/draft phase into the corpus of accepted literature, or whether such a distinction is even necessary.

Here are four possible scenarios for preprints in the future.

The status quo

Here, journals stay as the guardians of accepted research and operate as they do currently. Preprints are a tool for getting early feedback and making some results known ahead of time but are viewed as very much inferior to journal articles and not considered essential to the publication process.

This situation seems to prevail for physicists, even though they regularly post to arXiv: journal publication after peer review is still very important, especially when it comes to promotion and funding. Despite recent moves for acceptance of preprints in grant applications, e.g. by the NIH, there haven’t been similar announcements about job applications or promotion. The attitudes of universities and other research institutions are probably critical for moving away from the status quo or maintaining things as they are.

Preprints plus journals

In this scenario journals continue as they are but preprints are also recognised as first class research objects. Preprints can be cited, used in hiring and promotion decisions, and included in grant applications. They are read with a healthy amount of scepticism but cited where appropriate and fix the first reporting of research results (i.e., they provide scoop protection).

From what I can gather, this is the aim of ASAPbio. Gaining recognition from a broad range of institutions is a key element here and the major difference between this and the previous scenario. For publishers, journal publication goes on as usual, but it could lead them to modify how they solicit articles and give options for slower, more thorough peer review (see, e.g. this post from Kent Anderson at the Scholarly Kitchen).

Preprints disappear

It is not a scenario I hope for, but it is possible that in some fields preprints will never catch on. Either not enough is communicated about their benefits, or there may be specific areas with a specific objection. A field (although I struggle to think of one) where speed of publication is not important stands to gain less from wider use of preprints. Some may argue that fields where individual papers can have very large impact may require greater validation before making research public. Putting the lid on scandals like the vaccine-autism link controversy on a regular basis could cause headaches for scientists. There are good reasons I don’t think it would play out like that, but I’ll leave them for another discussion.

The main way to avoid this scenario is clear articulation of benefits and proper engagement with reasonable objections to preprints. So far, the discussion I’ve seen has been mostly high quality and polite; long may it remain that way!

Overlay journals

A compelling argument is the idea that peer review is costly and inefficient and preprints offer a low-cost and effective alternative. Some kind of validation, such as community-organised peer review, can be made after a preprint has been put online. This has already happened (e.g. Tim Gowers’ Discrete Analysis; Andrew Gelman has also proposed a super-arXiv overlay journal).

It is, of course, unlikely that publishers and editors would readily agree to such a move and a significant cultural shift needs to take place for this eventuality to prevail, even within a single field. The main objection from publishers is about protecting income: a shift from the $5000 per article average income now to something around $100 would put almost every publisher out of business based on current practices, for-profit and non-profit alike. However, some fields with limited funding and where fee-based open access is viewed with scepticism may find this a very attractive proposal.

A step further questions the necessity of journals at all. Why does the opinion of two or three reviewers and one editor provide a solid validation of research? Peer review has been frequently questioned, but this option would require a new way to validate and test results. I am not aware of any serious proposals in this direction, but I think it’s a space well worth watching in the coming few years.


Which is the best of these scenarios? Unlike the debate on open access, I don’t think this needs to become binary and polarized. Simply posting a preprint does not favour any of the above: the main differences are about how the rest of the research ecosystem values and validates a preprint. It is certainly possible that multiple scenarios can exist side-by-side. It is an exciting time for preprints and I hope that discussing these kinds of options moves higher up the agenda of decision-makers in scholarly research communication.

Open Preprints

May 19, 2017 mrittman

One of the things that most surprised me when putting together the list of preprint servers for this site is that a large number don’t explicitly list any licensing or copyright information, and some routinely use very restrictive licenses. Coming from a publishing background, this was very surprising.

Subscription publishing relies on content ownership and enforcement of strict copyright conditions: If the publisher doesn’t own the copyright, the articles could be distributed by anyone. The open access movement turned this logic on its head. Someone (often the authors or their funder) pays for value added services including peer review, copy editing, hosting and distribution, but anyone is allowed to distribute the final article. The copyright and licensing terms are among the most significant features distinguishing open access and legacy publishers.

Licensing for open access is very important. Most open access publishers have gravitated towards the Creative Commons licenses, and in particular CC BY. Although there are differing views, simply being free to read is generally accepted as insufficient for open access. It also requires rights for reuse, in whole or in part. This means that for the strictest definitions of open access, not even all Creative Commons licenses are sufficient.

The preprint paradigm developed before open access and at the very beginning of the internet, when licensing conditions were not such a contentious area. As a result, many preprints were not, and still aren’t, open access compatible. If no terms are stated, as with a number of preprints I’ve seen, the default is an all rights reserved license, which means distribution and reuse are not permitted.

For authors, the lack of open access for preprints doesn’t matter much and I don’t believe many know or care a great deal about it. I have never received a request to use a license other than CC BY for a preprint, and only on about three occasions in the last four years for open access journals (over about 70,000 articles). Most authors I speak to like the ideals of open access, even if they have issues with how it works in practice. It is the rest of the research community (ironically including many of the same people) that stands to lose out. Data mining, especially, becomes a legal minefield if reuse rights are not clear. Use of figures in lectures, blogs and journal articles is problematic. Simply sharing a copy with a few colleagues could be illegal. The tragic recent case of Diego Gomez shows that this is not just a hypothetical argument. The goal for preprints to widely disseminate work is limited by the lack of a clear license. Even more, if the increasing use of preprints has aspirations to be seen as part of the push for open science, the current haphazard approach to licensing isn’t going to work.

Some have justified offering a range of licenses on the grounds that it makes preprints more inclusive and the time will come for moving forward on this issue. I think this overestimates the risks and underestimates the benefits of open access, and I am yet to see a timescale for the transition. Others seem to be unaware of the issue: I recently saw a definition declaring that all preprints are open access. To reach the full potential of preprints, they should be in step with open access and aim to be fully integrated into the growing calls for open and transparent science. At the very least, I would challenge those advocating for the use of preprints to decide which side of the fence they sit on.

Where do we go from here?

May 4, 2017 mrittman

In recent times there has been a proliferation of preprint servers and much larger uptake by researchers, particularly in biology. Few objections have been raised to the concept of preprints and I wonder whether this is because most see preprints as reinforcing their own position: traditionalists see them as no threat to journals, and as supportive of the editorial process by providing feedback ahead of submission. Those at the more radical end of the spectrum see the potential to overthrow the system and ask why we need journals any more.

In this post I want to lay out four possible future scenarios for preprints. Most likely, as currently, different disciplines will take different routes and all of the scenarios may co-exist. This is not a debate that needs to polarise and finish at one end point. It is up to research communities to decide what works for them.

The main issues up for discussion are how to move a tentative piece of work into the corpus of accepted literature, and how preprints fit into the research cycle. Here are four possible scenarios:

1. The status quo

Here, journals stay as the guardians of accepted research and operate as currently. Preprints are a tool for early announcement that bypasses the often slow review and editorial process, and allows researchers to get early feedback on their work. This is more or less how physics has worked for a long time. In fact, I have heard it said that there is no great need for open access physics journals as everything is on arXiv.

This is the likely scenario for the immediate future as it doesn’t rock the boat. Researchers are generally conservative about changes to publishing and uptake of preprints in new disciplines is likely to be slow.

The status of preprints here is somewhat below journal articles. They are a kind of untested grey matter. This scenario places a great deal of faith in the efficacy of peer review and editorial decision-making, which has been much criticized. It seems a missed opportunity if preprints do not contribute to assessing research at least to some extent.

2. Preprints disappear

This is a scenario that I don’t really want to think about, but is a possibility. If there is a lack of widespread uptake of preprints and they are not recognised as valuable by funding bodies or in research assessment, they will become a burden to researchers and will gradually disappear: no new preprints will be added. The current signs point against this scenario, but only in certain fields. It is possible that preprints will never gain traction in other fields. There may be arguments against preprints, for example where clinical recommendations or patent applications are involved.

The way to avoid this endpoint involves some lobbying of influential organisations, as well as reaching a critical mass of use for preprints. Tangible benefits for moderate effort need to be demonstrated.

3. Overlay journals

An overlay journal is one that directly publishes preprints. The editorial process takes place once the preprint is online.

In this scenario, the lines between preprints and journal articles become blurred. There is an editorial process, but it focuses on assessing the preprint directly, not a separately submitted piece of work. F1000 are essentially running this system, and Tim Gowers has set up the journal Discrete Analysis as an overlay on arXiv.

This approach has the potential to dramatically reduce the cost of publishing, and bring transparency to the process and more control to authors. It also has the potential for established publishers to take control of the preprint process and tie authors into their platform. Which preprint server to post on could become almost as agonizing as which journal to submit to. I also suspect there will be unintended consequences when preprints start to be used to assess individual performance and metrics are applied.

An advantage of this approach, though, is that it puts preprints more concretely into the research cycle. This would be a good strategy to promote for supporters of open science.

4. No journals

For those opposed to the role of publishers as arbiters and gatekeepers of research dissemination, the idea of getting rid of journals altogether is quite appealing. In this scenario, preprints are published and readers can make up their own minds about whether they are any good. Preprints replace journal articles. With flexibility to update preprints at any time, research should become more self-correcting than it is currently. Most researchers already rely on search engines and related algorithms to find work for them, so tagging an article as belonging to a specific journal is obsolete.

On the other hand, how does someone new to the field rate articles in this scenario? Some papers will get a lot of attention, whereas a great deal of incorrect, uninteresting research will be left untouched, unread and wrong. At least with journals every article goes through a checking process, even if that process has some flaws.

To conclude, each of these scenarios has strengths and weaknesses and, as I said above, there is no one-size-fits-all solution. The main question to ask is whether they strengthen research output and lead to effective, creative work.

Reflections from the Open Science Conference 2017

March 24, 2017 mrittman

I spent the first part of this week at the Open Science Conference in Berlin. As with many conferences, it was a great opportunity to step back and take a different view of things away from the frenetic everyday tasks. I met a lot of interesting people and came away with more questions and ideas than answers. One of the things on my mind, of course, was preprints and how they fit into the current view of open science.

The overall impression I came away with was that open science is at a stage where no-one is quite sure what it is, but they think it’s a good idea. Indeed, the first question of the feedback questionnaire asked for a definition of open science. At the same time, with the assumption that open science and its constituent elements are a good thing, no-one is making a clear case for this and there was little that would persuade sceptics.

For better or worse, the organisers majored on a few aspects of open science: open data was a strong theme, open education resources had a lot of air time and I attended a session on open peer review. My favourite session was the one on alternative metrics, which showed that the discussion in this area is moving on to the identification of more useful metrics.

So how did preprints fare? Actually, not so well. Preprints feature in the open science monitor from the EU commission, and Niko Kriegeskorte’s method of not reviewing unless a preprint exists was mentioned. There were also comments around the need to make research available as soon as possible and the usual gripes about slow publishers and citation metric addiction.

In conversations, too, there was general support for preprints but without a great deal of enthusiasm, and scepticism that new fields will take up preprinting on a regular basis just yet. I don’t have a good answer to why this is. Maybe preprints are simply not on the agenda. In Europe there are few advocates for preprints: just search for #preprint on Twitter and you will see that most of the activity takes place when America is online. Perhaps it is because most resources seem to be pointed at open data, which is a huge and important challenge. It could also be that preprinting is seen as an area that is established in some disciplines and whose growth will take care of itself.

Preprints seem to offer a relatively cheap and simple way of furthering the cause of open science. Estimates of the cost of putting a preprint online are in the range of 10 Euros (or USD) per paper, compared to 5000 US dollars for journal publications. Preprints also offer:

  • Rapid access to research, months or sometimes years in advance;
  • Transparency, with open review possible in a context where reviewers simply advise the authors without the judgment of an editorial decision looming in the background;
  • Papers that are free to read and often fully open access. This can help to circumvent paywalls and copyright transfer issues;
  • Putting control of reporting research into the hands of the researchers producing the work.

How can preprints start to sell themselves better, even within the conversation about open science? First, we need to make the case for the benefits of preprints and lower the barriers to participation as far as possible. Second, there need to be better links to more established and discussed areas of open science, especially integration with data publishing. Those operating preprint platforms need to look at which standards and technologies can be integrated to link with other parts of the open science agenda.

I think the previous paragraph applies equally to how open science needs to embed itself into current research practice. To make open science and preprints the norm, there needs to be a convincing case that some extra effort brings tangible benefits. I’m hoping, at least, that at the next edition of the conference there will be a session dedicated to preprints.

What is a preprint?

March 16, 2017 mrittman

With the rise of new preprint servers, and especially multiple offerings in the same discipline, some effort should be put into thinking about what it is that makes a preprint a preprint. This post is my take on the issue.

A preprint is about three aspects: content, availability and timing.

Content

The preprints we are concerned with are additions to the research literature. To state the obvious, any work following the scientific method should qualify. The question then is how widely the net should be cast to include other article types. It should be uncontroversial to include research articles, reviews and essays, which form the backbone of output in science and the humanities. However, the literature includes a lot more: editorials, opinions, comments and so on. In addition, the concept of micropublication has been suggested, i.e. publishing a single part of a traditional paper, such as only the methods, results or discussion. With the current publishing paradigm, one could suggest that anything that could be published in a journal could be made a preprint, but this is unsatisfactory as the role of journals might unexpectedly change and journals have individual policies. It also excludes work at a more preliminary stage. I think it is useful to split the literature into 1) research: hypothesis-driven investigation and 2) grey literature: informed conversation about research (written by researchers). Both could be considered for preprints, but in practice the difference should be highlighted via an assigned article type.

Availability

A preprint should be available to anyone. I would qualify this by saying that open access is desirable but not necessary: a basic definition of a preprint should permit a broad range of copyright and licensing criteria. Would it be acceptable to paywall a preprint? I would argue strongly against this option and it seems quite pointless, but I don’t think it should disqualify something from being classed as a preprint.

Timing

Preprints are about reporting work at the earliest possible stage. The “pre” in the name is there because they come before validation by the research community.

The primary mode of validation currently is peer review and journal publication, but the definition shouldn’t be restrictive. New processes of confirming results could emerge in the future and should be connected to preprints. Grey literature is usually not peer reviewed, so editorial review and publication is sufficient to count as validation.

Should postprints or accepted versions of papers be mixed in with preprints? The difference between a preprint and a postprint should be about when the first version is put online. In practice, it is acceptable to update a preprint with a new version, including a peer-reviewed one. The lack of a fixed end point is, to my mind, a strength of preprints and should be permitted within a working definition. As outlined below, there are situations where it is critical to know if something has been peer reviewed, so there should be a differentiation between the two.


In summary, I would define a preprint as a piece of research made publicly available before it has been validated by the research community.

Not to be confused with the above, and a topic for a future post, is the question of what a preprint server is. Not all preprints appear on a preprint server and not everything that appears on a preprint server is necessarily a preprint.

Does it matter?

A number of preprint servers don’t publish preprints strictly according to this definition, for example by allowing publication of an abstract without full text, or permitting uploads of postprints or accepted versions. I don’t think most scholars care a great deal about this, but it is important in some circumstances. For preprint aggregators, funders, journalists, medical practitioners and in research assessments it is much more important to know what has not been peer reviewed and whether something is simply an opinion as opposed to reporting research outcomes. For this reason, the distinctions between research and grey literature, and between preliminary and reviewed work, should be made clear.

In the spirit of preprints, I would be interested in feedback on the definition and how it can be improved. Please comment or get in touch by some other means.

A List of Preprint Servers

March 9, 2017 mrittman

Last week I put a rough version of the list of preprint platforms live, responding to a request on Twitter from Jessica Polka. I’ve now filled in most of the gaps and put it into a Google sheet, which seems the best way to display the information at present. In the future I aim to use something more fancy that will span the page and can be filtered and sorted.

I hope it will be a useful resource to authors considering options for where to place their preprint and anyone interested in an overview of the state of preprints.

Putting the list together was an interesting exercise and revealing in several respects. Here are a few observations that I made.

Firstly, there are not that many preprint servers: the list runs to 19 at the moment. More than half of those listed (including OSF-based servers individually) started in the last year. Compared to the number of journals this is minuscule, even in disciplines where preprints have played a large role for many years. I intend to exclude institutional repositories, of which I suspect there will be a great many that post preprints. There are already lists of them elsewhere and authorship is limited to those affiliated with the institution.

A major lack in most preprint servers is long-term archiving. Excepting those based at CERN, I only found one with a statement about archiving on their website (CORE from MLA Commons). This should be a high priority for those operating preprint platforms, but there appear to be few clear solutions at present.

Also lacking is a business model that does not rely on backing by one or a handful of bodies. SSRN uses a model where institutions or readers pay for extra services. Authorea charges for use of their platform (although there is a free option). Funding from a larger organisation is fine, as long as institutions are willing to pay in the long term, but it relies to some extent on good will and some servers will likely look at alternative models in the coming years.

The background to preprint servers is varied, arising from libraries, publishers, societies, author services etc. Each puts an emphasis on different aspects, and the rigour of submission checks, licensing information, information for authors, inclusion of non-preprint material and so on vary. In my experience, most authors don’t pay particular attention to these aspects, but they may play a role in integrating preprints more formally into research evaluation. Funders and universities do care more about the details. A discussion on basic requirements for preprints and an interest in whether a consensus can be achieved was one of the motivating factors for my setting up Research Preprints.

Finally, the pervasiveness of the PDF is evident. The convenience for publication and human reading wins hands down. At the moment other formats don’t get a look in, which is fine in the short term. In the longer term, this could pose a significant challenge for text and data mining, especially when format is so varied.