Reflections on Geneva: OAI10

OAI10, hosted jointly by the University of Geneva and CERN, was a very thought-provoking conference. It gave me a great insight into how many in the scholarly ecosystem see open access and open science, especially librarians.

The most memorable session for me was the one on a transition to open access which, in reality, offered few solutions but outlined some of the principal ways of thinking about this issue and the drivers likely to determine the future transition to open access. My take-away was that any move away from APC-based gold open access is going to need a great deal of coordination between publishers, funders and universities. It also brought home how fundamental the transition of publishers from content owners to service providers is.

I was very pleased to have the opportunity to run an unconference session on preprints, ostensibly on how to integrate preprints into the research life cycle. The biggest benefit for me was to hear librarians speak on the subject of preprints. I hear a lot from funders and publishers, but very little from universities and libraries. The discussion went in a very different direction from what I expected. I realised that many librarians sit on both sides of the fence, as both consumers and producers of preprints. Institutional repositories, run by librarians, contain large numbers of preprints and working papers—collections managed by the university library. On the other hand, librarians must know when they can use preprints for staff and institutional evaluations and how to recommend them to readers.

There was a range of views expressed and there was by no means agreement on all topics, but the main subjects that came out of the discussion were:

  • What is a preprint? If we are going to develop policies for dealing with these objects we need some framework for deciding what counts.
  • More clarity is needed in terms of policies from funders, publishers and assessment bodies on how preprints can be used.
  • The prevailing view was that it is too early to adopt standards for preprints; however, clear governance, links to journal article versions, CC BY licensing and interoperability are desirable.
  • It is unclear how to give advice to non-scholars on the use of preprints.

The last point dovetailed nicely with a talk on the difficulties that NGOs, journalists and others face in accessing peer-reviewed literature, even via schemes designed specifically for that purpose. If non-specialists find a preprint version of an unavailable journal article, how can they rate its reliability or know whether it is similar to the published version?

Preprints were by no means the main theme of OAI10, although Jessica Polka’s excellent plenary talk in the last session sent everyone home with preprints on their minds. I still learned a great deal and am grateful to everyone who shared their views and experience with me.

Open Preprints

One of the things that most surprised me when putting together the list of preprint servers for this site is that a large number don’t explicitly list any licensing or copyright information, and some routinely use very restrictive licenses. Coming from a publishing background, this was very surprising.

Subscription publishing relies on content ownership and enforcement of strict copyright conditions: if the publisher doesn’t own the copyright, the articles could be distributed by anyone. The open access movement turned this logic on its head. Someone (often the authors or their funder) pays for value-added services including peer review, copy editing, hosting and distribution, but anyone is allowed to distribute the final article. The copyright and licensing terms are among the most significant features distinguishing open access and legacy publishers.

Licensing for open access is very important. Most open access publishers have gravitated towards the Creative Commons licenses, and in particular CC BY. Although there are differing views, simply being free to read is generally accepted as insufficient for open access. It also requires rights for reuse, in whole or in part. This means that for the strictest definitions of open access, not even all Creative Commons licenses are sufficient.

The preprint paradigm developed before open access and at the very beginning of the internet, when licensing conditions were not such a contentious area. As a result, many preprints were not, and still aren’t, open access compatible. If no terms are stated, as with a number of preprints I’ve seen, the default is an all rights reserved license, which means distribution and reuse are not permitted.

For authors, the lack of open access for preprints doesn’t matter much and I don’t believe many know or care a great deal about it. I have never received a request to use a license other than CC BY for a preprint, and only on about three occasions in the last four years for open access journals (over about 70,000 articles). Most authors I speak to like the ideals of open access, even if they have issues with how it works in practice. It is the rest of the research community (ironically including many of the same people) that stands to lose out. Data mining, especially, becomes a legal minefield if reuse rights are not clear. Use of figures in lectures, blogs and journal articles is problematic. Simply sharing a copy with a few colleagues could be illegal. The tragic recent case of Diego Gomez shows that this is not just a hypothetical argument. The goal of preprints, to disseminate work widely, is limited by the lack of a clear license. What is more, if the increasing use of preprints is to be seen as part of the push for open science, the current haphazard approach to licensing isn’t going to work.

Some have justified offering a range of licenses on the grounds that it makes preprints more inclusive and the time will come for moving forward on this issue. I think this overestimates the risks and underestimates the benefits of open access, and I have yet to see a timescale for the transition. Others seem to be unaware of the issue: I recently saw a definition declaring that all preprints are open access. To reach the full potential of preprints, they should be in step with open access and aim to be fully integrated into the growing calls for open and transparent science. At the very least, I would challenge those advocating for the use of preprints to decide which side of the fence they sit on.

Where do we go from here?

In recent times there has been a proliferation of preprint servers and much larger uptake by researchers, particularly in biology. Few objections have been raised to the concept of preprints and I wonder whether this is because most see preprints as reinforcing their own position: traditionalists see them as no threat to journals, and as supporting the editorial process by providing feedback ahead of submission. Those at the more radical end of the spectrum see the potential to overthrow the system and ask why we need journals any more.

In this post I want to lay out four possible future scenarios for preprints. Most likely, as currently, different disciplines will take different routes and all of the scenarios may co-exist. This is not a debate that needs to polarise and finish at one end point. It is up to research communities to decide what works for them.

The main issues up for discussion are how to move a tentative piece of work into the corpus of accepted literature, and how preprints fit into the research cycle. Here are four possible scenarios:

1. The status quo

Here, journals stay as the guardians of accepted research and operate as currently. Preprints are a tool for early announcement that bypasses the often slow review and editorial process, and allows researchers to get early feedback on their work. This is more or less how physics has worked for a long time. In fact, I have heard it said that there is no great need for open access physics journals as everything is on arXiv.

This is the likely scenario for the immediate future as it doesn’t rock the boat. Researchers are generally conservative about changes to publishing and uptake of preprints in new disciplines is likely to be slow.

The status of preprints here is somewhat below journal articles. They are a kind of untested grey matter. This scenario places a great deal of faith in the efficacy of peer review and editorial decision-making, which has been much criticized. It seems a missed opportunity if preprints do not contribute to assessing research at least to some extent.

2. Preprints disappear

This is a scenario that I don’t really want to think about, but it is a possibility. If there is a lack of widespread uptake of preprints and they are not recognised as valuable by funding bodies or in research assessment, they will become a burden to researchers and will gradually disappear: no new preprints will be added. The current signs point against this scenario, but only in certain fields. It is possible that preprints will never gain traction in other fields. There may be arguments against preprints, for example where clinical recommendations or patent applications are involved.

The way to avoid this endpoint involves some lobbying of influential organisations, as well as reaching a critical mass of preprint use. Tangible benefits for moderate effort need to be demonstrated.

3. Overlay journals

An overlay journal is one that directly publishes preprints. The editorial process takes place once the preprint is online.

In this scenario, the lines between preprints and journal articles become blurred. There is an editorial process, but it focuses on assessing the preprint directly, not a separately submitted piece of work. F1000 are essentially running this system, and Tim Gowers has set up the journal Discrete Analysis as an overlay on arXiv.

This approach has the potential to dramatically reduce the cost of publishing, and bring transparency to the process and more control to authors. It also has the potential for established publishers to take control of the preprint process and tie authors into their platform. Which preprint server to post on could become almost as agonizing as which journal to submit to. I suspect there will also be unintended consequences when preprints start to be used to assess individual performance and metrics are applied.

An advantage of this approach, though, is that it puts preprints more concretely into the research cycle. This would be a good strategy to promote for supporters of open science.

4. No journals

For those opposed to the role of publishers as arbiters and gatekeepers of research dissemination, the idea of getting rid of journals altogether is quite appealing. In this scenario, preprints are published and readers can make up their own minds whether they are any good. Preprints replace journal articles. With flexibility to update preprints at any time, research should become more self-correcting than it is currently. Most researchers already rely on search engines and related algorithms to find work for them, so tagging an article as belonging to a specific journal is obsolete.

On the other hand, how does someone new to the field rate articles in this scenario? Some papers will get a lot of attention, whereas a great deal of incorrect, uninteresting research will be left untouched, unread and wrong. At least with journals every article goes through a checking process, even if it has some flaws.

To conclude, each of these scenarios has strengths and weaknesses and, as I said above, there is no one-size-fits-all solution. The main question to ask is whether they strengthen research output and lead to effective, creative work.

Reflections from the Open Science Conference 2017

I spent the first part of this week at the Open Science Conference in Berlin. As with many conferences, it was a great opportunity to step back and take a different view of things away from the frenetic everyday tasks. I met a lot of interesting people and came away with more questions and ideas than answers. One of the things on my mind, of course, was preprints and how they fit into the current view of open science.

The overall impression I came away with was that open science is at a stage where no-one is quite sure what it is, but they think it’s a good idea. Indeed, the first question of the feedback questionnaire asked for a definition of open science. At the same time, with the assumption that open science and its constituent elements are a good thing, no-one is making a clear case for this and there was little that would persuade sceptics.

For better or worse, the organisers majored on a few aspects of open science: open data was a strong theme, open education resources had a lot of air time and I attended a session on open peer review. My favourite session was the one on alternative metrics, which showed that the discussion in this area is moving on to the identification of more useful metrics.

So how did preprints fare? Actually, not so well. Preprints feature in the open science monitor from the EU commission, and Niko Kriegeskorte’s method of not reviewing unless a preprint exists was mentioned. There were also comments around the need to make research available as soon as possible and the usual gripes about slow publishers and citation metric addiction.

In conversations, also, there was general support for preprints but without a great deal of enthusiasm, and scepticism that new fields will take up preprinting on a regular basis just yet. I don’t have a good answer to why this is. Maybe preprints are simply not on the agenda. In Europe there are few advocates for preprints: just search for #preprint on Twitter and you will see that most of the activity takes place when America is online. Perhaps it is because most resources seem to be pointed at open data, which is a huge and important challenge. It could also be that it is seen as an area that is established in some disciplines and growth will take care of itself.

Preprints seem to offer a relatively cheap and simple way of furthering the cause of open science. Estimates of the cost of putting a preprint online are in the range of 10 Euros (or USD) per paper, compared to 5000 US dollars for journal publications. Preprints also offer:

  • Rapid access to research, months or sometimes years in advance;
  • Transparency, with open review possible in a context where reviewers simply advise the authors without the judgment of an editorial decision looming in the background;
  • Papers that are free to read and often fully open access. This can help to circumvent paywalls and copyright transfer issues;
  • Putting control of reporting research into the hands of the researchers producing the work.

How can preprints start to sell themselves better, even within the conversation about open science? First, we need to make the case for the benefits of preprints and lower the barriers to participation as far as possible. Second, there need to be better links to more established and discussed areas of open science, especially integration with data publishing. Those operating preprint platforms need to look at which standards and technologies can be integrated to link with other parts of the open science agenda.

I think the previous paragraph applies equally to how open science needs to embed itself into current research practice. To make open science and preprints the norm, there needs to be a convincing case that some extra effort brings tangible benefits. I’m hoping, at least, that at the next edition of the conference there will be a session dedicated to preprints.

What is a preprint?

With the rise of new preprint servers, and especially multiple offerings in the same discipline, some effort should be put into thinking about what it is that makes a preprint a preprint. This post is my take on the issue.

A preprint is about three aspects: content, availability and timing.


Content

The preprints we are concerned with are additions to the research literature. To state the obvious, any work following the scientific method should qualify. The question then is how widely the net should be thrown to include other article types. It should be uncontroversial to include research articles, reviews and essays, which form the backbone of output in science and the humanities. However, the literature includes a lot more: editorials, opinions, comments and so on. In addition, the concept of micropublication has been suggested, i.e. publishing a single part of a traditional paper, such as only the methods, results or discussion. With the current publishing paradigm, one could suggest that anything that could be published in a journal could be made a preprint, but this is unsatisfactory as the role of journals might unexpectedly change and journals have individual policies. It also excludes work at a more preliminary stage. I think it is useful to split the literature into 1) research: hypothesis-driven investigation and 2) grey literature: informed conversation about research (written by researchers). Both could be considered for preprints, but in practice the difference should be highlighted via an assigned article type.


Availability

A preprint should be available to anyone. I would qualify this by saying that open access is desirable but not necessary: a basic definition of a preprint should permit a broad range of copyright and licensing criteria. Would it be acceptable to paywall a preprint? I would argue strongly against this option and it seems quite pointless, but I don’t think it should disqualify something from being classed as a preprint.


Timing

Preprints are about reporting work at the earliest possible stage. The “pre” in the name is there because they come before validation by the research community.

The primary mode of validation currently is peer review and journal publication, but the definition shouldn’t be restrictive. New processes of confirming results could emerge in the future and should be connected to preprints. Grey literature is usually not peer reviewed, so editorial review and publication are sufficient to count as validation.

Should postprints or accepted versions of papers be mixed in with preprints? The difference between a preprint and a postprint should be about when the first version is put online. In practice, it is acceptable to update a preprint with a new version, including a peer-reviewed one. The lack of a fixed end point is, to my mind, a strength of preprints and should be permitted within a working definition. As outlined below, there are situations where it is critical to know if something has been peer reviewed, so there should be a differentiation between the two.


In summary, I would define a preprint as a piece of research made publicly available before it has been validated by the research community.

Not to be confused with the above, and a topic for a future post, is the question of what a preprint server is. Not all preprints appear on a preprint server and not everything that appears on a preprint server is necessarily a preprint.

Does it matter?

A number of preprint servers don’t publish preprints strictly according to this definition, for example by allowing publication of an abstract without full text, or permitting uploads of post-prints or accepted versions. I don’t think most scholars care a great deal about this, but it is important in some circumstances. For preprint aggregators, funders, journalists, medical practitioners and in research assessments it is much more important to know what has not been peer-reviewed and whether something is simply an opinion as opposed to reporting research outcomes. For this reason, the distinctions between research and grey literature, and between preliminary and reviewed work, should be made clear.

In the spirit of preprints, I would be interested in feedback on the definition and how it can be improved. Please comment or get in touch by some other means.

A List of Preprint Servers

Last week I put a rough version of the list of preprint platforms live, responding to a request on Twitter from Jessica Polka. I’ve now filled in most of the gaps and put it into a Google sheet, which seems the best way to display the information at present. In the future I aim to use something more fancy that will span the page and can be filtered and sorted.

I hope it will be a useful resource to authors considering options for where to place their preprint and anyone interested in an overview of the state of preprints.

Putting the list together was an interesting exercise and revealing in several respects. Here are a few observations that I made.

Firstly, there are not that many preprint servers: the list runs to 19 at the moment. More than half of those listed (including OSF-based servers individually) started in the last year. When you compare it to the number of journals it is minuscule, even in disciplines where preprints have played a large role for many years. I intend to exclude institutional repositories, of which I suspect there will be a great many that post preprints. There are already lists of them elsewhere and authorship is limited to those affiliated with the institution.

A major lack in most preprint servers is long-term archiving. Excepting those based at CERN, I only found one with a statement about archiving on their website (CORE from MLA Commons). This should be a high priority for those operating preprint platforms, but there appear to be few clear solutions at present.

Also lacking is a business model that does not rely on backing by one or a handful of bodies. SSRN uses a model where institutions or readers pay for extra services. Authorea charges for use of their platform (although there is a free option). Funding from a larger organisation is fine, as long as institutions are willing to pay in the long term, but it relies to some extent on good will and some servers will likely look at alternative models in the coming years.

The background to preprint servers is varied, arising from libraries, publishers, societies, author services etc. Each puts an emphasis on different aspects and the rigour in submission checks, licensing information, information for authors, inclusion of non-preprint material and so on varies. In my experience, most authors don’t particularly pay attention to these aspects, but they may play a role in integrating preprints more formally into research evaluation. Funders and universities do care more about the details. A discussion on basic requirements for preprints and an interest in whether a consensus can be achieved was one of the motivating factors for my setting up Research Preprints.

Finally, the pervasiveness of the PDF is evident. The convenience for publication and human reading wins hands down. At the moment other formats don’t get a look in, which is fine in the short term. In the longer term, this could pose a significant challenge for text and data mining, especially when format is so varied.

Making Preprints Popular

One of the points I labelled as critical for preprints in my previous post was that they should be quickly adopted by various disciplines. Putting an e-print on arXiv is normal in a number of disciplines. On the other hand, while increasing numbers of preprints are being made available in other disciplines, for example biology, they remain a small fraction of the overall number of papers published in the field.

Advocates of preprints should aim for the numbers to expand quickly. If use of preprints is not rapidly normalized and they are ignored by the majority of researchers it will become ever harder to drive continued interest. What strategies could advocates of preprints use and what are the end goals? This post focuses mainly on the former. There are, of course, different options. Here are a few strategies that I think are viable.

Option 1: Field-by-field stakeholder adoption

Resources can be focused on one field, be it biology, engineering, physical chemistry etc. Widespread adoption can be achieved via buy-in from a relatively small number of important stakeholders. On the other hand, resistance from just one of these groups could cause doubt and confusion and stall the process. The strategy here, which ASAPbio seems to have followed, is rapid take-up in a short space of time and to incorporate preprints into the research infrastructure through acceptance by funders, publishers and institutions. This strategy can be thought of as a top-down approach where the involvement of organizations is key to persuading researchers of the acceptability of preprints.

Option 2: Broad adoption

In this strategy, an increasing minority of researchers from multiple disciplines start posting preprints. This is really a bottom-up approach, with change driven by the habits of individual users. It is immediately less disruptive than option 1, but over time institutions will need to find a way to incorporate the needs of those making use of preprints, particularly when it comes to assessing impact and citations. There is a risk for those who use preprints if the pace of adaptation is too slow, however, as their efforts to preprint would not be recognised. They could end up with a significant chunk of their work being discounted.

Option 3: Bring out the big guns

A few highly influential individuals and/or institutions showcase the use of preprints or make use of them and demonstrate the benefits. Rather than a long-term strategy, this is a kick-starter to get others involved and get preprints on the agenda. It’s a big carrot for others to look at and follow their lead.

The reality is that a combination of options is likely to be followed. I’d be interested to hear from early adopters of preprints in physics as to what the main drivers were.

These strategies could plausibly apply for a number of new ideas, but what issues are particular to preprints?

First, preprints have a base to start from and examples to follow. They have proven influential in several disciplines and new fields should learn from arXiv, SSRN and others.

Second is the question of how disruptive preprints will be to established systems. Currently they work alongside the normal publishing process. There are future scenarios where preprints become so important that journals are less vital than today, or even irrelevant. Is that possible or desirable? Could preprints even enhance and add value to journal publications? Are there any unintended consequences of preprint/journal interactions?

There are many who see the current preprints boom as a positive step, but it is worth considering what comes after the first step and where the destination should be. I suspect there are differing views and I would very much welcome them in comments below.

A Shopping List for Preprints

I started writing the first follow-up post to the introduction for Research Preprints a few days ago, before ASAPbio put out its call for running a central preprint service for the life sciences. My original intention was to list the things I think most urgently need to be addressed regarding preprints in the next five years. Here’s what I came up with, in no particular order:

  1. What is a preprint?
  2. Business models.
  3. Data and formats.
  4. Increase in the use of preprints and resolution of the citation dilemma.

Since the release of the ASAPbio call, I have become interested in how this list matches up with what they propose. First, a few words about the proposal. For those who are not familiar, ASAPbio advocates for wider use of preprints in the life sciences, and they have made an effort to engage many interested parties, including funders. One of their primary aims for some time has been to aggregate all life science preprints into one place online. In response to the recent announcement, there seemed to be confusion as to whether the proposed ‘Central Service’ should be a new preprint server or an aggregator. My reading is that the latter is the primary aim, but there is nothing to stop a successful proposal providing a mechanism to directly accept preprints from authors.

Going back to the list:

The first point is an existential question, and I was curious that the call from ASAPbio didn’t touch on it at all. Maybe my mathematics training from way back when makes me seek a definition of everything. A definition of what constitutes a preprint seems rather important, especially when one wishes to aggregate preprints from various places: how to choose what to include and what to exclude. On the other hand, I’ve only seldom heard discussions of what does or should constitute a journal article. It seems that there’s a consensus that everyone sort of knows and that different publishers have different standards. That the same is true of preprints is perhaps no surprise; however, the open access movement has often been criticised for having too many ill-defined terms and it would be a shame for preprints to go the same way.

Moving on to business models, there is a contrast between probably the two most well-known preprint servers until now. arXiv is funded solely by supporting institutions as a charity. SSRN, on the other hand, ran for many years as a stand-alone enterprise, covering its costs by selling subscriptions, download fees, job advertisements, conference fees and so on. Last year SSRN was bought by Elsevier, a company that wouldn’t make an investment unless it was worthwhile, so there must be a commercial interest. Many baulk at the idea of preprints being in the hands of a profit-driven enterprise. At the same time, if there is potential revenue to be made from preprints, a sustainable, scalable model would be desirable. This is a point included as an aim in ASAPbio’s call: funding would be provided for five years, after which the service would be expected to find other revenue sources and become sustainable.

Data and formats feature surprisingly strongly in the ASAPbio call. XML conversion of preprints is a central requirement. This is not a technical formality and could put off a number of potentially interested parties, especially when considering the range of formats permitted by preprint servers. It calls for an innovative solution and some clever programming. Time will tell whether it can be delivered. It remains important, though. With the increasing volume of research output, researchers rely more on machine-based searches to deliver content to them and XML is clearly far superior to randomly formatted PDFs when it comes to text and data mining and for use as input to discovery tools.

Finally, how rapidly will preprints evolve? The citation dilemma I refer to above could become a key stumbling block for acceptance of preprints. Academic performance is measured by citations. If these are split between different versions of a preprint and a published version of the same paper, researchers could suffer, especially if preprints are not considered primary research material. In my view there needs to be a rapid expansion, acceptance and normalization of preprints. In other words, preprints need to find their niche and be recognized by funding bodies and promotion boards in addition to being used and recognized by a majority of researchers. Mathematics and physics, with long-time use of arXiv, probably have a contribution to make in this area. It remains to be seen whether other fields will benefit from their experience.

Gazing into a crystal ball is never an easy task, but if those involved with preprints can address the issues above there is a good chance that they will become a first stop for a great deal of research.

Welcome to Research Preprints

Welcome to Research Preprints. The aim of this site is to be a collaborative space to discuss ideas about how preprints can contribute to the dissemination of research and be integrated into current practices.

And it needs your input.

I am looking for contributors. The platform should present views from researchers, those running preprint servers, publishers, librarians, indexing services, members of industries and services that rely on scholarly output and anyone else who feels they are a stakeholder. One-off and regular contributors are welcome. If you have a proposal for a post, please get in touch via the contact page.

I also invite you to comment on posts. Discussion should focus on the issues at hand and be constructive. Differing points of view are encouraged, but should be expressed with respect. Almost all participants in the discussion have the same goal: creation and dissemination of high quality research by the best possible means.

For those running preprint servers or aggregators, there is a page of this site dedicated to listing these services. Please provide a short description (up to 150 words).

The idea of putting research papers online before peer review is not new; however, the last 12 months have seen a growth in interest from fields outside its traditional strongholds of maths, physics, business and economics.

Expansion in the adoption of preprints raises questions, though: What is a preprint, anyway? What kind of status should preprints have in relation to the peer-reviewed literature? Are preprints a stepping stone on the way to a peer-reviewed article or could they become a new way of assessing and testing research? How sceptical should a reader be of what they find in a preprint? What are the risks associated with making preliminary work widely available?

There are questions about implementation: Should there be minimum criteria for a preprint server? How can preprints be run sustainably and could they be financially independent? Is there a business model for preprints, and should there be?

There are also issues around acceptance: How do preprints fit into the research ecosystem? What benefits can authors take from preprints and do they increase or decrease the quality of reported research?

These are just a few questions this platform has been created to address and I am looking forward to finding out some of the answers.