A Shopping List for Preprints

I started writing the first follow-up post to the introduction for Research Preprints a few days ago, before ASAPbio put out its call for running a central preprint service for the life sciences. My original intention was to list the things I think most urgently need to be addressed regarding preprints in the next five years. Here’s what I came up with, in no particular order:

  1. What is a preprint?
  2. Business models.
  3. Data and formats.
  4. Increase in the use of preprints and resolution of the citation dilemma.

Since the release of the ASAPbio call, I have become interested in how this list matches up with what they propose. First, a few words about the proposal. For those who are not familiar with it, ASAPbio advocates for wider use of preprints in the life sciences, and it has made an effort to engage many interested parties, including funders. One of its primary aims for some time has been to aggregate all life science preprints in one place online. In response to the recent announcement, there seemed to be confusion as to whether the proposed ‘Central Service’ should be a new preprint server or an aggregator. My reading is that the latter is the primary aim, but there is nothing to stop a successful proposal from also providing a mechanism to accept preprints directly from authors.

Going back to the list:

The first point is an existential question, and I was surprised that the call from ASAPbio didn’t touch on it at all. Maybe my mathematics training from way back when makes me seek a definition for everything. A definition of what constitutes a preprint seems rather important, especially when one wishes to aggregate preprints from various places: it determines what to include and what to exclude. On the other hand, I’ve seldom heard discussions of what does or should constitute a journal article. There seems to be a consensus that everyone sort of knows, and that different publishers have different standards. That the same is true of preprints is perhaps no surprise; however, the open access movement has often been criticised for having too many ill-defined terms, and it would be a shame for preprints to go the same way.

Moving on to business models, there is a contrast between what are probably the two best-known preprint servers to date. arXiv is funded solely by supporting institutions, as a charity. SSRN, on the other hand, ran for many years as a stand-alone enterprise, covering its costs through subscriptions, download fees, job advertisements, conference fees and so on. Last year SSRN was bought by Elsevier, a company that wouldn’t make an investment unless it was worthwhile, so there must be a commercial interest. Many baulk at the idea of preprints being in the hands of a profit-driven enterprise. At the same time, whether or not revenue comes from preprints themselves, a sustainable, scalable model is desirable. Sustainability is one of the aims in ASAPbio’s call: funding would be provided for five years, after which the service would be expected to find other revenue sources and become self-sustaining.

Data and formats feature surprisingly strongly in the ASAPbio call. XML conversion of preprints is a central requirement. This is no mere technical formality, and it could put off a number of potentially interested parties, especially given the range of formats that preprint servers accept. It calls for an innovative solution and some clever programming, and time will tell whether it can be delivered. It remains important, though. With the increasing volume of research output, researchers rely more and more on machine-based searches to deliver content to them, and XML is far superior to inconsistently formatted PDFs for text and data mining and as input to discovery tools.

Finally, how rapidly will the use of preprints grow? The citation dilemma I referred to above could become a key stumbling block for the acceptance of preprints. Academic performance is measured by citations. If these are split between different versions of a preprint and the published version of the same paper, researchers could suffer, especially if preprints are not considered primary research material. In my view, there needs to be a rapid expansion, acceptance and normalization of preprints. In other words, preprints need to find their niche and be recognized by funding bodies and promotion boards, as well as being used and recognized by a majority of researchers. Mathematics and physics, with their long-time use of arXiv, probably have a contribution to make in this area. It remains to be seen whether other fields will benefit from their experience.

Gazing into a crystal ball is never an easy task, but if those involved with preprints can address the issues above, there is a good chance that preprints will become the first stop for a great deal of research.