October 2, 2017 mrittman
Preprint servers come in a variety of shades. Some have argued that this shouldn’t be the case and that one size should fit all, but in my view a variety of offerings is an advantage and not a hindrance. After all, I have never heard a claim that there should be only one journal for every paper published and the same should apply to preprints. At the same time, with such an array of choices, how do you tell what any given preprint server offers? This is a large part of my motivation for setting up this website and blog. Authors and readers need good information about what preprints are and what they offer.
There have been discussions around setting minimum standards for preprint servers, such as from ASAPbio. Initially, it sounds like a good idea: set a standard that everyone has to follow and then you know what you get. The big drawback is that it stifles creativity and innovation, and would be expensive to police. Not all preprint servers look like one: take Zenodo, for instance, or FigShare: their primary purpose is to share data, not papers, but they also host a large number of unpublished research papers that are considered preprints. Overleaf and Authorea are online writing tools, but host public drafts of research papers that can also be considered preprints. If requirements for preprint servers came in that were too stringent, either these platforms would need to make modifications or they would fall outside of the system. Small preprint servers and university repositories could find the cost of compliance prohibitive, and authors might find themselves having to post the same work in several places. There is still tremendous scope for innovation in how research is communicated beyond the PDF article, and any set of well-intentioned guidelines that assumes the status quo will hold back progress.
Then there is the question of who is going to implement any guidelines. This could quickly become a very complex matter. It is likely to be a field-specific body, so preprint servers that operate in more than one field could find that they need to comply with multiple guidelines. In addition, where will funding come from to verify that each preprint server complies, and who will handle complaints? This sounds like an expensive and error-prone process, or one that is ineffective.
What is the alternative? First, there is the view that we are just taking this too seriously. After all, preprints are unvalidated work that is posted online at the authors’ whim. This is problematic once you start to see preprints as part of the wider research infrastructure. For example, funders want guarantees about preprints if they are going to accept them in grant applications, as many preprint advocates wish and some funders have started to do. For preprints to be part of the trend towards open science, they need methods to be included and data to be made available.
I suggest something based on the journal initiative “How open is it?”. Under that scheme there is no black-and-white judgement of what is or isn’t an open journal; instead, it recognises that certain characteristics contribute to openness and measures journals against them. The preprint context doesn’t translate exactly, in that there is no single goal to aim for (i.e. open/closed). However, we can identify characteristics of preprints that hold a strong interest for certain stakeholders, such as whether there is a screening process or whether certain file types, such as XML, are available. Are there any drawbacks to such a system? Of course: if the bar is set too high by an author’s funder, university or publisher, they could end up with limited or contradictory options. Stakeholders should be encouraged to determine the absolute minimum they require.
Here are my suggestions for what could go into a “How preprint is it?” guide:
Licensing
- Open license that permits unlimited sharing and reuse (CC BY or equivalent)
- Allows licenses that are open with limitations
- Allows licenses that are closed/read only, or permits embargoes
- Only offers closed/read-only licenses, may also allow embargoes
- Articles are pay-walled
Archiving
- Has a robust long-term archiving strategy based on multiple copies
- Articles are archived or mirrored external to the preprint server
- Permits articles to be archived elsewhere
- No archiving policy, or archiving not permitted
Submission and publication formats
- Submission and publication in any format permitted
- Submission and publication from several defined formats
- Publication only in one format (e.g. PDF)
- Submission and publication only in PDF format
Screening of new submissions
- Has a transparent, clearly defined screening process, including content quality
- Carries out content quality screening on all submissions, but criteria are not clear
- Screens only some articles for content quality
- Basic checks only, e.g. author affiliation or formatting
- No screening takes place
Machine readability
- All papers have a machine-readable version in a standard format
- Most papers have a machine-readable version, but formats/schema vary
- Some papers have machine-readable versions
- No papers have machine-readable versions
Governance
- Community-led and owned project
- Run by a non-profit organisation with community consultation
- Run by a for-profit organisation with community consultation
- Run entirely by a non-profit entity
- Run entirely by a for-profit entity
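As an illustration of how such a guide might work in practice, here is a minimal sketch in Python that treats each category as an ordered list of tiers and maps a server’s stated practices onto tier indices. The category names, tier wording and the example server profile are my own assumptions for illustration, not part of any agreed standard:

```python
# A hypothetical "How preprint is it?" rubric. Each category lists its
# tiers from most to least open, so a lower index means a more open
# practice. Category names and tier wording are illustrative only.
RUBRIC = {
    "licensing": [
        "CC BY or equivalent",
        "open with limitations",
        "closed/read-only or embargoes permitted",
        "closed/read-only licenses only",
        "pay-walled",
    ],
    "archiving": [
        "robust long-term strategy with multiple copies",
        "archived or mirrored externally",
        "archiving elsewhere permitted",
        "no policy, or archiving not permitted",
    ],
    "governance": [
        "community-led and owned",
        "non-profit with community consultation",
        "for-profit with community consultation",
        "entirely non-profit",
        "entirely for-profit",
    ],
}

def score(profile):
    """Map a server's stated practices to a tier index per category."""
    return {cat: RUBRIC[cat].index(tier) for cat, tier in profile.items()}

# A purely invented server profile, for illustration only.
example_server = {
    "licensing": "open with limitations",
    "archiving": "archived or mirrored externally",
    "governance": "non-profit with community consultation",
}
print(score(example_server))  # {'licensing': 1, 'archiving': 1, 'governance': 1}
```

The point of the ordered tiers is that stakeholders could state their minimum acceptable index per category, rather than a single pass/fail standard.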
September 4, 2017 mrittman
A few weeks ago there was some discussion in the Twittersphere about the ethical side of preprints. The focus was on what should happen when a preprint is found to be flawed and whether the same kind of retraction process as for articles can and should be applied.
I was also recently invited to discuss the ethics of preprints at a meeting of the ISMTE along with Jennifer Lin from Crossref. I promised on Twitter that I would share the slides, but in the event it was a Q&A session, so you get this blog post instead.
What’s the fuss about preprint ethics?
Are preprint ethics just a discussion of ‘what-if’ scenarios or hypothetical musings? Not really. Preprints without methods sections have been published, and quickly updated. An author recently forwarded me an email about a preprint where a journal had rejected their article after peer review and production because it had appeared as a preprint: a frustrating and entirely avoidable scenario. I also see other requests to withdraw preprints for a variety of reasons, mainly from authors but occasionally also from third parties.
An approach to preprint ethics
To get a handle on preprint ethics, I find it useful to put issues into a few categories (this isn’t intended to be an exhaustive list):
1. Issues that are the same for preprints and journal articles: data manipulation, authorship, plagiarism (and self-plagiarism), copyright infringement, salami slicing, excessive self-citation, citation cartels…
2. Issues that are (more) specific to preprints: incomplete methods or data, multiple posting of preprints.
3. Issues specific to preprint servers: minimum standards of content/screening procedures, who should handle complaints, lack of universal ethical standards, limited resources to investigate complaints.
4. Issues where preprints meet articles: how to handle different versions, directing readers to the most reliable version, first reporting of results, citation of preprints in journal articles (how and when), copyright ownership and licensing.
There are just a couple of specific issues I’d like to briefly address here. I will likely cover some others in future posts: there is a lot to discuss and currently little consensus.
Removal of preprints once they are online is an area where significant questions remain. Should they be removed at all? After all, they come with no guarantee of quality. Also, once online they are immediately picked up by various search engines and downloaded: removing them doesn’t make them go away.
In terms of removing preprints for quality issues, in my view it comes down to how authoritative a preprint is seen to be: a result of its standing in the community. It’s pretty easy to argue that misleading works considered to have some degree of authority should be removed. As preprints are in the grey area between discovery and validated knowledge, this is open to interpretation and likely to vary from discipline to discipline. Another argument is that, unlike journal articles, preprints can be rapidly updated or modified by the authors, so many identified problems can be fixed with a new version. Allowing comments on preprints can help in cases where the authors don’t wish to update: the reader can read the preprint and the comments and draw their own conclusions.
Once a preprint has been removed, there is the question of how it is done. Should it simply disappear? Crossref advises against this for preprints with a DOI. If preprints are meant to be citable, then there should at least be a statement saying what the preprint was and that it has been removed. Should a reason be given for removal? I don’t know of a preprint server that does so. I suspect the reasons vary: a belief that it isn’t necessary for tentative work, an unwillingness to put out statements that could be questioned, or a wish to protect the authors on whom they rely for submissions.
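To make the idea of a removal statement concrete, a withdrawn preprint could leave behind a small “tombstone” record rather than vanishing entirely. The sketch below is my own illustration; the field names and the DOI are invented, not taken from Crossref or any real metadata schema:

```python
# A hypothetical tombstone record for a withdrawn preprint. The field
# names and DOI below are invented for illustration, not a real schema.
tombstone = {
    "doi": "10.0000/example.preprint.12345",
    "title": "Original preprint title",
    "status": "withdrawn",
    "withdrawn_date": "2017-09-04",
    "statement": "This preprint has been withdrawn by the authors.",
    # Whether to state a reason is the open question discussed above.
    "reason": None,
}

def landing_page_text(record):
    """Render the minimal text a resolver could show instead of a dead link."""
    text = (
        f'"{record["title"]}" ({record["doi"]}) was {record["status"]} '
        f'on {record["withdrawn_date"]}. {record["statement"]}'
    )
    if record["reason"]:
        text += f' Reason: {record["reason"]}'
    return text

print(landing_page_text(tombstone))
```

Keeping the DOI resolvable to such a record preserves citability even after the full text is gone.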
How, when and whether preprints should be cited has been covered elsewhere (one excellent post here). I just want to make the point that citations are a double-edged sword. Someone involved in the early stages of arXiv recalled to me that citations were one of the reasons that physicists started to use the service: extracting the bibliography and putting it on a forerunner of INSPIRE meant that by using a preprint service, physicists could improve their profile in the field. This, like most research reward systems, leads both to increased use and the possibility of gaming. The cross-subject equivalent is Google Scholar, and I have seen preprints submitted with substantial numbers of self-citations which appear to be an effort to improve Google Scholar citation counts. If the research itself is good, should excessive self-citation be enough to reject a preprint? Where to set the bar is currently far from clear, with individual preprint servers or repositories left to set their own policy.
There is a need for further discussion about some of the issues above. Some of it needs to be had between those running preprint servers and some within author communities, perhaps led by societies or funders. Publishers should expect authors to declare preprint versions of their article at submission and clearly note their policy for accepting preprints. I don’t think this is the last you will hear about these issues.
June 15, 2017 mrittman
In recent times there has been a proliferation of preprint servers and a much larger uptake on the part of authors. Few objections have been raised to this trend, and I wonder whether this is because almost everyone in the scholarly bubble sees preprints as fitting into their own world view: traditionalists see no threat to journals and regard preprints as supportive of the editorial process. Those seeking change see the potential for preprints to replace journals, or at least greatly alter the status quo.
In this post I want to spell out some of the possible future scenarios. The most likely future, of course, is that no single vision will dominate: different disciplines will come to their own conclusions and it should be up to research communities to decide.
The main issue in looking at different scenarios is how one moves a piece of research from the tentative/draft phase into the corpus of accepted literature, or whether such a distinction is even necessary.
Here are four possible scenarios for preprints in the future.
The status quo
Here, journals stay as the guardians of accepted research and operate as they do currently. Preprints are a tool for getting early feedback and making some results known ahead of time but are viewed as very much inferior to journal articles and not considered essential to the publication process.
This situation seems to prevail for physicists, even though they regularly post to arXiv: journal publication after peer review is still very important, especially when it comes to promotion and funding. Despite recent moves for acceptance of preprints in grant applications, e.g. by the NIH, there haven’t been similar announcements about job applications or promotion. The attitudes of universities and other research institutions is probably critical for moving away from the status quo or maintaining things as they are.
Preprints plus journals
In this scenario journals continue as they are but preprints are also recognised as first class research objects. Preprints can be cited, used in hiring and promotion decisions, and grant applications. They are read with a healthy amount of scepticism but cited where appropriate and fix the first reporting of research results (i.e., they provide scoop protection).
From what I can gather, this is the aim of ASAPbio. Gaining recognition from a broad range of institutions is a key element here and the major difference between this and the previous scenario. For publishers, journal publication goes on as usual, but it could lead them to modify how they solicit articles and give options for slower, more thorough peer review (see, e.g. this post from Kent Anderson at the Scholarly Kitchen).
Preprints don’t catch on
It is not a scenario I hope for, but it is possible that in some fields preprints will never catch on. Either not enough is communicated about their benefits, or there may be specific areas with a specific objection. A field (although I struggle to think of one) where speed of publication is not important stands to gain less from wider use of preprints. Some may argue that fields where individual papers can have very large impact may require greater validation before making research public. Having to put the lid on scandals like the vaccine-autism link controversy on a regular basis could cause headaches for scientists. There are good reasons I don’t think it would play out like that, but I’ll leave them for another discussion.
The main way to avoid this scenario is clear articulation of benefits and proper engagement with reasonable objections to preprints. So far, the discussion I’ve seen has been mostly high quality and polite, long may it remain that way!
Preprints replace journal peer review
A compelling argument is the idea that peer review is costly and inefficient, and that preprints offer a low-cost and effective alternative. Some kind of validation, such as community-organised peer review, can take place after a preprint has been put online. This has already happened (e.g. Tim Gowers’ Discrete Analysis, and Andrew Gelman has proposed a super-arXiv overlay journal).
It is, of course, unlikely that publishers and editors would readily agree to such a move, and a significant cultural shift needs to take place for this eventuality to prevail, even within a single field. The main objection from publishers is about protecting income: a shift from the $5000 per article average income now to something around $100 would put almost every publisher out of business based on current practices – both for-profit and non-profit. However, some fields with limited funding and where fee-based open access is viewed with scepticism may find this a very attractive proposal.
A step further questions the necessity of journals at all. Why does the opinion of two or three reviewers and one editor provide a solid validation of research? Peer review has been frequently questioned, but this option would require a new way to validate and test results. I am not aware of any serious proposals in this direction, but I think it’s a space well worth watching in the coming few years.
Which is the best of these scenarios? Unlike the debate on open access, I don’t think this needs to become binary and polarized. Simply posting a preprint does not favour any of the above: the main differences are about how the rest of the research ecosystem values and validates a preprint. It is certainly possible that multiple scenarios can exist side-by-side. It is an exciting time for preprints and I hope that discussing these kinds of options moves higher up the agenda of decision-makers in scholarly research communication.
May 19, 2017 mrittman
One of the things that most surprised me when putting together the list of preprint servers for this site is that a large number don’t explicitly list any licensing or copyright information, and some routinely use very restrictive licenses. Coming from a publishing background, this was very surprising.
Subscription publishing relies on content ownership and enforcement of strict copyright conditions: If the publisher doesn’t own the copyright, the articles could be distributed by anyone. The open access movement turned this logic on its head. Someone (often the authors or their funder) pays for value added services including peer review, copy editing, hosting and distribution, but anyone is allowed to distribute the final article. The copyright and licensing terms are among the most significant features distinguishing open access and legacy publishers.
Licensing for open access is very important. Most open access publishers have gravitated towards the creative commons licenses, and in particular CC BY. Although there are differing views, simply being free to read is generally accepted as insufficient for open access. It also requires rights for reuse, in whole or in part. This means that for the strictest definitions of open access, not even all creative commons licenses are sufficient.
The preprint paradigm developed before open access and at the very beginning of the internet, when licensing conditions were not such a contentious area. As a result, many preprints were not, and still aren’t, open access compatible. If no terms are stated, as with a number of preprints I’ve seen, the default is an all rights reserved license, which means distribution and reuse are not permitted.
For authors, the lack of open access for preprints doesn’t matter much and I don’t believe many know or care a great deal about it. I have never received a request at preprints.org to use a license other than CC BY for a preprint, and only on about three occasions in the last four years for open access journals (over about 70,000 articles). Most authors I speak to like the ideals of open access, even if they have issues with how it works in practice. It is the rest of the research community (ironically including many of the same people) that stands to lose out. Data mining, especially, becomes a legal minefield if reuse rights are not clear. Use of figures in lectures, blogs and journal articles is problematic. Simply sharing a copy with a few colleagues could be illegal. The tragic recent case of Diego Gomez shows that this is not just a hypothetical argument. The goal for preprints to widely disseminate work is limited by the lack of a clear license. What’s more, if the increasing use of preprints has aspirations to be seen as part of the push for open science, the current haphazard approach to licensing isn’t going to work.
Some have justified offering a range of licenses on the grounds that it makes preprints more inclusive and that the time will come for moving forward on this issue. I think this overestimates the risks and underestimates the benefits of open access, and I am yet to see a timescale for the transition. Others seem to be unaware of the issue: I recently saw a definition declaring that all preprints are open access. To reach the full potential of preprints, they should be in step with open access and aim to be fully integrated into the growing calls for open and transparent science. At the very least, I would challenge those advocating for the use of preprints to decide which side of the fence they sit on.
May 4, 2017 mrittman
In recent times there has been a proliferation of preprint servers and much larger uptake by researchers, particularly in biology. Few objections have been raised to the concept of preprints, and I wonder whether this is because most see preprints as reinforcing their own position: traditionalists see them as no threat to journals, and as supportive of the editorial process by providing feedback ahead of submission. Those at the more radical end of the spectrum see the potential to overthrow the system and ask why we need journals any more.
In this post I want to lay out four possible future scenarios for preprints. Most likely, as currently, different disciplines will take different routes and all of the scenarios may co-exist. This is not a debate that needs to polarise and finish at one end point. It is up to research communities to decide what works for them.
The main issues up for discussion are how to move a tentative piece of work into the corpus of accepted literature, and how preprints fit into the research cycle. Here are four possible scenarios:
1. The status quo
Here, journals stay as the guardians of accepted research and operate as currently. Preprints are a tool for early announcement that bypasses the often slow review and editorial process, and allows researchers to get early feedback on their work. This is more or less how physics has worked for a long time. In fact, I have heard it said that there is no great need for open access physics journals as everything is on arXiv.
This is the likely scenario for the immediate future as it doesn’t rock the boat. Researchers are generally conservative about changes to publishing and uptake of preprints in new disciplines is likely to be slow.
The status of preprints here is somewhat below journal articles. They are a kind of untested grey literature. This scenario places a great deal of faith in the efficacy of peer review and editorial decision-making, which has been much criticized. It seems a missed opportunity if preprints do not contribute to assessing research at least to some extent.
2. Preprints disappear
This is a scenario that I don’t really want to think about, but is a possibility. If there is a lack of widespread uptake of preprints and they are not recognised as valuable by funding bodies or in research assessment, they will become a burden to researchers and will gradually disappear: no new preprints will be added. The current signs point against this scenario, but only in certain fields: it is possible that preprints will never gain traction in others. There may be arguments against preprints, for example where clinical recommendations or patent applications are involved.
The way to avoid this endpoint involves some lobbying of influential organisations, as well as reaching a critical mass of preprint use. Tangible benefits for moderate effort need to be demonstrated.
3. Overlay journals
An overlay journal is one that directly publishes preprints. The editorial process takes place once the preprint is online.
In this scenario, the lines between preprints and journal articles become blurred. There is an editorial process, but it focuses on assessing the preprint directly, not a separately submitted piece of work. F1000 are essentially running this system, and Tim Gowers has set up the journal Discrete Analysis as an overlay on arXiv.
This approach has the potential to dramatically reduce the cost of publishing, bring transparency to the process and give more control to authors. It also has the potential for established publishers to take control of the preprint process and tie authors into their platform. Which preprint server to post on could become almost as agonizing as which journal to submit to. I suspect there will also be unintended consequences when preprints start to be used to assess individual performance and metrics are applied.
An advantage of this approach, though, is that it puts preprints more concretely into the research cycle. This would be a good strategy to promote for supporters of open science.
4. No journals
For those opposed to the role of publishers as arbiters and gatekeepers of research dissemination, the idea of getting rid of journals altogether is quite appealing. In this scenario, preprints are published and readers can make their own mind up whether they are any good. Preprints replace journal articles. With flexibility to update preprints at any time, research should become more self-correcting than it is currently. Most researchers already rely on search engines and related algorithms to find work for them, so tagging an article as belonging to a specific journal is obsolete.
On the other hand, how does someone new to the field rate articles in this scenario? Some papers will get a lot of attention, whereas a great deal of incorrect or uninteresting research will be left untouched, unread and wrong. At least with journals every article goes through a checking process, even if it has some flaws.
To conclude, each of these scenarios has strengths and weaknesses and, as I said above, there is no one-size-fits-all solution. The main question to ask is whether they strengthen research output and lead to effective, creative work.
March 24, 2017 mrittman
I spent the first part of this week at the Open Science Conference in Berlin. As with many conferences, it was a great opportunity to step back and take a different view of things away from the frenetic everyday tasks. I met a lot of interesting people and came away with more questions and ideas than answers. One of the things on my mind, of course, was preprints and how they fit into the current view of open science.
The overall impression I came away with was that open science is at a stage where no-one is quite sure what it is, but they think it’s a good idea. Indeed, the first question of the feedback questionnaire asked for a definition of open science. At the same time, with the assumption that open science and its constituent elements are a good thing, no-one is making a clear case for this and there was little that would persuade sceptics.
For better or worse, the organisers majored on a few aspects of open science: open data was a strong theme, open education resources had a lot of air time and I attended a session on open peer review. My favourite session was the one on alternative metrics, which showed that the discussion in this area is moving on to the identification of more useful metrics.
So how did preprints fare? Actually, not so well. Preprints feature in the open science monitor from the EU commission (https://ec.europa.eu/research/openscience/index.cfm?pg=home&section=monitor) and Niko Kriegeskorte’s method of not reviewing unless a preprint exists was mentioned. There were also comments around the need to make research available as soon as possible and the usual gripes about slow publishers and citation metric addiction.
In conversations, also, there was general support for preprints but without a great deal of enthusiasm, and scepticism that new fields will take up preprinting on a regular basis just yet. I don’t have a good answer to why this is. Maybe preprints are simply not on the agenda. In Europe there are few advocates for preprints: just search for #preprint on Twitter and you will see that most of the activity takes place when America is online. Perhaps it is because most resources seem to be pointed at open data, which is a huge and important challenge. It could also be that it is seen as an area that is established in some disciplines and growth will take care of itself.
Preprints seem to offer a relatively cheap and simple way of furthering the cause of open science. Estimates of the cost of putting a preprint online are in the range of 10 euros (or US dollars) per paper, compared to around 5,000 US dollars for a journal publication. Preprints also offer:
- Rapid access to research, months or sometimes years in advance;
- Transparency, with open review possible in a context where reviewers simply advise the authors without the judgment of an editorial decision looming in the background;
- Papers that are free to read and often fully open access. This can help to circumvent paywalls and copyright transfer issues;
- Putting control of reporting research into the hands of the researchers producing the work.
How can preprints start to sell themselves better, even within the conversation about open science? First, we need to make the case for the benefits of preprints and keep the barriers to participation as low as possible. Second, there need to be better links to more established and discussed areas of open science, especially integration with data publishing. Those operating preprint platforms need to look at which standards and technologies can be integrated to link with other parts of the open science agenda.
I think the previous paragraph applies equally to how open science needs to embed itself into current research practice. To make open science and preprints the norm, there needs to be a convincing case that some extra effort brings tangible benefits. I’m hoping, at least, that at the next edition of the conference there will be a session dedicated to preprints.
March 16, 2017 mrittman
With the rise of new preprint servers, and especially multiple offerings in the same discipline, some effort should be put into thinking about what it is that makes a preprint a preprint. This post is my take on the issue.
A preprint is about three aspects: content, availability and timing.
The preprints we are concerned with are additions to the research literature. To state the obvious, any work following the scientific method should qualify. The question then is how widely should the net be thrown to include other article types? It should be uncontroversial to include research articles, reviews and essays which form the backbone of output in science and the humanities. However, the literature includes a lot more: editorials, opinions, comments and so on. In addition, the concept of micropublication has been suggested, i.e. publishing a single part of a traditional paper, such as only the methods, results or discussion. With the current publishing paradigm, one could suggest that anything that could be published in a journal could be made a preprint, but this is unsatisfactory, as the role of journals might unexpectedly change and journals have individual policies. It also excludes work at a more preliminary stage. I think it is useful to split the literature into 1) research: hypothesis-driven investigation and 2) grey literature: informed conversation about research (written by researchers). Both could be considered for preprints, but in practice the difference should be highlighted via an assigned article type.
A preprint should be available to anyone. I would qualify this by saying that open access is desirable but not necessary: a basic definition of a preprint should permit a broad range of copyright and licensing criteria. Would it be acceptable to paywall a preprint? I would argue strongly against this option and it seems quite pointless, but I don’t think it should discount something from being classed as a preprint.
Preprints are about reporting work at the earliest possible stage. The “pre” in the name is there because they come before validation by the research community.
The primary mode of validation currently is peer review and journal publication, but the definition shouldn’t be restrictive. New processes of confirming results could emerge in the future and should be connected to preprints. Grey literature is usually not peer reviewed, so editorial review and publication is sufficient to count as validation.
Should postprints or accepted versions of papers be mixed in with preprints? The difference between what is a preprint and postprint should be about when the first version is put online. In practice, it is acceptable to update a preprint with a new version, including a peer-reviewed one. The lack of a fixed end point is, to my mind, a strength of preprints and should be permitted within a working definition. As outlined below, there are situations where it is critical to know if something has been peer reviewed, so there should be a differentiation between the two.
In summary, I would define a preprint as a piece of research made publicly available before it has been validated by the research community.
Not to be confused with the above, and a topic for a future post, is the question of what constitutes a preprint server. Not all preprints appear on a preprint server, and not everything that appears on a preprint server is necessarily a preprint.
Does it matter?
A number of preprint servers don’t publish preprints strictly according to this definition, for example by allowing publication of an abstract without full text, or by permitting uploads of postprints or accepted versions. I don’t think most scholars care a great deal about this, but it is important in some circumstances. For preprint aggregators, funders, journalists, medical practitioners and research assessments, it is much more important to know what has not been peer reviewed, and whether something is simply an opinion as opposed to a report of research outcomes. For this reason, the distinction between research and grey literature, and between preliminary and reviewed work, should be made clear.
In the spirit of preprints, I would be interested in feedback on the definition and how it can be improved. Please comment or get in touch by some other means.
March 9, 2017 mrittman
Last week I put a rough version of the list of preprint platforms live, responding to a request on Twitter from Jessica Polka. I’ve now filled in most of the gaps and put it into a Google sheet, which seems the best way to display the information at present. In the future I aim to use something fancier that will span the page and can be filtered and sorted.
I hope it will be a useful resource to authors considering options for where to place their preprint and anyone interested in an overview of the state of preprints.
Putting the list together was an interesting exercise, revealing in several respects. Here are a few observations that I made.
Firstly, there are not that many preprint servers: the list runs to 19 at the moment. More than half of those listed (counting OSF-based servers individually) started in the last year. Compared to the number of journals, this is minuscule, even in disciplines where preprints have played a large role for many years. I intentionally excluded institutional repositories, of which I suspect a great many post preprints: there are already lists of them elsewhere, and authorship is limited to those affiliated with the institution.
A major gap for most preprint servers is long-term archiving. Excepting those based at CERN, I found only one with a statement about archiving on its website (CORE from MLA Commons). This should be a high priority for those operating preprint platforms, but there appear to be few clear solutions at present.
Also lacking is a business model that does not rely on backing by one or a handful of bodies. SSRN uses a model where institutions or readers pay for extra services. Authorea charges for use of their platform (although there is a free option). Funding from a larger organisation is fine, as long as institutions are willing to pay in the long term, but it relies to some extent on good will and some servers will likely look at alternative models in the coming years.
The background to preprint servers is varied, arising from libraries, publishers, societies, author services etc. Each puts an emphasis on different aspects and the rigour in submission checks, licensing information, information for authors, inclusion of non-preprint material and so on varies. In my experience, most authors don’t particularly pay attention to these aspects, but they may play a role in integrating preprints more formally into research evaluation. Funders and universities do care more about the details. A discussion on basic requirements for preprints and an interest in whether a consensus can be achieved was one of the motivating factors for my setting up Research Preprints.
Finally, the pervasiveness of the PDF is evident. Its convenience for publication and human reading wins hands down. At the moment other formats don’t get a look-in, which is fine in the short term. In the longer term, this could pose a significant challenge for text and data mining, especially when formatting is so varied.
February 23, 2017 mrittman
One of the points I labelled as critical for preprints in my previous post was that they should be quickly adopted by various disciplines. Putting an e-print on arXiv is normal in a number of disciplines. On the other hand, while increasing numbers of preprints are being made available in other disciplines, for example biology, they remain a small fraction of the overall number of papers published in the field.
Advocates of preprints should aim for the numbers to expand quickly. If use of preprints is not rapidly normalized and they are ignored by the majority of researchers it will become ever harder to drive continued interest. What strategies could advocates of preprints use and what are the end goals? This post focuses mainly on the former. There are, of course, different options. Here’s a few strategies that I think are viable.
Option 1: Field-by-field stakeholder adoption
Resources can be focused on one field, be it biology, engineering, physical chemistry etc. Widespread adoption can be achieved via buy-in from a relatively small number of important stakeholders. On the other hand, resistance from just one of these groups could cause doubt and confusion and stall the process. The strategy here, which ASAPbio seems to have followed, is rapid take-up in a short space of time and to incorporate preprints into the research infrastructure through acceptance by funders, publishers and institutions. This strategy can be thought of as a top-down approach where the involvement of organizations is key to persuading researchers of the acceptability of preprints.
Option 2: Broad adoption
In this strategy, an increasing minority of researchers from multiple disciplines start posting preprints. This is really a bottom-up approach, with change driven by the habits of individual users. It is immediately less disruptive than option 1, but over time institutions will need to find a way to incorporate the needs of those making use of preprints, particularly when it comes to assessing impact and citations. There is a risk for those who use preprints if the pace of adaptation is too slow, however, as their efforts to preprint would not be recognised. They could end up with a significant chunk of their work being discounted.
Option 3: Bring out the big guns
A few highly influential individuals and/or institutions make use of preprints, showcase them and demonstrate the benefits. Rather than a long-term strategy, this is a kick-starter to get others involved and get preprints on the agenda. It’s a big carrot for others to look at before following their lead.
The reality is that a combination of options is likely to be followed. I’d be interested to hear from early adopters of preprints in physics as to what the main drivers were.
These strategies could plausibly apply for a number of new ideas, but what issues are particular to preprints?
First, preprints have a base to start from and examples to follow. They have proven influential in several disciplines, and new fields should learn from arXiv, SSRN and others.
Second is the question of how disruptive preprints will be to established systems. Currently they work alongside the normal publishing process. There are future scenarios where preprints become so important that journals are less vital than today, or even irrelevant. Is that possible or desirable? Could preprints even enhance and add value to journal publications? Are there any unintended consequences of preprint/journal interactions?
There are many who see the current preprints boom as a positive step, but it is worth considering what comes after the first step and where the destination should be. I suspect there are differing views and I would very much welcome them in comments below.
February 16, 2017 mrittman
I started writing the first follow-up post to the introduction for Research Preprints a few days ago, before ASAPbio put out its call for running a central preprint service for the life sciences. My original intention was to list the things I think most urgently need to be addressed regarding preprints in the next five years. Here’s what I came up with, in no particular order:
- What is a preprint?
- Business models.
- Data and formats.
- Increase in the use of preprints and resolution of the citation dilemma.
Since the release of the ASAPbio call, I have become interested in how this list matches up with what they propose. First, a few words about the proposal. For those who are not familiar, ASAPbio advocates for wider use of preprints in the life sciences, and they have made an effort to engage many interested parties, including funders. One of their primary aims for some time has been to aggregate all life science preprints in one place online. In response to the recent announcement, there seemed to be confusion as to whether the proposed ‘Central Service’ should be a new preprint server or an aggregator. My reading is that the latter is the primary aim, but there is nothing to stop a successful proposal providing a mechanism to directly accept preprints from authors.
Going back to the list:
The first point is an existential question, and I was curious that the call from ASAPbio didn’t touch on it at all. Maybe my mathematics training from way back when makes me seek a definition of everything. A definition of what constitutes a preprint seems rather important, especially when one wishes to aggregate preprints from various places: how else to choose what to include and what to exclude? On the other hand, I’ve only seldom heard discussions of what does or should constitute a journal article. There seems to be a consensus that everyone sort of knows, and that different publishers have different standards. That the same is true of preprints is perhaps no surprise; however, the open access movement has often been criticised for its ill-defined terms, and it would be a shame for preprints to go the same way.
Moving on to business models, there is a contrast between probably the two best-known preprint servers to date. arXiv is funded solely by supporting institutions as a charity. SSRN, on the other hand, ran for many years as a stand-alone enterprise, covering its costs by selling subscriptions, download fees, job advertisements, conference fees and so on. Last year SSRN was bought by Elsevier, a company that wouldn’t make an investment unless it was worthwhile, so there must be a commercial interest. Many baulk at the idea of preprints being in the hands of a profit-driven enterprise. At the same time, if there is potential revenue to be made from preprints, a sustainable, scalable model would be desirable. This point is included as an aim in ASAPbio’s call: funding would be provided for five years, after which the service would be expected to find other revenue sources and become sustainable.
Data and formats feature surprisingly strongly in the ASAPbio call. XML conversion of preprints is a central requirement. This is no mere technical formality and could put off a number of potentially interested parties, especially considering the range of formats permitted by preprint servers. It calls for an innovative solution and some clever programming; time will tell whether it can be delivered. It remains important, though: with the increasing volume of research output, researchers rely ever more on machine-based searches to deliver content to them, and XML is far superior to inconsistently formatted PDFs for text and data mining and as input to discovery tools.
Finally, how rapidly will preprints evolve? The citation dilemma I refer to above could become a key stumbling block for acceptance of preprints. Academic performance is measured by citations. If these are split between different versions of a preprint and a published version of the same paper, researchers could suffer, especially if preprints are not considered primary research material. In my view there needs to be a rapid expansion, acceptance and normalization of preprints. In other words, preprints need to find their niche and be recognized by funding bodies and promotion boards, in addition to being used and recognized by a majority of researchers. Mathematics and physics, with their long-time use of arXiv, probably have a contribution to make in this area. It remains to be seen whether other fields will benefit from their experience.
Gazing into a crystal ball is never an easy task, but if those involved with preprints can address the issues above there is a good chance that they will become a first stop for a great deal of research.