I can’t stress enough how important it will be to have proper tools for automatic archiving of data. There are server-side solutions, but very few client-side ones. The bigger sites (Figshare, OSF) need better APIs and more development of client-side applications. Hopefully this agenda can spur that development.
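To make the “client-side tooling” point concrete, here is a minimal sketch of what an archiving helper could look like. It targets OSF’s file-upload (Waterbutler) HTTP endpoint; the URL shape, parameters, and token handling are assumptions based on OSF’s public API documentation, so verify them against the current docs before relying on this.

```python
# Sketch of a client-side archiving helper for OSF.
# Assumptions (not verified here): the Waterbutler upload endpoint shape
# and bearer-token auth, as described in OSF's public API docs.
import os

OSF_FILES_URL = "https://files.osf.io/v1/resources/{node}/providers/osfstorage/"

def build_upload_request(node_id, local_path, token):
    """Return (url, params, headers) for a PUT upload of local_path
    to the osfstorage provider of the given OSF project node."""
    return (
        OSF_FILES_URL.format(node=node_id),
        {"kind": "file", "name": os.path.basename(local_path)},
        {"Authorization": f"Bearer {token}"},
    )

# The actual upload would then be (requires the `requests` package
# and a valid personal access token):
# url, params, headers = build_upload_request("abc12", "data.csv", TOKEN)
# with open("data.csv", "rb") as fh:
#     requests.put(url, params=params, headers=headers, data=fh)
```

Separating request construction from the network call keeps the tool scriptable and testable, which is exactly what batch, client-side archiving workflows need.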
This is a very good initiative; however, here are some of my concerns about open access to raw data:
1. Considering plagiarism of papers is well known, how can we ensure against plagiarism of data?
2. What prevents a researcher with a contrary view from taking the raw data, manipulating it, reaching an opposing conclusion, and presenting it as his/her “independent replication”? Wouldn’t this be a desktop replication rather than a real independent replication?
3. This will open up a new industry of providing a complete study, raw data included, for a price, without doing any study at all! As it is, you can already purchase term papers and theses online!
4. Who will monitor the work for such plagiarism?
5. Who will be the arbiter for disputes arising from claims of data “stealing”?
These seem to portend a far more chaotic situation than what exists now.
6. Researchers interested in examining the raw data can already obtain it by liaising with the original researcher, so given the above concerns, what is the real benefit of open access to raw data?
7. How can we ensure that another researcher examining the data is as competent as the original researcher, and understands the nuances of the subject matter well enough to analyze the data adequately? Statistical analysis is only one part of making sense of the data.
In my view these are serious concerns that need to be addressed before plunging into this venture and making open access to raw data an industry norm.
Hi Sonali,
1. A person who is willing to steal data would be much better off inventing it wholesale. With data public, it is easily verifiable that it has been stolen. The work that it would take to download, understand, and analyse someone else’s data for the sake of fraud would simply not be worth it.
2. See (1).
3. Maybe, but the fact that an industry for purchased papers already exists doesn’t stop honest researchers from writing their own.
4. This initiative is not designed to detect or handle fraud; it is designed to give honest people an incentive.
5. This would be a matter of scientific fraud, and mechanisms already exist to deal with that.
6. My own personal experience, and plenty of research, says this is not true in general. As to the benefits, this is discussed in the paper linked on the first page.
7. The same way you can assess whether the original authors’ analysis makes sense: by looking at their analysis critically. If you have the data, anyone can assess the analysis.
Hi Richard
Thanks for your response.
My biggest concern in this is fraud — not everyone will hold the honest standards that we hope they will, especially when the data is available to a global community.
It will be easier to do “desktop” replications than real replications!
The author/funding agency should have some control over who is using the data and for what purpose: some way to keep track of what’s happening with it.
Whether they are capable of understanding the subject matter, i.e. whether they “grok” it well enough to use the data effectively, is also important.
Pitching for cooperation and collaboration between researchers and across disciplines alongside this initiative would probably also be required.
Jose Duarte’s point, raised in another comment, on open access to the final paper is also extremely relevant.
I think I am stepping into this dialogue quite late, but may I ask what prompted the development of such an agenda? I am sure many visiting this site would be interested in a brief background. A page on preregistration, with links to preregistration sites, would also be useful.
Additionally, a list of topics within the field that are in need of replication or review of data would be of great assistance, particularly for young researchers looking for ideas that might interest them.
Best
Sonali
Dear Sonali
A person who wants to commit fraud can do so now. If they use publicly available data, their fraud is easier to detect – the tracks are out there. Transparency helps solve the problem of fraud.
If people want to use my hard-earned databases to further knowledge with less effort on their part – they are more than welcome. That is how knowledge grows. Each person makes it easier for the next.
If there are different ways of understanding the data and what they say, we need to know. We cannot take for granted that the first person to analyze some data groks them better or has better analytic skills than the second. Nor is that relevant. What is relevant is only which analyses are best for drawing out what those data mean. The only way of getting to the bottom of that is having publicly available data so that different analyses can be debated.
best
Zoltan
One (imo) justified concern about open data upon publication is that experimental data can be expensive to gather, and a group may plan to produce multiple papers examining different aspects of the same dataset. If they were required to make all data available upon the first publication, this would be a major disincentive for some groups to comply.
I see the solution here as an embargo period, agreed upon at time of first paper. The authors archive their raw data with a third party, and the data are set to be released at the expiry of the embargo period. Both Data Dryad and Zenodo allow this. A reasonable period in most cases might be one year.
Is the Initiative open to this? If so, it could be made more explicit as an option for authors to comply.
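As a concrete illustration of the embargo mechanism described above: Zenodo’s deposit API accepts an `access_right` of `"embargoed"` together with an `embargo_date`, after which the record opens automatically. The field names below follow Zenodo’s public REST documentation, but treat the details as assumptions to verify before use.

```python
# Sketch: metadata for an embargoed Zenodo deposit.
# Assumption: field names ("access_right", "embargo_date", etc.) follow
# Zenodo's public REST API docs; verify before relying on them.
import json
from datetime import date, timedelta

def embargoed_metadata(title, creators, months=12):
    """Build deposit metadata that keeps files closed until an embargo
    date (roughly `months` months from today), then opens them."""
    release = date.today() + timedelta(days=30 * months)
    return {
        "metadata": {
            "title": title,
            "upload_type": "dataset",
            "creators": [{"name": c} for c in creators],
            "access_right": "embargoed",          # closed until embargo_date
            "embargo_date": release.isoformat(),  # "YYYY-MM-DD"
        }
    }

# The deposit would then be created with (requires `requests` and a token):
# requests.post("https://zenodo.org/api/deposit/depositions",
#               params={"access_token": TOKEN},
#               data=json.dumps(embargoed_metadata("My data", ["Doe, J."])),
#               headers={"Content-Type": "application/json"})
```

The point is that the release date is fixed with the archive at deposit time, so honouring the embargo does not depend on the authors remembering to act later.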
The Initiative is set up so that authors can give any reason they like for *not* opening their data (see the Initiative paper, pages 6–7); so authors are free to say “We will not open the data immediately because of the substantial cost incurred in collecting it.” The key is that authors should make this justification explicitly in (or with) the paper, and hopefully provide a link where the data can be found when they *do* release it. My personal opinion is that I do not like embargoes in general, but the Initiative is compatible with them as long as the authors state that that’s what they’re doing, and why.
Tom Wallis raises a reasonable concern. Clearly, those collecting data deserve a first crack at publishing their analyses, and in many cases there may be several papers worth of analyses in the data set.
Therefore, I would propose a maximum three-year embargo on releasing all the data after first publication of any results. That allows a three-year head start (actually longer, since they have all the time prior to first publication). If the data set is so rich that it can’t be adequately analyzed within those three years, it may be even more important to share it so more researchers can dig into it.
The option for up to a three-year embargo on the data, however, does not include “Stimulus materials, experimental instructions and programs, survey questions, and other similar materials.” Those should be made available immediately upon first publication.
How a question is asked, the order in which questions are asked, the instructions, and the participant-selection methodology are all critical to interpreting the soundness of the methodology and the published conclusions.
The initiative is very good for researchers who live and work in first-world countries, but what about those in second- and third-world countries? In the third world especially, who will prevent the data from being stolen and used in the wrong way? For example, the data could be used in a PhD thesis, letting students graduate without effort, since many theses in those countries are never published.
Nothing prevents a person willing to lie or hide information from doing so. Such a person could just as easily generate fake data.
By giving researchers free access to a wide variety of data sets, I would think this initiative would make it EASIER for third world graduate students to find good data for their dissertations and/or just for practice.
Remember, publishing data allows others not only to replicate previously published findings (good, though not necessarily interesting if no new insights are gained, only confirmed) but also to test alternative hypotheses, or hypotheses that were simply not examined or considered by the original researchers.
I welcome the initiative, having witnessed flawed research being published whose authors refused to release the full and final data after being questioned about potential errors in the analysis. To compound the issue, the journal in which the work was published refused to publish a comment outlining the issues in the analysis, in full knowledge that the authors would not release the data. This initiative provides a baseline from which to open discussion about when and why data should be made available, instead of the question simply being ignored. It may also provide a platform from which to discuss self-censorship by journals of inconvenient results. A small pond such as Australia creates a habitat for nepotism and protectionism, which open data may help to combat.