Many journals now have open data policies, but they are rarely enforced, so many scientists do not submit their data. The question is: what drives them not to submit? Is it laziness? Is it a desire to keep the data to themselves? Or is it something more sinister? After all, open data rules exist, in part, to allow replication, so that others can check that the reported results are accurate.
Robert Trivers reports on an interesting study by Wicherts, Bakker, and Molenaar that relates authors' willingness to share data to the statistical strength of their results in psychology journals.
Here is where they got a dramatic result. They limited their research to two of the four journals whose scientists were slightly more likely to share data and most of whose studies were similar in having an experimental design. This gave them 49 papers. Again, the majority failed to share any data, instead behaving as a parody of academics. Of those asked, 27 percent failed to respond to the request (or two follow-up reminders)—first, and best, line of self-defense, complete silence—25 percent promised to share data but had not done so after six years and 6 percent claimed the data were lost or there was no time to write a codebook. In short, 67 percent of (alleged) scientists avoided the first requirement of science—everything explicit and available for inspection by others.
Was there any bias in all this non-compliance? Of course there was. People whose results were closer to the fatal cut-off point of p=0.05 were less likely to share their data. Hand in hand, they were more likely to commit elementary statistical errors in their own favor. For example, for all seven papers where the correctly computed statistics rendered the findings non-significant (10 errors in all) none of the authors shared the data. This is consistent with earlier data showing that it took considerably longer for authors to respond to queries when the inconsistency in their reported results affected the significance of the results (where responses without data sharing!). Of a total of 1148 statistical tests in the 49 papers, 4 percent were incorrect based only on the scientists’ summary statistics and a full 96 percent of these mistakes were in the scientists’ favor. Authors would say that their results deserved a ‘one-tailed test’ (easier to achieve) but they had already set up a one-tailed test, so as they halved it, they created a ‘one-half tailed test’. Or they ran a one-tailed test without mentioning this even though a two-tailed test was the appropriate one. And so on. Separate work shows that only one-third of psychologists claim to have archived their data—the rest make reanalysis impossible almost at the outset! (I have 44 years of ‘archived’ lizard data—be my guest.) It is likely that similar practices are entwined with the widespread reluctance to share data in other “sciences” from sociology to medicine. Of course this statistical malfeasance is presumably only the tip of the iceberg, since in the undisclosed data and analysis one expects even more errors.
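To make the 'one-half tailed test' concrete, here is a minimal sketch of the kind of consistency check the passage describes: recompute a p-value from a reported test statistic and compare it with what the authors reported. It is not the procedure used in the study. For simplicity it assumes a z statistic (standard normal) rather than the t or F statistics typical of psychology papers, and the function names, the tolerance, and the example value z = 1.80 are all hypothetical.

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def one_tailed_p(z: float) -> float:
    """One-tailed p-value, assuming the effect lies in the predicted direction."""
    return two_tailed_p(z) / 2


def is_consistent(z: float, reported_p: float, tails: int = 2, tol: float = 0.005) -> bool:
    """Check whether a reported p-value matches the one recomputed from z.

    The tolerance is an illustrative allowance for rounding in the paper.
    """
    recomputed = two_tailed_p(z) if tails == 2 else one_tailed_p(z)
    return abs(recomputed - reported_p) <= tol


# Hypothetical reported statistic, chosen to sit near the p = 0.05 cut-off.
z = 1.80
p_two = two_tailed_p(z)   # about 0.072: not significant at 0.05
p_one = one_tailed_p(z)   # about 0.036: significant, if one-tailed is justified
p_half = p_one / 2        # about 0.018: halving a value that is already one-tailed

print(f"two-tailed: {p_two:.3f}")
print(f"one-tailed: {p_one:.3f}")
print(f"'one-half tailed': {p_half:.3f}")
print("halved value consistent with a one-tailed test?",
      is_consistent(z, round(p_half, 3), tails=1))
```

The same statistic yields three different p-values depending on how many times it is halved, and only the first sits on the wrong side of 0.05. A reader with nothing but the summary statistics can catch the double halving, which is roughly how errors can be spotted even when the raw data are withheld.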
This is only a correlation, but it is troubling. The issue is that authors present results selectively and, sadly, peer review does not pick this up. Of course, even with open data, it takes effort to replicate an analysis and then publish alternative results and conclusions.