Hal Pashler, a psychologist at UCSD, tweeted my post about the “end of history” study, and this led me to his interesting paper, “Is the Replicability Crisis Overblown?” (with Christine Harris). Like all papers whose title is a rhetorical question, it comes down in favor of “no.”
Among other things, Pashler and Harris are concerned about the widespread practice of “conceptual replication,” in which rather than reproduce an existing experiment you try to find a similar effect in an adjacent domain. What happens when you don’t find anything?
Rarely, it seems to us, would the investigators themselves believe they have learned much of anything. We conjecture that the typical response of an investigator in this (not uncommon) situation is to think something like “I should have tried an experiment closer to the original procedure—my mistake.” Whereas the investigator may conclude that the underlying effect is not as robust or generalizable as had been hoped, he or she is not likely to question the veracity of the original report. As with direct replication failures, the likelihood of being able to publish a conceptual replication failure in a journal is very low. But here, the failure will likely generate no gossip—there is nothing interesting enough to talk about here. The upshot, then, is that a great many failures of conceptual replication attempts can take place without triggering any general skepticism of the phenomenon at issue.
The solutions are not very sexy but are pretty clear: create publication venues for negative results and direct replications, and give researchers real credit for them. Gary Marcus has a good roundup in his New Yorker blog of other structural changes that might lower the error rate of lab science. Marcus concludes:
In the long run, science is self-correcting. Ptolemy’s epicycles were replaced by Copernicus’s heliocentric system. The theory that stomach ulcers were caused by spicy foods has been replaced by the discovery that many ulcers are caused by a bacterium. A dogma that primates never grew new neurons held sway for forty years, based on relatively little evidence, but was finally chucked recently when new scientists addressed older questions with better methods that had newly become available.
But Pashler and Harris are not so sure:
Is there evidence that this sort of slow correction process is actually happening? Using Google Scholar we searched <“failure to replicate”, psychology> and checked the first 40 articles among the search returns that reported a nonreplication. The median time between the original target article and the replication attempt was 4 years, with only 10% of the replication attempts occurring at lags longer than 10 years (n = 4). This suggests that when replication efforts are made (which, as already discussed, happens infrequently), they generally target very recent research. We see no sign that long-lag corrections are taking place.
It cannot be doubted that there are plenty of published results in the mathematical literature that are wrong. But the ones that go uncorrected are the ones no one cares about.
It could be that the self-correction process is most intense, and thus most effective, in areas of science which are most interesting, and most important, and have the highest stakes, even as errors are allowed to persist elsewhere. That’s the optimistic view, at any rate.