Matthew Hankins and others on Twitter are making fun of scientists who twist themselves up lexically in order to report results that fail the significance test, using phrases like “approached but did not quite achieve significance” and “only just insignificant” and “tantalisingly close to significance.”
But I think this fun-making is somewhat misplaced! We should instead be jeering at the conventional dichotomy that a result significant at p < .05 is “a real effect” and one that scores at p = .06 is “no effect.”
The lexically twisted scientists are on the side of the angels here, insisting that a statistically insignificant finding is usually much better described as “not enough evidence” than “no evidence,” and should be mentioned, in whatever language the journal allows, not mulched.
Completely agree – not a fan of the ‘all-or-nothing’ interpretation at all and would much rather the authors let the data speak for itself (themselves?). But I don’t think you can have it both ways and call it unambiguously ‘significant’ when p < 0.05.
It’s weird to pretend that a smaller number isn’t more convincing, even if it doesn’t meet a standard that we chose in the first place because we have a base 10 number system. (Not that it isn’t important to choose in advance what your cutoff for significance is.) Imagine all the studies that would be considered significant if we used base 8 and .04 (corresponding to .0625 in decimal, if I’m not mistaken) were our cutoff of choice.
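A quick check of that arithmetic, just as an illustrative snippet:

```python
# ".04" in base 8 means 0*(1/8) + 4*(1/64)
octal_threshold = 0 / 8 + 4 / 64
print(octal_threshold)  # 0.0625 -- so a base-8 ".04" bar would admit p-values up to .0625 in decimal
```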
Evelyn: Note that for a normal random variable, p = 0.05 is very close to two standard deviations. The real standard is “two sigma,” which has nothing to do with base 10.
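For anyone who wants to check that correspondence, here is a quick sketch using scipy (my own illustration, not part of the original exchange):

```python
from scipy.stats import norm

# Two-sided p-value for a z-score of exactly 2 ("two sigma")
p_two_sigma = 2 * norm.sf(2.0)
print(round(p_two_sigma, 4))   # ~0.0455

# The z-score that corresponds exactly to a two-sided p of 0.05
z_for_p05 = norm.ppf(1 - 0.05 / 2)
print(round(z_for_p05, 3))     # ~1.96
```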
Personally, I don’t believe that nineteen of every twenty effects reported as significant at p = 0.05 actually turn out to be significant in larger studies.
More to the point, what the authors should report (in addition to the p-value) is how many roughly independent quantities were examined in the data. For example, the probability that five uncorrelated Gaussian variates all lie within two standard deviations of the mean is only about 79%, so in an experiment extracting five different indicators from the data there is roughly a 20% chance of a “discovery” even if there is no effect at all.
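A quick way to see those numbers, both analytically and by simulation (an illustrative sketch with made-up independent indicators, not any real experiment):

```python
import numpy as np
from scipy.stats import norm

# Probability that one standard normal variate lies within 2 sigma of the mean
p_within = norm.cdf(2) - norm.cdf(-2)     # ~0.9545

# With 5 independent indicators and no real effect anywhere:
p_all_within = p_within ** 5              # ~0.79
p_false_discovery = 1 - p_all_within      # ~0.21
print(round(p_all_within, 3), round(p_false_discovery, 3))

# Quick simulation to confirm
rng = np.random.default_rng(0)
draws = rng.standard_normal((100_000, 5))
any_exceeds = (np.abs(draws) > 2).any(axis=1).mean()
print(round(any_exceeds, 3))              # also ~0.21
```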
Rereading my reply, I’d like to apologize for my tone. I’m sorry. Wish these were editable …
No offense taken. :)
I did forget about the 2 sigma thing, and it’s definitely true that many observed correlations go away due to publication bias and/or the look-elsewhere effect. But the point that the p-value isn’t a binary on-off switch is a good one, I think.
Matthew Hankins makes the important point that “You don’t need to play the significance testing game – there are better methods, like quoting the effect size with a confidence interval”.
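As a rough illustration of what that looks like in practice, here is a minimal sketch (with entirely made-up data) that reports a mean difference and its 95% confidence interval alongside the p-value, rather than the p-value alone:

```python
import numpy as np
from scipy import stats

# Toy two-group comparison (fabricated data, purely for illustration)
rng = np.random.default_rng(1)
treatment = rng.normal(loc=0.4, scale=1.0, size=40)
control = rng.normal(loc=0.0, scale=1.0, size=40)

# Effect size: difference in means, with a Welch-style standard error
diff = treatment.mean() - control.mean()
se = np.sqrt(treatment.var(ddof=1) / treatment.size +
             control.var(ddof=1) / control.size)

# 95% CI using a normal approximation (reasonable at n = 40 per group)
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se

# The p-value, for comparison (Welch's t-test)
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"mean difference = {diff:.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f}), p = {p_value:.3f}")
```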
This is an excellent excuse to link to Jacob Cohen’s highly entertaining 1994 article The Earth is Round (p < .05) in American Psychologist. Cohen takes apart “the ritual of null hypothesis significance testing”, also discussed here.
http://xkcd.com/882
Just had a nice blog-to-blog conversation last week with Jack West, the author of a patient information website, cancergrace.org, about this topic. The scary part for me is in medical research, where people take the 0.05 threshold as an excuse not to bother reading or thinking about the trial. How many useful drugs are we NOT using because their trials reported p = 0.06?! Freaks me out a bit.
He specifically asked me to analyze a recent trial, which reported p = 0.06, and I did so here:
http://cancerconnector.blogspot.com/2013/05/memantine-and-wbrt.html
What’s funny are the responses I got from people aghast that I could recommend a drug that ‘didn’t even reach significance’. Argh!