It’s scrolled off the bottom of the page now, but there’s an amazing comment thread going on under my post on “The End of History Illusion,” the Science paper that got its feet caught in a subtle but critical statistical error.

Commenter Deinst has been especially good, digging into the paper’s dataset (kudos to the authors for making it public!) and finding further reasons to question its conclusions. In this comment, he makes the following observation: Quoidbach et al. believe there’s a general trend to underestimate future changes in “favorites,” and they test this by studying people’s predictions about their favorite movies, food, music, vacation, hobbies, and their best friends, averaging the results, and finding a slightly negative bias. What Deinst noticed is that the negative bias is almost entirely driven by people’s unwillingness to predict that they might change their best friend. On four of the six dimensions, respondents predicted *more* change than actually occurred. That sounds much more like “people assign positive moral value to loyalty to friends” than “people have a tendency across domains to underestimate change.”

But here I want to complicate a bit what I wrote in the post. Neither Quoidbach’s paper nor my post directly addresses the question: what do we mean by a “rational prediction?” Precisely: if there is an outcome which, given the knowledge I have, is a random variable Y, what do I do when asked to “predict” the value of Y? In my post I took the “rational” answer to be the expected value E(Y). But this is not the only option. You might think of a rational person as one who makes the prediction most likely to be correct, i.e. the modal value of Y. Or you might, as Deinst suggests, think that rational people “run a simulation,” taking a random draw from Y and reporting *that* as the prediction.

Now suppose people do that last thing, exactly on the nose. Say X is my level of extraversion now, Y is my level of extraversion in 10 years, and Z is my prediction for the value of Y. In the model described in the first post, the value of Z depends only on the value of X; if X=a, it is E(Y|X=a). But in the “run a simulation” model, the joint distribution of X and Z is exactly the same as the joint distribution of X and Y; in particular, E(|Z-X|) and E(|Y-X|) agree.

I hasten to emphasize that there’s no evidence Quoidbach et al. have this model of prediction in mind, but it would give some backing to the idea that, absent an “end of history bias,” you could imagine the absolute difference in their predictor condition matching the absolute difference in the reporter condition.
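To make the contrast between the two prediction models concrete, here is a minimal simulation sketch. Everything specific in it is an assumption chosen for illustration, not something taken from the paper or its data: X is standard normal, Y is correlated with X with coefficient `rho`, and the function name `simulate_change` is hypothetical. It compares the average absolute change that actually occurs with the average change predicted by a conditional-mean predictor and by a one-sample predictor.

```python
import math
import random


def simulate_change(n=100_000, rho=0.7, seed=0):
    """Return mean |Y-X|, mean |Z-X| for Z = E(Y|X), and mean |Z-X| for Z a draw from Y|X.

    Toy model (an assumption, not the paper's): X ~ N(0, 1) and
    Y = rho * X + sqrt(1 - rho^2) * noise, so Y is also N(0, 1).
    """
    rng = random.Random(seed)
    noise_sd = math.sqrt(1 - rho ** 2)
    actual = mean_pred = draw_pred = 0.0
    for _ in range(n):
        x = rng.gauss(0, 1)                            # extraversion now
        y = rho * x + noise_sd * rng.gauss(0, 1)       # extraversion in 10 years
        z_mean = rho * x                               # prediction Z = E(Y | X = x)
        z_draw = rho * x + noise_sd * rng.gauss(0, 1)  # prediction Z = fresh draw from Y | X = x
        actual += abs(y - x)
        mean_pred += abs(z_mean - x)
        draw_pred += abs(z_draw - x)
    return actual / n, mean_pred / n, draw_pred / n


actual, mean_pred, draw_pred = simulate_change()
print(f"mean |Y-X|, actual change:            {actual:.3f}")
print(f"mean |Z-X|, conditional-mean model:   {mean_pred:.3f}")
print(f"mean |Z-X|, one-sample model:         {draw_pred:.3f}")
```

In this toy model the one-sample predictions reproduce the actual average change, while the conditional-mean predictions report far less change even though no bias has been built in. With `rho = 0.7` the theoretical values are roughly 0.62 for E(|Y-X|) and 0.24 for the conditional-mean predictor.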

There’s some evidence that people actually do use small samples, or even just one sample, to predict variables with unknown distributions, and moreover that doing so can actually maximize utility, under some hypotheses on the cognitive cost of carrying out a more fully Bayesian estimate.

Does that mean I think Quoidbach’s inference is OK? Nope — unfortunately, it stays wrong.

It seems very doubtful that we can count on people hewing exactly to the one-sample model.

Example: suppose one in twenty people radically changes their level of extraversion in a 10-year interval. What happens if you ask people to predict whether they themselves are going to experience such a change in the next 10 years? Under the one-sample model, 5% of people would say “yes.” Is this what would actually happen? I don’t know. Is it rational? Certainly it fails to maximize the likelihood of being right. In a population of fully rational Bayesians, everyone would recognize shifts like this as events with probability less than 50%, and everyone would say “no” to this question. Quoidbach et al. would categorize this result as evidence for an “end of history illusion.” I would not.
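The arithmetic behind this example can be sketched in a few lines. The function name `compare_strategies` is hypothetical, and the calculation assumes that everyone has the same 1-in-20 chance of changing and that a one-sample prediction is independent of the eventual outcome.

```python
def compare_strategies(p_change):
    """Compare one-sample and modal predictors for a yes/no event of probability p_change.

    Assumes every person faces the same probability and that a one-sample
    prediction is independent of what actually happens to that person.
    """
    # One-sample model: say "yes" with probability p_change.
    # The prediction is right when it matches the outcome: yes/yes or no/no.
    one_sample_yes_rate = p_change
    one_sample_accuracy = p_change * p_change + (1 - p_change) * (1 - p_change)

    # Modal model: everyone gives the more likely answer.
    modal_yes_rate = 1.0 if p_change > 0.5 else 0.0
    modal_accuracy = max(p_change, 1 - p_change)

    return {
        "one_sample_yes_rate": one_sample_yes_rate,
        "one_sample_accuracy": one_sample_accuracy,
        "modal_yes_rate": modal_yes_rate,
        "modal_accuracy": modal_accuracy,
    }


result = compare_strategies(1 / 20)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the modal predictors are right 95% of the time and collectively predict no change at all, while the one-sample predictors match the true 5% rate of change but are right only 90.5% of the time.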

Now we’re going to hear from my inner Andrew Gelman. (Don’t you have one? They’re great!) I think the real problem with Quoidbach et al.’s analysis is that they think their job is to falsify the null hypothesis. This makes sense in a classical situation like a randomized clinical trial. Your null hypothesis is that the drug has no effect. And your operationalization of the null hypothesis (the thing you literally measure) is that the probability distribution on “outcome for patients who get the drug” is the same as the one on “outcome for patients who don’t get the drug.” That’s reasonable! If the drug isn’t doing anything, and if we did our job randomizing, it seems pretty safe to assume those distributions are the same.

What’s the null hypothesis in the “end of history” paper? It’s that people predict the extent of personality change in an unbiased way, neither underpredicting nor overpredicting it.

But the operationalization is that the absolute difference of predictions, |Z-X|, is drawn from the same distribution as the absolute difference of actual outcomes, |Y-X|, or at least that these distributions have the same means. As we’ve seen, even without any “end of history illusion,” there’s no good reason for this version of the null hypothesis to be true. Indeed, we have pretty good reason to believe it’s *not* true. A rejection of this null hypothesis tells us nothing about whether there’s an end of history illusion. It’s not clear to me it tells us anything at all.