More on the end of history: what is a rational prediction?

It’s scrolled off the bottom of the page now, but there’s an amazing comment thread going on under my post on “The End of History Illusion,” the Science paper that got its feet caught in a subtle but critical statistical error.

Commenter Deinst has been especially good, digging into the paper’s dataset (kudos to the authors for making it public!) and finding further reasons to question its conclusions.  In this comment, he makes the following observation:  Quoidbach et al. believe there’s a general trend to underestimate future changes in “favorites,” and they test this by studying people’s predictions about their favorite movies, food, music, vacations, hobbies, and their best friends, averaging, and finding a slightly negative bias.  What Deinst noticed is that the negative bias is almost entirely driven by people’s unwillingness to predict that they might change their best friend.  On four of the six dimensions, respondents predicted more change than actually occurred.  That sounds much more like “people assign positive moral value to loyalty to friends” than “people have a tendency across domains to underestimate change.”

But here I want to complicate a bit what I wrote in the post.  Neither Quoidbach’s paper nor my post directly addresses the question:  what do we mean by a “rational prediction?”  Precisely:  if there is an outcome which, given the knowledge I have, is a random variable Y, what do I do when asked to “predict” the value of Y?  In my post I took the “rational” answer to be EY.  But this is not the only option.  You might think of a rational person as one who makes the prediction most likely to be correct, i.e. the modal value of Y.  Or you might, as Deinst suggests, think that rational people “run a simulation,” taking a random draw from Y and reporting that as the prediction.
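The three candidate notions are easy to state side by side.  Here’s a toy sketch (the discrete distribution is made up purely for illustration, not taken from the paper):

```python
import random

random.seed(2)

# Hypothetical distribution for Y: how many points some trait score shifts.
values = [0, 1, 2, 5]
probs = [0.5, 0.3, 0.15, 0.05]

# Three candidate "rational predictions" of Y:
mean_pred = sum(v * p for v, p in zip(values, probs))    # E[Y], the expected value
modal_pred = values[probs.index(max(probs))]             # most likely value of Y
sample_pred = random.choices(values, weights=probs)[0]   # one simulated draw from Y

print(mean_pred, modal_pred, sample_pred)
```

With these numbers the expected value is 0.85, a value Y never actually takes; the modal prediction is 0; and the one-sample prediction is whatever the dice say.  All three are defensible readings of “predict Y.”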

Now suppose people do that last thing, exactly on the nose.  Say X is my level of extraversion now, Y is my level of extraversion in 10 years, and Z is my prediction for the value of Y.  In the model described in the first post, the value of Z depends only on the value of X; if X=a, it is E(Y|X=a).  But in the “run a simulation” model, the joint distribution of X and Z is exactly the same as the joint distribution of X and Y; in particular, E(|Z-X|) and E(|Y-X|) agree.
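To make this concrete, here’s a minimal simulation of my own (a toy bivariate-normal model, not anything from the paper): X and Y are standard normals with correlation rho, the “run a simulation” predictor takes a fresh draw from the conditional distribution of Y given X, and the other predictor reports E(Y|X) = rho·X.

```python
import math
import random

random.seed(0)
rho = 0.7        # assumed 10-year stability of the trait; made up for illustration
n = 100_000

def one_person():
    x = random.gauss(0, 1)                          # trait now
    sd = math.sqrt(1 - rho**2)
    y = rho * x + sd * random.gauss(0, 1)           # actual trait in 10 years
    z_sim = rho * x + sd * random.gauss(0, 1)       # prediction: fresh draw from Y|X
    z_exp = rho * x                                 # prediction: conditional mean E(Y|X)
    return x, y, z_sim, z_exp

people = [one_person() for _ in range(n)]
actual = sum(abs(y - x) for x, y, _, _ in people) / n     # E|Y - X|
one_samp = sum(abs(z - x) for x, _, z, _ in people) / n   # E|Z - X|, one-sample model
cond_exp = sum(abs(z - x) for x, _, _, z in people) / n   # E|Z - X|, E(Y|X) model

print(actual, one_samp, cond_exp)
```

The one-sample predictor’s mean absolute change matches the actual mean absolute change, because (X, Z) and (X, Y) have the same joint distribution; the conditional-mean predictor’s is substantially smaller, which is exactly the regression-to-the-mean shrinkage discussed in the first post.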

I hasten to emphasize that there’s no evidence Quoidbach et al. have this model of prediction in mind, but it would give some backing to the idea that, absent an “end of history bias,” you could imagine the absolute difference in their predictor condition matching the absolute difference in the reporter condition.

There’s some evidence that people actually do use small samples, or even just one sample, to predict variables with unknown distributions, and moreover that doing so can actually maximize utility, under some hypotheses on the cognitive cost of carrying out a more fully Bayesian estimate.

Does that mean I think Quoidbach’s inference is OK?  Nope — unfortunately, it stays wrong.

It seems very doubtful that we can count on people hewing exactly to the one-sample model.

Example:  suppose one in twenty people radically changes their level of extraversion in a 10-year interval.  What happens if you ask people to predict whether they themselves are going to experience such a change in the next 10 years?  Under the one-sample model, 5% of people would say “yes.”  Is this what would actually happen?  I don’t know.  Is it rational?  Certainly it fails to maximize the likelihood of being right.  In a population of fully rational Bayesians, everyone would recognize shifts like this as events with probability less than 50%, and everyone would say “no” to this question.  Quoidbach et al. would categorize this result as evidence for an “end of history illusion.”  I would not.
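The gap between the two answers is easy to see numerically (again a toy sketch, using the made-up 1-in-20 figure from the example above):

```python
import random

random.seed(1)

p_change = 0.05   # hypothetical fraction who radically change in 10 years
n = 100_000

# One-sample model: each respondent takes a single draw from their own future
# and reports whether that draw shows a radical change.
yes_one_sample = sum(random.random() < p_change for _ in range(n)) / n

# Most-likely-answer model: a radical change has probability 0.05 < 0.5,
# so every respondent's best single guess is "no".
yes_modal = 0.0

print(yes_one_sample, yes_modal)
```

Both populations are behaving sensibly by their own lights, yet one shows a 5% “yes” rate and the other 0%; comparing predicted change to realized change would flag the second group as massively underpredicting change.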

Now we’re going to hear from my inner Andrew Gelman.  (Don’t you have one?  They’re great!)  I think the real problem with Quoidbach et al’s analysis is that they think their job is to falsify the null hypothesis.  This makes sense in a classical situation like a randomized clinical trial.  Your null hypothesis is that the drug has no effect.  And your operationalization of the null hypothesis — the thing you literally measure — is that the probability distribution on “outcome for patients who get the drug” is the same as the one on “outcome for patients who don’t get the drug.”  That’s reasonable!  If the drug isn’t doing anything, and if we did our job randomizing, it seems pretty safe to assume those distributions are the same.

What’s the null hypothesis in the “end of history” paper?   It’s that people predict the extent of personality change in an unbiased way, neither underpredicting nor overpredicting it.

But the operationalization is that the absolute difference of predictions, |Z-X|, is drawn from the same distribution as the difference of actual outcomes, |Y-X|, or at least that these distributions have the same means.  As we’ve seen, even without any “end of history illusion”, there’s no good reason for this version of the null hypothesis to be true.  Indeed, we have pretty good reason to believe it’s not true.  A rejection of this null hypothesis tells us nothing about whether there’s an end of history illusion.  It’s not clear to me it tells you anything at all.

Gay marriage and the null hypothesis

Two controversial topics in one post!

Several of the key factual findings in Judge Walker’s opinion are in the form of predictions, not facts. For example, Judge Walker finds that “permitting same-sex couples to marry will not . . . otherwise affect the stability of opposite-sex marriages.” But real predictions have confidence levels. You might think you’re going to get an “A” on an exam next week, but that’s not a fact. It’s just a prediction, and there’s a hidden confidence level: Maybe there’s an 80% chance you’ll get that grade, or a 60% chance. Judge Walker’s prediction-facts have no confidence levels, however. He doesn’t say that there is an 87% chance that permitting same-sex marriage will not affect the stability of opposite-sex marriages. He says that it is now a fact — with 100% certainty — that that will happen.

I think Kerr is incorrect about Walker’s meaning.  When we say, for instance, that a clinical trial shows that a treatment “has no effect” on a disease, we are certainly not saying that, with 100% certainty, the treatment will not change a patient’s condition in any way.  How could we be?  We’re saying, instead, that the evidence before us gives us no compelling reason to rule out “the null hypothesis” that the drug has no effect.  Elliott Sober writes well about this in Evidence and Evolution.  It’s unsettling at first — the meat and potatoes of statistical analysis is deciding whether or not to rule out the null hypothesis, which as a literal assertion is certainly false!  It’s not the case that not a single opposite-sex marriage, potential or actual, will be affected by the legality of same-sex marriage; Walker is making the more modest claim that the evidence we have doesn’t provide us any ability to meaningfully predict the size of that effect, or whether it will on the whole be positive or negative.

This doesn’t speak to Kerr’s larger point, which is that Walker’s finding of fact might not be relevant to the case — California can outlaw whatever it wants without any evidence that the outlawed thing causes any harm, as long as it has a “rational basis” for doing so.  The key ruling here seems to be Justice Kennedy’s in Heller v. Doe, which says:

A State, moreover, has no obligation to produce evidence to sustain the rationality of a statutory classification. “[A] legislative choice is not subject to courtroom factfinding and may be based on rational speculation unsupported by evidence or empirical data.”

and later:

True, even the standard of rationality as we so often have defined it must find some footing in the realities of the subject addressed by the legislation.

I’m in the dark about what Kennedy can mean here.  If speculation is unsupported by evidence, in what sense is it rational?  And what “footing in the realities of the subject” can it be said to have?

More confusing still:   in the present case, the legislation at issue comes from a referendum, not the legislature.  So we have no record to tell us what kind of speculation, rational or not, lies behind it — or, for that matter, whether the law is intended to serve a legitimate government interest at all.  Maybe there is no choice under the circumstances but for the “rational basis” test to be no test at all, and for the courts to defer completely to referenda, however irrational they may seem to the judge?

(Good, long discussion of related points, esp. “to what extent should judges try to read voters’ minds,” at Crooked Timber.)