## More on the end of history: what is a rational prediction?

It’s scrolled off the bottom of the page now, but there’s an amazing comment thread going on under my post on “The End of History Illusion,” the Science paper that got its feet caught in a subtle but critical statistical error.

Commenter Deinst has been especially good, digging into the paper’s dataset (kudos to the authors for making it public!) and finding further reasons to question its conclusions.  In this comment, he makes the following observation:  Quoidbach et al. believe there’s a general trend to underestimate future changes in “favorites,” testing this by studying people’s predictions about their favorite movies, food, music, vacation, hobbies, and their best friends, averaging, and finding a slightly negative bias.  What Deinst noticed is that the negative bias is almost entirely driven by people’s unwillingness to predict that they might change their best friend.  On four of the six dimensions, respondents predicted more change than actually occurred.  That sounds much more like “people assign positive moral value to loyalty to friends” than “people have a tendency across domains to underestimate change.”

But here I want to complicate a bit what I wrote in the post.  Neither Quoidbach’s paper nor my post directly addresses the question:  what do we mean by a “rational prediction?”  Precisely:  if there is an outcome which, given the knowledge I have, is a random variable Y, what do I do when asked to “predict” the value of Y?  In my post I took the “rational” answer to be EY.  But this is not the only option.  You might think of a rational person as one who makes the prediction most likely to be correct, i.e. the modal value of Y.  Or you might, as Deinst suggests, think that rational people “run a simulation,” taking a random draw from Y and reporting that as the prediction.
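To see that these three rules genuinely differ, here is a minimal sketch with a made-up skewed outcome (the distribution is purely illustrative, not from the paper): Y is 0 with probability 0.7 and 10 with probability 0.3.  The three “rational” answers disagree, and the expected value is a number Y never actually takes.

```python
import numpy as np

rng = np.random.default_rng(2)

# An illustrative skewed outcome: Y = 0 with prob 0.7, Y = 10 with prob 0.3
values = np.array([0, 10])
probs = np.array([0.7, 0.3])

expectation = values @ probs               # EY = 3.0, a value Y never takes
mode = values[np.argmax(probs)]            # most likely single value: 0
one_sample = rng.choice(values, p=probs)   # "run a simulation": a random draw, 0 or 10

print(expectation, mode, one_sample)
```

Three different notions of a rational prediction, three different answers from the same distribution.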

Now suppose people do that last thing, exactly on the nose.  Say X is my level of extraversion now, Y is my level of extraversion in 10 years, and Z is my prediction for the value of Y.  In the model described in the first post, the value of Z depends only on the value of X; if X=a, it is E(Y|X=a).  But in the “run a simulation” model, the joint distribution of X and Z is exactly the same as the joint distribution of X and Y; in particular, E(|Z-X|) and E(|Y-X|) agree.

I hasten to emphasize that there’s no evidence Quoidbach et al. have this model of prediction in mind, but it would give some backing to the idea that, absent an “end of history bias,” you could imagine the absolute difference in their predictor condition matching the absolute difference in the reporter condition.

There’s some evidence that people actually do use small samples, or even just one sample, to predict variables with unknown distributions, and moreover that doing so can actually maximize utility, under some hypotheses on the cognitive cost of carrying out a more fully Bayesian estimate.

Does that mean I think Quoidbach’s inference is OK?  Nope — unfortunately, it stays wrong.

It seems very doubtful that we can count on people hewing exactly to the one-sample model.

Example:  suppose one in twenty people radically changes their level of extraversion in a 10-year interval.  What happens if you ask people to predict whether they themselves are going to experience such a change in the next 10 years?  Under the one-sample model, 5% of people would say “yes.”  Is this what would actually happen?  I don’t know.  Is it rational?  Certainly it fails to maximize the likelihood of being right.  In a population of fully rational Bayesians, everyone would recognize shifts like this as events with probability less than 50%, and everyone would say “no” to this question.  Quoidbach et al. would categorize this result as evidence for an “end of history illusion.”  I would not.
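Here is that example as a simulation, under the stated (hypothetical) assumption that each person independently changes with probability 0.05 and knows that probability.  One-sample predictors reproduce the 5% base rate; modal Bayesian predictors unanimously say “no,” which looks exactly like an end-of-history illusion even though it is the likelihood-maximizing answer.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
p_change = 0.05  # assumed: one in twenty people radically changes in a decade

# Who actually changes over the decade
changed = rng.random(n) < p_change

# One-sample model: each person reports one draw from their
# (correct) predictive distribution
predicted_one_sample = rng.random(n) < p_change

# Modal model: the event has probability below 50%, so every
# rational Bayesian predicts "no change"
predicted_modal = np.zeros(n, dtype=bool)

print(changed.mean())               # near 0.05
print(predicted_one_sample.mean())  # near 0.05: predicted rate matches actual rate
print(predicted_modal.mean())       # 0.0: zero predicted change, with no bias at all
```

The modal population underpredicts change on average without harboring any illusion; they are just answering a yes/no question rationally.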

Now we’re going to hear from my inner Andrew Gelman.  (Don’t you have one?  They’re great!)  I think the real problem with Quoidbach et al.’s analysis is that they think their job is to falsify the null hypothesis.  This makes sense in a classical situation like a randomized clinical trial.  Your null hypothesis is that the drug has no effect.  And your operationalization of the null hypothesis — the thing you literally measure — is that the probability distribution on “outcome for patients who get the drug” is the same as the one on “outcome for patients who don’t get the drug.”  That’s reasonable!  If the drug isn’t doing anything, and if we did our job randomizing, it seems pretty safe to assume those distributions are the same.

What’s the null hypothesis in the “end of history” paper?   It’s that people predict the extent of personality change in an unbiased way, neither underpredicting nor overpredicting it.

But the operationalization is that the absolute difference of predictions, |Z-X|, is drawn from the same distribution as the difference of actual outcomes, |Y-X|, or at least that these distributions have the same means.  As we’ve seen, even without any “end of history illusion”, there’s no good reason for this version of the null hypothesis to be true.  Indeed, we have pretty good reason to believe it’s not true.  A rejection of this null hypothesis tells us nothing about whether there’s an end of history illusion.  It’s not clear to me it tells you anything at all.

## In defense of Nate Silver and experts

Cathy goes off on Nate Silver today, calling naive his account of well-meaning people saying false things because they’ve made math mistakes.  In Cathy’s view, people say false things because they’re not well-meaning and are trying to screw you — or, sometimes, because they’re well-meaning but their incentives are pointed at something other than accuracy.  Read the whole thing, it’s more complicated than this paraphrase suggests.

Cathy, a fan of and participant in mass movements, takes special exception to Silver saying:

This is neither the time nor the place for mass movements — this is the time for expert opinion. Once the experts (and I’m not one of them) have reached some kind of a consensus about what the best course of action is (and they haven’t yet), then figure out who is impeding that action for political or other disingenuous reasons and tackle them — do whatever you can to remove them from the playing field. But we’re not at that stage yet.

Cathy’s take:

…I have less faith in the experts than Nate Silver: I don’t want to trust the very people who got us into this mess, while benefitting from it, to also be in charge of cleaning it up. And, being part of the Occupy movement, I obviously think that this is the time for mass movements.

From my experience working first in finance at the hedge fund D.E. Shaw during the credit crisis and afterwards at the risk firm Riskmetrics, and my subsequent experience working in the internet advertising space (a wild west of unregulated personal information warehousing and sales) my conclusion is simple: Distrust the experts.

I think Cathy’s distrust is warranted, but I think Silver shares it.  The central concern of his chapter on weather prediction is the vast difference in accuracy between federal hurricane forecasters, whose only job is to get the hurricane track right, and TV meteorologists, whose very different incentive structure leads them to get the weather wrong on purpose.  He’s just as hard on political pundits and their terrible, terrible predictions, which are designed to be interesting, not correct.

Cathy wishes Silver would put more weight on this stuff, and she may be right, but it’s not fair to paint him as a naif who doesn’t know there’s more to life than math.  (For my full take on Silver’s book, see my review in the Globe.)

As for experts:  I think in many or even most cases deferring to people with extensive domain knowledge is a pretty good default.  Maybe this comes from seeing so many preprints by mathematicians, physicists, and economists flushed with confidence that they can do biology, sociology, and literary study (!) better than the biologists, sociologists, or scholars of literature.  Domain knowledge matters.  Marilyn vos Savant’s opinion about Wiles’s proof of Fermat doesn’t matter.

But what do you do with cases like finance, where the only people with deep domain knowledge are the ones whose incentive structure is socially suboptimal?  (Cathy would use saltier language here.)  I guess you have to count on mavericks like Cathy, who’ve developed the domain knowledge by working in the financial industry, but who are now separated from the incentives that bind the insiders.

But why do I trust what Cathy says about finance?

Because she’s an expert.

Is Cathy OK with this?

## If you can’t say anything intelligent about probability, don’t say anything at all

This, from Politico’s Dylan Byers, is infuriating:

Prediction is the name of Silver’s game, the basis for his celebrity. So should Mitt Romney win on Nov. 6, it’s difficult to see how people can continue to put faith in the predictions of someone who has never given that candidate anything higher than a 41 percent chance of winning (way back on June 2) and — one week from the election — gives him a one-in-four chance, even as the polls have him almost neck-and-neck with the incumbent.

Why?  Why is it difficult to see that?  Does Dylan Byers not know the difference between saying something is unlikely to happen and declaring that it will not happen?

Silver cautions against confusing prediction with prophecy. “If the Giants lead the Redskins 24-21 in the fourth quarter, it’s a close game that either team could win. But it’s also not a ‘toss-up’: The Giants are favored. It’s the same principle here: Obama is ahead in the polling averages in states like Ohio that would suffice for him to win the Electoral College. Hence, he’s the favorite,” Silver said.

For all the confidence Silver puts in his predictions, he often gives the impression of hedging. Which, given all the variables involved in a presidential election, isn’t surprising. For this reason and others — and this may shock the coffee-drinking NPR types of Seattle, San Francisco and Madison, Wis. — more than a few political pundits and reporters, including some of his own colleagues, believe Silver is highly overrated.

Hey!  That’s me!  I live in Madison, Wisconsin!  I drink coffee!  Wait, why was that relevant again?

To sum up:  Byers thinks Nate Silver is overrated because he “hedges” — which is to say, he gives an accurate assessment of what’s going on instead of an inaccurate one.

This makes me want to stab my hand with a fork.

I’m happy that Ezra Klein at the Post decided to devote a big chunk of words to explaining just how wrong this viewpoint is, so I don’t have to.  You know what, though, I’ll bet Ezra Klein drinks coffee.


## Nate Silver is the Kurt Cobain of statistics

Or so I argue in today’s Boston Globe, where I review Silver’s excellent new book.  I considered trying to wedge a “The Signal and The Noise” / “The Colour and the Shape” joke in there too, but it was too labored.

Concluding graf:

Prediction is a fundamentally human activity. Just as a novel is no less an expression of human feeling for being composed on a laptop, the forecasts Silver studies — at least the good ones — are expressions of human thought and belief, no matter how many theorems and algorithms forecasters bring to their aid. The math serves as a check on our human biases, and our insight serves as a check on the computer’s bugs and blind spots. In Silver’s world, math can’t replace or supersede us. Quite the contrary: It is math that allows us to become our wiser selves.
