## More on the end of history: what is a rational prediction?

It’s scrolled off the bottom of the page now, but there’s an amazing comment thread going on under my post on “The End of History Illusion,” the Science paper that got its feet caught in a subtle but critical statistical error.

Commenter Deinst has been especially good, digging into the paper’s dataset (kudos to the authors for making it public!) and finding further reasons to question its conclusions.  In this comment, he makes the following observation:  Quoidbach et al believe there’s a general trend to underestimate future changes in “favorites,” testing this by studying people’s predictions about their favorite movies, food, music, vacation, hobbies, and their best friends, averaging, and finding a slightly negative bias.  What Deinst noticed is that the negative bias is almost entirely driven by people’s unwillingness to predict that they might change their best friend.  On four of the six dimensions, respondents predicted more change than actually occurred.  That sounds much more like “people assign positive moral value to loyalty to friends” than “people have a tendency across domains to underestimate change.”

But here I want to complicate a bit what I wrote in the post.  Neither Quoidbach’s paper nor my post directly addresses the question:  what do we mean by a “rational prediction?”  Precisely:  if there is an outcome which, given the knowledge I have, is a random variable Y, what do I do when asked to “predict” the value of Y?  In my post I took the “rational” answer to be EY.  But this is not the only option.  You might think of a rational person as one who makes the prediction most likely to be correct, i.e. the modal value of Y.  Or you might, as Deinst suggests, think that rational people “run a simulation,” taking a random draw from Y and reporting that as the prediction.

Now suppose people do that last thing, exactly on the nose.  Say X is my level of extraversion now, Y is my level of extraversion in 10 years, and Z is my prediction for the value of Y.  In the model described in the first post, the value of Z depends only on the value of X; if X=a, it is E(Y|X=a).  But in the “run a simulation” model, the joint distribution of X and Z is exactly the same as the joint distribution of X and Y; in particular, E(|Z-X|) and E(|Y-X|) agree.
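To make the contrast concrete, here is a minimal simulation sketch. The three-outcome distribution for the change Y-X (down one point, no change, up one point, with odds 25/50/25) is purely an assumption chosen for illustration, as are the names `draw_change` and `simulate`; nothing here comes from the paper's data.

```python
import random

def draw_change(rng):
    """One draw of the ten-year change Y - X: -1, 0, or +1 (hypothetical odds)."""
    return rng.choices([-1, 0, 1], weights=[0.25, 0.5, 0.25])[0]

def simulate(n=100_000, seed=0):
    """Average |actual change| and average |predicted change| under two rules."""
    rng = random.Random(seed)
    actual = mean_rule = one_sample = 0.0
    for _ in range(n):
        actual += abs(draw_change(rng))      # |Y - X|: what really happens
        mean_rule += abs(0.0)                # Z = E(Y | X) = X in this symmetric model
        one_sample += abs(draw_change(rng))  # Z is a fresh draw from the law of Y
    return actual / n, mean_rule / n, one_sample / n

actual, mean_rule, one_sample = simulate()
print(f"E|Y-X| ~ {actual:.3f}; mean rule: {mean_rule:.3f}; one-sample rule: {one_sample:.3f}")
```

Under these assumptions the one-sample predictor roughly reproduces the actual average absolute change, while the expected-value predictor reports no change at all.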

I hasten to emphasize that there’s no evidence Quoidbach et al. have this model of prediction in mind, but it would give some backing to the idea that, absent an “end of history bias,” you could imagine the absolute difference in their predictor condition matching the absolute difference in the reporter condition.

There’s some evidence that people actually do use small samples, or even just one sample, to predict variables with unknown distributions, and moreover that doing so can actually maximize utility, under some hypotheses on the cognitive cost of carrying out a more fully Bayesian estimate.

Does that mean I think Quoidbach’s inference is OK?  Nope — unfortunately, it stays wrong.

It seems very doubtful that we can count on people hewing exactly to the one-sample model.

Example:  suppose one in twenty people radically changes their level of extraversion in a 10-year interval.  What happens if you ask people to predict whether they themselves are going to experience such a change in the next 10 years?  Under the one-sample model, 5% of people would say “yes.”  Is this what would actually happen?  I don’t know.  Is it rational?  Certainly it fails to maximize the likelihood of being right.  In a population of fully rational Bayesians, everyone would recognize shifts like this as events with probability less than 50%, and everyone would say “no” to this question.  Quoidbach et al. would categorize this result as evidence for an “end of history illusion.”  I would not.
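The gap in accuracy between the two strategies can be worked out exactly. The sketch below assumes that a one-sample predictor's answer is independent of what actually happens to them (they have no private information beyond the 1-in-20 base rate); the function name `accuracy` is hypothetical.

```python
from fractions import Fraction

P_SHIFT = Fraction(1, 20)  # one in twenty people change radically

def accuracy(p_say_yes, p_shift=P_SHIFT):
    """Chance of being right when 'yes' is said with probability p_say_yes,
    independently of whether the shift actually happens (an assumption)."""
    return p_say_yes * p_shift + (1 - p_say_yes) * (1 - p_shift)

always_no = accuracy(Fraction(0))   # everyone gives the most likely answer
one_sample = accuracy(P_SHIFT)      # 5% say "yes", as in the one-sample model

print(f"always 'no': {float(always_no):.3f}, one-sample: {float(one_sample):.3f}")
```

With these numbers, always answering “no” is right 95% of the time and the one-sample strategy 90.5% of the time, even though only the latter yields a 5% share of “yes” answers.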

Now we’re going to hear from my inner Andrew Gelman.  (Don’t you have one?  They’re great!)  I think the real problem with Quoidbach et al’s analysis is that they think their job is to falsify the null hypothesis.  This makes sense in a classical situation like a randomized clinical trial.  Your null hypothesis is that the drug has no effect.  And your operationalization of the null hypothesis — the thing you literally measure — is that the probability distribution on “outcome for patients who get the drug” is the same as the one on “outcome for patients who don’t get the drug.”  That’s reasonable!  If the drug isn’t doing anything, and if we did our job randomizing, it seems pretty safe to assume those distributions are the same.

What’s the null hypothesis in the “end of history” paper?   It’s that people predict the extent of personality change in an unbiased way, neither underpredicting nor overpredicting it.

But the operationalization is that the absolute difference of predictions, |Z-X|, is drawn from the same distribution as the difference of actual outcomes, |Y-X|, or at least that these distributions have the same means.  As we’ve seen, even without any “end of history illusion”, there’s no good reason for this version of the null hypothesis to be true.  Indeed, we have pretty good reason to believe it’s not true.  A rejection of this null hypothesis tells us nothing about whether there’s an end of history illusion.  It’s not clear to me it tells you anything at all.

## Do we really underestimate how much we’ll change? (or: absolute value is not linear!)

Let’s say I present you with a portfolio of five stocks, and ask you to predict each stock’s price six months from now.  You know the current prices, and you know stocks are pretty volatile, but absent any special reason to think these five companies are more likely to have good years than bad ones, you write down the current price as your best prediction for all five slots.

Then I write a paper accusing you of suffering from an “end of financial history illusion.”  After all, on average you predicted that the stock values won’t change at all over six months — but in reality, stock prices change a lot!  If I compute how much each of the five stock prices changed over the last six months, and average those numbers, I get something pretty big.  And yet you, you crazy thing, seem to believe that the stock prices, having arrived at their current values, are to be fixed in place forever more.

And yet the same computation, applied to five personality traits instead of five stocks, got published in Science.  Quoidbach, Gilbert, and Wilson write:

> In study 1, we sought to determine whether people underestimate the extent to which their personalities will change in the future. We recruited a sample of 7519 adults ranging in age from 18 to 68 years [mean (M) = 40 years, standard deviation (SD) = 11.3 years, 80% women] through the Web site of a popular television show and asked them to complete the Ten Item Personality Inventory (1), which is a standard measure of the five trait dimensions that underlie human personality (i.e., conscientiousness, agreeableness, emotional stability, openness to experience, and extraversion). Participants were then randomly assigned either to the reporter condition (and were asked to complete the measure as they would have completed it 10 years earlier) or the predictor condition (and were asked to complete the measure as they thought they would complete it 10 years hence). We then computed the absolute value of the difference between participants’ ratings of their current personality and their reported or predicted personality and averaged these across the five traits to create a measure of reported or predicted change in personality.

This study is getting a lot of press:  it was written up in the New York Times (why, oh why, is it always John Tierney?), USA Today, and Time, and even made it to Mathbabe.

Unfortunately, it’s wrong.

### The difference in predictions is not the predicted difference

The error here is just the same as in the story of the stocks.  The two quantities

• The difference between the predicted future value and the current value
• The predicted difference between the future value and the current value

sound like the same thing.  But they’re not the same thing.  Life’s noncommutative that way sometimes. Quoidbach et al are measuring the former quantity and referring to it as if it’s the latter.

You can see the difference even in a very simple model.  Let’s say the way a stock works is that, over six months, there’s a 30% chance it goes up a dollar, a 25% chance it goes down a dollar, and a 45% chance it stays the same.  And let’s say you know this.  Then your estimated expected value of the stock price six months from now is “price now + 5 cents,” and the first number, the size of the difference between your predicted value and the current value, is 5 cents.

But what’s the second number?  In your model, the difference between the future price and the current price has a 55% chance of being a dollar and a 45% chance of being zero.  So your prediction for the size of the difference is 55 cents — 11 times as much!

If you measure the first quantity and say you’ve measured the second, you’re gonna have a bad time.

In the “predictor” condition of the paper, a rational respondent quizzed about a bunch of stocks will get a score of about 5 cents.  What about the “reporter” condition?  Then the respondent’s score will be the average value of the difference between the price six months ago and the price now; this difference will be a dollar 55% of the time and zero 45% of the time, so the scores in the reporter condition will average 55 cents.
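The arithmetic in the stock example can be checked in a few lines. This is only a sketch of the toy model above; the variable names are made up for illustration.

```python
from fractions import Fraction

# Six-month price moves in dollars and their probabilities, as in the toy model.
MOVES = {+1: Fraction(30, 100), -1: Fraction(25, 100), 0: Fraction(45, 100)}
assert sum(MOVES.values()) == 1

# Predictor condition: size of (expected future price - current price).
expected_move = sum(move * p for move, p in MOVES.items())
predictor_score = abs(expected_move)

# Reporter condition: expected size of (price now - price six months ago).
reporter_score = sum(abs(move) * p for move, p in MOVES.items())

print(f"predictor: ${float(predictor_score):.2f}, reporter: ${float(reporter_score):.2f}")
print(f"ratio: {reporter_score / predictor_score}")
```

Using exact fractions avoids any floating-point fuzz: the two scores come out to 1/20 and 11/20 of a dollar.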

To sum up:  completely rational respondents with full information ought to display the behavior observed by Quoidbach et al — precisely the behavior the authors adduce as evidence that their subjects are in the grips of a cognitive bias!

To get mathy with it for a minute — if Y is the value of a variable at some future time, and X is the value now, the two quantities are

• |E(Y-X)|
• E(|Y-X|)

Those numbers would be the same if absolute value were a linear function.  But absolute value isn’t a linear function.  Unless, that is, you know a priori the sign of Y-X.  In other words, if people knew for certain that over a decade they’d get less extraverted, but didn’t know to what extent, you might expect to see the same scores appearing in the predictor and reporter conditions.  But this is not, in fact, something people know about themselves.
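For readers who want the inequality spelled out, here is a short derivation. The notation D, D⁺, D⁻ is introduced here for convenience and is not from the paper: D = Y-X, split into its positive and negative parts.

```latex
\begin{align*}
D &= D^{+} - D^{-}, \qquad |D| = D^{+} + D^{-},
  \qquad D^{+} = \max(D, 0),\; D^{-} = \max(-D, 0), \\
|E(D)| &= \left| E(D^{+}) - E(D^{-}) \right|
  \;\le\; E(D^{+}) + E(D^{-}) \;=\; E(|D|).
\end{align*}
```

Equality holds exactly when E(D⁺) = 0 or E(D⁻) = 0, that is, when the change is almost surely of one sign. Otherwise the first quantity is strictly smaller than the second.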

### I always think I’m right but I don’t think I’m always right

The study I’ve mentioned isn’t the only one in the paper.  Here’s another:

> [In study 3]…we recruited a new sample of 7130 adults ranging from 18 to 68 years old (M = 40.2 years, SD = 11.1 years, 80% women) through the same Web site and asked them to report their favorite type of music, their favorite type of vacation, their favorite type of food, their favorite hobby, and the name of their best friend. Participants were then randomly assigned either to the reporter condition (and were asked to report whether each of their current preferences was the same as or different than it was 10 years ago) or the predictor condition (and were asked to predict whether each of their current preferences would be the same or different 10 years from now). We then counted the number of items on which participants responded “different” and used this as a measure of reported or predicted changes in preference.

Let’s say I tend to change my favorite music (respectively vacation, food, hobby, and friend) about once every 25 years, so that there’s about a 40% chance that in a given ten-year period I’ll make a change.  And let’s say I know this about myself, and I’m free from cognitive biases.  If you ask me to predict whether I’ll have the same or different favorite food in ten years, I’ll say “same” — after all, there’s a 60-40 chance that’s correct!  Ditto for the other four categories.

Once again, Quoidbach et al refer to the number of times I answer “different” as “a measure of predicted changes in preference.”  But it isn’t; or rather, it has nothing to say about the predicted number of changes.  If you ask me “How many of the five categories do you think you’ll change in the next ten years?” I’ll say “two.”  While if you ask me, for each of the five categories in turn, “Do you think you’ll change this in the next ten years?” I’ll say no, five times straight.  This is not a contradiction and it is not a failure of rationality and it is not a cognitive bias.  It is math, done correctly.

(Relevant philosophical maxim about groundedness of belief:  “I always think I’m right, but I don’t think I’m always right.”  We correctly recognize that some subset of things we currently believe are wrong, but each particular belief we take as correct.  Update:  NDE in comments reminds me that WVO Quine is the source of the maxim.)

What kind of behavior would the authors consider rational in this case?  Presumably, one in which the proportion of “different” answers is the same in the prospective and retrospective conditions.  In other words, I’d score as bias-free if I answered

“My best friend and my favorite music will change, but my favorite food, vacation, and hobby will stay the same.”

This answer has a substantially smaller chance of being correct than my original one.  (108/3125 against 243/3125, if you’re keeping score at home.)  The authors’ suggestion that it represents a less biased response is wrong.
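Those fractions, and the “two” from earlier, can be reproduced with a quick calculation. The sketch assumes the five favorites change independently, each with probability 40% per decade (the scoreboard above implicitly assumes the same); the helper `pattern_probability` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product
from math import comb

P_CHANGE = Fraction(2, 5)   # 40% chance a given favorite changes in ten years
N = 5                       # music, vacation, food, hobby, best friend

def pattern_probability(pattern):
    """Probability of one specific same/different pattern (True = changes),
    assuming the five categories change independently."""
    prob = Fraction(1)
    for changed in pattern:
        prob *= P_CHANGE if changed else 1 - P_CHANGE
    return prob

all_same = pattern_probability((False,) * N)
two_named = pattern_probability((True, True, False, False, False))
expected_changes = N * P_CHANGE

# The single most likely pattern, and the most likely *number* of changes.
best_pattern = max(product([False, True], repeat=N), key=pattern_probability)
count_dist = {k: comb(N, k) * P_CHANGE**k * (1 - P_CHANGE)**(N - k) for k in range(N + 1)}
most_likely_count = max(count_dist, key=count_dist.get)

print(all_same, two_named, expected_changes, best_pattern, most_likely_count)
```

In this model the most likely specific pattern is “nothing changes,” while the most likely count of changes is two. Both answers are correct responses to different questions.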

Now you may ask:  why didn’t Quoidbach et al just directly ask people “to what extent do you expect your personality to change over the next ten years?” and compare that with retrospective report?  To their credit, they did just that — and there they did indeed find that people predicted smaller changes than they reported:

> Third, is it possible that predictors in study 1 knew that they would change over the next 10 years, but because they did not know exactly how they would change, they did not feel confident predicting specific changes? To investigate this possibility, we replicated study 1 with an independent sample of 1163 adults (M = 38.4 years, SD = 12.1 years, 78% women) recruited through the same Web site. Instead of being asked to report or predict their specific personality traits, these participants were simply asked to report how much they felt they had “changed as a person over the last 10 years” and how much they thought they would “change as a person over the next 10 years.” Because some participants contributed data to both conditions, we performed a multilevel version of the analysis described in study 1. The analysis revealed the expected effect of condition (β = –0.74, P = 0.007), indicating that predictors aged a years predicted that they would change less over the next decade than reporters aged a + 10 years reported having changed over the same decade. This finding suggests that a lack of specific knowledge about how one might change in the future was not the cause of the effects seen in study 1.

This study, unlike the others, addresses the question the paper proposes to consider.  To me, it seems questionable that numerical answers to “how much will you change as a person in the next 10 years?” are directly comparable with numerical answers to “how much did you change as a person over the last 10 years?” but this is a question about empirical social science, not mathematics.  Even if I were a social scientist, I couldn’t really judge this part of the study, because the paragraph I just quoted is all we see of it — how the questions were worded, how they were scored, what the rest of the coefficients in the regression were, etc, are not available, either in the main body of the paper or the supplementary material.

[Update:  Commenter deinst makes the really important point that Quoidbach et al have made their data publicly available at the ICPSR repository, and that things like the exact wording of the questions and the scoring mechanism are available there.]

Do we actually underestimate the extent to which we’ll change our personalities and preferences over time? It certainly seems plausible:  indeed, other researchers have observed similar effects, and the “changed as a person” study in the present paper is suggestive in this respect.

But much of the paper doesn’t actually address that question.   Let me be clear:  I don’t think the authors are trying to put one over.  This is a mistake — a somewhat subtle mistake, but a bad mistake, and one which kills a big chunk of the paper. Science should not have accepted the article in its current form, and the authors should withdraw it, revise it, and resubmit it.

Yes, I know this isn’t actually going to happen.