Separated at birth?
“…she had to stay with him at nursery school every morning for four months, or else he went into a violent frenzy of tears and tantrums. In first grade, he often vomited in the morning when he had to leave her. His violence on the playground approached danger to himself and others. When a neighbor took away from him a baseball bat with which he was about to hit a child on the head, his mother objected violently to the “frustration” of her child. She found it extremely difficult to discipline him herself…”
“…In a Westchester community whose school system is world famous, it was recently discovered that graduates with excellent high-school records did very poorly in college and did not make much of themselves afterwards. An investigation revealed a simple psychological cause. All during high school, the mothers literally had been doing their children’s homework and term papers. They had been cheating their sons and daughters out of their own mental growth…”
“Whereas in earlier years it had been possible to count on the strong motivation and initiative of students to conduct their own affairs, to form new organizations, to invent new projects either in social welfare, or in intellectual fields, it now became clear that for many students the responsibility for self-government was often a burden to bear rather than a right to be maintained… Students who were given complete freedom to manage their own lives and to make their own decisions often did not wish to do so… Students in college seem to find it increasingly difficult to entertain themselves, having become accustomed to depend upon arranged entertainment in which their role is simply to participate in the arrangements already made…”
“…a new and frightening passivity, softness, and boredom in American children… incapable of the effort, the endurance of pain and frustration, the discipline needed to compete on the baseball field, or get into college.”
Today’s overinvolved helicopter parents are robbing kids of the character-building experiences of failure and frustration they need, and raising a generation of incompetent narcissists!
Except of course all this is from Betty Friedan’s The Feminine Mystique, published in 1963. (The third passage is testimony from the president of Sarah Lawrence, the rest is Friedan herself.)
It’s amazing: you can open this book to just about any page and find material more relevant to contemporary life than 95% of “how we live now” articles published this month.
I like Franklin Bruno a lot, and I even know him a little bit, so naturally I was interested in obtaining the new Human Hearts album, Another. I could have paid for a download of this album at any point in the last several months, but I didn’t. I could have checked to see if I could listen to it free on Spotify, which I have open on my laptop most of the time, but I didn’t do that either. (Just checked now — it’s not there.) But the other day, when I walked into the record store on my block and saw it on the new releases shelf, I bought it. That’s one thing about physical stores — they give you a reason to buy the thing now, not at some other time, while the continuous and eternal availability of the record online meant that there was no moment at which my desire to hear the new Human Hearts album outweighed my desire to click on whatever else I was clicking on.
I presume there are theorists of this kind of thing.
Anyway, here are some favorite Franklin Bruno tracks.
“Going to Marrakesh,” by the Extra Glenns, which is Bruno together with John Darnielle of the Mountain Goats.
I wanted to link to “Coupon,” which is much noisier and messier than the two above, but I can’t find a publicly available sound file, so you’ll just have to imagine it!
It’s scrolled off the bottom of the page now, but there’s an amazing comment thread going on under my post on “The End of History Illusion,” the Science paper that got its feet caught in a subtle but critical statistical error.
Commenter Deinst has been especially good, digging into the paper’s dataset (kudos to the authors for making it public!) and finding further reasons to question its conclusions. In this comment, he makes the following observation: Quoidbach et al believe there’s a general trend to underestimate future changes in “favorites,” testing this by studying people’s predictions about their favorite movies, food, music, vacation, hobbies, and their best friends, averaging, and finding a slightly negative bias. What Deinst noticed is that the negative bias is almost entirely driven by people’s unwillingness to predict that they might change their best friend. On four of the six dimensions, respondents predicted more change than actually occurred. That sounds much more like “people assign positive moral value to loyalty to friends” than “people have a tendency across domains to underestimate change.”
But here I want to complicate a bit what I wrote in the post. Neither Quoidbach’s paper nor my post directly addresses the question: what do we mean by a “rational prediction?” Precisely: if there is an outcome which, given the knowledge I have, is a random variable Y, what do I do when asked to “predict” the value of Y? In my post I took the “rational” answer to be EY. But this is not the only option. You might think of a rational person as one who makes the prediction most likely to be correct, i.e. the modal value of Y. Or you might, as Deinst suggests, think that rational people “run a simulation,” taking a random draw from Y and reporting that as the prediction.
Now suppose people do that last thing, exactly on the nose. Say X is my level of extraversion now, Y is my level of extraversion in 10 years, and Z is my prediction for the value of Y. In the model described in the first post, the value of Z depends only on the value of X; if X=a, it is E(Y|X=a). But in the “run a simulation” model, the joint distribution of X and Z is exactly the same as the joint distribution of X and Y; in particular, E(|Z-X|) and E(|Y-X|) agree.
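A rough simulation makes the contrast between the two prediction models concrete. This is only a sketch under invented assumptions: the trait level now is a standard normal draw, the ten-year change is independent normal noise, and the function name and parameters are made up for illustration. None of it comes from the paper or its dataset.

```python
import random

def simulate(n=100_000, noise_sd=1.0, seed=0):
    """Average |Y-X|, |Z-X| under the conditional-mean model, and |Z-X| under the one-sample model."""
    rng = random.Random(seed)
    actual = mean_pred = one_sample = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                 # X: trait level now (assumed distribution)
        y = x + rng.gauss(0.0, noise_sd)        # Y: trait level in ten years (assumed drift)
        z_mean = x                              # Z = E(Y | X = x), which equals x in this toy model
        z_draw = x + rng.gauss(0.0, noise_sd)   # Z = one independent draw from Y given X = x
        actual += abs(y - x)
        mean_pred += abs(z_mean - x)
        one_sample += abs(z_draw - x)
    return actual / n, mean_pred / n, one_sample / n

actual_change, mean_model_change, one_sample_change = simulate()
print(actual_change, mean_model_change, one_sample_change)
```

In this toy setup the conditional-mean predictor shows no predicted change at all, while the one-sample predictor's average |Z-X| lands close to the average |Y-X|, up to sampling noise.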
I hasten to emphasize that there’s no evidence Quoidbach et al. have this model of prediction in mind, but it would give some backing to the idea that, absent an “end of history bias,” you could imagine the absolute difference in their predictor condition matching the absolute difference in the reporter condition.
There’s some evidence that people actually do use small samples, or even just one sample, to predict variables with unknown distributions, and moreover that doing so can actually maximize utility, under some hypotheses on the cognitive cost of carrying out a more fully Bayesian estimate.
Does that mean I think Quoidbach’s inference is OK? Nope — unfortunately, it stays wrong.
It seems very doubtful that we can count on people hewing exactly to the one-sample model.
Example: suppose one in twenty people radically changes their level of extraversion in a 10-year interval. What happens if you ask people to predict whether they themselves are going to experience such a change in the next 10 years? Under the one-sample model, 5% of people would say “yes.” Is this what would actually happen? I don’t know. Is it rational? Certainly it fails to maximize the likelihood of being right. In a population of fully rational Bayesians, everyone would recognize shifts like this as events with probability less than 50%, and everyone would say “no” to this question. Quoidbach et al. would categorize this result as evidence for an “end of history illusion.” I would not.
Now we’re going to hear from my inner Andrew Gelman. (Don’t you have one? They’re great!) I think the real problem with Quoidbach et al’s analysis is that they think their job is to falsify the null hypothesis. This makes sense in a classical situation like a randomized clinical trial. Your null hypothesis is that the drug has no effect. And your operationalization of the null hypothesis — the thing you literally measure — is that the probability distribution on “outcome for patients who get the drug” is the same as the one on “outcome for patients who don’t get the drug.” That’s reasonable! If the drug isn’t doing anything, and if we did our job randomizing, it seems pretty safe to assume those distributions are the same.
What’s the null hypothesis in the “end of history” paper? It’s that people predict the extent of personality change in an unbiased way, neither underpredicting nor overpredicting it.
But the operationalization is that the absolute difference of predictions, |Z-X|, is drawn from the same distribution as the difference of actual outcomes, |Y-X|, or at least that these distributions have the same means. As we’ve seen, even without any “end of history illusion”, there’s no good reason for this version of the null hypothesis to be true. Indeed, we have pretty good reason to believe it’s not true. A rejection of this null hypothesis tells us nothing about whether there’s an end of history illusion. It’s not clear to me it tells you anything at all.
Dating is dead now, at the hand of Facebook, texting, “hanging out,” and “hooking up,” per the New York Times:
Blame the much-documented rise of the “hookup culture” among young people, characterized by spontaneous, commitment-free (and often, alcohol-fueled) romantic flings. Many students today have never been on a traditional date, said Donna Freitas, who has taught religion and gender studies at Boston University and Hofstra and is the author of the forthcoming book, “The End of Sex: How Hookup Culture is Leaving a Generation Unhappy, Sexually Unfulfilled, and Confused About Intimacy.”
Hookups may be fine for college students, but what about after, when they start to build an adult life? The problem is that “young people today don’t know how to get out of hookup culture,” Ms. Freitas said.
This generation is not very good at face-to-face relationships. The image that comes to mind is two students, sitting in the room they share, angrily texting each other, but not talking. They all want to have intimate relationships, they want to get married and have kids, but that’s hard to do if you don’t know how to talk with another person. Just under half of freshmen said they’d been on a date. Relationships often begin with two people meeting at a party and hooking up. Then the next day they check each other out on Facebook, and if they like what they see they might send a message saying they’re going to a party the next night — but not inviting the other person. And if they both show up, and hook up again, that might go on for a while, and then they’d consider posting on Facebook that they were in a relationship.
Oh, for the old days, before Facebook and the ubiquitous Internet, back in 1998, when everything was different, and when Arthur Levine — yep, the same guy — wrote:
One of the things traditional-age undergraduates have been most eager to escape from is intimate relationships. Traditional dating is largely dead on college campuses, replaced by group dating, in which men and women travel in unpartnered packs. Group dating is a practice that provides protection from deeper involvement and intimacy. One student at Southern Methodist University summed up the dating scene this way: “I don’t think there is much serious dating until people are seniors. I mean, people go out a lot but do not want serious relationships. There is a lot of sex. College is about casual sex.”
Students talked a lot about sex. On a given night the typical pattern is to go to a bar or party off campus, get drunk, and end up back in someone’s room. One student explained, “People will stand in the bar just waiting to be chosen at the end of the night.” Developing a sexual relationship that is not intended to be emotional is just another alternative to traditional dating. It is a pattern repeated all across the country and rationalized by students, who told us repeatedly that they have never seen a successful adult romantic relationship.
Young people who read my blog, I have an important message for you. I went to college in the early 1990s. There was not much “traditional dating.” Lots of people complained about this, especially in newspaper editorials, and worried about our ability to forge meaningful relationships. You know what happened to us? We all figured out how to get married and have kids. Just so you know.
I’m told that one trick to the astonishing feats carried out by world-class competitive eaters is that your satiety sensor is on something like a twenty-minute delay; so you can really pack an immense amount of food into your body before your brain realizes you’re doing something your stomach doesn’t want you to do.
I was talking to a colleague who wants to start a blog and asked for some advice, and I realized that blogging is kind of like this, too. My math posts are very casual and full of mistakes, and the reason is that my practice is to write a post as soon as it occurs to me — I then have about a half hour before my brain says “Wait, you’re supposed to be working right now.” So in that half hour I have to write as fast as I can, like Kobayashi smashing hot dogs into his mouth.
Yes, this is me blogging:
Is this a good time to mention that I once drank a gallon of milk in four minutes? Here are my tips for success at this important task:
Hal Pashler, a psychologist at UCSD, tweeted my post about the “end of history” study, and this led me to his interesting paper, “Is the Replicability Crisis Overblown?” (with Christine Harris.) Like all papers whose title is a rhetorical question, it comes down in favor of “no.”
Among other things, Pashler and Harris are concerned about the widespread practice of “conceptual replication,” in which rather than reproduce an existing experiment you try to find a similar effect in an adjacent domain. What happens when you don’t find anything?
Rarely, it seems to us, would the investigators themselves believe they have learned much of anything. We conjecture that the typical response of an investigator in this (not uncommon) situation is to think something like “I should have tried an experiment closer to the original procedure—my mistake.” Whereas the investigator may conclude that the underlying effect is not as robust or generalizable as had been hoped, he or she is not likely to question the veracity of the original report. As with direct replication failures, the likelihood of being able to publish a conceptual replication failure in a journal is very low. But here, the failure will likely generate no gossip—there is nothing interesting enough to talk about here. The upshot, then, is that a great many failures of conceptual replication attempts can take place without triggering any general skepticism of the phenomenon at issue.
The solutions are not very sexy but are pretty clear — create publication venues for negative results and direct replication, and give researchers real credit for them. Gary Marcus has a good roundup in his New Yorker blog of other structural changes that might lower the error rate of lab science. Marcus concludes:
In the long run, science is self-correcting. Ptolemy’s epicycles were replaced by Copernicus’s heliocentric system. The theory that stomach ulcers were caused by spicy foods has been replaced by the discovery that many ulcers are caused by a bacterium. A dogma that primates never grew new neurons held sway for forty years, based on relatively little evidence, but was finally chucked recently when new scientists addressed older questions with better methods that had newly become available.
but Pashler and Harris are not so sure:
Is there evidence that this sort of slow correction process is actually happening? Using Google Scholar we searched <“failure to replicate”, psychology> and checked the first 40 articles among the search returns that reported a nonreplication. The median time between the original target article and the replication attempt was 4 years, with only 10% of the replication attempts occurring at lags longer than 10 years (n = 4). This suggests that when replication efforts are made (which, as already discussed, happens infrequently), they generally target very recent research. We see no sign that long-lag corrections are taking place.
It cannot be doubted that there are plenty of published results in the mathematical literature that are wrong. But the ones that go uncorrected are the ones that no one cares about.
It could be that the self-correction process is most intense, and thus most effective, in areas of science which are most interesting, and most important, and have the highest stakes, even as errors are allowed to persist elsewhere. That’s the optimistic view, at any rate.
Let’s say I present you with a portfolio of five stocks, and ask you to predict each stock’s price six months from now. You know the current prices, and you know stocks are pretty volatile, but absent any special reason to think these five companies are more likely to have good years than bad ones, you write down the current price as your best prediction for all five slots.
Then I write a paper accusing you of suffering from an “end of financial history illusion.” After all, on average you predicted that the stock values won’t change at all over six months — but in reality, stock prices change a lot! If I compute how much each of the five stock prices changed over the last six months, and average those numbers, I get something pretty big. And yet you, you crazy thing, seem to believe that the stock prices, having arrived at their current values, are to be fixed in place forever more.
Pretty bad argument, right?
And yet the same computation, applied to five personality traits instead of five stocks, got published in Science. Quoidbach, Gilbert, and Wilson write:
In study 1, we sought to determine whether people underestimate the extent to which their personalities will change in the future. We recruited a sample of 7519 adults ranging in age from 18 to 68 years [mean (M) = 40 years, standard deviation (SD) = 11.3 years, 80% women] through the Web site of a popular television show and asked them to complete the Ten Item Personality Inventory (1), which is a standard measure of the five trait dimensions that underlie human personality (i.e., conscientiousness, agreeableness, emotional stability, openness to experience, and extraversion). Participants were then randomly assigned either to the reporter condition (and were asked to complete the measure as they would have completed it 10 years earlier) or the predictor condition (and were asked to complete the measure as they thought they would complete it 10 years hence). We then computed the absolute value of the difference between participants’ ratings of their current personality and their reported or predicted personality and averaged these across the five traits to create a measure of reported or predicted change in personality.
Unfortunately, it’s wrong.
The difference in predictions is not the predicted difference
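The error here is just the same as in the story of the stocks. The two quantities, (1) the difference between the predicted future value and the current value, and (2) the predicted difference between the future value and the current value, sound like the same thing. But they’re not the same thing. Life’s noncommutative that way sometimes. Quoidbach et al are measuring the former quantity and referring to it as if it’s the latter.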
You can see the difference even in a very simple model. Let’s say the way a stock works is that, over six months, there’s a 30% chance it goes up a dollar, a 25% chance it goes down a dollar, and a 45% chance it stays the same. And let’s say you know this. Then your estimated expected value of the stock price six months from now is “price now + 5 cents,” and the first number (the size of the difference between your predicted value and the current value) is 5 cents.
But what’s the second number? In your model, the difference between the future price and the current price has a 55% chance of being a dollar and a 45% chance of being zero. So your prediction for the size of the difference is 55 cents — 11 times as much!
If you measure the first quantity and say you’ve measured the second, you’re gonna have a bad time.
In the “predictor” condition of the paper, a rational respondent quizzed about a bunch of stocks will get a score of about 5 cents. What about the “reporter” condition? Then the respondent’s score will be the average value of the difference between the price six months ago and the price now; this difference will be a dollar 55% of the time and zero 45% of the time, so the scores in the reporter condition will average 55 cents.
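For anyone who wants to check the arithmetic in the toy stock model, here is a minimal sketch in Python using exact fractions. The function and variable names are made up for illustration.

```python
from fractions import Fraction

# Six-month price change in dollars -> probability, as in the toy model above.
CHANGE_DIST = {
    1: Fraction(30, 100),    # up a dollar
    -1: Fraction(25, 100),   # down a dollar
    0: Fraction(45, 100),    # stays the same
}

def size_of_expected_change(dist):
    """Size of the gap between the predicted (expected) price and the current price."""
    return abs(sum(change * p for change, p in dist.items()))

def expected_size_of_change(dist):
    """Predicted (expected) size of the gap between the future price and the current price."""
    return sum(abs(change) * p for change, p in dist.items())

first = size_of_expected_change(CHANGE_DIST)
second = expected_size_of_change(CHANGE_DIST)
print(float(first), float(second), second / first)
```

The first quantity comes out to 5 cents and the second to 55 cents, a ratio of 11, matching the numbers in the text.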
To sum up: completely rational respondents with full information ought to display the behavior observed by Quoidbach et al — precisely the behavior the authors adduce as evidence that their subjects are in the grips of a cognitive bias!
To get mathy with it for a minute: if Y is the value of a variable at some future time, and X is the value now, the two quantities are |E(Y-X)| and E(|Y-X|).
Those numbers would be the same if absolute value were a linear function. But absolute value isn’t a linear function. Unless, that is, you know a priori that Y-X is positive. In other words, if people knew for certain that over a decade they’d get more extraverted, but didn’t know to what extent, you might expect to see the same scores appearing in the predictor and reporter conditions. But this is not, in fact, something people know about themselves.
I always think I’m right but I don’t think I’m always right
The study I’ve mentioned isn’t the only one in the paper. Here’s another:
[In study 3]…we recruited a new sample of 7130 adults ranging from 18 to 68 years old (M = 40.2 years, SD = 11.1 years, 80% women) through the same Web site and asked them to report their favorite type of music, their favorite type of vacation, their favorite type of food, their favorite hobby, and the name of their best friend. Participants were then randomly assigned either to the reporter condition (and were asked to report whether each of their current preferences was the same as or different than it was 10 years ago) or the predictor condition (and were asked to predict whether each of their current preferences would be the same or different 10 years from now). We then counted the number of items on which participants responded “different” and used this as a measure of reported or predicted changes in preference.
Let’s say I tend to change my favorite music (respectively vacation, food, hobby, and friend) about once every 25 years, so that there’s about a 40% chance that in a given ten-year period I’ll make a change. And let’s say I know this about myself, and I’m free from cognitive biases. If you ask me to predict whether I’ll have the same or different favorite food in ten years, I’ll say “same” — after all, there’s a 60-40 chance that’s correct! Ditto for the other four categories.
Once again, Quoidbach et al refer to the number of times I answer “different” as “a measure of predicted changes in preference.” But it isn’t — or rather, it has nothing to say about the predicted number of changes. If you ask me “How many of the five categories do you think you’ll change in the next ten years?” I’ll say “two.” While if you ask me, for each of the five categories in turn, “Do you think you’ll change this in the next ten years?” I’ll say no, five times straight. This is not a contradiction and it is not a failure of rationality and it is not a cognitive bias. It is math, done correctly.
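The gap between those two answers can be spelled out in a few lines of Python. This is a sketch using the 40% figure from the example above; the variable names are invented for illustration, and the respondent is assumed to give the most likely answer to each yes/no question.

```python
from fractions import Fraction

P_CHANGE = Fraction(2, 5)   # chance a given favorite changes within ten years
N_ITEMS = 5                 # music, vacation, food, hobby, best friend

# Answer to "how many of the five do you think will change?"
expected_changes = N_ITEMS * P_CHANGE

# Answer to each separate "will this one change?" question: pick the likelier outcome.
per_item_answer = "different" if P_CHANGE > Fraction(1, 2) else "same"
count_of_different_answers = sum(
    1 for _ in range(N_ITEMS) if per_item_answer == "different"
)

print(expected_changes, per_item_answer, count_of_different_answers)
```

The expected number of changes is two, yet the item-by-item tally of “different” answers is zero, which is the quantity the paper counts.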
(Relevant philosophical maxim about groundedness of belief: “I always think I’m right, but I don’t think I’m always right.” We correctly recognize that some subset of things we currently believe are wrong, but each particular belief we take as correct. Update: NDE in comments reminds me that WVO Quine is the source of the maxim.)
What kind of behavior would the authors consider rational in this case? Presumably, one in which the proportion of “different” answers is the same in the prospective and retrospective conditions. In other words, I’d score as bias-free if I answered
“My best friend and my favorite music will change, but my favorite food, vacation, and hobby will stay the same.”
This answer has a substantially smaller chance of being correct than my original one. (108/3125 against 243/3125, if you’re keeping score at home.) The authors’ suggestion that it represents a less biased response is wrong.
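Those two fractions can be reproduced with a short sketch, assuming (as the figures above implicitly do) that the five categories change independently, each with probability 40%. The helper name is made up for illustration.

```python
from fractions import Fraction

P_CHANGE = Fraction(2, 5)   # chance a given favorite changes within ten years

def prob_all_correct(predicted_changes, n_items=5, p_change=P_CHANGE):
    """Chance that an answer sheet naming `predicted_changes` specific items as
    changing, and the rest as staying the same, is right on every item.
    Assumes the items change independently of one another."""
    stays = n_items - predicted_changes
    return p_change ** predicted_changes * (1 - p_change) ** stays

all_same = prob_all_correct(0)      # "nothing will change"
two_change = prob_all_correct(2)    # "friend and music change, the rest stay"
print(all_same, two_change)
```

Under that independence assumption, the all-“same” sheet is right 243/3125 of the time and the two-changes sheet 108/3125 of the time.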
Now you may ask: why didn’t Quoidbach et al just directly ask people “to what extent do you expect your personality to change over the next ten years?” and compare that with retrospective report? To their credit, they did just that — and there they did indeed find that people predicted smaller changes than they reported:
Third, is it possible that predictors in study 1 knew that they would change over the next 10 years, but because they did not know exactly how they would change, they did not feel confident predicting specific changes? To investigate this possibility, we replicated study 1 with an independent sample of 1163 adults (M = 38.4 years, SD = 12.1 years, 78% women) recruited through the same Web site. Instead of being asked to report or predict their specific personality traits, these participants were simply asked to report how much they felt they had “changed as a person over the last 10 years” and how much they thought they would “change as a person over the next 10 years.” Because some participants contributed data to both conditions, we performed a multilevel version of the analysis described in study 1. The analysis revealed the expected effect of condition (β = –0.74, P = 0.007), indicating that predictors aged a years predicted that they would change less over the next decade than reporters aged a + 10 years reported having changed over the same decade. This finding suggests that a lack of specific knowledge about how one might change in the future was not the cause of the effects seen in study 1.
This study, unlike the others, addresses the question the paper proposes to consider. To me, it seems questionable that numerical answers to “how much will you change as a person in the next 10 years?” are directly comparable with numerical answers to “how much did you change as a person over the last 10 years?” but this is a question about empirical social science, not mathematics. Even if I were a social scientist, I couldn’t really judge this part of the study, because the paragraph I just quoted is all we see of it — how the questions were worded, how they were scored, what the rest of the coefficients in the regression were, etc, are not available, either in the main body of the paper or the supplementary material.
[Update: Commenter deinst makes the really important point that Quoidbach et al have made their data publicly available at the ICPSR repository, and that things like the exact wording of the questions, the scoring mechanism, are available there.]
Do we actually underestimate the extent to which we’ll change our personalities and preferences over time? It certainly seems plausible: indeed, other researchers have observed similar effects, and the “changed as a person” study in the present paper is suggestive in this respect.
But much of the paper doesn’t actually address that question. Let me be clear: I don’t think the authors are trying to put one over. This is a mistake — a somewhat subtle mistake, but a bad mistake, and one which kills a big chunk of the paper. Science should not have accepted the article in its current form, and the authors should withdraw it, revise it, and resubmit it.
Yes, I know this isn’t actually going to happen.
One good feature of meeting Adam Phillips was that I got to ask him about Grothendieck’s use of the phrase “the capacity to be alone,” generally associated with the psychoanalyst D.W. Winnicott. Winnicott was Phillips’s analyst’s analyst, and Phillips has written extensively on him, so I thought I’d run the quote by him. Phillips told me:
I have often been told I needed to sit down and have a conversation with a psychoanalyst, and now I’m doing it — in public! Adam Phillips and I will be at Hillel Friday morning to talk about the challenges of writing about technical material for a general audience. Feel free to suggest questions for Phillips in the comments.