If you can’t say anything intelligent about probability, don’t say anything at all

This, from Politico’s Dylan Byers, is infuriating:

Prediction is the name of Silver’s game, the basis for his celebrity. So should Mitt Romney win on Nov. 6, it’s difficult to see how people can continue to put faith in the predictions of someone who has never given that candidate anything higher than a 41 percent chance of winning (way back on June 2) and — one week from the election — gives him a one-in-four chance, even as the polls have him almost neck-and-neck with the incumbent.

Why?  Why is it difficult to see that?  Does Dylan Byers not know the difference between saying something is unlikely to happen and declaring that it will not happen?

Silver cautions against confusing prediction with prophecy. “If the Giants lead the Redskins 24-21 in the fourth quarter, it’s a close game that either team could win. But it’s also not a ‘toss-up’: The Giants are favored. It’s the same principle here: Obama is ahead in the polling averages in states like Ohio that would suffice for him to win the Electoral College. Hence, he’s the favorite,” Silver said.

For all the confidence Silver puts in his predictions, he often gives the impression of hedging. Which, given all the variables involved in a presidential election, isn’t surprising. For this reason and others — and this may shock the coffee-drinking NPR types of Seattle, San Francisco and Madison, Wis. — more than a few political pundits and reporters, including some of his own colleagues, believe Silver is highly overrated.

Hey!  That’s me!  I live in Madison, Wisconsin!  I drink coffee!  Wait, why was that relevant again?

To sum up:  Byers thinks Nate Silver is overrated because he “hedges” — which is to say, he gives an accurate assessment of what’s going on instead of an inaccurate one.

This makes me want to stab my hand with a fork.

I’m happy that Ezra Klein at the Post decided to devote a big chunk of words to explaining just how wrong this viewpoint is, so I don’t have to.  You know what, though, I’ll bet Ezra Klein drinks coffee.


29 thoughts on “If you can’t say anything intelligent about probability, don’t say anything at all”

  1. byesac says:

    As much as I dislike Barack Obama, I have to say you’re right. I haven’t seen Nate’s methodologies, but I know of him from Baseball Prospectus.

  2. Rob H. says:

    Wow. So, you can get paid to just talk nonsense? How do I get one of these jobs?

  3. David Speyer says:

    Einstein was disturbed at the idea that the final subatomic theory might only make probabilistic statements rather than predict exactly what would happen.

    I never knew that people held similar expectations for political science.

  4. JSE says:

    I wasn’t sure how people would respond to this post, but I can certainly say that comparisons between the Politico writer and Einstein were not among the outcomes I considered.

  5. rmb says:

    But do you listen to NPR? Politico seems to exist to publish idiotic articles, and I’ve never heard of Dylan Byers before this. I find it much more disturbing that David Brooks, who has a regular column in the New York Times, is going around claiming not to understand (or believe in?) probability.

  6. John Baez says:

    “This makes me want to stab my hand with a fork.”

    That’d be the wrong person’s hand.

  7. Noah Snyder says:

    My favorite thing I’ve read about this brouhaha was Ezra Klein’s tweet: “Lots of pundits don’t like Nate Silver because he makes them feel innumerate. Then they criticize him and prove it.”

  8. Noah Snyder says:

    The other thing that annoys me in the anti-538 articles is that they entirely forget the 2008 primary season. Nate Silver’s impressive accomplishment wasn’t predicting which states would go which way in 2008: that wasn’t very hard and other people got it roughly right too. The impressive thing that made his name was that he thrashed the completely poll-based models in predicting the primary and caucus outcomes by using correlations across different states to overcome the sparsity of good polling data.

  9. James Borger says:

    I actually think there are some subtle issues here, so I’ll try to play public defender (even though I don’t really doubt the defendant is guilty). What actually does it mean to say, as Silver does, that as of Oct 30, Obama has a 77.4% chance of winning? If Obama goes on to win, was the number 77.4 too low? Or if he loses, was it too high? Suppose someone else says Obama has a 20% chance. I think any argument about which is a better prediction would involve some discussion of what the *real* probability of an Obama victory is. And what does that mean?

    Ignoring for the moment that 2012 US presidential election is a one-off event, the usual way (I suppose) of answering questions like this would be to run repeated trials. But even there, it’s not clear to me how to compare two prediction schemes. Suppose you have two weather forecasters that give rain predictions, certain percentages, every day. How do you decide which one does a better job? Surely the answer depends on what you care about. If you don’t mind carrying an umbrella so much, but you hate to get wet, then you’d prefer a cautious forecaster. But the opposite would be true if you hate carrying an umbrella and don’t mind getting wet. Perhaps you could set up some kind of gambling house where the two forecasters could play the rain-prediction game and some standard casino games with certain mathematically justified odds so that they could do some arbitrage. But surely who does better would depend on the details of the auxiliary games, right?

    For one-off events, things are even tougher, at least for me. I can’t think of any way of even trying to answer such questions without forming a model and looking at generic events that the model allows (sort of like a spreading argument in arithmetic algebraic geometry!). But this depends hugely on the model, right? If you and I are going to flip a coin once, we might agree on the model. But if we’re going to wager on the 1,000s digit of the 2013 population of the town I lived in when I was ten, we’ll probably have different models.

    I guess all I’m trying to say is that I find these things genuinely confusing (albeit to varying degrees), so I’m a little sympathetic when other people get tripped up. On the other hand, I wouldn’t bet against Silver.

  10. Willie Wong says:

    For a proper assessment of people like Nate Silver, one should not turn to political pundits. Instead, one should turn to bookies. :-) http://www.oddschecker.com/specials/politics-and-election/us-presidential-election/winner A lot of the betting agencies are offering around 2:5 for Obama and 2:1 against Romney. Since the betting agencies are in the business of making money, that at least shows that somewhere around twice as many bets are placed for Obama as opposed to Romney.
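
    A minimal sketch of turning those prices into probabilities, assuming “2:5 for Obama” means fractional odds of 2/5 (stake 5 to win 2) and “2:1 against Romney” means fractional odds of 2/1 (stake 1 to win 2). The two implied probabilities add up to a bit more than 1; the excess is the bookmaker’s margin, and dividing it out proportionally (one simple convention among several) gives a rough market estimate.

```python
from fractions import Fraction

def implied_probability(numerator: int, denominator: int) -> Fraction:
    """Break-even probability for fractional odds numerator/denominator.

    A stake of `denominator` returns a profit of `numerator`, so the bet
    breaks even when the event happens with probability
    denominator / (numerator + denominator).
    """
    return Fraction(denominator, numerator + denominator)

obama = implied_probability(2, 5)    # 2/5: stake 5 to win 2
romney = implied_probability(2, 1)   # 2/1: stake 1 to win 2
overround = obama + romney           # exceeds 1 by the bookmaker's margin

# Divide out the margin proportionally so the two probabilities sum to 1.
obama_share = obama / overround
romney_share = romney / overround
print(float(obama_share), float(romney_share))
```

    With these prices the market puts Obama at roughly 68% and Romney at roughly 32%, somewhat below the 77.4% quoted elsewhere in the thread. Keep in mind that the prices mostly reflect where bettors put their money.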

  11. Noah Snyder says:

    Of course this is a bit circular, as people’s bets are affected by prominent predictions like Silver’s.

  12. Noah Snyder says:

    I would really like to see a good write-up of how we should evaluate models like Silver’s (at least the final numbers, testing earlier predictions is even harder). Bayesian statistics should at least be able to say *something* about how the results of the election should update our degree of belief in Silver’s predictive abilities. In addition there should be some way to leverage the fact that we end up with 50 different (but not independent!) pieces of data coming in. We might need some additional non-public information about the results of Silver’s simulations, but I bet if there was something specific we could ask him for he might be willing to give more info.
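
    One concrete version of that Bayesian update, as a sketch: after a single binary outcome, the evidence for forecaster A over forecaster B is the likelihood ratio (Bayes factor) of the outcome under their stated probabilities, and posterior odds equal prior odds times that factor. The 77.4% figure is the one quoted earlier in the thread; the rival 20% forecast is the hypothetical one from the same comment.

```python
def bayes_factor(p_a: float, p_b: float, event_happened: bool) -> float:
    """Likelihood ratio for forecaster A over forecaster B after one binary outcome.

    p_a and p_b are the probabilities each forecaster gave to the event.
    """
    likelihood_a = p_a if event_happened else 1.0 - p_a
    likelihood_b = p_b if event_happened else 1.0 - p_b
    return likelihood_a / likelihood_b

def posterior_odds(prior_odds: float, factor: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * Bayes factor."""
    return prior_odds * factor

# 77.4% (the figure quoted in the thread) vs. a hypothetical rival at 20%.
if_obama_wins = bayes_factor(0.774, 0.20, event_happened=True)
if_obama_loses = bayes_factor(0.774, 0.20, event_happened=False)

# Starting from even odds between the two forecasters:
print(posterior_odds(1.0, if_obama_wins), posterior_odds(1.0, if_obama_loses))
```

    An Obama win multiplies the odds in favor of the 77.4% forecaster by about 3.9; a loss multiplies them by about 0.28. Either way, a single election moves the needle only modestly.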

    I’m imagining something like “a prediction of the election outcome consists of the mean and std. deviation for the overall popular vote, plus for each state the mean and std. deviation for how far that state’s vote is from the overall popular vote.” It seems like there should be some way to somewhat rigorously test whether such a prediction is good given the outcome of the election.
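
    A minimal sketch of scoring a prediction in that form, under assumptions that are not in the thread: margins are normally distributed, the state offsets are independent of each other and of the national vote, and the forecast is scored by the log density it assigned to what actually happened (a continuous version of the logarithmic scoring rule). All numbers are made up for illustration; they are not 538’s.

```python
import math

def normal_logpdf(x: float, mean: float, sd: float) -> float:
    """Log density of the normal distribution N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)

def forecast_log_score(national, state_offsets, observed_national, observed_states):
    """Log predictive density of an observed result under a Gaussian forecast.

    national: (mean, sd) of the national popular-vote margin.
    state_offsets: {state: (mean, sd)} of each state's margin minus the national margin.
    Assumes the offsets are independent of each other and of the national vote.
    """
    score = normal_logpdf(observed_national, *national)
    for state, (mu, sigma) in state_offsets.items():
        offset = observed_states[state] - observed_national
        score += normal_logpdf(offset, mu, sigma)
    return score

# Hypothetical forecasts and results (margins in percentage points).
observed_national = 2.0
observed_states = {"OH": 3.0, "VA": 3.5, "NC": -2.0}
forecast_a = ((2.0, 2.5), {"OH": (1.0, 2.0), "VA": (1.0, 2.0), "NC": (-4.0, 2.0)})
forecast_b = ((-1.0, 2.5), {"OH": (1.0, 2.0), "VA": (1.0, 2.0), "NC": (-4.0, 2.0)})

score_a = forecast_log_score(*forecast_a, observed_national, observed_states)
score_b = forecast_log_score(*forecast_b, observed_national, observed_states)
print(score_a > score_b)  # True: forecast A's national mean matched the result
```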

  13. Noah Snyder says:

    Another way to test the model, which again may require access to non-public data, is to find specific bets that the model is very confident in. For example, 538 has Obama as a 98% favorite in MI. So if Romney wins Michigan you have very good reason to doubt that 538 is worth paying attention to. But you should be able to find more complicated hedged bets that 538 is also very confident in. Say something like “we will not see Obama winning Virginia while losing North Carolina by double digits.” It should be possible to find hedged bets like this that would distinguish between different models with higher confidence than just looking at the headline result.
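
    A rough Monte Carlo sketch of how a model could price such a compound bet. The toy model (a shared national swing plus a fixed state lean plus independent state noise) and all of its parameters are hypothetical; the point is only that the joint event can be estimated by simulation and comes out far less likely than winning Virginia alone.

```python
import random

def simulate_va_nc(n_sims: int = 100_000, seed: int = 2012):
    """Estimate P(Obama wins VA) and P(Obama wins VA and loses NC by 10+ points).

    Toy model with hypothetical parameters (margins in points, Obama minus Romney):
    state margin = shared national margin + fixed state lean + independent state noise.
    """
    rng = random.Random(seed)
    national_mean, national_sd = 2.0, 2.5   # hypothetical national margin
    state_sd = 2.0                          # hypothetical state-level noise
    va_lean, nc_lean = 0.5, -3.0            # hypothetical leans relative to national
    va_wins = joint = 0
    for _ in range(n_sims):
        national = rng.gauss(national_mean, national_sd)   # shared swing
        va = national + va_lean + rng.gauss(0.0, state_sd)
        nc = national + nc_lean + rng.gauss(0.0, state_sd)
        if va > 0:
            va_wins += 1
            if nc <= -10.0:
                joint += 1
    return va_wins / n_sims, joint / n_sims

p_va, p_joint = simulate_va_nc()
print(f"P(win VA) ~ {p_va:.3f}, P(win VA and lose NC by 10+) ~ {p_joint:.4f}")
```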

  14. Noah Snyder says:

    One thing that won’t be difficult is judging which is more right: 538 or Unskewed Polls. Unskewed Polls’ “definitive prediction” includes lots of outcomes that 538 puts at a one or two percent likelihood or worse. Unskewed’s PA prediction (Romney by 6) is a roughly 4-standard-deviation result under 538’s model.
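
    For scale, here is the one-sided tail probability of a normal result at least z standard deviations out, assuming 538’s Pennsylvania forecast error is roughly normal. Real models may have fatter tails, so treat the numbers as a rough guide.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

for z in (1, 2, 3, 4):
    print(f"{z} sd: {upper_tail(z):.2e}")
```

    At z = 4 this comes to about 3 in 100,000.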

  15. berseliusx says:

    This line of attack on Silver has been facepalm-worthy, but it’s kind of funny to see the stats-vs-scouts narrative spill into the political area. Silver’s had a lot of experience dealing with these types from his baseball work.

  16. Kevin C. says:

    For a single event like “Obama wins Michigan”, Obama actually winning would not provide too much confirmation. Combining several such events would provide a little more confirmation, but the catch is that different hedged bets are highly non-independent of each other.

    I guess an interesting question is if there’s a way to find “uncoupled” events to avoid this.

  17. byesac says:

    One marked difference between predicting sports and predicting politics is that in sports, predictions are based on the attributes of the competitors, whereas in politics, I suppose predictions are based on what other people think of the competitors. I would think that this is a challenge in predicting politics.

  18. byesac says:

    Now Hardball Talk has posted your story.

  19. JSE says:

    Where? I don’t see it.

  20. plm says:

    As remarked by Noah Snyder, James Borger, and probably many others it is not a trivial issue to assess the meaning of “probability of winning the election” and the models used to quantify such probabilities.

    I think the best way to present the topic would perhaps have been to mention, right after a leisurely baseball introduction, that this can actually be formalized and done with mathematics, and then try to intelligently criticize alternative models (including Nate Silver’s), and perhaps relate them to people’s attitudes and intuitive ways of coming up with such probabilities.
    I understand this may have been an unexciting prospect.

    I am not a statistician and in JSE’s first post on the matter I made the following comment, describing my reasoning, trying to understand those issues, which I hope(d) could be helpful:

    I know that a blog often balances technical and informal content, and may be a way to release some pressure for the author, but I’d be interested to have technical comments (even only references) on the nonparametric statistics involved in Nate Silver’s estimation/prediction work. I think this could help other scientifically-inclined readers form their opinion on the matter, and it may have made relevant discussions on this blog more efficient.

  21. Rob H. says:

    I wish these people were at least displaying some bravado. Instead, they’re really just being wimps about it. They’re basically saying if Romney wins, then Nate Silver should never be listened to again. I wish at least one of these “critics” would have the guts to say “and if on the other hand, Obama wins the way Nate Silver predicts (the most common outcome being a bit over 330 electoral votes), then I will quit punditry”. It’s one thing being a brave idiot…

  22. Ben Wieland says:

    JB, NS: I know of two ways of scoring predictions and aggregating performance on disparate events. The first method is calibration. It is a consistency check that does not measure quality. But it is fair to compare calibration of people who made predictions about different events. The second method is proper scoring rules. Someone trying to maximize score must reveal his true beliefs. But it is only fair to compare scores for predictions of the same set of events. Another problem is that it is hard to get people to try to maximize their scores, rather than get a better score than their

    Calibration: Look at all the times you said 90%. If 90% of these are true, you did a good job. If 80% are true, you’re overconfident, and if 95% are true, you’re underconfident. Most people are overconfident, really 70% correct when they say 90%. It is fair to compare calibrations for different sets of events, but some events are easier than others. If you’re calibrated with respect to dice, it doesn’t give much confidence about your political predictions. But if you aren’t calibrated on dice rolls, your predictions about politics are probably worthless, at least if they pass through your intuitive sense of probability. PredictionBook is a website for recording predictions and measuring calibration.
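
    A minimal sketch of the calibration check just described: bucket predictions by the probability stated and compare with how often they came true. The record below is hypothetical, chosen to match the “70% correct when they say 90%” pattern.

```python
from collections import defaultdict

def calibration_table(predictions):
    """Group (stated_probability, came_true) pairs by stated probability.

    Returns {stated_probability: (count, observed_frequency)}.
    """
    buckets = defaultdict(list)
    for stated, came_true in predictions:
        buckets[stated].append(came_true)
    return {p: (len(v), sum(v) / len(v)) for p, v in sorted(buckets.items())}

# Hypothetical record: ten claims made at 90%, of which seven came true.
record = [(0.9, True)] * 7 + [(0.9, False)] * 3
print(calibration_table(record))  # {0.9: (10, 0.7)}, i.e. overconfident
```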

    Scoring rules: There are several scoring rules that assign a score to each prediction, which are summed to produce a total score. The logarithmic scoring rule has the theoretical advantage that the score does not depend on the decomposition of events; that is, you get the same score whether you treat two coin flips as two events whose scores are added or as a single event with four outcomes. The logarithmic scoring rule is that if outcome i occurs and you assigned probability p_i to it, you get log(p_i) points. That’s a negative number; its absolute value is called surprisal. If every cycle you say that every House race is a coinflip, you’ll be well calibrated. But someone who says that the incumbent has a 2/3 chance of winning will have a better score and worse calibration.
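
    A sketch of the logarithmic scoring rule applied to the House-race example. The 90% incumbent win rate is an assumption for illustration, not a figure from the comment. The last lines check the decomposition property: two fair coin flips scored separately get the same total as one four-outcome event.

```python
import math

def log_score(prob_assigned_to_outcome: float) -> float:
    """Logarithmic scoring rule: log(p_i) for the outcome i that occurred."""
    return math.log(prob_assigned_to_outcome)

def average_log_score(p_incumbent: float, incumbent_win_rate: float) -> float:
    """Expected per-race log score for someone who always gives the incumbent p_incumbent."""
    return (incumbent_win_rate * math.log(p_incumbent)
            + (1 - incumbent_win_rate) * math.log(1 - p_incumbent))

# Hypothetical: incumbents win 90% of House races.
coinflip = average_log_score(0.5, 0.9)
two_thirds = average_log_score(2 / 3, 0.9)
print(round(coinflip, 3), round(two_thirds, 3))  # -0.693 -0.475

# Decomposition: two independent fair flips, scored separately or jointly.
separate = log_score(0.5) + log_score(0.5)
joint = log_score(0.25)
print(math.isclose(separate, joint))  # True
```

    Under that assumption the coin-flipper (calibrated) averages about −0.693 per race and the 2/3 forecaster (underconfident, since its 2/3 calls come true 90% of the time) averages about −0.475, so the worse-calibrated forecaster gets the better score.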

    It is fair to score people on correlated predictions. For example, we can compare 538 vs. Intrade on 100 different predictions, all predicting the election, but taken on 100 different days. If Obama wins, then the score is just a measure of how much they favored him over time. Maybe they have the same prediction today, but if Intrade had too much of a debate bounce, then it is penalized. But in a hypothetical scenario in which the debate really did reverse the situation, someone who waited to respond to the debate would be penalized for tardiness. So it makes some intuitive sense that at the time of the debate you should reveal your true belief.
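
    A sketch of scoring a series of daily forecasts of the same event by summing their log scores. Both series are hypothetical; the “jumpy” one stands in for a forecaster that overreacts to a debate.

```python
import math

def cumulative_log_score(daily_probs, event_happened: bool) -> float:
    """Sum of log scores for a series of daily forecasts of one binary event."""
    return sum(math.log(p if event_happened else 1.0 - p) for p in daily_probs)

# Hypothetical daily probabilities of an Obama win.
steady = [0.75, 0.72, 0.70, 0.72, 0.75]
jumpy = [0.75, 0.55, 0.50, 0.65, 0.75]   # big debate reaction
print(cumulative_log_score(steady, True) > cumulative_log_score(jumpy, True))  # True
```

    If the event happens, the steady series wins; if it does not, the jumpy series scores better, which is the trade-off the comment describes.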

    If you correctly model an event, then your expected absolute score, aka your expected surprisal, is the entropy of your model. If you are wrong, the KL divergence shows up as a correction. If you make a bunch of uncorrelated predictions with roughly equal variance, the CLT applies and comparing your score to your modeled entropy is a form of calibration. For example, I believe that 538 models the states as correlated only through the national popular vote. Thus, once we know the national popular vote, the model yields 50 predictions that it claims are uncorrelated, so it should expect its score across those 50 predictions to be pretty close to the entropy of the model given the popular vote. You can also consider the model as a single random variable with a very large state space. It is fair to compute a score on that large prediction, to compare to other people’s predictions, but because of the correlations, it’s not clear that the CLT applies, and thus it is not clear how bothered you should be if the surprisal is large. But you can compute the variance of the surprisal of your model and decide if a high surprisal is a sign of poor calibration.
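
    A sketch of that entropy comparison for independent binary predictions, assuming (as the comment does for states conditional on the national vote) that they really are independent. It compares the observed total surprisal with the model’s own expected surprisal (its entropy) in units of the surprisal’s standard deviation; a large positive value suggests overconfidence. Reading the result as a normal z-score relies on the CLT, so it is only rough for a handful of predictions.

```python
import math

def binary_entropy(p: float) -> float:
    """Expected surprisal, in nats, of a binary prediction with 0 < p < 1."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def surprisal_variance(p: float) -> float:
    """Variance of the surprisal when the outcome really occurs with probability p."""
    return p * (1 - p) * math.log(p / (1 - p)) ** 2

def surprisal_z(probs, outcomes) -> float:
    """(observed surprisal - expected surprisal) / its standard deviation.

    Assumes independent predictions with 0 < p < 1.
    """
    observed = sum(-math.log(p if happened else 1 - p)
                   for p, happened in zip(probs, outcomes))
    expected = sum(binary_entropy(p) for p in probs)
    sd = math.sqrt(sum(surprisal_variance(p) for p in probs))
    if sd == 0:
        raise ValueError("every prediction was 50%: surprisal has no spread")
    return (observed - expected) / sd

# Hypothetical: ten independent 90% predictions.
probs = [0.9] * 10
print(surprisal_z(probs, [True] * 10))   # negative: less surprised than expected
print(surprisal_z(probs, [False] * 10))  # large positive: a sign of overconfidence
```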

    JB: even if you have an asymmetric loss function (getting wet is more expensive than carrying an umbrella), (von Neumann-Morgenstern) decision theory says that you want accurate predictions. You then choose your action to minimize the expected cost of your action, a random variable depending on the random variable of
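
    A minimal sketch of that decision rule for the umbrella example: the forecast supplies p(rain), and the asymmetry lives entirely in the two costs, which are hypothetical parameters here. Two people with the same accurate forecast can rationally make different choices.

```python
def should_carry_umbrella(p_rain: float, cost_wet: float, cost_umbrella: float) -> bool:
    """Choose the action with the lower expected cost.

    Carrying always costs cost_umbrella; not carrying costs cost_wet with
    probability p_rain. Both costs are hypothetical, person-specific inputs.
    """
    expected_cost_carry = cost_umbrella
    expected_cost_skip = p_rain * cost_wet
    return expected_cost_carry < expected_cost_skip

# Same 30% forecast, two people with different dislikes.
print(should_carry_umbrella(0.3, cost_wet=10.0, cost_umbrella=1.0))  # True
print(should_carry_umbrella(0.3, cost_wet=1.0, cost_umbrella=5.0))   # False
```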

  23. Richard Séguin says:

    Nate Silver was interviewed today on the Canadian radio program As It Happens; his was the second interview in the hour-long program:

    “The number-cruncher whose predictions for the last U-S election were bang on explains why so many people are at odds with odds this time.”

    (For those in Wisconsin who are not aware of this program, it airs on Wisconsin Public Radio at 8 PM on weekdays.)

  24. DNA Birthday says:

    Over the past few days, Nate Silver has written a series of brilliantly lucid posts implicitly responding to the critics, in the process explaining subtleties in the use and interpretation of statistics to the general public.
    He has been both a model of decorum and pedagogy, as well as having constructed a detailed predictive model (the full details of which I wish were documented somewhere…).
    Am motivated to find time to read his book, which you reviewed positively.

  25. plm says:

    I am quite surprised nobody tries to discuss the ways people come up with “win(ning) probabilities”. I tried a little to learn about that and it is not as straightforward as I had hoped.

    In baseball predictions they use Markov chains with lots of parameters which they estimate based on lots of statistics on players, teams, places, etc.

    In the case of Nate Silver’s blog the information I could get on their site is that it is very much ad hoc, not trying to view election runs as processes with poll data giving values over time, but relying on recipes that look very simplistic, that they probably feel reliable but are a bit fuzzy to me. At least I wish they explained them clearly and in detail on their website.

    (see step 7)

    It seems that parametric time series models like moving averages are not designed with elections or baseball scores in mind as processes to model, so they would not give more relevant results than very simplistic methods, while requiring a lot more computation. Similarly for the well-developed nonparametric analysis of, say, financial time series.

    So it is possible that there is a need to adapt some of the methods of time series analysis to perhaps more stable processes like baseball scores or voters’ preferences. That is, I think sports scores and people’s preferences may be more stable, or more consolidating, than economic time series. There may be an interesting relation, waiting to be brought to light, between underlying dynamical systems and the choice of time series model.

    (I think there have been recent books treating statistics applied to determining dynamical systems.)

  26. Richard Séguin says:

    Today Jeff Greenfield of Yahoo! News wrote:

    “… the ability of analysts like the New York Times’ Nate Silver to predict the outcome of a race with near precision, means that those of us who got into politics because we were told there’d be no math have got to get a clue.”

    “Issenberg’s book details precisely how the combination of behavioral psychology and data crunching enables campaigns to find supporters and persuade them to go to the polls.”

    “This just-concluded campaign demonstrated forcefully that if you do not understand this brave new world, you will not understand politics, no matter how well you know the history of the Electoral College.”

