Nate Silver at 538 looks at the trailing digits of about 5000 poll results from secretive polling outfit Strategic Vision, finds a badly non-uniform distribution, and says this strongly suggests that SV is making up numbers. I’m a fan of Nate’s stuff, both sabermetric and electoral, but I’m not so sure he’s right on this.

Nate’s argument is similar to that of Beber and Scacco’s article on the fraudulence of Iran’s election returns. Humans are bad at picking “random” numbers; so the last digits of human-chosen (i.e. fake) numbers will look less uniform than truly random digits would.
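To make the "less uniform than truly random digits" idea concrete, here is a minimal sketch of the kind of check involved: count trailing digits and compute a Pearson chi-square statistic against a uniform expectation. The function names and the sample numbers are my own illustrations, not Nate's code or Strategic Vision's data, and I'm not claiming this is the exact test either analysis used.

```python
from collections import Counter

# 5% critical value of the chi-square distribution with 9 degrees of freedom
# (ten digit categories minus one).
CHI2_CRIT_9DF_05 = 16.919


def trailing_digit_counts(values):
    """Return a list of counts for trailing digits 0 through 9."""
    counts = Counter(abs(int(v)) % 10 for v in values)
    return [counts.get(d, 0) for d in range(10)]


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Made-up poll percentages, purely for illustration.
sample = [48, 47, 51, 44, 38, 57, 47, 48, 37, 58]
counts = trailing_digit_counts(sample)
stat = chi_square_uniform(counts)
print(counts, stat, stat > CHI2_CRIT_9DF_05)
```

Ten numbers is far too few for the chi-square approximation to mean anything; the point is only the mechanics. The test also takes uniformity as the null hypothesis, which is exactly the assumption at issue in what follows.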

There are at least three ways Nate’s case is weaker than Beber and Scacco’s.

- In the Iranian numbers, there were too many numbers ending in 7 and too few ending in 0, consistent with the empirical finding that people trying to generate random numbers tend to disfavor “round” numbers like those ending in 0 and 5. The digits from Strategic Vision have a lot of 7s, but even more 8s, and the 0s and 5s are approximately where they should be.
- It’s not so clear to me that the “right” distribution for these digits is uniform. Lots of 7s and 8s, few 1s; maybe that’s because in close polls with a small proportion of undecideds, you’ll see a lot of 48-47 results and not so many 51-41s. I don’t really know what the expected distribution of the digits is — but the fact that I don’t know is a big clothespin between my nose and any assertion of a fishy smell.
- And of course my prior for “major US polling firm invents data out of whole cloth” is way lower than my prior for the Iranian federal government doing the same thing. Strategic Vision could run up exactly the same numbers that Beber and Scacco found, and you’d still be correct to trust them more than the Iran election bureau. Unless your priors are very different from mine.
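The point about priors in the last item is just Bayes' rule, and a small sketch shows how much the prior matters. Every number here is hypothetical: I'm assuming, for illustration only, that the observed digit pattern is 20 times likelier under fabrication than under honest polling, and I'm picking two arbitrary priors.

```python
def posterior_fraud(prior, likelihood_if_fraud, likelihood_if_honest):
    """Posterior probability of fraud given the evidence, by Bayes' rule."""
    numerator = prior * likelihood_if_fraud
    denominator = numerator + (1 - prior) * likelihood_if_honest
    return numerator / denominator


# Same evidence (hypothetical likelihoods), two different hypothetical priors.
low_prior = posterior_fraud(0.01, 0.20, 0.01)   # e.g. a US polling firm
high_prior = posterior_fraud(0.30, 0.20, 0.01)  # e.g. a suspect election bureau
print(round(low_prior, 3), round(high_prior, 3))
```

With identical evidence, the two posteriors come out far apart, which is all the third item is saying.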

So I wouldn’t say, as Nate does, that the numbers compiled at 538 “suggest, perhaps strongly, the possibility of fraud.”

**Update (27 Sep):** More from Nate on the Strategic Vision digits. Here he directly compares the digits from Strategic Vision to digits gathered by the same protocol from Quinnipiac. To my eye, they certainly look different. I think this strengthens his case. If he ran the same procedure for five other national pollsters, and the other five all looked like Quinnipiac, I think we’d be in the position of saying “There is good evidence that there’s a methodological difference between SV and other pollsters which has an effect on the distribution of terminal digits.” But it’s a long way from there to “The methodological difference is that SV makes stuff up.”

On the other hand, Nate remarks that the deviation of the Quinnipiac digits from uniformity is consistent with Benford’s Law. This is a terrible thing to remark. Benford’s law applies to the *leading* digit, not the last one. The fact that Nate would even bring it up in this context makes me feel a little shaky about the rest of his computations.
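For reference, Benford's law in its standard form says the leading digit d appears with probability log10(1 + 1/d). A quick sketch of those frequencies (the function name is mine):

```python
import math


def benford_leading(d):
    """Benford's law probability that the leading digit equals d (1 through 9)."""
    return math.log10(1 + 1 / d)


probs = {d: benford_leading(d) for d in range(1, 10)}
for d, p in probs.items():
    print(d, round(p, 3))
```

The probabilities fall from about 30% for a leading 1 to under 5% for a leading 9. That is a statement about first digits, not last ones.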

Also, there’s a good post about this on Pollster by Mark Blumenthal, whose priors about polling firms are far more reliable than mine.

I had exactly the same reaction.

I was bothered by some of the same things. But on the other hand, even if 48-47 is likelier than 51-44, some of that difference is going to be cancelled out by 54-41 being likelier than 57-38 — the fact that last digits come in a snaking order will tend to make the distribution of last digits flat unless there’s a *very* strong tendency towards close results and low single digit undecideds. But I do think he needs to look at the distribution of %undecided and the distribution of fractional spreads (by which I mean (leader%-trailer%)/(1-%undecided)), and whether or not these seem to be independent, before trying to draw any conclusions.
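One way to explore this is a toy simulation: draw an undecided share and a spread, split the decided vote accordingly, and tabulate the last digits of both candidates' numbers. Everything below is an assumption made for illustration: a two-candidate race, uniform draws, and parameter ranges I picked out of the air. It is not a model of any real pollster's data.

```python
import random
from collections import Counter


def simulate_last_digits(n_polls, undecided_range, spread_range, seed=0):
    """Frequencies of last digits 0-9 across leader and trailer percentages."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_polls):
        undecided = rng.randint(*undecided_range)
        spread = rng.randint(*spread_range)
        decided = 100 - undecided
        # Keep both shares whole numbers; the realized spread can be
        # one point larger than the drawn one when the parity is off.
        leader = (decided + spread + 1) // 2
        trailer = decided - leader
        counts[leader % 10] += 1
        counts[trailer % 10] += 1
    total = 2 * n_polls
    return [counts[d] / total for d in range(10)]


def fractional_spread(leader, trailer):
    """(leader% - trailer%) / (1 - %undecided), assuming only two candidates."""
    return (leader - trailer) / (leader + trailer)


# Close races with few undecideds versus a much broader mix (both hypothetical).
tight = simulate_last_digits(5000, undecided_range=(3, 7), spread_range=(0, 3))
broad = simulate_last_digits(5000, undecided_range=(2, 20), spread_range=(0, 30))
print([round(f, 3) for f in tight])
print([round(f, 3) for f in broad])
```

In the tight scenario every result lands between 45 and 50, so the digits 1 through 4 never show up at all, while the broad scenario comes out much flatter. Where real polls sit between those extremes is the empirical question.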

Another possible route to non-uniformity might be if Strategic Vision just has incredibly bad random sampling protocols. Imagine the extreme case in which they poll 600 people, but poll 200 from each of three highly homogeneous small towns. Then one would expect the last digits to cluster around 0, 3, 7. Obviously this isn’t realistic as stated, but I’m not convinced that subtler poor sampling procedures wouldn’t also introduce biases.
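The extreme case is easy to check by brute force. This sketch assumes each town votes as a bloc for or against a single candidate and that the reported figure is rounded to the nearest whole percent; the function name is made up.

```python
from itertools import product


def town_poll_last_digits(n_towns=3, per_town=200):
    """Last digits a candidate's share can take if each town votes as a bloc."""
    total = n_towns * per_town
    digits = set()
    # Each town either backs the candidate unanimously (1) or not at all (0).
    for votes in product([0, 1], repeat=n_towns):
        share = round(100 * sum(votes) * per_town / total)
        digits.add(share % 10)
    return sorted(digits)


print(town_poll_last_digits())
```

The only possible shares are 0, 33, 67 and 100 percent, so the last digits are confined to 0, 3 and 7.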

Those of you who are criticizing Nate Silver for his references to Benford’s Law have probably not read his article carefully. He didn’t believe, any more than you do, that Benford’s law should apply to this situation. But Benford’s Law is the only well-established principle on which any set of presumably random numbers would deviate – in any digit – from an essentially even distribution. In an attempt to preempt criticism of his conclusions by people who might cite Benford’s Law as an explanation for SV’s unexpected results, he showed that Benford’s Law wouldn’t change his conclusion.

No, the Benford reference is total nonsense, even in the way he thinks (or some of his commenters think) it might apply. The idea seems to be that since 11 comes before 19, and 21 comes before 29, and 31 comes before 39, etc., you might expect the last digit 1 to be more frequent than last digit 9. But Benford’s law applies to real-world distributions of numbers where there’s no forcible upper bound on the numbers appearing in the distribution — say, street addresses, or dollar amounts appearing in filled-out tax forms, etc. You wouldn’t expect Benford’s law in any form to apply to percentages, which have to be in the interval [0,100].

I’m not remotely competent to respond to the statistical arguments here, though I hope I can learn from them. There is, however, some news affecting what Jordan calls priors: as both Nate Silver and Mark Blumenthal have now reported, it seems to be quite hard to find a Strategic Vision office that looks like an office, and hard to find an SV employee other than the president of the firm. I too would like to see Nate Silver re-run his analysis with a couple of other high-volume political pollsters (SUSA, for example), but I wonder whether SV will now feel obligated to prove that it exists.

Benford’s law might be relevant if SV “made up” their numbers by taking a collection of “real” world numbers and reversing them (e.g., to make their data look “more random”). But I agree that it’s a bit of a stretch.

Update: SV has a storefront office in Blairsville, GA. Still waiting for evidence that actual polls were conducted there, though.

According to one of Nate’s posts, Strategic Vision claims (or has claimed) to have an office in Madison, WI. The address given? The UPS store on Regent Street. I wouldn’t call a mailbox an office.

This comment to Nate’s latest post, assuming the commenter’s calculations are correct (I haven’t double-checked them), goes a long way towards skewering Nate’s analysis. The distribution of undecideds and the distribution of spreads can have a strong effect on the distribution of last digits:

http://www.fivethirtyeight.com/2009/09/monday-mish-mash-strategic-vision-las.html#comment-649421898547558528

[…] but want to see more evidence. Jordan Ellenberg, a University of Wisconsin, Madison, mathematician, blogged that the case isn’t as persuasive as investigations into possible fraud in the Iranian […]