## Iranian election statistics — never mind the digits?

I blogged last year about claims that fraud in the 2009 Iranian election could be detected by studying irregularities in the distribution of terminal digits. Eric A. Brill just e-mailed me an article of his that argues against this methodology, pointing out that the provincial vote totals (the ones with the fishy last digits) agree with the sums of the county totals, which in turn agree with the sums of the district totals. In order for the provincial totals to have been made up, you’d have to change a lot of county totals too (changing the total in just one county by a believable amount presumably wouldn’t make a big enough difference in the provincial totals). But if you add Ahmadinejad votes to a county here and a county there, the provincial total would be the sum of a bunch of human-chosen numbers, and there’s no reason to expect such a sum to have non-uniformly distributed last digits. The Beber-Scacco model requires that the culprits start with a target number at the provincial level and then carefully modify county and district level numbers to make the sums match. But why would they?
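
The point about sums can be checked with a small calculation. What follows is only a sketch under assumptions that are mine, not Brill's or Beber and Scacco's: each county's number has a last digit drawn independently from the same "human-looking" distribution (the `biased` list is invented for illustration), and the function name `last_digit_distribution_of_sum` is hypothetical. The last digit of a sum depends only on the last digits of the terms, so the exact distribution can be computed by repeated addition mod 10.

```python
def last_digit_distribution_of_sum(digit_probs, n_terms):
    """Exact distribution of the last digit of a sum of n_terms independent
    numbers whose last digits each follow digit_probs (10 probabilities)."""
    dist = [1.0] + [0.0] * 9  # an empty sum is 0, so its last digit is 0
    for _ in range(n_terms):
        new = [0.0] * 10
        for a, pa in enumerate(dist):
            for b, pb in enumerate(digit_probs):
                new[(a + b) % 10] += pa * pb
        dist = new
    return dist


# Invented "human" bias (an assumption): too few 0s and 5s, too many 7s.
biased = [0.02, 0.10, 0.10, 0.12, 0.10, 0.02, 0.12, 0.22, 0.12, 0.08]

for n in (1, 2, 5, 10):
    dist = last_digit_distribution_of_sum(biased, n)
    worst = max(abs(p - 0.1) for p in dist)
    print(f"{n:2d} terms: largest deviation from uniform = {worst:.6f}")
```

Under the independence assumption, the bias in the individual numbers washes out quickly as more terms are added. If the culprits instead coordinated the county numbers to hit a chosen provincial target, the independence assumption fails, and that is exactly the scenario the Beber-Scacco model needs.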

## Strategic Vision done in by the digits?

Nate Silver at 538 looks at the trailing digits of about 5000 poll results from secretive polling outfit Strategic Vision, finds a badly non-uniform distribution, and says this strongly suggests that SV is making up numbers.  I’m a fan of Nate’s stuff, both sabermetric and electoral, but I’m not so sure he’s right on this.

Nate’s argument is similar to the one in Beber and Scacco’s article on the fraudulence of Iran’s election returns. Humans are bad at picking “random” numbers, so the last digits of human-chosen (i.e. fake) numbers will look less uniform than truly random digits would.

There are at least three ways Nate’s case is weaker than Beber and Scacco’s.

1. In the Iranian numbers, there were too many numbers ending in 7 and too few ending in 0, consistent with the empirical finding that people trying to generate random numbers tend to disfavor “round” numbers like those ending in 0 and 5.  The digits from Strategic Vision have a lot of 7s, but even more 8s, and the 0s and 5s are approximately where they should be.
2. It’s not so clear to me that the “right” distribution for these digits is uniform.  Lots of 7s and 8s, few 1s; maybe that’s because in close polls with a small proportion of undecideds, you’ll see a lot of 48-47 results and not so many 51-41s.  I don’t really know what the expected distribution of the digits is — but the fact that I don’t know is a big clothespin between my nose and any assertion of a fishy smell.
3. And of course my prior for “major US polling firm invents data out of whole cloth” is way lower than my prior for the Iranian federal government doing the same thing.  Strategic Vision could run up exactly the same numbers that Beber and Scacco found, and you’d still be correct to trust them more than the Iran election bureau.  Unless your priors are very different from mine.
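
Point 2 above can be illustrated with a toy simulation. Everything in it is an assumption made up for the sketch: two-candidate races, an undecided share uniform between 2% and 10%, a lead drawn from a normal distribution with a standard deviation of 4 points, and the hypothetical function name `simulate_trailing_digits`. It says nothing about how Strategic Vision or any real pollster works; it only shows that honest numbers from close races need not have uniform trailing digits.

```python
import random
from collections import Counter


def simulate_trailing_digits(n_polls, seed=0):
    """Trailing-digit frequencies of both candidates' rounded percentages
    in simulated close two-way polls (all parameters are assumptions)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_polls):
        undecided = rng.uniform(2.0, 10.0)  # assumed undecided share, percent
        margin = rng.gauss(0.0, 4.0)        # assumed lead of candidate A, points
        share_a = round((100.0 - undecided + margin) / 2.0)
        share_b = round((100.0 - undecided - margin) / 2.0)
        for share in (share_a, share_b):
            counts[share % 10] += 1
    total = sum(counts.values())
    return [counts[d] / total for d in range(10)]


freqs = simulate_trailing_digits(5000, seed=1)
for digit, freq in enumerate(freqs):
    print(f"digit {digit}: {freq:.3f}")
```

With these made-up parameters most results land in the mid-to-high 40s, so a few trailing digits soak up most of the mass and others are rare. Change the assumptions and the shape changes, which is the point: the "right" baseline has to be worked out, not assumed.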

So I wouldn’t say, as Nate does, that the numbers compiled at 538 “suggest, perhaps strongly, the possibility of fraud.”

Update (27 Sep): More from Nate on the Strategic Vision digits.  Here he directly compares the digits from Strategic Vision to digits gathered by the same protocol from Quinnipiac.  To my eye, they certainly look different.  I think this strengthens his case.  If he ran the same procedure for five other national pollsters, and the other five all looked like Quinnipiac, I think we’d be in the position of saying “There is good evidence that there’s a methodological difference between SV and other pollsters which has an effect on the distribution of terminal digits.”  But it’s a long way from there to “The methodological difference is that SV makes stuff up.”

On the other hand, Nate remarks that the deviation of the Quinnipiac digits from uniformity is consistent with Benford’s Law. This is a terrible thing to remark. Benford’s Law applies to the leading digit, not the last one. The fact that Nate would even bring it up in this context makes me feel a little shaky about the rest of his computations.
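
To make the leading-versus-last distinction concrete: Benford's Law says the leading digit d of suitably spread-out data appears with probability log10(1 + 1/d). The sketch below uses an assumption of my own choosing, numbers drawn log-uniformly between 1,000 and 10,000,000, which is a textbook case where the law holds for the leading digit. The function names are hypothetical, and none of this is poll data.

```python
import math
import random


def benford_probability(d):
    """Benford's Law probability that the leading digit equals d (1..9)."""
    return math.log10(1 + 1 / d)


def leading_and_last_digit_freqs(n, seed=0):
    """Leading- and last-digit frequencies of n log-uniform integers
    between 1,000 and 10,000,000 (an assumed, illustrative data source)."""
    rng = random.Random(seed)
    lead = [0] * 10
    last = [0] * 10
    for _ in range(n):
        x = int(10 ** rng.uniform(3, 7))
        lead[int(str(x)[0])] += 1
        last[x % 10] += 1
    return [c / n for c in lead], [c / n for c in last]


lead, last = leading_and_last_digit_freqs(20000, seed=1)
for d in range(1, 10):
    print(f"leading {d}: observed {lead[d]:.3f}, Benford {benford_probability(d):.3f}")
for d in range(10):
    print(f"last {d}: observed {last[d]:.3f}")
```

In this setup the leading digits track the Benford curve while the last digits come out close to uniform, so a skewed last-digit distribution is not something Benford's Law explains.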

Also, there’s a good post about this on Pollster by Mark Blumenthal, whose priors about polling firms are far more reliable than mine.