I gave a talk at East HS yesterday about “not getting fooled by probability,” in which I talked about the Diana Sylvester murder case, discussed previously on the blog as an example of the prosecutor’s fallacy. While getting ready for the talk I came across this post about the case by Penn State law professor David Kaye, which explains how, in this case, the calculation proposed by the defense was wrong in its own way.
Here’s the setup. You’ve got DNA found at the scene of the crime, with enough intact markers that the chance of a random person matching the DNA is 1 in 1.1 million. You’ve got a database of 300,000 convicted violent criminals whose DNA you have on file. Out of these 300,000, you find one guy — otherwise unconnected with the crime — who matches on all six. This guy gets arrested and goes to trial. What should you tell the jury about the probability that the defendant is guilty?
The prosecution in the case stressed the “1 in 1.1 million” figure. But this is not the probability that the defendant was innocent. The defense brought forward the fact that the expected number of false positives in the database was about 1/3; but this isn’t the probability of innocence either. (The judge blocked the jury from hearing the defense’s number, but allowed the prosecution’s, whence the statistical controversy over the case.)
As Kaye points out, the missing piece of information is the prior probability that the guilty party is somewhere in the DNA database. If that probability is 0, then the defendant is innocent, DNA or no DNA. If that probability is 1, then the defendant is guilty, because everyone else in the database has been positively ruled out. Let x be the prior probability that someone in the database is guilty; let p be the probability of a false positive on the test; and let N be the size of the database. Then a quick Bayes shows that the probability of the defendant’s guilt is
x / (x + (1-x)Np).
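For anyone who wants the "quick Bayes" spelled out, here is one way to get there. It is a sketch under assumptions the post leaves implicit: the true culprit always matches his own DNA, each innocent person in the database matches independently with probability p, and the search turned up exactly one hit. The label G (the event that the culprit is in the database, with prior probability x) is my own notation.

```latex
\begin{align*}
P(\text{one hit} \mid G) &= (1-p)^{N-1}
  && \text{culprit matches, the other } N-1 \text{ do not} \\
P(\text{one hit} \mid \neg G) &= N p \,(1-p)^{N-1}
  && \text{exactly one of } N \text{ innocents matches} \\
P(G \mid \text{one hit})
  &= \frac{x\,(1-p)^{N-1}}{x\,(1-p)^{N-1} + (1-x)\,N p\,(1-p)^{N-1}}
   = \frac{x}{x + (1-x)\,N p}
\end{align*}
```

Given a single hit, the defendant is guilty exactly when G holds, so the last line is the probability of guilt. Under these assumptions the factor (1-p)^(N-1) cancels, so the formula is exact rather than an approximation.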
If Np is about 1/3, as in the Sylvester case, then everything depends on x. But it’s very hard to imagine that a good value for x is any more than 1/2, especially given the existence of a pretty good suspect in the case who died in 1978 and isn’t in the database. Then the defendant has at most a 3/4 probability of guilt, not nearly enough to convict. The prosecution and the defense both presented wrong numbers to the judge; but the defense numbers were less wrong.
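To see how much the answer depends on x, the formula can be tabulated with a few lines of Python. This is only a sketch: the function name posterior_guilt and the particular grid of x values are my own choices, and the inputs are the rounded figures quoted above (a database of 300,000 and a match probability of 1 in 1.1 million), which give Np of roughly 0.27, a little under the "about 1/3" used in the text.

```python
def posterior_guilt(x, n, p):
    """Probability of guilt given exactly one database hit.

    x: prior probability that the culprit is in the database
    n: number of people in the database
    p: probability that a random innocent person matches
    Implements x / (x + (1 - x) * n * p) from the post.
    """
    return x / (x + (1 - x) * n * p)


# Rounded figures from the post (assumed, not the exact case numbers).
N = 300_000
P = 1 / 1_100_000

if __name__ == "__main__":
    print(f"expected false positives Np = {N * P:.3f}")
    # Hypothetical grid of priors, chosen only for illustration.
    for x in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
        print(f"x = {x:4.2f}  ->  P(guilt) = {posterior_guilt(x, N, P):.3f}")
```

The endpoints behave as described earlier: x = 0 gives probability 0 and x = 1 gives probability 1, whatever the DNA evidence says. Setting Np to exactly 1/3 and x to 1/2 reproduces the 3/4 figure.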