Category Archives: bad statistics

More pie than plate, Dane County edition

One chapter of How Not To Be Wrong, called “More Pie Than Plate” (excerpted in Slate here), is about the perils of talking about percentages of numbers (like “net new jobs”) that may be negative.

Various people, since the book came out, have complained that How Not To Be Wrong is a leftist tract, intended to smear Republicans as being bad at math.  I do not in fact think Republicans are bad at math and it sort of depresses me to feel my book reads that way to those people.  What’s true is that, in “More Pie Than Plate,”  I tear down an old Mitt Romney ad and a Scott Walker press release.  But the example I lead with is a claim almost always put forward by liberal types:  that the whole of the post-recession rebound has accrued to the 1%.  Not really true!

Long intro to this: I get to polish my “calling out liberal claims” cred by objecting to this, from the Milwaukee Journal-Sentinel:

UW-Madison, the fourth-largest academic research institution in the country with $1.1 billion of annual research spending, has helped spur strong job growth in surrounding Dane County. In fact, employment gains there during the last 10 years far outstrip those in any other Wisconsin county, accounting for more than half of the state’s 36,941 net new private-sector jobs.

I’m pro-UW and pro-Dane County, obviously, but people need to stop reporting percentages of net job gains.  What’s more — the reason job gains here outstrip other counties is that it’s the second-biggest county in the state, with a half-million people.  Credit to the Journal-Sentinel; at least they included a table, so you can see for yourself that lots of other counties experienced healthy job growth over the decade.

But just as I was ready to placate my conservative critics, Rick Perry went to Iowa and said:

“In the last 14 years, Texas has created almost one-third of all the new jobs in America.”

Dane County and Rick Perry, you both have to stop reporting percentages of net job gains.
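
To see why shares of a net total mislead, here is a minimal Python sketch. The counties and the numbers are made up for illustration; they are not taken from the Journal-Sentinel table or from any Texas data.

```python
# Hypothetical net job changes by county (invented numbers, for illustration only).
net_change = {
    "County A": 20_000,
    "County B": 15_000,
    "County C": 10_000,
    "County D": -12_000,
    "County E": -8_000,
}


def share_of_net(changes):
    """Return each region's change as a fraction of the net total."""
    total = sum(changes.values())
    return {name: delta / total for name, delta in changes.items()}


shares = share_of_net(net_change)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of net new jobs")

# Shares of the counties that gained jobs, added together.
gainer_share = sum(s for s in shares.values() if s > 0)
print(f"Gainers combined: {gainer_share:.0%}")
```

With these invented numbers the net gain is 25,000 jobs, so County A alone accounts for 80% of it, County B for 60%, and County C for 40%. That is 180% of the pie from the gainers alone, because the counties that lost jobs contribute negative shares.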


Michael Harris on Elster on Montaigne on Diagoras on Abraham Wald

Michael Harris — who is now blogging! — points out that Montaigne very crisply got to the point I make in How Not To Be Wrong about survivorship bias, Abraham Wald, and the missing bullet holes:

Here, for example, is how Montaigne explains the errors in reasoning that lead people to believe in the accuracy of divinations: “That explains the reply made by Diagoras, surnamed the Atheist, when he was in Samothrace: he was shown many vows and votive portraits from those who have survived shipwrecks and was then asked, ‘You, there, who think that the gods are indifferent to human affairs, what have you to say about so many men saved by their grace?’— ‘It is like this’, he replied, ‘there are no portraits here of those who stayed and drowned—and they are more numerous!’ ”

The quote is from Jon Elster, Reason and Rationality, p.26.


What correlation means

From Maria Konnikova’s New Yorker piece on Randall Munroe and what makes science interesting:

In a meta-analysis of sixty-six studies tracking interests over time (the average study followed subjects for seven years), psychologists from the University of Illinois at Urbana–Champaign found that our interests in adolescence had only a point-five correlation with our interests later in life. This means that if a subject filled out a questionnaire about her interests at the age of, say, thirteen, and again at the age of twenty-one, only half of her answers remained consistent on both.

I think it’s totally OK to not say precisely what correlation means.  It’s sort of subtle!  It would be fine to say the correlation was “moderate,” or something like that.

But I don’t think it’s OK to say “This means that…” and then say something which isn’t what it means.  If the questionnaire was a series of yes-or-no questions, and if exactly half the answers stayed the same between age 13 and 21, the correlation would be zero.  As it should be — 50% agreement is what you’d expect if the two questionnaires had nothing to do with each other.  If the questionnaire was of a different kind, say, “rate your interest in the following subjects on a scale of 1 to 5,” then agreement on 50% of the answers would be more suggestive of a positive relationship; but it wouldn’t in any sense be the same thing as 0.5 correlation.  What does the number 0.5 add to the meaning of the piece?  What does the explanation add?  I think nothing, and I think both should have been taken out.
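
Here is a quick check of the yes-or-no case, as a sketch in Python. The answer sheets are invented, and I am assuming that yes and no are equally common at both ages; with lopsided answers the numbers come out differently.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def agreement(xs, ys):
    """Fraction of questions answered the same way both times."""
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)


# Hypothetical 80-question survey, yes = 1 and no = 0.
age13 = [1] * 40 + [0] * 40

# Half the answers unchanged at age 21.
age21_half = [1] * 20 + [0] * 20 + [1] * 20 + [0] * 20

# Three quarters of the answers unchanged at age 21.
age21_three_quarters = [1] * 30 + [0] * 10 + [1] * 10 + [0] * 30

for label, later in [("half", age21_half), ("three quarters", age21_three_quarters)]:
    print(label, agreement(age13, later), pearson(age13, later))
```

Under these assumptions, 50% agreement gives a correlation of 0, and it takes 75% agreement to reach a correlation of 0.5.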

Credit, though — the piece does include a link to the original study, a practice that is sadly not universal!  But demerit — the piece is behind a paywall, leaving most readers just as unable as before to figure out what the study actually measured.  If you’re a journal, is the cost of depaywalling one article really so great that it’s worth forgoing thousands of New Yorker readers actually looking at your science?



Where are people buying How Not To Be Wrong?

Amazon Author Central shows you BookScan sales for your books broken down by metropolitan statistical area.  (BookScan tracks most hardcover sales, but not e-book sales.)  This allows me to see which MSAs are buying the most and fewest copies, per capita, of How Not To Be Wrong.  Unsurprisingly, Madison has by far the highest number of copies of HNTBW per person.  But Burlington, VT is not far behind!  Then there’s a big drop, until you get down to DC, SF, Boston, and Seattle, each of which still bought more than twice as many copies per person as the median MSA.

Where do people not want the book?  Lowest sales per capita are in Miami.  They also have little use for me in Los Angeles, Atlanta, and Houston.  Note that for reasons of time I only looked at the 30 MSAs that sold the most copies of the book; going farther down that list, there are more pretty big cities where the book is unpopular, like Tampa, Charlotte, San Antonio, and Orlando.

It would be interesting to compare the sales figures, not to population, but to overall hardcover book sales.  But I couldn’t find this information broken down by city.


How do you share your New York Times?

My op/ed about math teaching and Little League coaching is the most emailed article in the New York Times today.  Very cool!

But here’s something interesting; it’s only the 14th most viewed article, the 6th most tweeted, and the 6th most shared on Facebook.  On the other hand, this article about child refugees from Honduras is

#14 most emailed

#1 most viewed

#1 most shared on Facebook

#1 most tweeted

while Paul Krugman’s column about California is

#4 most emailed

#3 most viewed

#4 most shared on Facebook

#7 most tweeted.

Why are some articles, like mine, much more emailed than tweeted, while others, like the one about refugees, are much more tweeted than emailed, and others still, like Krugman’s, come out about even?  Is it always the case that views track tweets, not emails?  Not necessarily; an article about the commercial success and legal woes of conservative poo-stirrer Dinesh D’Souza is #3 most viewed, but only #13 in tweets (and #9 in emails).  Today’s Gaza story has lots of tweets and views but not so many emails, like the Honduras piece, so maybe this is a pattern for international news?  Presumably people inside newspapers actually study stuff like this; is any of that research public?  Now I’m curious.



Statistical chutzpah in the Indiana school grade-changing scandal

I wrote a piece for Slate yesterday about Tony Bennett, the former Indiana schools czar who intervened in the state’s school-grading system to ensure that a politically connected public charter got an A instead of a C.  (The AP’s Tom LoBianco broke the original story.)  Bennett offered interviewers an explanation for the last-minute grade change which was plainly contradicted by the figures in the internal e-mails LoBianco had obtained and released.  Presumably, Bennett figured nobody would bother to look at the actual numbers.  That is incredibly annoying.

Summary of what actually happened in Indiana, by analogy:

Suppose the syllabus for my math class said that the final grade would be determined by averaging the homework grade and the exam grade, and that the exam grade was itself the average of the grades on the three tests I gave. Now imagine a student gets a B on the homework, gets a D-minus on the first two tests, and misses the third. She then comes to me and says, “Professor, your syllabus says the exam component of the grade is the average of my grade on the three tests, but I only took two tests, so that line of the syllabus doesn’t apply to my special case, and the only fair thing is to drop the entire exam component and give me a B for the course.”

I would laugh her out of the office. Or maybe suggest that she apply for a job as a state superintendent of instruction.



10,000 baby names of Harvard

My 20th Harvard reunion book is in hand, offering a social snapshot of a certain educationally (and mostly financially) elite slice of the US population.

Here is what Harvard alums name their kids.  These come from one segment of the book, taken in alphabetical order of surname.  Most of these children were born between 2003 and the present.  They are grouped by family.

Molly, Danielle

Zachary, Zoe, Alex

Elias, Ella, Irena

Sawyer, Luke

Peyton, Aiden

Richard, Sonya

Grayson, Parker, Saya

Yoomi, Dae-il

Io, Pico, Daphne

Lucine, Mayri

Matthew, Christopher

Richard, Annalise, Ryan

Jackson

Christopher, Sarah, Zachary, Claire

Shaiann, Zaccary

Alexandra, Victoria, Arianna, Madeline

Samara

Grace, Luke, Anna

William, Cecilia, Maya

Bode, Tyler

Daniel, Catherine

Alex, Gretchen

Nathan, Spencer, Benjamin

Ezekiel, Jesse

Matthew, Lauren, Ava, Nathan

Samuel, Katherine, Peter, Sophia

Ameri, Charles

Sebastian

Andrew, Zachary, Nathan

Alexander, Gabriella

Liam

Andrew, Nadia

Caroline, Elizabeth

Paul, Andrew

Shania, Tell, Delia

Saxon, Beatrix

Benjamin

Nathan, Lukas, Jacob

Noah, Haydn, Ellyson

Freddie

Leonidas, Cyrus

Isabelle, Emma

Joseph, Theodore

Asha, Sophie, Tejas

Gabriela, Carlos, Sebastian

Brendan, Katherine

Rayne

James, Seeger, Arden

Helena, Freya

Alexandra, Matthew

George

If you saw these names, would you be able to guess roughly what part of the culture they were drawn from?  Are there ways in which the distribution is plainly different from “standard” US naming practice?



Tantalisingly close to significance

Matthew Hankins and others on Twitter are making fun of scientists who twist themselves up lexically in order to report results that fail the significance test, using phrases like “approached but did not quite achieve significance” and “only just insignificant” and “tantalisingly close to significance.”

But I think this fun-making is somewhat misplaced!  We should instead be jeering at the conventional dichotomy that a result significant at p < .05 is “a real effect” and one that scores at p = .06 is “no effect.”

The lexically twisted scientists are on the side of the angels here, insisting that a statistically insignificant finding is usually much better described as “not enough evidence” than “no evidence,” and should be mentioned, in whatever language the journal allows, not mulched.
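
To see how thin the line is, here is a small Python sketch that turns a z-statistic into a two-sided p-value using the normal approximation. The two z-values are hypothetical, chosen only because they land on either side of .05.

```python
from math import erfc, sqrt


def two_sided_p(z):
    """Two-sided p-value for a z-statistic under a standard normal null."""
    return erfc(abs(z) / sqrt(2))


ALPHA = 0.05  # the conventional threshold

for z in (2.0, 1.9):  # hypothetical test statistics
    p = two_sided_p(z)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"z = {z:.2f}  p = {p:.3f}  {verdict}")
```

The two results come out at roughly p = .046 and p = .057. The evidence is nearly the same in both cases; only the label differs.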



Math on Trial, by Leila Schneps and Coralie Colmez

The arithmetic geometer Leila Schneps, who taught me most of what I know about Galois actions on fundamental groups of varieties, has a new book out, Math on Trial:  How Numbers Get Used and Abused in the Courtroom, written with her daughter Coralie Colmez.  Each chapter covers a famous case whose resolution, for better or worse, involved a mathematical argument.  Interspersed among the homicide and vice are short chapters that speak directly to some of the mathematical and statistical issues that arise in legal matters.  One of the cases is the much-publicized prosecution of college student Amanda Knox for a murder in Italy; today in the New York Times, Schneps and Colmez write about some of the mathematical pratfalls in her trial.

I am happy to have played some small part in building their book: I was the one who told Leila about the murder of Diana Sylvester, which turned into a whole chapter of Math on Trial; very satisfying to see the case treated with much more rigor, depth, and care than I gave it on the blog!  I hope it is not a spoiler to say that Schneps and Colmez come down on the side of assigning a probability close to 1 that the right man was convicted (though not nearly so close to 1 as the prosecution claimed, and perhaps not close enough for a jury to have rightfully convicted, depending on how you roll re “reasonable doubt”).

Anyway — great book!  Buy, read, publicize!



Voros McCracken is a wise, wise man

From McCracken’s talk at the MIT Sloan Sports Analytics Conference, reported by Fangraphs:

“Just because everyone knows OBP is important doesn’t mean OBP isn’t important. Just because we learned something a long time ago doesn’t mean we should unlearn it. We should keep it and add to it. There are a lot of people who are itching to do the next new thing. That’s great, it’s just that mindset can cause you to forget some of the basics.

“Not to point fingers at any team, but to a certain extent the Mariners did that. They got so wrapped up in taking advantage of fielding statistics that they forgot they should have a first baseman with an on-base percentage over .280. Maybe that’s unfair. If they were here, they may interrupt me and say no, that’s not the way it happened. But my perception is that sometimes you can forget about the basics when you’re pursuing something new.

“You might say to yourself, ‘I want a stat that can measure this.’ Then video technology comes out and gives you the stat you wanted to measure. There is a tendency to think, ‘Ooh, I’ve been waiting for this, and now I’ve got it, and it’s the greatest stat in the world.’ But you haven’t even looked at it yet. You haven’t looked at what it actually says, what its weaknesses are. There’s a hazard there. You want to know more things than your competition. What you don’t want is to know something your competition doesn’t, and it’s wrong. If everybody is wrong about something it doesn’t hurt you too bad, but if you’re the only one, you have 29 teams taking advantage of your mistake.

“Scouting is still an important smell test. If scouts all say someone is a terrible defender, and a stat says he’s the best defender in the world, the truth is probably somewhere in between. Scouts say things for a reason, and you shouldn’t dismiss that.

“If you come up with a new number, and somebody says they don’t like it, I don’t think it’s helpful to just keep pointing at it, over and over again. ‘Well, that’s the number.’ Every number a guy like me comes up with, you have to be skeptical of. You have to be extremely skeptical. That’s the quickest way to knowledge. If you don’t believe something, figure out if it’s true or not. It’s a basic scientific approach, to a certain extent. Falsifiable hypotheses, that sort of thing.”

