
## Ocular regression


A phrase I learned from Aaron Clauset’s great colloquium on the non-ubiquity of scale-free networks. “Ocular regression” is the practice of squinting at the data until it looks linear.

The oral arguments in *Gill v. Whitford*, the Wisconsin gerrymandering case, are now a month behind us. But there’s a factual error in the state’s case, and I don’t want to let it be forgotten. Thanks to Mira Bernstein for pointing this issue out to me.

Misha Tseytlin, Wisconsin’s solicitor general, was one of two lawyers arguing that the state’s Republican-drawn legislative boundaries should be allowed to stand. Tseytlin argued that the metrics that flagged Wisconsin’s maps as drastically skewed in the GOP’s favor were unreliable:

And I think the easiest way to see this is to take a look at a chart that plaintiff’s own expert created, and that’s available on Supplemental Appendix 235. This is plain — plaintiff’s expert studied maps from 30 years, and he identified the 17 worst of the worst maps. What is so striking about that list of 17 is that 10 were neutral draws. There were court-drawn maps, commission-drawn maps, bipartisan drawn maps, including the immediately prior Wisconsin drawn map.

That’s a strong claim, which jumped out at me when I read the transcripts: 10 of the 17 very worst maps, according to the metrics, were drawn by neutral parties! That really makes it sound like whatever those metrics are measuring, it’s not partisan gerrymandering.

But the claim isn’t true.

(To be clear, I believe Tseytlin made a mistake here, not a deliberate misrepresentation.)

The table he’s referring to is on p.55 of this paper by Simon Jackman, described as follows:

Of these, 17 plans are utterly unambiguous with respect to the sign of the efficiency gap estimates recorded over the life of the plan:

Let me unpack what Jackman’s saying here. These are the 17 maps where we can be *sure* the efficiency gap favored the same party, three elections in a row. You might ask: why wouldn’t we be sure about which side the map favors? Isn’t the efficiency gap something we can compute precisely? Not exactly. The basic efficiency gap formula assumes both parties are running candidates in every district. If there’s an uncontested race, you have to make your best estimate for what the candidates’ vote shares would have been if there *had* been candidates of both parties. So you have an estimate for the efficiency gap, but also some uncertainty. The more uncontested races, the more uncertain you are about the efficiency gap.
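In the fully contested case, the efficiency gap is just a wasted-votes calculation. A minimal sketch of the standard Stephanopoulos–McGhee definition (function and variable names are mine); uncontested races are exactly where this simple version breaks down and imputation, with its attendant uncertainty, comes in:

```python
# Standard "wasted votes" efficiency gap, assuming every district is contested.
def efficiency_gap(district_votes):
    """district_votes: list of (votes_A, votes_B) per district."""
    wasted_a = wasted_b = total = 0.0
    for a, b in district_votes:
        total += a + b
        threshold = (a + b) / 2           # votes needed to win the district
        if a > b:
            wasted_a += a - threshold     # A's surplus votes beyond a win
            wasted_b += b                 # all of losing B's votes are wasted
        else:
            wasted_b += b - threshold
            wasted_a += a
    return (wasted_a - wasted_b) / total  # sign says which party the map favors
```

Each uncontested district replaces an exact (a, b) pair with an estimate, which is where the uncertainty about the sign in Jackman’s table comes from.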

So the maps on this list aren’t the 17 “worst of the worst maps.” They’re not the ones with the highest efficiency gaps, not the ones most badly gerrymandered by any measure. They’re the ones in states with so few uncontested races that we can be essentially certain the efficiency gap favored the same party three elections running.

Tseytlin’s argument is supposed to make you think that big efficiency gaps are as likely to come from neutral maps as partisan ones. But that’s not true. Maps drawn by Democratic legislatures have an average efficiency gap favoring Democrats; maps drawn by Republican legislatures on average favor the GOP; neutral maps are in between, and have smaller efficiency gaps overall.

That’s from p.35 of another Jackman paper. Note the big change after 2010. It wasn’t *always* the case that partisan legislators automatically thumbed the scales strongly in their favor when drawing the maps. But these days, it kind of is. Is that because partisanship is worse now? Or because cheaper, faster computation makes it easier for one-party legislatures to do what they always would have done, if they could? I can’t say for sure.

Efficiency gap isn’t a perfect measure, and neither side in this case is arguing it should be the single or final arbiter of unconstitutional gerrymandering. But the idea that efficiency gap flags neutral maps as often as partisan maps is just wrong, and it shouldn’t have been part of the state’s argument before the court.

Data:

2006: 27

2007: 19

2008: 22

2009: 30

2010: 23

2011: 19

2012: 27

2013: 35

2014: 31

2015: 38

2016: 29

Don’t quite know what to make of this. I’m sort of surprised there’s so much variation! I’d have thought I’d have read less when my kids were infants, or when I was writing my own book, but it seems pretty random. I do see that I’ve been clearly reading more books the last few years than I did in 2012 and before.

Lists, as always, are here (2011 on) and here (2006-2010.)

Couldn’t find my phone yesterday morning. I definitely remembered having it in the car on the way home from the kids’ swim lesson, so I knew I hadn’t left it behind. “Find my iPhone” told me the phone was on the blacktop of the elementary school, about 1000 feet from my house. What? Why? Then a few minutes later the location updated to the driveway of a bank, closer to my house but in the other direction. So I went over to the bank and looked around in the driveway, even peering into the garbage shed and seeing if my phone was in their dumpster.

But why did I do that? It was terrible reasoning. There was no chain of events leaving my phone at the bank, or at the school, that wasn’t incredibly a priori unlikely. I should have reasoned: “The insistence of Find my iPhone that my phone is at the bank drastically increases the probability my phone is at the bank, but that probability started out so tiny that it remains tiny, and the highest-expected-utility use of my time is to keep looking around my house and my car until I find it.”
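The Bayesian bookkeeping I should have done fits in a few lines (all the numbers here are invented for illustration):

```python
# Toy Bayes update with invented numbers: even a fairly reliable location
# ping barely moves a tiny prior that the phone is at the bank.
prior = 0.001                  # P(phone at bank) before checking the app
p_ping_if_there = 0.9          # P(app says "bank" | phone is at the bank)
p_ping_if_not = 0.05           # P(app says "bank" | phone is elsewhere)
posterior = (p_ping_if_there * prior) / (
    p_ping_if_there * prior + p_ping_if_not * (1 - prior))
# posterior is about 0.018: much bigger than the prior, still small
```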

Anyway, it was in the basement.

Just ran across this hunk of data journalism from the Washington Post:

In a 100-friend scenario, the average white person has 91 white friends; one each of black, Latino, Asian, mixed race, and other races; and three friends of unknown race. The average black person, on the other hand, has 83 black friends, eight white friends, two Latino friends, zero Asian friends, three mixed race friends, one other race friend and four friends of unknown race.

Going back to Chris Rock’s point, the average black person’s friend network is eight percent white, but the average white person’s network is only one percent black. To put it another way: Blacks have ten times as many black friends as white friends. But white Americans have an astonishing 91 times as many white friends as black friends.

100 friends and only one black person! That’s pretty white!

It’s worth taking a look at the actual study they’re writing about. They didn’t ask people to list their top 100 friends. They said to list at most seven people, using this prompt:

From time to time, most people discuss important matters with other people. Looking back over the last six months – who are the people with whom you discussed matters important to you?

The white respondents only named 3.3 people on average, of whom 1.9 were immediate family members. So a better headline wouldn’t be “75% of white people have no black friends,” but “75% of whites are married to another white person, have two white parents, and have a white best friend, if they have a best friend.” As for the quoted paragraph, it should read

In a 100-friend scenario, the average white person has 57 immediate family members.

Who knew?

(Note: I just noticed that Emily Swanson at Huffington Post made this point much earlier.)

As I mentioned, I’m reading Ph.D. admission files. Each file is read by two committee members and thus each file has two numerical scores.

How to put all this information together into a preliminary ranking?

The traditional way is to assign to each applicant their mean score. But there’s a problem: different raters have different scales. My 7 might be your 5.

You could just normalize the scores by subtracting that rater’s overall mean. But that’s problematic too. What if one rater actually happens to have looked at stronger files? Or even if not: what if the relation between rater A’s scale and rater B’s scale isn’t linear? Maybe, for instance, rater A gives everyone she doesn’t think should get in a 0, while rater B uses a range of low scores to express the same opinion, depending on just how unsuitable the candidate seems.
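Here is what that per-rater normalization looks like, with invented scores: it lines up the raters’ means, but, as noted, it can’t fix nonlinear differences between scales.

```python
# Per-rater mean normalization (the flawed fix discussed above).
scores = {                        # rater -> {applicant: raw score}; invented
    "A": {"x": 7, "y": 3, "z": 5},
    "B": {"x": 5, "y": 1, "z": 3},
}
normalized = {}
for rater, vals in scores.items():
    mean = sum(vals.values()) / len(vals)   # rater A's mean is 5, rater B's is 3
    normalized[rater] = {app: s - mean for app, s in vals.items()}
# both raters now give x a +2, y a -2, and z a 0
```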

Here’s what I did last year. If (r,a,a’) is a triple where r is a rater and a and a’ are two applicants, such that r rated a higher than a’, you can think of that as a judgment that a is more admittable than a’. And you can put all those judgments from all the raters in a big bag, and then see if you can find a ranking of the applicants (or, if you like, a real-valued function f on the applicants) such that, for every judgment a > a’, we have f(a) > f(a’).

Of course, this might not be possible — two raters might disagree! Or there might be more complicated incompatibilities generated by multiple raters. Still, you can ask: what if I tried to minimize the number of “mistakes”, i.e. the number of judgments in your bag that your choice of ranking contradicts?

Well, you can ask that, but you may not get an answer, because that’s a highly non-convex minimization problem, and is as far as we know completely intractable.

But here’s a way out, or at least part of the way out — we can use a *convex relaxation*. Set it up this way. Let V be the space of real-valued functions on applicants. For each judgment j, involving applicants a and a’, let mistake_j(f) be the step function

mistake_j(f) = 1 if f(a) < f(a’) + 1

mistake_j(f) = 0 if f(a) >= f(a’) + 1

Then “minimize total number of mistakes” is the problem of minimizing

M = sum_j mistake_j(f)

over V. And M is terribly nonconvex. If you try to gradient-descend (e.g. start with a random ranking and then switch two adjacent applicants whenever doing so reduces the total number of mistakes) you are likely to get caught in a local minimum that’s far from optimal. (Or at least that *can* happen; whether this typically actually happens in practice, I haven’t checked!)

So here’s the move: replace mistake_j(f) with a function that’s “close enough,” but is convex. It acts as a sort of tractable proxy for the optimization you’re actually after. The customary choice here is the *hinge loss*:

hinge_j(f) = max(0, f(a’) - f(a) + 1).

Then H := sum_j hinge_j(f) is a convex function of f, which you can easily minimize in Matlab or Python. If you can actually find an f with H(f) = 0, you’ve found a ranking which agrees with every judgment in your bag. Usually you can’t, but that’s OK! You’ve very quickly found a function f which does a decent job aggregating the committee scores, and which you can use as your starting point.
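Minimizing the sum of hinges needs nothing fancy; here is a bare-bones subgradient-descent sketch (toy code of my own, not from any particular package), which nudges f(a) up and f(a’) down whenever a judgment’s margin is violated:

```python
import numpy as np

# Aggregate pairwise judgments by subgradient descent on the hinge-loss sum.
def rank_by_hinge(n, judgments, steps=2000, lr=0.01):
    """judgments: list of (a, a_prime) meaning some rater put a above a_prime."""
    f = np.zeros(n)
    for _ in range(steps):
        grad = np.zeros(n)
        for a, ap in judgments:
            if f[a] - f[ap] < 1:   # hinge is active: margin violated
                grad[a] -= 1.0     # subgradient pushes f[a] up ...
                grad[ap] += 1.0    # ... and f[a'] down
        f -= lr * grad
    return f
```

With the judgments (0,1), (1,2), (0,2), the recovered scores satisfy f[0] > f[1] > f[2], as they should.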

Now here’s a paper by Nihar Shah and Martin Wainwright that commenter Dustin Mixon linked in my last ranking post. It suggests doing something much simpler: using a *linear* function as a proxy for mistake_j. What this amounts to is: score each applicant by the number of times they were placed above another applicant. Should I be doing this instead? My first instinct is no. It looks like Shah and Wainwright assume that each pair of applicants is equally likely to be compared; I think I don’t want to assume that, and I think (but correct me if I’m wrong!) the optimality they get may not be robust to that?
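What that linear proxy amounts to concretely, on my reading (toy code of my own):

```python
from collections import Counter

# Score each applicant by the number of judgments placing them above someone.
def rank_by_wins(judgments):
    """judgments: list of (a, a_prime) meaning a was rated above a_prime."""
    return Counter(a for a, _ in judgments)
```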

Anyway, all thoughts on this question — or suggestions as to something *totally different* I could be doing — welcome, of course.

This, from the *New York Times Book Review*, bugged me:

There are 33 percent more such women in their 20s than men. To help us see what a big difference 33 percent is, Birger invites us to imagine a late-night dorm room hangout that’s drawing to an end, and everyone wants to hook up. “Now imagine,” he writes, that in this dorm room, “there are three women and two men.”

It’s not so bad that the reviewer was confused about percentages; it’s that she *went out of her way* to explain what the percentage meant, and said something totally wrong.

I figured the mistake was probably inherited from the book under review, so I checked on Google Books, and nope; the book uses the example, but *correctly*, as an example of how to visualize a population with 50% more women than men!

One chapter of *How Not To Be Wrong*, called “More Pie Than Plate” (excerpted in Slate here) is about the perils you are subject to when you talk about percentages of numbers (like “net new jobs”) that may be negative.

Various people, since the book came out, have complained that *How Not To Be Wrong* is a leftist tract, intended to smear Republicans as being bad at math. I do not in fact think Republicans are bad at math, and it sort of depresses me to feel my book reads that way to those people. What’s true is that, in “More Pie Than Plate,” I tear down an old Mitt Romney ad and a Scott Walker press release. But the example I lead with is a claim almost always put forward by liberal types: that the whole of the post-recession rebound has accrued to the 1%. Not really true!

Long intro to this: I get to polish my “calling out liberal claims” cred by objecting to this, from the Milwaukee Journal-Sentinel:

UW-Madison, the fourth-largest academic research institution in the country with $1.1 billion of annual research spending, has helped spur strong job growth in surrounding Dane County. In fact, employment gains there during the last 10 years far outstrip those in any other Wisconsin county, accounting for more than half of the state’s 36,941 net new private-sector jobs.

I’m pro-UW and pro-Dane County, obviously, but *people need to stop reporting percentages of net job gains.* What’s more — the reason job gains here outstrip other counties is that it’s the second-biggest county in the state, with a half-million people. Credit to the Journal-Sentinel; at least they included a table, so you can see for yourself that lots of other counties experienced healthy job growth over the decade.

But just as I was ready to placate my conservative critics, Rick Perry went to Iowa and said:

“In the last 14 years, Texas has created almost one-third of all the new jobs in America.”

Dane County and Rick Perry, you *both* have to stop reporting percentages of net job gains.
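Why percentages of net job gains mislead, in invented numbers: when some counties lose jobs, one county’s share of the *net* can wildly exceed 100 percent.

```python
# Invented numbers: county C lost jobs, so county A's share of the state's
# *net* gain balloons past 100 percent.
county_gains = {"A": 10_000, "B": 2_000, "C": -9_000}
net = sum(county_gains.values())           # 3,000 net new jobs statewide
share_A = county_gains["A"] / net * 100    # about 333 percent of the net gain
```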

Michael Harris — who is now blogging! — points out that Montaigne very crisply got to the point I make in *How Not To Be Wrong* about survivorship bias, Abraham Wald, and the missing bullet holes:

Here, for example, is how Montaigne explains the errors in reasoning that lead people to believe in the accuracy of divinations: “That explains the reply made by Diagoras, surnamed the Atheist, when he was in Samothrace: he was shown many vows and votive portraits from those who have survived shipwrecks and was then asked, ‘You, there, who think that the gods are indifferent to human affairs, what have you to say about so many men saved by their grace?’— ‘It is like this’, he replied, ‘there are no portraits here of those who stayed and drowned—and they are more numerous!’ ”

The quote is from Jon Elster, *Reason and Rationality*, p.26.

From Maria Konnikova’s New Yorker piece on Randall Munroe and what makes science interesting:

In a meta-analysis of sixty-six studies tracking interests over time (the average study followed subjects for seven years), psychologists from the University of Illinois at Urbana–Champaign found that our interests in adolescence had only a point-five correlation with our interests later in life. This means that if a subject filled out a questionnaire about her interests at the age of, say, thirteen, and again at the age of twenty-one, only half of her answers remained consistent on both.

I think it’s totally OK to not say precisely what correlation means. It’s sort of subtle! It would be fine to say the correlation was “moderate,” or something like that.

But I don’t think it’s OK to say “This means that…” and then say something which isn’t what it means. If the questionnaire was a series of yes-or-no questions, and if exactly half the answers stayed the same between age 13 and 21, the correlation would be zero. As it should be — 50% agreement is what you’d expect if the two questionnaires had nothing to do with each other. If the questionnaire was of a different kind, say, “rate your interest in the following subjects on a scale of 1 to 5,” then agreement on 50% of the answers would be more suggestive of a positive relationship; but it wouldn’t in any sense be the same thing as 0.5 correlation. What does the number 0.5 add to the meaning of the piece? What does the explanation add? I think nothing, and I think both should have been taken out.
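A quick illustration of the yes-or-no case (toy data, mine): two balanced binary answer sheets that agree on exactly half the items have correlation zero, not 0.5.

```python
import numpy as np

# Two balanced yes/no answer sheets agreeing on exactly half the items.
x = np.array([1, 1, 0, 0])
y = np.array([1, 0, 1, 0])       # matches x in positions 0 and 3 only
agreement = np.mean(x == y)      # 0.5: half the answers stayed the same
r = np.corrcoef(x, y)[0, 1]      # correlation is 0, not 0.5
```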

Credit, though — the piece does include a link to the original study, a practice that is sadly not universal! But demerit — the piece is behind a paywall, leaving most readers just as unable as before to figure out what the study actually measured. If you’re a journal, is the cost of depaywalling one article really so great that it’s worth forgoing thousands of New Yorker readers actually looking at your science?
