Monthly Archives: February 2012

Reader survey: do you cut, butt, or budge in line?

Waiting for the bus this morning, CJ told another kid not to budge in line.  “You mean butt in line,” I said.  “DADDY,” CJ said, giggling, “you are being silly.”  “No, seriously,” I said, “it’s butt, not budge.”  So we asked the other kids in line, and all agreed — when you force your way into a line, you are “budging.”

I researched this, and indeed — “budge in line” is Wisconsin / Minnesota dialect.  (It’s also apparently common in Western Canada, for some reason.)  This was news to me.  Tanya reports hearing “ditch in line” as a kid, which is apparently some kind of Ohio thing.

So:  do you cut, butt, or budge?  And where are you from?

(Subsidiary question:  is there a poll site, a la surveymonkey, that will allow me to set this up as an online poll, ask respondents for the zip code of their home town, and then plot the answers on the map?)

Raw polling data as playground

This is a picture of the American electorate!

More precisely, this is a scatterplot I just made using the dataset recently released by PPP, a major political polling firm.  (They’re the outfit that did the “is your state hot or not” poll I blogged about last week.)  PPP has made available the raw responses from 46 polls with 1000 responses each, conducted more or less weekly over the course of 2011.  Here’s the whole thing as a .zip file.

Analyzing data sets like this is in some sense not hard.  But there’s a learning curve.  Little things, like:  you have to know that the .csv format is beautifully portable and universal; it’s the ASCII of data.  You have to know how to get your .csv file into your math package of choice (in my case, Python, but I think I could easily have done this in R or MATLAB as well) and you have to know where to get a PCA package, if it’s not already installed.  And you have to know how to output a new .csv file and make a graphic from it when you’re done.  (As you can see, I haven’t quite mastered this last part, and have presented you with a cruddy Excel scatterplot.)  In total, this probably took me about three hours to do, and now that I have a data-to-picture path I understand how to use, I think I could do it again in about 30 minutes.  It’s fun and I highly recommend it.  There’s a lot of data out there.

So what is this picture?  The scatterplot has 1000 points, one for each person polled in the December 15, 2011 PPP survey.  The respondents answered a bunch of questions, mostly about politics:

Q1: Do you have a favorable or unfavorable opinion of Barack Obama?
Q2: Do you approve or disapprove of Barack Obama’s job performance?
Q3: Do you think Barack Obama is too liberal, too conservative, or about right?
Q4: Do you approve or disapprove of the job Harry Reid is doing?
Q5: Do you approve or disapprove of the job Mitch McConnell is doing?
Q6: Do you have a favorable or unfavorable opinion of the Democratic Party?
Q7: Do you have a favorable or unfavorable opinion of the Republican Party?
Q8: Generally speaking, if there was an election today, would you vote to reelect Barack Obama, or would you vote for his Republican opponent?
Q9: Are you very excited, somewhat excited, or not at all excited about voting in the 2012 elections?
Q10: If passed into law one version of immigration reform that people have discussed would secure the border and crack down on employers who hire illegal immigrants. It would also require illegal immigrants to register for legal immigration status, pay back taxes, and learn English in order to be eligible for U.S. citizenship. Do you favor or oppose Congress passing this version of immigration reform?
Q11: Have you heard about the $10,000 bet Mitt Romney challenged Rick Perry to in last week’s Republican Presidential debate?
Q12: (Asked only of those who say ‘yes’ to Q11:) Did Romney’s bet make you more or less likely to vote for him next year, or did it not make a difference either way?
Q13: Do you believe that there’s a “War on Christmas” or not?
Q14: Do you consider yourself to be a liberal, moderate, or conservative?
Q15: Do you consider yourself to be a supporter of the Tea Party or not?
Q16: Are you or is anyone in your household a member of a labor union?
Q17: If you are a woman, press 1. If a man, press 2.
Q18: If you are a Democrat, press 1. If a Republican, press 2. If you are an independent or a member of another party, press 3.
Q19: If you are Hispanic, press 1. If white, press 2. If African American, press 3. If Asian, press 4. If you are an American Indian, press 5. If other, press 6.
Q20: (Asked only of people who say American Indian on Q19:) Are you enrolled in a federally recognized tribe?
Q21: If you are 18 to 29 years old, press 1. If 30 to 45, press 2. If 46 to 65, press 3. If you are older than 65, press 4.
Q22: What part of the country do you live in NOW – the Northeast, the Midwest, the South, or the West?
Q23: What is your household’s annual income?

The answers to these questions, which are coded as integers, now give us 1000 points in R^{23}.  Our eyes are not good at looking at point clouds in 23-dimensional space.  So it’s useful to project down to R^2, that most bloggable of Euclidean spaces.  But how?  We could just look at two coordinates and see what we get.  But this requires careful choice.  Suppose I map the voters onto the plane via their answers to Q1 and Q2.  The problem is, almost everyone who has a favorable opinion of Barack Obama approves of his job performance, and vice versa.  Considering these two features is hardly better than considering only one feature.  Better would be to look at Q8 and Q21; these two variables are surely less correlated, and studying both together would give us good information on how support for Obama varies with age.  But still, we’re throwing out a lot.  Principal component analysis is a very popular quick-n-dirty method of dimension reduction; it finds the projection onto R^2 (or a Euclidean space of any desired dimension) which best captures the variance in the original dataset.  In particular, the two axes in the PCA projection have correlation zero with each other.

A projection from R^23 to R^2 can be expressed by two vectors, each one of which is some linear combination of the original 23 variables.  The hope is always that, when you stare at the entries of these vectors, the corresponding axis has some “meaning” that jumps out at you.  And that’s just what happens here.

The horizontal axis is “left vs. right.”  It assigns positive weight to approving of Obama, identifying as a liberal, and approving of the Democratic Party, and negative weight to supporting the Tea Party and believing in a “War on Christmas.”  It would be very weird if any analysis of this kind of polling data didn’t pull out political affiliation as the dominant determinant of poll answers.

The second axis is “low-information voter vs. high-information voter,” I think.  It assigns a negative value to all answers of the form “don’t know / won’t answer,” and positive value to saying you are “very excited to vote” and having heard about Mitt Romney’s $10,000 bet.  (Remember that?)

And now the picture already tells you something interesting.  These two variables are uncorrelated, by definition, but they are not unrelated.  The voters split roughly into two clusters, the Democrats and the Republicans.  But the plot is “heart-shaped” — the farther you go into the low-information voters, the less polarization there is between the two parties, until in the lower third of the graph it is hard to tell there are two parties at all.  This phenomenon is not surprising — but I think it’s pretty cool that it pops right out of a completely automatic process.

(I am less sure about the third-strongest axis, which I didn’t include in the plot.  High scorers here, like low scorers on axis 2, tend to give a lot of “don’t know” answers, except when asked about Harry Reid and Mitch McConnell, whom they dislike.  They are more likely to say they’re “not at all excited to vote” and more likely to be independents.  So I think one might call this the “to hell with all those crooks” axis.)

A few technical notes:  I removed questions, like “region of residence,” that didn’t really map on a linear scale, and others, like “income,” that not everyone answered.  I normalized all the columns to have equal variance.  I made new 0-1-valued columns to record “don’t know” answers.  Yes, I know that many people consider it bad news to run PCA on binary variables, but I decided that since I was just trying to draw pictures and not infer anything, it would be OK.
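For anyone who wants to try something similar, here is a minimal Python sketch of the normalize-then-project step described above. It is not the script used for the plot: it uses numpy's SVD in place of a PCA package, the data is synthetic, and the names (pca_project, lean, q1, q2, q3) are made up for illustration.

```python
import numpy as np

def pca_project(X, k=2):
    """Project the rows of X onto the top-k principal components."""
    X = np.asarray(X, dtype=float)
    # Center each column and scale it to unit variance.
    Z = X - X.mean(axis=0)
    std = Z.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero on constant columns
    Z = Z / std
    # SVD of the standardized data: the rows of Vt are the principal axes,
    # ordered by how much variance they capture.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:k]          # k vectors of weights on the original columns
    scores = Z @ components.T    # coordinates of each respondent in R^k
    return scores, components

# Synthetic stand-in for a poll (an assumption, not the PPP data):
# q1 and q2 both follow a hidden left/right leaning, q3 is unrelated noise.
rng = np.random.default_rng(0)
lean = rng.normal(size=1000)
q1 = (lean + 0.3 * rng.normal(size=1000) > 0).astype(int) + 1
q2 = (lean + 0.3 * rng.normal(size=1000) > 0).astype(int) + 1
q3 = rng.integers(1, 5, size=1000)
X = np.column_stack([q1, q2, q3])

scores, components = pca_project(X, k=2)
print(components)  # the first row should put most of its weight on q1 and q2
```

The rows of components are the vectors one would stare at to guess what each axis "means," and the two columns of scores are what gets plotted. Writing scores out with np.savetxt gives a .csv that any plotting tool can read.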


Show report: Fatty Acids and Sat Nite Duets

Happy surprise: the Fatty Acids were good when I saw them before but great tonight — powerpop played so loud the amps start to get that blown-out sound, and then this very beautiful clean line of trumpet cutting through it all.

Notes on Sat. Nite Duets:

1.  They have arrived at the level of success where one row of people at the front knows the words to all the songs.

2.  Their songs are what you would call “anthemic” but the culminating shouted lyric is often something intentionally without intrinsic affect.  Examples include:


3.   Relationship betw. Sat. Nite Duets and Pavement still not very clear to me.  None of their songs could be mistaken for a Pavement song.  But I think the way the songs are put together — not exactly the way the songs sound, but the way the songs make decisions — is more Pavement than anybody else who’s not Pavement.  Drums also similar.  In fact both Fatty Acids and SND sync drums to guitar in the manner of “Lions (Linden).”

4.  I don’t usually link to the same video twice, but “All Nite Long” is still the best song anyone has recorded in the young decade.  So here it is again.

Sat. Nite Duets’ new album available imminently.


Sat. Nite Duets at the Rathskeller tonight

The best band in Wisconsin plays a free campus show tonight; 9:30, with Fatty Acids opening.  AV Club reviews new album.  Previous Sat. Nite Duets coverage on Quomodocumque.


America has spoken: Wisconsin is better than Minnesota or Illinois

So says an immensely enjoyable PPP poll, which collected approval/disapproval numbers for all 50 states.   Thanks to Steve Burt for pointing this out to me.  Here’s the full data set with crosstabs.  Great stuff here!  Young people are much more anti-Florida than is the nation as a whole.  Nevada is rejected by “very liberals” and “very conservatives” but applauded by the middle.  Everyone, whatever their politics, slightly dislikes New Jersey.

Seems an appropriate time to listen to John Linnell’s “Oregon (Is Bad)” from the ultra-classic State Songs LP.  Though Americans in fact believe that Oregon is good, by a margin of 43-14.



The S&P 500 and the Dow Jones

Adam Davidson in last Sunday’s New York Times Magazine, on the drawbacks of the Dow:

None of these criticisms will come as news to finance professionals, most of whom use far more precise measures — like the S&P 500 or the Wilshire 5000, which cover more companies more precisely — when making investment decisions.

The S&P 500 certainly covers a wider range of companies.  But in a typical 5-day window its correlation with the Dow is more than 95%.  How much more precise could it be?

It’s good to have fine-grained measures, but it’s also good to know at what point extra granularity stops adding new content.
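A check like the five-day correlation mentioned above can be sketched in a few lines of Python. This is only an illustration under assumptions: the post doesn't say whether the figure comes from daily returns or index levels, or whether the windows overlap, so the sketch uses sliding windows over two made-up return series that share a common "market" factor. The function name window_correlations and all the numbers are hypothetical.

```python
import numpy as np

def window_correlations(x, y, window=5):
    """Pearson correlation of x and y over every sliding window of the given length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = []
    for start in range(len(x) - window + 1):
        xs = x[start:start + window]
        ys = y[start:start + window]
        if xs.std() == 0 or ys.std() == 0:
            continue  # correlation is undefined when a series is flat
        out.append(np.corrcoef(xs, ys)[0, 1])
    return np.array(out)

# Synthetic daily returns (not real index data): a shared market move
# plus a small independent wobble for each index.
rng = np.random.default_rng(1)
market = rng.normal(0, 0.01, size=250)
index_a = market + rng.normal(0, 0.002, size=250)
index_b = market + rng.normal(0, 0.002, size=250)

corrs = window_correlations(index_a, index_b)
print(np.median(corrs))  # the "typical" five-day correlation for this fake data
```

With real closing prices in place of the synthetic series, the median of corrs is one reasonable reading of "correlation in a typical 5-day window."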



Why aren’t there people named Cobbler?

OK, per Google and Facebook there are some.  But very, very few.  Why?  Were there that many more wheelers, coopers, sawyers, and carters than there were cobblers?


(Maybe helpful:  the Census Bureau’s list of the 89,000 or so most common US surnames.)


Hwang and To on injectivity radius and gonality, and “Typical curves are not typical.”

Interesting new paper in the American Journal of Mathematics, not on arXiv unfortunately.  An old theorem of Li and Yau shows how to lower-bound the gonality of a Riemann surface in terms of the spectral gap on its Laplacian; this (together with new theorems by many people on superstrong approximation for thin groups) is what Chris Hall, Emmanuel Kowalski, and I used to give lower bounds on gonalities in various families of covers of a fixed base.

The new paper gives a lower bound for the gonality of a compact Riemann surface in terms of the injectivity radius, which is half the length of the shortest closed geodesic loop.  You could think of it like this — they show that the low-gonality loci in M_g stay very close to the boundary.

“The middle” of M_g is a mysterious place.  A “typical” curve of genus g has a big spectral gap, gonality on order g/2, a big injectivity radius…  but most curves you can write down are just the opposite.

Typical curves are not typical.

When g is large, M_g is general type, and so the generic curve doesn’t move in a rational family.  Are all the rational families near the boundary?  Gaby Farkas explained to me on Math Overflow how to construct a rationally parametrized family of genus-g curves whose gonality is generic, as a pencil of curves on a K3 surface.  I wonder how “typical” these curves are?  Do some have large injectivity radius?  Or a large spectral gap?




The hardest Rush Hour position

It takes 93 moves to solve, per this paper by Collette, Raskin, and Servais.  I tried it and got nowhere.

You can think of the space of all possible configurations of vehicles as, well, a configuration space, not unlike the configuration spaces of disks in a box.  But here there is a bit less topology; the space is just a graph, with two configurations made adjacent if one can be reached from the other by making a single move.  The connected component of configuration space containing the “hardest case” shown here has 24,132 vertices.
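The graph described above is easy to explore by breadth-first search. Here is a minimal Python sketch on a toy position, not the hard one from the paper. The vehicle encoding and the names (cells, neighbors, explore) are invented for illustration, and it assumes one move means sliding one vehicle by one cell. The paper may count moves differently, though the connected component is the same either way, since a longer slide is just several one-cell slides.

```python
from collections import deque

SIZE = 6  # the board is SIZE x SIZE

def cells(vehicle, pos):
    """Cells (row, col) occupied by a vehicle whose sliding coordinate is pos."""
    orient, fixed, length = vehicle
    if orient == "H":  # horizontal: fixed row, pos is the leftmost column
        return [(fixed, pos + i) for i in range(length)]
    return [(pos + i, fixed) for i in range(length)]  # vertical: fixed column

def neighbors(vehicles, state):
    """All configurations reachable by sliding one vehicle one cell."""
    occupied = {}
    for i, (v, p) in enumerate(zip(vehicles, state)):
        for c in cells(v, p):
            occupied[c] = i
    for i, (v, p) in enumerate(zip(vehicles, state)):
        for step in (-1, 1):
            q = p + step
            if q < 0 or q + v[2] > SIZE:
                continue  # would slide off the board
            # Every target cell must be empty or already held by this vehicle.
            if all(occupied.get(c, i) == i for c in cells(v, q)):
                yield state[:i] + (q,) + state[i + 1:]

def explore(vehicles, start):
    """Breadth-first search; maps each reachable configuration to its distance from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in neighbors(vehicles, s):
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return dist

# Toy position: the target car sits in row 2, blocked by a truck in column 4.
vehicles = [("H", 2, 2), ("V", 4, 3)]
start = (0, 0)
dist = explore(vehicles, start)
component_size = len(dist)
# Solved when the target car (vehicle 0) reaches the right edge of its row.
solution_length = min(d for s, d in dist.items() if s[0] == SIZE - 2)
print(component_size, solution_length)
```

The keys of dist are exactly the vertices of the connected component containing the starting position, so len(dist) is the toy analogue of the 24,132 figure above.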






I wonder what this graph looks like?   What does the path of the cars look like as you traverse the 93-step path; do most of the cars traverse most of their range?  How many of the possible configurations of the 13 vehicles (constrained to stay in the given rows and columns, and in the same linear order when two share a row or column) are actually contained in this component?  Maybe Matt Kahle knows.  By the way, another Matt Kahle-like fact is that among the list of the hardest configurations are some which are not so dense at all, like this one with only 9 cars.  It looks like it should be easy, but apparently it takes 83 moves to solve!



Elsevier’s sorry defense of the Research Works Act

Lynne Herndon, former president of Elsevier imprint Cell Press, writes to the Boston Globe to protest Gareth Cook’s editorial, “Why scientists are boycotting a publisher.”  Herndon writes:

If the intent is to make the fruits of government-funded research available to taxpayers – a fair and laudable goal – government agencies could simply publish the annual progress reports from scientists that they already require. But instead they see value in the publishing process, and claim our contributions as their own without paying for them.

Herndon is presumably counting on the fact that most readers of the Globe have never submitted a federally required annual progress report.  The progress report is not the research; it is a terse summary of the research.

What taxpayers want and deserve access to is the actual research they paid for — research which is produced and written by federally funded scientists, not by Elsevier.

How about this:  the NSF and NIH can start requiring that we include copies of all our papers, as submitted, in our progress reports, and then these can become open-access.  Then people can decide for themselves whether they want to pay Elsevier to look at my papers (and enjoy whatever value Elsevier has added) or whether they’d rather freely download the identical LaTeX version in my progress report.

Well, actually, in my case, they can’t, because I have signed the petition pledging not to submit to Elsevier journals.  And if their behavior irritates you as it does me, you might consider doing the same.
