## Which pictures do children draw with straight lines?

Edray Goins gave a great colloquium today about his work on dessins d’enfants.  And in this talk there was a picture that surprised me.  It was one of the ones on the lower right of this poster.  Here, I’ll put in a screen shot:

Let me tell you what you’re looking at.  You are looking at elliptic curves E admitting a Belyi map f: E -> P^1, which is to say a map ramified only over 0, 1, and infinity.  For each such map, the blue graph is f^{-1}([0,1]), the preimage of the line segment joining 0 and 1 in P^1(R).

In four of these cases, the graph is piecewise linear!  I didn’t know there were examples like this.  Don’t know if this is easy, but:  for which Belyi maps (of any genus, not just genus 1) is f^{-1}([0,1]) a union of geodesics?


## “The Great Ph.D. Scam” (or: Academy Plight Song)

Thanks to the Wayback Machine, here’s my piece from the Boston Phoenix on the MLA, the first feature piece I ever wrote for publication, twenty-one years ago last month.

Who knows if the Wayback Machine is forever?  Just in case, I’m including the text of the piece here.

The Phoenix gave this piece its title, which I think is too fighty.  My title was “Academy Plight Song.”  (Get it?)

I think this holds up pretty well!  (Except if I were writing this today I wouldn’t attach so much physical description to every woman with a speaking part.)

Melani McAlister, the new hire at GWU who appears in the opening scene, is still there as a tenured professor in 2018.  And all these years later, she’s still interested in helping fledgling academics navigate the world of scholarly work; her page “Thinking Twice about Grad School” is thorough, honest, humane, and just great.

Here’s the piece!

The great PhD scam
by Jordan Ellenberg

“We dangle our three magic letters before the eyes of these predestined victims, and they swarm to us like moths to an electric light. They come at a time of life when failure can no longer be repaired easily and when the wounds it leaves are permanent . . . ”
— William James
“The Ph.D. Octopus,” 1903

By nine o’clock, more than 200 would-be professors have piled into the Cotillion Ballroom South at the Sheraton Washington hotel, filling every seat and spilling over into the standing space behind the chairs. They’re young and old, dressed up and down, black and white and other (though mostly white). They’re here to watch Melani McAlister, a 1996 PhD in American Civilization from Brown, explain to a committee of five tenured professors why she ought to be hired at Indiana University.

Everybody looks nervous except McAlister. That’s because, unlike almost everyone else here, she doesn’t need a job; she’s an assistant professor at George Washington University. This interview is a mock-up, a performance put on to inform and reassure the crowd of job-seekers. As McAlister cleanly fields questions about her thesis and her pedagogical strategy, the people in the audience frown and nod, as if mentally rehearsing their own answers to the similar questions they’ll be asked in days to come.

This is night one of the 112th annual meeting of the Modern Language Association, the national organization of professors of English, comparative literature, and living foreign languages. Ten thousand scholars are here in Washington, DC, to attend panels, renew acquaintances, and, most important, to fill open faculty positions. A tenure-track job typically attracts hundreds of applicants; of these, perhaps a dozen will be offered interviews at the MLA; and from that set a handful will be called back for on-campus interviews. For the people who are here “on the market,” that is, trying to become professors of English and so forth, the MLA is the gate to heaven. And, as everyone in the room is aware, the gate is swinging shut.

## Fagone on Cash WinFall

Great, thoroughly reported piece by Jason Fagone (author of The Woman Who Smashed Codes, which everybody says is great) about the Cash WinFall Massachusetts lottery story, which I wrote about at length in How Not To Be Wrong.  My chapter focused on a group of MIT students who became high-volume bettors; Fagone spends more time with Michigan retiree Jerry Selbee, and gets lots of information on the story I wasn’t able to uncover.  That’s why it’s good to have an actual journalist cover these stories!

## A Supposedly Fun Thing (a book review)

I wrote a review of David Foster Wallace’s book A Supposedly Fun Thing I’ll Never Do Again in 1997 for the late great Boston Phoenix, whose archives don’t seem to be online anymore.  (SOB)

But I have a pdf copy, so here it is, for my own reference, and yours if for some reason you need it!

I should have anticipated this and downloaded all my Phoenix stuff. The first pieces I ever reported were there, a short one about a Michael Moore rally and a long one about the MLA. They’re gone. But wait! I was able to recover the MLA piece from the Wayback Machine.  Thanks, Wayback Machine!  I’ll post that later.

## Suriya Gunasekar, optimization geometry, loss minimization as dynamical system

Awesome SILO seminar this week by Suriya Gunasekar of TTI Chicago.  Here’s the idea, as I understand it.  In a classical optimization problem, like linear regression, you are trying to solve a problem which typically has no solution (draw a line that passes through every point in this cloud!) and the challenge is to find the best approximate solution.  Algebraically speaking:  you might be asked to solve

$Ax = b$

for x; but since b may not be in the image of the linear transformation A, you settle for minimizing

$||Ax-b||$

in whatever norm you like (L^2 for standard linear regression.)
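For the linear regression case, here is a minimal sketch in Python with NumPy (my choice of tool; the three data points are made up purely for illustration). We try to fit a line y = mx + c through three points no line passes through, so b is not in the image of A, and we settle for the x minimizing the L^2 norm ||Ax - b||.

```python
import numpy as np

# Three made-up points that no single line passes through: (0, 0), (1, 1), (2, 1).
# Model y = m*x + c, so each row of A is (x_i, 1) and the unknown is x = (m, c).
A = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
b = np.array([0.0, 1.0, 1.0])

# b is not in the image of A, so Ax = b has no solution;
# lstsq returns the x minimizing the L^2 norm ||Ax - b||.
x, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
m, c = x
leftover = np.linalg.norm(A @ x - b)

print(f"slope {m:.3f}, intercept {c:.3f}")
print("leftover error:", leftover)
```

This prints slope 0.500, intercept 0.167; the leftover error is strictly positive, which is just another way of saying the original equation Ax = b had no solution.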

In many modern optimization problems, on the other hand, the problem you’re trying to solve may have a lot more degrees of freedom.  Maybe you’re setting up an RNN with lots and lots and lots of parameters.  Or maybe, to bring this down to earth, you’re trying to pass a curve through lots of points but the curve is allowed to have very high degree.  This has the advantage that you can definitely find a curve that passes through all the points.  But it also has the disadvantage that you can definitely find a curve that passes through all the points.  You are likely to overfit!  Your wildly wiggly curve, engineered to exactly fit the data you trained on, is unlikely to generalize well to future data.

Everybody knows about this problem, everybody knows to worry about it.  But here’s the thing.  A lot of modern problems are of this form, and yet the optima we find on training data often do generalize pretty well to test data!  Why?

Make this more formal.  Let’s say for the sake of argument you’re trying to learn a real-valued function F, which you hypothesize is drawn from some giant space X.  (Not necessarily a vector space, just any old space.)  You have N training pairs (x_i, y_i), and a good choice for F might be one such that F(x_i) = y_i.  So you might try to find F such that

$F(x_i) = y_i$

for all i.  But if X is big enough, there will be a whole space of functions F which do the trick!  The solution set to

$F(\mathbf{x}) = \mathbf{y}$

will be some big subspace F_{x,y} of X.  How do you know which of these F’s to pick?

One popular way is to regularize; you decide that some elements of X are just better than others, and choose the point of F_{x,y} that optimizes that objective.  For instance, if you’re curve-fitting, you might try to find, among those curves passing through your N points, the least wiggly one (e.g. the one with the least total curvature.)  Or you might optimize for some combination of hitting the points and non-wiggliness, arriving at a compromise curve that wiggles only mildly and still passes near most of the points.  (The ultimate version of this strategy would be to retreat all the way back to linear regression.)

But it’s not obvious what regularization objective to choose, and maybe trying to optimize that objective is yet another hard computational problem, and so on and so on.  What’s really surprising is that something much simpler often works pretty well.  Namely:  how would you find F such that F(x) = y in the first place?  You would choose some random F in X, then do some version of gradient descent.  Find the direction in the tangent space to X at F that decreases $||F(\mathbf{x})-\mathbf{y}||$ most steeply, perturb F a bit in that direction, lather, rinse, repeat.

If this process converges, it ought to get you somewhere on the solution space F_{x,y}. But where?  And this is really what Gunasekar’s work is about.  Even if your starting F is distributed broadly, the distribution of the spot where gradient descent “lands” on F_{x,y} can be much more sharply focused.  In some cases, it’s concentrated on a single point!  The “likely targets of gradient descent” seem to generalize better to test data, and in some cases Gunasekar et al. can prove gradient descent likes to find the points on F_{x,y} which optimize some regularizer.
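Here is a minimal sketch of the simplest case I know where you can watch this happen. It is my own choice of illustration, not code from the talk: a linear model F(x) = w·x with more parameters than training pairs, squared loss, and plain gradient descent. In this setting it's a standard fact that gradient descent started at zero lands on the interpolating w of smallest L^2 norm, which is exactly the point a minimum-norm regularizer would have picked. The sizes, learning rate, step count, and random data are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up underdetermined regression: N = 5 training pairs, d = 20 parameters,
# so the solution set {w : X w = y} is a 15-dimensional affine subspace.
N, d = 5, 20
X = rng.standard_normal((N, d))
y = rng.standard_normal(N)

def gradient_descent(w0, lr=0.01, steps=20000):
    """Plain gradient descent on the loss 0.5 * ||X w - y||^2."""
    w = w0.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y)  # gradient of the squared loss; always in the row space of X
        w -= lr * grad
    return w

# Start at zero: every step stays in the row space of X, so the limit is the
# unique interpolant in that row space, i.e. the minimum-norm solution.
w_gd = gradient_descent(np.zeros(d))
w_min = np.linalg.pinv(X) @ y  # minimum-norm interpolant, computed directly

print("training error:", np.linalg.norm(X @ w_gd - y))
print("distance to min-norm solution:", np.linalg.norm(w_gd - w_min))

# Start somewhere random: gradient descent still lands on the solution set,
# but only the component of w0 in the null space of X survives the trip.
w0 = rng.standard_normal(d)
w_gd2 = gradient_descent(w0)
P_null = np.eye(d) - np.linalg.pinv(X) @ X  # projector onto null(X)
predicted = w_min + P_null @ w0
print("distance to predicted landing spot:", np.linalg.norm(w_gd2 - predicted))
```

So even here the landing spot is far from arbitrary: the 5 row-space directions of the starting point are forgotten entirely, and from a starting point at (or near) zero, gradient descent picks out one specific point of the 15-dimensional solution set.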

I was really struck by this outlook.  I have tended to think of function learning as a problem of optimization; how can you effectively minimize the training loss $||F(\mathbf{x})-\mathbf{y}||$?  But Gunasekar asks us instead to think about the much richer mathematical structure of the dynamical system of gradient descent on X guided by the loss function.  (Or I should say dynamical systems; gradient descent comes in many flavors.)

The dynamical system has a lot more stuff in it!  Think about iterating a function; knowing the fixed points is one thing, but knowing which fixed points are stable and which aren’t, and knowing which stable points have big basins of attraction, tells you way more.

What’s more, the dynamical system formulation is much more natural for learning problems as they are so often encountered in life, with streaming rather than static training data.  If you are constantly observing more pairs (x_i,y_i), you don’t want to have to start over every second and optimize a new loss function!  But if you take the primary object of study to be, not the loss function, but the dynamical system on the hypothesis space X, new data is no problem; your gradient is just a longer and longer sum with each timestep (or you exponentially deweight the older data, whatever you want my friend, the world is yours.)

Anyway.  Loved this talk.  Maybe this dynamical framework is the way other people are already accustomed to think of it but it was news to me.

Slides for a talk of Gunasekar’s similar to the one she gave here

“Characterizing Implicit Bias in terms of Optimization Geometry” (2018)

“Convergence of Gradient Descent on Separable Data” (2018)

A little googling for gradient descent and dynamical systems shows me that, unsurprisingly, Ben Recht is on this train.

## Scott Walker and the noncommutativity of Wisconsin statute, part II

Hey so remember last month, when the Walker administration didn’t want to fill two empty legislative seats, so they decided to treat the state law forbidding this as if it said something else?

Here, I’ll recap.  The law, statute 8.50 (4) (d), says:

Any vacancy in the office of state senator or representative to the assembly occurring before the 2nd Tuesday in May in the year in which a regular election is held to fill that seat shall be filled as promptly as possible by special election.

The state has decided to pretend the law says, instead:

Any vacancy in the office of state senator or representative to the assembly occurring in the year in which a regular election is held to fill that seat, before the 2nd Tuesday in May shall be filled as promptly as possible by special election.

In other words, the state’s claim is that a special election is required only if the vacancy occurs between January 1 and the 2nd Tuesday of May in an election year.  Whereas what the actual law says is that an election is to be called if there’s a vacancy any time before that 2nd Tuesday in May, i.e. as long as there’s enough time to call an election and have the new officeholder participate meaningfully in legislating.

Six voters in the affected districts have sued the governor.  There’s a hearing in the Dane County Circuit Court this week, on March 22.

The state has issued its response to the petition.

I’ve read the response.  It upset me.  It really upset me!  Not because I even care that much about whether we hold these elections!  But because the people whose job it is to uphold our state’s laws don’t care what those laws are.

The state’s leading argument is “mootness,” which goes like this: “we’ve now delayed this long enough that voters would no longer get any meaningful benefit from the state fulfilling the law’s requirements, so the claim that we have to fulfill the law’s requirements doesn’t stand.”

That might work!

Then it gets really interesting.  Here’s a passage from the response:

Under Wis. Stat. §8.50(4)(d), the Governor has a positive and plain duty to call a special election only when a vacancy occurs in the year of a general election from January 1 until the 2nd Tuesday in May.  Because the vacancies here did not occur in that year, Governor Walker has no positive and plain duties to call special elections.

See what they did?  They switched it!  They switched the order of the clauses in the statute to make it say what it does not, in fact, say!  Not satisfied with that, they added the language about January 1, which isn’t present in the law!

Won’t the judge ask them about this?  Won’t the judge want to know what possessed the state to “paraphrase” a law by moving words around and adding language, instead of quoting the language of the statute itself?

The response then goes on to explain why their interpretation of the law “makes sense.”  What they in fact do is explain why it makes sense that a special election isn’t required for vacancies taking place after May of the election year (the point on which their claim agrees with the law).  They are silent on why it makes sense that a special election isn’t required before January 1 of the election year.  Because that doesn’t make sense.

Maybe the screwiest part of all of this is that the statute in question uses language that appears again and again in Wisconsin code.  Look, here’s how 59.10(3)(e) authorizes special elections for vacancies on county boards:

The board may, if a vacancy occurs before June 1 in the year preceding expiration of the term of office, order a special election to fill the vacancy.

According to the state’s account, this means that special elections are authorized only if the vacancy occurs in the year preceding the election year.

If that’s the case, nobody told Sauk County, where a special election was ordered in August 2016 to fill a vacant seat on the county board.  It’s hard to doubt there are many such examples — all unauthorized by state law, according to the Walker administration’s current claim.

How could Brad Schimel have put his name to this?

(Update:  here’s the plaintiffs’ response to the state’s response.)


## Bike/ski weekend

Last week, for the first time in my life, I bought a new bike.  For the last twelve years I’ve been riding a Trek hybrid I bought used when I moved here.  Before that, from about 1992 through 2005, I was on my mom’s 1967 Schwinn Breeze, which looked exactly like this one.

Anyway:  I got a new bike.  I got CJ one too.  Then AB was upset but she doesn’t get a new bike because she is growing very very fast and probably won’t be able to sit on the next bike she gets for more than a couple of years.  So we went to Dreambikes and got her a new used bike, knobby tires, shocks on the front fork, very cool.  The three of us took a spin around Wingra yesterday, about 7 miles, which is AB’s record for a non-stop ride.

Today was the last day of the season at Cascade Mountain, and my kids for the first time in many months had no activities scheduled, and the high for today was 55 degrees, and who doesn’t like to ski in shirtsleeves?  So off we went.  We were worried it would be packed.  But it was empty.  I guess everyone else in Wisconsin was using the first warm day of pre-spring to do outdoor activities not involving ice and snow.  But those who were there were festive.  There were a lot of guys in flannel shirts open with bare chest underneath; is that a look?  Several people in tutus.  A guy who played the guitar while skiing down the mountain.  A skiing Pikachu.

By mid-afternoon it was like skiing on a snowcone.  Huge puddles in the lift line.  But we had a great time.  If I were particular about the quality of my skiing I wouldn’t be skiing in Wisconsin, would I?


## Scott Walker and the Let’s Eat Grandma theory of legislative interpretation

How do you know when to call a special election for an empty legislative seat in Wisconsin?  It’s right there in the statutes, 8.50 (4) (d):

Any vacancy in the office of state senator or representative to the assembly occurring before the 2nd Tuesday in May in the year in which a regular election is held to fill that seat shall be filled as promptly as possible by special election. However, any vacancy in the office of state senator or representative to the assembly occurring after the close of the last regular floorperiod of the legislature held during his or her term shall be filled only if a special session or extraordinary floorperiod of the legislature is called or a veto review period is scheduled during the remainder of the term. The special election to fill the vacancy shall be ordered, if possible, so the new member may participate in the special session or floorperiod.

Pretty clear, right?  If a Senate or Assembly seat comes open before May of election year,  the governor has to call a special election, unless the last legislative session has already taken place and no extra legislative business is scheduled before November.  You hold an election unless the duration of the vacancy would be so short as to make the election essentially meaningless.

There are two seats in the Capitol open as we speak, the Senate seat formerly held by Frank Lasee and the Assembly seat once occupied by Keith Ripp; both of them left to take jobs in the Walker administration in January.  But the governor has asserted that no special election will be held, and residents of those districts will go unrepresented in the legislature for almost a full year.

What’s Walker’s excuse for ignoring the law?  Are you sitting down?  The state’s claim is that the phrase “in the year” does not refer to “May,” but rather “any vacancy.”  So a vacancy arising in March 2018 is required by state law to be filled “as promptly as possible,” despite the severely limited amount of lawmaking the new representative would have a chance to undertake; but if an assembly rep drops dead on the second day of the legislative term, the governor can leave the seat empty for two whole years if he wants.

I kid you not! That is the claim!

Do you think that’s really what the law says?

As this long, well-researched WisContext article makes clear, Walker’s “interpretation” of the law is, well, a novelty.  For fifty years, Wisconsin has been filling legislative vacancies promptly by special elections.  Most of these elections, according to Scott Walker, were optional, some kind of gubernatorial whim.  And it’s definitely not the case that the governor is leaving the seats empty because he’s spooked by the current lust-to-vote of Wisconsin’s Democratic electorate, which has already cost Republicans a long-held seat in Senate District 10.

The Walker administration would like us to read the law as if the phrases came in the opposite order:

Any vacancy in the office of state senator or representative to the assembly occurring in the year in which a regular election is held to fill that seat, before the 2nd Tuesday in May

But English is non-commutative; that sentence says one thing, and 8.50 (4)(d) says a different thing.

Even an extra comma would make Walker’s interpretation reasonable:

Any vacancy in the office of state senator or representative to the assembly occurring before the 2nd Tuesday in May, in the year in which a regular election is held to fill that seat

Commas change meaning.  As the old T-shirt says:  let’s eat grandma!

I suppose we should count ourselves lucky.  Given the syntactic latitude Walker has granted himself, where a prepositional phrase can wander freely throughout a sentence modifying whatever catches its fancy, he might have claimed a special election is required only if a legislative vacancy occurs in May of an election year!  That would make just as much sense as the interpretation Walker’s claiming now.  Which is to say:  none.

What’s the remedy here?  I’m not sure there is one.  Someone in one of the affected districts could sue the state, but I don’t think there’s any prospect a lawsuit would conclude in time to make any difference.  I can’t see a court ordering an emergency halt to a legislative session on the grounds that two seats were illegally unfilled.

So there’s not much to stop the governor from breaking state law in this way.  Except natural human embarrassment.  A government that has lost the capacity to be embarrassed can be very difficult to constrain.

Update, Feb 26:  Looks like I was wrong to say nobody was going to do anything about this!  A group of voters in the affected districts, represented by the National Democratic Redistricting Committee, sued Governor Walker today.  Good for them.

Update:  I’ve learned from lawyer friends that the principle that a phrase like “in the year” is understood to modify the thing it’s close to, not some other clause floating elsewhere across the sentence, has a name:  it is “the rule of the last antecedent.”


## David English Revisited

I never realized that David English of Somerville MA, besides being a prolific writer of letters to the editor, was a weirdo artist of the 1950s!