
Here’s an equivalent formulation: If G is a graph and V(G) its vertex set, I try to find a function f: V(G) -> R^d, for some d, such that

|f(x) – f(y)| = 1 whenever x and y are adjacent.

This is called a *unit distance embedding*, for obvious reasons.

The *hypersphere* number t(G) of the graph is the squared radius of the smallest sphere containing a unit distance embedding of G. Computing t(G) is equivalent to computing the Lovász number, but let’s not worry about that now. I want to generalize it a bit. We say a finite sequence (t_1, t_2, t_3, …, t_d) is *big enough* for G if there’s a unit-distance embedding of G contained in an ellipsoid with semi-axes of lengths t_1^{1/2}, t_2^{1/2}, …, t_d^{1/2}. (We could also just consider infinite sequences with all but finitely many terms zero; that would be a little cleaner.)

Physically I think of it like this: the graph is trying to fold itself into Euclidean space and fit into a small region, with the constraint that the edges are rigid and have to stay length 1.

Sometimes it can fold a lot! Like if it’s bipartite. Then the graph can totally fold itself down to a line segment of length 1, with all the black vertices going to one end and the white vertices going to the other. And the big enough sequences are just those with some entry at least 1/4, since holding a segment of length 1 takes a semi-axis of length at least 1/2.

On the other hand, if G is a complete graph on k vertices, a unit-distance embedding has to be a regular simplex, which spans k-1 dimensions and whose circumscribed sphere has squared radius (1-1/k)/2. So certainly anything with k-1 of the t_i of size at least (1-1/k)/2 is big enough. (Is that an if and only if? To know this I’d have to know whether an ellipse containing an equilateral triangle can have a radius shorter than that of the circumcircle.)

Let’s face it, it’s confusing to think about ellipsoids circumscribing embedded graphs, so instead let’s define t(p,G) to be the minimum value of the L^p norm of (t_1, t_2, …) over ellipsoids enclosing a unit-distance embedding of G.

Then a graph has a unit-distance embedding in the plane iff t(0,G) <= 2. And t(oo,G) is just the hypersphere number again, right? If G has a k-clique then t(p,G) >= t(p,K_k) for any p, while if G has a k-coloring (i.e. a map to K_k) then t(p,G) <= t(p,K_k) for any p. In particular, a regular simplex on k vertices with unit edges fits into a sphere of squared radius (1-1/k)/2, so t(oo,G) <= (1-1/k)/2 whenever G is k-colorable.

So… what’s the relation between these invariants? Is there a graph with t(0,G) = 2 and t(oo,G) > 2/5? If so, there would be a non-5-colorable unit distance graph in the plane. But I guess the relationship between these various “norms” feels interesting to me irrespective of any relation to plane-coloring. What is the supremum of t(oo,G) over graphs with t(0,G) = 2?
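Here is a minimal Python sketch of the two facts used above. It assumes a standard model of the regular simplex on k vertices with unit edges, the points e_i/√2 in R^k (my choice of coordinates, not something from the post), and the function names are made up for illustration. Composing a proper k-coloring with the simplex gives a unit-distance embedding, and every simplex vertex sits at squared distance (k-1)/(2k) from the centroid, which bounds t(oo,G) for k-colorable G when t is measured as a squared radius.

```python
import math


def simplex_vertex(color, k):
    """Vertex `color` of a regular simplex on k vertices with unit edges.

    Uses e_color / sqrt(2) in R^k: any two distinct such points are at distance 1.
    """
    return [1 / math.sqrt(2) if i == color else 0.0 for i in range(k)]


def embed_by_coloring(coloring, k):
    """Compose a proper k-coloring (dict vertex -> color in range(k)) with the simplex."""
    return {v: simplex_vertex(c, k) for v, c in coloring.items()}


def is_unit_distance_embedding(f, edges, tol=1e-12):
    """True if every edge (u, v) is sent to a pair of points at distance 1."""
    return all(abs(math.dist(f[u], f[v]) - 1.0) < tol for u, v in edges)


def simplex_sq_radius(k):
    """Largest squared distance from a simplex vertex to the simplex's centroid."""
    center = [1 / (k * math.sqrt(2))] * k
    return max(math.dist(simplex_vertex(c, k), center) ** 2 for c in range(k))


# Example: the 5-cycle is not bipartite but has a proper 3-coloring.
c5_edges = [(i, (i + 1) % 5) for i in range(5)]
c5_coloring = {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}
f = embed_by_coloring(c5_coloring, 3)
print(is_unit_distance_embedding(f, c5_edges))  # True
print(simplex_sq_radius(3))  # approximately 1/3 = (3 - 1) / (2 * 3)
```

With k = 2 this is exactly the bipartite fold: two points at distance 1, squared radius 1/4.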

The intermediate t(p,G) all give functions which upper-bound clique number and lower-bound chromatic number; are any of them interesting? Are any of them easily calculable, like the Lovász number?

**Remarks:**

- I called this post “What is the Lovász number of the plane?” but the question “how big can t(oo,G) be if t(0,G) = 2?” is more a question about finite subgraphs of the plane and their Lovász numbers. Another way to ask “What is the Lovász number of the plane” would be to adopt the point of view that the Lovász number of a graph has to do with extremizers on the set of positive semidefinite matrices whose (i,j) entry is nonzero only when i and j are adjacent vertices or i=j. So there must be some question one could ask about the space of positive semidefinite symmetric kernels K(x,y) on R^2 x R^2 which are supported on the locus ||x-y||=1 and the diagonal, which question would rightly be called “What is the Lovász number of the plane?” But I’m not sure what it is.
- Having written this, I wonder whether it might be better, rather than thinking about enclosing ellipsoids of a set of points in R^d, just to think of the n points as an nxd matrix X and compute the singular values of X^T X, which would be kind of an “approximating ellipsoid” to the points. Maybe later I’ll think about what that would measure. Or you can!
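For points in the plane (d = 2) the “approximating ellipsoid” is easy to compute by hand. A minimal sketch, assuming we first center the points at their centroid (the post doesn't say whether to); the eigenvalues of X^T X are the squares of the singular values of X, and one could also divide by n to get a covariance matrix. The function name is hypothetical.

```python
import math


def approximating_ellipse(points):
    """Eigenvalues (largest first) of X^T X for the centered n x 2 matrix X of planar points.

    Centering at the centroid is an assumption of this sketch, not part of the post.
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - cx) ** 2 for p in points)
    syy = sum((p[1] - cy) ** 2 for p in points)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points)
    # Closed-form eigenvalues of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    return mean + spread, mean - spread


triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]  # unit-distance K_3
segment = [(0.0, 0.0), (1.0, 0.0)]  # bipartite graph folded onto a segment
print(approximating_ellipse(triangle))  # both eigenvalues close to 0.5
print(approximating_ellipse(segment))   # (0.5, 0.0)
```

The equilateral triangle gives a round “ellipse,” while the folded bipartite graph collapses one direction entirely.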


The idea: given a set S of points in the plane, its unit distance graph G_S is the graph whose vertices are S and where two points are adjacent if they’re at distance 1 in the plane. If you can find S such that G_S has chromatic number k, then the chromatic number of the plane is at least k. And de Grey finds a set of 1,567 points whose unit distance graph can’t be 4-colored.
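Here's a small Python sketch of that recipe on the Moser spindle, a classical seven-point unit distance graph with chromatic number 4, swapped in as an illustration (de Grey's graph is far too big for this kind of brute force). The coordinates come from two rhombi of side 1, each made of two equilateral triangles, rotated about a shared vertex until their far tips are at distance 1. The tolerance `tol` and the helper names are assumptions of the sketch.

```python
import itertools
import math


def unit_distance_graph(points, tol=1e-9):
    """Edges of G_S: index pairs of points at Euclidean distance 1 (within tol)."""
    return [(i, j) for i, j in itertools.combinations(range(len(points)), 2)
            if abs(math.dist(points[i], points[j]) - 1.0) < tol]


def is_colorable(n, edges, k):
    """Brute force over all k**n colorings; only sensible for tiny graphs."""
    return any(all(c[i] != c[j] for i, j in edges)
               for c in itertools.product(range(k), repeat=n))


def chromatic_number(n, edges):
    k = 1
    while not is_colorable(n, edges, k):
        k += 1
    return k


def moser_spindle():
    """Seven points: two unit rhombi sharing the origin, tips rotated to distance 1."""
    def rhombus(alpha):
        # Two equilateral triangles glued along an edge: the points at distance 1
        # from the origin at angles alpha +/- 30 degrees, plus the far tip at
        # distance sqrt(3) along angle alpha.
        a, b = alpha + math.pi / 6, alpha - math.pi / 6
        return [(math.cos(a), math.sin(a)),
                (math.cos(b), math.sin(b)),
                (math.sqrt(3) * math.cos(alpha), math.sqrt(3) * math.sin(alpha))]
    # Two tips at radius sqrt(3) separated by angle beta are 2*sqrt(3)*sin(beta/2) apart.
    beta = 2 * math.asin(1 / (2 * math.sqrt(3)))
    return [(0.0, 0.0)] + rhombus(0.0) + rhombus(beta)


points = moser_spindle()
edges = unit_distance_graph(points)
print(len(edges), chromatic_number(len(points), edges))  # 11 4
```

In any 3-coloring, each rhombus forces its tip to share the origin's color, and the two tips are adjacent; that's why 4 colors are needed.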

It’s known that the chromatic number of the plane is at most 7. Idle question: is there any chance of a “polynomial method”-style proof that there is no subset S of the plane whose unit distance graph has chromatic number 7? Such a graph would have a lot of unit distances, and ruling out lots of repetitions of the same distance is something the polynomial method can in principle do.

Though be warned: as far as I know, the polynomial method has so far produced no improvement on the older bounds for the unit distance problem (“how many unit distances can there be among pairs drawn from S?”), while it has essentially solved the distinct distance problem (“how few distinct distances can there be among pairs drawn from S?”).
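To make the two problems concrete, here is a sketch that counts both quantities for a small integer grid (the grid is just an example I picked). Using exact integer squared distances sidesteps floating-point comparisons. For the 3 x 3 grid it finds 12 unit-distance pairs and 5 distinct distances.

```python
from itertools import combinations


def distance_counts(points):
    """Return (number of unit-distance pairs, number of distinct distances)
    for integer points, using exact squared distances."""
    squared = [(p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
               for p, q in combinations(points, 2)]
    return squared.count(1), len(set(squared))


grid = [(i, j) for i in range(3) for j in range(3)]
print(distance_counts(grid))  # (12, 5)
```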


Let me tell you what you’re looking at. You are looking at elliptic curves E admitting a Belyi map f: E -> P^1, which is to say a map ramified only over 0, 1, and infinity. For each such map, the blue graph is f^{-1}([0,1]), the preimage of the line segment joining 0 and 1 in P^1(R).

In four of these cases, the graph is piecewise linear! I didn’t know there were examples like this. Don’t know if this is easy, but: for which Belyi maps (of any genus, not just genus 1) is f^{-1}([0,1]) a union of geodesics?
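As a sanity check on the definition of the blue graph, here's a genus-0 example rather than one of the elliptic curves in the pictures: f(z) = z^n on P^1 is a Belyi map (ramified only over 0 and infinity), and f^{-1}([0,1]) is the star of n straight segments from 0 to the n-th roots of unity, a union of geodesics for the Euclidean metric on C. The sketch below just samples that preimage numerically; the sampling scheme and function name are my own choices.

```python
import cmath
import math


def dessin_points(n, samples=100):
    """Sample f^{-1}([0,1]) for f(z) = z**n.

    For t in (0, 1] the n preimages are t**(1/n) * exp(2*pi*i*k/n), k = 0, ..., n-1;
    the single preimage of t = 0 is z = 0 (recorded with k = None).
    """
    points = [(0.0, None, 0j)]
    for s in range(1, samples + 1):
        t = s / samples
        for k in range(n):
            points.append((t, k, t ** (1.0 / n) * cmath.exp(2j * math.pi * k / n)))
    return points


pts = dessin_points(5)
```

Each sampled point lies on the segment from 0 to a fifth root of unity, so the blue graph here is piecewise linear.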

Who knows if the Wayback Machine is forever? Just in case, I’m including the text of the piece here.

The Phoenix gave this piece its title, which I think is too fighty. My title was “Academy Plight Song.” (Get it?)

I think this holds up pretty well! (Except if I were writing this today I wouldn’t attach so much physical description to every woman with a speaking part.)

Melani McAlister, the new hire at GWU who appears in the opening scene, is still there as a tenured professor in 2018. And all these years later, she’s still interested in helping fledgling academics navigate the world of scholarly work; her page “Thinking Twice about Grad School” is thorough, honest, humane, and just great.

Here’s the piece!

**The great PhD scam**

by Jordan Ellenberg

“We dangle our three magic letters before the eyes of these predestined victims, and they swarm to us like moths to an electric light. They come at a time of life when failure can no longer be repaired easily and when the wounds it leaves are permanent . . . ”

— William James

“The Ph.D. Octopus,” 1903

By nine o’clock, more than 200 would-be professors have piled into the Cotillion Ballroom South at the Sheraton Washington hotel, filling every seat and spilling over into the standing space behind the chairs. They’re young and old, dressed up and down, black and white and other (though mostly white). They’re here to watch Melani McAlister, a 1996 PhD in American Civilization from Brown, explain to a committee of five tenured professors why she ought to be hired at Indiana University.

Everybody looks nervous except McAlister. That’s because, unlike almost everyone else here, she doesn’t need a job; she’s an assistant professor at George Washington University. This interview is a mock-up, a performance put on to inform and reassure the crowd of job-seekers. As McAlister cleanly fields questions about her thesis and her pedagogical strategy, the people in the audience frown and nod, as if mentally rehearsing their own answers to the similar questions they’ll be asked in days to come.

This is night one of the 112th annual meeting of the Modern Language Association, the national organization of professors of English, comparative literature, and living foreign languages. Ten thousand scholars are here in Washington, DC, to attend panels, renew acquaintances, and, most important, to fill open faculty positions. A tenure-track job typically attracts hundreds of applicants; of these, perhaps a dozen will be offered interviews at the MLA; and from that set a handful will be called back for on-campus interviews. For the people who are here “on the market,” that is, trying to become professors of English and so forth, the MLA is the gate to heaven. And, as everyone in the room is aware, the gate is swinging shut.

McAlister is a slight, pretty woman with a trim hairdo and a trace of North Carolina in her speech. All business, she explains how Steve Martin’s song “King Tut” “viewed Tut’s ‘blackness’ as a commodity, a cultural style to be mobilized for the reconstruction of white masculinity.” McAlister specializes in cultural studies, a lately dominant strain of thought in English departments which aims to question and decode culture by “reading” both literary and extraliterary “texts,” Steve Martin included. Cultural studies is usually lumped with deconstructionism, Marxist and feminist criticism, semiotics, and other allied fields under the rubric of theory; the term has no fixed definition, but it’s safe to think of it as anything in the curriculum that makes George Will tug at his collar and cough.

Theory is also difficult for the layman to read, assigning technical and often unintuitive meanings to common words such as commodity, construction, figure, and spectacle. Like any specialized vocabulary, the language of cultural studies functions in part to assert the speaker’s authority, to keep the in group in and the out group out. Tonight, McAlister is definitively telling the audience they’re in. She deftly mixes the language of theory with enough explanation to make the masculinity of Tut intelligible to the large body of job-seekers with more classical interests — all the while conveying the impression that the explanation is, of course, unnecessary, that she’s reminding the audience of things they already know.

“It’s pornographic,” Todd Gilman, a post-doctoral fellow at Harvard, says (approvingly) of the mock interview; another candidate’s interview, he says, is in real life “one of those moments that no one has access to.” And, as in pornography, the distinction between watching this performance and fantasizing about one’s own performance is intentionally blurred. McAlister’s physical smallness; her straightforward, rural-inflected speech; even the peppily truncated spelling of her first name make her wholly unthreatening and unalienating to the audience — a figure with whom everyone here can feel free to identify. You’re meant to walk out of this thinking: “I have a great résumé. I am a cultural-studies jock. I’m smart and pretty and the focus of everyone’s desire.” And from the rapt look of the audience, it seems to be working.

The real interviews start tomorrow morning.

Except for a brief boom in the late 1980s, the academic job market in the humanities has been shrinking for 20 years, and the flow of new PhDs has been on the rise for almost a decade. The National Research Council estimates that 923 people received PhDs in English in the 1993-’94 academic year (an increase of 20 percent from the trough in 1986-’87) and of these, only 42 percent are known to have tenure-track academic jobs now. The numbers for foreign languages are approximately the same.

Where are the rest? About a fifth are in non-tenure-track full-time positions; in other words, still on the market, this year or in years to come. Six percent have left the academy altogether. Ten percent are unemployed. Another eight percent are untrackable. And 10 percent hold part-time appointments: these are the adjunct faculty, working semester to semester, without benefits, often teaching courses at two or three colleges at once.

According to the American Association of University Professors, adjuncts make up close to half of college faculty members in the United States. With little time for research or publication, adjuncts are cut off from the main avenues for professional advancement. “I’m working like a dog just to have enough money to live on,” says John Maguire, 49, who taught six English comp courses last fall, split between Berklee and Babson College. This semester he has five courses; next summer — a comparative vacation — just two. It adds up to about $39,000 a year, with no benefits. (According to Maguire, Berklee has agreed to a contract providing benefits to part-time workers, but is delaying implementation by “dither[ing] about the language.”) Is he here on the market? “Yeah,” he says. “Just like I’m on the market to win the Megabucks.”

With such poor prospects ahead, why do new hopefuls keep entering the pipeline? It’s certainly not the comforts of graduate school. Suppose, for instance, you start the English PhD program at Boston University. First of all, you can’t just start; if you don’t already have a master’s degree, you’ll have to complete BU’s one-year MA, for which you will almost certainly have to pay tuition: at the moment, $20,702. After that, you’ve got about a 50-50 chance of being allowed to enter the doctoral program. (One BU student told me that the passage from the first program to the second was presented to him as “a formality” — the university, he says, “uses that large MA class to fund itself.”) For your first four years of doctoral study, you’ll teach one course a semester and receive a $9500 stipend.

After that, the money disappears. Theoretically, you’ve finished your dissertation and are ready to graduate. One student estimated that one in five people are done after four years; another said she’d known one such person, ever. Suppose you’re not done. You can still get a job as a lecturer and try to finish your degree while making $2600 a course — no benefits. You’ll be fighting for the lecturerships not only with your colleagues but with PhDs from other schools who didn’t get full-time work, whom you’ll probably begin to resent and hate, both because they’re competing with you and because they remind you of a possible future you’re trying to ignore. Students at BU feel their situation is worse than the norm, but it’s not much better anywhere; at Yale and the University of California, it’s been bad enough to drive the graduate students to strike.

Everyone at the MLA conference agrees there’s a job crisis in the humanities, but they don’t agree on much more. Even the most basic terms of arguing about the subject remain, in a popular formulation here, “sites of contestation.” Incompatible stories compete, Rashomon-like, to explain the same dead body.

One story, apparently the most popular: because of a demographic contraction and declining state funding of higher education, the academic labor market has shrunk, and shrunk for good. Faculty members have nonetheless sanctioned the increasing production of graduate students, even opening new PhD programs. Why? Robert Holub, a professor of German at UC Berkeley, writes in the MLA journal Profession: “. . . professors profit directly from graduate students, who teach classes professors do not want to teach, populate seminars so that professors can indulge in their specialties, and do research professors do not want to do (or pay for).” From this point of view, the solution to the job problem is primarily a difficult exercise of will; faculties should allow marginal graduate programs to be closed, and cut admissions to the ones that remain, so that people will no longer be trained for jobs that don’t exist.

Another story of the work shortage: there is no work shortage. This leftist critique holds that to apply the supply-and-demand logic of capitalism to the academy is to give up on what makes the academy important. “The job market is a contestable idea,” says a graduate student from CUNY. “It’s not natural; it can be opposed; and it can be reversed.” Grover Furr, a professor at Montclair State and member of the MLA Radical Caucus, puts it this way: “These are politically created circumstances — and so the political dimensions of the response are more important than the economic dimension.” That is, the first story contains the implicit, and wrong, assumption that the job problem is the result of God-given economic forces, when, in fact, there’s lots of work for everyone to do — it’s just not available in the form of jobs. Here the administrators are the guilty parties; it’s their decision not to replace retiring faculty, or to replace them with cheaper adjunct labor. In this analysis, cutting graduate enrollment is the worst possible policy. Instead, the faculty, adjuncts, and graduate students ought to make common cause, endorsing unionization of student employees and faculty, and resisting the efforts of state legislatures and university administrators to reshape the university along corporate lines and fiscal rationales.

Then there’s the right-wing story, in which the humanities are in trouble because the current mainstream of scholarship — that is, scholarship informed by theory — is a fraud, perpetrated by leftist professors embittered by their marginalization in the public sphere. The reason there are so few jobs for English professors is that English professors are selling a product students don’t want, and which legislatures are (rightly) reluctant to subsidize. People like Dinesh D’Souza, Roger Kimball, and William Bennett sold a lot of books on this theme during the so-called culture wars of the early 1990s, and it still makes a reliable column for conservative op-edders like the Washington Post’s Jonathan Yardley, who, on the third day of the conference, wrote sternly that Shakespeare and Toni Morrison were not “equivalent.” To the conservatives, the special language of theory is just obscurantism — an attempt to assert an authority that hasn’t been earned. And any knowledge that’s to be gained from reading Steve Martin (or radically re-reading Shakespeare) is knowledge not worth having. No one at the MLA was espousing this point of view, at least not audibly, but the popularity of the conservative story has had its effect on morale. “What we do is such a caricaturable quantity,” one graduate student told me. “We all live in fear that people think we’re doing intellectual masturbation and that our difficult work isn’t appreciated. . . . You’re suddenly beset with the feeling that no one understands or cares about what you’re working on.”

The MLA has its own story to tell about the job problem. By their count, the number of jobs listed in English was up 13 percent this year, and the worst of the shortage may be over. “We know that there are going to be larger numbers in college — a boomlet — by the turn of the century,” says Phyllis Franklin, the MLA’s executive director. “We also know our society values advanced education. This is a fact.”

No one who wasn’t speaking for the MLA seemed to share this view.

Much of each day’s activity centers on the 747 panel talks. The speakers are usually job-seekers or resume-building junior faculty, with the occasional big name like Harvard’s Marjorie Garber pitching in. The panels I visit aren’t thrilling; the talks seem repetitive and the audience inattentive. It reminds me of being in a college class that meets right before lunch. A graduate student explains it to me this way: compressing an hour lecture into a 20-minute lecture is very hard. Stretching a five-minute insight into a 20-minute lecture is very easy.

Jonathan Yardley notwithstanding, identity politics don’t dominate; there’s lots of Shakespeare happening, not so much Toni Morrison. Science is very big: panels are convening on “Poetry and Computers: Digital Poetics” and “Virtual Worlds with Real People” and “Millennium or Apocalypse? The Golem and Other Jewish Science Fiction.” You get the feeling that the scholars here are a little envious of the scientists, with their uncontested authority to speak on their own subjects. (There’s no Yardley chiding the chemists.) That authority has been a sore point ever since 1996, when Alan Sokal, a physicist at NYU, successfully placed a hoax article in the literary-theory journal Social Text. Bad PR by Social Text’s editors, with the aid of some sensational reporting, turned the low-grade scandal into a major embarrassment for the humanities. It’s no wonder so many of the scholars are shy of me. Three different people ask me if I’m writing “the usual hatchet job.”

On the third day of the conference, the Forum on the Job Market and the Future of the Profession draws about half as many people as the mock interview to a room twice as big. Cary Nelson, a professor at the University of Illinois at Urbana and author of the forthcoming Manifesto of a Tenured Radical, delivers a crowd-pleasing stump speech attacking the MLA’s response to the job crisis, quoting less-than-fervent statements on the matter by several MLA officials. Nelson has a big voice, and a thickety, Old Testament beard, which he makes the most of; his remonstrations sound straight from a peevish God. With their timidity and unwillingness to offend tenured faculty, Nelson declares, “[MLA leaders] have aided and abetted the crisis . . . the disaster they normalized. They have moved on to the wistful contemplation of their graduate students — alas, poor Yorick, he hath not a job.” Nelson’s own prescription is a “Twelve-Step Program for Academia,” including such strong medicine as unionization of graduate students, shutting down “marginal” PhD programs, and a push toward retirement for faculty of a certain age.

He’s particularly hard on Elaine Showalter, a professor at Princeton and second vice-president of the MLA, who has argued more than once against what she calls “’60s tactics” in academic activism. “Showalter believes,” Nelson says cuttingly, “that a graduate education in the humanities is . . . broadening.” The implication is clear: for Showalter and the MLA, a PhD is just another part of rich kids’ social initiation into the world of rich adults, like a Grand Tour, or tennis lessons.

As soon as he’s done, Showalter, a small, sturdy woman with a schoolteacher’s hairdo and a silver sun brooch, shoots up from the crowd, her hand extending upward and a pen extending from her hand.

“The solutions are not going to be an issue of exploitative, seductive, cruel princes of the MLA against a kind of plebiscite,” she says. But for much of the crowd, that seems to be exactly the issue involved. The question period turns out to be more of a cri de coeur period, as audience members stand and testify to what everyone already knows: that the dire future has already arrived, and that no one seems to be doing much about it. There’s a broad feeling that the MLA represents the tenured professors, not the graduate students who make up a third of its membership, certainly not the part-timers. A woman who’s been an adjunct in New Jersey for 20 years stands up: “We are accountable for what we eat, what we wear, what we say, and what we teach.” I’ve never before seen someone so mad as to actually tremble. Suddenly a mystery novel I’d read just before coming, Murder at the MLA, makes a lot more sense. “People who are taking money and pretending to teach [graduate students] that there will be positions are accountable.” (“It’s not fair,” Phyllis Franklin says later, looking beleaguered. “In point of fact . . . we have been making every kind of effort to help those people. There’s one thing the MLA can’t do, and that’s make jobs.”)

As for Elaine Showalter, she really does think a graduate education is broadening, and she’s not apologetic about it. A PhD in English, she tells me, allows you “to write your first book under someone’s supervision”; it makes you a better journalist, a better high-school teacher, a better White House communications director, and so on. So there aren’t too many people being trained; there’s just too limited a perception of what they’re being trained for. “Cutting down,” she says, “is the least interesting, least imaginative, least effective solution.”

But graduate students tend to resent the suggestion that they should be satisfied with work outside the academy. “We didn’t go through this for seven years to work for Motorola,” one speaker says. A graduate student from Columbia: “Name one job this prepares you for other than scholar and teacher. It hurts you, because you’re overeducated, and they wonder why you failed at your last career. There are lots of other jobs, but I could have done them seven years ago with a BA.” Another graduate student told me that when he found out last year he was going to be a father, he sent out dozens of job applications in the industry he’d left to go back to school. One company sent a sympathetic letter but no offer. The others told him his PhD made him overqualified. Now he’s at the MLA, on the market for an academic job. “The fact is,” he tells me, “this is the only plank I can afford to walk.”

So why do people still enter doctoral programs? Nelson, in Manifesto, makes the crucial point that “the job market’s blunt message . . . will not in any way mean [to undergraduates] what it means to graduate students whose careers are cut off in midstream . . . . To undergraduates the message will instead be partly symbolic, vaguely invoking disaster or impossibility, and partly incomprehensible.” Here’s the reason so many graduate students feel as if they hadn’t been warned just how bad the situation was. As Nelson puts it, graduate students “acquire an identity they did not have at the outset.” The undergraduate who hears the bad news reasons that if she can’t get an academic job she’ll do something else, and signs up. The person who, six years later, becomes part of the bad news may no longer be able to do anything else.

I asked a lot of graduate students what they would tell someone who was deciding whether to enter a doctoral program. Every one said the same thing: “Don’t do it unless you love it.”

The problem is that it doesn’t seem possible to love it, the way they mean, without doing it.

The 10,000 scholars here in DC are united in a difficult, invaluable, sometimes exciting communal project. Just by reading books (and movies, and menus, and Steve Martin songs) they’re finding out things about how people live — and about what people are, and how people decide what they are — that laymen like me don’t know, and can’t fully understand. But the scholars seem to labor under an unremitting fear that everything they do is worthless. You can see that fear in the protective secret language of cultural studies. You can see it in Murder at the MLA, in which the kind of knowledge English professors have turns out to be the kind that solves crimes. You can see it in the humanities’ conflicted relationship with the sciences. You can see it in the academy’s failure to mount any forceful response to the conservative critics of the early ’90s and to the administration “streamliners” of today.

John Guillory, an English professor at Johns Hopkins, spoke Saturday morning on “Rationales for Literary Study.” Guillory, whose book Cultural Capital has made him an authority on the way academic knowledge is produced and consumed, told a polite crowd that they had to quit worrying about being useful. The problems of the academy, he said, wouldn’t be solved until English professors (and then the public) found a way to value literary knowledge for itself, the way we value, say, knowledge about mathematics.

He didn’t seem to make a big impression, but it’s Guillory I’m thinking about on Sunday afternoon, with the conference winding to a close, as I sit in a little panel room waiting for a hopeful Renaissancist grad student to start his piece on Donne. A woman behind me says, “Don’t spread the word about the Thackeray revival. Right now I am the Thackeray revival.” I’m trying to imagine a world where English was more like mathematics. A reporter’s job at the MLA wouldn’t be to pass judgment on the field but to relay, with suitable deference, the past year’s breakthroughs. People would say things to English professors like “I don’t know how you do it — I never got the hang of reading.” Literary study would have its own E.O. Wilsons and Stephen Hawkings, beloved public shamans with stacks of honorary degrees. And with the rise in public standing would come kibitzers and hangers-on; maybe Camille Paglia’s a first glimmering of that.

It would be different. The last of Cary Nelson’s 12 steps is “Popularize the Achievements of the Academy.” He’s right about that — it’s the only way to keep cost-cutting legislators at bay. But I’m not sure the scholars want to be popular. Like sensitive, dance-shy teens, they know you lose something, too. As the discourse becomes public, it coarsens and shrinks; it has to fit inside an op-ed piece or a New Yorker cartoon. Is that a price the scholars will be willing to pay? I can’t tell, but I think they’ll have to decide soon.

Out in the hall, the exhibitors are packing up. As the room settles into attention, I get up to go — I have somewhere to be, across town.

“I think we can begin,” the moderator says.


But I have a pdf copy, so here it is, for my own reference, and yours if for some reason you need it!

I should have anticipated this and downloaded all my Phoenix stuff. The first pieces I ever reported were there, a short one about a Michael Moore rally and a long one about the MLA. They’re gone. But wait! I was able to recover the MLA piece from the Wayback Machine. Thanks, Wayback Machine! I’ll post that later.


for x; but since the target vector may not lie in the image of the linear transformation A, you settle for minimizing the size of the residual, the difference between Ax and the target, in whatever norm you like (L^2 for standard linear regression).

In many modern optimization problems, on the other hand, the problem you’re trying to solve may have a lot more degrees of freedom. Maybe you’re setting up an RNN with lots and lots and lots of parameters. Or maybe, to bring this down to earth, you’re trying to pass a curve through lots of points but the curve is allowed to have very high degree. This has the advantage that you can definitely find a curve that passes through all the points. But it also has the *disadvantage* that you can definitely find a curve that passes through all the points. You are likely to overfit! Your wildly wiggly curve, engineered to exactly fit the data you trained on, is unlikely to generalize well to future data.
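A quick way to see the overfitting problem, sketched in Python: interpolate 11 equally spaced samples of the Runge function 1/(1 + 25x^2) (a standard textbook example, not from the talk) with the unique degree-10 polynomial through them. The fit is exact on the training points and badly wrong between them near the ends of the interval. Function names are illustrative.

```python
def lagrange_interpolant(xs, ys):
    """Return the unique polynomial of degree < len(xs) through (xs[i], ys[i])."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


def runge(x):
    return 1.0 / (1.0 + 25.0 * x * x)


xs = [-1.0 + 0.2 * i for i in range(11)]  # training inputs
ys = [runge(x) for x in xs]
p = lagrange_interpolant(xs, ys)

train_error = max(abs(p(x) - y) for x, y in zip(xs, ys))
test_grid = [-1.0 + 0.001 * i for i in range(2001)]
test_error = max(abs(p(x) - runge(x)) for x in test_grid)
print(train_error, test_error)  # zero on the data, large off it
```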

Everybody knows about this problem, everybody knows to worry about it. But here’s the thing. A lot of modern problems are of this form, and yet the optima we find on training data often *do* generalize pretty well to test data! Why?

Make this more formal. Let’s say for the sake of argument you’re trying to learn a real-valued function F, which you hypothesize is drawn from some giant space X. (Not necessarily a vector space, just any old space.) You have N training pairs (x_i, y_i), and a good choice for F might be one such that F(x_i) = y_i. So you might try to find F such that

F(x_i) = y_i

for all i. But if X is big enough, there will be a whole space of functions F which do the trick! The solution set to

F(x_i) = y_i, i = 1, …, N

will be some big subspace F_{x,y} of X. How do you know which of these F’s to pick?

One popular way is to *regularize*; you decide that some elements of X are just better than others, and choose the point of F_{x,y} that optimizes that objective. For instance, if you’re curve-fitting, you might try to find, among those curves passing through your N points, the least wiggly one (e.g. the one with the least total curvature.) Or you might optimize for some combination of hitting the points and non-wiggliness, arriving at a compromise curve that wiggles only mildly and still passes near most of the points. (The ultimate version of this strategy would be to retreat all the way back to linear regression.)
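Here's a toy version of both strategies for a linear model, a minimal sketch assuming the “wiggliness” of a solution x is measured by its Euclidean norm (my stand-in for total curvature). Among all x with Ax = y, the least-norm one is A^T (A A^T)^{-1} y; adding a penalty λ||x||^2 to the squared error gives the compromise A^T (A A^T + λI)^{-1} y, which no longer hits the data exactly. The 2 x 3 matrix A and target y are made up; with two rows, A A^T is 2 x 2 and can be inverted by formula.

```python
def ridge_solution(A, y, lam):
    """Return A^T (A A^T + lam*I)^{-1} y for a matrix A with exactly two rows.

    lam = 0 gives the minimum-norm exact solution of A x = y (A of full row rank);
    lam > 0 gives the minimizer of ||A x - y||^2 + lam * ||x||^2.
    """
    # Entries of the 2x2 matrix G = A A^T + lam * I.
    g11 = sum(a * a for a in A[0]) + lam
    g22 = sum(b * b for b in A[1]) + lam
    g12 = sum(a * b for a, b in zip(A[0], A[1]))
    det = g11 * g22 - g12 * g12
    # c = G^{-1} y by the 2x2 inverse formula.
    c1 = (g22 * y[0] - g12 * y[1]) / det
    c2 = (-g12 * y[0] + g11 * y[1]) / det
    # x = A^T c, a combination of the rows of A.
    return [c1 * a + c2 * b for a, b in zip(A[0], A[1])]


A = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y = [1.0, 2.0]
x_min = ridge_solution(A, y, 0.0)    # exact fit with the least norm
x_ridge = ridge_solution(A, y, 1.0)  # compromise: smaller norm, inexact fit
```

Here x_min = (0, 1, 1); another exact solution such as (1, 2, 0) has a larger norm, and x_ridge = (1/8, 5/8, 3/4) misses the targets slightly.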

But it’s not obvious what regularization objective to choose, and maybe trying to optimize that objective is yet another hard computational problem, and so on and so on. What’s really surprising is that something much simpler often works pretty well. Namely: how would you find F such that F(x) = y in the first place? You would choose some random F in X, then do some version of *gradient descent*. Find the direction in the tangent space to X at F along which the training loss decreases most steeply, perturb F a bit in that direction, lather, rinse, repeat.

If this process converges, it ought to get you somewhere on the solution space F_{x,y}. But where? And this is really what Gunasekar’s work is about. Even if your starting F is distributed broadly, the distribution of the spot where gradient descent “lands” on F_{x,y} can be much more sharply focused. In some cases, it’s concentrated on a single point! The “likely targets of gradient descent” seem to generalize better to test data, and in some cases Gunasekar et al. can prove gradient descent likes to find the points on F_{x,y} which optimize some regularizer.
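The cleanest case where you can watch this happen is linear least squares, far simpler than the settings in Gunasekar et al.'s papers. Gradient descent on L(x) = ½||Ax − y||^2 only ever moves within the row space of A, so it lands on the point of the solution set closest to where it started; starting from 0, that's the minimum-norm solution. A sketch with a made-up 2 x 3 example; the step size and step count are assumptions.

```python
def gradient_descent(A, y, x0, lr=0.1, steps=2000):
    """Plain gradient descent on L(x) = 0.5 * ||A x - y||^2, whose gradient is A^T (A x - y)."""
    x = list(x0)
    for _ in range(steps):
        residual = [sum(a * xj for a, xj in zip(row, x)) - yi for row, yi in zip(A, y)]
        grad = [sum(A[i][j] * residual[i] for i in range(len(A))) for j in range(len(x))]
        x = [xj - lr * gj for xj, gj in zip(x, grad)]
    return x


A = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y = [1.0, 2.0]
from_zero = gradient_descent(A, y, [0.0, 0.0, 0.0])   # lands near (0, 1, 1)
from_null = gradient_descent(A, y, [1.0, 1.0, -1.0])  # lands near (1, 2, 0)
```

The vector (1, 1, -1) spans the null space of A, so that part of the initialization is never touched; only the row-space part gets pulled to the data.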

I was really struck by this outlook. I have tended to think of function learning as a problem of optimization; how can you effectively minimize the training loss ||F(x) – y||? But Gunasekar asks us instead to think about the much richer mathematical structure of the *dynamical system* of gradient descent on X guided by the loss function. (Or I should say dynamical systems; gradient descent comes in many flavors.)

The dynamical system has a lot more stuff in it! Think about iterating a function; knowing the fixed points is one thing, but knowing which fixed points are stable and which aren’t, and knowing which stable points have big basins of attraction, tells you way more.
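To see the difference between knowing the fixed points and knowing their stability, here's gradient descent on a one-dimensional double-well loss L(x) = (x^2 - 1)^2 / 4, a toy of my own choosing. The fixed points of the step map are -1, 0, and 1; a fixed point is stable when the step map's derivative there has absolute value less than 1. The step size 0.1 is an assumption.

```python
LR = 0.1  # step size (an assumption of this sketch)


def step(x):
    """One gradient step on L(x) = (x**2 - 1)**2 / 4, whose derivative is x**3 - x."""
    return x - LR * (x ** 3 - x)


def step_derivative(x):
    """Derivative of the step map; |value| < 1 at a fixed point means it's stable."""
    return 1 - LR * (3 * x ** 2 - 1)


def land(x, steps=500):
    """Iterate the step map and report where gradient descent ends up."""
    for _ in range(steps):
        x = step(x)
    return x


fixed_points = [-1.0, 0.0, 1.0]
stable = {p: abs(step_derivative(p)) < 1 for p in fixed_points}
print(stable)  # {-1.0: True, 0.0: False, 1.0: True}
```

With this step size, starts in (0, 2] land at 1 and starts in [-2, 0) land at -1, while 0 stays put; but a nudge of 10^-6 off 0 is enough to escape it.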

What’s more, the dynamical system formulation is much more natural for learning problems as they are so often encountered in life, with streaming rather than static training data. If you are constantly observing more pairs (x_i,y_i), you don’t want to have to start over every second and optimize a new loss function! But if you take the primary object of study to be, not the loss function, but the dynamical system on the hypothesis space X, new data is no problem; your gradient is just a longer and longer sum with each timestep (or you exponentially deweight the older data, *whatever you want my friend*, the world is yours.)
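A sketch of the streaming version for the simplest possible model, y ≈ w·x with a single parameter w (an illustrative choice, as are the decay 0.9 and step size 0.05). With exponential deweighting, the gradient of the current loss depends only on two running sums, so each new pair costs a constant amount of work and nothing restarts.

```python
def streaming_fit(stream, lr=0.05, decay=0.9):
    """Online gradient descent for the one-parameter model y ~ w * x.

    After each new pair, the loss is sum_i decay**(t - i) * (w * x_i - y_i)**2 / 2,
    whose gradient w * sxx - sxy is kept up to date with two running sums.
    """
    w, sxx, sxy = 0.0, 0.0, 0.0
    for x, y in stream:
        # Exponentially deweight the old data, then add the new pair.
        sxx = decay * sxx + x * x
        sxy = decay * sxy + x * y
        # One gradient step on the current (decayed) loss.
        w -= lr * (w * sxx - sxy)
    return w


# A made-up stream generated by y = 3x, arriving one pair at a time.
stream = [(1 + (t % 5) / 5, 3 * (1 + (t % 5) / 5)) for t in range(300)]
w = streaming_fit(stream)  # close to 3
```

Setting decay = 1 keeps all the data at full weight, but then the running sums grow without bound and a fixed step size eventually overshoots, so the step would have to shrink over time.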

Anyway. Loved this talk. Maybe this dynamical framework is the way other people are already accustomed to think of it but it was news to me.

Slides for a talk of Gunasekar’s similar to the one she gave here

“Characterizing Implicit Bias in terms of Optimization Geometry” (2018)

“Convergence of Gradient Descent on Separable Data” (2018)

A little googling for gradient descent and dynamical systems shows me that, unsurprisingly, Ben Recht is on this train.
