## The existence of designs

The big news in combinatorics is this new preprint by Peter Keevash, which proves the existence of Steiner systems, or more generally combinatorial designs, for essentially every system of parameters where the existence of such a design isn’t ruled out on divisibility grounds.  Remarkable!

I’m not going to say anything about this paper except to point out that it has even more in it than is contained in the top-billed theorem; the paper rests on the probabilistic method, which in this case means, more or less, that Keevash shows that you can choose a “partial combinatorial design” in an essentially random way, and with very high probability it will still be “close enough” that by very careful modifications (or, as Keevash says, “various applications of the nibble” — I love the names combinatorists give their techniques) you can get all the way to the desired combinatorial design.

This kind of argument is very robust!  For instance, Keevash gets the following result, which in a way I find just as handsome as the result on designs.  Take a random graph on n vertices — that is, each edge is present with probability 1/2, all edges independent.  Does that graph have a decomposition into disjoint triangles?  Well, probably not, right?  Because a union of triangles has to have even degree at each vertex, while the random graph is going to have n/2 of its vertices with odd degree. (This is the kind of divisibility obstruction I mentioned in the first paragraph.)  In fact, this divisibility argument shows that if the graph can be decomposed as a union of triangles with M extra edges, M has to be at least n/4 with high probability, since that’s how many edges you would need just to dispose of the odd-degree vertices.  And what Keevash’s theorem shows is there really is (with high probability) a union of disjoint triangles that leaves only (1+o(1))(n/4) edges of the random graph uncovered!

More details elsewhere from Vuhavan and Gil Kalai.

## How much is the stacks project graph like a random graph?

Cathy posted some cool data yesterday coming from the new visualization features of the magnificent Stacks Project.  Summary:  you can make a directed graph whose vertices are the 10,445 tagged assertions in the Stacks Project, and whose edges are logical dependency.  So this graph (hopefully!) doesn’t have any directed cycles.  (Actually, Cathy tells me that the Stacks Project autovomits out any contribution that would create a logical cycle!  I wish LaTeX could do that.)

Given any assertion v, you can construct the subgraph G_v of vertices which are the terminus of a directed path starting at v.  And Cathy finds that if you plot the number of vertices and number of edges of each of these graphs, you get something that looks really, really close to a line.

Why is this so?  Does it suggest some underlying structure?  I tend to say no, or at least not much — my guess is that in some sense it is “expected” for graphs like this to have this sort of property.

Because I am trying to get strong at sage I coded some of this up this morning. One way to make a random directed graph with no cycles is as follows:  start with N edges, and a function f on natural numbers k that decays with k, and then connect vertex N to vertex N-k (if there is such a vertex) with probability f(k).  The decaying function f is supposed to mimic the fact that an assertion is presumably more likely to refer to something just before it than something “far away” (though of course the stack project is not a strictly linear thing like a book.)

Here’s how Cathy’s plot looks for a graph generated by N= 1000 and f(k) = (2/3)^k, which makes the mean out-degree 2 as suggested in Cathy’s post.

Pretty linear — though if you look closely you can see that there are really (at least) a couple of close-to-linear “strands” superimposed! At first I thought this was because I forgot to clear the plot before running the program, but no, this is the kind of thing that happens.

Is this because the distribution decays so fast, so that there are very few long-range edges? Here’s how the plot looks with f(k) = 1/k^2, a nice fat tail yielding many more long edges:

My guess: a random graph aficionado could prove that the plot stays very close to a line with high probability under a broad range of random graph models. But I don’t really know!

Update:  Although you know what must be happening here?  It’s not hard to check that in the models I’ve presented here, there’s a huge amount of overlap between the descendant graphs; in fact, a vertex is very likely to be connected all but c of the vertices below it for a suitable constant c.

I would guess the Stacks Project graph doesn’t have this property (though it would be interesting to hear from Cathy to what extent this is the case) and that in her scatterplot we are not measuring the same graph again and again.

It might be fun to consider a model where vertices are pairs of natural numbers and (m,n) is connected to (m-k,n-l) with probability f(k,l) for some suitable decay.  Under those circumstances, you’d have substantially less overlap between the descendant trees; do you still get the approximately linear relationship between edges and nodes?

## In which I have a quarter-million friends of friends on Facebook

One of the privacy options Facebook allows is “restrict to friends of friends.”  I was discussing with Tom Scocca the question of how many people this actually amounts to.  FB doesn’t seem to offer an easy way to get a definitive accounting, so I decided to use the new Facebook Graph Search to make a quick and dirty estimate.  If you ask it to show you all the friends of your friends, it just tells you that there are more than 1000, but doesn’t supply an exact number.  If you want a count, you have to ask it something more specific, like “How many friends of my friends are named Constance?”

In my case, the answer is 25.

So what does that mean?  Well, according to the amazing NameVoyager, between 100 and 300 babies per million are named Constance, at least in the birthdate range that contains most of Facebook’s user base and, I expect, most of my friends-of-friends (herafter, FoFs) as well.  So under the assumption that my FoFs are as likely as the average American to be named Constance, there should be between 85,000 and 250,000 FoFs.

That assumption is massively unlikely, of course; name choices have strong correlations with geography, ethnicity, and socioeconomic thingamabobs.  But you can just do this redundantly to get a sense of what’s going on.  59 of my FoFs are named Marianne, a name whose frequency ranges from 150-300 parts per million; that suggests a FoF range of about 200-400K.

I did this for a few names (50 Geralds, 18 Charitys (Charities??)) and the overlaps of the ranges seemed to hump at around 250,000, so that’s my vague estimate for the number.

Bu then I remembered that there was actually a paper about this on the arXiv, “The Anatomy of the Facebook Graph,” by Ugander, Karrer, Backstrom, and Marlow, which studies exactly this question.  They found something which is, to me, rather surprising; that the number of FoFs grows approximately linearly in the number of friends.  The appropriate coefficients have surely changed since 2011, but they get a good fit with

#FoF = 355(#friends) – 15057.

For me, with 680 friends, that’s 226,343.  Good fit!

This 2012 study from Pew (on which Marlow is also an author) studies a sample in which the respondents had a mean 245 Facebook friends, and finds that the mean number of FoFs was 156,569.  Interestingly, the linear model from the earlier paper gives only 72,000, though to my eye it looks like 245 is well within the range where the fit to the line is very good.

The math question this suggests:  in the various random-graph models that people like to use to study social networks, what is the mean size of the 2-neighborhood of x (i.e. the number of FoFs) conditional on x having degree k?  Is it ever linear in k, or approximately linear over some large range of k?

## Humanities Hackathon

Had a great time today talking graph theory with a roomful of students and faculty in the humanities at the Humanities Hackathon.  Here’s a (big .ppt file) link to my slides.  One popular visualization was this graph of baby boys’ names from 2011, where two names are adjacent if their popularity profile across 12 representative states is very similar.  (For example, names similar to “Malachi” on this measure include “Ashton” and “Kaden,” while names similar to “Patrick” include “Thomas,” “John,” “Sean,” and “Ryan.”)

The visualization is by the open-source graph-viz tool gephi.

I came home only to encounter this breathless post from the Science blog about a claim that you can use network invariants (e.g. clustering coefficient, degree distribution, correlation of degree between adjacent nodes) to distinguish factually grounded narratives like the Iliad from entirely fictional ones like Harry Potter.  The paper itself is not so convincing.  For instance, its argument on “assortativity,” the property that high-degree nodes tend to be adjacent to one another, goes something like this:

Real-life social networks tend to be assortative, in the sense that the number of friends I have is positively correlated with the number of friends my friends have.

The social network they write down for the Iliad isn’t assortative, so they remove all the interactions classified as “hostile,” and then it is.

The social network for Beowulf isn’t assortative, so they remove all the interactions classified as “hostile,” and then it still isn’t, so they take out Beowulf himself, and then it is, but just barely.

Conclusion: The social networks of Beowulf and the Iliad are assortative, just like real social networks.

Digital humanities can be better than this!

Tagged , , ,

## The hardest Rush Hour position

It takes 93 moves to solve, per this paper by Collette, Raskin, and Servais.  I tried it and got nowhere.

You can think of the space of all possible configurations of vehicles as, well, a configuration space, not unlike the configuration spaces of disks in a box.  But here there is a bit less topology; the space is just a graph, with two configurations made adjacent if one can be reached from the other by making a single move.  The connected component of configuration space containing the “hardest case” shown here has 24,132 vertices.

I wonder what this graph looks like?   What does the path of the cars look like as you traverse the 93-step path; do most of the cars traverse most of their range?  How many of the possible configurations of the 13 vehicles (constrained to stay in the given rows and columns, and in the same linear order when two share a row or column) are actually contained in this component?  Maybe Matt Kahle knows.  By the way, another Matt Kahle-like fact is that among the list of the hardest configurations are some which are not so dense at all, like this one with only 9 cars.  It looks like it should be easy, but apparently it takes 83 moves to solve!

## Expander graphs, gonality, and variation of Galois representations

Suppose you have a 1-dimensional family of polarized abelian varieties — or, just to make things concrete, an abelian variety A over Q(t) with no isotrivial factor.

You might have some intuition that abelian varieties over Q don’t usually have rational p-torsion points — to make this precise you might ask that A_t[p](Q) be empty for “most” t.

In fact, we prove (among other results of a similar flavor) the following strong version of this statement.  Let d be an integer, K a number field, and A/K(t) an abelian variety.  Then there is a constant p(A,d) such that, for each prime p > p(A,d), there are only finitely many t such that A_t[p] has a point over a degree-d extension of K.

The idea is to study the geometry of the curve U_p parametrizing pairs (t,S) where S is a p-torsion point of A_t.  This curve is a finite cover of the projective line; if you can show it has genus bigger than 1, then you know U_p has only finitely many K-rational points, by Faltings’ theorem.

But we want more — we want to know that U_p has only finitely many points over degree-d extensions of K.  This can fail even for high-genus curves:  for instance, the curve

C:   y^2 = x^100000 + x + 1

has really massive genus, but choosing any rational value of x yields a point on C defined over a quadratic extension of Q.  The problem is that C is hyperelliptic — it has a degree-2 map to the projective line.  More generally, if U_p has a degree-d map to P^1,  then U_p has lots of points over degree-d extensions of K.  In fact, Faltings’ theorem can be leveraged to show that a kind of converse is true.

So the relevant task is to show that U_p admits no map to P^1 of degree less than d; in other words, its gonality is at least d.

Now how do you show a curve has large gonality?  Unlike genus, gonality isn’t a topological invariant; somehow you really have to use the geometry of the curve.  The technique that works here is one we learned from an paper of Abramovich; via a theorem of Li and Yau, you can show that the gonality of U_p is big if you can show that the Laplacian operator on the Riemann surface U_p(C) has a spectral gap.  (Abramovich uses this technique to prove the g=1 version of our theorem:  the gonality of classical modular curves increases with the level.)

We get a grip on this Laplacian by approximating it with something discrete.  Namely:  if U is the open subvariety of P^1 over which A has good reduction, then U_p(C) is an unramified cover of U(C), and can be identified with a finite-index subgroup H_p of the fundamental group G = pi_1(U(C)), which is just a free group on finitely many generators g_1, … g_n.  From this data you can cook up a Cayley-Schreier graph, whose vertices are cosets of H_p in G, and whose edges connect g H with g_i g H.  Thanks to work of Burger, we know that this graph is a good “combinatorial model” of U_p(C); in particular, the Laplacian of U_p(C) has a spectral gap if and only if the adjacency matrix of this Cayley-Schreier graph does.

At this point, we have reduced to a spectral problem having to do with special subgroups of free groups.  And if it were 2009, we would be completely stuck.  But it’s 2010!  And we have at hand a whole spray of brand-new results thanks to Helfgott, Gill, Pyber, Szabo, Breuillard, Green, Tao, and others, which guarantee precisely that Cayley-Schreier graphs of this kind, (corresponding to finite covers of U(C) whose Galois closure has Galois group a perfect linear group over a finite field) have spectral gap; that is, they are expander graphs. (Actually, a slightly weaker condition than spectral gap, which we call esperantism, is all we need.)

Sometimes you think about a problem at just the right time.  We would never have guessed that the burst of progress in sum-product estimates in linear groups would make this the right time to think about Galois representations in 1-dimensional families of abelian varieties, but so it turned out to be.  Our good luck.

## Hard discs, configuration spaces, kissing spheres, the 15 puzzle

Persi Diaconis has a lovely article in this month’s Bulletin of the AMS about the “Markov chain Monte Carlo revolution.”  I especially liked the discussion of the “hard disk model” for fluids, which was new to me.  Suppose X is a region in the plane and r a small positive number; then we can denote by Conf(X,n,r) the subspace of X^n consisting of n-tuples (x_1, … x_n) such that the discs of radius r centered at the x_i don’t intersect.  A natural statistic on Conf(X,n,r) is the “packing density”

$\rho = n \pi r^2 / \mbox{area}(X)$

the proportion of the area of X which is taken up by the discs.  You can imagine Conf(X,n,r) as a model for the space of states of a substance whose molecules are represented by the discs — the molecules are forbidden to overlap, but are otherwise allowed to move freely.  When $\rho$ is small, the space looks a lot like Conf(X,n,0), the configuration space of n distinct points in R.  But when $\rho$ is large (more than about .71) the system undergoes a “Kirkwood phase transition” and the balls are constrained to look something like a lattice packing.  You might think of the former case as modeling a liquid, and the latter as modeling a solid.  See Figure 5 of Persi’s paper for a nice illustration of this phenomenon.

Apparently, we don’t know very much about the topology of Conf(X,n,r).  Of course, for very large r the configuration space is empty.  When r = 0 it is the usual configuration space of n distinct points on X.  You might imagine carrying out a kind of Morse-theoretic decomposition of Conf(X,n,0) as follows:  for each n-tuple (x_1, … x_n) of distinct points on X, write

$f(x_1, \ldots, x_n) = \max_{ij} d(x_i,x_j)^{-1}$

where d(x,x’) is the distance between points x and x’ of X.  Then f seems a bit like a Morse function, and you can imagine building up Conf(X,n,0) by starting with a $0$-cell at each local minimum of f, then attaching 1-cells for each critical point with one negative eigenvalue, and so on.  Of course, f is badly non-smooth!  But you can imagine, I guess, replacing the maximum by an $L_p$ for suitably large p and seeing what happens…

The most primitive topological question you can ask is whether Conf(X,n,r) is connected — that is, given two configurations of n hard r-discs in X, is there always room to move from one configuration to the other?

One special case of this question is, to my eye, particularly charming: given a configuration of discs, can you permute the discs arbitrarily?  When the discs are packed almost as tightly as possible, you might expect a negative answer; but in practice, it seems that a very small amount of wiggle room is enough to re-order the discs any way you want.

Example 1:  Suppose X is a unit sphere and r is pi/6 radians.  Then a configuration of n disjoint r-discs on X is equivalent to a configuration of n disjoint unit spheres tangent to X.  The “kissing spheres problem,” which goes back at least to Newton, asks:  what is the maximal number of disjoint unit spheres that can be tangent to X?  The answer is 12; for instance, one can arrange these spheres at the vertices of a regular icosahedron.  This leaves a little room for the spheres to roll around; and it turns out that this small leeway is enough to allow one to permute the spheres by an arbitrary element of S_12!  John Baez has a nice discussion of this in a very early episode of This Week’s Finds.

Example 2: Let X be a unit square, and take r = 1/8 and n = 15; then you can cut X into 16 squares of sidelength 1/2, and place a disc at the center of all the squares but the one in the lower left corner.  (In the interest of writing this post quickly, I’ll leave you to draw your own picture of this.)  Now this packing of discs is pretty tight — at the outset, only two of the discs have room to move.  But it turns out that here, too, one can permute the discs in an almost arbitrary way; you can get any permutation in the alternating group A_15.  In fact, this problem is nothing but a continuous version of the Sam Loyd “15” puzzle.

If you wanted to get a general sense of what was going on here, you might discretize the problem in the following way:  suppose X is a graph with N vertices, n of which are covered by stones.  You are allowed to move any stone to an adjacent empty vertex.  Can you permute the stones in any way you like?  If N = n, the answer is clearly no; there are no allowable rooms.  But if you allow any wiggle room at all — that is, if you let n = N-1 — the answer, somewhat suprisingly to me, is essentially yes.  This is a theorem of Wilson (“Graph Puzzles, Homotopy, the Alternating Group”) from 1974:  if a graph X cannot be made disconnected by the removal of a singular vertex, and is not an N-cycle, then the N-1 stones can be permuted by any permutation in the alternating group, with one exception when N = 7.

What is this strange exceptional 7-vertex graph?  See if you can figure it out!  Or look here.