## Pandemic blog 20: R_0, random graphs, and the L_2 norm

People are talking about R_0. It’s the number that you wish were below 1. Namely: it is the number of people, on average, that a carrier of SARS-CoV-2 infects during the course of their illness. All the interventions we’re undertaking are designed to shrink that number. Because if it’s bigger than 1, the epidemic grows exponentially until it infects a substantial chunk of the population, and if it’s smaller, the epidemic dies out.

But not everybody has the same R_0! According to some of the epdemiologists writing about COVID, this matters. It matters, for instance, to the question of how far into the population the infection gets before it starts to burn itself out for lack of new susceptible hosts (“herd immunity”) and to the question of how much of the population eventually falls ill.

Here’s an easy way to see that heterogeneity can make a difference. Suppose your population consists of two towns of the same size with no intermunicipal intercourse whatsoever. The first town has an R_0 of 1.5 and the second an R_0 of 0.3. Then the mean R_0 of the whole population is 0.9. But this epidemic doesn’t die out; it spreads to cover much of the population of Contagiousville.

You can read Tim’s interesting thread yourself, but here’s the main idea. Say your population has size N. You make a graph out of the pandemic by placing an edge between vertices i and j if one of the corresponding people infects the other. (Probably better to set this up in a directed way, but I didn’t.) Or maybe slightly better to say: you place an edge if person i and person j interact in a manner such that, were either to enter the interaction infected, both would leave that way. If one person in this graph gets infected, the reach of the infection is the connected component of the corresponding vertex. So how big is that component?

The simplest way to set this up is to connect each pair of vertices with probability c/n, all such choices made independently. This is an Erdos-Renyi random graph. And the component structure of this graph has a beautiful well-known theory; if c > 1, there is a giant component which makes up a positive proportion of the vertices, and all other components are very small. The size of this component is nx, where x is the unique positive number such that $x = 1 - e^{-cx}$.

If c < 1, on the other hand, there is no big component, so the pandemic is unlikely to reach much of the population. (Correspondingly, the equation above has no nonzero solution.)

It is fair to be skeptical of this model, which is completely static and doesn’t do anything fancy, but let me just say this — the most basic dynamic model of epidemic spread, the SIR model, has an endstate where the proportion of the population that’s been infected is the unique positive x such that $x = 1 - e^{-R_0x}$.

Which looks pretty familiar!

Now what happens if you want to take into account that the population isn’t actually an undifferentiated mass? Let’s say, for instance, that your county has a red town and a blue town, each with population n/2. Two people in the red town have a probability of 2/n of being connected, while two people in the blue town have a probability of just 1/n of being connected, and a red-blue pair is connected with probability 1/n. (This kind of random graph is called a stochastic block model, if you want to go look at papers about it.) So the typical red-town person is going to infect 1 fellow red-towner and 0.5 blue-towners, for an R_0 of 1.5, while the blue-towner is going to have an R_0 of 1.

Here’s the heuristic for figuring out the size of the big component. Suppose x is the proportion of the red town in the big component of the graph, and y is the proportion of the blue town in the big component. Take a random red person; what’s the change they’re in the big component? Well, the chance they’re not connected to any of the xn/2 red-towners in the big component is $(1-2/n)^{xn/2} = e^{-1}$

(oh yeah did I mention that n was infinity?) and the chance that they’re not connected to any of the blue-towners in the big component is $(1-1/n)^{yn/2} = e^{-(1/2)y}$

so all in all you get $x = 1 - e^{-(x + (1/2)y}$

and by the same token you would get $y = 1-e^{-((1/2)x + (1/2)y)}$

and now you have two equations that you can solve for x and y! In fact, you find x = 47% and y = 33%. So just as you might expect, the disease gets farther in the spreadier town.

And you might notice that what we’re doing is just matrix algebra! If you think of (x,y) as a vector v, we are solving $v = \mathbf{1} - e^{-Av}$

where “exponentiation” of a vector is interpreted coordinatewise. You can think of this as finding a fixed point of a nonlinear operator on vectors.

When does the outbreak spread to cover a positive proportion of the population? There’s a beautiful theorem of Bollobas, Janssen, and Riordan that tells you: you get a big component exactly when the largest eigenvalue λ of A, the so-called Perron-Frobenius eigenvalue, is larger than 1. In the case of the matrix studied above, the two eigenvalues are about 1.31 and 0.19. You might also note that in the early stages of the epidemic, when almost everyone in the network is susceptible, the spread in each town will be governed by repeated multiplication of a small vector by A, and the exponential rate of growth is thus also going to be given by λ.

It would be cool if the big eigenvalue also told you what proportion of the vertices are in the giant component, but that’s too much to ask for. For instance, we could just replace A with a diagonal matrix with 1.31 and 0.19 on the diagonal; then the first town gets 43% infected and the second town completely disconnected from the first, gets 0.

What is the relationship between the Perron-Frobenius eigenvalue and the usual “mean R_0” definition? The eigenvalue can be thought of as $\max_{v} v^T A v / v^T v$

while the average R_0 is exactly $\mathbf{1}^T A \mathbf{1} / n$

where 1 is the all-ones vector. So we see immediately that λ is bounded below by the average R_0, but it really can be bigger; indeed, this is just what we see in the two-separated-towns example we started with, where R_0 is smaller than 1 but λ is larger.

I don’t see how to work out any concise description of the size of the giant component in terms of the symmetric matrix, even in the simplest cases. As we’ve seen, it’s not just a function of λ. The very simplest case might be that where A has rank 1; in other words, you have some division of the population into equal sized boxes, and each box has its own R_0, and then the graph is constructed in a way that is “Erdos-Renyi but with a constraint on degrees” — I think there are various ways to do this but the upshot is that the matrix A is rank 1 and its (i,j) entry is R_0(i) R_0(j) / C where C is the sum of the R_0 in each box. The eigenvalues of A are all zero except for the big one λ, which is equal to the trace, which is $\mathbf{E} R_0^2 / \mathbf{E} R_0$

or, if you like, mean(R_0) + variance(R_0)/mean(R_0); so if the average R_0 is held fixed, this gets bigger the more R_0 varies among the population.

And if you look back at that Wikipedia page about the giant component, you’ll see that this is the exact threshold they give for random graphs with specified degree distribution, citing a 2000 paper of Molloy and Reid. Or if you look at Lauren Meyers’s 2005 paper on epidemic spread in networks, you will find the same threshold for epidemic outbreak in section 2. (The math here is descended from work of Molloy-Reed and this much-cited paper of Newman, Strogatz, and Watts.) Are typical models of “random graphs with specified degree distribution” are built to have rank 1 in this sense? I think so — see e.g. this sentence in Newman-Strogatz-Watts: “Another quantity that will be important to us is the distribution of the degree of the vertices that we arrive at by following a randomly chosen edge. Such an edge arrives at a vertex with probability proportional to the degree of that vertex.”

At any rate, even in this rank 1 case, even for 2×2 matrices, it’s not clear to me how to express the size of the giant component except by saying it’s a nonzero solution of $v = 1 - e^{Av}$. Does the vector v have anything do do with the Perron-Frobenius eigenvector? Challenge for the readers: work this out!

I did try a bunch of randomly chosen 6×6 matrices and plot the overall size of the giant component against λ, and this is what I got:

The blue line shows the proportion of the vertices that get infected if the graph were homogeneous with parameter λ. Which makes me think that thinking of λ as a good proxy for R_0 is not a terrible idea; it seems like a summary statistic of A which is pretty informative about the giant component. (This graph suggests maybe that among graphs with a given λ, the homogeneous one actually has the biggest giant component? Worth checking.)

I should hasten to say that there’s a lot of interest in the superspreader phenomenon, where a very small (probability -> 0) set of vertices has very large (superlinear in n) number of contacts. Meyers works out a bunch of cases like this and I think they are not well modeled by what I’m talking about here. (Update: I wrote another post about this! Indeed, I think the tight connection shown in the chart between λ and size of giant component is not going to persist when there’s extreme heterogeneity of degree.)

A more technical note: the result of Bollobas et al is much more general; there’s no reason the vertices have to be drawn from finitely many towns of equal size; you can instead have the types of vertices drawn from whatever probability space M you like, and then have the probability of an edge between an vertex x and a vertex y be W(x,y) for some symmetric function on M^2; nowadays this is called the “graphon” point of view. Now the matrix is replaced by an operator on functions: $A_Wf(x) = \int_M f(y)W(x,y)$,

the probability g(x) that a vertex of type x is in the giant component is a solution of the integral equation $g = 1-e^{-Ag}$

and a giant component exists just when the operator norm $||A_W||_2$ is greater than 1. This is the kind of analysis you’d want to use if you wanted to really start taking geography into account. For instance, take the vertices to be random points in a disc and let W(x,y) be a decreasing function of |x-y|, modeling a network where infection likelihood is a decreasing function of distance. What is the norm of the operator A_W in this case? I’ll bet my harmonic analyst friends know, if any of them are reading this. But I do not.

Update: Now I know, because my harmonic analyst friend Brian Street told me it’s just the integral over W(x,y) over y, which is the same for all y (well, at least it is if we’re on the whole of R^d.) Call that number V. He gave a very nice Fourier-theoretic argument but now that I know the answer I’m gonna explain it in terms of the only part of math I actually understand, 2×2 matrices. Here’s how it goes. In this model, each vertex has the same expected number of neighbors, namely that integral V above. But to say every vertex has the same expected number of neighbors is to say that 1 is an eigenvector for A. If 1 were any eigenvector other than Perron-Frobenius, it would be orthogonal to Perron-Frobenius, which it can’t be because both have positive entries, so it is Perron-Frobenius, so λ = V.

In fact I had noticed this fact in one of the papers I looked at while writing this (that if the matrix had all row-sums the same, the long-term behavior didn’t depend on the matrix) but didn’t understand why until just this minute. So this is kind of cool — if the kind of heterogeneity the network exhibits doesn’t cause different kinds of vertices to have different mean degree, you can just pretend the network is homogeneous with whatever mean R_0 it has. This is a generalization of the fact that two towns with no contact which have the same R_0 can be treated as one town with the same R_0 and you don’t get anything wrong.

## The Lovasz number of the plane is about 3.48287

As seen in this comment on Polymath and explicated further in Fernando de Oliveira Filho’s thesis, section 4.4.

I actually spent much of today thinking about this so let me try to explain it in a down-to-earth way, because it involved me thinking about Bessel functions for the first time ever, surely a life event worthy of recording.

So here’s what we’re going to do.  As I mentioned last week, you can express this problem as follows:  suppose you have a map h: R^2 -> V, for some normed vector space V, which is a unit-distance embedding; that is, if |x-x’|_{R^2} = 1, then |h(x)-h(x’)|_V = 1.  (We don’t ask that h is an isometry, only that it preserves the distance-1 set.)

Then let t be the radius of the smallest hypersphere in V containing h(R^2).

Then any graph embeddable in R^2 with all edges of length 1 is sent to a unit-distance graph in V contained in the hyperplane of radius t; this turns out to be equivalent to saying the Lovasz number of G (ok, really I mean the Lovasz number of the complement of G) is at most 1/(1-2t).  So we want to show that t is bounded below 1, is the point.  Or rather:  we can find a V and a map from R^2 to V to make this the case.

So here’s one!  Let V be the space of L^2 functions on R^2 with the usual inner product.  Choose a square-integrable function F on R^2 — in fact let’s normalize to make F^2 integrate to 1 — and for each a in R^2 we let h(a) be the function F(x-a).

We want the distance between F(x-a) and F(x-b) to be the same for every pair of points at distance 1 from each other; the easiest way to arrange that is to insist that F(x) be a radially symmetric function F(x) = f(|x|); then it’s easy to see that the distance between F(x-a) and F(x-b) in V is a function G(a-b) which depends only on |a-b|.  We write $g(r) = \int_{\mathbf{R}^2} F(x)F(x-r) dx$

so that the squared distance between F(x) and F(x-r) is $\int F(x)^2 dx - 2 \int F(x)F(x-r) dx + \int F(x-r)^2 dx = 2(1-g(r))$.

In particular, if two points in R^2 are at distance 1, the squared distance between their images in V is 2(1-g(1)).  Note also that g(0) is the square integral of F, which is 1.

What kind of hypersphere encloses all the points F(x-a) in V?  We can just go ahead and take the “center” of our hypersphere to be 0; since |F| = 1, every point in h(R^2) lies in (indeed, lies on) the sphere of radius 1 around the origin.

Hey but remember:  we want to study a unit-distance embedding of R^2 in V.  Right now, h sends unit distances to the distance 2(1-g(1)), whatever that is.  We can fix that by scaling h by the square root of that number.  So now h sends unit distances to unit distances, and its image is enclosed in a hypersphere of radius

2(1-g(1))^{-1}

The more negative g(1) is, the smaller this sphere is, which means the more we can “fold” R^2 into a small space.  Remember, the relationship between hypersphere number and Lovasz theta is $2t + 1/\theta = 1$

and plugging in the above bound for the hypersphere number, we find that the Lovasz theta number of R^2, and thus the Lovasz theta number of any unit-distance graph in R^2, is at most

1-1/g(1).

So the only question is — what is g(1)?

Well, that depends on what g is.

Which depends on what F is.

Which depends on what f is.

And of course we get to choose what f is, in order to make g(1) as negative as possible.

How do we do this?  Well, here’s the trick.  The function G is not arbitrary; if it were, we could make g(1) whatever we wanted.  It’s not hard to see that G is what’s called a positive definite function on R^2.  And moreover, if G is positive definite, there exists some f giving rise to it.  (Roughly speaking, this is the fact that a positive definite symmetric matrix has a square root.)  So we ask:  if G is a positive definite (radially symmetric) function on R^2, and g(0) = 1, how small can g(1) be?

And now there’s an old theorem of (Wisconsin’s own!) Isaac Schoenberg which helpfully classifies the positive definite functions on R^2; they are precisely the functions G(x) = g(|x|) where g is a mixture of scalings of the Bessel function $J_0$: $g(r) = \int_0^\infty J_0(ur) A(u)$

for some everywhere nonnegative A(u).  (Actually it’s more correct to say that A is a distribution and we are integrating J_0(ur) against a non-decreasing measure.)

So g(1) can be no smaller than the minimum value of J_0 on [0,infty], and in fact can be exactly that small if you let A become narrowly supported around the minimum argument.  This is basically just taking g to be a rescaled version of J_0 which achieves its minimum at 1.  That minimum value is about -0.4, and so the Lovasz theta for any unit-distance subgraph on the plane is bounded above by a number that’s about 1 + 1/0.4 = 3.5.

To sum up:  I give you a set of points in the plane, I connect every pair that’s at distance 1, and I ask how you can embed that graph in a small hypersphere keeping all the distances 1.  And you say:  “Oh, I know what to do, just assign to each point a the radially symmetrized Bessel function J_0(|x-a|) on R^2, the embedding of your graph in the finite-dimensional space of functions spanned by those Bessel translates will do the trick!”

That is cool!

Remark: Oliveira’s thesis does this for Euclidean space of every dimension (it gets more complicated.)  And I think (using analysis I haven’t really tried to understand) he doesn’t just give an upper bound for the Lovasz number of the plane as I do in this post, he really computes that number on the nose.

Update:  DeCorte, Oliveira, and Vallentin just posted a relevant paper on the arXiv this morning!

Tagged , ,

## What is the Lovasz number of the plane?

There are lots of interesting invariants of a graph which bound its chromatic number!  Most famous is the Lovász number, which asks, roughly:  I attach vectors v_x to each vertex x such that v_x and v_y are orthogonal whenever x and y are adjacent, I try to stuff all those vectors into a small cone, the half-angle of the cone tells you the Lovász number, which is bigger and bigger as the smallest cone gets closer and closer to a hemisphere.

Here’s an equivalent formulation:  If G is a graph and V(G) its vertex set, I try to find a function f: V(G) -> R^d, for some d, such that

|f(x) – f(y)| = 1 whenever x and y are adjacent.

This is called a unit distance embedding, for obvious reasons.

The hypersphere number t(G) of the graph is the radius of the smallest sphere containing a unit distance embedding of G.  Computing t(G) is equivalent to computing the Lovász number, but let’s not worry about that now.  I want to generalize it a bit.  We say a finite sequence (t_1, t_2, t_3, … ,t_d) is big enough for G if there’s a unit-distance embedding of G contained in an ellipsoid with major radii t_1^{1/2}, t_2^{1/2}, .. t_d^{1/2}.  (We could also just consider infinite sequences with all but finitely many terms nonzero, that would be a little cleaner.)

Physically I think of it like this:  the graph is trying to fold itself into Euclidean space and fit into a small region, with the constraint that the edges are rigid and have to stay length 1.

Sometimes it can fold a lot!  Like if it’s bipartite.  Then the graph can totally fold itself down to a line segment of length 1, with all the black vertices going to one end and the white vertices going to the other.  And the big enough sequences are just those with some entry bigger than 1.

On the other hand, if G is a complete graph on k vertices, a unit-distance embedding has to be a simplex, so certainly anything with k of the t_i of size at least 1-1/k is big enough.   (Is that an if and only if?  To know this I’d have to know whether an ellipse containing an equilateral triangle can have a radius shorter than that of the circumcircle.)

Let’s face it, it’s confusing to think about ellipsoids circumscribing embedded graphs, so what about instead we define t(p,G) to be the minimum value of the L^p norm of (t_1, t_2, …) over ellipsoids enclosing a unit-distance embedding of G.

Then a graph has a unit-distance embedding in the plane iff t(0,G) <= 2.  And t(oo,G) is just the hypersphere number again, right?  If G has a k-clique then t(p,G) >= t(p,K_k) for any p, while if G has a k-coloring (i.e. a map to K_k) then t(p,G) <= t(p,K_k) for any n.  In particular, a regular k-simplex with unit edges fits into a sphere of squared radius 1-1/k, so t(oo,G) < 1-1/k.

So… what’s the relation between these invariants?  Is there a graph with t(0,G) = 2 and t(oo,G) > 4/5?  If so, there would be a non-5-colorable unit distance graph in the plane.  But I guess the relationship between these various “norms” feels interesting to me irrespective of any relation to plane-coloring.  What is the max of t(oo,G) with t(0,G)=2?

The intermediate t(p,G) all give functions which upper-bound clique number and lower-bound chromatic number; are any of them interesting?  Are any of them easily calculable, like the Lovász number?

Remarks:

1.  I called this post “What is the Lovász number of the plane?” but the question of “how big can t(oo,G) be if t(0,G)=2”? is more a question about finite subgraphs of the plane and their Lovász numbers.  Another way to ask “What is the Lovász number of the plane” would be to adopt the point of view that the Lovász number of a graph has to do with extremizers on the set of positive semidefinite matrices whose (i,j) entry is nonzero only when i and j are adjacent vertices or i=j.  So there must be some question one could ask about the space of positive semidefinite symmetric kernels K(x,y) on R^2  x R^2 which are supported on the locus ||x-y||=1 and the diagonal, which question would rightly be called “What is the Lovász number of the plane?” But I’m not sure what it is.
2. Having written this, I wonder whether it might be better, rather than thinking about enclosing ellipsoids of a set of points in R^d, just to think of the n points as an nxd matrix X and compute the singular values of X^T X, which would be kind of an “approximating ellipsoid” to the points.  Maybe later I’ll think about what that would measure.  Or you can!

## When random people give money to random other people

A post on Decision Science about a problem of Uri Wilensky‘s has been making the rounds:

Imagine a room full of 100 people with 100 dollars each. With every tick of the clock, every person with money gives a dollar to one randomly chosen other person. After some time progresses, how will the money be distributed?

People often expect the distribution to be close to uniform.  But this isn’t right; the simulations in the post show clearly that inequality of wealth rapidly appears and then persists (though each individual person bobs up and down from rich to poor.)  What’s going on?  Why would this utterly fair and random process generate winners and losers?

Here’s one way to think about it.  The possible states of the system are the sets of nonnegative integers (m_1, .. m_100) summing to 10,000; if you like, the lattice points inside a simplex.  (From now on, let’s write N for 100 because who cares if it’s 100?)

The process is a random walk on a graph G, whose vertices are these states and where two vertices are connected if you can get from one to the other by taking a dollar from one person and giving it to another.  We are asking:  when you run the random walk for a long time, where are you on this graph?  Well, we know what the stationary distribution for random walk on an undirected graph is; it gives each vertex a probability proportional to its degree.  On a regular graph, you get uniform distribution.

Our state graph G isn’t regular, but it almost is; most nodes have degree N, where by “most” I mean “about 1-1/e”; since the number of states is $N^2 + N - 1 \choose N-1$

and, of these, the ones with degree N are exactly those in which nobody’s out of money; if each person has a dollar, the number of ways to distribute the remaining N^2 – N dollars is $N^2 - 1 \choose N-1$

and so the proportion of states where someone’s out of money is about $\frac{(N^2 - 1)^N}{(N^2 + N - 1)^N} \sim (1-1/N)^N \sim 1/e$.

So, apart from those states where somebody’s broke, in the long run every possible state is equally likely;  we are just as likely to see $9,901 in one person’s hands and everybody else with$1 as we are to see exact equidistribution again.

What is a random lattice point in this simplex like?  Good question!  An argument just like the one above shows that the probability nobody goes below \$c is on order e^-c, at least when c is small relative to N; in other words, it’s highly likely that somebody’s very nearly out of money.

If X is the maximal amount of money held by any player, what’s the distribution of X?  I didn’t immediately see how to figure this out.  You might consider the continuous version, where you pick a point at random from the real simplex $(x_1, .. x_N) \in \mathbf{R}^N: \sum x_i = N^2$.

Equivalently; break a stick at N-1 randomly chosen points; what is the length of the longest piece?  This is a well-studied problem; the mean size of the longest piece is about N log N.  So I guess I think maybe that’s the expected value of the net worth of the richest player?

But it’s not obvious to me whether you can safely approximate the finite problem by its continuous limit (which corresponds to the case where we keep the number of players at N but reduce the step size so that each player can give each other a cent, or a picocent, or whatever.)

What happens if you give each of the N players just one dollar?  Now the uniformity really breaks down, because it’s incredibly unlikely that nobody’s broke.  The probability distribution on the set of (m_1, .. m_N) summing to N assigns each vector a probability proportional to the size of its support (i.e. the number of m_i that are nonzero.)  That must be a well-known distribution, right?  What does the corresponding distribution on partitions of N look like?

Update:  Kenny Easwaran points out that this is basically the same computation physicists do when they compute the Boltzmann distribution, which was new to me.

Tagged , ,

## Metric chromatic numbers and Lovasz numbers

In the first post of this series I asked whether there was a way to see the Lovasz number of a graph as a chromatic number.  Yes!  I learned it from these notes, written by Big L himself.

Let M be a metric space, and let’s assume for simplicity that M has a transitive group of isometries.  Now write r_M(n) for the radius of the smallest ball containing n points whose pairwise distances are all at least 1.  (So this function is controlling how sphere-packing works in M.)

Let Γ be a graph.  By an M-coloring of Γ we now mean a map from v(Γ) to M such that adjacent vertices are at distance at least 1.  Write χ_Γ(M) for the radius of the smallest disc containing an M-coloring of Γ.  Then we can think of r^{-1}(χ_Γ(M)) as a kind of “M-chromatic number of Γ.”  Scare quotes are because r isn’t necessarily going to be an analytic function or anything; if I wanted to say something literally correct I guess I would say the smallest integer n such that r_M(n) >= χ_Γ(M).

The M-chromatic number is less than the usual chromatic number χ_Γ;  more precisely,

χ_Γ(M) <= r_M(χ_Γ)

Easy:  if there’s an n-coloring of Γ, just compose it with the map from [n] to M of radius r_M(n).  Similary, if ω_Γ is the clique number of Γ, we have

r_M(ω_Γ) <= χ_Γ(M)

because a k-clique can’t be embedded in a ball of radius smaller than r_M(k).

So this M-chromatic number gives a lower bound for the chromatic number and an upper bound for the clique number, just as the Lovasz number does, and just as the fractional chromatic number does.

Example 1:  Lovasz number.  Let M be the sphere in infinite-dimensional Euclidean space.  (Or |Γ|-dimensional Euclidean space, doesn’t matter.)  For our metric use (1/sqrt(2)) Euclidean distance, so that orthogonal vectors are at distance 1 from each other.  If n points are required at pairwise distance at least 1, the closest way to pack them is to make them orthonormal (I didn’t check this, surely easy) and in this case they sit in a ball of radius 1-sqrt(1/2n) around their center of mass.  So r_M(n) = 1 – sqrt(1/2n).  Define t(Γ) to be the real number such that $1 - \sqrt{1/2t(\Gamma)} = \chi_\Gamma(M)$.

Now I was going to say that t(Γ) is the Lovasz theta number of Γ, but that’s not exactly the definition; that would be the definition if I required the embedding to send adjacent vertices to points at distance exactly 1.  The answer to this MO question suggests that an example of Schrijver might actually separate these invariants, but I haven’t checked.

So I guess let’s say t(Γ) is a “Lovasz-like number” which is between the clique number and the chromatic number.  And like the Lovasz number, but unlike clique and chromatic numbers, it’s super-easy to compute.  An embedding of v(Γ) in the sphere, up to rotation, is specified by the pairwise distance matrix, which can be an arbitrary postive definite symmetric nxn matrix A with 1’s on the diagonal.  Each edge of Γ now gives an inequality a_{ij} > 1.  When you’re optimizing over a space cut out by linear inequalities in the space of psd matrices, you’re just doing semidefinite programming.  (I am punting a little about how to optimize “radius” but hopefully maximum distance of any vector from center of mass is good enough?)

(Note:  you know what, I’ll bet you can take an embedding like this, subtract a small multiple of the center of mass from all the vectors, and get an embedding of v(Γ) in n-space with all angles between adjacent vectors slightly obtuse, and probably this ends up being exactly the same thing as the vector chromatic number defined in the paper I linked to earlier.)

Where is example 2?  It was supposed to be about the fractional chromatic number but then I realized the way I was setting this up wasn’t correct.  The idea is to let M_b be the space of infinite bit strings with exactly b 1’s and use (1/2b) Hamming distance, so that the distance-1 requirement becomes a requirement that two b-element subsets be disjoint.  But I don’t think this quite fits into the framework I adopted at the top of the post.  I’ll circle back to this if I end up having what to say.

## Coloring graphs with polynomials

More chromatic hoonja-doonja!

Suppose you have a bunch of tokens of different colors and weights.  X_1 colors of weight 1 tokens, X_2 colors of weight 2 tokens, etc.

Let Γ be a graph.  A (weighted) b-coloring of Γ is an assignment to each vertex of a set of tokens with total weight b, such that adjacent vertices have no tokens in common.  Let χ_Γ(X_1, … X_b) be the number of b-colorings of Γ.  I made up this definition but I assume it’s in the literature somewhere.

First of all, χ_Γ(X_1, … X_b) is a polynomial.

Is this multivariable “chromatic polynomial” of any interest?  Well, here’s one place it comes up.  By a degree-b polynomial coloring of Γ we mean an assignment of a monic squarefree degree d polynomial in R[x] to each vertex of Γ, so that adjacent vertices are labeled with coprime polynomials.   Now let U_b(Γ) be the manifold parametrizing degree-b colorings of Γ.

Then the Euler characteristic of U_b(Γ) is χ_Γ(-1,1,0,…0).

I worked this out via the same kind of Lefschetz computation as in the previous post, but once you get the answer, you can actually derive this as a corollary of Stanley’s theorem.  And it was presumably already known.

By the way:  let V_n be the free vector space spanned by the b-colorings of Γ where all the tokens have weight 1; these are called fractional colorings sometimes.  Then S_n acts on V_n by permutation of colors.  The character of this action is a class function on S_n.  More precisely, it is

χ_Γ(X_1, … X_b)

where X_i is now interpreted as a class function on S_n, sending a permutation to the number of i-cycles in its cycle decomposition.  Of course the real thing going on behind the scenes is that the V_n form a finitely generated FI-module.

## Counting acyclic orientations with topology

Still thinking about chromatic polynomials.   Recall: if Γ is a graph, the chromatic polynomial χ_Γ(n) is the number of ways to color the vertices of Γ in which no two adjacent vertices have the same color.

Fact:  χ_Γ(-1) is the number of acyclic orientations of Γ.

This is a theorem of Richard Stanley from 1973.

Here’s a sketch of a weird proof of that fact, which I think can be made into an actual weird proof.  Let U be the hyperplane complement $\mathbf{A}^|\Gamma| - \bigcup_{ij \in e(\Gamma)} (z_i = z_j)$

Note that |U(F_q)| is just the number of colorings of Γ by elements of F_q; that is,  χ_Γ(q).  More importantly, the Poincare polynomial of the manifold U(C) is (up to powers of -1 and t) χ_Γ(-1/t).  The reason |U(F_q)| is  χ_Γ(q) is that Frobenius acts on H^i(U) by q^{-i}.  (OK, I switched to etale cohomology but for hyperplane complements everything’s fine.)  So what should  χ_Γ(-1) mean?  Well, the Lefschetz trace formula suggests you look for an operator on U(C) which acts as -1 on the H^1, whence as (-1)^i on the H^i.  Hey, I can think of one — complex conjugation!  Call that c.

Then Lefchetz says χ_Γ(-1) should be the number of fixed points of c, perhaps counted with some index.  But careful — the fixed point locus of c isn’t a bunch of isolated points, as it would be for a generic diffeo; it’s U(R), which has positive dimension!  But that’s OK; in cases like this we can just replace cardinality with Euler characteristic.  (This is the part that’s folkloric and sketchy.)  So

χ(U(R)) = χ_Γ(-1)

at least up to sign.  But U(R) is just a real hyperplane complement, which means all its components are contractible, so the Euler characteristic is just the number of components.  What’s more:  if (x_1, … x_|Γ|) is a point of U(R), then x_i – x_j is nonzero for every edge ij; that means that the sign of x_i – x_j is constant on every component of U(R).  That sign is equivalent to an orientation of the edge!  And this orientation is obviously acyclic.  Furthermore, every acyclic orientation can evidently be realized by a point of U(R).

To sum up:  acyclic orientations are in bijection with the connected components of U(R), which by Lefschetz are χ_Γ(-1) in number.

## Metric chromatic numbers

Idle thought.  Let G be a (loopless) graph, let M be a metric space, and let b be a nonnegative real number.  Then let Conf(G,M,b) be the space of maps from the vertices of the graph to M such that no two adjacent vertices are within b of each other.

When b=0 and G is the complete graph K_n, this is the usual ordered configuration space of n points on M.  When G is the empty graph on n vertices, it’s just M^n.  When M is a set of N points with the discrete metric, Conf(G,M,b) is the same thing for every b;  a set of points whose cardinality is χ_G(N), the chromatic polynomial of G evaluated at N.  When M is a box, Conf(G,M,b) is the “discs in a box” space I blogged about here.  If M is (Z/2Z)^k with Hamming distance, you are asking about how many ways you can supply G with k 2-colorings so that no edge is monochromatic in more than k-b-1 of them.

Update:  Ian Agol links in the comments to this paper about Conf(G,M,0) by Eastwood and Huggett; the paper points out that the Euler characteristic of Conf(G,M,0) computes χ_G(N) whenever M has Euler characteristic N; so M being N points works, but so does M = CP^{N-1}, and that’s the case they study.

When b=0 and G is the complex plane, Conf(G,C,0) is the complement of the graphic arrangement of G; its Poincare polynomial is  t^-{|G|} χ_G(-1/t).  Lots of graphs have the same chromatic polynomial (e.g. all trees do) but do they have homeomorphic Conf(G,C,0)?

This is fun to think about!  Is Conf(G,M,0), for various manifolds other than discrete sets of points, an interesting invariant of a graph?

You can think of

vol(Conf(G,M,b)) vol(M)^{-n}

as a sort of analogue of the chromatic polynomial, especially when b is small; when M = C, for instance, Conf(G,M,b) is just the complement of tubular neighborhoods around the hyperplanes in the graphical arrangement, and its volume should be roughly b^|G|χ_G(1/b) I think.  When b gets big, this function deviates from the chromatic polynomial, and in particular you can ask when it hits 0.

In other words:  you could define an M-chromatic number χ(G,M) to be the smallest B such that Conf(G,M,1/B) is nonempty.  When M is a circle S^1 with circumference 1, you can check that χ(G,M) is at least the clique number of G and at most the chromatic number.  If G is a (2n+1)-cycle, the clique number is 2, the chromatic number is 3, and the S^1-chromatic number is 2+1/n, if I did this right.  Does it have anything to do with the Lovasz number, which is also wedged between clique number and chromatic number?  Relevant here:  the vector chromatic number, which is determined by χ(G,S^{v(G)-1}), and which is in fact a lower bound for the Lovasz number.

## My Erdos-Bacon-Sabbath number is 11

I am pleased to report that I have an Erdös-Bacon-Sabbath number.

My Erdös number is 3; has been for a while, probably always will be.  I wrote a paper with Mike Bennett and Nathan Ng about solutions to A^4 + B^2 = C^p; Mike wrote with Florian Luca; Luca wrote with Erdös.

A while back, I shot a scene for the movie Gifted.  I’m not on the IMDB page yet, but I play against type as “Professor.”  Also in this movie is Octavia Spencer, who was in Beauty Shop (2005) with Kevin Bacon.  So my Bacon number is 2.

That gives me an Erdös-Bacon number of 5; already pretty high on the leaderboard!

Of course it then fell to me to figure out whether I have a Sabbath number.  Here’s the best chain I could make.

I once played guitar on “What Goes On” with my friend Jay Michaelson‘s band, The Swains, at Brownies.

Jay performed with Ezra Lipp “sometime in 2000,” he reports.

From here we use the Six Degrees of Black Sabbath tool, written by Paul Lamere at EchoNest (now part of the Spotify empire.)

The Black Crowes backed up Jimmy Page at a concert in 1999.

Page played with David Coverdale in Coverdale.Page.

David Coverdale was in Deep Purple with Glenn Hughes of Black Sabbath.

So my Sabbath number is 6, and my Erdos-Bacon-Sabbath number is 11.

## What I learned at the Joint Math Meetings

Another Joint Meetings in the books!  My first time in San Antonio, until last weekend the largest US city I’d never been to.  (Next up:  Jacksonville.)  A few highlights:

• Ngoc Tran, a postdoc at Austin, talked about zeroes of random tropical polynomials.  She’s proved that a random univariate tropical polynomial of degree n has about c log n roots; this is the tropical version of an old theorem of Kac, which says that a random real polynomial of degree n has about c log n real roots.  She raised interesting further questions, like:  what does the zero locus of a random tropical polynomial in more variables look like?  I wonder:  does it look anything like the zero set of a random band-limited function on the sphere, as discussed by Sarnak and Wigman?  If you take a random tropical polynomial in two variables, its zero set partitions the plane into polygons, which gives you a graph by adjacency:  what kind of random graph is this?
• Speaking of random graphs, have you heard the good news about L^p graphons?  I missed the “limits of discrete structures” special session which had tons of talks about this, but I ran into the always awesome Henry Cohn, who gave me the 15-minute version.  Here’s the basic idea.  Large dense graphs can be modeled by graphons; you take a symmetric function W from [0,1]^2 to [0,1], and then your procedure for generating a random graph goes like this. Sample n points x_1,…x_n uniformly from [0,1] — these are your vertices.  Now put an edge between x_i and x_j with probability W(x_i,x_j) = W(x_j,x_i).  So if W is constant with value p, you get your usual Erdös-Renyi graphs, but if W varies some, you can get variants of E-R, like the much-beloved stochastic blockmodel graphs, that have some variation of edge density.  But not too much!  These graphon graphs are always going to have almost all vertices with degree linear in n.  That’s not at all like the networks you encounter in real life, which are typically sparse (vertex degrees growing sublinearly in n, or even being constant on average) and typically highly variable in degree (e.g. degrees following a power law, not living in a band of constant multiplicative width.)  The new theory of L^p graphons is vastly more general.  I’ve only looked at this paper for a half hour but I feel like it’s the answer to a question that’s always bugged me; what are the right descriptors for the kinds of random graphs that actually occur in nature?  Very excited about this, will read it more, and will give a SILO seminar about it on February 4, for those around Madison.
• Wait, I’ve got still one more thing about random graphs!  Russ Lyons gave a plenary about his work with Angel and Kechris about unique ergodicity of the action of the automorphism group of the random graph.  Wait, the random graph? I thought there were lots of random graphs!  Nope — when you try to define the Erdös-Renyi graph on countably many vertices, there’s a certain graph (called “the Rado graph”) to which your random graph is isomorphic with probability 1!  What’s more, this is true — and it’s the same graph — no matter what p is, as long as it’s not 0 or 1!  That’s very weird, but proving it’s true is actually pretty easy.  I leave it an exercise.
• Rick Kenyon gave a beautiful talk about his work with Aaron Abrams about “rectangulations” — decompositions of a rectangle into area-1 subrectangles.  Suppose you have a weighted directed graph, representing a circuit diagram, where the weights on the edges are the conductances of the corresponding wires.  It turns out that if you fix the energy along each edge (say, to 1) and an acyclic orientation of the edges, there’s a unique choice of edge conductances such that there exists a Dirichlet solution (i.e. an energy-minimizing assignment of a voltage to each node) with the given energies.  These are the fibers of a rational map defined over Q, so this actually gives you an object over a (totally real) algebraic number field for each acyclic orientaton.  As Rick pointed out, this smells a little bit like dessins d’enfants!  (Though I don’t see any direct relation.)  Back to rectangulations:  it turns out there’s a gadget called the “Smith Diagram” which takes a solution to the Dirichlet problem on the graph  and turns it into a rectangulation, where each edge corresponds to a rectangle, the area of the rectangle is the energy contributed by the current along that edge, the aspect ratio of the rectangle is the conductance, the bottom and top faces of the rectangle correspond to the source and target nodes, the height of a face is the voltage at that node, and etc.  Very cool!  Even cooler when you see the pictures.  For a 40×40 grid, it looks like this: