Category Archives: math

Fox-Neuwirth-Fuks cells, quantum shuffle algebras, and Malle’s conjecture for function fields: a new old paper

I have a new paper up on the arXiv today with TriThang Tran and Craig Westerland, “Fox-Neuwirth-Fuks cells, quantum shuffle algebras, and Malle’s conjecture for function fields.”

There’s a bit of a story behind this, but before I tell it, let me say what the paper’s about. The main result is an upper bound for the number of extensions with bounded discriminant and fixed Galois group of a rational function field F_q(t). More precisely: if G is a subgroup of S_n, and K is a global field, we can ask how many degree-n extensions of K there are whose discriminant is at most X and whose Galois closure has Galois group G. A long-standing conjecture of Malle predicts that this count is asymptotic to c X^a (log X)^b for explicitly predicted exponents a and b. This is a pretty central problem in arithmetic statistics, and in general it still seems completely out of reach; for instance, Bhargava’s work allows us to count quintic extensions of Q, and this result was extended to global fields of any characteristic other than 2 by Bhargava, Shankar, and Wang. But an asymptotic for the number of degree 6 extensions would be a massive advance.

The point of the present paper is to prove upper bounds for counting field extensions in the case of arbitrary G and rational function fields K = F_q(t) with q prime to and large enough relative to |G|; upper bounds which agree with Malle’s conjecture up to the power of log X. I’m pretty excited about this! Malle’s conjecture by now has very robust and convincing heuristic justification, but there are very few cases where we actually know anything about G-extensions for any but very special classes of finite groups G. There are even a few very special cases where the method gives both upper and lower bounds (for instance, A_4-extensions over function fields containing a cube root of 3.)

The central idea, as you might guess from the authors, is to recast this question as a problem about counting F_q-rational points on moduli spaces of G-covers, called Hurwitz spaces; by the Grothendieck-Lefschetz trace formula, we can bound these point counts if we can bound the etale Betti numbers of these spaces, and by comparison between characteristic p and characteristic 0 we can turn this into a topological problem about bounding cohomology groups of the braid group with certain coefficients.

Actually, let me say what these coefficients are. Let c be a subset of a finite group G closed under conjugation, k a field, and V the k-vector space spanned by c. Then V^{\otimes n} is spanned by the set of n-tuples (g_1, … , g_n) in c^n, and this set carries a natural action of the braid group, where twisting strand i past strand i+1 corresponds to the map

(g_1, \ldots, g_n) \rightarrow (g_1, \ldots, g_{i+1}, g_{i+1}^{-1} g_i g_{i+1}, \ldots, g_n).
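(This is the Hurwitz action of the braid group on tuples.) Here's a little Python sketch — purely illustrative, not from the paper — that implements this twist on tuples of transpositions in S_3 and checks the braid relation σ_i σ_{i+1} σ_i = σ_{i+1} σ_i σ_{i+1}:

```python
from itertools import product

# Permutations of {0,1,2} as tuples of images; compose(p, q) = p after q.
def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

# c = the transpositions in S_3, a conjugation-closed subset.
c = [(1, 0, 2), (0, 2, 1), (2, 1, 0)]

def sigma(i, g):
    """Braid generator: twist strand i past strand i+1 (0-indexed)."""
    g = list(g)
    gi, gj = g[i], g[i + 1]
    # (g_i, g_{i+1}) -> (g_{i+1}, g_{i+1}^{-1} g_i g_{i+1})
    g[i], g[i + 1] = gj, compose(compose(inverse(gj), gi), gj)
    return tuple(g)

# Verify the braid relation s_1 s_2 s_1 = s_2 s_1 s_2 on all triples from c.
for g in product(c, repeat=3):
    assert sigma(0, sigma(1, sigma(0, g))) == sigma(1, sigma(0, sigma(1, g)))
print("braid relation holds on all", len(c) ** 3, "triples")
```

Note the twist permutes the spanning set c^n, so it really does give a representation of Br_n on V^{\otimes n}.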

So for each n we have a representation of the braid group Br_n, and it turns out that everything we desire would be downstream from good bounds on

\dim H^i(Br_n, V^{\otimes n})

So far, this is the same strategy (expressed a little differently) as was used in our earlier paper with Akshay Venkatesh to get results towards the Cohen-Lenstra conjecture over F_q(t). That paper concerned itself with the case where G was a (modestly generalized) dihedral group; there was a technical barrier that prevented us from saying anything about more general groups, and the novelty of the present paper is to find a way past that restriction. I’m not going to say very much about it here! I’ll just say it turns out that there’s a really nice way to package the cohomology groups above — indeed, even more generally, whenever V is a braided vector space, you have these braid group actions on the tensor powers, and the cohomology groups can be packaged together as the Ext groups over the quantum shuffle algebra associated to V. And it is this quantum shuffle algebra (actually, mostly its more manageable subalgebra, the Nichols algebra) that the bulk of this bulky paper studies.

But now to the story. You might notice that the arXiv stamp on this paper starts with 17! So yes — we have claimed this result before. I even blogged about it! But… that proof was not correct. The overall approach was the same as it is now, but our approach to bounding the cohomology of the Nichols algebra just wasn’t right, and we are incredibly indebted to Oscar Randal-Williams for making us aware of this.

For the last six years, we’ve been working on and off on fixing this. We kept thinking we had the decisive fix and then having it fall apart. But last spring, we had a new idea, Craig came and visited me for a very intense week, and by the end I think we were confident that we had a route — though getting to the present version of the paper occupied months after that.

A couple of thoughts about making mistakes in mathematics.

  • I don’t think we really handled this properly. Experts in the field certainly knew we weren’t standing by the original claim, and we certainly told lots of people this in talks and in conversations, and I think in general there is still an understanding that if a preprint is sitting up on the arXiv for years and hasn’t been published, maybe there’s a reason — we haven’t completely abandoned the idea that a paper becomes more “official” when it’s refereed and published. But the right thing to do in this situation is what we did with an earlier paper with an incorrect proof — replaced the paper on arXiv with a placeholder saying it was inaccurate, and issued a public announcement. So why didn’t we do that? Probably because we were constantly in a state of feeling like we had a line on fixing the paper, and we wanted to update it with a correct version. I don’t actually think that’s a great reason — but that was the reason.
  • When you break a bone it never exactly sets back the same way. And I think, having gotten this wrong before, I find it hard to be as self-assured about it as I am about most things I write. It’s long and it’s grainy and it has a lot of moving parts. But we have checked it as much as it’s possible for us to check it, over a long period of time. We understand it and we think we haven’t missed anything and so we think it’s correct now. And there’s no real alternative to putting it out into the world and saying we think it’s correct now.

Little did I know (real analysis edition)

I just finished teaching Math 521, undergraduate real analysis. I first took on this course as an emergency pandemic replacement, and boy did I not know how much I would like teaching it! You get a variety of students — our second and third year math majors, econ majors aiming for top Ph.D. programs, financial math people, CS people — students learning analysis for all kinds of reasons.

A fun thing about teaching outside my research area is encountering weird little facts I don’t know at all — facts which, were they about algebra and of comparable importance, obscurity, and size, I imagine I would just know. For instance, I was talking about the strategy of the Riemann integral, before launching into the formal definition, as “you are trying to find a sequence of step functions which are getting closer and closer to f, because step functions are the ones you a priori know how to integrate.” But do Riemann-integrable functions actually have sequences of step functions converging to them uniformly? No! It turns out the class of functions which are uniform limits of step functions is called the regulated functions, and an equivalent characterization is that the right and left limits f(x+) and f(x-) exist for every x.
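A standard example (my own illustration, not one from the class): f(x) = sin(1/x) on (0,1] is bounded and Riemann integrable, but f(0+) does not exist, so f is not regulated and hence not a uniform limit of step functions. A quick numerical sketch:

```python
import math

# f(x) = sin(1/x) on (0,1] (set f(0) = 0): bounded, Riemann integrable,
# but not regulated -- the right limit f(0+) does not exist.
def f(x):
    return math.sin(1.0 / x) if x > 0 else 0.0

# Along x_n = 1/((n + 1/2)*pi) -> 0, f alternates between +1 and -1,
# so on EVERY interval (0, delta) f attains values near both 1 and -1.
xs = [1.0 / ((n + 0.5) * math.pi) for n in range(2, 40)]
vals = [f(x) for x in xs]
assert max(vals) > 0.999 and min(vals) < -0.999

# A step function s is constant, say = c, on some interval (0, delta);
# there sup|f - s| >= max(|1 - c|, |-1 - c|) >= 1, for any c at all.
# So no sequence of step functions converges to f uniformly.
c = 0.3  # a sample value for s on (0, delta); any c gives error >= 1
assert max(abs(1 - c), abs(-1 - c)) >= 1
print("sin(1/x) oscillates all the way to 0: integrable but not regulated")
```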


Compactness as groupwork

Compactness is one of the hardest things in the undergrad curriculum to teach, I find. It’s very hard for students to grasp that the issue is not “is there a finite collection of open sets that covers K” but rather “for every collection of open sets that covers K, some finite subcollection covers K.” This year I came up with a new metaphor. I asked the students to think about how, when a professor assigns a group project, there’s always a very small subgroup of the students who does all the work. That’s sort of how compactness works! Yes, the professor assigned infinitely many open sets to the project, but actually most of them are not really contributing. And it doesn’t matter if some small set of students could do the project; what matters is that some small group among the students assigned to the project is going to end up doing it! And this seems to happen — my students nodded in agreement — no matter how the professor forms the group.


There’s only one thing that I know how to do well

Last week I moderated (virtually) a discussion at Stanford between my poetry friend Stephanie Burt and my category theory friend Emily Riehl, on the topic of “identity” — specifically the question of how, in lyric poetry and in mathematics, one addresses the complex topic of what we do when we identify; whether this means “identifying with” a character in a song or poem or story, or identifying two objects which are not equal but which, in Poincaré’s phrase, we “call by the same name.”

What I realized after the fact is that, as in so many other matters, the most succinct formulation is in a They Might Be Giants lyric:

There’s only one thing that I know how to do well
I’ve often been told that you only can do what you know how to do well
And that’s be you,
Be what you’re like,
Be like yourself

Surely this points to three different notions that appeared in the discussion:

  • “be you” — to say that you are you is to assert equality
  • “be what you’re like” — that is, have exactly the properties that you have and no others — an assertion of indiscernibility
  • “be like yourself” — this is the assertion of relation (here denoted a “likeness”) between two entities that licenses us, following Poincaré, in calling them by the same name — that is, an assertion of equivalence

Here’s YouTube of the discussion:

And here’s YouTube of the They Might Be Giants song, “Whistling in the Dark.” I remember seeing them play this in the fall of 1989, at the Paradise Rock Club, before the album came out, a song nobody had heard. A revelation!


Why won’t anyone teach her math?

Lots of discussion in my feeds about this Daily Princetonian piece, “Why won’t anyone teach me math?” by first-year student Abigail Rabieh. She just took Math 202, an intro to linear algebra, and the experience was so lousy she’s done with Princeton math for good. That’s bad!

So what was wrong with Rabieh’s class?

“Though I passed MAT 202 class just fine, my experience in it was miserable. The way the course was run did not at all set up students to succeed — or even learn math. For example, though we were provided with practice problems to prepare for our exams, we were never given solutions. My class consistently begged my professor for these, yet all he could say was that not providing them was departmental policy, and it was out of his control.

This begs the question: what interest does a department have in making it impossible to study? Study materials are given so that students can learn the course material and prepare adequately for the exam. Solution sets are part of this: to properly learn, one needs to be able to identify their mistakes and understand why they are wrong. This struggle was reflected in our exam averages, which were, respectively, in the 50s, the 60s, and the 30s.

I am far from the only person who felt this way about my class. MAT 202 has an abysmal rating of 2.71 on during the spring 2020-2021 semester. The evaluations on the Office of the Registrar’s website are no better. Students described the course as “disheartening” and said they “lost a lot of respect for the Math department after taking this course.” The advice that came up again and again in many reviews was: “Don’t take this class unless you have to.”

A lot of math teachers instinctively reacted to this with defensiveness, and I was one of them. After all, what’s so bad here? You hand out practice problems for an exam because you want students to do the problems, not because you want them to read the solutions; the mechanism is that the student works all the problems they can and then asks in office hours or review session about the problems they couldn’t do. I don’t think it’s bad to include solutions, but I would never say that not doing so makes it “impossible to study.” Student evals, well — the literature on their unreliability is so vast and I’m so lazy that I’ll only just vaguely gesture at it. And of course, most people taking Math 202 are not taking it for intellectual broadening, as Rabieh admirably was; they are taking it because somebody told them they had to. That makes the evaluations impossible to compare with those for a course people take on purpose. And as for those exam scores, well — a median in the 30s is too low, that would mean I’d made the exam much too hard. A median in the 60s, on the other hand, seems fine to me, an indication that I’d written a test with real challenges but which people could mostly do.

But you know what? Our students, especially our first year students, don’t know that unless we tell them! A student who got into Princeton, or for that matter a student who got into UW-Madison, has probably never gotten a 60/100 on a test in their entire life. No wonder it’s demoralizing!

What we have here, as they say, is a failure to communicate. Rabieh came away feeling like her teacher didn’t care whether she learned linear algebra. I’m sure that’s not the case. But I think we often don’t explicitly demonstrate that care in our classrooms. It makes a difference! We are asking the students to care about our stuff and if we want them to respond, we have to show them that we care about their stuff. What do I mean by that, explicitly? I mean that if we think the median score on an exam is going to be in the 60s, we should tell students in advance that we expect that and talk about our reasons for writing the exam that way! I mean that we should ask for student input on how the course is going before the semester is over — three weeks in, send out a survey asking for suggestions, and then talk to the class about the feedback you got, showing them you want their input while you can still make use of it. It means that if you teach a crappy lecture one day — it happens! — be open about that the next time and talk about how you intend to present a better one next time. And I feel like these are mostly “things I already do,” which could just be self-flattery on my part, so let me add this: it might be that showing students we care could mean making nice prepared slides like the professors in all their non-math classes do, instead of just showing up and writing on the blackboard. (Doing this would be a huge change for me and it exhausts me to think about it — but would it be the right thing to do?)

We don’t really talk about this stuff when we talk about teaching. We mostly talk about content and logistics; in what order should we present the material, how much should we cover, how many quizzes should we give, what should our grading policy be, should we hand out solution sets for practice problems? That’s how we think about what good teaching is, and that’s how our students think about what good teaching is, and that’s why that’s the language Rabieh reached for in her article when she wanted to explain why she had such a bad time. But I don’t think it’s actually the problem.

I’ll bet her teacher did care. Most of us do. But it often doesn’t show; let’s say it out loud! And strive for a classroom where we’re working as partners towards a goal, not just trying to get to the end without feeling like we’ve failed.


“God, as the Pythagoreans said, is a geometer — but not an algebraist”

A remark of Simone Weil, which I learned from Karen Olsson’s book The Weil Conjectures.

In the original, “Pour moi, je pense bien que Dieu, selon la parole pythagoricienne, est un géomètre perpétuel – mais non pas un algébriste.” Here’s the letter to her brother from which this is taken.

I wanted to write more in Shape about the complicated moral weights people assigned to the difference/tension between algebra and geometry. I wrote a little about what Poincaré thought about it; he saw the two subjects as representing two temperaments, both indispensable, though any given mathematician might possess them in different proportions.

But then there is this, from S.I. Segal’s “Topologists in Hitler’s Germany”

“the … Nazi movement saw “truly German” mathematics as intuitive and tied to nature, often geometrical, and certainly not axiomatic. Axiomatics, “logic chopping”, too great abstraction, was Franco-Jewish.”

It is grimly funny to imagine the Nazi ideologues locating in abstract algebra and the insolubility of certain equations in radicals the ultimate origins of rootless cosmopolitanism.

Pandemic blog 24: enter the gamma

I blogged last week about how to think about “R_0,” the constant governing epidemic growth, when different people in the network had different transmissibility rates.

Today, inspired by Kai Kupferschmidt’s article in Science, I took another look at what happens when the transmission rates vary a lot among people. And I learned something new! So let me set that down.

First of all, let me make one point which is silly but actually has mathematical content. Suppose 90% of the entire population is entirely immune to the disease, and the other 10% each encounter 20 people, sharing extensive air with each one. Since only 2 of those 20 are going to be susceptible, the dynamics of this epidemic are the same as that of an epidemic with an R_0 of 2. So if you look at the exponential growth at the beginning of the epidemic, you would say to yourself “oh, the growth factor is 2, so that’s R_0, we should hit herd immunity at about 50% and end up covering about 80% of the population,” but no, because the population that’s relevant to the epidemic is only 10% of the total population! So, to your surprise, the epidemic would crest at 5% prevalence and die out with only 8% of people having been infected.
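The numbers in that silly example come from iterating the standard final-size relation x = 1 - e^{-R_0 x}; a few lines of Python make the arithmetic explicit (just a check, nothing deep):

```python
import math

# Fixed point of x = 1 - exp(-R0*x): the final size of a homogeneous
# epidemic with growth factor R0 (the classical SIR final-size relation).
def final_size(R0, iters=200):
    x = 0.5  # any starting point in (0,1) converges for R0 > 1
    for _ in range(iters):
        x = 1.0 - math.exp(-R0 * x)
    return x

x = final_size(2.0)
print(round(x, 3))  # about 0.797: ~80% of the susceptible group

# But if only 10% of the population is susceptible at all, an epidemic
# with the *same* early growth factor 2 touches only ~8% of everyone.
print(round(0.10 * x, 3))
```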

So extreme heterogeneity really matters — the final spread of the epidemic can be completely decoupled from R_0 (if what we mean by R_0 is the top eigenvalue like last time, which measures the early-epidemic exponential rate of spread.)

In my last post, I included a graph of how spread looked in heterogeneous populations generated by 6×6 matrices I chose randomly, and the graph showed that the top eigenvalue and the eventual spread were strongly coupled to each other. But if you choose a random 6×6 matrix the entries are probably not actually going to be that far apart! So I think this was a little misleading. If the transmissibility has a really long tail, things may be different, as the silly example shows. What follows is a somewhat less silly example.

The model of heterogeneity used in this famous paper seems to be standard. You take transmissibility to be a random variable drawn from a gamma distribution with mean B and shape parameter k. (I had to look up what this was!) The variance is B^2/k. As k goes to infinity, this approaches a variable which is exactly B with probability 1, but for k close to 0, the variable is often near zero but occasionally much larger than B. Superspreaders!

Just like in the last post, we are going to completely jettison reality and make this into a static problem about components of random graphs. Once you start putting in rare high-transmission events, I am less confident that these two models stay coupled together, but since the back-of-the-envelope stuff I’m doing here seems to conform with what the epidemiologists are getting, let’s go with it. In case you don’t feel like reading all the way to the end, the punchline is that on these kinds of models, you can have early exponential growth that looks like R_0 is 2 or 3, but an epidemic that peters out with a very small portion of people infected; the “herd immunity is closer than we think” scenario, as seen in this preprint of Gomes et al.

Let’s also stick with the “rank 1” case because it’s what’s in the paper I linked and there are already interesting results there. Write X for our gamma-distributed random variable.

Then, sticking with the notation from the last post, the mean number of transmissions per person, the “average R_0”, is

(\mathbf{E} X)^2 = B^2

(I guess I wrote the last post in terms of matrices, where the average R_0 was just the sum of the entries of the matrix A, or \mathbf{1}^T A \mathbf{1}; here the “matrix” A should be thought of as a rank 1 thing w w^T where w is a vector with entries sampled from X.)

The top eigenvalue is just the trace of the matrix, since all the other eigenvalues are 0, and that is

{\mathbf E} X^2 = B^2(1+1/k).

Note already that this is a lot bigger than the average R_0 when k is small! In particular, there are lots of random graphs of this type which have a giant component but average degree < 2; that’s because they have a lot of isolated vertices, I suppose.
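If you want to sanity-check those two moments, here's a quick Monte Carlo using the standard library's gamma sampler (the values of B and k here are just illustrative):

```python
import random

# Gamma variable with mean B and shape k has scale theta = B/k.
# Check the moments used above: E X = B and E X^2 = B^2 (1 + 1/k).
random.seed(0)
B, k = 0.45, 0.5
theta = B / k
N = 200_000
xs = [random.gammavariate(k, theta) for _ in range(N)]  # (shape, scale)

m1 = sum(xs) / N
m2 = sum(x * x for x in xs) / N
print(m1, B)                    # empirical vs. exact mean
print(m2, B * B * (1 + 1 / k))  # empirical vs. exact second moment
```

The smaller k gets, the heavier the tail, and the bigger the gap between E X^2 (which controls the eigenvalue) and (E X)^2 (which controls the average R_0).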

So what’s the size of the giant component in a graph like this? As always we are solving an integral equation

f = 1 - e^{-Af}

for a function f on measure space, where A is the “matrix” expressing the transmission. In fact, a function on measure space is just a random variable, and the rank-1 operator A sends Y to E(XY)X. The rank-1-ness means we can turn this into a problem about real numbers instead of random variables; we know Af = aX for some real number a; applying A to both sides of the above equation we then have

aX = \mathbf{E}(X(1-e^{-aX}))X,

so

a =  \mathbf{E}(X(1-e^{-aX})).

But the latter expectation is something you can explicitly compute for a gamma-distributed variable! It just involves doing some integrals, which I rarely get to do! I’ll spare you the computation but it ends up being

a = B\left(1-(1+aB/k)^{-(k+1)}\right)

which you can just solve for a, and then compute E(1-e^{-aX}) if you want to know the total proportion of the population in the giant component. If k is really small — and Adam Kucharski, et al, back in April, wrote it could be as low as 0.1 — then you can get really small giant components even with a fast exponential rate. For instance, take B = 0.45 and k = 0.1; you get a top eigenvalue of 2.2, not inconsistent with the growth rates we saw for unimpeded COVID, but only 7.3% of the population touched by the infection! Another way to put it is that if you introduce the infection to a random individual, the chance of an outbreak is only 7%. As Lloyd-Smith says in the Nature paper, this is a story that makes “disease extinction more likely and outbreaks rarer but more explosive.” Big eigenvalue decouples from eventual spread.
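Here's the whole computation in a few lines of Python. The fixed-point equation for a comes from evaluating E(X(1-e^{-aX})) with the gamma moment generating function, and then the touched fraction is E(1-e^{-aX}) = 1-(1+aB/k)^{-k}:

```python
import math

# Rank-1 gamma model: solve a = E[X(1 - exp(-aX))] for gamma X with
# mean B and shape k.  Via the gamma MGF this becomes
#   a = B * (1 - (1 + a*B/k)**(-(k+1))),
# and the fraction of the population in the giant component is
#   E[1 - exp(-aX)] = 1 - (1 + a*B/k)**(-k).
def outbreak(B, k, iters=500):
    a = B  # starting guess; the iteration contracts toward the fixed point
    for _ in range(iters):
        a = B * (1.0 - (1.0 + a * B / k) ** -(k + 1))
    eigenvalue = B * B * (1.0 + 1.0 / k)    # early exponential growth rate
    touched = 1.0 - (1.0 + a * B / k) ** -k  # eventual fraction infected
    return eigenvalue, touched

lam, touched = outbreak(B=0.45, k=0.1)
print(round(lam, 1))      # ~2.2: looks like unimpeded COVID growth
print(round(touched, 3))  # yet only ~7% of the population is touched
```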

(By the way, Kucharski’s book, The Rules of Contagion, is really good — already out in the UK, coming out here in the US soon — I blurbed it!)

What’s going on here, of course, is that with k this low, your model is that the large majority of people participate in zero interactions of the type likely to cause transmission. Effectively, it’s not so different from the silly example we started with, where 90% of the population enjoyed natural immunity but the other 10% were really close talkers. So having written all this, I’m not sure I needed to have done all those integrals to make this point. But I find it soothing to while away an hour doing integrals.

I don’t know whether I think k=0.1 is realistic, and of course, as Adam K. explained to me by email, who is a superspreader may change with time and context; so 7% is probably too low, since it’s not like once the infection “uses up” the superspreaders there can’t possibly be any more. Probably the variance of propensity to transmit over time should either actually be modeled dynamically or proxied by letting X be a less strongly skewed distribution representing “time average of propensity to transmit” or something like that.

In any event, this does make me feel much more favorable towards the idea that unmitigated spread would end up infecting less than half of the population, not 80% or more. (It does not make me favorable towards unmitigated spread.)


Pandemic blog 20: R_0, random graphs, and the L_2 norm

People are talking about R_0. It’s the number that you wish were below 1. Namely: it is the number of people, on average, that a carrier of SARS-CoV-2 infects during the course of their illness. All the interventions we’re undertaking are designed to shrink that number. Because if it’s bigger than 1, the epidemic grows exponentially until it infects a substantial chunk of the population, and if it’s smaller, the epidemic dies out.

But not everybody has the same R_0! According to some of the epidemiologists writing about COVID, this matters. It matters, for instance, to the question of how far into the population the infection gets before it starts to burn itself out for lack of new susceptible hosts (“herd immunity”) and to the question of how much of the population eventually falls ill.

Here’s an easy way to see that heterogeneity can make a difference. Suppose your population consists of two towns of the same size with no intermunicipal intercourse whatsoever. The first town has an R_0 of 1.5 and the second an R_0 of 0.3. Then the mean R_0 of the whole population is 0.9. But this epidemic doesn’t die out; it spreads to cover much of the population of Contagiousville.

I learned an interesting way to think about this heuristically from Tim Gowers on Twitter:

You can read Tim’s interesting thread yourself, but here’s the main idea. Say your population has size n. You make a graph out of the pandemic by placing an edge between vertices i and j if one of the corresponding people infects the other. (Probably better to set this up in a directed way, but I didn’t.) Or maybe slightly better to say: you place an edge if person i and person j interact in a manner such that, were either to enter the interaction infected, both would leave that way. If one person in this graph gets infected, the reach of the infection is the connected component of the corresponding vertex. So how big is that component?

The simplest way to set this up is to connect each pair of vertices with probability c/n, all such choices made independently. This is an Erdos-Renyi random graph. And the component structure of this graph has a beautiful well-known theory; if c > 1, there is a giant component which makes up a positive proportion of the vertices, and all other components are very small. The size of this component is nx, where x is the unique positive number such that

x = 1 - e^{-cx}.

If c < 1, on the other hand, there is no big component, so the pandemic is unlikely to reach much of the population. (Correspondingly, the equation above has no nonzero solution.)
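If you want to see the Erdos-Renyi fact in action, here's a small union-find simulation (my own sketch, not Tim's): build a graph with about cn/2 random edges on n vertices, find the biggest component, and compare with the fixed point of x = 1 - e^{-cx}:

```python
import math
import random

# Simulate an Erdos-Renyi-style graph with ~c*n/2 uniformly random edges
# (edge probability ~c/n) and return the giant component's share of vertices.
def giant_fraction(n, c, seed=1):
    random.seed(seed)
    parent = list(range(n))

    def find(i):  # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for _ in range(int(c * n / 2)):  # expected edge count of G(n, c/n)
        a, b = find(random.randrange(n)), find(random.randrange(n))
        if a != b:
            parent[a] = b

    sizes = {}
    for i in range(n):
        r = find(i)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values()) / n

c = 2.0
x = 0.5
for _ in range(100):
    x = 1.0 - math.exp(-c * x)  # the theoretical giant-component fraction

print(giant_fraction(20000, c), x)  # the two numbers should be close
```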

It is fair to be skeptical of this model, which is completely static and doesn’t do anything fancy, but let me just say this — the most basic dynamic model of epidemic spread, the SIR model, has an endstate where the proportion of the population that’s been infected is the unique positive x such that

x = 1 - e^{-R_0x}.

Which looks pretty familiar!

Now what happens if you want to take into account that the population isn’t actually an undifferentiated mass? Let’s say, for instance, that your county has a red town and a blue town, each with population n/2. Two people in the red town have a probability of 2/n of being connected, while two people in the blue town have a probability of just 1/n of being connected, and a red-blue pair is connected with probability 1/n. (This kind of random graph is called a stochastic block model, if you want to go look at papers about it.) So the typical red-town person is going to infect 1 fellow red-towner and 0.5 blue-towners, for an R_0 of 1.5, while the blue-towner is going to have an R_0 of 1.

Here’s the heuristic for figuring out the size of the big component. Suppose x is the proportion of the red town in the big component of the graph, and y is the proportion of the blue town in the big component. Take a random red person; what’s the chance they’re in the big component? Well, the chance they’re not connected to any of the xn/2 red-towners in the big component is

(1-2/n)^{xn/2} = e^{-x}

(oh yeah did I mention that n was infinity?) and the chance that they’re not connected to any of the blue-towners in the big component is

(1-1/n)^{yn/2} = e^{-(1/2)y}

so all in all you get

x = 1 - e^{-(x + (1/2)y)}

and by the same token you would get

y = 1-e^{-((1/2)x + (1/2)y)}

and now you have two equations that you can solve for x and y! In fact, you find x = 47% and y = 33%. So just as you might expect, the disease gets farther in the spreadier town.
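The pair of equations can be solved by straightforward fixed-point iteration; here's a few lines of Python just to check the numbers above:

```python
import math

# Jointly iterate the two fixed-point equations for the red/blue towns:
#   x = 1 - exp(-(x + y/2)),   y = 1 - exp(-(x/2 + y/2))
x, y = 0.5, 0.5
for _ in range(200):
    x, y = (1.0 - math.exp(-(x + 0.5 * y)),
            1.0 - math.exp(-(0.5 * x + 0.5 * y)))

print(round(x, 2), round(y, 2))  # ~0.47 in the red town, ~0.33 in the blue
```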

And you might notice that what we’re doing is just matrix algebra! If you think of (x,y) as a vector v, we are solving

v = \mathbf{1} - e^{-Av}

where “exponentiation” of a vector is interpreted coordinatewise. You can think of this as finding a fixed point of a nonlinear operator on vectors.

When does the outbreak spread to cover a positive proportion of the population? There’s a beautiful theorem of Bollobás, Janson, and Riordan that tells you: you get a big component exactly when the largest eigenvalue λ of A, the so-called Perron-Frobenius eigenvalue, is larger than 1. In the case of the matrix studied above, the two eigenvalues are about 1.31 and 0.19. You might also note that in the early stages of the epidemic, when almost everyone in the network is susceptible, the spread in each town will be governed by repeated multiplication of a small vector by A, and the exponential rate of growth is thus also going to be given by λ.
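For a 2×2 symmetric matrix the eigenvalues come straight from the quadratic formula; here's the check for the red/blue town matrix (a quick illustration):

```python
import math

# The symmetric "town matrix" from above: A = [[1, 0.5], [0.5, 0.5]].
# Eigenvalues of a 2x2 symmetric matrix via trace and determinant.
a11, a12, a22 = 1.0, 0.5, 0.5
tr = a11 + a22
det = a11 * a22 - a12 * a12
disc = math.sqrt(tr * tr - 4.0 * det)
lam1, lam2 = (tr + disc) / 2.0, (tr - disc) / 2.0

print(round(lam1, 2), round(lam2, 2))  # ~1.31 and ~0.19; outbreak since 1.31 > 1
```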

It would be cool if the big eigenvalue also told you what proportion of the vertices are in the giant component, but that’s too much to ask for. For instance, we could just replace A with a diagonal matrix with 1.31 and 0.19 on the diagonal; then the first town gets 43% infected and the second town, completely disconnected from the first, gets 0.

What is the relationship between the Perron-Frobenius eigenvalue and the usual “mean R_0” definition? The eigenvalue can be thought of as

\max_{v} v^T A v / v^T v

while the average R_0 is exactly

\mathbf{1}^T A \mathbf{1} / n

where 1 is the all-ones vector. So we see immediately that λ is bounded below by the average R_0, but it really can be bigger; indeed, this is just what we see in the two-separated-towns example we started with, where R_0 is smaller than 1 but λ is larger.

I don’t see how to work out any concise description of the size of the giant component in terms of the symmetric matrix, even in the simplest cases. As we’ve seen, it’s not just a function of λ. The very simplest case might be that where A has rank 1; in other words, you have some division of the population into equal sized boxes, and each box has its own R_0, and then the graph is constructed in a way that is “Erdos-Renyi but with a constraint on degrees” — I think there are various ways to do this but the upshot is that the matrix A is rank 1 and its (i,j) entry is R_0(i) R_0(j) / C, where C is the sum of the R_0 over all the boxes. The eigenvalues of A are all zero except for the big one λ, which is equal to the trace, which is

\mathbf{E} R_0^2 / \mathbf{E} R_0

or, if you like, mean(R_0) + variance(R_0)/mean(R_0); so if the average R_0 is held fixed, this gets bigger the more R_0 varies among the population.
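The identity here is just E(R_0^2) = mean^2 + variance in disguise; a one-liner check on a made-up population of R_0 values:

```python
# Rank-1 case: top eigenvalue = E[R0^2] / E[R0] = mean + variance/mean.
# Quick check on an arbitrary (made-up) population of R0 values.
r = [0.2, 0.2, 0.3, 0.5, 3.0]  # mostly low spreaders, one superspreader
n = len(r)
mean = sum(r) / n
var = sum((x - mean) ** 2 for x in r) / n  # population variance
second = sum(x * x for x in r) / n         # E[R0^2]

assert abs(second / mean - (mean + var / mean)) < 1e-12
print(second / mean, mean)  # the eigenvalue exceeds the average R0 when R0 varies
```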

And if you look back at that Wikipedia page about the giant component, you’ll see that this is the exact threshold they give for random graphs with specified degree distribution, citing a 2000 paper of Molloy and Reed. Or if you look at Lauren Meyers’s 2005 paper on epidemic spread in networks, you will find the same threshold for epidemic outbreak in section 2. (The math here is descended from work of Molloy-Reed and this much-cited paper of Newman, Strogatz, and Watts.) Are typical models of “random graphs with specified degree distribution” built to have rank 1 in this sense? I think so — see e.g. this sentence in Newman-Strogatz-Watts: “Another quantity that will be important to us is the distribution of the degree of the vertices that we arrive at by following a randomly chosen edge. Such an edge arrives at a vertex with probability proportional to the degree of that vertex.”

At any rate, even in this rank 1 case, even for 2×2 matrices, it’s not clear to me how to express the size of the giant component except by saying it’s a nonzero solution of v = 1 - e^{-Av}. Does the vector v have anything to do with the Perron-Frobenius eigenvector? Challenge for the readers: work this out!
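Even without a closed form, the nonzero solution is easy to find numerically: just iterate the map v ↦ 1 - e^{-Av} from a positive starting point. A minimal sketch, with a made-up 2×2 matrix for two weakly connected towns:

```python
import numpy as np

def giant_component_fraction(A, iters=2000):
    """Iterate v <- 1 - exp(-A v) from a positive starting point.

    When the top eigenvalue of A exceeds 1, this converges to the
    nonzero solution; v[i] is the chance that a type-i vertex lies
    in the giant component.
    """
    v = np.full(A.shape[0], 0.5)
    for _ in range(iters):
        v = 1 - np.exp(-A @ v)
    return v

# Two towns with little cross-contact (illustrative numbers):
# town 1 alone has R_0 = 2, town 2 alone has R_0 = 0.8.
A = np.array([[2.0, 0.1],
              [0.1, 0.8]])
v = giant_component_fraction(A)

# v really is a fixed point, and the high-R_0 town is hit harder.
assert np.allclose(v, 1 - np.exp(-A @ v))
assert v[0] > v[1] > 0
```

Interestingly, even though town 2 on its own is subcritical, the weak coupling to town 1 gives it a positive share of the giant component.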

I did try a bunch of randomly chosen 6×6 matrices and plot the overall size of the giant component against λ, and this is what I got:

The blue line shows the proportion of the vertices that get infected if the graph were homogeneous with parameter λ. Which makes me think that thinking of λ as a good proxy for R_0 is not a terrible idea; it seems like a summary statistic of A which is pretty informative about the giant component. (This graph suggests maybe that among graphs with a given λ, the homogeneous one actually has the biggest giant component? Worth checking.)
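For anyone who wants to redo the experiment, here’s roughly the computation I mean; the choice of random symmetric 6×6 matrices with uniform entries is just one illustrative way to generate examples, not necessarily the distribution used for the chart.

```python
import numpy as np

rng = np.random.default_rng(0)

def giant_fraction(A, iters=3000):
    # Nonzero solution of v = 1 - exp(-A v) by fixed-point iteration;
    # with equal-sized boxes the overall infected fraction is the mean of v.
    v = np.full(A.shape[0], 0.5)
    for _ in range(iters):
        v = 1 - np.exp(-A @ v)
    return v.mean()

def homogeneous_fraction(lam):
    # The blue line: the homogeneous graph is the 1x1 case.
    return giant_fraction(np.array([[lam]]))

pairs = []
for _ in range(50):
    B = rng.uniform(0, 1, (6, 6))      # illustrative entry distribution
    A = (B + B.T) / 2                  # symmetrize
    lam = np.max(np.linalg.eigvalsh(A))
    pairs.append((lam, giant_fraction(A)))

# Plotting pairs against the curve lam -> homogeneous_fraction(lam)
# gives the comparison described above.  All these matrices are
# supercritical, so every giant component is nontrivial:
assert all(l > 1 and 0 < f < 1 for l, f in pairs)
```

Checking whether the homogeneous fraction really is an upper bound at fixed λ, as the chart suggests, is exactly the "worth checking" question.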

I should hasten to say that there’s a lot of interest in the superspreader phenomenon, where a very small (probability -> 0) set of vertices has a very large (superlinear in n) number of contacts. Meyers works out a bunch of cases like this and I think they are not well modeled by what I’m talking about here. (Update: I wrote another post about this! Indeed, I think the tight connection shown in the chart between λ and size of giant component is not going to persist when there’s extreme heterogeneity of degree.)

A more technical note: the result of Bollobas et al is much more general; there’s no reason the vertices have to be drawn from finitely many towns of equal size; you can instead have the types of vertices drawn from whatever probability space M you like, and then have the probability of an edge between a vertex x and a vertex y be W(x,y) for some symmetric function W on M^2; nowadays this is called the “graphon” point of view. Now the matrix is replaced by an operator on functions:

A_W f(x) = \int_M f(y) W(x,y) \, dy,

the probability g(x) that a vertex of type x is in the giant component is a solution of the integral equation

g = 1 - e^{-A_W g}

and a giant component exists just when the operator norm ||A_W||_2 is greater than 1. This is the kind of analysis you’d want to use if you wanted to really start taking geography into account. For instance, take the vertices to be random points in a disc and let W(x,y) be a decreasing function of |x-y|, modeling a network where infection likelihood is a decreasing function of distance. What is the norm of the operator A_W in this case? I’ll bet my harmonic analyst friends know, if any of them are reading this. But I do not.

Update: Now I know, because my harmonic analyst friend Brian Street told me it’s just the integral of W(x,y) over y, which is the same for all x (well, at least it is if we’re on the whole of R^d.) Call that number V. He gave a very nice Fourier-theoretic argument but now that I know the answer I’m gonna explain it in terms of the only part of math I actually understand, 2×2 matrices. Here’s how it goes. In this model, each vertex has the same expected number of neighbors, namely that integral V above. But to say every vertex has the same expected number of neighbors is to say that 1 is an eigenvector for A. If 1 were any eigenvector other than Perron-Frobenius, it would be orthogonal to Perron-Frobenius, which it can’t be because both have positive entries, so it is Perron-Frobenius, so λ = V.

In fact I had noticed this fact in one of the papers I looked at while writing this (that if the matrix had all row-sums the same, the long-term behavior didn’t depend on the matrix) but didn’t understand why until just this minute. So this is kind of cool — if the kind of heterogeneity the network exhibits doesn’t cause different kinds of vertices to have different mean degree, you can just pretend the network is homogeneous with whatever mean R_0 it has. This is a generalization of the fact that two towns with no contact which have the same R_0 can be treated as one town with the same R_0 and you don’t get anything wrong.
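Here’s a quick numerical check of that principle, with a made-up 3×3 matrix whose row sums are all equal to the same V: the top eigenvalue is V, and the giant-component equation has a constant solution, exactly the one you’d get from a homogeneous network with R_0 = V.

```python
import numpy as np

# A symmetric nonnegative matrix with every row sum equal to V = 1.5
# (made-up entries for illustration).
V = 1.5
A = np.array([[1.0, 0.3, 0.2],
              [0.3, 0.7, 0.5],
              [0.2, 0.5, 0.8]])
assert np.allclose(A.sum(axis=1), V)

# 1 is an eigenvector with eigenvalue V, and since its entries are
# positive it must be Perron-Frobenius, so lambda = V.
lam = np.max(np.linalg.eigvalsh(A))
assert np.isclose(lam, V)

# The equation v = 1 - exp(-A v) then has a constant solution:
# the scalar fixed point of x = 1 - exp(-V x), i.e. the answer
# for a homogeneous network with R_0 = V.
x = 0.5
for _ in range(2000):
    x = 1 - np.exp(-V * x)
v = np.full(3, x)
assert np.allclose(v, 1 - np.exp(-A @ v))
```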


Pandemic blog 11: Why do curves bend?

When you plot the number of reported deaths from COVID on a log scale you get pictures that look like this one, by John Burn-Murdoch at the Financial Times:

A straight line represents exponential growth, which is what one might expect to see in the early days of a pandemic according to baby models. You’ll note that the straight line doesn’t last very long, thank goodness; in just about every country the line starts to bend. Why are COVID deaths concave? There are quite a few possible reasons.

  1. Suppression is working. When a pandemic breaks out, countries take measures to suppress transmission, and people take their own measures over and above what their governments do. (An analysis by Song Gao of our geography department of cellphone location data shows that Wisconsinites’ median distance traveled from home decreased by 50% even before the governor issued a stay-at-home order.) That should slow the rate of exponential growth — hopefully, flip it to exponential decay.
  2. Change in reporting. Maybe we’re getting better at detecting COVID deaths; if on day 1, only half of COVID deaths were reported as such, while now we’re accurately reporting them all, we’d see a spuriously high slope at the beginning of the outbreak. (The same reasoning applies to the curve for number of confirmed cases; at the beginning, the curve grows faster than the true number of infections as testing ramps up.)
  3. COVID is getting less lethal. This is the whole point of “flattening the curve” — with each week that passes, hospitals are more ready, we have more treatment options and fuller knowledge of which of the existing treatments best suits which cases.
  4. Infection has saturated the population. This is the most controversial one. The baby model (where by baby I mean SIR) tells you that the curve bends as the number of still-susceptible people starts to really drop. The consensus seems to be we’re nowhere near that yet, and almost everyone (in the United States, at least) is still susceptible. But I guess one should be open to the possibility that there are way more asymptomatic people than we think and half the population is already infected; or that for some reason a large proportion of the population carries natural immunity so 1% of population infected is half the susceptible population.
  5. Heterogeneous growth rate. I came across this idea in a post by a physicist (yeah, I know, but it was a good post!) which I can’t find now — sorry, anonymous physicist! There’s not one true exponential growth rate; different places may have different slopes. Just for the sake of argument, suppose a bunch of different locales all start with the same number of deaths, and suppose the rate of exponential growth is uniformly distributed between 0 and 1; then the total deaths at time t is \int^1_0 e^{\alpha t} d \alpha which is (1/t)(e^t - 1). The log of that function has positive second derivative; that is, it tends to make the curve bend up rather than down! That makes sense; with heterogeneous rates of exponential growth, you’ll start with some sort of average of the rates but before long the highest rate will dominate.
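The convexity claim in item 5 is easy to verify numerically: the second derivative of log((e^t - 1)/t) is positive, and the log-scale slope creeps up toward the top growth rate as that rate takes over.

```python
import numpy as np

# Total deaths at time t when growth rates are uniform on [0, 1]:
# the integral of e^{alpha t} over alpha in [0, 1] is (e^t - 1) / t.
t = np.linspace(0.5, 100, 2000)
log_total = np.log((np.exp(t) - 1) / t)

# Discrete second derivative of the log: positive everywhere, so on
# a log scale this curve bends up, not down.
assert np.all(np.diff(log_total, 2) > 0)

# The log-slope climbs toward the maximal growth rate, 1, as the
# fastest-growing locale comes to dominate the total.
slope = np.diff(log_total) / np.diff(t)
assert slope[0] < 0.7 < slope[-1] < 1
```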

I’m sure I’ve skipped some curve-bending factors; propose more in comments!


Commutativity, with fractions

Talking to AB about multiplying rational numbers. She understands the commutativity of multiplication of integers perfectly well. But I had forgotten that commutativity in the rational setting is actually conceptually harder! That four sixes is six fours you can conceptualize by thinking of a rectangular array, or something equivalent to that. But the fact that seven halves is the same thing as seven divided by two doesn’t seem as “natural” to her. (Is that even an instance of commutativity? I think of the first as 7 x 1/2 and the second as 1/2 x 7.)
