## Pandemic blog 31: farmers’ market

First trip back to the Westside Community Market, which in ordinary times is an every Saturday morning trip for me. It feels like a model for people just sitting down and figuring out how to arrange for people to do the things they want to do in a way that minimizes transmission. We don’t have to eliminate every chance for someone to get COVID. If we cut transmissions to a third of what it would otherwise be, that doesn’t mean a third as many people get COVID — it means the pandemic dies out instead of exploding. Safe is impossible, safer is important!

They’ve reorganized everything so that the stalls are farther apart. Everybody’s wearing masks, both vendors and customers. There are several very visible hand-washing stations. Most of the vendors now take credit cards through Square, and at least one asked me to pay with Venmo. It’s easy for people to keep their distance (though the vendors told me it was more crowded earlier in the morning.)

And of course it’s summer, the fields are doing what the fields do, the Flyte Farm blueberries, best in Wisconsin, are ready — I bought five pounds, and four containers of Murphy Farms cottage cheese. All you need is those two things for the perfect Wisconsin summer meal.

## Pandemic blog 30: opening day

I have been generally feeling: it is OK to start relaxing restrictions on in-person contact, because there seems some decent chance that barring the most infectiogenic scenarios might be enough to keep outbreaks small and manageable. And that still might be true, in some contexts; in Dane County, we had a big spike of cases when the bars re-opened, and when the bars shut down again, the case spike went away, and hasn’t come back, though people are certainly out and about. But statewide, cases are growing and growing, and the situation is much worse in the South. I would fight back if you said this was a predictable consequence; nothing about this disease is predictable with any confidence. It could have worked. But I wouldn’t fight you if you said it was an expectable consequence, the consequence you thought most likely.

Similarly, if you rigorously jettison everyone with a demonstrated ability to play baseball from your team, and sign a collection of promising young players but keep them off the roster in order to avoid starting their service time, and then put that team on the field against major league competition, you might find that the nobodies and never-weres and used-to-bes find it within themselves to go on a scrappy “Why not?” run of success; or you might, as an expectable consequence, give up eight doubles and get beat 13-2.


## Pandemic blog 28: Smart Restart

What’s going to happen to school in the fall? Madison schools are talking about having two days on, three days off, with half the kids going on Monday and Tuesday and half on Thursday and Friday.

I think if we open anything it has to be schools. And it seems pretty clear we are not not opening anything. If there’s no school, how are people with young kids supposed to work?

There’s decent evidence that young kids are less likely to get infected with COVID, less likely to spread it, and drastically less likely to become seriously ill from it — so I don’t think it’s crazy to hope that you can bring kids together in school without taking too much of a hit to public health.

What about college? UW-Madison is proposing a “Smart Restart” plan in which students come back to dorms, on-campus instruction starts in a limited fashion (big classes online, small classes taught in big rooms with students sitting far apart.) A lot of my colleagues are really unhappy with the fact that we’re proposing to bring students back to campus at all. I’m cautiously for it. I am not going to get into the details because more detail-oriented people than me have thought about them a lot, and I’m just sitting here blogging on Independence Morning.

But three non-details:

1. Given the high case numbers among college students in Madison now, just from normal college student socializing, it’s not clear to me that asking them to come to class is going to make a notable difference in how much COVID spread the student population generates.
2. Any plan that says “Protect the most vulnerable populations, like old people, but let young healthy people do what they want” that doesn’t include “vulnerable people who can’t safely do their jobs because their workplaces are full of young, healthy, teeming-with-COVID people get paid to stay home” is not a plan. We can’t make 65-year-old teachers teach in person and we can’t make diabetic teachers teach in person and we can’t make teachers with elderly relatives in the household teach in person.
3. Any plan for re-opening schools has to have pretty clear guidelines for what triggers a reverse of course. We cannot figure out what’s safe, or “safe enough,” by pure thought; at some point we have to try things. But a re-opening plan that doesn’t include a re-closing plan is also not a plan.


## Pandemic blog 27: Impossible Stroganoff

We are down to once every three weeks at Trader Joe’s (I fill two whole carts with stuff, it’s an undertaking) which we supplement with other kinds of food purchases in between. I’m unhappy with the conditions industrial meatpackers are putting their workers in, so I’m picking up meat curbside at Conscious Carnivore, our local meat-from-nearby-farms-you’re-supposed-to-feel-vaguely-OK-about supplier. We get shipments from Imperfect Foods, which I’m a little concerned is some kind of hedge-fund-backed grocery store destruction scheme but helps fill in the gaps. And the really exciting food news is that Impossible Foods, the substitute meat company I learned about from my old math team buddy Mike Eisen, is now delivering!

This stuff is by far the most realistic fake ground beef in existence. We served Impossible cheeseburgers at CJ’s bar mitzvah and a member of the ritual committee was so convinced he was ready to pull the fire alarm and evacuate the shul for de-trayfing. Since I don’t cook milk and meat together in the house, there are a lot of dishes that just don’t happen at home. And one of them — which I’ve been waiting years to make — is my favorite dish from childhood, “hamburger stroganoff.”

This dish comes from Peg Bracken’s protofeminist masterpiece, the I Hate To Cook Book. Is that book forgotten by younger cooks? It’s decidedly out of style. Maybe it was even out of style then; my mom, I always felt, made hamburger stroganoff grudgingly. It involves canned soup. But it is one of the most delicious things imaginable and readers, the Impossible version is almost indistinguishable from the real thing.

Here’s Peg Bracken’s obituary, which leads with the famous lines from this famous recipe:

Start cooking those noodles, first dropping a bouillon cube into the noodle water. Brown the garlic, onion and crumbled beef in the oil. Add the flour, salt, paprika and mushrooms, stir, and let it cook five minutes while you light a cigarette and stare sullenly at the sink.

And here’s the recipe itself. If you’re vegetarianizing this, you can just use cream of mushroom soup for the cream of chicken and replace the bouillon with some salt (or veggie stock, if that’s your bag.)

8 ounces Noodles, uncooked
1 cube Beef Bouillon
1 clove Garlic, minced
1/3 cup Onion, chopped
2 tablespoons Cooking oil
1 pound Ground Beef
2 tablespoons Flour
2 teaspoons Salt
1/2 teaspoon Paprika
6 ounces Mushrooms
1 can Cream of Chicken Soup, undiluted
1 cup Sour Cream
1 handful Parsley, chopped

Start cooking those noodles, first dropping a bouillon cube into the noodle water.
Brown the garlic, onion, and crumbled beef in the oil.
Add the flour, salt, paprika, and mushrooms, stir, and let it cook five minutes while you light a cigarette and stare sullenly at the sink.
Then add the soup and simmer it–in other words, cook on low flame under boiling point–ten minutes.
Now stir in the sour cream–keeping the heat low, so it won’t curdle–and let it all heat through.
To serve it, pile the noodles on a platter, pile the Stroganoff mix on top of the noodles, and sprinkle chopped parsley around with a lavish hand.

## Pandemic blog 26: writing

I was supposed to turn in a manuscript for my new (general-audience) book last week. It’s not finished. But I’ve written a lot of it during the pandemic. Of course it is very hard to be “productive” in the usual way, with the kids here all day. But being in the house all day is somehow the right setup for book-writing, maybe because it so clearly separates life now from my usual life where I am neither staying in the house nor writing a book.

I think the pages I’m putting out are good. As usual, the process of writing is causing me to learn new things faster than I can put them in the book and indeed there is now too much material to actually go in the book, but that means, at any rate, I can be selective and pick just the best.


## Pandemic blog 24: enter the gamma

I blogged last week about how to think about “R_0,” the constant governing epidemic growth, when different people in the network had different transmissibility rates.

Today, inspired by Kai Kupferschmidt’s article in Science, I took another look at what happens when the transmission rates vary a lot among people. And I learned something new! So let me set that down.

First of all, let me make one point which is silly but actually has mathematical content. Suppose 90% of the entire population is entirely immune to the disease, and the other 10% each encounter 20 people, sharing extensive air with each one. Since only 2 of those 20 are going to be susceptible, the dynamics of this epidemic are the same as that of an epidemic with an R_0 of 2. So if you look at the exponential growth at the beginning of the epidemic, you would say to yourself “oh, the growth factor is 2, so that’s R_0, we should hit herd immunity at about 50% and end up covering about 80% of the population,” but no, because the population that’s relevant to the epidemic is only 10% the total population! So, to your surprise, the epidemic would crest at 5% prevalence and die out with only 8% of people having been infected.
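
Here is that arithmetic as a quick sketch in Python. Nothing here comes from a real model; `final_size` and the variable names are mine, and the only inputs are the made-up 90% immune / 20 contacts numbers from the paragraph above.

```python
import math


def final_size(r0, iterations=200):
    """Positive root of x = 1 - exp(-r0 * x), found by fixed-point iteration."""
    x = 1.0  # start at "everyone infected" so we land on the nonzero root
    for _ in range(iterations):
        x = 1.0 - math.exp(-r0 * x)
    return x


susceptible_share = 0.10                # the 10% who are not immune
r0_observed = 20 * susceptible_share    # 2 susceptible contacts out of 20

herd_threshold = 1.0 - 1.0 / r0_observed   # share of the susceptible group
attack_rate = final_size(r0_observed)      # share of the susceptible group

crest_overall = susceptible_share * herd_threshold
final_overall = susceptible_share * attack_rate

print(f"naive guess: herd immunity at {herd_threshold:.0%}, final spread {attack_rate:.0%}")
print(f"actual: crest at {crest_overall:.1%}, final spread {final_overall:.1%} of everyone")
```

The naive reading and the actual outcome use the same growth factor; the only difference is the factor of 0.10 for the share of people who can be infected at all.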

So extreme heterogeneity really matters — the final spread of the epidemic can be completely decoupled from R_0 (if what we mean by R_0 is the top eigenvalue like last time, which measures the early-epidemic exponential rate of spread.)

In my last post, I included a graph of how spread looked in non-heterogeneous populations generated by 6×6 random matrices I chose randomly, and the graph showed that the top eigenvalue and the eventual spread were strongly coupled to each other. But if you choose a random 6×6 matrix the entries are probably not actually going to be that far apart! So I think this was a little misleading. If the transmissibility has a really long tail, things may be different, as the silly example shows. What follows is a somewhat less silly example.

The model of heterogeneity used in this famous paper seems to be standard. You take transmissibility to be a random variable drawn from a gamma distribution with mean B and shape parameter k. (I had to look up what this was!) The variance is B^2/k. As k goes to infinity, this approaches a variable which is exactly B with probability 1, but for k close to 0, the variable is often near zero but occasionally much larger than B. Superspreaders!
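
To get a feel for what small k does, here is a sketch that just draws from the distribution, assuming the gamma is parametrized by shape k and scale B/k so that the mean is B. It uses `random.gammavariate` from the Python standard library; the helper name, the seed, the sample size, and the cutoffs "below B/10" and "above 5B" are arbitrary choices of mine.

```python
import random


def sample_transmissibility(mean_b, shape_k, rng):
    # random.gammavariate takes (shape, scale); scale = mean / shape gives mean B
    return rng.gammavariate(shape_k, mean_b / shape_k)


rng = random.Random(0)   # fixed seed so the run is repeatable
B = 1.0
results = {}
for k in (10.0, 1.0, 0.1):
    draws = [sample_transmissibility(B, k, rng) for _ in range(100_000)]
    mean = sum(draws) / len(draws)
    near_zero = sum(d < 0.1 * B for d in draws) / len(draws)
    far_above = sum(d > 5.0 * B for d in draws) / len(draws)
    results[k] = (mean, near_zero, far_above)
    print(f"k={k:>4}: mean={mean:.2f}  below B/10: {near_zero:.1%}  above 5B: {far_above:.1%}")
```

For k = 10 almost nobody is near zero or far above the mean; for k = 0.1 most draws are close to zero and a few percent are more than five times the mean, while the mean itself stays near B.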

Just like in the last post, we are going to completely jettison reality and make this into a static problem about components of random graphs. I am less confident once you start putting in rare high-transmission events that these two models stay coupled together, but since the back-of-the-envelope stuff I’m doing here seems to conform with what the epidemiologists are getting, let’s go with it. In case you don’t feel like reading all the way to the end, the punchline is that on these kinds of models, you can have early exponential growth that looks like R_0 is 2 or 3, but an epidemic that peters out with a very small portion of people infected; the “herd immunity is closer than we think” scenario, as seen in this preprint of Gomes et al.

Let’s also stick with the “rank 1” case because it’s what’s in the paper I linked and there are already interesting results there. Write X for our gamma-distributed random variable.

Then, sticking with the notation from the last post, the mean number of transmissions per person, the “average R_0”, is

$(\mathbf{E} X)^2 = B^2$

(I guess I wrote the last post in terms of matrices, where the average R_0 was just the sum of the entries of the matrix A, or $\mathbf{1}^T A \mathbf{1}$; here the “matrix” A should be thought of as a rank 1 thing w w^T where w is a vector with entries sampled from X.)

The top eigenvalue is just the trace of the matrix, since all the other eigenvalues are 0, and that is

${\mathbf E} X^2 = B^2(1+1/k)$.

Note already that this is a lot bigger than the average R_0 when k is small! In particular, there are lots of random graphs of this type which have a giant component but average degree < 2; that’s because they have a lot of isolated vertices, I suppose.

So what’s the size of the giant component in a graph like this? As always we are solving an integral equation

$f = 1 - e^{-Af}$

for a function f on a measure space, where A is the “matrix” expressing the transmission. In fact, a function on a measure space is just a random variable, and the rank-1 operator A sends Y to E(XY)X. The rank-1-ness means we can turn this into a problem about real numbers instead of random variables; we know Af = aX for some real number a; applying A to both sides of the above equation we then have

$aX = \mathbf{E}(X(1-e^{-aX}))X$

or

$a = \mathbf{E}(X(1-e^{-aX}))$

But the latter expectation is something you can explicitly compute for a gamma-distributed variable! It just involves doing some integrals, which I rarely get to do! I’ll spare you the computation but it ends up being

$a = B(1-(1+aB/k)^{-(k+1)})$

which you can just solve for a, and then compute E(1-e^{-aX}) if you want to know the total proportion of the population in the giant component. If k is really small — and Adam Kucharski, et al, back in April, wrote it could be as low as 0.1 — then you can get really small giant components even with a fast exponential rate. For instance, take B = 0.45 and k = 0.1; you get a top eigenvalue of 2.2, not inconsistent with the growth rates we saw for unimpeded COVID, but only 7.3% of the population touched by the infection! Another way to put it is that if you introduce the infection to a random individual, the chance of an outbreak is only 7%. As Lloyd-Smith says in the Nature paper, this is a story that makes “disease extinction more likely and outbreaks rarer but more explosive.” Big eigenvalue decouples from eventual spread.
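
If you don’t want to do the integrals by hand, here is a sketch of the computation, assuming X is a gamma variable with shape k and scale B/k (my reading of “mean B and shape parameter k”). Its moment generating function is $(1 - tB/k)^{-k}$, so differentiating in t and setting t = -a gives $\mathbf{E}(Xe^{-aX}) = B(1+aB/k)^{-(k+1)}$. The equation for a then reads $a = B(1-(1+aB/k)^{-(k+1)})$, and the size of the giant component is $\mathbf{E}(1-e^{-aX}) = 1-(1+aB/k)^{-k}$. The function name `solve_a` and the plain fixed-point iteration are my choices, not anything from the papers.

```python
def solve_a(B, k, iterations=500):
    """Iterate a -> B * (1 - (1 + a*B/k) ** -(k + 1)), starting from a = B.

    Starting at a = B (above the root) makes the iteration decrease
    towards the positive solution instead of getting stuck at a = 0.
    """
    a = B
    for _ in range(iterations):
        a = B * (1.0 - (1.0 + a * B / k) ** (-(k + 1.0)))
    return a


B, k = 0.45, 0.1
average_r0 = B * B                               # (E X)^2
top_eigenvalue = B * B * (1.0 + 1.0 / k)         # E X^2
a = solve_a(B, k)
giant_fraction = 1.0 - (1.0 + a * B / k) ** (-k)   # E(1 - exp(-a X))

print(f"average R_0    = {average_r0:.2f}")
print(f"top eigenvalue = {top_eigenvalue:.2f}")
print(f"giant fraction = {giant_fraction:.1%}")
```

With B = 0.45 and k = 0.1 this gives a top eigenvalue of about 2.2 and a giant component of a bit over 7% of the population, which is the decoupling described above.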

(By the way, Kucharski’s book, The Rules of Contagion, is really good — already out in the UK, coming out here in the US soon — I blurbed it!)

What’s going on here, of course, is that with k this low, your model is that the large majority of people participate in zero interactions of the type likely to cause transmission. Effectively, it’s not so different from the silly example we started with, where 90% of the population enjoyed natural immunity but the other 10% were really close talkers. So having written all this, I’m not sure I needed to have done all those integrals to make this point. But I find it soothing to while away an hour doing integrals.

I don’t know whether I think k=0.1 is realistic, and of course, as Adam K. explained to me by email, who is a superspreader may change with time and context; so 7% is probably too low, since it’s not like once the infection “uses up” the superspreaders there can’t possibly be any more. Probably the variance of propensity to transmit over time should either actually be modeled dynamically or proxied by letting X be a less strongly skewed distribution representing “time average of propensity to transmit” or something like that.

In any event, this does make me feel much more favorable towards the idea that unmitigated spread would end up infecting less than half of the population, not 80% or more. (It does not make me favorable towards unmitigated spread.)


## Pandemic blog 23: why one published research finding is misleading

I really like John Ioannidis: his famous 2005 article “Why Most Published Research Findings are False” probably did more than any other paper to draw attention to the problems with blind use of p-value certification in medicine.

But he has a preprint up on medrxiv today that is really poorly done, so much so that it made me mad, and when I get mad, I blog.

Ioannidis has been saying for months that the COVID-19 pandemic, while bad, is not as bad as people think. Obviously this is true for some value of “people.” And I think he is right that the infection fatality rate, or IFR, is in most places not going to be as high as the 0.9% figure the March 16 Imperial College model used as an estimate. But Ioannidis has a much stronger claim; he thinks the IFR, in general, is going to be about 1 or 2 in a thousand, and in order to make that case, he has written a paper about twelve studies which show a high prevalence of antibodies in populations where not very many people have died. High prevalence of infection + few deaths = low IFR.

I think I am especially irritated with this paper because I agree that the IFR now looks lower than it looked two months ago, and I think it’s important to have good big-picture analysis to back that intuition up — and this isn’t it. There’s a lot wrong with this paper but I just want to focus on one thing that jumped out at me as especially wrong, and that is Ioannidis’s treatment of the Netherlands antibody study.

That study found that in blood donors, all ages 18-72 (Ioannidis says <70, not sure why), 2.7% showed immunity. Ioannidis reports this, then makes the following computation. About 15m of the 17m people in the Netherlands are under 70, so this suggests roughly 400,000 people in that age group had been infected, of whom only 344 had died at the time of the study, giving an IFR of a mere 0.09%. Some plague! Ioannidis puts this number in his table and counts it among those of which he writes “Seven of the 12 inferred IFRs are in the range 0.07 to 0.20 (corrected IFR of 0.06 to 0.16) which are similar to IFR values of seasonal influenza.”

But of course the one thing we really do know about COVID, in this sea of uncertainty, is that it’s much, much more deadly to old people. The IFR for people under 70 is not going to be a good estimate for the overall IFR.

I hashed out some numbers — it looks to me like, using the original March 16 Imperial College estimates, derived from Wuhan, you would derive an infection fatality rate of about 0.47% among people age 20-70. There are about 10.8m Dutch people in that range (I am taking all this from Wikipedia data on the age distribution of the Netherlands) so if 2.7% of those are infected, that’s about 300,000 infections, and 344 deaths in that group is about 0.11%. Lower than the Imperial estimate! But four times lower, not ten times lower.

What about the overall IFR? That, after all, is what Ioannidis’s paper is about. If you count the old people who died, the toll as of April 15 wasn’t 344, it was over 3100. If the 2.7% prevalence rate were accurate as a population-wide estimate, the total number of infected people would be about 460,000, for an IFR of 0.67%, more than seven times higher than the figure Ioannidis reports (though still a little lower than the 0.9% figure in the Imperial paper.) Now we definitely don’t know that the infection rate among old Dutch people is the same as it is in the overall population! But even if you suppose that every single person over 70 in the country is infected, that gets you to a little over 2 million infections, and an IFR of 0.15%. In other words, the number reported by Ioannidis is substantially lower than the theoretical minimum the IFR could actually be. And of course, it’s not the case that everybody over 70 already had COVID-19 in the middle of April. (For one thing, that would make the IFR for over-70s only slightly higher than the IFR overall, which contradicts the one thing about COVID we really know!)
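
The arithmetic in the last few paragraphs, as a sketch. All the inputs are the rounded figures quoted in this post (15m under 70, 10.8m aged 20-70, 17m total, 2.7% prevalence, 344 and 3100 deaths); the variable names are mine, and the outputs can differ in the last digit from the rounded numbers in the text.

```python
def ifr(deaths, infections):
    """Infection fatality rate as a plain ratio."""
    return deaths / infections


prevalence = 0.027             # antibody prevalence among blood donors
pop_total = 17_000_000
pop_under_70 = 15_000_000      # figure used in the preprint's computation
pop_20_to_70 = 10_800_000      # from the age distribution of the Netherlands
deaths_under_70 = 344
deaths_all_ages = 3_100        # "over 3100" as of April 15

# The preprint's number: under-70 deaths over under-70 infections.
ifr_preprint = ifr(deaths_under_70, prevalence * pop_under_70)

# Same deaths, but only counting people aged 20-70 as the denominator.
ifr_20_to_70 = ifr(deaths_under_70, prevalence * pop_20_to_70)

# All deaths, with 2.7% prevalence applied to the whole population.
ifr_overall = ifr(deaths_all_ages, prevalence * pop_total)

# Floor on the overall IFR: pretend every single person over 70 was infected.
max_infections = (pop_total - pop_under_70) + prevalence * pop_under_70
ifr_floor = ifr(deaths_all_ages, max_infections)

for label, value in [("preprint, under 70", ifr_preprint),
                     ("ages 20-70", ifr_20_to_70),
                     ("overall at 2.7%", ifr_overall),
                     ("floor, all over-70s infected", ifr_floor)]:
    print(f"{label:30s} {value:.2%}")
```

With these rounded inputs the floor comes out near 0.13%, in the same ballpark as the 0.15% above (the exact value depends on how many over-70s you count), and either way it sits above the 0.09% in the preprint’s table.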

There’s no fraud here, I hasten to say. Ioannidis tells you exactly what he’s doing. But he’s doing the wrong thing.


## Pandemic blog 22: departures

This is the current list of every flight leaving Madison today:

(For those who don’t live here, under normal conditions there would be about 50 departures a day to 15-20 destinations.)


## Pandemic blog 21: we all look like freedom fighters

Wisconsin is slowly loosening its emergency health restrictions. Stores are allowed to open as long as they’re not in enclosed malls and no more than five customers are inside at once. People are moving around more than they were in April (though still quite a bit less than they were at the beginning of March):

The streets aren’t empty; last Sunday I walked over to Tim Yu’s house to drop off a copy of an oral history of REM I knew he wanted to read, and everyone in the neighborhood was outside; I probably socialized more, sidewalk to porch, than I do on an ordinary Sunday. AB and I did a 25-mile ride, a new record for her, and there were plenty of people out on the bikepaths, unmasked. I played Frisbee with CJ at Wingra Park and a big group of teenagers was hanging out in close quarters, looking very much not like a family group.

On the other hand, at Trader Joe’s today, shoppers were making a visible effort to stay away from one another, and I counted only four people without masks. I overheard the Russian guy who works there say to one of his co-workers, “We all look like freedom fighters.”

I see this as a reasonable response to increased knowledge about the nature of the disease. Sustained indoor propinquity seems to be the dominant mechanism of transmission.

Freedom fighters! The Wisconsin Supreme Court has struck down the state stay-at-home order issued by Governor Evers, except not exactly, because in order to find a reading of the statute that supported the outcome they asserted they had no beef with the governor’s order itself, only its implementation and enforcement by Andrea Palm, the State Health Secretary (or rather the State Health Secretary Designee because the Senate doesn’t feel like confirming anyone.) Anyway, as of now, nobody knows what the rules are. Some bars opened up and served crowds as normal. Seems like a bad idea. The smart political money in Wisconsin says this decision has nothing to do with COVID per se but is mostly an attempt to establish some precedent that the executive needs legislative approval to, well, execute things.

I don’t know what happens next. Maybe nothing. Stores were already open, people were already moving around. And large chunks of the state, including some of the places with the highest caseload like Green Bay, Kenosha, and Milwaukee, are still under county orders that the Supreme Court didn’t touch. Maybe people packing into newly open bars will create superspreading events and we’ll see a big wave of new cases and deaths in Waukesha and Platteville. And maybe they won’t! The main thing we know about COVID is we don’t know much about COVID. Why was there so much more spread in New York than there was in Chicago, and so much more in Chicago than in San Francisco? I don’t think there are any convincing answers. There’s graph theory in it, as in my last post, but it’s not just graph theory.

Wisconsin may very well not suffer any disastrous consequence from opening up with no real plan. But it’s hard to deny we’re taking a risk of a disastrous consequence. Let’s hope it doesn’t happen. That’s not a crazy hope. Most drunk drivers get home safe.

## Pandemic blog 20: R_0, random graphs, and the L_2 norm

People are talking about R_0. It’s the number that you wish were below 1. Namely: it is the number of people, on average, that a carrier of SARS-CoV-2 infects during the course of their illness. All the interventions we’re undertaking are designed to shrink that number. Because if it’s bigger than 1, the epidemic grows exponentially until it infects a substantial chunk of the population, and if it’s smaller, the epidemic dies out.

But not everybody has the same R_0! According to some of the epidemiologists writing about COVID, this matters. It matters, for instance, to the question of how far into the population the infection gets before it starts to burn itself out for lack of new susceptible hosts (“herd immunity”) and to the question of how much of the population eventually falls ill.

Here’s an easy way to see that heterogeneity can make a difference. Suppose your population consists of two towns of the same size with no intermunicipal intercourse whatsoever. The first town has an R_0 of 1.5 and the second an R_0 of 0.3. Then the mean R_0 of the whole population is 0.9. But this epidemic doesn’t die out; it spreads to cover much of the population of Contagiousville.

You can read Tim’s interesting thread yourself, but here’s the main idea. Say your population has size n. You make a graph out of the pandemic by placing an edge between vertices i and j if one of the corresponding people infects the other. (Probably better to set this up in a directed way, but I didn’t.) Or maybe slightly better to say: you place an edge if person i and person j interact in a manner such that, were either to enter the interaction infected, both would leave that way. If one person in this graph gets infected, the reach of the infection is the connected component of the corresponding vertex. So how big is that component?

The simplest way to set this up is to connect each pair of vertices with probability c/n, all such choices made independently. This is an Erdos-Renyi random graph. And the component structure of this graph has a beautiful well-known theory; if c > 1, there is a giant component which makes up a positive proportion of the vertices, and all other components are very small. The size of this component is nx, where x is the unique positive number such that

$x = 1 - e^{-cx}$.

If c < 1, on the other hand, there is no big component, so the pandemic is unlikely to reach much of the population. (Correspondingly, the equation above has no nonzero solution.)
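
You can watch this happen on an actual random graph. The sketch below builds an Erdos-Renyi graph by flipping a coin for each pair of vertices, tracks components with a small union-find, and compares the largest component with the positive solution of the equation above. The function names, the seed, and n = 1000 are arbitrary choices of mine; at that size the match is only approximate.

```python
import math
import random


def largest_component_fraction(n, c, rng):
    """Sample G(n, c/n) and return (size of largest component) / n."""
    parent = list(range(n))
    size = [1] * n

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    p = c / n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:            # edge between i and j
                ri, rj = find(i), find(j)
                if ri != rj:
                    if size[ri] < size[rj]:
                        ri, rj = rj, ri
                    parent[rj] = ri          # merge smaller into larger
                    size[ri] += size[rj]
    return max(size[find(i)] for i in range(n)) / n


def predicted_fraction(c, iterations=500):
    """Largest solution of x = 1 - exp(-c x); tends to 0 when c < 1."""
    x = 1.0
    for _ in range(iterations):
        x = 1.0 - math.exp(-c * x)
    return x


rng = random.Random(1)
simulated_big = largest_component_fraction(1000, 2.0, rng)
predicted_big = predicted_fraction(2.0)
simulated_small = largest_component_fraction(1000, 0.5, rng)

print(f"c = 2.0: simulated {simulated_big:.2f}, predicted {predicted_big:.2f}")
print(f"c = 0.5: simulated {simulated_small:.2f}, predicted {predicted_fraction(0.5):.2f}")
```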

It is fair to be skeptical of this model, which is completely static and doesn’t do anything fancy, but let me just say this — the most basic dynamic model of epidemic spread, the SIR model, has an endstate where the proportion of the population that’s been infected is the unique positive x such that

$x = 1 - e^{-R_0x}$.

Which looks pretty familiar!

Now what happens if you want to take into account that the population isn’t actually an undifferentiated mass? Let’s say, for instance, that your county has a red town and a blue town, each with population n/2. Two people in the red town have a probability of 2/n of being connected, while two people in the blue town have a probability of just 1/n of being connected, and a red-blue pair is connected with probability 1/n. (This kind of random graph is called a stochastic block model, if you want to go look at papers about it.) So the typical red-town person is going to infect 1 fellow red-towner and 0.5 blue-towners, for an R_0 of 1.5, while the blue-towner is going to have an R_0 of 1.

Here’s the heuristic for figuring out the size of the big component. Suppose x is the proportion of the red town in the big component of the graph, and y is the proportion of the blue town in the big component. Take a random red person; what’s the chance they’re in the big component? Well, the chance they’re not connected to any of the xn/2 red-towners in the big component is

$(1-2/n)^{xn/2} = e^{-x}$

(oh yeah did I mention that n was infinity?) and the chance that they’re not connected to any of the blue-towners in the big component is

$(1-1/n)^{yn/2} = e^{-(1/2)y}$

so all in all you get

$x = 1 - e^{-(x + (1/2)y)}$

and by the same token you would get

$y = 1-e^{-((1/2)x + (1/2)y)}$

and now you have two equations that you can solve for x and y! In fact, you find x = 47% and y = 33%. So just as you might expect, the disease gets farther in the spreadier town.
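
Solving the pair of equations numerically is a few lines. This is a sketch using plain fixed-point iteration, started from x = y = 1 so that it settles on the nonzero solution; the table `contacts` holds the expected number of contacts between towns (edge probability times the n/2 people in the other town), and the names are mine.

```python
import math

# Expected contacts: rows and columns are (red, blue).
# red-red: (2/n) * (n/2) = 1; red-blue and blue-blue: (1/n) * (n/2) = 0.5
contacts = [[1.0, 0.5],
            [0.5, 0.5]]


def giant_component_shares(matrix, iterations=1000):
    """Iterate v -> 1 - exp(-matrix v) coordinatewise, starting from all ones."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        v = [1.0 - math.exp(-sum(m_ij * v_j for m_ij, v_j in zip(row, v)))
             for row in matrix]
    return v


x, y = giant_component_shares(contacts)
print(f"red town: {x:.0%}, blue town: {y:.0%}")
```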

And you might notice that what we’re doing is just matrix algebra! If you think of (x,y) as a vector v, we are solving

$v = \mathbf{1} - e^{-Av}$

where “exponentiation” of a vector is interpreted coordinatewise. You can think of this as finding a fixed point of a nonlinear operator on vectors.

When does the outbreak spread to cover a positive proportion of the population? There’s a beautiful theorem of Bollobas, Janssen, and Riordan that tells you: you get a big component exactly when the largest eigenvalue λ of A, the so-called Perron-Frobenius eigenvalue, is larger than 1. In the case of the matrix studied above, the two eigenvalues are about 1.31 and 0.19. You might also note that in the early stages of the epidemic, when almost everyone in the network is susceptible, the spread in each town will be governed by repeated multiplication of a small vector by A, and the exponential rate of growth is thus also going to be given by λ.
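
For the red town / blue town example the matrix is small enough to check by hand, but here is a sketch anyway: the closed form for the eigenvalues of a symmetric 2×2 matrix, plus a loop that multiplies a small vector by A repeatedly to see the early growth rate settle down to λ. I am taking A to be the table of expected contacts from the example above; the function names are mine.

```python
import math

A = [[1.0, 0.5],
     [0.5, 0.5]]


def symmetric_2x2_eigenvalues(m):
    """Eigenvalues of [[a, b], [b, d]]: midpoint of the diagonal plus or minus a radius."""
    (a, b), (_, d) = m
    midpoint = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return midpoint + radius, midpoint - radius


def early_growth_ratio(m, steps=50):
    """Multiply a small vector by m repeatedly; return the last step's growth factor."""
    v = [1e-6, 1e-6]
    ratio = 0.0
    for _ in range(steps):
        w = [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]
        ratio = sum(w) / sum(v)
        v = w
    return ratio


lam_big, lam_small = symmetric_2x2_eigenvalues(A)
growth = early_growth_ratio(A)
print(f"eigenvalues: {lam_big:.2f} and {lam_small:.2f}")
print(f"per-generation growth after 50 steps: {growth:.4f}")
```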

It would be cool if the big eigenvalue also told you what proportion of the vertices are in the giant component, but that’s too much to ask for. For instance, we could just replace A with a diagonal matrix with 1.31 and 0.19 on the diagonal; then the first town gets 43% infected and the second town, completely disconnected from the first, gets 0.

What is the relationship between the Perron-Frobenius eigenvalue and the usual “mean R_0” definition? The eigenvalue can be thought of as

$\max_{v} v^T A v / v^T v$

while the average R_0 is exactly

$\mathbf{1}^T A \mathbf{1} / n$

where 1 is the all-ones vector. So we see immediately that λ is bounded below by the average R_0, but it really can be bigger; indeed, this is just what we see in the two-separated-towns example we started with, where R_0 is smaller than 1 but λ is larger.

I don’t see how to work out any concise description of the size of the giant component in terms of the symmetric matrix, even in the simplest cases. As we’ve seen, it’s not just a function of λ. The very simplest case might be that where A has rank 1; in other words, you have some division of the population into equal sized boxes, and each box has its own R_0, and then the graph is constructed in a way that is “Erdos-Renyi but with a constraint on degrees” — I think there are various ways to do this but the upshot is that the matrix A is rank 1 and its (i,j) entry is R_0(i) R_0(j) / C where C is the sum of the R_0 in each box. The eigenvalues of A are all zero except for the big one λ, which is equal to the trace, which is

$\mathbf{E} R_0^2 / \mathbf{E} R_0$

or, if you like, mean(R_0) + variance(R_0)/mean(R_0); so if the average R_0 is held fixed, this gets bigger the more R_0 varies among the population.
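
A quick numerical sanity check of that identity, as a sketch. The R_0 values per box are hypothetical numbers I made up; the matrix is built exactly as described, with (i,j) entry R_0(i) R_0(j) / C.

```python
def rank_one_matrix(r0_by_box):
    """Matrix with (i, j) entry r_i * r_j / C, where C is the sum of the r_i."""
    total = sum(r0_by_box)
    return [[ri * rj / total for rj in r0_by_box] for ri in r0_by_box]


r0_by_box = [0.2, 0.5, 1.0, 3.0]   # hypothetical per-box R_0 values
A = rank_one_matrix(r0_by_box)

n = len(r0_by_box)
trace = sum(A[i][i] for i in range(n))          # the one nonzero eigenvalue
row_sums = [sum(row) for row in A]              # each box's own R_0

mean = sum(r0_by_box) / n
mean_square = sum(r * r for r in r0_by_box) / n
variance = mean_square - mean * mean

print(f"trace               = {trace:.4f}")
print(f"E R_0^2 / E R_0     = {mean_square / mean:.4f}")
print(f"mean + var / mean   = {mean + variance / mean:.4f}")
print(f"average R_0         = {mean:.4f}")
```

The row sums of A recover the per-box R_0 values, so the average R_0 is the mean of the list, and the trace exceeds it by variance/mean.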

And if you look back at that Wikipedia page about the giant component, you’ll see that this is the exact threshold they give for random graphs with specified degree distribution, citing a 2000 paper of Molloy and Reed. Or if you look at Lauren Meyers’s 2005 paper on epidemic spread in networks, you will find the same threshold for epidemic outbreak in section 2. (The math here is descended from work of Molloy-Reed and this much-cited paper of Newman, Strogatz, and Watts.) Are typical models of “random graphs with specified degree distribution” built to have rank 1 in this sense? I think so — see e.g. this sentence in Newman-Strogatz-Watts: “Another quantity that will be important to us is the distribution of the degree of the vertices that we arrive at by following a randomly chosen edge. Such an edge arrives at a vertex with probability proportional to the degree of that vertex.”

At any rate, even in this rank 1 case, even for 2×2 matrices, it’s not clear to me how to express the size of the giant component except by saying it’s a nonzero solution of $v = 1 - e^{-Av}$. Does the vector v have anything to do with the Perron-Frobenius eigenvector? Challenge for the readers: work this out!
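If you want to poke at the challenge numerically, here is a rough sketch. It solves $v = 1 - e^{-Av}$ by plain fixed-point iteration starting from the all-ones vector, which decreases monotonically to the largest solution (so it lands on the nonzero one when there is one, and on zero otherwise). The 2×2 matrix and the function names are hypothetical, and this only computes v and the Perron-Frobenius vector side by side; it doesn't settle the question.

```python
import math

def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def giant_component_fractions(a, iters=100_000, tol=1e-13):
    """Iterate v <- 1 - exp(-A v), starting from v = (1, ..., 1)."""
    v = [1.0] * len(a)
    for _ in range(iters):
        new_v = [1.0 - math.exp(-x) for x in matvec(a, v)]
        if max(abs(p - q) for p, q in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v                      # may not have converged near the threshold

def perron_vector(a, iters=1000):
    """Perron-Frobenius eigenvector (max entry scaled to 1), by power iteration."""
    v = [1.0] * len(a)
    for _ in range(iters):
        w = matvec(a, v)
        top = max(w)
        v = [x / top for x in w]
    return v

A = [[2.0, 0.5],
     [0.5, 0.8]]                  # hypothetical symmetric 2x2 example

v = giant_component_fractions(A)
pf = perron_vector(A)

print("v               :", v)
print("v, rescaled     :", [x / max(v) for x in v])
print("Perron-Frobenius:", pf)
```

With equal sized boxes, the overall size of the giant component would just be the average of the entries of `v`.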

I did try a bunch of randomly chosen 6×6 matrices and plot the overall size of the giant component against λ, and this is what I got:

The blue line shows the proportion of the vertices that get infected if the graph were homogeneous with parameter λ. Which makes me think that thinking of λ as a good proxy for R_0 is not a terrible idea; it seems like a summary statistic of A which is pretty informative about the giant component. (This graph suggests maybe that among graphs with a given λ, the homogeneous one actually has the biggest giant component? Worth checking.)
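For anyone who wants to rerun this kind of experiment, here is a sketch. It is not the code behind the chart above: the way the random matrices are drawn (independent uniform entries with a random overall scale) is purely an assumption, as are all the function names. It compares the average of v for each random 6×6 matrix with the homogeneous value at the same λ.

```python
import math
import random

def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def top_eigenvalue(a, iters=2000):
    """Largest eigenvalue of a positive symmetric matrix, by power iteration."""
    v = [1.0] * len(a)
    lam = 0.0
    for _ in range(iters):
        w = matvec(a, v)
        lam = max(w)
        v = [x / lam for x in w]
    return lam

def giant_fractions(a, iters=20_000, tol=1e-12):
    """Largest solution of v = 1 - exp(-A v), by iteration from all ones."""
    v = [1.0] * len(a)
    for _ in range(iters):
        new_v = [1.0 - math.exp(-x) for x in matvec(a, v)]
        if max(abs(p - q) for p, q in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v                      # slow convergence when lambda is close to 1

def random_symmetric(k, rng, scale):
    """Assumed sampling scheme: symmetric, entries uniform on [0, scale]."""
    a = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            a[i][j] = a[j][i] = rng.uniform(0.0, scale)
    return a

rng = random.Random(0)
k = 6
rows = []
for _ in range(20):
    a = random_symmetric(k, rng, scale=rng.uniform(0.1, 1.0))
    lam = top_eigenvalue(a)
    size = sum(giant_fractions(a)) / k        # equal sized boxes: plain average
    homog = giant_fractions([[lam]])[0]       # homogeneous graph with parameter lambda
    rows.append((lam, size, homog))

for lam, size, homog in sorted(rows):
    print(f"lambda={lam:.3f}  size={size:.4f}  homogeneous={homog:.4f}")
```

Comparing the `size` and `homogeneous` columns row by row is one cheap way to start checking the "homogeneous is biggest" guess, though a table like this proves nothing.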

I should hasten to say that there’s a lot of interest in the superspreader phenomenon, where a very small (probability -> 0) set of vertices has a very large (superlinear in n) number of contacts. Meyers works out a bunch of cases like this and I think they are not well modeled by what I’m talking about here. (Update: I wrote another post about this! Indeed, I think the tight connection shown in the chart between λ and size of giant component is not going to persist when there’s extreme heterogeneity of degree.)

A more technical note: the result of Bollobas et al is much more general; there’s no reason the vertices have to be drawn from finitely many towns of equal size; you can instead have the types of vertices drawn from whatever probability space M you like, and then have the probability of an edge between a vertex x and a vertex y be W(x,y) for some symmetric function W on M^2; nowadays this is called the “graphon” point of view. Now the matrix is replaced by an operator on functions:

$A_Wf(x) = \int_M f(y)W(x,y)\,dy$,

the probability g(x) that a vertex of type x is in the giant component is a solution of the integral equation

$g = 1-e^{-A_W g}$

and a giant component exists just when the operator norm $||A_W||_2$ is greater than 1. This is the kind of analysis you’d want to use if you wanted to really start taking geography into account. For instance, take the vertices to be random points in a disc and let W(x,y) be a decreasing function of |x-y|, modeling a network where infection likelihood is a decreasing function of distance. What is the norm of the operator A_W in this case? I’ll bet my harmonic analyst friends know, if any of them are reading this. But I do not.

Update: Now I know, because my harmonic analyst friend Brian Street told me it’s just the integral of W(x,y) over y, which is the same for all x (well, at least it is if we’re on the whole of R^d.) Call that number V. He gave a very nice Fourier-theoretic argument but now that I know the answer I’m gonna explain it in terms of the only part of math I actually understand, 2×2 matrices. Here’s how it goes. In this model, each vertex has the same expected number of neighbors, namely that integral V above. But to say every vertex has the same expected number of neighbors is to say that 1 is an eigenvector for A, with eigenvalue V. If 1 were any eigenvector other than Perron-Frobenius, it would be orthogonal to Perron-Frobenius (A is symmetric, after all), which it can’t be because both have positive entries, so it is Perron-Frobenius, so λ = V.

In fact I had noticed this fact in one of the papers I looked at while writing this (that if the matrix had all row-sums the same, the long-term behavior didn’t depend on the matrix) but didn’t understand why until just this minute. So this is kind of cool — if the kind of heterogeneity the network exhibits doesn’t cause different kinds of vertices to have different mean degree, you can just pretend the network is homogeneous with whatever mean R_0 it has. This is a generalization of the fact that two towns with no contact which have the same R_0 can be treated as one town with the same R_0 and you don’t get anything wrong.
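A tiny sanity check of this, with a made-up symmetric 3×3 matrix whose rows all sum to V = 1.8 (the matrix and the function names are just for illustration). It checks that the all-ones vector is an eigenvector with eigenvalue V, and that the constant vector (c, c, c), where c is the homogeneous answer for R_0 = V, solves $v = 1 - e^{-Av}$.

```python
import math

def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def homogeneous_size(r0, iters=10_000):
    """Largest solution of c = 1 - exp(-r0 * c), by iteration from c = 1."""
    c = 1.0
    for _ in range(iters):
        c = 1.0 - math.exp(-r0 * c)
    return c

# Hypothetical symmetric matrix; every row sums to 1.8 but the entries vary a lot.
A = [[1.0, 0.5, 0.3],
     [0.5, 0.2, 1.1],
     [0.3, 1.1, 0.4]]
V = sum(A[0])

ones = [1.0, 1.0, 1.0]
print("A * ones =", matvec(A, ones))          # every entry is V, up to rounding

c = homogeneous_size(V)
v = [c, c, c]
residual = max(abs(vi - (1.0 - math.exp(-x))) for vi, x in zip(v, matvec(A, v)))
print("c =", c, " residual =", residual)      # residual should be essentially zero
```

So every type of vertex lands in the giant component with the same probability c, exactly as in one homogeneous town with R_0 = V.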