I blogged last week about how to think about “R_0,” the constant governing epidemic growth, when different people in the network had different transmissibility rates.

Today, inspired by Kai Kupferschmidt’s article in Science, I took another look at what happens when the transmission rates vary *a lot* among people. And I learned something new! So let me set that down.

First of all, let me make one point which is silly but actually has mathematical content. Suppose 90% of the entire population is entirely immune to the disease, and the other 10% each encounter 20 people, sharing extensive air with each one. Since only 2 of those 20 are going to be susceptible, the dynamics of this epidemic are the same as those of an epidemic with an R_0 of 2. So if you look at the exponential growth at the beginning of the epidemic, you would say to yourself “oh, the growth factor is 2, so that’s R_0, we should hit herd immunity at about 50% and end up covering about 80% of the population,” but *no*, because the population that’s relevant to the epidemic is only 10% of the total population! So, to your surprise, the epidemic would crest at 5% prevalence and die out with only 8% of people having been infected.
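Here is a quick numerical check of the arithmetic in the silly example. It is only a sketch: it assumes the standard SIR final-size relation z = 1 - e^{-R_0 z} and the usual herd immunity threshold 1 - 1/R_0, neither of which is spelled out above, and the function and variable names are mine.

```python
import math

def final_size(r0, iterations=200):
    """Solve z = 1 - exp(-r0 * z) by fixed-point iteration.

    This is the standard SIR final-size relation (an assumption here);
    for r0 > 1 the iteration converges to the positive solution.
    """
    z = 0.5  # any starting point in (0, 1] works when r0 > 1
    for _ in range(iterations):
        z = 1.0 - math.exp(-r0 * z)
    return z

r0 = 2.0                  # growth factor seen early in the epidemic
susceptible_share = 0.10  # only 10% of people can be infected at all

herd_threshold = 1.0 - 1.0 / r0  # share of the susceptible group
attack_rate = final_size(r0)     # share of the susceptible group

print(f"threshold among susceptibles:   {herd_threshold:.2f}")
print(f"final size among susceptibles:  {attack_rate:.2f}")
print(f"threshold in whole population:  {susceptible_share * herd_threshold:.3f}")
print(f"final size in whole population: {susceptible_share * attack_rate:.3f}")
```

The last two numbers are the ones that matter: scaling by the 10% who can actually be infected is what turns "50% and 80%" into "5% and 8%."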

So extreme heterogeneity really matters — the final spread of the epidemic can be completely decoupled from R_0 (if what we mean by R_0 is the top eigenvalue like last time, which measures the early-epidemic exponential rate of spread.)

In my last post, I included a graph of how spread looked in heterogeneous populations generated by randomly chosen 6×6 matrices, and the graph showed that the top eigenvalue and the eventual spread were strongly coupled to each other. But if you choose a random 6×6 matrix the entries are probably not actually going to be that far apart! So I think this was a little misleading. If the transmissibility has a really long tail, things may be different, as the silly example shows. What follows is a somewhat less silly example.

The model of heterogeneity used in this famous paper seems to be standard. You take transmissibility to be a random variable drawn from a gamma distribution with mean B and shape parameter k. (I had to look up what this was!) The variance is B^2/k. As k goes to infinity, this approaches a variable which is exactly B with probability 1, but for k close to 0, the variable is often near zero but occasionally much larger than B. Superspreaders!
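To get a feel for what small k does, here is a small sampling sketch using only the Python standard library. The value B = 0.45 and the cutoffs B/100 and 5B are just illustrative choices of mine; the one real fact used is that `random.gammavariate` takes (shape, scale), so mean B and shape k means scale B/k.

```python
import random

def sample_transmissibility(mean_b, shape_k, n, seed=0):
    """Draw n values from a gamma distribution with mean mean_b and shape shape_k.

    random.gammavariate takes (shape, scale) and mean = shape * scale,
    so the scale is mean_b / shape_k.
    """
    rng = random.Random(seed)
    scale = mean_b / shape_k
    return [rng.gammavariate(shape_k, scale) for _ in range(n)]

def mean_and_variance(xs):
    """Plain sample mean and (population) variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v

B = 0.45  # illustrative mean, not a fitted value
for k in (0.1, 1.0, 100.0):
    xs = sample_transmissibility(B, k, n=100_000)
    m, v = mean_and_variance(xs)
    near_zero = sum(x < B / 100 for x in xs) / len(xs)
    far_above = sum(x > 5 * B for x in xs) / len(xs)
    print(f"k={k:>5}: mean={m:.3f} var={v:.3f} (theory {B * B / k:.3f}) "
          f"P(X < B/100)={near_zero:.3f} P(X > 5B)={far_above:.3f}")
```

For large k nearly every sample sits close to B; for k = 0.1 a large share of the samples are essentially zero while a small share are many times B, which is the superspreader picture.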

Just like in the last post, we are going to *completely* jettison reality and make this into a static problem about components of random graphs. I am less confident once you start putting in rare high-transmission events that these two models stay coupled together, but since the back-of-the-envelope stuff I’m doing here seems to conform with what the epidemiologists are getting, let’s go with it. In case you don’t feel like reading all the way to the end, the punchline is that on these kinds of models, you can have early exponential growth that looks like R_0 is 2 or 3, but an epidemic that peters out with a very small portion of people infected; the “herd immunity is closer than we think” scenario, as seen in this preprint of Gomes et al.

Let’s also stick with the “rank 1” case because it’s what’s in the paper I linked and there are already interesting results there. Write X for our gamma-distributed random variable.

Then, sticking with the notation from the last post, the mean number of transmissions per person, the “average R_0”, is

$$\mathbf{E}(X)^2 = B^2.$$

(I guess I wrote the last post in terms of matrices, where the average R_0 was just the sum of the entries of the matrix A, or $\mathbf{1}^T A \mathbf{1}$; here the “matrix” A should be thought of as a rank 1 thing w w^T where w is a vector with entries sampled from X.)

The top eigenvalue is just the trace of the matrix, since all the other eigenvalues are 0, and that is

$$\mathbf{E}(X^2) = B^2 (1 + 1/k).$$

Note already that this is a lot bigger than the average R_0 when k is small! In particular, there are lots of random graphs of this type which have a giant component but average degree < 2; that’s because they have a lot of isolated vertices, I suppose.
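A tiny sketch of how far apart the two quantities get. It assumes the average R_0 is E(X)^2 = B^2 and the top eigenvalue is E(X^2) = B^2(1 + 1/k), which is what you get from mean B and variance B^2/k; the value B = 0.45 and the function names are illustrative choices of mine.

```python
def average_r0(mean_b):
    """E(X)^2 for the rank-1 model: mean transmissions per person (assumed formula)."""
    return mean_b ** 2

def top_eigenvalue(mean_b, shape_k):
    """E(X^2) = Var(X) + E(X)^2 = B^2/k + B^2 for a gamma with mean B, shape k."""
    return mean_b ** 2 * (1.0 + 1.0 / shape_k)

B = 0.45  # illustrative mean
for k in (0.1, 0.5, 1.0, 10.0, 1000.0):
    avg = average_r0(B)
    lam = top_eigenvalue(B, k)
    print(f"k={k:>6}: average R_0={avg:.4f}  top eigenvalue={lam:.4f}  "
          f"ratio={lam / avg:.1f}")
```

The ratio is 1 + 1/k, so as k grows the two numbers agree, and at k = 0.1 the eigenvalue is eleven times the average.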

So what’s the size of the giant component in a graph like this? As always we are solving an integral equation

$$f = 1 - e^{-Af}$$

for a function f on measure space, where A is the “matrix” expressing the transmission. In fact, a function on measure space is just a random variable, and the rank-1 operator A sends Y to **E**(XY)X. The rank-1-ness means we can turn this into a problem about real numbers instead of random variables; we know Af = aX for some real number a; applying A to both sides of the above equation we then have

$$aX = \mathbf{E}\left(X(1 - e^{-aX})\right) X$$

or

$$a = \mathbf{E}\left(X(1 - e^{-aX})\right).$$

But the latter expectation is something you can explicitly compute for a gamma-distributed variable! It just involves doing some integrals, which I rarely get to do! I’ll spare you the computation but it ends up being

$$a = B\left(1 - \left(1 + \frac{aB}{k}\right)^{-(k+1)}\right)$$

which you can just solve for a, and then compute **E**(1-e^{-aX}) if you want to know the total proportion of the population in the giant component. If k is really small — and Adam Kucharski, et al, back in April, wrote it could be as low as 0.1 — then you can get really small giant components even with a fast exponential rate. For instance, take B = 0.45 and k = 0.1; you get a top eigenvalue of 2.2, not inconsistent with the growth rates we saw for unimpeded COVID, but only 7.3% of the population touched by the infection! Another way to put it is that if you introduce the infection to a random individual, the chance of an outbreak is only 7%. As Lloyd-Smith says in the Nature paper, this is a story that makes “disease extinction more likely and outbreaks rarer but more explosive.” Big eigenvalue decouples from eventual spread.
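If you want to play with the numbers yourself, here is a rough sketch of the solve. It assumes the equation for a takes the form a = B(1 - (1 + aB/k)^{-(k+1)}), which is what I get from the gamma Laplace transform **E**(e^{-aX}) = (1 + aB/k)^{-k} (differentiate in a to get **E**(Xe^{-aX})), and it uses plain fixed-point iteration; the function names are mine and any root finder would do.

```python
def solve_a(mean_b, shape_k, iterations=10_000, tol=1e-12):
    """Fixed-point iteration for a = B * (1 - (1 + a*B/k) ** -(k+1)).

    Starting at a = B (an upper bound, since the right side is < B) avoids
    the trivial solution a = 0 whenever a positive solution exists.
    """
    a = mean_b
    for _ in range(iterations):
        new_a = mean_b * (1.0 - (1.0 + a * mean_b / shape_k) ** (-(shape_k + 1.0)))
        if abs(new_a - a) < tol:
            return new_a
        a = new_a
    return a

def giant_component_fraction(mean_b, shape_k):
    """E(1 - e^{-aX}) = 1 - (1 + a*B/k) ** -k for gamma X with mean B, shape k."""
    a = solve_a(mean_b, shape_k)
    return 1.0 - (1.0 + a * mean_b / shape_k) ** (-shape_k)

B, k = 0.45, 0.1
eigenvalue = B * B * (1.0 + 1.0 / k)  # E(X^2), the early growth factor
fraction = giant_component_fraction(B, k)
print(f"top eigenvalue: {eigenvalue:.2f}")
print(f"giant component fraction: {fraction:.3f}")
```

With B = 0.45 and k = 0.1 this should land near the 7% figure quoted above. Pushing k up toward 10 or 100 while keeping the eigenvalue fixed is a nice way to watch the coupling between growth rate and final size come back.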

(By the way, Kucharski’s book, *The Rules of Contagion*, is really good; it’s already out in the UK and coming out here in the US soon. I blurbed it!)

What’s going on here, of course, is that with k this low, your model is that the large majority of people participate in zero interactions of the type likely to cause transmission. Effectively, it’s not so different from the silly example we started with, where 90% of the population enjoyed natural immunity but the other 10% were really close talkers. So having written all this, I’m not sure I needed to have done all those integrals to make this point. But I find it soothing to while away an hour doing integrals.

I don’t know whether I think k=0.1 is realistic, and of course, as Adam K. explained to me by email, who is a superspreader may change with time and context; so 7% is probably too low, since it’s not like once the infection “uses up” the superspreaders there can’t possibly be any more. Probably the variance of propensity to transmit over time should either actually be modeled dynamically or proxied by letting X be a less strongly skewed distribution representing “time average of propensity to transmit” or something like that.

In any event, this does make me feel much more favorable towards the idea that unmitigated spread would end up infecting less than half of the population, not 80% or more. (It does not make me favorable towards unmitigated spread.)

Thank you for the interesting post! Btw, an empirical study that I found to be interesting was conducted among pupils and staff (as well as their relatives) of a school in France where the virus apparently was being transmitted for quite a while without being detected. It’s probably one of the rare real-life datasets we can get that could give us some hints as to how the virus spreads unmitigated. Roughly: among those who were tested, about 40% of the students and teachers were infected and about 60% of the other staff. Here is the link: https://www.medrxiv.org/content/10.1101/2020.04.18.20071134v1.full.pdf

I’m a little confused by the parametrization of transmissibility and the relationship to the “offspring distribution” of Lloyd-Smith et al (2005).

There, the gamma distribution of mean R0 and dispersion k is an underlying rate of total transmission from one person, and the number of offspring is Poisson with that rate, making the offspring distribution negative binomial with mean R0 and dispersion k. That is to say, the row sums of A will be distributed gamma(R0, k).

Here, you set the entries of A to ww^T with the entries of w drawn gamma(B,k), making the (off-diagonal) entries of A drawn from the product gamma(B,k) x gamma(B,k). This will not produce a negative binomial offspring distribution, or a gamma offspring rate.

There will still be overdispersion here, and the rest of the construction makes sense, but the interpretation of k will be somewhat different than in the literature.

You may be right? I am not 100% sure I have understood Lloyd-Smith et al’s graph model. In the model set up in the post, I think the row sums are distributed gamma(B^2,k) so I guess my impression was that this would match what Lloyd-Smith got but I could be wrong! At any rate they are both models with a strongly heterogeneous degree distribution that display this “fast but limited outbreak” behavior.