Compactness is one of the hardest things in the undergrad curriculum to teach, I find. It’s very hard for students to grasp that the issue is not “is there a finite collection of open sets that covers K” but rather “for every collection of open sets that covers K, some finite subcollection covers K.” This year I came up with a new metaphor. I asked the students to think about how, when a professor assigns a group project, there’s always a very small subgroup of the students who does all the work. That’s sort of how compactness works! Yes, the professor assigned infinitely many open sets to the project, but actually most of them are not really contributing. And it doesn’t matter if some small set of students could do the project; what matters is that some small group among the students assigned to the project is going to end up doing it! And this seems to happen — my students nodded in agreement — no matter how the professor forms the group.

Last week I moderated (virtually) a discussion at Stanford between my poetry friend Stephanie Burt and my category theory friend Emily Riehl, on the topic of “identity” — specifically the question of how, in lyric poetry and in mathematics, one addresses the complex topic of what we do when we identify; whether this means “identifying with” a character in a song or poem or story, or identifying two objects which are not equal but which, in Poincaré’s phrase, we “call by the same name.”

What I realized after the fact is that, as in so many other matters, the most succinct formulation is in a They Might Be Giants lyric:

There’s only one thing that I know how to do well
I’ve often been told that you only can do what you know how to do well
And that’s be you,
Be what you’re like,
Be like yourself

Surely this points to three different notions that appeared in the discussion:

“be you” — to say that you are you is to assert equality

“be what you’re like” — that is, have exactly the properties that you have and no others — an assertion of indiscernibility

“be like yourself” — this is the assertion of relation (here denoted a “likeness”) between two entities that licenses us, following Poincaré, in calling them by the same name — that is, an assertion of equivalence

Here’s the YouTube of the discussion:

And here’s the YouTube of the They Might Be Giants song, “Whistling in the Dark.” I remember seeing them play this in the fall of 1989, at the Paradise Rock Club, before the album came out, a song nobody had heard. A revelation!

Lots of discussion in my feeds about this Daily Princetonian piece, “Why won’t anyone teach me math?” by first-year student Abigail Rabieh. She just took Math 202, an intro to linear algebra, and the experience was so lousy she’s done with Princeton math for good. That’s bad!

So what was wrong with Rabieh’s class?

“Though I passed MAT 202 class just fine, my experience in it was miserable. The way the course was run did not at all set up students to succeed — or even learn math. For example, though we were provided with practice problems to prepare for our exams, we were never given solutions. My class consistently begged my professor for these, yet all he could say was that not providing them was departmental policy, and it was out of his control.

This begs the question: what interest does a department have in making it impossible to study? Study materials are given so that students can learn the course material and prepare adequately for the exam. Solution sets are part of this: to properly learn, one needs to be able to identify their mistakes and understand why they are wrong. This struggle was reflected in our exam averages, which were, respectively, in the 50s, the 60s, and the 30s.

I am far from the only person who felt this way about my class. MAT 202 has an abysmal rating of 2.71 on princetoncourses.com during the spring 2020-2021 semester. The evaluations on the Office of the Registrar’s website are no better. Students described the course as “disheartening” and said they “lost a lot of respect for the Math department after taking this course.” The advice that came up again and again in many reviews was: “Don’t take this class unless you have to.”

A lot of math teachers instinctively reacted to this with defensiveness, and I was one of them. After all, what’s so bad here? You hand out practice problems for an exam because you want students to do the problems, not because you want them to read the solutions; the mechanism is that the student works all the problems they can and then asks in office hours or review session about the problems they couldn’t do. I don’t think it’s bad to include solutions, but I would never say that not doing so makes it “impossible to study.” Student evals, well — the literature on their unreliability is so vast and I’m so lazy that I’ll only just vaguely gesture at it. And of course, most people taking Math 202 are not taking it for intellectual broadening, as Rabieh admirably was; they are taking it because somebody told them they had to. That makes the evaluations impossible to compare with those for a course people take on purpose. And as for those exam scores, well — a median in the 30s is too low, that would mean I’d made the exam much too hard. A median in the 60s, on the other hand, seems fine to me, an indication that I’d written a test with real challenges but which people could mostly do.

But you know what? Our students, especially our first year students, don’t know that unless we tell them! A student who got into Princeton, or for that matter a student who got into UW-Madison, has probably never gotten a 60/100 on a test in their entire life. No wonder it’s demoralizing!

What we have here, as they say, is a failure to communicate. Rabieh came away feeling like her teacher didn’t care whether she learned linear algebra. I’m sure that’s not the case. But I think we often don’t explicitly demonstrate that care in our classrooms. It makes a difference! We are asking the students to care about our stuff and if we want them to respond, we have to show them that we care about their stuff. What do I mean by that, explicitly? I mean that if we think the median score on an exam is going to be in the 60s, we should tell students in advance that we expect that and talk about our reasons for writing the exam that way! I mean that we should ask for student input on how the course is going before the semester is over — three weeks in, send out a survey asking for suggestions, and then talk to the class about the feedback you got, showing them you want their input while you can still make use of it. I mean that if you teach a crappy lecture one day — it happens! — you should be open about it the next time and talk about how you intend to present a better one. And I feel like these are mostly “things I already do,” which could just be self-flattery on my part, so let me add this: it might be that showing students we care could mean making nice prepared slides like the professors in all their non-math classes do, instead of just showing up and writing on the blackboard. (Doing this would be a huge change for me and it exhausts me to think about it — but would it be the right thing to do?)

We don’t really talk about this stuff when we talk about teaching. We mostly talk about content and logistics; in what order should we present the material, how much should we cover, how many quizzes should we give, what should our grading policy be, should we hand out solution sets for practice problems? That’s how we think about what good teaching is, and that’s how our students think about what good teaching is, and that’s why that’s the language Rabieh reached for in her article when she wanted to explain why she had such a bad time. But I don’t think it’s actually the problem.

I’ll bet her teacher did care. Most of us do. But it often doesn’t show; let’s say it out loud! And strive for a classroom where we’re working as partners towards a goal, not just trying to get to the end without feeling like we’ve failed.

I wanted to write more in Shape about the complicated moral weights people assigned to the difference/tension between algebra and geometry. I wrote a little about Poincaré’s take; he saw the two subjects as representing two temperaments, both indispensable, though any given mathematician might possess them in different proportions.

But then there is this, from S.L. Segal’s “Topologists in Hitler’s Germany”:

“the … Nazi movement saw “truly German” mathematics as intuitive and tied to nature, often geometrical, and certainly not axiomatic. Axiomatics, “logic chopping”, too great abstraction, was Franco-Jewish.”

It is grimly funny to imagine the Nazi ideologues locating in abstract algebra and the insolubility of certain equations in radicals the ultimate origins of rootless cosmopolitanism.

I blogged last week about how to think about “R_0,” the constant governing epidemic growth, when different people in the network had different transmissibility rates.

Today, inspired by Kai Kupferschmidt’s article in Science, I took another look at what happens when the transmission rates vary a lot among people. And I learned something new! So let me set that down.

First of all, let me make one point which is silly but actually has mathematical content. Suppose 90% of the entire population is entirely immune to the disease, and the other 10% each encounter 20 people, sharing extensive air with each one. Since only 2 of those 20 are going to be susceptible, the dynamics of this epidemic are the same as that of an epidemic with an R_0 of 2. So if you look at the exponential growth at the beginning of the epidemic, you would say to yourself “oh, the growth factor is 2, so that’s R_0, we should hit herd immunity at about 50% and end up covering about 80% of the population,” but no, because the population that’s relevant to the epidemic is only 10% of the total population! So, to your surprise, the epidemic would crest at 5% prevalence and die out with only 8% of people having been infected.
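If you want to check that arithmetic, here’s a quick Python sketch (my own, just bookkeeping): it solves the standard final-size relation x = 1 - e^{-R_0 x} by fixed-point iteration and then rescales by the susceptible share.

```python
import math

def final_size(R0, iters=100):
    """Nonzero solution of x = 1 - exp(-R0 * x): the fraction of the
    susceptible pool eventually infected, per the standard final-size relation."""
    x = 0.9  # start above the nonzero fixed point
    for _ in range(iters):
        x = 1 - math.exp(-R0 * x)
    return x

susceptible = 0.10            # only 10% of the population can catch it at all
x = final_size(2.0)
print(x)                      # ~0.797: about 80% of the susceptibles
print(susceptible * x)        # ~0.08: only 8% of the whole population
```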

So extreme heterogeneity really matters — the final spread of the epidemic can be completely decoupled from R_0 (if what we mean by R_0 is the top eigenvalue like last time, which measures the early-epidemic exponential rate of spread.)

In my last post, I included a graph of how spread looked in heterogeneous populations generated by randomly chosen 6×6 matrices, and the graph showed that the top eigenvalue and the eventual spread were strongly coupled to each other. But if you choose a random 6×6 matrix, the entries are probably not actually going to be that far apart! So I think this was a little misleading. If the transmissibility has a really long tail, things may be different, as the silly example shows. What follows is a somewhat less silly example.

The model of heterogeneity used in this famous paper seems to be standard. You take transmissibility to be a random variable drawn from a gamma distribution with mean B and shape parameter k. (I had to look up what this was!) The variance is B^2/k. As k goes to infinity, this approaches a variable which is exactly B with probability 1, but for k close to 0, the variable is often near zero but occasionally much larger than B. Superspreaders!

Just like in the last post, we are going to completely jettison reality and make this into a static problem about components of random graphs. Once you start putting in rare high-transmission events, I am less confident that these two models stay coupled together, but since the back-of-the-envelope stuff I’m doing here seems to conform with what the epidemiologists are getting, let’s go with it. In case you don’t feel like reading all the way to the end, the punchline is that on these kinds of models, you can have early exponential growth that looks like R_0 is 2 or 3, but an epidemic that peters out with a very small portion of people infected; the “herd immunity is closer than we think” scenario, as seen in this preprint of Gomes et al.

Let’s also stick with the “rank 1” case because it’s what’s in the paper I linked and there are already interesting results there. Write X for our gamma-distributed random variable.

Then, sticking with the notation from the last post, the mean number of transmissions per person, the “average R_0”, is

E(X)^2 = B^2.

(I guess I wrote the last post in terms of matrices, where the average R_0 was just the sum of the entries of the matrix A, or 1^T A 1; here the “matrix” A should be thought of as a rank 1 thing w w^T where w is a vector with entries sampled from X.)

The top eigenvalue is just the trace of the matrix, since all the other eigenvalues are 0, and that is

E(X^2) = B^2 (1 + 1/k).

Note already that this is a lot bigger than the average R_0 when k is small! In particular, there are lots of random graphs of this type which have a giant component but average degree < 2; that’s because they have a lot of isolated vertices, I suppose.

So what’s the size of the giant component in a graph like this? As always we are solving an integral equation

f = 1 - e^{-Af}

for a function f on measure space, where A is the “matrix” expressing the transmission. In fact, a function on measure space is just a random variable, and the rank-1 operator A sends Y to E(XY)X. The rank-1-ness means we can turn this into a problem about real numbers instead of random variables; we know Af = aX for some real number a; applying A to both sides of the above equation we then have

aX = E(X(1 - e^{-aX})) X

or

a = E(X(1 - e^{-aX})) = E(X) - E(X e^{-aX}).

But the latter expectation is something you can explicitly compute for a gamma-distributed variable! It just involves doing some integrals, which I rarely get to do! I’ll spare you the computation but it ends up being

a = B (1 - (1 + aB/k)^{-(k+1)})

which you can just solve for a, and then compute E(1-e^{-aX}) if you want to know the total proportion of the population in the giant component. If k is really small — and Adam Kucharski, et al, back in April, wrote it could be as low as 0.1 — then you can get really small giant components even with a fast exponential rate. For instance, take B = 0.45 and k = 0.1; you get a top eigenvalue of 2.2, not inconsistent with the growth rates we saw for unimpeded COVID, but only 7.3% of the population touched by the infection! Another way to put it is that if you introduce the infection to a random individual, the chance of an outbreak is only 7%. As Lloyd-Smith says in the Nature paper, this is a story that makes “disease extinction more likely and outbreaks rarer but more explosive.” Big eigenvalue decouples from eventual spread.
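Here’s a short numerical sanity check on those numbers (a sketch of my own; the fixed-point equation a = B(1 - (1 + aB/k)^{-(k+1)}) and the trace E(X^2) = B^2(1 + 1/k) are my reconstructions of the gamma computation described above):

```python
import math

def solve_a(B, k, iters=200):
    """Fixed point of a = B * (1 - (1 + a*B/k) ** -(k+1)), found by iteration."""
    a = B  # any positive starting value works; the map is a contraction here
    for _ in range(iters):
        a = B * (1 - (1 + a * B / k) ** (-(k + 1)))
    return a

def infected_fraction(B, k):
    """Total fraction in the giant component: E(1 - e^{-aX}) = 1 - (1 + aB/k)^-k
    for X gamma-distributed with mean B and shape k."""
    a = solve_a(B, k)
    return 1 - (1 + a * B / k) ** (-k)

def top_eigenvalue(B, k):
    """Trace of the rank-1 operator: E(X^2) = B^2 * (1 + 1/k)."""
    return B ** 2 * (1 + 1 / k)

print(top_eigenvalue(0.45, 0.1))     # about 2.23
print(infected_fraction(0.45, 0.1))  # about 0.074, matching the ~7.3% in the text
```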

What’s going on here, of course, is that with k this low, your model is that the large majority of people participate in zero interactions of the type likely to cause transmission. Effectively, it’s not so different from the silly example we started with, where 90% of the population enjoyed natural immunity but the other 10% were really close talkers. So having written all this, I’m not sure I needed to have done all those integrals to make this point. But I find it soothing to while away an hour doing integrals.

I don’t know whether I think k=0.1 is realistic, and of course, as Adam K. explained to me by email, who is a superspreader may change with time and context; so 7% is probably too low, since it’s not like once the infection “uses up” the superspreaders there can’t possibly be any more. Probably the variance of propensity to transmit over time should either actually be modeled dynamically or proxied by letting X be a less strongly skewed distribution representing “time average of propensity to transmit” or something like that.

In any event, this does make me feel much more favorable towards the idea that unmitigated spread would end up infecting less than half of the population, not 80% or more. (It does not make me favorable towards unmitigated spread.)

People are talking about R_0. It’s the number that you wish were below 1. Namely: it is the number of people, on average, that a carrier of SARS-CoV-2 infects during the course of their illness. All the interventions we’re undertaking are designed to shrink that number. Because if it’s bigger than 1, the epidemic grows exponentially until it infects a substantial chunk of the population, and if it’s smaller, the epidemic dies out.

But not everybody has the same R_0! According to some of the epidemiologists writing about COVID, this matters. It matters, for instance, to the question of how far into the population the infection gets before it starts to burn itself out for lack of new susceptible hosts (“herd immunity”) and to the question of how much of the population eventually falls ill.

Here’s an easy way to see that heterogeneity can make a difference. Suppose your population consists of two towns of the same size with no intermunicipal intercourse whatsoever. The first town has an R_0 of 1.5 and the second an R_0 of 0.3. Then the mean R_0 of the whole population is 0.9. But this epidemic doesn’t die out; it spreads to cover much of the population of Contagiousville.

I learned an interesting way to think about this heuristically from Tim Gowers on Twitter:

You can read Tim’s interesting thread yourself, but here’s the main idea. Say your population has size n. You make a graph out of the pandemic by placing an edge between vertices i and j if one of the corresponding people infects the other. (Probably better to set this up in a directed way, but I didn’t.) Or maybe slightly better to say: you place an edge if person i and person j interact in a manner such that, were either to enter the interaction infected, both would leave that way. If one person in this graph gets infected, the reach of the infection is the connected component of the corresponding vertex. So how big is that component?

The simplest way to set this up is to connect each pair of vertices with probability c/n, all such choices made independently. This is an Erdos-Renyi random graph. And the component structure of this graph has a beautiful well-known theory; if c > 1, there is a giant component which makes up a positive proportion of the vertices, and all other components are very small. The size of this component is nx, where x is the unique positive number such that

x = 1 - e^{-cx}.

If c < 1, on the other hand, there is no big component, so the pandemic is unlikely to reach much of the population. (Correspondingly, the equation above has no nonzero solution.)
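Here’s a little stdlib-Python experiment (mine, purely a sanity check) comparing a simulated sparse random graph against that formula. I generate a graph with about cn/2 random edges, which behaves like Erdos-Renyi G(n, c/n) for large n, and find its largest component with union-find:

```python
import math
import random
from collections import Counter

def er_giant_fraction(n, c, seed=0):
    """Fraction of vertices in the largest component of a random graph
    with about c*n/2 edges (a G(n, m) stand-in for G(n, c/n))."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for _ in range(int(c * n / 2)):
        a, b = find(rng.randrange(n)), find(rng.randrange(n))
        if a != b:
            parent[a] = b
    return Counter(find(i) for i in range(n)).most_common(1)[0][1] / n

def er_theory(c, iters=100):
    """The solution of x = 1 - exp(-c x) reached by fixed-point iteration
    (the nonzero root when c > 1, and 0 when c <= 1)."""
    x = 0.9
    for _ in range(iters):
        x = 1 - math.exp(-c * x)
    return x

print(er_theory(2.0))                 # ~0.797
print(er_giant_fraction(20000, 2.0))  # close to the theoretical value
```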

It is fair to be skeptical of this model, which is completely static and doesn’t do anything fancy, but let me just say this — the most basic dynamic model of epidemic spread, the SIR model, has an endstate where the proportion of the population that’s been infected is the unique positive x such that

x = 1 - e^{-R_0 x}.

Which looks pretty familiar!

Now what happens if you want to take into account that the population isn’t actually an undifferentiated mass? Let’s say, for instance, that your county has a red town and a blue town, each with population n/2. Two people in the red town have a probability of 2/n of being connected, while two people in the blue town have a probability of just 1/n of being connected, and a red-blue pair is connected with probability 1/n. (This kind of random graph is called a stochastic block model, if you want to go look at papers about it.) So the typical red-town person is going to infect 1 fellow red-towner and 0.5 blue-towners, for an R_0 of 1.5, while the blue-towner is going to have an R_0 of 1.

Here’s the heuristic for figuring out the size of the big component. Suppose x is the proportion of the red town in the big component of the graph, and y is the proportion of the blue town in the big component. Take a random red person; what’s the chance they’re in the big component? Well, the chance they’re not connected to any of the xn/2 red-towners in the big component is

(1 - 2/n)^{xn/2}, which tends to e^{-x}

(oh yeah did I mention that n was infinity?) and the chance that they’re not connected to any of the blue-towners in the big component is

(1 - 1/n)^{yn/2}, which tends to e^{-y/2}

so all in all you get

x = 1 - e^{-x - y/2}

and by the same token you would get

y = 1 - e^{-x/2 - y/2}

and now you have two equations that you can solve for x and y! In fact, you find x = 47% and y = 33%. So just as you might expect, the disease gets farther in the spreadier town.
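If you want to reproduce those numbers, here’s a short Python sketch (my own bookkeeping) that iterates the pair of equations x = 1 - e^{-x - y/2}, y = 1 - e^{-x/2 - y/2} to their fixed point, written in terms of the transmission matrix from the example:

```python
import math

# Expected transmissions in the red/blue example: a red-towner infects 1 red
# and 0.5 blue on average; a blue-towner infects 0.5 of each.
A = [[1.0, 0.5],
     [0.5, 0.5]]

def fixed_point(A, iters=200):
    """Iterate v <- 1 - exp(-A v), coordinatewise, to its positive fixed point."""
    v = [0.9] * len(A)
    for _ in range(iters):
        v = [1 - math.exp(-sum(a * w for a, w in zip(row, v))) for row in A]
    return v

x, y = fixed_point(A)
print(round(x, 2), round(y, 2))  # 0.47 0.33
```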

And you might notice that what we’re doing is just matrix algebra! If you think of (x,y) as a vector v, we are solving

v = 1 - e^{-Av}

where “exponentiation” of a vector is interpreted coordinatewise. You can think of this as finding a fixed point of a nonlinear operator on vectors.

When does the outbreak spread to cover a positive proportion of the population? There’s a beautiful theorem of Bollobás, Janson, and Riordan that tells you: you get a big component exactly when the largest eigenvalue λ of A, the so-called Perron-Frobenius eigenvalue, is larger than 1. In the case of the matrix studied above, the two eigenvalues are about 1.31 and 0.19. You might also note that in the early stages of the epidemic, when almost everyone in the network is susceptible, the spread in each town will be governed by repeated multiplication of a small vector by A, and the exponential rate of growth is thus also going to be given by λ.
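And here’s a two-line check of those eigenvalues for the matrix [[1, 0.5], [0.5, 0.5]] of the example, using the quadratic formula for a symmetric 2×2 matrix (a quick sketch, nothing fancy):

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]]."""
    half_tr = (a + d) / 2
    disc = math.sqrt(half_tr ** 2 - (a * d - b * b))
    return half_tr + disc, half_tr - disc

lam1, lam2 = eig_sym_2x2(1.0, 0.5, 0.5)
print(round(lam1, 2), round(lam2, 2))  # 1.31 0.19
```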

It would be cool if the big eigenvalue also told you what proportion of the vertices are in the giant component, but that’s too much to ask for. For instance, we could just replace A with a diagonal matrix with 1.31 and 0.19 on the diagonal; then the first town gets 43% infected and the second town, completely disconnected from the first, gets 0%.

What is the relationship between the Perron-Frobenius eigenvalue and the usual “mean R_0” definition? The eigenvalue can be thought of as

λ = max_v (v^T A v)/(v^T v)

while the average R_0 is exactly

(1^T A 1)/(1^T 1)

where 1 is the all-ones vector. So we see immediately that λ is bounded below by the average R_0, but it really can be bigger; indeed, this is just what we see in the two-separated-towns example we started with, where R_0 is smaller than 1 but λ is larger.

I don’t see how to work out any concise description of the size of the giant component in terms of the symmetric matrix, even in the simplest cases. As we’ve seen, it’s not just a function of λ. The very simplest case might be that where A has rank 1; in other words, you have some division of the population into equal-sized boxes, and each box has its own R_0, and then the graph is constructed in a way that is “Erdos-Renyi but with a constraint on degrees” — I think there are various ways to do this but the upshot is that the matrix A is rank 1 and its (i,j) entry is R_0(i) R_0(j) / C where C is the sum of the R_0 in each box. The eigenvalues of A are all zero except for the big one λ, which is equal to the trace, which is

sum_i R_0(i)^2 / sum_i R_0(i)

or, if you like, mean(R_0) + variance(R_0)/mean(R_0); so if the average R_0 is held fixed, this gets bigger the more R_0 varies among the population.
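Here’s a tiny numerical check of that identity, with made-up R_0 values purely for illustration (the point is that the trace of the rank-1 matrix equals mean + variance/mean exactly):

```python
import random

random.seed(1)
n = 6
r = [1 + random.random() for _ in range(n)]  # hypothetical R_0 value per box
C = sum(r)

# Rank-1 matrix with entry (i, j) = R_0(i) * R_0(j) / C, as in the text.
A = [[ri * rj / C for rj in r] for ri in r]

trace = sum(A[i][i] for i in range(n))
mean = C / n
var = sum((x - mean) ** 2 for x in r) / n    # population variance

print(abs(trace - (mean + var / mean)) < 1e-12)  # True: the identity is exact
```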

And if you look back at that Wikipedia page about the giant component, you’ll see that this is the exact threshold they give for random graphs with specified degree distribution, citing a 2000 paper of Molloy and Reed. Or if you look at Lauren Meyers’s 2005 paper on epidemic spread in networks, you will find the same threshold for epidemic outbreak in section 2. (The math here is descended from work of Molloy-Reed and this much-cited paper of Newman, Strogatz, and Watts.) Are typical models of “random graphs with specified degree distribution” built to have rank 1 in this sense? I think so — see e.g. this sentence in Newman-Strogatz-Watts: “Another quantity that will be important to us is the distribution of the degree of the vertices that we arrive at by following a randomly chosen edge. Such an edge arrives at a vertex with probability proportional to the degree of that vertex.”

At any rate, even in this rank 1 case, even for 2×2 matrices, it’s not clear to me how to express the size of the giant component except by saying it’s a nonzero solution of v = 1 - e^{-Av}. Does the vector v have anything to do with the Perron-Frobenius eigenvector? Challenge for the readers: work this out!

I did try a bunch of randomly chosen 6×6 matrices and plotted the overall size of the giant component against λ, and this is what I got:

The blue line shows the proportion of the vertices that get infected if the graph were homogeneous with parameter λ. Which makes me think that thinking of λ as a good proxy for R_0 is not a terrible idea; it seems like a summary statistic of A which is pretty informative about the giant component. (This graph suggests maybe that among graphs with a given λ, the homogeneous one actually has the biggest giant component? Worth checking.)

I should hasten to say that there’s a lot of interest in the superspreader phenomenon, where a very small (probability -> 0) set of vertices has very large (superlinear in n) number of contacts. Meyers works out a bunch of cases like this and I think they are not well modeled by what I’m talking about here. (Update: I wrote another post about this! Indeed, I think the tight connection shown in the chart between λ and size of giant component is not going to persist when there’s extreme heterogeneity of degree.)

A more technical note: the result of Bollobas et al is much more general; there’s no reason the vertices have to be drawn from finitely many towns of equal size; you can instead have the types of vertices drawn from whatever probability space M you like, and then have the probability of an edge between a vertex x and a vertex y be W(x,y) for some symmetric function on M^2; nowadays this is called the “graphon” point of view. Now the matrix is replaced by an operator on functions:

(A_W f)(x) = ∫_M W(x,y) f(y) dy,

the probability g(x) that a vertex of type x is in the giant component is a solution of the integral equation

g = 1 - e^{-A_W g}

and a giant component exists just when the operator norm of A_W is greater than 1. This is the kind of analysis you’d want to use if you wanted to really start taking geography into account. For instance, take the vertices to be random points in a disc and let W(x,y) be a decreasing function of |x-y|, modeling a network where infection likelihood is a decreasing function of distance. What is the norm of the operator A_W in this case? I’ll bet my harmonic analyst friends know, if any of them are reading this. But I do not.

Update: Now I know, because my harmonic analyst friend Brian Street told me it’s just the integral of W(x,y) over y, which is the same for all x (well, at least it is if we’re on the whole of R^d). Call that number V. He gave a very nice Fourier-theoretic argument, but now that I know the answer I’m gonna explain it in terms of the only part of math I actually understand, 2×2 matrices. Here’s how it goes. In this model, each vertex has the same expected number of neighbors, namely that integral V above. But to say every vertex has the same expected number of neighbors is to say that 1 is an eigenvector for A. If 1 were any eigenvector other than Perron-Frobenius, it would be orthogonal to the Perron-Frobenius eigenvector, which it can’t be because both have positive entries; so it is Perron-Frobenius, so λ = V.

In fact I had noticed this fact in one of the papers I looked at while writing this (that if the matrix had all row-sums the same, the long-term behavior didn’t depend on the matrix) but didn’t understand why until just this minute. So this is kind of cool — if the kind of heterogeneity the network exhibits doesn’t cause different kinds of vertices to have different mean degree, you can just pretend the network is homogeneous with whatever mean R_0 it has. This is a generalization of the fact that two towns with no contact which have the same R_0 can be treated as one town with the same R_0 and you don’t get anything wrong.
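Here’s a toy numerical version of that argument, with a made-up 3×3 nonnegative matrix whose rows all sum to the same value; power iteration recovers the common row sum as the top eigenvalue, just as the Perron-Frobenius reasoning predicts:

```python
# A made-up nonnegative matrix in which every row sums to V = 2.0.
A = [[0.7, 0.3, 1.0],
     [0.2, 1.5, 0.3],
     [1.1, 0.4, 0.5]]
V = 2.0

# Power iteration: since the all-ones vector is a positive eigenvector, it is
# the Perron-Frobenius eigenvector, so the top eigenvalue is the row sum V.
v = [1.0, 2.0, 3.0]  # arbitrary positive starting vector
for _ in range(100):
    w = [sum(a * x for a, x in zip(row, v)) for row in A]
    s = max(w)
    v = [x / s for x in w]

print(round(s, 6))  # 2.0
```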

When you plot the number of reported deaths from COVID on a log scale you get pictures that look like this one, by John Burn-Murdoch at the Financial Times:

A straight line represents exponential growth, which is what one might expect to see in the early days of a pandemic according to baby models. You’ll note that the straight line doesn’t last very long, thank goodness; in just about every country the line starts to bend. Why are COVID deaths concave? There are quite a few possible reasons.

Suppression is working. When a pandemic breaks out, countries take measures to suppress transmission, and people take their own measures over and above what their governments do. (An analysis of cellphone location data by Song Gao of our geography department shows that Wisconsinites’ median distance traveled from home decreased by 50% even before the governor issued a stay-at-home order.) That should slow the rate of exponential growth — hopefully, flip it to exponential decay.

Change in reporting. Maybe we’re getting better at detecting COVID deaths; if on day 1, only half of COVID deaths were reported as such, while now we’re accurately reporting them all, we’d see a spuriously high slope at the beginning of the outbreak. (The same reasoning applies to the curve for number of confirmed cases; at the beginning, the curve grows faster than the true number of infections as testing ramps up.)

COVID is getting less lethal. This is the whole point of “flattening the curve” — with each week that passes, hospitals are more ready, we have more treatment options and fuller knowledge of which of the existing treatments best suits which cases.

Infection has saturated the population. This is the most controversial one. The baby model (where by baby I mean SIR) tells you that the curve bends as the number of still-susceptible people starts to really drop. The consensus seems to be we’re nowhere near that yet, and almost everyone (in the United States, at least) is still susceptible. But I guess one should be open to the possibility that there are way more asymptomatic people than we think and half the population is already infected; or that for some reason a large proportion of the population carries natural immunity so 1% of population infected is half the susceptible population.

Heterogeneous growth rate. I came across this idea in a post by a physicist (yeah, I know, but it was a good post!) which I can’t find now — sorry, anonymous physicist! There’s not one true exponential growth rate; different places may have different slopes. Just for the sake of argument, suppose a bunch of different locales all start with the same number of deaths, and suppose the rate of exponential growth is uniformly distributed between 0 and 1; then the total deaths at time t is ∫_0^1 e^{rt} dr, which is (e^t - 1)/t. The log of that function has positive second derivative; that is, it tends to make the curve bend up rather than down! That makes sense; with heterogeneous rates of exponential growth, you’ll start with some sort of average of the rates but before long the highest rate will dominate.
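Here’s a quick check (a sketch of my own) that the aggregate curve (e^t - 1)/t, the integral of e^{rt} over rates r uniform on [0, 1], really does bend upward on a log scale, using a discrete second difference:

```python
import math

def total(t):
    """Aggregate of e^{r t} over growth rates r uniform on [0, 1]:
    the integral works out to (e^t - 1) / t."""
    return (math.exp(t) - 1) / t

# Discrete second derivative of log(total): positive means the log-scale
# curve bends upward rather than downward.
h = 0.01
for t in (1.0, 5.0, 10.0):
    second = (math.log(total(t + h)) - 2 * math.log(total(t))
              + math.log(total(t - h))) / h ** 2
    print(t, second > 0)  # True each time
```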

I’m sure I’ve skipped some curve-bending factors; propose more in comments!

Talking to AB about multiplying rational numbers. She understands the commutativity of multiplication of integers perfectly well. But I had forgotten that commutativity in the rational setting is actually conceptually harder! That four sixes is six fours you can conceptualize by thinking of a rectangular array, or something equivalent to that. But the fact that seven halves is the same thing as seven divided by two doesn’t seem as “natural” to her. (Is that even an instance of commutativity? I think of the first as 7 x 1/2 and the second as 1/2 x 7.)

Indeed this series diverges, just as the tweeter says: there’s a positive-density subset of n such that the summand exceeds .

More subtle: what about

This should still diverge. Argument: the probability that a real number x chosen uniformly from a large interval has is on order , not ; so there will be a subset of integers with density on order where the summand exceeds , and summing over those integers alone should give a sum on order , which can be made as large as you like by bringing close to 0.

Subhash Khot is giving a talk at Current Developments in Math this year and his talk has the intriguing phrase “Grassmann graph” in it so I thought I’d look up what it is and what he and his collaborators prove about it, and indeed it’s interesting! I’m just going to jot down something I learned from “Pseudorandom Sets in Grassmann Graph have Near-Perfect Expansion,” by Khot, Dor Minzer, and Muli Safra, in a way that makes it sound like something algebraic geometers might be interested in, which, indeed, I think they might be!

Suppose you have a sheaf F on a space, and the space has a covering U_1, .. U_N. The sheaf axiom says that if we have a family of sections s_i of F(U_i) such that s_i and s_j agree on U_i ∩ U_j for all i,j, then there is actually a global section s in F(X) which restricts to each s_i.

What if we only have an approximate section? That is: what if we have a family of s_i such that: if I select i and j uniformly at random, the probability that s_i and s_j agree on U_i ∩ U_j is bounded below by some p > 0. Call such a family a “p-section.” (You should take the view that this is really a family of problems with X changing and N growing, so that when I say p > 0 the content is that p is bounded away from some 0 uniformly in X,N.)

The question is then: Is an approximate section approximately a section?

That is: if s_1, .. s_N form a p-section, is there some actual section s in F(X) such that, for i chosen uniformly at random, the probability that s restricted to U_i agrees with s_i is at least p’, for some p’ depending only on p?

The case which turns out to be relevant to complexity theory is the Grassmann graph, which we can describe as follows: X is a k-dimensional vector space over F_2 and the U_i are the l-dimensional subspaces for some integer l. But we do something slightly weird (which is what makes it the Grassmann graph, not the Grassmann simplicial complex) and declare that the only nonempty intersections are those where U_i ∩ U_j has dimension l-1. The sheaf is the one whose sections on U_i are the linear functions from U_i to F_2.
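To make the setup concrete, here’s a tiny stdlib-Python enumeration of a toy case (my own example, k = 4 and l = 2): the vertices of the Grassmann graph and the adjacency rule “intersect in dimension l-1”.

```python
from itertools import combinations

# Vertices: 2-dimensional subspaces of F_2^4, with vectors encoded as 4-bit ints.
subspaces = set()
for u, v in combinations(range(1, 16), 2):      # pairs of distinct nonzero vectors
    subspaces.add(frozenset({0, u, v, u ^ v}))  # their span: {0, u, v, u+v}

print(len(subspaces))  # 35, the Gaussian binomial coefficient [4 choose 2]_2

# Grassmann-graph adjacency: U_i meets U_j in dimension l-1 = 1, i.e. the two
# 4-element sets share exactly one nonzero vector (so the intersection has size 2).
S = next(iter(subspaces))
degree = sum(1 for T in subspaces if T != S and len(S & T) == 2)
print(degree)  # 18, the same whichever vertex S we pick
```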

Speculation 1.7 in the linked paper is that an approximate section is approximately a section. This turns out not to be true! Because there are large sets of U_i whose intersection with the rest of X is smaller than you might expect. This makes sense: if X is a space which is connected but which is “almost a disjoint union of X_1 and X_2,” i.e. X = X_1 ∪ X_2 and X_1 ∩ X_2 involves very few of the U_i, then by choosing a section of F(X_1) and a section of F(X_2) independently you can get an approximate section which is unlikely to be approximated by any actual global section.

But the good news is that, in the case at hand, that ends up being the only problem. Khot-Minzer-Safra classify the “approximately disconnected” chunks of X (they are collections of l-dimensional subspaces containing a fixed subspace of small dimension and contained in a fixed subspace of small codimension) and show that any approximate section of F is approximated by a section on some such chunk; this is all that is needed to prove the “2-to-2 games conjecture” in complexity theory, which is their ultimate goal.

So I found all this quite striking! Do questions about approximate global sections being approximated by global sections appear elsewhere? (The question as phrased here is already a bit weird from an algebraic geometry point of view, since it seems to require that you have or impose a probability measure on your set of open patches, but maybe that’s natural in some cases?)