## Heights on stacks and heights on vector bundles over stacks

I’ve been giving a bunch of talks about work with Matt Satriano and David Zureick-Brown on the problem of defining the “height” of a rational point on a stack.  The abstract usually looks something like this:

Here are two popular questions in number theory:

1.  How many degree-d number fields are there with discriminant at most X?
2.  How many rational points are there on a cubic surface with height at most X?

Our expectations about the first question are governed by Malle’s conjecture; about the second, by the Batyrev-Manin conjecture.  The forms of the conjectures are very similar, predicting in both cases an asymptotic of the form c X^a (log X)^b, and this is no coincidence: I will explain how to think of both questions in a common framework, that of counting points of bounded height on an algebraic stack.  A serious obstacle is that there is no definition of the height of a rational point on a stack.  I will propose a definition and try to convince you it’s the right one.  If there’s time, I’ll also argue that when we talk about heights with respect to a line bundle we have always secretly meant “vector bundle,” or should have.

(joint work with Matt Satriano and David Zureick-Brown)

Frank Calegari asked a good question after I talked about this at Mazur’s birthday conference.  And other people have asked me the same question!  So I thought I’d write about it here on the blog.

An actual (somewhat tangential) math question about your talk: when it comes (going back to the original problem) to extensions with Galois group G, there is (as you well know) a natural cover $\mathbf{A}^n/G \rightarrow \cdot/G,$ and the source has a nice smooth unirational open subscheme which is a much less stacky object and could possibly still be used to count G-extensions (or rather, to count G-polynomials). How does this picture interact (if at all) with your talk or the Malle conjecture more generally?

Here’s an answer.  Classically, how do we count degree-n extensions of Q?  We count monic degree-n polynomials with bounded coefficients; that is, we count integral points of bounded height on A^n / S_n, which is isomorphic to A^n, the space of monic degree-n polynomials.

Now A^n / S_n is the total space of a vector bundle over the stack B(S_n).  So you might say that what we’re doing is using “points on the total space of a vector bundle E/X as a proxy for points on X.”  And when you put it that way, you see that it’s what people who work on rational points do all the time!  What do we do when we count rational points on P^1?  We count pairs of coprime integers in a box; in other words, we count integral points on A^2 – 0, which is the total space (sans zero section) of a line bundle on P^1.  More generally, in many cases where people can prove the Batyrev-Manin conjecture for a variety X, it’s precisely by means of passing to a “universal torsor”: the total space of a vector bundle (or a torus bundle sitting in a vector bundle) over X.  Sometimes you can use this technique to get actual asymptotics for rational points on X; other times you just get bounds.  If you can prove that, for any x in X(Q), there is a point on the fiber E_x whose height is at most F(height(x)) for some reasonable function F, you can parlay upper bounds for points on E into upper bounds for points on X.  In the classical case, this is the part where we argue that (by Minkowski) a number field with discriminant D contains an algebraic integer whose characteristic polynomial has coefficients bounded in terms of D.
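The P^1 case is small enough to try directly. Here is a minimal Python sketch of the "coprime integers in a box" count, assuming the usual height max(|a|, |b|) for a point [a : b] written in lowest terms. The function name `count_p1_points` and the box sizes are my choices for illustration. The last column printed is 12/π², the classical leading constant for this count, so you can watch the ratio approach it.

```python
from math import gcd, pi

def count_p1_points(B):
    """Count points [a : b] of P^1(Q) with max(|a|, |b|) <= B.

    Each point has exactly two primitive integer representatives,
    (a, b) and (-a, -b), so we count primitive pairs in the box
    and divide by 2.
    """
    primitive_pairs = sum(
        1
        for a in range(-B, B + 1)
        for b in range(-B, B + 1)
        if gcd(a, b) == 1  # gcd(0, 0) == 0, so the origin is excluded
    )
    return primitive_pairs // 2

for B in (10, 100, 400):
    n = count_p1_points(B)
    print(B, n, n / B**2, 12 / pi**2)
```

The primitive pairs being counted are integral points on A^2 – 0, and dividing by 2 is the passage from the total space back down to P^1.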

So coming back to the original question:  how do you know which vector bundle on BG is a good one to think about?  Actually, this is far from clear!  The very first thing I ever wrote about counting number fields, my first paper with Akshay, gave new upper bounds for the number of degree-n extensions, by counting points on

$(\mathbf{A}^n)^m / S_n$

where S_n acts diagonally.  In other words, we used a different vector bundle on B(S_n) than the “standard” one, and showed that by optimizing m (and being careful about stripping out loci playing the role of accumulating subvarieties) we could get better upper bounds than the ones coming from counting polynomials.

So apparently I’ve been counting points on vector bundles on stacks all along…!

## How much is the stacks project graph like a random graph?

Cathy posted some cool data yesterday coming from the new visualization features of the magnificent Stacks Project.  Summary:  you can make a directed graph whose vertices are the 10,445 tagged assertions in the Stacks Project, and whose edges are logical dependencies.  So this graph (hopefully!) doesn’t have any directed cycles.  (Actually, Cathy tells me that the Stacks Project autovomits out any contribution that would create a logical cycle!  I wish LaTeX could do that.)

Given any assertion v, you can construct the subgraph G_v of vertices which are the terminus of a directed path starting at v.  And Cathy finds that if you plot the number of vertices and number of edges of each of these graphs, you get something that looks really, really close to a line.
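To make G_v concrete, here is a small Python sketch that computes the two numbers being plotted. It assumes the graph is stored as a dictionary `out_edges` sending each vertex to the list of vertices it points to (my representation, nothing to do with how the Stacks Project stores things), and it counts v itself as a vertex of G_v. The four-vertex graph `toy` is made up for illustration.

```python
def descendant_counts(out_edges, v):
    """Return (number of vertices, number of edges) of G_v."""
    seen = {v}
    stack = [v]
    while stack:
        u = stack.pop()
        for w in out_edges[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    # Every edge leaving a reachable vertex ends at a reachable vertex,
    # so the edges of G_v are exactly the out-edges of its vertices.
    n_edges = sum(len(out_edges[u]) for u in seen)
    return len(seen), n_edges

# Hypothetical toy graph: D depends on B and C, C on A and B, B on A.
toy = {"D": ["B", "C"], "C": ["A", "B"], "B": ["A"], "A": []}
for v in "ABCD":
    print(v, descendant_counts(toy, v))
```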

Why is this so?  Does it suggest some underlying structure?  I tend to say no, or at least not much — my guess is that in some sense it is “expected” for graphs like this to have this sort of property.

Because I am trying to get strong at Sage I coded some of this up this morning. One way to make a random directed graph with no cycles is as follows:  start with N vertices, labeled 1 through N, and a function f on natural numbers k that decays with k, and then connect each vertex n to vertex n-k (if there is such a vertex) with probability f(k).  The decaying function f is supposed to mimic the fact that an assertion is presumably more likely to refer to something just before it than something “far away” (though of course the Stacks Project is not a strictly linear thing like a book).
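Here is roughly what that generator looks like, written as plain Python rather than Sage so it runs anywhere. This is a sketch, not the code I actually ran: the function name `random_dag`, the `seed` argument, and the decay f(k) = (1/2)^k in the usage lines are all placeholders chosen for illustration. Edges only ever go from a larger label to a smaller one, which is what rules out cycles.

```python
import random

def random_dag(N, f, seed=None):
    """Vertices 1..N; vertex n points to n-k with probability f(k)."""
    rng = random.Random(seed)
    out_edges = {n: [] for n in range(1, N + 1)}
    for n in range(1, N + 1):
        for k in range(1, n):  # only those k for which n - k >= 1
            if rng.random() < f(k):
                out_edges[n].append(n - k)
    return out_edges

# Placeholder decay function, chosen only to exercise the generator.
g = random_dag(1000, lambda k: 0.5 ** k, seed=0)
n_edges = sum(len(targets) for targets in g.values())
print("edges:", n_edges, "mean out-degree:", n_edges / len(g))
```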

Here’s how Cathy’s plot looks for a graph generated by N = 1000 and f(k) = (2/3)^k, which makes the mean out-degree 2 (the geometric series 2/3 + (2/3)^2 + (2/3)^3 + ... sums to 2), as suggested in Cathy’s post.

Pretty linear — though if you look closely you can see that there are really (at least) a couple of close-to-linear “strands” superimposed! At first I thought this was because I forgot to clear the plot before running the program, but no, this is the kind of thing that happens.

Is this because the distribution decays so fast, so that there are very few long-range edges? Here’s how the plot looks with f(k) = 1/k^2, a nice fat tail yielding many more long edges:

My guess: a random graph aficionado could prove that the plot stays very close to a line with high probability under a broad range of random graph models. But I don’t really know!
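If you want to poke at this yourself, here is a self-contained Python sketch that builds both models above (N = 1000, with f(k) = (2/3)^k and f(k) = 1/k^2), collects the (vertices, edges) pair of every descendant graph, and reports the squared correlation of the scatterplot. The seed, the helper names, and the use of r^2 as the measure of "close to a line" are my assumptions. Note that r^2 is a crude summary: it will not detect things like superimposed strands.

```python
import random

def random_dag(N, f, rng):
    """Vertices 1..N; vertex n points to n-k with probability f(k)."""
    out_edges = {n: [] for n in range(1, N + 1)}
    for n in range(1, N + 1):
        for k in range(1, n):
            if rng.random() < f(k):
                out_edges[n].append(n - k)
    return out_edges

def descendant_counts(out_edges, v):
    """(vertices, edges) of the subgraph reachable from v, v included."""
    seen = {v}
    stack = [v]
    while stack:
        u = stack.pop()
        for w in out_edges[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen), sum(len(out_edges[u]) for u in seen)

def r_squared(points):
    """Squared Pearson correlation of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy * sxy / (sxx * syy)

rng = random.Random(0)  # arbitrary seed, for reproducibility only
models = {
    "(2/3)^k": lambda k: (2 / 3) ** k,
    "1/k^2": lambda k: 1 / k ** 2,
}
for name, f in models.items():
    g = random_dag(1000, f, rng)
    points = [descendant_counts(g, v) for v in g]
    print(name, "r^2 =", r_squared(points))
```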

Update:  Although you know what must be happening here?  It’s not hard to check that in the models I’ve presented here, there’s a huge amount of overlap between the descendant graphs; in fact, a vertex is very likely to be connected to all but c of the vertices below it for a suitable constant c.

I would guess the Stacks Project graph doesn’t have this property (though it would be interesting to hear from Cathy to what extent this is the case) and that in her scatterplot we are not measuring the same graph again and again.
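The overlap claim for the one-dimensional models is easy to check numerically. The sketch below counts, for each vertex, how many of the vertices below it are not reachable from it. It uses the f(k) = (2/3)^k model with N = 1000 from earlier in the post. The function names and the seed are hypothetical, and I am only printing summary numbers, not asserting what they come out to.

```python
import random

def random_dag(N, f, seed=None):
    """Vertices 1..N; vertex n points to n-k with probability f(k)."""
    rng = random.Random(seed)
    out_edges = {n: [] for n in range(1, N + 1)}
    for n in range(1, N + 1):
        for k in range(1, n):
            if rng.random() < f(k):
                out_edges[n].append(n - k)
    return out_edges

def missed_below(out_edges, v):
    """Number of vertices below v that are NOT reachable from v."""
    seen = {v}
    stack = [v]
    while stack:
        u = stack.pop()
        for w in out_edges[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    # v has v - 1 vertices below it and reaches len(seen) - 1 of them.
    return (v - 1) - (len(seen) - 1)

g = random_dag(1000, lambda k: (2 / 3) ** k, seed=0)
missed = [missed_below(g, v) for v in g]
print("largest number of unreachable lower vertices:", max(missed))
print("average:", sum(missed) / len(missed))
```

Running the same count on the real Stacks Project dependency graph would be one way to see how far it is from this regime.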

It might be fun to consider a model where vertices are pairs of natural numbers and (m,n) is connected to (m-k,n-l) with probability f(k,l) for some suitable decay.  Under those circumstances, you’d have substantially less overlap between the descendant graphs; do you still get the approximately linear relationship between edges and nodes?
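Here is a sketch of that two-dimensional experiment in Python, for anyone who wants to try it. Everything specific is an assumption on my part: the grid is M by M with M = 30, the offsets k and l are nonnegative and not both zero, and the decay is f(k,l) = (1/2)^(k+l), picked only because it is simple. The code prints the squared correlation of the (vertices, edges) scatterplot and leaves the interpretation to you.

```python
import random

def random_grid_dag(M, f, seed=None):
    """Vertices (m, n) with 1 <= m, n <= M; (m, n) points to
    (m-k, n-l) with probability f(k, l), for (k, l) != (0, 0)."""
    rng = random.Random(seed)
    out_edges = {(m, n): [] for m in range(1, M + 1) for n in range(1, M + 1)}
    for (m, n), targets in out_edges.items():
        for k in range(m):
            for l in range(n):
                if (k, l) != (0, 0) and rng.random() < f(k, l):
                    targets.append((m - k, n - l))
    return out_edges

def descendant_counts(out_edges, v):
    """(vertices, edges) of the subgraph reachable from v, v included."""
    seen = {v}
    stack = [v]
    while stack:
        u = stack.pop()
        for w in out_edges[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen), sum(len(out_edges[u]) for u in seen)

def r_squared(points):
    """Squared Pearson correlation of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy * sxy / (sxx * syy)

# Hypothetical decay; any f(k, l) that shrinks as k + l grows would do.
g = random_grid_dag(30, lambda k, l: 0.5 ** (k + l), seed=0)
points = [descendant_counts(g, v) for v in g]
print("r^2 =", r_squared(points))
```

Every edge strictly decreases m + n, so the graph is still acyclic.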