## Prime subset sums

Efrat Bank’s interesting number theory seminar here before break was about sums of arithmetic functions on short intervals in function fields.  As I was saying when I blogged about Hast and Matei’s paper, a short interval in F_q[t] means:  the set of monic degree-n polynomials P such that

deg(P-P_0) < h

for some monic degree-n P_0 and some small h.  Bank sets this up even more generally, defining an interval in the space V of global sections of a line bundle on an arbitrary curve over F_q.  In Bank’s case, by contrast with the number field case, an interval is an affine linear subspace of some ambient vector space of forms.  This leads one to wonder:  what’s special about these specific affine spaces?  What about general spaces?

And then one wonders:  well, what classical question over Z does this correspond to?  So here it is:  except I’m not sure this is a classical question, though it sort of seems like it must be.

Question:  Let c > 1 be a constant.  Let A be a set of integers with |A| = n and max(A) < c^n.  Let S be the (multi)set of sums of subsets of A, so |S| = 2^n.  What can we say about the number of primes in S?  (Update:  as Terry points out in comments, I need some kind of coprimality assumption; at the very least we should ask that there’s no prime factor common to everything in A.)

I’d like to say that S is kind of a “generalized interval”: if A is the first n powers of 2, it is literally an interval.  One can also ask about other arithmetic functions:  how big can the average of Möbius be over S, for instance?  Note that the condition on max(A) is important:  if you let the elements of A get as big as you want, you can make S have no primes or you can make S be half prime (thanks to Ben Green for pointing this out to me.)  The condition on max(A) can be thought of as analogous to requiring that an interval containing N has size at least some fixed power of N, a good idea if you want to average arithmetic functions.
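For small n you can just enumerate S and count.  Here is a minimal sketch of that experiment; the function names are mine, and trial division is only sensible for toy sizes.

```python
def is_prime(m):
    """Trial division; fine for the small sums that show up here."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

def subset_sums(A):
    """Return the multiset (as a list) of the sums of all 2^n subsets of A."""
    sums = [0]
    for a in A:
        # every existing sum either skips a or includes it
        sums += [s + a for s in sums]
    return sums

def prime_count(A):
    """Number of primes in S, counted with multiplicity."""
    return sum(1 for s in subset_sums(A) if is_prime(s))

# The first four powers of 2: S is literally the interval 0..15.
print(sorted(subset_sums([1, 2, 4, 8])))
print(prime_count([1, 2, 4, 8]))

# Everything in A even: the only prime that can appear is 2.
print(prime_count([2, 4, 6]))
```

The second example is the coprimality issue from the update: with a common factor in A, almost nothing in S can be prime.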

Anyway:  is anything known about this?  I can’t figure out how to search for it.

## Women in math: accountability

I’ve talked about women in math a lot on this blog and maybe you think of me as someone who is aware of and resistant to sexism in our profession.  But what if we look at some actual numbers?

My Ph.D. students:  2 out of 15 are women.

Coauthors, last 5 years: 2 out of 23 are women.

Letters posted on MathJobs, last 2 years:  3 out of 24 are women.

That is sobering.  I’m hesitant about posting this, but I think it’s a good idea for senior people to look at their own numbers and get some sense of how much they’re actually doing to support early-career women in the profession.

Update:  I removed the numbers for tenure/promotion letters.  A correspondent pointed out that these, unlike the other items, are supposed to be confidential, and given the small numbers are at least partially de-anonymizable.


## Hast and Matei, “Moments of arithmetic functions in short intervals”

Two of my students, Daniel Hast and Vlad Matei, have an awesome new paper, and here I am to tell you about it!

A couple of years ago at AIM I saw Jon Keating talk about this charming paper by him and Ze’ev Rudnick.  Here’s the idea.  Let f be an arithmetic function: in that particular paper, it’s the von Mangoldt function, but you can ask the same question (and they do) for Möbius and many others.

Now we know the von Mangoldt function is 1 on average.  To be more precise: in a suitably long interval ($[X,X+X^{1/2 + \epsilon}]$ is long enough under Riemann) the average of von Mangoldt is always close to 1.  But the average over a short interval can vary.  You can think of the sum of von Mangoldt over  $[x,x+H]$, with H = x^d,  as a function f(x) which has mean 1 but which for d < 1/2 need not be concentrated at 1.  Can we understand how much it varies?  For a start, can we compute its variance as x ranges from 1 to X?

This is the subject of a conjecture of Goldston and Montgomery.  Keating and Rudnick don’t prove that conjecture in its original form; rather, they study the problem transposed into the context of the polynomial ring F_q[t].  Here, the analogue of archimedean absolute value is the absolute value

$|f| = q^{\deg f}$

so an interval of size q^h is the set of f such that deg(f-f_0) < h for some polynomial f_0.

So you can take the monic polynomials of degree n, split that up into q^{n-h} intervals of size q^h, and sum f over each interval, and take the variance of all these sums.  Call this V_f(n,h).  What Keating and Rudnick show is that

$\lim_{q \rightarrow \infty} q^{-(h+1)} V(n,h) = n - h - 2$.
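To make the definition of V(n,h) concrete, here is a brute-force sketch for tiny cases.  Assumptions, all mine: q is prime (so arithmetic mod q is the field), a polynomial is a tuple of coefficients with the lowest degree first, and the helper names are made up for illustration.  Tiny q and n are nowhere near the large-q limit, so don't expect to see n - h - 2 come out.

```python
from itertools import product

def monic_polys(q, d):
    """All monic degree-d polynomials over F_q, as tuples (c_0, ..., c_{d-1}, 1)."""
    return [low + (1,) for low in product(range(q), repeat=d)]

def poly_mul(f, g, q):
    """Product of two coefficient tuples, reduced mod q."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % q
    return tuple(out)

def von_mangoldt_table(q, n):
    """Lambda(f) for every monic f of degree 1..n:
    deg P if f is a power of an irreducible P, and 0 otherwise."""
    polys = {d: monic_polys(q, d) for d in range(1, n + 1)}
    # Sieve: every reducible monic polynomial is a product of two monic factors.
    reducible = set()
    for a in range(1, n):
        for b in range(a, n - a + 1):
            for f in polys[a]:
                for g in polys[b]:
                    reducible.add(poly_mul(f, g, q))
    lam = {f: 0 for d in polys for f in polys[d]}
    for d in range(1, n + 1):
        for p in polys[d]:
            if p in reducible:
                continue
            power = p
            while len(power) - 1 <= n:   # walk through p, p^2, p^3, ...
                lam[power] = d
                power = poly_mul(power, p, q)
    return lam

def interval_variance(q, n, h):
    """Variance of the sums of Lambda over the q^(n-h) intervals deg(f - f_0) < h."""
    lam = von_mangoldt_table(q, n)
    sums = {}
    for f in monic_polys(q, n):
        key = f[h:]   # coefficients of t^h, ..., t^n pin down the interval
        sums[key] = sums.get(key, 0) + lam[f]
    mean = sum(sums.values()) / len(sums)
    return sum((s - mean) ** 2 for s in sums.values()) / len(sums)

print(interval_variance(2, 3, 1))
```

A useful sanity check on the table is that Λ sums to exactly q^n over the monic polynomials of degree n, which is the function-field version of "von Mangoldt is 1 on average."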

This is not quite the analogue of the Goldston-Montgomery conjecture; that would be the limit as n,h grow with q fixed.  That, for now, seems out of reach.  Keating and Rudnick’s argument goes through the Katz equidistribution theorems (plus some rather hairy integration over groups) and the nature of those equidistribution theorems — like the Weil bounds from which they ultimately derive — is to give you control as q gets large with everything else fixed (or at least growing very slo-o-o-o-o-wly.)  Generally speaking, a large-q result like this reflects knowledge of the top cohomology group, while getting a fixed-q result requires some control of all the cohomology groups, or at least all the cohomology groups in a large range.

Now for Hast and Matei’s paper.  Their observation is that the variance of the von Mangoldt function can actually be studied algebro-geometrically without swinging the Katz hammer.  Namely:  there’s a variety X_{2,n,h} which parametrizes pairs (f_1,f_2) of monic degree-n polynomials whose difference has degree less than h, together with an ordering of the roots of each polynomial.  X_{2,n,h} carries an action of S_n x S_n by permuting the roots.  Write Y_{2,n,h} for the quotient by this action; that’s just the space of pairs of polynomials in the same h-interval.  Now the variance Keating and Rudnick ask about is more or less

$\sum_{(f_1, f_2) \in Y_{2,n,h}(\mathbf{F}_q)} \Lambda(f_1) \Lambda(f_2)$

where $\Lambda$ is the von Mangoldt function.  But note that $\Lambda(f_i)$ is completely determined by the factorization of $f_i$; this being the case, we can use Grothendieck-Lefschetz to express the sum above in terms of the Frobenius traces on the groups

$H^i(X_{2,n,h},\mathbf{Q}_\ell) \otimes_{\mathbf{Q}_\ell[S_n \times S_n]} V_\Lambda$

where $V_\Lambda$ is a representation of $S_n \times S_n$ keeping track of the function $\Lambda$.  (This move is pretty standard and is the kind of thing that happens all over the place in my paper with Church and Farb about point-counting and representation stability, in section 2.2 particularly)

When the smoke clears, the behavior of the variance V(n,h) as q gets large is controlled by the top “interesting” cohomology group of X_{2,n,h}.  Now X_{2,n,h} is a complete intersection, so you might think its interesting cohomology is all in the middle.  But no: it’s singular, so you have to be more careful.  Hast and Matei carry out a careful analysis of the singular locus of X_{2,n,h}, and use this to show that the cohomology groups vanish in a large range.  Outside that range, Weil bounds give an upper bound on the trace of Frobenius.  In the end they get

$V(n,h) = O(q^{h+1})$.

In other words, they get the order of growth from Keating-Rudnick but not the constant term, and they get it without invoking all the machinery of Katz.  What’s more, their argument has nothing to do with von Mangoldt; it applies to essentially any function of f that only depends on the degrees and multiplicities of the irreducible factors.

What would be really great is to understand that top cohomology group H as an S_n x S_n – representation.  That’s what you’d need in order to get that n-h-2 from Keating-Rudnick; you could just compute it as the inner product of H with $V_\Lambda$.  You want the variance of a different arithmetic function, you pair H with a different representation.  H has all the answers.  But neither they nor I could see how to compute H.

Then came Brad Rodgers.  Two months ago, he posted a preprint which gets the constant term for the variance of any arithmetic function in short intervals.  His argument, like Keating-Rudnick, goes through Katz equidistribution.  This is the same information we would have gotten from knowing H.  And it turns out that Hast and Matei can actually provably recover H from Rodgers’ result; the point is that the power of q Rodgers gets can only arise from H, because all the other cohomology groups of high enough weight are the ones Hast and Matei already showed are zero.

So in the end they find

$H = \oplus_\lambda V_\lambda \boxtimes V_\lambda$

where $\lambda$ ranges over all partitions of n whose top row has length at most n-h-2.
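If you want to get a feel for how big this representation is, its dimension is the sum of (dim V_λ)^2 over the allowed λ, and the hook length formula gives each dim V_λ.  A small sketch (function names are mine, not from the paper):

```python
from math import factorial

def partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples with all parts <= max_part."""
    if max_part is None or max_part > n:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(max_part, 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def dim_specht(shape):
    """Dimension of the irreducible S_n-representation V_lambda (hook length formula)."""
    n = sum(shape)
    # conj[j] = number of rows of length > j, i.e. the height of column j
    conj = [sum(1 for r in shape if r > j) for j in range(shape[0])] if shape else []
    hooks = 1
    for i, row in enumerate(shape):
        for j in range(row):
            arm = row - j - 1
            leg = conj[j] - i - 1
            hooks *= arm + leg + 1
    return factorial(n) // hooks

def dim_H(n, h):
    """Dimension of the sum of V_lambda (x) V_lambda over partitions of n
    whose top row has length at most n - h - 2."""
    bound = n - h - 2
    return sum(dim_specht(lam) ** 2 for lam in partitions(n, bound))

print(dim_H(5, 1))
```

With no restriction on the top row the same sum is n!, the dimension of the regular representation, so the bound n-h-2 is cutting out a piece of that.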

I don’t think I’ve ever seen this kind of representation come up before — is it familiar to anyone?

Anyway:  what I like so much about this new development is that it runs contrary to the main current in this subject, in which you prove theorems in topology or algebraic geometry and use them to solve counting problems in arithmetic statistics over function fields.  Here, the arrow goes the other way; from Rodgers’s counting theorem, they get a computation of a cohomology group which I can’t see any way to get at by algebraic geometry.  That’s cool!  The other example I know of the arrow going this direction is this beautiful paper of Browning and Vishe, in which they use the circle method over function fields to prove the irreducibility of spaces of rational curves on low-degree hypersurfaces.  I should blog about that paper too!  But this is already getting long….

## Call for nominations for the Chern Medal

This is a guest post by Caroline Series.

The Chern Medal is a relatively new prize, awarded once every four years jointly by the IMU and the Chern Medal Foundation (CMF) to an individual whose accomplishments warrant the highest level of recognition for outstanding achievements in the field of mathematics. Funded by the CMF, the Medalist receives a cash prize of US$250,000. In addition, each Medalist may nominate one or more organizations to receive funding totalling US$250,000, for the support of research, education, or other outreach programs in the field of mathematics.

Professor Chern devoted his life to mathematics, both in active research and education, and in nurturing the field whenever the opportunity arose. He obtained fundamental results in all the major aspects of modern geometry and founded the area of global differential geometry. Chern exhibited keen aesthetic tastes in his selection of problems, and the breadth of his work deepened the connections of geometry with different areas of mathematics. He was also generous during his lifetime in his personal support of the field.

Nominations should be sent to the Prize Committee Chair: Caroline Series, email: chair(at)chern18.mathunion.org by 31st December 2016. Further details and nomination guidelines for this and the other IMU prizes can be found here.  Note that previous winners of other IMU prizes, such as the Fields Medal, are not eligible for consideration.


## Math!

I really like talking with AB about arithmetic and her strategies for doing problems.  All this Common Core stuff about breaking up into hundreds and tens and ones that people like to make fun of?  That’s how she does things.  She can describe her process a lot more articulately than most grownups can, because it’s less automatic for her.  I learn a lot about how to teach math by watching her learn math.

## Kevin Jamieson, hyperparameter optimization, playoffs

Kevin Jamieson gave a great seminar here on Hyperband, his algorithm for hyperparameter optimization.

Here’s the idea.  Doing machine learning involves making a lot of choices.  You set up your deep learning neural thingamajig but that’s not exactly one size fits all:  How many layers do you want in your net?  How fast do you want your gradient descents to step?  And etc. and etc.  The parameters are the structures your thingamajig learns.  The hyperparameters are the decisions you make about your thingamajig before you start learning.  And it turns out these decisions can actually affect performance a lot.  So how do you know how to make them?

Well, one option is to pick N choices of hyperparameters at random, run your algorithm on your test set with each choice, and see how you do.  The problem is, thingamajigs take a long time to converge.  This is expensive to do, and when N is small, you’re not really seeing very much of hyperparameter space (which might have dozens of dimensions.)

A more popular choice is to place some prior on the function

F:[hyperparameter space] -> [performance on test set]

You make a choice of hyperparameters, you run the thingamajig, based on the output you update your distribution on F, based on your new distribution you choose a likely-to-be-informative hyperparameter and run again, etc.

This is called “Bayesian optimization of hyperparameters” — it works pretty well — but really only about as well as taking twice as many random choices of hyperparameters, in practice.  A 2x speedup is nothing to sneeze at, but it still means you can’t get N large enough to search much of the space.

Kevin thinks you should think of this as a multi-armed bandit problem.  You have a hyperparameter whose performance you’d like to judge.  You could run your thingamajig with those parameters until it seems to be converging, and see how well it does.  But that’s expensive.  Alternatively, you could run your thingamajig (1/c) times as long; then you have time to consider Nc values of the hyperparameters, much better.  But of course you have a much less accurate assessment of the performance:  maybe the best performer in that first (1/c) time segment is actually pretty bad, and just got off to a good start!

So you do this instead.  Run the thingamajig for time (1/c) on Nc values.  That costs you N.  Then throw out all values of the hyperparameters that came in below median on performance.  You still have (1/2)Nc values left, so continue running those processes for another time (1/c).  That costs you (1/2)N.  Throw out everything below the median.  And so on.  When you get to the end you’ve spent N log Nc, not bad at all but instead of looking at only N hyperparameters, you’ve looked at Nc, where c might be pretty big.  And you haven’t wasted lots of processor time following unpromising choices all the way to the end; rather, you’ve mercilessly culled the low performers along the way.
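The culling loop above can be sketched in a few lines.  This is a toy version of the procedure as I described it, not Kevin's actual Hyperband code; `partial_score` is a hypothetical callback standing in for "train this configuration for total time t and report how it's doing."

```python
def successive_halving(configs, partial_score, step):
    """Repeatedly train the survivors a bit longer and drop the bottom half.

    configs:       list of hyperparameter settings (the Nc values in the text)
    partial_score: hypothetical callback (config, t) -> performance after
                   total training time t; higher is better
    step:          extra training time per round (the 1/c in the text)
    Returns (best surviving config, total training time spent).
    """
    survivors = list(configs)
    t = 0.0
    spent = 0.0
    while len(survivors) > 1:
        t += step
        spent += step * len(survivors)   # every survivor runs for another `step`
        ranked = sorted(survivors, key=lambda cfg: partial_score(cfg, t), reverse=True)
        survivors = ranked[: max(1, len(ranked) // 2)]   # keep the top half
    return survivors[0], spent

# Toy run: 8 configurations, c = 4, and a fake score that just grows with the config index.
best, spent = successive_halving(list(range(8)), lambda cfg, t: cfg * t, 0.25)
print(best, spent)
```

One thing the toy makes visible: with a fixed extra 1/c per round, the rounds cost N, N/2, N/4, and so on, which adds up to less than 2N.  As far as I understand it, a total like N log Nc comes from schedules where the survivors get longer runs each round, so that every round costs about N.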

But how do you choose c?  I insisted to Kevin that he call c a hyperhyperparameter but he wasn’t into it.  No fun!  Maybe the reason Kevin resisted my choice is that he doesn’t actually choose c; he just carries out his procedure once for each c as c ranges over 1,2,4,8,…. N; this costs you only another log N.

In practice, this seems to find hyperparameters just as well as more fancy Bayesian methods, and much faster.  Very cool!  You can imagine doing the same things in simpler situations (e.g. I want to do a gradient descent, where should I start?) and Kevin says this works too.

In some sense this is how a single-elimination tournament works!  In the NCAA men’s basketball finals, 64 teams each play a game; the teams above the median are 1-0, while the teams below the median, at 0-1, get cut.  Then the 1-0 teams each play one more game:  the teams above the median at 2-0 stay, the teams below the median at 1-1 get cut.

What if the regular season worked like this?  Like if in June, the bottom half of major league baseball just stopped playing, and the remaining 15 teams duked it out until August, then down to 8… It would be impossible to schedule, of course.  But in a way we have some form of it:  at the July 31 trade deadline, teams sufficiently far out of the running can just give up on the season and trade their best players for contending teams’ prospects.  Of course the bad teams keep playing games, but in some sense, the competition has narrowed to a smaller field.

## Luke Pebody on sharp bounds for tri-colored sum-free sets

Quick update on this post, where I listed three variations on the problem of finding large subsets of an abelian group A with no three terms in arithmetic progression.  The quantities I called G_2(F_q^n) and G_3(F_q^n) in that post are both bounded above by M(F_q^n), the number of monomials of total degree at most (q-1)n/3 and degree at most q-1 in each variable.  There’s already been a lot of motion in the few weeks since I wrote that post!  A result of Kleinberg, Sawin, and Speyer shows that G_2(F_q^n) is bounded between c_q^n and c_q^{n-epsilon} for some constant c_q, which they explicitly describe but which is kind of hard to compute.  But it’s kind of a win-win.  Either c_q^n is less than M(F_q^n), in which case, great, improvement over the results of CLP/EG, or not, in which case, great, the bounds on tri-colored sum-free sets in CLP/EG are tight up to subexponential factors!  And now Luke Pebody has posted a preprint showing that the latter is the case.

To sum up:  the quantities G_2(F_q^n) and G_3(F_q^n) which I alluded to in the original post are now bounded above by M(F_q^n) and below by M(F_q^n)^{1-epsilon}.  Wonderful!

This only heightens the interest in the original problem of estimating G_1(F_q^n), the size of the largest subset of F_q^n containing no three-term arithmetic progression.  Is the bound M(F_q^n) essentially sharp?  Or is G_1(F_q^n) much smaller?

## “On l-torsion in class groups of number fields” (with L. Pierce, M.M. Wood)

New paper up with Lillian Pierce and Melanie Matchett Wood!

Here’s the deal.  We know a number field K of discriminant D_K has class group of size bounded above by roughly D_K^{1/2}.  On the other hand, if we fix a prime l, the l-torsion in the class group ought to be a lot smaller.  Conjectures of Cohen-Lenstra type predict that the average size of the l-torsion in the class group of K, as K ranges over a “reasonable family” of algebraic number fields, should be constant.  Very seldom do we actually know anything like this; we just have sporadic special cases, like the Davenport-Heilbronn theorem, which tells us that the 3-torsion in the class group of a random quadratic field is indeed constant on average.

But even though we don’t know what’s true on average, why shouldn’t we go ahead and speculate on what’s true universally?  It’s too much to ask that Cl(K)[l] literally be bounded as K varies (at least if you believe even the most modest version of Cohen-Lenstra, which predicts that any value of dim Cl(K)[l] appears for a positive proportion of quadratic fields K) but people do think it’s small:

Conjecture:  |Cl(K)[l]| < D_K^ε.

Even beating the trivial bound

|Cl(K)[l]| < |Cl(K)| < D_K^{1/2 + ε}

is not easy.  Lillian was the first to do it for 3-torsion in quadratic fields.  Later, Helfgott-Venkatesh and Venkatesh and I sharpened those bounds.  I hear from Frank Thorne that he, Bhargava, Shankar, Tsimerman and Zhao have a nontrivial bound on 2-torsion for the class group of number fields of any degree.

In the new paper with Pierce and Wood, we prove nontrivial bounds for the average size of the l-torsion in the class group of K, where l is any integer, and K is a random number field of degree at most 5.  These bounds match the conditional bounds Akshay and I get on GRH.  The point, briefly, is this.  To make our argument work, Akshay and I needed GRH in order to guarantee the existence of a lot of small rational primes which split in K.  (In a few cases, like 3-torsion of quadratic fields, we used a “Scholz reflection trick” to get around this necessity.)  At the time, there was no way to guarantee small split primes unconditionally, even on average.  But thanks to the developments of the last decade, we now know a lot more about how to count number fields of small degree, even if we want to do something delicate like keep track of local conditions.  So, for instance, not only can one count quartic fields of discriminant < X, we can count fields which have specified decomposition at any specified finite set of rational primes.  This turns out to be enough — as long as you are super-careful with error terms! — to  allow us to show, unconditionally, that most number fields of discriminant < D have enough small split primes to make the bound on l-torsion go.  Hopefully, the care we took here to get counts with explicit error terms for number fields subject to local conditions will be useful for other applications too.

## Variations on three-term arithmetic progressions

Here are three functions.  Let N be an integer, and consider:

•  G_1(N), the size of the largest subset S of 1..N containing no 3-term arithmetic progression;
•  G_2(N), the largest M such that there exist subsets S,T of 1..N with |S| = |T| = M such that the equation s_i + t_i = s_j + t_k has no solutions with (j,k) not equal to (i,i).  (This is what’s called  a tri-colored sum-free set.)
• G_3(N), the largest M such that the following is true: given subsets S,T of 1..N, there always exist subsets S’ of S and T’ of T with |S’| + |T’| = M and $S'+T \cup S+T' = S+T.$

You can see that G_1(N) <= G_2(N) <= G_3(N).  Why?  Because if S has no 3-term arithmetic progression, we can take S = T and s_i = t_i, and get a tri-colored sum-free set.  Now suppose you have a tri-colored sum-free set (S,T) of size M; if S’ and T’ are subsets of S and T respectively, and $S'+T \cup S+T' = S+T$, then for every pair (s_i,t_i), you must have either s_i in S’ or t_i in T’; thus |S’| + |T’| is at least M.
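For small N you can check all of this by brute force.  The sketch below (my own helper names, exponential time, so keep N small) computes G_1(N) and tests the tri-colored condition for lists S, T indexed so that s_i is paired with t_i.

```python
from itertools import combinations

def has_3ap(S):
    """True if S contains a nontrivial three-term arithmetic progression."""
    s = set(S)
    return any((a + b) % 2 == 0 and (a + b) // 2 in s
               for a, b in combinations(sorted(s), 2))

def G1(N):
    """Size of the largest subset of 1..N with no 3-term AP (brute force)."""
    for size in range(N, 0, -1):
        for S in combinations(range(1, N + 1), size):
            if not has_3ap(S):
                return size
    return 0

def is_tricolored_sum_free(S, T):
    """Check that s_i + t_i = s_j + t_k forces (j, k) = (i, i)."""
    M = len(S)
    return all(S[i] + T[i] != S[j] + T[k]
               for i in range(M) for j in range(M) for k in range(M)
               if (j, k) != (i, i))

print([G1(N) for N in range(1, 10)])

# An AP-free set paired with itself is tri-colored sum-free ...
print(is_tricolored_sum_free([1, 2, 4, 5], [1, 2, 4, 5]))
# ... and a set containing an AP is not: 2 + 2 = 1 + 3.
print(is_tricolored_sum_free([1, 2, 3], [1, 2, 3]))
```

The last two lines are the first inequality in action: taking S = T with s_i = t_i turns the AP-free condition into the tri-colored one.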

When the interval 1..N is replaced by the group F_q^n, the Croot-Lev-Pach-Ellenberg-Gijswijt argument shows that G_1(F_q^n) is bounded above by the number of monomials of degree at most (q-1)n/3; call this quantity M(F_q^n).  In fact, G_3(F_q^n) is bounded above by M(F_q^n), too (see the note linked from this post) and the argument is only a  modest extension of the proof for G_1.  For all we know, G_1(F_q^n) might be much smaller, but Kleinberg has recently shown that G_2(F_2^n) (whence also G_3(F_2^n)) is equal to M(F_2^n) up to subexponential factors, and work in progress by Kleinberg and Speyer has shown this for several more q and seems likely to show that the bound is tight in general.  On the other hand, I have no idea whether to think G_1(F_q^n) is actually equal to M(F_q^n); i.e. is the bound proven by me and Dion sharp?
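The monomial count is easy to compute exactly.  Here is a sketch that counts exponent vectors (a_1, ..., a_n) with each a_i between 0 and q-1 and total degree at most (q-1)n/3; the cap of q-1 per variable is the usual convention for functions on F_q^n, and the function name is mine.  It only computes the count; it makes no claim about constants in the bounds.

```python
def monomial_count(q, n):
    """Number of monomials x_1^{a_1} ... x_n^{a_n} with 0 <= a_i <= q-1
    and total degree at most (q-1)n/3."""
    # counts[d] = number of exponent vectors built so far with total degree d
    counts = [1]
    for _ in range(n):
        new = [0] * (len(counts) + q - 1)
        for d, c in enumerate(counts):
            for a in range(q):          # exponent of the next variable
                new[d + a] += c
        counts = new
    # compare 3d <= (q-1)n to avoid fractions
    return sum(c for d, c in enumerate(counts) if 3 * d <= (q - 1) * n)

for n in (1, 2, 3, 10, 20):
    print(n, monomial_count(3, n), 3 ** n)
```

Printing the count next to 3^n shows the exponential gap between M(F_3^n) and the size of the whole group.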

The behavior of G_1(N) is, of course, very much studied; we know by Behrend (recently sharpened by Elkin) that G_1(N) is at least N/exp(c sqrt(log N)).  Roth proved that G_1(N) = o(N), and the best bounds, due to Tom Sanders, show that G_1(N) is O(N(log log N)^5 / log N).  (Update:  Oops, no!  Thomas Bloom has an upper bound even a little better than Sanders, change that 5 to a 4.)

What about G_2(N) and G_3(N)?  I’m not sure how much people have thought about these problems.  But if, for instance, you could show (for example, by explicit constructions) that G_3(N) was closer to Sanders than to Behrend/Elkin, it would close off certain strategies for pushing the bound on G_1(N) downward. (Update:  Jacob Fox tells me that you can get an upper bound for G_2(N) of order N/2^{c log* N} from his graph removal paper, applied to the multicolored case.)

Do we think that G_2(N) and G_3(N) are basically equal, as is now known to be the case for F_q^n?

## Sumsets and sumsets of subsets

Say that ten times fast!

Now that you’re done, here’s an interesting fact.  I have been turning over this argument of Croot-Lev-Pach and mine and Gijswijt’s for a couple of weeks now, trying to understand what it’s really doing that leads to control of subsets of F_q^n without arithmetic progressions.

It turns out that there’s a nice refinement of what we prove, which somehow feels like it’s using more of the full strength of the Croot-Lev-Pach lemma.  The critical input is an old result of Roy Meshulam on linear spaces of low-rank matrices.

So here’s a statement.  Write M(q,n) for the CLP/EG upper bound on subsets of F_q^n with no three-term AP.

Then Theorem:  every subset S of F_q^n contains a subset S’ of size at most M(q,n) such that S’+S = S+S.

(Exercise:   Show that this immediately implies the bound on subsets with no three-term AP.)

I find this result very funny, so much so that I didn’t believe it at first, but I think the proof is right..!  Well, see for yourself, here it is.

Two natural questions:  is the bound on S’ sharp?  And is there any analogue of this phenomenon for the integers?

Update:  Of course right after I post this I realize that maybe this can be said more simply, without the invocation of Meshulam’s result (though I really like that result!)  Namely:  it’s equivalent to say that if |S| > M(q,n), you can remove ONE element from S and get an S’ with S’+S = S+S.  Why is this so?  Well, suppose not.  Choose some s_1.  We know it can’t be removed, so there must be some s_1 + s’_1 which is not expressible as a sum in S+S any other way.  The same applies to s_2, s_3, and so on.  So you end up with a set U of “unique sums” s_i + s’_i.  Now you can apply the CLP/EG argument directly to this situation; let P be a polynomial vanishing off U; this makes the matrix P(s+t) on S have a single 1 in each row and each column, and this is just as good as diagonal from the point of view of the argument in EG, so you can conclude just as there that |S| <= M(q,n).  Does that make sense?  This is the same spirit in which the polynomial method is used by Blasiak-Church-Cohn-Grochow-Umans to control multicolored sum-free sets, and the multicolored sum-free set of size (2^(4/3))^n constructed by Alon, Shpilka, and Umans also gives a lower bound for the problem under discussion here.

I still like the one-step argument in the linked .pdf better!  But I have to concede that you can prove this fact without doing any fancy linear algebra.

Update to Update (Jun 9):  Actually, I’m not so sure this argument above actually proves the theorem in the linked note.  So maybe you do need to (get to!) use this Meshulam paper after all!  What do you guys think?

Update:  The bound is sharp, at least over F_2!  I just saw this paper of Robert Kleinberg, which constructs a multicolored sum-free set in F_2^n of size just under M(2,n)!  That is, he gives you subsets S and T, both of size just under M(2,n), such that S’+T union S+T’ can’t be all of S+T if S’ and T’ are smaller than (1/2)S and (1/2)T, if I worked this out right.

The construction, which is based on one from 2014 by Fu and Kleinberg, actually uses a large subset of a cyclic group Z/MZ, where M is about M(2,n), and turns this into a multicolored sum-free set in (F_2)^n of (about) the same size.  So the difference between the upper bound and the lower bound in the (F_2)^n case is now roughly the same as the difference between the (trivial) upper bound and the lower bound in the case of no-three-term-AP sets in the interval.  Naturally you start to wonder:  a) Does the Fu-Kleinberg construction really have to do with characteristic 2 or is it general?  (I haven’t read it yet.)  b) Can similar ideas be used to construct large 3-AP-free subsets of (F_q)^n?  (Surely this has already been tried?) c) Is there a way to marry Meshulam’s Fourier-analytic argument with the polynomial method to get upper bounds of order (1/n)M(q,n)?  I wouldn’t have thought this worthwhile until I saw this Kleinberg paper, which makes me think maybe it’s not impossible to imagine we’re getting closer to the actual truth.