## Variations on three-term arithmetic progressions

Here are three functions.  Let N be an integer, and consider:

•  G_1(N), the size of the largest subset S of 1..N containing no 3-term arithmetic progression;
•  G_2(N), the largest M such that there exist subsets S = {s_1, …, s_M} and T = {t_1, …, t_M} of 1..N such that the equation s_i + t_i = s_j + t_k has no solutions with (j,k) ≠ (i,i). (This is what’s called a tri-colored sum-free set.)
•  G_3(N), the smallest M such that the following is true: given any subsets S,T of 1..N, there always exist subsets S’ of S and T’ of T with |S’| + |T’| <= M and $(S'+T) \cup (S+T') = S+T$.
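
For very small N the first of these functions can be computed by exhaustive search. The Python sketch below is purely illustrative: `has_3ap` and `g1` are hypothetical helper names, and the search is exponential in N, so it says nothing about asymptotics.

```python
from itertools import combinations


def has_3ap(s):
    """True if s contains a < b < c with a + c == 2b."""
    members = set(s)
    for a, b in combinations(sorted(members), 2):
        # a < b, so c = 2b - a > b would complete the progression a, b, c
        if 2 * b - a in members:
            return True
    return False


def g1(n):
    """Brute-force G_1(n): size of the largest 3-AP-free subset of 1..n.

    Tries subset sizes from n downward; only practical for n up to about 20.
    """
    universe = range(1, n + 1)
    for size in range(n, 0, -1):
        if any(not has_3ap(c) for c in combinations(universe, size)):
            return size
    return 0


print([g1(n) for n in range(1, 11)])
```

For example, `g1(9)` is 5, attained by {1, 2, 4, 8, 9}.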

You can see that G_1(N) <= G_2(N) <= G_3(N). Why? If S = {s_1, …, s_M} has no 3-term arithmetic progression, we can take T = S with t_i = s_i and get a tri-colored sum-free set, since a solution of s_i + s_i = s_j + s_k with (j,k) ≠ (i,i) would be a progression s_j, s_i, s_k. Now suppose you have a tri-colored sum-free set (S,T) of size M. If S’ and T’ are subsets of S and T respectively with $(S'+T) \cup (S+T') = S+T$, then, because s_i + t_i has no other representation as s_j + t_k, for every pair (s_i,t_i) you must have either s_i in S’ or t_i in T’; thus |S’| + |T’| is at least M, and so G_3(N) >= M.
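
Both inequalities can be checked by brute force on tiny examples. In this sketch (integers only; the names are hypothetical), `is_tricolored_sum_free` tests the G_2 condition for indexed sequences s and t, and `min_cover` computes, for one fixed pair (S,T), the smallest |S’| + |T’| with (S’+T) ∪ (S+T’) = S+T, which the argument above bounds below by M.

```python
from itertools import combinations, product


def is_tricolored_sum_free(s, t):
    """G_2 condition: s[i] + t[i] == s[j] + t[k] only when (j, k) == (i, i)."""
    m = len(s)
    if len(t) != m:
        return False
    for i, j, k in product(range(m), repeat=3):
        if (j, k) != (i, i) and s[i] + t[i] == s[j] + t[k]:
            return False
    return True


def sumset(a, b):
    return {x + y for x in a for y in b}


def min_cover(s, t):
    """Smallest |S'| + |T'| with (S' + T) | (S + T') == S + T (exhaustive)."""
    s, t = sorted(set(s)), sorted(set(t))
    target = sumset(s, t)
    for total in range(len(s) + len(t) + 1):
        # a = |S'|, and total - a = |T'| must fit inside T
        for a in range(max(0, total - len(t)), min(total, len(s)) + 1):
            for s_sub in combinations(s, a):
                for t_sub in combinations(t, total - a):
                    if sumset(s_sub, t) | sumset(s, t_sub) == target:
                        return total
    return len(s) + len(t)


print(is_tricolored_sum_free([1, 2, 4], [1, 2, 4]), min_cover([1, 2, 4], [1, 2, 4]))
```

With S = T = {1, 2, 4}, which has no 3-term progression, the pair is tri-colored sum-free and the minimum cover has size 3, as predicted; with {1, 2, 3} the first check fails because 2 + 2 = 1 + 3.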

When the interval 1..N is replaced by the group F_q^n, the Croot-Lev-Pach-Ellenberg-Gijswijt argument shows that G_1(F_q^n) is bounded above by three times the number of monomials x_1^{a_1}⋯x_n^{a_n} with 0 <= a_i <= q-1 and total degree at most (q-1)n/3; call this bound M(F_q^n). In fact, G_3(F_q^n) is bounded above by M(F_q^n), too (see the note linked from this post), and the argument is only a modest extension of the proof for G_1. For all we know, G_1(F_q^n) might be much smaller, but Kleinberg has recently shown that G_2(F_2^n) (and hence also G_3(F_2^n)) is equal to M(F_2^n) up to subexponential factors, and work in progress by Kleinberg and Speyer has shown this for several more q and seems likely to show that the bound is tight in general. On the other hand, I have no idea whether to think G_1(F_q^n) is actually equal to M(F_q^n); i.e., is the bound proven by me and Dion sharp?
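
The monomial count is easy to compute with a small dynamic program over the total degree. This Python sketch uses hypothetical names; `eg_bound` multiplies by 3, the constant in the Ellenberg-Gijswijt bound on 3-AP-free subsets of F_q^n.

```python
def monomial_count(q, n):
    """Monomials x_1^a_1 ... x_n^a_n with 0 <= a_i <= q - 1 and
    total degree at most (q - 1) * n / 3."""
    max_deg = (q - 1) * n // 3
    # ways[d] = number of exponent vectors built so far with total degree d
    ways = [1] + [0] * max_deg
    for _ in range(n):
        new = [0] * (max_deg + 1)
        for d, count in enumerate(ways):
            if count:
                # choose the next exponent a without exceeding the degree cap
                for a in range(min(q - 1, max_deg - d) + 1):
                    new[d + a] += count
        ways = new
    return sum(ways)


def eg_bound(q, n):
    """3 * monomial_count(q, n): the bound on 3-AP-free subsets of F_q^n."""
    return 3 * monomial_count(q, n)


for n in (10, 50, 200):
    # n-th root of the count, i.e. its exponential growth rate
    print(n, monomial_count(3, n) ** (1 / n))
```

For q = 3 the printed n-th roots should increase slowly (polynomial factors are still visible at these n) toward the familiar growth rate of roughly 2.755.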

The behavior of G_1(N) is, of course, very much studied; we know by Behrend (recently sharpened by Elkin) that G_1(N) is at least N/exp(c sqrt(log N)).  Roth proved that G_1(N) = o(N), and the best bounds, due to Tom Sanders, show that G_1(N) is O(N(log log N)^5 / log N).  (Update:  Oops, no!  Thomas Bloom has an upper bound even a little better than Sanders, change that 5 to a 4.)
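
The Behrend lower bound comes from an explicit construction: integers whose base-d digits are small enough that adding two of them never carries, and whose digit vectors all lie on one sphere, which is strictly convex and so contains no three collinear equally spaced points. Below is a minimal Python sketch; `behrend_set` is a hypothetical name, d and k are picked by hand rather than optimized as Behrend does to reach N/exp(c sqrt(log N)), and Elkin's refinement is not attempted.

```python
from collections import defaultdict
from itertools import product


def behrend_set(d, k):
    """Behrend-style 3-AP-free subset of 1..d**k (illustrative parameters).

    Digits are restricted to 0..(d-1)//2, so x + z has no carries and
    x + z == 2y forces the digit vectors to satisfy x + z == 2y; keeping
    one sphere (fixed sum of squared digits) then forces x == y == z.
    """
    max_digit = (d - 1) // 2
    spheres = defaultdict(list)
    for digits in product(range(max_digit + 1), repeat=k):
        value = sum(x * d ** i for i, x in enumerate(digits))
        # shift by 1 so the set lies in 1..d**k; shifting preserves 3-APs
        spheres[sum(x * x for x in digits)].append(value + 1)
    # keep the most populated sphere
    return sorted(max(spheres.values(), key=len))


print(behrend_set(10, 3))
```

Checking that 2b - a is never in the output for a < b confirms that `behrend_set(10, 3)` is a 3-AP-free subset of 1..1000.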

What about G_2(N) and G_3(N)? I’m not sure how much people have thought about these problems. But if, for instance, you could show (say, by explicit constructions) that G_3(N) was closer to the Sanders bound than to the Behrend/Elkin bound, it would close off certain strategies for pushing the upper bound on G_1(N) downward. (Update: Jacob Fox tells me that you can get an upper bound for G_2(N) of order $N/2^{c\log^* N}$ from his graph removal paper, applied to the multicolored case.)

Do we think that G_2(N) and G_3(N) are basically equal, as is now known to be the case for F_q^n?

## Sumsets and sumsets of subsets

Say that ten times fast!
Now that you’re done, here’s an interesting fact. I have been turning over the Croot-Lev-Pach argument, and the one Gijswijt and I built on it, for a couple of weeks now, trying to understand what it’s really doing that leads to control of subsets of F_q^n without arithmetic progressions.

It turns out that there’s a nice refinement of what we prove, which somehow feels like it’s using more of the full strength of the Croot-Lev-Pach lemma.  The critical input is an old result of Roy Meshulam on linear spaces of low-rank matrices.

So here’s a statement.  Write M(q,n) for the CLP/EG upper bound on subsets of F_q^n with no three-term AP.

Theorem: Every subset S of F_q^n contains a subset S’ of size at most M(q,n) such that S’+S = S+S.
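
On tiny examples the theorem can be checked by exhaustive search. This Python sketch assumes q is prime, so F_q^n can be modeled as n-tuples of integers mod q (prime powers would need genuine field arithmetic); the helper names are hypothetical.

```python
from itertools import combinations, product


def add(x, y, q):
    """Coordinatewise addition in (Z/q)^n; equals F_q^n only for q prime."""
    return tuple((a + b) % q for a, b in zip(x, y))


def sumset(a, b, q):
    return {add(x, y, q) for x in a for y in b}


def smallest_generating_subset(s, q):
    """A smallest S' inside S with S' + S == S + S, by exhaustive search."""
    s = sorted(set(s))
    target = sumset(s, s, q)
    for size in range(len(s) + 1):
        for sub in combinations(s, size):
            if sumset(sub, s, q) == target:
                return set(sub)
    return set(s)


plane = list(product(range(3), repeat=2))
print(smallest_generating_subset(plane, 3))
print(smallest_generating_subset([(0, 0), (0, 1), (1, 0), (1, 1)], 3))
```

For all of F_3^2 a single element suffices, while for the 4-element cap set {(0,0),(0,1),(1,0),(1,1)} nothing can be discarded; both sizes are within the CLP/EG bound M(3,2) = 9.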

(Exercise:   Show that this immediately implies the bound on subsets with no three-term AP.)

I find this result very funny, so much so that I didn’t believe it at first, but I think the proof is right! Well, see for yourself, here it is.

Two natural questions:  is the bound on S’ sharp?  And is there any analogue of this phenomenon for the integers?

Update: Of course, right after I post this I realize that maybe this can be said more simply, without invoking Meshulam’s result (though I really like that result!). Namely: it’s equivalent to say that if |S| > M(q,n), you can remove ONE element from S and get an S’ with S’+S = S+S. Why is this so? Well, suppose not. Choose some s_1. We know it can’t be removed, so there must be some sum s_1 + s’_1 which is not expressible as a sum in S+S any other way. The same applies to s_2, s_3, and so on. So you end up with a set U of “unique sums” s_i + s’_i. Now you can apply the CLP/EG argument directly to this situation: let P be a polynomial that is 1 on U and vanishes off U. This makes the matrix (P(s+t)) with s,t in S have a single 1 in each row and each column, and this is just as good as diagonal from the point of view of the argument in EG, so you can conclude just as there that |S| <= M(q,n). Does that make sense?

This is the same spirit in which the polynomial method is used by Blasiak-Church-Cohn-Grochow-Umans to control multicolored sum-free sets, and the multicolored sum-free set of size $(2^{4/3})^n$ constructed by Alon, Shpilka, and Umans also gives a lower bound for the problem under discussion here.
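
The one-element-at-a-time procedure can be sketched the same way. Again assuming q prime and using hypothetical names, `greedy_shrink` removes elements from S’ (starting from S’ = S) while S’+S = S+S survives, and `unique_sum_witnesses` recovers, for each element x that could not be removed, a partner y with x + y not in (S’ minus x) + S, i.e. one of the sums s_i + s’_i above. It only illustrates the removal step; it does not check the rank step of the argument.

```python
from itertools import product


def add(x, y, q):
    """Coordinatewise addition in (Z/q)^n; equals F_q^n only for q prime."""
    return tuple((a + b) % q for a, b in zip(x, y))


def sumset(a, b, q):
    return {add(x, y, q) for x in a for y in b}


def greedy_shrink(s, q):
    """Drop one element at a time from S' while S' + S == S + S holds;
    stop when no single element can be removed."""
    s = sorted(set(s))
    target = sumset(s, s, q)
    kept = list(s)
    removed = True
    while removed:
        removed = False
        for x in kept:
            rest = [y for y in kept if y != x]
            if sumset(rest, s, q) == target:
                kept = rest
                removed = True
                break  # restart the scan on the smaller set
    return kept


def unique_sum_witnesses(kept, s, q):
    """Map each x in S' to some y in S with x + y outside (S' - {x}) + S.

    Such y exists for every x that greedy_shrink could not remove."""
    witnesses = {}
    for x in kept:
        others = sumset([z for z in kept if z != x], s, q)
        for y in s:
            if add(x, y, q) not in others:
                witnesses[x] = y
                break
    return witnesses


cap = [(0, 0), (0, 1), (1, 0), (1, 1)]
kept = greedy_shrink(cap, 3)
print(kept, unique_sum_witnesses(kept, cap, 3))
print(greedy_shrink(list(product(range(3), repeat=2)), 3))
```

On the cap set nothing is removed and every element has a witness; on all of F_3^2 the procedure shrinks S’ to a single point.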

I still like the one-step argument in the linked .pdf better!  But I have to concede that you can prove this fact without doing any fancy linear algebra.

Update to Update (Jun 9): Actually, I’m not so sure the argument above proves the theorem in the linked note. So maybe you do need to (get to!) use this Meshulam paper after all! What do you guys think?

Update: The bound is sharp, at least over F_2! I just saw this paper of Robert Kleinberg, which constructs a multicolored sum-free set in F_2^n of size just under M(2,n)! That is, he gives you subsets S and T, both of size just under M(2,n), such that $(S'+T) \cup (S+T')$ can’t be all of S+T if |S’| < |S|/2 and |T’| < |T|/2, if I worked this out right.

The construction, which is based on one from 2014 by Fu and Kleinberg, actually uses a large subset of a cyclic group Z/mZ, where m is about M(2,n), and turns this into a multicolored sum-free set in (F_2)^n of (about) the same size. So the gap between the upper bound and the lower bound in the (F_2)^n case is now roughly the same as the gap between the (trivial) upper bound and the lower bound for 3-AP-free sets in the interval. Naturally you start to wonder: a) Does the Fu-Kleinberg construction really have to do with characteristic 2, or is it general? (I haven’t read it yet.) b) Can similar ideas be used to construct large 3-AP-free subsets of (F_q)^n? (Surely this has already been tried?) c) Is there a way to marry Meshulam’s Fourier-analytic argument with the polynomial method to get upper bounds of order (1/n)M(q,n)? I wouldn’t have thought this worthwhile until I saw this Kleinberg paper, which makes me think it’s not impossible that we’re getting closer to the actual truth.