## Kevin Jamieson, hyperparameter optimization, playoffs

Kevin Jamieson gave a great seminar here on Hyperband, his algorithm for hyperparameter optimization.

Here’s the idea.  Doing machine learning involves making a lot of choices.  You set up your deep learning neural thingamajig but that’s not exactly one size fits all:  How many layers do you want in your net?  How fast do you want your gradient descents to step?  And etc. and etc.  The parameters are the structures your thingamajig learns.  The hyperparameters are the decisions you make about your thingamajig before you start learning.  And it turns out these decisions can actually affect performance a lot.  So how do you know how to make them?

Well, one option is to pick N choices of hyperparameters at random, run your algorithm on your test set with each choice, and see how you do.  The problem is, thingamajigs take a long time to converge.  This is expensive to do, and when N is small, you’re not really seeing very much of hyperparameter space (which might have dozens of dimensions.)

A more popular choice is to place some prior on the function

F:[hyperparameter space] -> [performance on test set]

You make a choice of hyperparameters, you run the thingamajig, based on the output you update your distribution on F, based on your new distribution you choose a likely-to-be-informative hyperparameter and run again, etc.

This is called “Bayesian optimization of hyperparameters” — it works pretty well — but really only about as well as taking twice as many random choices of hyperparameters, in practice.  A 2x speedup is nothing to sneeze at, but it still means you can’t get N large enough to search much of the space.

Kevin thinks you should think of this as a multi-armed bandit problem.  You have a hyperparameter whose performance you’d like to judge.  You could run your thingamajig with those parameters until it seems to be converging, and see how well it does.  But that’s expensive.  Alternatively, you could run your thingamajig (1/c) times as long; then you have time to consider Nc values of the hyperparameters, much better.  But of course you have a much less accurate assessment of the performance:  maybe the best performer in that first (1/c) time segment is actually pretty bad, and just got off to a good start!

So you do this instead.  Run the thingamajig for time (1/c) on Nc values.  That costs you N.  Then throw out all values of the hyperparameters that came in below median on performance.  You still have (1/2)Nc values left, so continue running those processes for another time (1/c).  That costs you (1/2)N.  Throw out everything below the median.  And so on.  When you get to the end you’ve spent N log Nc, not bad at all but instead of looking at only N hyperparameters, you’ve looked at Nc, where c might be pretty big.  And you haven’t wasted lots of processor time following unpromising choices all the way to the end; rather, you’ve mercilessly culled the low performers along the way.

But how do you choose c?  I insisted to Kevin that he call c a hyperhyperparameter but he wasn’t into it.  No fun!  Maybe the reason Kevin resisted my choice is that he doesn’t actually choose c; he just carries out his procedure once for each c as c ranges over 1,2,4,8,…. N; this costs you only another log N.

In practice, this seems to find hyperparameters just as well as more fancy Bayesian methods, and much faster.  Very cool!  You can imagine doing the same things in simpler situations (e.g. I want to do a gradient descent, where should I start?) and Kevin says this works too.

In some sense this is how a single-elimination tournament works!  In the NCAA men’s basketball finals, 64 teams each play a game; the teams above the median are 1-0, while the teams below the median, at 0-1, get cut.  Then the 1-0 teams each play one more game:  the teams above the median at 2-0 stay, the teams below the median at 1-1 get cut.

What if the regular season worked like this?  Like if in June, the bottom half of major league baseball just stopped playing, and the remaining 15 teams duked it out until August, then down to 8… It would be impossible to schedule, of course.  But in a way we have some form of it:  at the July 31 trade deadline, teams sufficiently far out of the running can just give up on the season and trade their best players for contending teams’ prospects.  Of course the bad teams keep playing games, but in some sense, the competition has narrowed to a smaller field.

## Luke Pebody on sharp bounds for tri-colored sum-free sets

Quick update on this post, where I listed three variations on the problem of finding large subsets of an abelian group A with no three terms in arithmetic progression.  The quantities I called G_2(F_q^n) and G_3(F_q^n) in that post are both bounded above by M(F_q^n), the number of monomials of total degree at most (q-1)n/3 and degree at most q-1 in each variable.  There’s already been a lot of motion in the few weeks since I wrote that post!  A result of Kleinberg, Sawin, and Speyer shows that G_2(F_q^n) is bounded between c_q^n and c_q^{n-epsilon} for some constant c_q, which they explicitly describe but which is kind of hard to compute.  But it’s kind of a win-win.  Either c_q^n is less than M(F_q^n), in which case, great, improvement over the results of CLP/EG, or not, in which case, great, the bounds on tri-colored sum-free sets in CLP/EG are tight up to subexponential factors!  And now Luke Pebody has posted a preprint showing that the latter is the case.

To sum up:  the quantities G_2(F_q^n) and G_3(F_q^n) which I alluded to in the original post are now bounded above by M(F_q^n) and below by M(F_q^n)^{1-epsilon}.  Wonderful!

This only heightens the interest in the original problem of estimating G_1(F_q^n), the size of the largest subset of F_q^n containing no three-term arithmetic progession.  Is the bound M(F_q^n) essentially sharp?  Or is G_1(F_q^n) much smaller?

## “On l-torsion in class groups of number fields” (with L. Pierce, M.M. Wood)

New paper up with Lillian Pierce and Melanie Matchett Wood!

Here’s the deal.  We know a number field K of discriminant D_K has class group of size bounded above by roughly D_K^{1/2}.  On the other hand, if we fix a prime l, the l-torsion in the class group ought to be a lot smaller.  Conjectures of Cohen-Lenstra type predict that the average size of the l-torsion in the class group of D_K, as K ranges over a “reasonable family” of algebraic number fields, should be constant.  Very seldom do we actually know anything like this; we just have sporadic special cases, like the Davenport-Heilbronn theorem, which tells us that the 3-torsion in the class group of a random quadratic field is indeed constant on average.

But even though we don’t know what’s true on average, why shouldn’t we go ahead and speculate on what’s true universally?  It’s too much to ask that Cl(K)[l] literally be bounded as K varies (at least if you believe even the most modest version of Cohen-Lenstra, which predicts that any value of dim Cl(D_K)[l] appears for a positive proportion of quadratic fields K) but people do think it’s small:

Conjecture:  |Cl(K)[l]| < D_K^ε.

Even beating the trivial bound

|Cl(K)[l]| < |Cl(K)| < D_K^{1/2 + ε}

is not easy.  Lillian was the first to do it for 3-torsion in quadratic fields.  Later, Helfgott-Venkatesh and Venkatesh and I sharpened those bounds.  I hear from Frank Thorne that he, Bhargava, Shankar, Tsimerman and Zhao have a nontrivial bound on 2-torsion for the class group of number fields of any degree.

In the new paper with Pierce and Wood, we prove nontrivial bounds for the average size of the l-torsion in the class group of K, where l is any integer, and K is a random number field of degree at most 5.  These bounds match the conditional bounds Akshay and I get on GRH.  The point, briefly, is this.  To make our argument work, Akshay and I needed GRH in order to guarantee the existence of a lot of small rational primes which split in K.  (In a few cases, like 3-torsion of quadratic fields, we used a “Scholz reflection trick” to get around this necessity.)  At the time, there was no way to guarantee small split primes unconditionally, even on average.  But thanks to the developments of the last decade, we now know a lot more about how to count number fields of small degree, even if we want to do something delicate like keep track of local conditions.  So, for instance, not only can one count quartic fields of discriminant < X, we can count fields which have specified decomposition at any specified finite set of rational primes.  This turns out to be enough — as long as you are super-careful with error terms! — to  allow us to show, unconditionally, that most number fields of discriminant < D have enough small split primes to make the bound on l-torsion go.  Hopefully, the care we took here to get counts with explicit error terms for number fields subject to local conditions will be useful for other applications too.

## Variations on three-term arithmetic progressions

Here are three functions.  Let N be an integer, and consider:

•  G_1(N), the size of the largest subset S of 1..N containing no 3-term arithmetic progression;
•  G_2(N), the largest M such that there exist subsets S,T of 1..N with |S| = |T| = M such that the equation s_i + t_i = s_j + t_k has no solutions with (j,k) not equal to (i,i).  (This is what’s called  a tri-colored sum-free set.)
• G_3(N), the largest M such that the following is true: given subsets S,T of 1..N, there always exist subsets S’ of S and T’ of T with |S’| + |T’| = M and $S'+T \cup S+T' = S+T.$

You can see that G_1(N) <= G_2(N) <= G_3(N).  Why?  Because if S has no 3-term arithmetic progression, we can take S = T and s_i = t_i, and get a tri-colored sum-free set.  Now suppose you have a tri-colored sum-free set (S,T) of size M; if S’ and T’ are subsets of S and T respectively, and $S'+T \cup S+T' = S+T$, then for every pair (s_i,t_i), you must have either s_i in S’ or t_i in T’; thus |S’| + |T’| is at least M.

When the interval 1..N is replaced by the group F_q^n, the Croot-Lev-Pach-Ellenberg-Gijswijt argument shows that G_1(F_q^n) is bounded above by the number of monomials of degree at most (q-1)n/3; call this quantity M(F_q^n).  In fact, G_3(F_q^n) is bounded above by M(F_q^n), too (see the note linked from this post) and the argument is only a  modest extension of the proof for G_1.  For all we know, G_1(F_q^n) might be much smaller, but Kleinberg has recently shown that G_2(F_2^n) (whence also G_3(F_2^n)) is equal to M(F_2^n) up to subexponential factors, and work in progress by Kleinberg and Speyer has shown this for several more q and seems likely to show that the bound is tight in general.  On the other hand, I have no idea whether to think G_1(F_q^n) is actually equal to M(F_q^n); i.e. is the bound proven by me and Dion sharp?

The behavior of G_1(N) is, of course, very much studied; we know by Behrend (recently sharpened by Elkin) that G_1(N) is at least N/exp(c sqrt(log N)).  Roth proved that G_1(N) = o(N), and the best bounds, due to Tom Sanders, show that G_1(N) is O(N(log log N)^5 / log N).  (Update:  Oops, no!  Thomas Bloom has an upper bound even a little better than Sanders, change that 5 to a 4.)

What about G_2(N) and G_3(N)?  I’m not sure how much people have thought about these problems.  But if, for instance, you could show (for example, by explicit constructions) that G_3(N) was closer to Sanders than to Behrend/Elkin, it would close off certain strategies for pushing the bound on G_1(N) downward. (Update:  Jacob Fox tells me that you can get an upper bound for G_2(N) of order N/2^{clog* N} from his graph removal paper, applied to the multicolored case.)

Do we think that G_2(N) and G_3(N) are basically equal, as is now known to be the case for F_q^n?

## Sumsets and sumsets of subsets

Say that ten times fast!
Now that you’re done, here’s an interesting fact.  I have been turning over this argument of Croot-Lev-Pach and mine and Gijswijt’s for a couple of weeks now, trying to understand what it’s really doing that leads to control of subsets of F_q^n without arithmetic progressions.

It turns out that there’s a nice refinement of what we prove, which somehow feels like it’s using more of the full strength of the Croot-Lev-Pach lemma.  The critical input is an old result of Roy Meshulam on linear spaces of low-rank matrices.

So here’s a statement.  Write M(q,n) for the CLP/EG upper bound on subsets of F_q^n with no three-term AP.

Then Theorem:  every subset S of F_q^n contains a subset S’ of size at most M(q,n) such that S’+S = S+S.

(Exercise:   Show that this immediately implies the bound on subsets with no three-term AP.)

I find this result very funny, so much so that I didn’t believe it at first, but I think the proof is right..!  Well, see for yourself, here it is.

Two natural questions:  is the bound on S’ sharp?  And is there any analogue of this phenomenon for the integers?

Update:  Of course right after I post this I realize that maybe this can be said more simply, without the invocation of Meshulam’s result (though I really like that result!)  Namely:  it’s equivalent to say that if |S| > M(q,n), you can remove ONE element from S and get an S’ with S’+S = S+S.  Why is this so?  Well, suppose not.  Choose some s_1.  We know it can’t be removed, so there must be some s_1 + s’_1 which is not expressible as a sum in S+T any other way.  The same applies to s_2, s_3, and so on.  So you end up with a set U of “unique sums” s_i + s’_i.  Now you can apply the CLP/EG argument directly to this situation; let P be a polyomial vanishing off U, this makes the matrix P(s+t) on S have a single 1 in each row and each column, and this is just as good as diagonal from the point of view of the argument in EG, so you can conclude just as there that |S| <= M(q,n).  Does that make sense?  This is the same spirit in which the polynomial method is used by Blasiak-Church-Cohn-Grochow-Umans to control multicolored sum-free sets, and the multicolored sum-free set of size (2^(4/3))^n constructed by Alon, Shpilka, and Umans also gives a lower bound for the problem under discussion here.

I still like the one-step argument in the linked .pdf better!  But I have to concede that you can prove this fact without doing any fancy linear algebra.

Update to Update (Jun 9):  Actually, I’m not so sure this argument above actually proves the theorem in the linked note.  So maybe you do need to (get to!) use this Meshulam paper after all!  What do you guys think?

Update:  The bound is sharp, at least over F_2!  I just saw this paper of Robert Kleinberg, which constructs a multicolored sum-free set in F_2^n of size just under M(2,n)!  That is, he gives you subsets S and T, both of size just under M(2,n), such that S’+T union S+T’ can’t be all of S+T if S’ and T’ are smaller than (1/2)S and (1/2)T, if I worked this out right.

The construction, which is actually based on one from 2014 by Fu and Kleinberg, actually uses a large subset of a cyclic group Z/MZ, where M is about M(2,n), and turns this into a multicolored sum-free set in (F_2)^n of (about) the same size.  So the difference between the upper bound and the lower bound in the (F_2)^n case is now roughly the same as the difference between the (trivial) upper bound and the lower bound in the case of no-three-term-AP sets in the interval.  Naturally you start to wonder:  a) Does the Fu-Kleinberg construction really have to do with characteristic 2 or is it general?  (I haven’t read it yet.)  b) Can similar ideas be used to construct large 3-AP-free subsets of (F_q)^n?  (Surely this has already been tried?) c) Is there a way to marry Meshulam’s Fourier-analytic argument with the polynomial method to get upper bounds on order (1/n)M(q,n)?  I wouldn’t have thought this worthwhile until I saw this Kleinberg paper, which makes me think maybe it’s not impossible to imagine we’re getting closer to the actual truth.

## Bounds for cap sets

Briefly:  it seems to me that the idea of the Croot-Lev-Pach paper I posted about yesterday can indeed be used to give a new bound on the size of subsets of F_3^n with no three-term arithmetic progression! Such a set has size at most (2.756)^n. (There’s actually a closed form for the constant, I think, but I haven’t written it down yet.)

Here’s the preprint. It’s very short. I’ll post this to the arXiv in a day or two, assuming I (or you) don’t find anything wrong with it, so comment if you have comments! Note: I’ve removed the link, since the official version of this result is now the joint paper by me and Gijswijt, and the old version shouldn’t be cited.

Update:  Busy few days of administrative stuff and travel, sorry for not having updated the preprint yet, will try to finish it today.  One note, already observed below in the comments:  you get a similar bound for subsets of (F_q)^n free of solutions to (ax+by+cz=0) for any (a,b,c) with a+b+c=0; the cap set case is q=3, (a,b,c) = (1,1,1).

Update 2:  Dion Gijswijt and I will be submitting this result as a joint paper, which will amalgamate the presentations of our essentially identical arguments.  Dion carried out his work independently of mine at around the same time, and the idea should be credited to both of us.  Our joint paper is available on the arXiv.

## Croot-Lev-Pach on AP-free sets in (Z/4Z)^n

As you know I love the affine cap problem:  how big can a subset of (Z/3Z)^n be that contains no three elements summing to 0 — or, in other words, that contains no 3-term arithmetic progression?  The best upper bounds, due to Bateman and Katz, are on order 3^n / n^(1+epsilon).  And I think it’s fair to say that all progress on this problem, since Meshulam’s initial results, have come from Fourier-analytic arguments.

So I’m charmed by this paper of Ernie Croot, Vsevolod Lev, and Peter Pach which proves a much stronger result for A = (Z/4Z)^n:  a subset with no 3-term arithmetic progression has size at most c^n for c strictly less than 4.  Better still (for an algebraic geometer) the argument has no harmonic analysis at all, but proceeds via the polynomial method!

This is surprising for two reasons.  First, it’s hard to make the polynomial method work well for rings, like Z/4Z, that aren’t fields; extending our knowledge about additive combinatorics to such settings is a long-standing interest of mine.  Second, the polynomial method over finite fields usually works in the “fixed dimension large field” regime; problems like affine cap, where the base ring is fixed and the dimension are growing, have so far been mostly untouched.

As for the first issue, here’s the deal.  This looks like a problem over Z/4Z but is really a problem over F_2, because the condition for being a 3-term AP

a – 2b + c = 0

has a 2 in it.  In other words:  the two outer terms have to lie in the same coset of 2A, and the middle term is only determined up to 2A.

So CLP recast the problem as follows.  Let S be a large subset of A with no 3-term AP.   Let V be 2A, which is an n-dimensional vector space over F_2.  For each v in V, there’s a coset of V consisting of the solutions to 2a = v, and we can let S_v be the intersection of S with this coset.

We want to make this a problem about V, not about A.  So write T_v for a translate of S_v by some element of the coset, so T_v now sits in V.  Which element?  Doesn’t matter!

We can now write the “no 3-term AP” condition strictly in terms of these subsets of V.  Write (T_v – T_v)^* for the set of differences between distinct elements of T_v.  Write U for the set of v in V such that T_v is nonempty.  Then the union over all v in U of

(T_v – T_v)^* + v

is disjoint from U.

I leave it as an exercise to check the equivalence.

Now we have a combinatorial question about vector spaces over F_2; we want to show that, under the condition above, the sum of |T_v| over all v in U can’t be too large.

This is where the polynomial method comes in!  CLP show that (over any field, not just F_2), a polynomial of low degree vanishing on (T_v – T_v)^* has to vanish at 0 as well; this is Lemma 1 in their paper.  So write down a polynomial P vanishing on V – U; by dimension considerations we can choose one which doesn’t vanish on all of V.  (This uses the fact that the squarefree monomials of degree up to d are linearly independent functions on F_2^n.)  If U is big, we can choose P to have lowish degree.

Since P vanishes on V-U, P has to vanish on (T_v – T_v)^* + v for all v.  Since P has low degree, it has to vanish on v too, for all v.  But then P vanishes everywhere, contrary to our assumption.

The magic of the paper is in Lemma 1, in my view, which is where you really see the polynomial method applied in this unusual fixed-field-large-dimension regime.  Let me say a vague word about how it works.  (The actual proof is less than a page, by the way, so I’m not hiding much!)  Let P be your polynomial and d its degee.  You send your vector space into a subvariety of a much larger vector space W via degree-d Veronese embedding F_d. In fact you do this twice, writing

V x V -> W x W.

Now if P is your polynomial of degree-d, you can think of P(v_1 – v_2) as a bilinear form <,> on W x W.  Suppose S is a subset of V such that P(s_1 – s_2) vanishes for all distinct s_1, s_2 in S.   That means

<F_d(s_1), F_d(s_2)> = 0

for all distinct s_1, s_2 in S.  On the other hand,

<F_d(s_1), F_d(s_1)>

doesn’t depend on s_1; it just takes the value P(0).  So if P(0) is not equal to 0, you have |S| vectors of nonzero norm which are mutually orthogonal under this bilinear form, and so there can be at most dim W of these, and that’s the bound on |S| you need.

This is very slick and I hope the idea is more generally applicable!

## Bourgain, Gamburd, Sarnak on Markoff triples

Such a great colloquium last week by Peter Sarnak, this year’s Hilldale Lecturer, on his paper with Bourgain and Gamburd.  My only complaint is that he promised to talk about the mapping class group and then barely did!  So I thought I’d jot down what their work has to do with mapping class groups and spaces of branched covers.

Let E be a genus 1 Riemann surface — that is, a torus — and O a point of E.  Then pi_1(E-O) is just a free group on two generators, whose commutator is (the conjugacy class of) a little loop around the puncture.  If G is a group, a G-cover of E branched only at O is thus a map from pi_1(E-O) to G, which is to say a pair (a,b) of elements of G.  Well, such a pair considered up to conjugacy, since we didn’t specify a basepoint for our pi_1.  And actually, we might as well just think about the surjective maps, which is to say the connected G-covers.

Let’s focus on the case G = SL_2(Z).  And more specifically on those maps where the puncture class is sent to a matrix of trace -2.  Here’s an example:  we can take

$a_0 = \left[ \begin{array}{cc} 1 & 1 \\ 1 & 2 \end{array} \right]$

$b_0 = \left[ \begin{array}{cc} 2 & 1 \\ 1 & 1 \end{array} \right]$

You can check that in this case the puncture class has trace -2; that is, it is the negative of a unipotent matrix.  Actually, I gotta be honest, these matrices don’t generate SL_2(Z); they generate a finite-index subgroup H of SL_2(Z), its commutator.

Write S for the set of all conjugacy classes of pairs (a,b) of matrices which generate H and have commutator with trace -2.  It turns out that this set is the set of integral points of an affine surface called the Markoff surface:  namely, if we take x = Tr(a)/3, y = Tr(b)/3, and z = Tr(ab)/3, then the three traces obey the relation

$x^2 + y^2 + z^2 = 3xyz$

and indeed every solution to this equation corresponds to an element of S.

So the integral points on the Markoff surface are acted on by an infinite discrete group.  Which if you just look at the equation seems like kind of a ridiculous miracle.  But in the setting of H-covers is very natural.  Because there’s a natural group acting on S: namely, the mapping class group Γ of type (1,1).  This group’s whole purpose in life is to act on the fundamental group of a once-punctured torus!  (For readers unfamiliar with mapping class groups, I highly recommend Benson Farb and Dan Margalit’s wonderful textbook.)   So you start with a surjection from pi_1(E-O) to H, you compose with the action of  Γ, and you get a new homomorphism.  The action of  Γ on pi_1(E-O) is only outer, but that’s OK, because we’re only keeping track of conjugacy classes of homomorphisms from pi_1(E-O) to H.

So Γ acts on S; and now the lovely theorem is that this action is transitive.

I don’t want to make this mapping class group business sound more abstract than it is.  Γ isn’t a mystery group; it acts on H_1(E-O), a free abelian group of rank 2, which gives a map from Γ to SL_2(Z), which turns out to be an isomorphism.  What’s more, the action of Γ on pairs (a,b) is completely explicit; the standard unipotent generators of SL_2(Z) map to the moves

(a,b) -> (ab,b)

(a,b) -> (a,ab)

(Sanity check:  each of these transformations preserves the conjugacy class of the commutator of a and b.)

Sarnak, being a number theorist, is interested in strong approximation: are the integral solutions of the Markoff equation dense in the adelic solutions?   In particular, if I have a solution to the Markoff equation over F_p — which is to say, a pair (a,b) in SL_2(F_p) with the right commutator — can I lift it to a solution over Z?

Suppose I have a pair (a,b) which lifts to a pair (a,b).  We know (a,b) = g(a_0,b_0) for some g in Γ.  Thus (a,b) = g(a_0,b_0).  In other words, if strong approximation is true, Γ acts transitively on the set S_p of Markoff solutions mod p.  And this is precisely what Bourgain, Gamburd, and Sarnak conjecture.  (In fact, they conjecture more:  that the Cayley-Schreier graph of this action is an expander, which is kind of a quantitative version of an action being transitive.)  One reason to believe this:  if we replace F_p with C, we replace S with the SL_2(C) character variety of pi_1(E-O), and Goldman showed long ago that the action of mapping class groups on complex character varieties of fundamental groups was ergodic; it mixes everything around very nicely.

Again, I emphasize that this is on its face a question of pure combinatorial group theory.  You want to know if you can get from any pair of elements in SL_2(F_p) with negative-unipotent commutator to any other via the two moves above.  You can set this up on your computer and check that it holds for lots and lots of p (they did.)  But it’s not clear how to prove this transitivity for all p!

They’re not quite there yet.  But what they can prove is that the action of Γ on S_p has a very big orbit, and has no very small orbits.

Now that G is the finite group SL_2(F_p), we’re in my favorite situation, that of Hurwitz spaces.  The mapping class group Γ is best seen as the fundamental group of the moduli stack M_{1,1} of elliptic curves.  So an action of Γ on the finite set S_p is just a cover H_p of M_{1,1}.  It is nothing but the Hurwitz space parametrizing maps (f: X -> E) where E is an elliptic curve and f an SL_2(F_p)-cover branched only at the origin.  What Bourgain, Gamburd, and Sarnak conjecture is that H_p is connected.

If you like, this is a moduli space of curves with nonabelian level structure as in deJong and Pikaart.  Or, if you prefer (and if H_p is actually connected) it is a noncongruence modular curve corresponding to the stabilizer of an element of S_p in Γ = SL_2(Z).  This stabilizer is in general going to be a noncongruence subgroup, except it is a congruence subgroup in the more general sense of Thurston.

This seems like an interesting family of algebraic curves!  What, if anything, can we say about it?

## LMFDB!

Very happy to see that the L-functions and Modular Forms Database is now live!

When I was a kid we looked up our elliptic curves in Cremona’s tables, on paper.  Then William Stein created the Modular Forms Database (you can still go there but it doesn’t really work) and suddenly you could look at the q-expansions of cusp forms in whatever weight and level you wanted, up to the limits of what William had computed.

The LMFDB is a sort of massively souped up version of Cremona and Stein, put together by a team of dozens and dozens of number theorists, including too many friends of mine to name individually.  And it’s a lot more than what the title suggests:  the incredibly useful Jones-Roberts database of local fields is built in; there’s a database of genus 2 curves over Q with small conductor; it even has Maass forms!  I’ve been playing with it all night.  It’s like an adventure playground for number theorists.

One of my first trips through Stein’s database came when I was a postdoc and was thinking about Ljunggren’s equation y^2 + 1 = 2x^4.  This equation has a large solution, (13,239), which has to do with the classical identity

$\pi/4 = 4\arctan(1/5) - \arctan(1/239)$.

It turns out, as I explain in an old survey paper, that the existence of such a large solution is “explained” by the presence of a certain weight-2 cuspform in level 1024 whose mod-5 Galois representation is reducible.

With the LMFDB, you can easily wander around looking for more such examples!  For instance, you can very easily ask the database for non-CM elliptic curves whose mod-7 Galois representation is nonsurjective.  Among those, you can find this handsome curve of conductor 1296, which has very large height relative to its conductor.  Applying the usual Frey curve trick you can turn this curve into the Diophantine oddity

3*48383^2 – (1915)^3 = 2^13.

Huh — I wonder whether people ever thought about this Diophantine problem, when can the difference between a cube and three times a square be a power of 2?  Of course they did!  I just Googled

48383 Diophantine

and found this paper of Stanley Rabinowitz from 1978, which finds all solutions to that problem, of which this one is the largest.

Now whether you can massage this into an arctan identity, that I don’t know!

## Smectic crystals, fingerprints, Klein bottles

Amazing colloquium this week by Randall Kamien, who talked about this paper with Chen and Alexander, this one with Liarte, Bierbaum, Mosna, and Sethna, and other stuff besides.

I’ve been thinking about his talk all weekend and I’m just going to write down a bit about what I learned.  In a liquid crystal, the molecules are like little rods; they have an orientation and nearby molecules want to have nearby orientations.  In a nematic crystal, that’s all that’s going on — the state of the crystal in some region B is given by a line field on B.   A smectic crystal has a little more to it — here, the rods are aligned into layers

(image via this handy guide to liquid crystal phases)

separated by — OK, I’m not totally clear on whether they’re separated by a sheet of something else or whether that’s just empty space.  Doesn’t matter.  The point is, this allows you to tell a really interesting topological story.  Let’s focus on a smectic crystal in a simply connected planar region B.   At every point of B, you have, locally, a structure that looks like a family of parallel lines in the plane, each pair of lines separated by a unit distance.  (The unit is the length of the molecule, I think.)

Alternatively, you can think of such a “local smectic structure” as a line in the plane, where we consider two lines equivalent if they are parallel and the distance between them is an integer.  What’s the moduli space M — the “ground state manifold” — of such structures?    Well, the line family has a direction, so you get a map from M to S^1.  The lines in a given direction are parametrized by a line, and the equivalence relation mods out by the action of a lattice, so the fiber of M -> S^1 is a circle; in fact, it’s not hard to see that this surface M is a Klein bottle.

Of course this map might be pretty simple.  If B is the whole plane, you can just choose a family of parallel lines on B, which corresponds to the constant map.  Or you can cover the plane with concentric circles; the common center doesn’t have a smectic structure, and is a defect, but you can map B = R^2 – 0 to M.  Homotopically, this just gives you a path in M, i.e. an element of pi_1(M), which is a semidirect product of Z by Z, with presentation

$\langle S,F: FSF^{-1} = S^{-1} \rangle$

The concentric circle smectic corresponds the map which sends the generator of pi_1(B) to F.

So already this gives you a nice topological invariant of a plane smectic with k defects; you get a map from pi_1(B), which is a free group on k generators, to pi_1(M).  Note also that there’s a natural notion of equivalence on these maps; you can “stir” the smectic, which is to say, you can apply a diffeomorphism of the punctured surface, which acts by precomposition on pi_1(B).  The action of (the connected components of) Diff(B) on Hom(pi_1(B), pi_1(M)) is my favorite thing; the Hurwitz action of a mapping class group on the space of covers of a Riemann surface!  In particular I think the goal expressed in Chen et al’s paper of “extending our work to the topology of such patterns on surfaces of nontrivial topology (rather than just the plane)” will certainly involve this story.  I think in this case the Hurwitz orbits are pretty big; i.e. if what you know is the local appearance of the defects (i.e. the image in pi_1(M) of the conjugacy class in pi_1(B) corresponding to the puncture) you should almost be able to reconstruct the homotopy type of the map (up to stirring.)  If I understood Randy correctly, those conjugacy classes are precisely what you can actually measure in an experiment.

There’s more, though — a lot more!  You can’t just choose a map from B to M and make a smectic out of it.  The layers won’t line up!  There’s a differential criterion.  This isn’t quite the way they express it, but I think it amounts to the following:  the tangent bundle of M has a natural line bundle L sitting inside it, consisting of those directions of motion that move a line parallel to itself.  I think you want to consider only those maps from B to M such that the induced map on tangent bundles TB -> TM takes image in L.  More concretely, in coordinates, I think this means the following:  if you think of the local smectic structure at p as the preimage of Z under some real-valued function f in the neighborhood of p, then f should satisfy

$(df/dx)^2 + (df/dy)^2 = 1.$

This restricts your maps a lot, and it accounts for all kinds of remarkable behavior.  For one thing, it forbids certain conjugacy classes in pi_1(M) from appearing as local monodromy; i.e. the set of possible defect types is strictly smaller than the set of conjugacy classes in pi_1(M).  Moreover, it forbids certain kinds of defects from colliding and coalescing — for algebraic geometers, this naturally makes you feel like there’s a question about boundaries of Hurwitz spaces floating around.

Best of all, the differential equation forces the appearance of families of parallel ellipses, involute spirals, and other plane curves of an 18th century flavor.  The cyclides of Dupin put in an appearance.  Not just in the abstract — in actual liquid crystals!  There are pictures!  This is great stuff.

Update:  Wait a minute — I forgot to say anything about fingerprints!  Maybe because I don’t have anything to say at the moment.  Except that the lines of a fingerprint are formally a lot like the lines of a smectic crystal, the defects can be analyzed in roughly the same way, etc.  Whether the diffeomorphism type of a fingerprint is an interesting forensic invariant I don’t rightly know.  I’ll bet whoever made my iPhone home button knows, though.