## When random people give money to random other people

A post on Decision Science about a problem of Uri Wilensky‘s has been making the rounds:

Imagine a room full of 100 people with 100 dollars each. With every tick of the clock, every person with money gives a dollar to one randomly chosen other person. After some time progresses, how will the money be distributed?

People often expect the distribution to be close to uniform.  But this isn’t right; the simulations in the post show clearly that inequality of wealth rapidly appears and then persists (though each individual person bobs up and down from rich to poor.)  What’s going on?  Why would this utterly fair and random process generate winners and losers?

Here’s one way to think about it.  The possible states of the system are the sets of nonnegative integers (m_1, .. m_100) summing to 10,000; if you like, the lattice points inside a simplex.  (From now on, let’s write N for 100 because who cares if it’s 100?)

The process is a random walk on a graph G, whose vertices are these states and where two vertices are connected if you can get from one to the other by taking a dollar from one person and giving it to another.  We are asking:  when you run the random walk for a long time, where are you on this graph?  Well, we know what the stationary distribution for random walk on an undirected graph is; it gives each vertex a probability proportional to its degree.  On a regular graph, you get uniform distribution.

Our state graph G isn’t regular, but it almost is; most nodes have degree N, where by “most” I mean “about 1-1/e”; since the number of states is

$N^2 + N - 1 \choose N-1$

and, of these, the ones with degree N are exactly those in which nobody’s out of money; if each person has a dollar, the number of ways to distribute the remaining N^2 – N dollars is

$N^2 - 1 \choose N-1$

and so the proportion of states where someone’s out of money is about

$\frac{(N^2 - 1)^N}{(N^2 + N - 1)^N} \sim (1-1/N)^N \sim 1/e$.

So, apart from those states where somebody’s broke, in the long run every possible state is equally likely;  we are just as likely to see $9,901 in one person’s hands and everybody else with$1 as we are to see exact equidistribution again.

What is a random lattice point in this simplex like?  Good question!  An argument just like the one above shows that the probability nobody goes below \$c is on order e^-c, at least when c is small relative to N; in other words, it’s highly likely that somebody’s very nearly out of money.

If X is the maximal amount of money held by any player, what’s the distribution of X?  I didn’t immediately see how to figure this out.  You might consider the continuous version, where you pick a point at random from the real simplex

$(x_1, .. x_N) \in \mathbf{R}^N: \sum x_i = N^2$.

Equivalently; break a stick at N-1 randomly chosen points; what is the length of the longest piece?  This is a well-studied problem; the mean size of the longest piece is about N log N.  So I guess I think maybe that’s the expected value of the net worth of the richest player?

But it’s not obvious to me whether you can safely approximate the finite problem by its continuous limit (which corresponds to the case where we keep the number of players at N but reduce the step size so that each player can give each other a cent, or a picocent, or whatever.)

What happens if you give each of the N players just one dollar?  Now the uniformity really breaks down, because it’s incredibly unlikely that nobody’s broke.  The probability distribution on the set of (m_1, .. m_N) summing to N assigns each vector a probability proportional to the size of its support (i.e. the number of m_i that are nonzero.)  That must be a well-known distribution, right?  What does the corresponding distribution on partitions of N look like?

Update:  Kenny Easwaran points out that this is basically the same computation physicists do when they compute the Boltzmann distribution, which was new to me.

Tagged , ,

## Rational points on solvable curves over Q via non-abelian Chabauty (with Daniel Hast)

New paper up!  With my Ph.D. student Daniel Hast (last seen on the blog here.)

We prove that hyperelliptic curves over Q of genus at least 2 have only finitely many rational points.  Actually, we prove this for a more general class of high-genus curves over Q, including all solvable covers of P^1.

But wait, don’t we already know that, by Faltings?  Of course we do.  So the point of the paper is to show that you can get this finiteness in a different way, via the non-abelian Chabauty method pioneered by Kim.  And I think it seems possible in principle to get Faltings for all curves over Q this way; though I don’t know how to do it.

## Multiple height zeta functions?

Idle speculation ensues.

Let X be a projective variety over a global field K, which is Fano — that is, its anticanonical bundle is ample.  Then we expect, and in lots of cases know, that X has lots of rational points over K.  We can put these points together into a height zeta function

$\zeta_X(s) = \sum_{x \in X(K)} H(x)^{-s}$

where H(x) is the height of x with respect to the given projective embedding.  The height zeta function organizes information about the distribution of the rational points of X, and which in favorable circumstances (e.g. if X is a homogeneous space) has the handsome analytic properties we have come to expect from something called a zeta function.  (Nice survey by Chambert-Loir.)

What if X is a variety with two (or more) natural ample line bundles, e.g. a variety that sits inside P^m x P^n?  Then there are two natural height functions H_1 and H_2 on X(K), and we can form a “multiple height zeta function”

$\zeta_X(s,t) = \sum_{x \in X(K)} H_1(x)^{-s} H_2(x)^{-t}$

There is a whole story of “multiple Dirichlet series” which studies functions like

$\sum_{m,n} (\frac{m}{n}) m^{-s} n^{-t}$

where $(\frac{m}{n})$ denotes the Legendre symbol.  These often have interesting analytic properties that you wouldn’t see if you fixed one variable and let the other move; for instance, they sometimes have finite groups of functional equations that commingle the s and the t!

So I just wonder:  are there situations where the multiple height zeta function is an “analytically interesting” multiple Dirichlet series?

Here’s a case to consider:  what if X is the subvariety of P^2 x P^2 cut out by the equation

$x_0 y_0 + x_1 y_1 + x_2 y_2 = 0?$

This has something to do with Eisenstein series on GL_3 but I am a bit confused about what exactly to say.

## What is the median length of homeownership?

Well, it’s longer than it used to be, per Conor Dougherty in the New York Times:

The median length of time people have owned their homes rose to 8.7 years in 2016, more than double what it had been 10 years earlier.

The accompanying chart shows that “median length of homeownership” used to hover at  just under 4 years.  That startled me!  Doesn’t 4 years seem like a pretty short length of time to own a house?

When I thought about this a little more, I realized I had no idea what this meant.  What is the “median length of homeownership” in 2017?  Does it mean you go around asking each owner-occupant how long they’ve lived in their house, and take the median of those numbers?  Probably not:  when people were asked that in 2008, the median answer was 10 years, and whatever the Times was measuring was about 3.7 years in 2008.

Does it mean you look at all house sales in 2017, subtract the time since last sale, and take the median of those numbers?

Suppose half of all houses changed hands every year, and the other half changed hands every thirty years.  Are the lengths of ownership we’re medianning half “one year” and half “30 years”, or “30/31 1 year” and 1/31 “30 years”?

There are about 75 million owner-occupied housing units in the US and 4-6 million homes sold per year, so the mean number of sales per unit per year is certainly way less than 1/4; of course, there’s no reason this mean should be close to the median of, well, whatever we’re taking the median of.

Basically I have no idea what’s being measured.  The Times doesn’t link to the Moody’s Analytics study it’s citing, and Dougherty says that study’s not public.  I did some Googling for “median length of homeownership” and as far as I can tell this isn’t a standard term of art with a consensus definition.

As papers run more data-heavy pieces I’d love to see a norm develop that there should be some way for the interested reader to figure out exactly what the numbers in the piece refer to.  Doesn’t even have to be in the main text.  Could be a linked sidebar.  I know not everybody cares about this stuff.  But I do!

## Fox-Neuwirth-Fuks cells, quantum shuffle algebras, and Malle’s conjecture for function fields

I’ve gotten behind on blogging about preprints!  Let me tell you about a new one I’m really happy with, joint with TriThang Tran and Craig Westerland, which we posted a few months ago.

Malle’s conjecture concerns the number of number fields with fixed Galois group and bounded discriminant, a question I’ve been interested in for many years now.  We recall how it goes.

Let K be a global field — that is, a number field or the function field of a curve over a finite field.  Any degree-n extension L/K (here L could be a field or just an etale algebra over K — hold that thought) gives you a homomorphism from Gal(K) to S_n, whose image we call, in a slight abuse of notation, the Galois group of L/K.

Let G be a transitive subgroup of S_n, and let N(G,K,X) be the number of degree-n extensions of K whose Galois group is G and whose discriminant has norm at most X.  Every permutation g in G has an index, which is just n – the number of orbits of g.  So the permutations of index 1 are the transpositions, those of index 2 are the three-cycles and the double-flips, etc.  We denote by a(G) the reciprocal of the minimal index of any element of G.  In particular, a(G) is at most 1, and is equal to 1 if and only if G contains a transposition.

(Wait, doesn’t a transitive subgroup of S_n with a transposition have to be the whole group?  No, that’s only for primitive permutation groups.  D_4 is a thing!)

Malle’s conjecture says that, for every $\epsilon > 0$, there are constants $c,c_\epsilon$ such that

$c X^{a(G)} < N(G,K,X) < c_\epsilon X^{a(G)+\epsilon}$

We don’t know much about this.  It’s easy for G = S_2.  A theorem of Davenport-Heilbronn (K=Q) and Datskovsky-Wright (general case) proves it for G = S_3.  Results of Bhargava handle S_4 and S_5, Wright proved it for abelian G.  I kind of think this new theorem of Alex Smith implies for K=Q and every dihedral G of 2-power order?  Anyway:  we don’t know much.  S_6?  No idea.  The best upper bounds for general n are still the ones I proved with Venkatesh a long time ago, and are very much weaker than what Malle predicts.

Malle’s conjecture fans will point out that this is only the weak form of Malle’s conjecture; the strong form doesn’t settle for an unspecified $X^\epsilon$, but specifies an asymptotic $X^a (log X)^b$.   This conjecture has the slight defect that it’s wrong sometimes; my student Seyfi Turkelli wrote a nice paper which I think resolves this problem, but the revised version of the conjecture is a bit messy to state.

Anyway, here’s the new theorem:

Theorem (E-Tran-Westerland):  Let G be a transitive subgroup of S_n.  Then for all q sufficiently large relative to G, there is a constant $c_\epsilon$ such that

$N(G,\mathbf{F}_q(t),X) < c_\epsilon X^{a(G)+\epsilon}$

for all X>0.

In other words:

The upper bound in the weak Malle conjecture is true for rational function fields.

1.  We are still trying to fix the mistake in our 2012 paper about stable cohomology of Hurwitz spaces.  Craig and I discussed what seemed like a promising strategy for this in the summer of 2015.  It didn’t work.  That result is still unproved.  But the strategy developed into this paper, which proves a different and in some respects stronger theorem!  So … keep trying to fix your mistakes, I guess?  There might be payoffs you don’t expect.
2. We can actually bound that $X^\epsilon$ is actually a power of log, but not the one predicted by Malle.
3. Is there any chance of getting the strong Malle conjecture?  No, and I’ll explain why.  Consider the case G=S_4.  Then a(G) = 1, and in this case the strong Malle’s conjecture predicts N(S_4,K,X) is on order X, not just X^{1+eps}.   But our method doesn’t really distinguish between quartic fields and other kinds of quartic etale algebras.  So it’s going to count all algebras L_1 x L_2, where L_1 and L_2 are quadratic fields with discriminants X_1 and X_2 respectively, with X_1 X_2 < X.  We already know there’s approximately one quadratic field per discriminant, on average, so the number of such algebras is about the number of pairs (X_1, X_2) with X_1 X_2 < X, which is about X log X.  So there’s no way around it:  our method is only going to touch weak Malle.  Note, by the way, that for quartic extensions, the strong Malle conjecture was proved by Bhargava, and he observes the same phenomenon:

…inherent in the zeta function is a sum over all etale extensions” of Q, including the “reducible” extensions that correspond to direct sums of quadratic extensions. These reducible quartic extensions far outnumber the irreducible ones; indeed, the number of reducible quartic extensions of absolute discriminant at most X is asymptotic to X log X, while we show that the number of quartic field extensions of absolute discriminant at most X is only O(X).

4.  I think there is, on the other hand, a chance of getting rid of the “q sufficiently large relative to G” condition and proving something for a fixed F_q(t) and all finite groups G.

OK, so how did we prove this?

## Mathematicians becoming data scientists: Should you? How to?

I was talking the other day with a former student at UW, Sarah Rich, who’s done degrees in both math and CS and then went off to Twitter.  I asked her:  so what would you say to a math Ph.D. student who was wondering whether they would like being a data scientist in the tech industry?  How would you know whether you might find that kind of work enjoyable?  And if you did decide to pursue it, what’s the strategy for making yourself a good job candidate?

Sarah exceeded my expectations by miles and wrote the following extremely informative and thorough tip sheet, which she’s given me permission to share.  Take it away, Sarah!

Tagged , , ,

## Braid monodromy and the dual curve

Nick Salter gave a great seminar here about this paper; hmm, maybe I should blog about that paper, which is really interesting, but I wanted to make a smaller point here.  Let C be a smooth curve in P^2 of degree n. The lines in P^2 are parametrized by the dual P^2; let U be the open subscheme of the dual P^2 parametrizing those lines which are not tangent to C; in other words, U is the complement of the dual curve C*.  For each point u of U, write L_u for the corresponding line in P^2.

This gives you a fibration X -> U where the fiber over a point u in U is L_u – (L_u intersect C).  Since L_u isn’t tangent to C, this fiber is a line with n distinct points removed.  So the fibration gives you an (outer) action of pi_1(U) on the fundamental group of the fiber preserving the puncture classes; in other words, we have a homomorphism

$\pi_1(U) \rightarrow B_n$

where B_n is the n-strand braid group.

When you restrict to a line L* in U (i.e. a pencil of lines through a point in the original P^2) you get a map from a free group to B_n; this is the braid monodromy of the curve C, as defined by Moishezon.  But somehow it feels more canonical to consider the whole representation of pi_1(U).  Here’s one place I see it:  Proposition 2.4 of this survey by Libgober shows that if C is a rational nodal curve, then pi_1(U) maps isomorphically to B_n.  (OK, C isn’t smooth, so I’d have to be slightly more careful about what I mean by U.)

## Difference sets missing a Hamming sphere

I tend to think of the Croot-Lev-Pach method (as used, for instance, in the cap set problem) as having to do with n-tensors, where n is bigger than 2.  But you can actually also use it in the case of 2-tensors, i.e. matrices, to say things that (as far as I can see) are not totally trivial.

Write m_d for the number of squarefree monomials in x_1, .. x_n of degree at most d; that is,

$m_d = 1 + {n \choose 1} + {n \choose 2} + \ldots + {n \choose d}.$

Claim:  Let P be a polynomial of degree d in F_2[x_1, .. x_n] such that P(0) = 1.  Write S for the set of nonzero vectors x such that P(x) = 1.  Let A be a subset of F_2^n such that no two elements of A have difference lying in S.  Then |A| < 2m_{d/2}.

Proof:  Write M for the A x A matrix whose (a,b) entry is P(a-b).  By the Croot-Lev-Pach lemma, this matrix has rank at most 2m_{d/2}.  By hypothesis on A, M is the identity matrix, so its rank is |A|.

Remark: I could have said “sum” instead of “difference” since we’re in F_2 but for larger finite fields you really want difference.

The most standard context in which you look for large subsets of F_2^n with restricted difference sets is that of error correcting codes, where you ask that no two distinct elements of A have difference with Hamming weight (that is, number of 1 entries) at most k.

It would be cool if the Croot-Lev-Pach lemma gave great new bounds on error-correcting codes, but I don’t think it’s to be.  You would need to find a polynomial P which vanishes on all nonzero vectors of weight larger than k, but which doesn’t vanish at 0. Moreover, you already know that the balls of size k/2 around the points of A are disjoint, which gives you the “volume bound”

|A| < 2^n / m_{k/2}.

I think that’ll be hard to beat.

If you just take a random polynomial P, the support of P will take up about half of F_2^n; so it’s not very surprising that a set whose difference misses that support has to be small!

Here’s something fun you can do, though.  Let s_i be the i-th symmetric function on x_1, … x_n.  Then

$s_i(x) = {wt(x) \choose i}$

where wt(x) denotes Hamming weight.  Recall also that the binomial coefficient

${k \choose 2^a}$

is odd precisely when the a’th binary digit of k is 1.

Thus,

$(1-s_1(x))(1-s_2(x))(1-s_4(x))\ldots(1-s_{2^{b-1}}(x))$

is a polynomial of degree 2^b-1 which vanishes on x unless the last b digits of wt(x) are 0; that is, it vanishes unless wt(x) is a multiple of 2^b.  Thus we get:

Fact:  Let A be a subset of F_2^n such that the difference of two nonzero elements in A never has weight a multiple of 2^b.  Then

$|A| \leq 2m_{2^{b-1} - 1}$.

Note that this is pretty close to sharp!  Because if we take A to be the set of vectors of  weight at most 2^{b-1} – 1, then A clearly has the desired property, and already that’s half as big as the upper bound above.  (What’s more, you can throw in all the vectors of weight 2^{b-1} whose first coordinate is 1; no two of these sum to something of weight 2^b.  The Erdös-Ko-Rado theorem says you can do no better with those weight 2^{b-1} vectors.)

Is there an easier way to prove this?

When b=1, this just says that a set with no differences of even Hamming weight has size at most 2; that’s clear, because two vectors whose Hamming weight has the same parity differ by a vector of even weight.  Even for b=2 this isn’t totally obvious to me.  The result says that a subset of F_2^n with no differences of weight divisible by 4 has size at most 2+2n.  On the other hand, you can get 1+2n by taking 0, all weight-1 vectors, and all weight-2 vectors with first coordinate 1.  So what’s the real answer, is it 1+2n or 2+2n?

Write H(n,k) for the size of the largest subset of F_2^n having no two vectors differing by a vector of Hamming weight exactly k.  Then if 2^b is the largest power of 2 less than n, we have shown above that

$m_{2^{b-1} - 1 } \leq H(n,2^b) \leq 2m_{2^{b-1} - 1}$.

On the other hand, if k is odd, then H(n,k) = 2^{n-1}; we can just take A to be the set of all even-weight vectors!  So perhaps H(n,k) actually depends on k in some modestly interesting 2-adic way.

The sharpness argument above can be used to show that H(4m,2m) is as least

$2(1 + 4m + {4m \choose 2} + \ldots + {4m \choose m-1} + {4m-1 \choose m-1}). (*)$

I was talking to Nigel Boston about this — he did some computations which make it looks like H(4m,2m) is exactly equal to (*) for m=1,2,3.  Could that be true for general m?

(You could also ask about sets with no difference of weight a multiple of k; not sure which is the more interesting question…)

Update:  Gil Kalai points out to me that much of this is very close to and indeed in some parts a special case of the Frankl-Wilson theorem…  I will investigate further and report back!

## Prime subset sums

Efrat Bank‘s interesting number theory seminar here before break was about sums of arithmetic functions on short intervals in function fields.  As I was saying when I blogged about Hast and Matei’s paper, a short interval in F_q[t] means:  the set of monic degree-n polynomials P such that

deg(P-P_0) < h

for some monic degree-n P_0 and some small h.  Bank sets this up even more generally, defining an interval in the space V of global sections of a line bundle on an arbitrary curve over F_q.  In Bank’s case, by contrast with the number field case, an interval is an affine linear subspace of some ambient vector space of forms.  This leads one to wonder:  what’s special about these specific affine spaces?  What about general spaces?

And then one wonders:  well, what classical question over Z does this correspond to?  So here it is:  except I’m not sure this is a classical question, though it sort of seems like it must be.

Question:  Let c > 1 be a constant.  Let A be a set of integers with |A| = n and max(A) < c^n.  Let S be the (multi)set of sums of subsets of A, so |S| = 2^n.  What can we say about the number of primes in S?  (Update:  as Terry points out in comments, I need some kind of coprimality assumption; at the very least we should ask that there’s no prime factor common to everything in A.)

I’d like to say that S is kind of a “generalized interval” — if A is the first n powers of 2, it is literally an interval.  One can also ask about other arithmetic functions:  how big can the average of Mobius be over S, for instance?  Note that the condition on max(S) is important:   if you let S get as big as you want, you can make S have no primes or you can make S be half prime (thanks to Ben Green for pointing this out to me.)  The condition on max(S) can be thought of as analogous to requiring that an interval containing N has size at least some fixed power of N, a good idea if you want to average arithmetic functions.

Anyway:  is anything known about this?  I can’t figure out how to search for it.

## Women in math: accountability

I’ve talked about women in math a lot on this blog and maybe you think of me as someone who is aware of and resistant to sexism in our profession.  But what if we look at some actual numbers?

My Ph.D. students:  2 out of 15 are women.

Coauthors, last 5 years: 2 out of 23 are women.

Letters posted on MathJobs, last 2 years:  3 out of 24 are women.

That is sobering.  I’m hesitant about posting this, but I think it’s a good idea for senior people to look at their own numbers and get some sense of how much they’re actually doing to support early-career women in the profession.

Update:  I removed the numbers for tenure/promotion letters.  A correspondent pointed out that these, unlike the other items, are supposed to be confidential, and given the small numbers are at least partially de-anonymizable.

Tagged