## When random people give money to random other people

A post on Decision Science about a problem of Uri Wilensky’s has been making the rounds:

Imagine a room full of 100 people with 100 dollars each. With every tick of the clock, every person with money gives a dollar to one randomly chosen other person. After some time progresses, how will the money be distributed?

People often expect the distribution to be close to uniform.  But this isn’t right; the simulations in the post show clearly that inequality of wealth rapidly appears and then persists (though each individual person bobs up and down from rich to poor.)  What’s going on?  Why would this utterly fair and random process generate winners and losers?
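The process is easy to simulate. Here is a minimal sketch of my own (not the code from the post; the function and parameter names are mine), using sequential rather than simultaneous updates, which shouldn't matter for the long-run behavior:

```python
import random

def simulate(n_people=100, start=100, ticks=5000, seed=0):
    """Each tick, everyone holding at least a dollar gives $1 to a random other person."""
    rng = random.Random(seed)
    money = [start] * n_people
    for _ in range(ticks):
        for i in range(n_people):
            if money[i] == 0:
                continue
            j = rng.randrange(n_people - 1)
            if j >= i:
                j += 1  # uniform choice over everyone except i
            money[i] -= 1
            money[j] += 1
    return sorted(money)

wealth = simulate()
# Total money is conserved, but the sorted wealth vector is far from flat:
# the richest players hold several times the stake of the poorest.
```

Even after a few thousand ticks, the spread between the poorest and richest players is large and stays large.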

Here’s one way to think about it.  The possible states of the system are the tuples of nonnegative integers (m_1, …, m_100) summing to 10,000; if you like, the lattice points inside a simplex.  (From now on, let’s write N for 100 because who cares if it’s 100?)

The process is a random walk on a graph G, whose vertices are these states and where two vertices are connected if you can get from one to the other by taking a dollar from one person and giving it to another.  We are asking:  when you run the random walk for a long time, where are you on this graph?  Well, we know what the stationary distribution for random walk on an undirected graph is; it gives each vertex a probability proportional to its degree.  On a regular graph, you get uniform distribution.

Our state graph G isn’t regular, but it almost is.  A vertex has the maximal degree N(N-1) exactly when nobody’s out of money, and those vertices make up about 1/e of the whole.  To see this, note that the number of states is

$\binom{N^2 + N - 1}{N-1}$

while the states in which everybody keeps at least a dollar correspond to distributions of the remaining N^2 - N dollars, of which there are

$\binom{N^2 - 1}{N-1}$

and so the proportion of states where nobody’s out of money is about

$\frac{\binom{N^2 - 1}{N-1}}{\binom{N^2 + N - 1}{N-1}} \approx \left(1 - \frac{1}{N+1}\right)^{N-1} \sim 1/e$.

So, apart from those states where somebody’s broke, in the long run every possible state is equally likely;  we are just as likely to see one person holding $9,901 and everybody else holding $1 each as we are to see exact equidistribution again.
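The 1/e estimate is easy to check numerically with exact binomial coefficients (a quick sketch of my own):

```python
from math import comb, exp

N = 100
states = comb(N * N + N - 1, N - 1)    # all distributions of N^2 dollars among N people
nobody_broke = comb(N * N - 1, N - 1)  # distributions where everyone keeps at least $1
ratio = nobody_broke / states
# ratio comes out within a couple percent of 1/e ≈ 0.3679
```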

What is a random lattice point in this simplex like?  Good question!  An argument just like the one above shows that the probability nobody goes below c dollars is on the order of $e^{-c}$, at least when c is small relative to N; in other words, it’s highly likely that somebody’s very nearly out of money.

If X is the maximal amount of money held by any player, what’s the distribution of X?  I didn’t immediately see how to figure this out.  You might consider the continuous version, where you pick a point at random from the real simplex $(x_1, \ldots, x_N) \in \mathbf{R}^N: \sum x_i = N^2$.  Equivalently:  break a stick at N-1 randomly chosen points; what is the length of the longest piece?  This is a well-studied problem; the mean size of the longest piece is about N log N.  So I guess I think maybe that’s the expected value of the net worth of the richest player?  But it’s not obvious to me whether you can safely approximate the finite problem by its continuous limit (which corresponds to the case where we keep the number of players at N but reduce the step size so that each player can give each other a cent, or a picocent, or whatever.)

What happens if you give each of the N players just one dollar?  Now the uniformity really breaks down, because it’s incredibly unlikely that nobody’s broke.  The probability distribution on the set of (m_1, …, m_N) summing to N assigns each vector a probability proportional to the size of its support (i.e. the number of m_i that are nonzero.)  That must be a well-known distribution, right?  What does the corresponding distribution on partitions of N look like?

Update:  Kenny Easwaran points out that this is basically the same computation physicists do when they compute the Boltzmann distribution, which was new to me.

## The Coin Game, II

Good answers to the last question!  I think I perhaps put my thumb on the scale too much by naming a variable p.

Let me try another version in the form of a dialogue.
ME:  Hey, in that other room somebody flipped a fair coin.  What would you say is the probability that it fell heads?

YOU:  I would say it is 1/2.

ME:  Now I’m going to give you some more information about the coin.  A confederate of mine made a prediction about whether the coin would fall heads or tails, and he was correct.  Now what would you say is the probability that it fell heads?

YOU:  Now I have no idea, because I have no information about the propensity of your confederate to predict heads.

(Update:  What if what you knew about the coin in advance was that it fell heads 99.99% of the time?  Would you still be at ease saying you end up with no knowledge at all about the probability that the coin fell heads?)

This is in fact what Joyce thinks you should say.  White disagrees.  But I think they both agree that it feels weird to say this, whether or not it’s correct.

Why would it not feel weird?  I think Qiaochu’s comment in the previous thread gives a clue.  He writes:

Re: the update, no, I don’t think that’s strange. You gave me some weird information and I conditioned on it. Conditioning on things changes my subjective probabilities, and conditioning on weird things changes my subjective probabilities in weird ways.

In other words, he takes it for granted that what you are supposed to do is condition on new information.  Which is obviously what you should do in any context where you’re dealing with mathematical probability satisfying the usual axioms.  Are we in such a context here?  I certainly don’t mean “you have no information about Coin 2” to mean “Coin 2 falls heads with probability p where p is drawn from the uniform distribution (or Jeffreys, or any other specified distribution, thanks Ben W.) on [0,1]” — if I meant that, there could be no controversy!  I think as mathematicians we are very used to thinking that probability as we know it is what we mean when we talk about uncertainty.
Or, to the extent we think we’re talking about something other than probability, we are wrong to think so.  Lots of philosophers take this view.  I’m not sure it’s wrong.  But I’m also not sure it’s right.  And whether it’s wrong or right, I think it’s kind of weird.

## The coin game

Here is a puzzling example due to Roger White.

There are two coins.  Coin 1 you know is fair.  Coin 2 you know nothing about; it falls heads with some probability p, but you have no information about what p is.  Both coins are flipped by an experimenter in another room, who tells you that the two coins agreed (i.e. both were heads or both tails.)  What do you now know about Pr(Coin 1 landed heads) and Pr(Coin 2 landed heads)?

(Note:  as is usual in analytic philosophy, whether or not this is puzzling is itself somewhat controversial, but I think it’s puzzling!)

Update:  Lots of people seem not to find this at all puzzling, so let me add this.  If your answer is “I know nothing about the probability that coin 1 landed heads; it’s some unknown quantity p that agrees with the unknown parameter governing coin 2,” you should ask yourself:  is it strange that someone flipped a fair coin in another room and you don’t know what the probability is that it landed heads?

Relevant readings:  section 3.1 of the Stanford Encyclopedia of Philosophy article on imprecise probabilities and Joyce’s paper on imprecise credences, pp. 13-14.

## My other daughter is a girl

I like Cathy’s take on this famous probability puzzle.  Why does this problem give one’s intuition such a vicious noogie?  It is relevant that the two questions below have two different answers.

- I have two children.  One of my children is a girl who was born on Friday.  What’s the probability I have two girls?
- I have two children.  One of my children is a girl.  Before you came in, I selected a daughter at random from the set of all my daughters, and this daughter was born on Friday.  What’s the probability I have two girls?
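That the two answers really differ can be checked by direct enumeration over the 196 equally likely (sex, weekday) pairs for two children; this sketch is my own, not Cathy's calculation:

```python
from fractions import Fraction
from itertools import product

kids = list(product("GB", range(7)))     # (sex, weekday); call day 4 "Friday"
families = list(product(kids, kids))     # 196 equally likely two-child families

def both_girls(fam):
    return all(sex == "G" for sex, _ in fam)

# Question 1: condition on "at least one child is a girl born on Friday".
sel = [f for f in families if any(k == ("G", 4) for k in f)]
p1 = Fraction(sum(both_girls(f) for f in sel), len(sel))

# Question 2: a uniformly chosen daughter happened to be born on Friday.
# Weight each family with daughters by P(the chosen daughter is Friday-born).
def weight(fam):
    girls = [day for sex, day in fam if sex == "G"]
    return Fraction(sum(d == 4 for d in girls), len(girls)) if girls else Fraction(0)

p2 = sum(weight(f) for f in families if both_girls(f)) / sum(weight(f) for f in families)
# p1 == 13/27, p2 == 1/3
```

The first conditioning gives 13/27; the random-daughter version gives 1/3, exactly as if the Friday information had never been mentioned.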
## In like a linkdump

## The hardest Rush Hour position

It takes 93 moves to solve, per this paper by Collette, Raskin, and Servais.  I tried it and got nowhere.

You can think of the space of all possible configurations of vehicles as, well, a configuration space, not unlike the configuration spaces of disks in a box.  But here there is a bit less topology; the space is just a graph, with two configurations made adjacent if one can be reached from the other by making a single move.  The connected component of configuration space containing the “hardest case” shown here has 24,132 vertices.

I wonder what this graph looks like?  What does the path of the cars look like as you traverse the 93-step path; do most of the cars traverse most of their range?  How many of the possible configurations of the 13 vehicles (constrained to stay in the given rows and columns, and in the same linear order when two share a row or column) are actually contained in this component?  Maybe Matt Kahle knows.

By the way, another Matt Kahle-like fact is that among the list of the hardest configurations are some which are not so dense at all, like this one with only 9 cars.  It looks like it should be easy, but apparently it takes 83 moves to solve!

## Aggregating degrees of belief: a puzzle

There are two events X and Y whose probability you’d like to estimate.  So you ask a hundred trusted, reasonable people what they think.  Half of them say that the probability of X and the probability of Y are both 90%, and the probability of both X and Y occurring is 81%.  The other half say that P(X) = 10%, P(Y) = 10%, and P(X and Y) = 1%.

What is your best estimate of P(X), P(Y), and P(X and Y)?

If you said “50%, 50%, 41%,” does it bother you that you deem these events not to be independent, even though every single person you polled said the opposite?  If not, what did you say?

(The subtext of this post is:  is the “Independence of Irrelevant Alternatives” axiom in Arrow’s theorem a good idea?
Feel free to discuss that too.)

## The Google puzzle and the perils of averaging ratios

The following brain-teaser has been going around, identified as a question from a Google interview (though there’s some controversy about whether Google actually uses questions like this):

There’s a certain country where everybody wants to have a son. Therefore each couple keeps having children until they have a boy; then they stop. What fraction of the population is female?

Steve Landsburg posted a version of this question on his blog.  “The answer they expect,” he writes, “is simple, definitive, and wrong… Are you smarter than the folks at Google? What’s the answer?”

Things quickly went blooey.  Google’s purported answer — fiercely argued for by lots of Landsburg’s readers — is 1/2.  Landsburg said the right answer was less.  A huge comment thread and many follow-up posts ensued.  Lubos Motl took time out from his busy schedule of yelling at mathematicians about string theory to yell at Landsburg about probability theory.  Landsburg offered to bet Motl, or anybody else, $15,000 that a computer simulation would demonstrate the correctness of his answer.

What’s going on here?  How could a simple probability question have stirred up such a ruckus?

Here’s Landsburg’s explanation of the question:

What fraction of the population should we expect to be female? That is, in a large number of similar countries, what would be the average proportion of females?

If G is the number of girls, and B the number of boys, Landsburg is asking for the expected value E(G/(G+B)).  And let’s get one thing straight:  Landsburg is absolutely right about this expected value.  For any finite number of families, it is strictly less than 1/2.  (See the related Math Overflow thread for a good explanation.)  Landsburg has very patiently knocked down the many wrong arguments to the contrary in his comments section.  Anybody who bets against him, on his terms, is going to lose.
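To make the gap concrete, here is a Monte Carlo sketch of my own (the function names and parameters are mine) estimating both the average of the ratios, E(G/(G+B)), and the ratio of the totals, which approximates E(G)/(E(G)+E(B)), for a small country of four families:

```python
import random

def one_family(rng):
    """One couple has children until the first boy: returns (#girls, #boys)."""
    girls = 0
    while rng.random() < 0.5:  # each birth is a girl with probability 1/2
        girls += 1
    return girls, 1

def run(k_families=4, trials=100_000, seed=1):
    rng = random.Random(seed)
    ratio_sum = 0.0
    total_g = total_b = 0
    for _ in range(trials):
        g = b = 0
        for _ in range(k_families):
            dg, db = one_family(rng)
            g, b = g + dg, b + db
        ratio_sum += g / (g + b)   # Landsburg's quantity: average of the ratios
        total_g += g
        total_b += b
    avg_of_ratios = ratio_sum / trials
    ratio_of_totals = total_g / (total_g + total_b)  # ≈ E(G)/(E(G)+E(B))
    return avg_of_ratios, ratio_of_totals

avg_of_ratios, ratio_of_totals = run()
```

The first number settles noticeably below 1/2, while the second hugs 1/2; with more families the first creeps toward 1/2 but stays strictly under it.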

Nonetheless, I’m about to explain why Landsburg is wrong.

You see, Google’s version of the question doesn’t specify anything about expectation.  They might just as well have meant:  “What is the proportion of the expected number of females in the expected population?”  Which is to say, “What is E(G)/(E(G) + E(B))?”  And the answer to that question is 1/2.  Just to emphasize the subtlety involved here:

On average, the number of boys and the number of girls are the same.  Furthermore, the proportion of girls is, on average, less than 1/2.

Weird, right?  E(G)/(E(G) + E(B)) isn’t what Landsburg was asking for — but, if Google’s answer was 1/2, it’s presumably the question they had in mind.  To accuse them of getting their own question “wrong” is a bit rich.

But let me go all in — I actually think Landsburg’s interpretation of the question is not only different from Google’s, but in some ways inferior!  Because averaging ratios with widely ranging denominators is kind of a weird thing to do.  You can certainly compute the average population density of all the U.S. states — but should you? What meaning or use would the result have?

I had a really pungent example ready to deploy, which illustrates the perils of averaging ratios and explains why Landsburg’s version of the question was a little weird.  Then I went to the Joint Meetings before getting around to writing this post.  And when I got back, I discovered that Landsburg had posted the same example on his own blog in support of his point of view!  Awesome.  Here it is:

There’s a certain country where everybody wants to have a son. Therefore each couple keeps having children until they have a boy; then they stop. In expectation, what is the ratio of boys to girls?

The answer to this question is, of course, infinity; in a finite population there might be no girls, so B/G is infinite with some positive probability, so E(B/G) is infinite as well.

But the correctness of that answer surely tells us this is a terrible question!  Averaging is a terribly cruel thing to do to a bunch of ratios.  One zero denominator and you’ve wiped out your entire dataset.

What if Landsburg had phrased his new question along the lines of Google’s original puzzle?

There’s a certain country where everybody wants to have a son. Therefore each couple keeps having children until they have a boy; then they stop. What is the ratio of boys to girls in this country?

Honest question:  does Landsburg truly think that infinity is the only “right answer” to this question?  Does he think infinity is a good answer?  Would he hire a person who gave that answer?  Would you?

## Rush Hour, Jr.

OK, so a black toddler and a Chinese toddler stumble on an international drug-trafficking ring — no, actually, this is a game I just bought for CJ, a kid’s version of Nob Yoshigahara’s classic game Rush Hour.  The object here is to get the small white truck to the edge of the board (the top edge, in the image here.)  The trucks in your way can’t move sideways or turn; they just go forward and back.

You play a captivating game like this and naturally you start abstracting out the underlying math problem.  Play Set enough and you can’t avoid thinking about affine caps.  Rush Hour has more to do with the geometry of configuration spaces; it reminds me of the “disk in a box” problems that people like Persi Diaconis and Matt Kahle work on.

So here’s a question — it doesn’t capture all the features of Rush Hour, but let’s start here.  Let X be the unit square, let c be a parameter between 0 and 1, and let N be a large integer.  Let L be the set of line segments in X which are either horizontal, of the form y = i/N, or vertical, of the form x = i/N.  A traffic jam is a choice of a length-c interval in each of the 2N + 2 line segments in L, where we require that these intervals be pairwise disjoint.  The traffic jams naturally form a topological space, which we call T(N,c).  We say an interval from (x, i/N) to (x+c, i/N) in a traffic jam t is trapped if no traffic jam in the connected component of t contains the interval from (0, i/N) to (c, i/N).

Questions: For which values of (N,c) is T(N,c) connected?  In particular, is it connected almost always once it’s nonempty?  If not, when does T(N,c) have a “giant component”?  If there’s an interesting range of parameters where T(N,c) is not connected, what proportion of intervals do we expect to be trapped?

## Motivic puzzle: the moduli space of squarefree polynomials

As I’ve mentioned before, the number of squarefree monic polynomials of degree n in F_q[t] (for n at least 2) is exactly q^n - q^{n-1}.
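The count is easy to sanity-check by brute force for small prime q, using the fact that a polynomial of positive degree over F_p is squarefree exactly when its formal derivative is nonzero and its gcd with that derivative is a nonzero constant.  A throwaway verification of my own:

```python
from itertools import product

def poly_mod(a, b, p):
    """Remainder of a modulo b in F_p[t]; coefficient lists, lowest degree first."""
    a = [c % p for c in a]
    inv_lead = pow(b[-1], p - 2, p)   # p prime, so Fermat inversion works
    while True:
        while a and a[-1] == 0:
            a.pop()
        if len(a) < len(b):
            return a
        coef = a[-1] * inv_lead % p
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[i + shift] = (a[i + shift] - coef * c) % p

def gcd_is_constant(a, b, p):
    """True iff gcd(a, b) is a nonzero constant in F_p[t]."""
    while b:
        a, b = b, poly_mod(a, b, p)
    return len(a) == 1

def count_squarefree_monic(p, n):
    count = 0
    for low in product(range(p), repeat=n):
        f = list(low) + [1]                              # monic of degree n
        df = [(i * c) % p for i, c in enumerate(f)][1:]  # formal derivative
        while df and df[-1] == 0:
            df.pop()
        # df == 0 means f is a p-th power, hence not squarefree
        if df and gcd_is_constant(f, df, p):
            count += 1
    return count
```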

I explained in the earlier post how to interpret this fact in terms of the cohomology of the braid group.  But one can also ask whether this identity has a motivic interpretation.  Namely:  let U be the variety over Q parametrizing monic squarefree polynomials of degree n.  So U is a nice open subvariety of affine n-space.  Now the identity of point-counts above suggests the question:

Question: Is there an identity [U] = [A^n] – [A^{n-1}] in the ring of motives K_0(Var/Q)?

I asked Loeser, who seemed to feel the answer was likely yes, and pointed out to me that one could also ask whether the two classes were identical in the localization K_0(Var/Q)[1/L], where L is the class of A^1.  Are these questions different?  That is, is there any nontrivial kernel in the natural map K_0(Var/Q) -> K_0(Var/Q)[1/L]?  This too is apparently unknown.

Here, I’ll start you off by giving a positive answer in the easy case n=2!  Then the monic polynomials are parametrized by A^2, where (b,c) corresponds to the polynomial x^2 + bx + c.  The non-squarefree locus (i.e. the locus of vanishing of the discriminant) consists of solutions to b^2 - 4c = 0; projection to the b-coordinate is an isomorphism from this locus to A^1 over Q.  So [U] = [A^2] - [A^1], and in this case the identity is indeed correct.

Update:  I totally forgot that Mike Zieve sent me a one-line argument a few months back for the identity |U(F_q)| = q^n - q^{n-1} which is in fact a proof of the motivic identity as well!  Here it is, in my paraphrase.

Write U_e for the subvariety of the space of monic degree-d polynomials consisting of those of the form a(x)b(x)^2, with a, b monic, a squarefree, and b of degree e.  Then the space of all monic degree-d polynomials is the union of the U_e as e ranges from 0 to d/2, with U_0 being the squarefree locus U itself.  Note that the factorisation as ab^2 is unique; i.e., U_e is naturally identified with {monic squarefree polynomials of degree d-2e} x {monic polynomials of degree e}.

Now let V be the space of all polynomials (not necessarily monic) of degree d-2, so that [V] = [A^{d-1}] - [A^{d-2}].  Let V_e be the space of polynomials in V which factor as c(x)d(x)^2, with c squarefree and d(x) of degree e-1.  Then V is the union of the V_e as e ranges from 1 to d/2.

Now there is a map from U_e to V_e which sends a(x)b(x)^2 to a(x)(b(x) – b(0))^2, and one checks that this induces an isomorphism between V_e x A^1 and U_e, done.

But actually, now that I think of it, Mike’s observation allows you to get the motivic identity even without writing down the map above:  if we write $U^d_e$ for the space of monic degree-d polynomials in stratum e, then $U^d_e \cong U^{d-2e}_0 \times \mathbf{A}^e$; summing over strata gives $\mathbf{L}^d = \sum_{e \geq 0} [U^{d-2e}_0] \mathbf{L}^e$, from which one can easily compute the class $[U^d_0]$ by induction.
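That induction can be run mechanically by representing classes as polynomials in L; this quick check of my own confirms that the squarefree class comes out to L^d - L^{d-1} for every d from 2 to 9 (degrees 0 and 1 are the base cases, where every monic polynomial is squarefree):

```python
# Represent a class as a coefficient list in L, lowest power first.
def times_L(a, e):               # multiply by L^e
    return [0] * e + a

def minus(a, b):
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x - y for x, y in zip(a, b)]

# S[d] = class of monic squarefree polynomials of degree d.
# Stratifying all monic degree-d polynomials as f = a * b^2 (a squarefree,
# b monic of degree e) gives  L^d = sum_{e=0}^{d//2} S[d-2e] * L^e,
# which determines S[d] recursively.
S = {0: [1], 1: [0, 1]}          # the monic constant 1; all q linear polys
for d in range(2, 10):
    total = times_L([1], d)      # L^d, the class of all monic degree-d polys
    for e in range(1, d // 2 + 1):
        total = minus(total, times_L(S[d - 2 * e], e))
    S[d] = total
# For every d >= 2, S[d] comes out to L^d - L^{d-1}.
```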