## E.O. Wilson does not think math is unnecessary

This piece by E.O. Wilson has been much shared and much griped about in my circles, but I think it’s a case of a provocative headline (“Great Scientist ≠ Good at Math:  discoveries emerge from ideas, not number-crunching”) prepended by the WSJ to an essay that says something much more modest and defensible.  I’d paraphrase Wilson like this.  Being good at math is like being a good writer.  Everyone agrees:

• You can do great science and be a terrible writer;
• Being better at writing is a worthwhile aspiration for any scientist.

The conjunction of these two statements in no way feels like a denigration of writing.  Nor is Wilson denigrating math.

I’ve said this before but it’s important so I’ll keep saying it — when you write an opinion piece for a publication, you don’t write the headline — the editors do, and they’ll put whatever loosely relevant headline will generate the most clicks.


## John Doyle on handwaving and universal laws

John Doyle gave this year’s J. Barkley Rosser Lecture at the Wisconsin Institute for Discovery; his talk was dedicated to the proposition that tradeoffs between flexibility and robustness in control systems with significant delays are in the end going to be bound by universal laws, just as the operation of a classical Turing machine is bound by laws coming from information theory and complexity theory.  (A simple such one:  a machine that has the potential to produce N different outputs is going to have a worst-case run time of at least log N steps.)
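The parenthetical bound is just a counting argument. Sketched under the assumption that the machine takes one of at most k possible actions at each step: after T steps it can have reached at most k^T distinct outcomes, so producing N different outputs forces

```latex
N \le k^{T} \quad \Longrightarrow \quad T \ge \log_k N .
```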

Doyle believes the robustness-flexibility tradeoff should be fundamental to our way of thinking of both biological and technological devices.  He gave the following very illustrative example, which is so simple that you can play along as you read my blog.

Hold your hand in front of your face and wave your hand vigorously back and forth.  It looks blurry, right?

Now hold your hand still and shake your head back and forth instead.  No blur:  your hand stays in focus.

Which is strange, because the optical problem is in some sense exactly the same.  But the mechanism is different, and so the delay time is different.  When your hand moves, you’re using the same general-purpose apparatus you use to track moving objects of all kinds.  It’s a pretty good apparatus!  But because it’s so flexible, working well for all kinds of optical challenges, it is slow, and like any system with a long delay, input that oscillates pretty fast — like your waving hand — can cross it up.

When your head moves, it’s a different story:  we have a vestibulo-ocular reflex which moves our eyes in sync with our head to fix the images on our retina in place.  This doesn’t pass through cognition at all — it’s a direct neural connection from the vestibular sensors in the inner ear to the muscles that control eye movement.  This system isn’t flexible or adaptable at all.  It does just one thing — but it does it fast.

(All this material derived from my notes on Doyle’s talk, which went pretty fast:  all mistakes are mine.)

Here are the slides from Doyle’s talk.  (TooManySlides.pdf is the best filename ever!)

Here’s a paper from Science that Doyle said would be especially useful for mathematicians who want to see how the tradeoffs in question can be precisely formalized.  (Authors:  Chandra, Buzi, Doyle.)

## Roch on phylogenetic trees, learning ultrametrics from noisy measurements, and the shrimp-dog

Sebastien Roch gave a beautiful and inspiring talk here yesterday about the problem of reconstructing an evolutionary tree given genetic data about present-day species.  It was generally thought that keeping track of pairwise comparisons between species was not going to be sufficient to determine the tree efficiently; Roch has proven that it’s just the opposite.  His talk gave me a lot to think about.  I’m going to try to record an account of Roch’s ideas, one that is probably corrupted and certainly filtered through my own viewpoint.

So let’s say we have n points P_1, … P_n, which we believe are secretly the leaves of a tree.  In fact, let’s say that the edges of the tree are assigned lengths.  In other words, there is a secret ultrametric on the finite set P_1, … P_n, which we wish to learn.  In the phylogenetic case, the points are species, and the ultrametric distance d(P_i, P_j) between P_i and P_j measures how far back in the evolutionary tree we need to go to find a common ancestor between species i and species j.

One way to estimate d(P_i, P_j) is to study the correlation between various markers on the genomes of the two species.  This correlation, in Roch’s model, is going to be on order

exp(-d(P_i,P_j))

which is to say that it is very close to 0 when P_i and P_j are far apart, and close to 1 when the two species have a recent common ancestor.  What that means is that short distances are way easier to measure than long distances — you have no chance of telling the difference between a correlation of exp(-10) and exp(-11) unless you have a huge number of measurements at hand.  Another way to put it:  the error bar around your measurement of d(P_i,P_j) is much greater when the measured correlation is small (that is, when the distance is large) than when the correlation is close to 1; in particular, at great enough distance you’ll have no real confidence in any upper bound for the distance.
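Here’s a minimal simulation of that blowup, under a toy model of my own invention (not Roch’s actual one): each of n markers independently agrees between the two species with probability exp(-d), and we estimate d as minus the log of the observed agreement rate.

```python
import math
import random

def estimate_distance(true_d, n_markers, rng):
    """Estimate a distance d from marker agreement, in a toy model
    where each marker independently agrees with probability exp(-d)."""
    p = math.exp(-true_d)
    agreements = sum(rng.random() < p for _ in range(n_markers))
    c_hat = max(agreements, 1) / n_markers  # clamp away from log(0)
    return -math.log(c_hat)

def spread(true_d, n_markers=1000, trials=500, seed=0):
    """Standard deviation of the distance estimate over many trials."""
    rng = random.Random(seed)
    est = [estimate_distance(true_d, n_markers, rng) for _ in range(trials)]
    mean = sum(est) / trials
    return (sum((e - mean) ** 2 for e in est) / trials) ** 0.5

# The same number of markers pins down a short distance far more
# tightly than a long one: the error bar grows roughly like exp(d).
short_err = spread(true_d=1.0)
long_err = spread(true_d=5.0)
```

With a thousand markers per pair, estimates of a distance near 1 cluster within a few percent, while estimates of a distance near 5 scatter many times as widely.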

So the problem of estimating the metric accurately seems impossible except in small neighborhoods.  But it isn’t.  Because metrics are not just arbitrary symmetric n x n matrices.  And ultrametrics are not just arbitrary metrics.  They satisfy the ultrametric inequality

d(x,y) <= max(d(x,z),d(y,z)).

And this helps a lot.  For instance, suppose the number of measurements I have is sufficient to estimate with high confidence whether or not a distance is less than 1, but totally helpless with distances on order 5.  So if my measurements give me an estimate d(P_1, P_2) = 5, I have no real idea whether that distance is actually 5, or maybe 4, or maybe 100 — I can say, though, that it’s probably not 1.

So am I stuck?  I am not stuck!  Because the distances are not independent of each other; they are yoked together under the unforgiving harness of the ultrametric inequality.  Let’s say, for instance, that I find 10 other points Q_1, …, Q_10 which I can confidently say are within 1 of P_1, and 10 other points R_1, …, R_10 which are within 1 of P_2.  Then the ultrametric inequality tells us that

d(Q_i, R_j) = d(P_1, P_2)

for any one of the 100 ordered pairs (i,j)!  So I have 100 times as many measurements as I thought I did — and this might be enough to confidently estimate d(P_1,P_2).
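Here’s a toy check of that bookkeeping, on a made-up two-cluster ultrametric (my construction for illustration, not Roch’s): distance 1 within each cluster, 5 across.

```python
from itertools import product

# Hypothetical two-cluster ultrametric: P1 and the Q_i in one cluster,
# P2 and the R_j in the other; distance 1 within a cluster, 5 across.
cluster_a = ["P1"] + [f"Q{i}" for i in range(1, 11)]
cluster_b = ["P2"] + [f"R{i}" for i in range(1, 11)]
points = cluster_a + cluster_b

def d(x, y):
    if x == y:
        return 0.0
    return 1.0 if (x in cluster_a) == (y in cluster_a) else 5.0

# Sanity check: d really does satisfy the ultrametric inequality.
ultra = all(
    d(x, y) <= max(d(x, z), d(y, z))
    for x in points for y in points for z in points
)

# All 100 cross pairs (Q_i, R_j) are forced to sit at exactly
# d(P1, P2), so each one is an independent shot at that distance.
cross = [d(q, r) for q, r in product(cluster_a[1:], cluster_b[1:])]
```

In the real problem the Q_i and R_j only come with noisy measurements, but this rigidity is exactly what lets you pool all 100 of them into one estimate.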

In biological terms:  if I look at a bunch of genetic markers in a shrimp and a dog, it may be hard to estimate how far back in time one has to go to find their common ancestor.  But the common ancestor of a shrimp and a dog is presumably also the common ancestor of a lobster and a wolf, or a clam and a jackal!  So even if we’re only measuring a few markers per species, we can still end up with a reasonable estimate for the age of the proto-shrimp-dog.

What do you need if you want this to work?  You need a reasonably large population of points which are close together.  In other words, you want small neighborhoods to have a lot of points in them.  And what Roch finds is that there’s a threshold effect; if the mutation rate is too fast relative to the amount of measurement per species you do, then you don’t hit “critical mass” and you can’t bootstrap your way up to a full high-confidence reconstruction of the metric.

This leads one to a host of interesting questions — interesting to me, that is, albeit not necessarily interesting for biology.  What if you want to estimate a metric from pairwise distances but you don’t know it’s an ultrametric? Maybe instead you have some kind of hyperbolicity constraint; or maybe you have a prior on possible metrics which weights “closer to ultrametric” distances more highly.  For that matter, is there a principled way to test the hypothesis that a measured distance is in fact an ultrametric in the first place?  All of this is somehow related to this previous post about metric embeddings and the work of Eriksson, Dasarathy, Singh, and Nowak.

## Bovine fraternal skin graft

Another thing I learned from the August 1951 issue of The Times Review of the Progress of Science is that cows can accept skin grafts from their fraternal twins, but humans can’t.  That’s because cow fetuses actually share some blood and tissue in the womb, and automatically get desensitized to those particular foreign entities when they’re young enough not to reject them.  This was totally new to me but apparently if I knew anything about immunology I would already be familiar with this, because Peter Medawar’s work on the phenomenon earned him the 1960 Nobel Prize and more or less launched the field of acquired transplantation tolerance.

There was also an anecdote about a baby switched at birth, whom a doctor proved to be the identical twin of another child in his birth family by grafting a patch of his skin onto the other kid!


## Tom Leinster on entropy, diversity, and cardinality

You might want to consider reading the n-category cafe even if you don’t know what an n-category is — even if, antique as this view may be, you don’t care what an n-category is!

For instance, it’s the best place to read about the curious case of M. El Naschie, who’s published 322 of his own papers in the journal he edits for Elsevier.

More substantively: Tom Leinster has a beautiful pair of posts (part I, part II) about varying notions of “diversity” in population biology, and a way to capture all these notions as special cases of a general mathematical construction.

Drastic oversimplification: you might start by defining the diversity of an island beetle population to be the number of different species of beetles living there. But that misses something — a population with three equinumerous beetle species is more diverse than one where a single dominant species accounts for 98% of the beetles, with the remainder split evenly between the other two species. Part I of Leinster’s post is devoted to various measures that capture this behavior. In particular, he’ll explain why on the former island the effective number of species is 3 (just as you’d expect) while on the latter the “number” of species is not 3, but about 1.12 — in other words, the second island is very close to having just one kind of beetle.
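Those two numbers come out of one standard diversity measure, the exponential of Shannon entropy (a Hill number, one of the family Leinster discusses); here’s a quick check:

```python
import math

def effective_species(proportions):
    """Exponential of Shannon entropy: the 'effective number of species'.
    Equal proportions recover the plain species count."""
    return math.exp(-sum(p * math.log(p) for p in proportions if p > 0))

even_island = effective_species([1/3, 1/3, 1/3])       # three equal species
skewed_island = effective_species([0.98, 0.01, 0.01])  # one dominant species
```

The even island comes out as 3 on the nose, while the skewed island comes out at about 1.12, matching the numbers above.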

In part II, Leinster discusses what happens when you take into account that some pairs of species are more similar than others.

## Zebrafish are very interesting

I was working in Memorial Union today, trying to figure out what I think the phrase “random pro-p group” should mean, when I noticed that the guy in the booth next to mine was reading an 800-page conference proceedings about zebrafish. Well, I just had to ask. What’s so interesting about zebrafish?

It turns out that developmental biologists are BFF with zebrafish, whose growth to maturity is both very visible — their eggs are transparent — and very, very fast — from a single cell to a creature with a functioning nervous system in 24 hours, and to something resembling a fish in 4 days.  So you can follow many hundreds of generations of these guys from fertilization on, watching closely on a microscopic scale, making different kinds of cells light up so you can see what they’re up to, flicking different genomic switches on and off … SCIENCE!

All material above paraphrased from my conversation with unnamed zebrafish expert, and not checked against an authoritative source — please do not use in your term paper, zebrafish Googlers! Perhaps a better resource would be Zebrafish — the peer-reviewed journal. Or the University of Oregon zebrafish FAQ, where you can find the answer to “How can we obtain mutant stocks of zebrafish for our high school lab?” Gotta go, I think I just had a great idea for a low-budget horror movie.