Tag Archives: evolution

Roch on phylogenetic trees, learning ultrametrics from noisy measurements, and the shrimp-dog

Sebastien Roch gave a beautiful and inspiring talk here yesterday about the problem of reconstructing an evolutionary tree given genetic data about present-day species. It was generally thought that keeping track of pairwise comparisons between species was not going to be sufficient to determine the tree efficiently; Roch has proven that it’s just the opposite. His talk gave me a lot to think about. I’m going to try to record an account of Roch’s idea, probably corrupted and certainly filtered through my own viewpoint.

So let’s say we have n points P_1, …, P_n, which we believe are secretly the leaves of a tree. In fact, let’s say that the edges of the tree are assigned lengths, and that every leaf sits at the same distance from the root (all our species live in the present). In other words, there is a secret ultrametric on the finite set P_1, …, P_n, which we wish to learn. In the phylogenetic case, the points are species, and the ultrametric distance d(P_i, P_j) between P_i and P_j measures how far back in the evolutionary tree we need to go to find a common ancestor of species i and species j.
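Here is a minimal sketch of where such an ultrametric comes from. It assumes a rooted tree stored as parent pointers, with an age attached to each node (leaves at age 0, ages increasing toward the root), and takes d to be the age of the most recent common ancestor. The tree, the node names, and the ages are made up for illustration; they are not real divergence times.

```python
# Hypothetical toy tree: each node maps to its parent (the root maps to None).
PARENT = {"shrimp": "arthropod", "lobster": "arthropod",
          "dog": "canid", "wolf": "canid",
          "arthropod": "root", "canid": "root", "root": None}
# Made-up ages: how far back in time each node sits (present-day leaves are 0).
AGE = {"shrimp": 0.0, "lobster": 0.0, "dog": 0.0, "wolf": 0.0,
       "arthropod": 1.0, "canid": 0.5, "root": 6.0}


def ancestors(node):
    """Return the path from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def mrca_distance(a, b):
    """d(a, b) = age of the most recent common ancestor of a and b (0 if a == b)."""
    seen = set(ancestors(a))
    for node in ancestors(b):  # walk up from b; the first shared node is the MRCA
        if node in seen:
            return AGE[node]
    raise ValueError("nodes are not in the same tree")


print(mrca_distance("shrimp", "dog"))
```

Because ages only increase on the way up to the root, this d automatically satisfies the ultrametric inequality on the leaves.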

One way to estimate d(P_i, P_j) is to study the correlation between various markers on the genomes of the two species. This correlation, in Roch’s model, is going to be on the order of

exp(-d(P_i,P_j))

which is to say that it is very close to 0 when P_i and P_j are far apart, and close to 1 when the two species have a recent common ancestor. What that means is that short distances are way easier to measure than long distances: you have no chance of telling the difference between a correlation of exp(-10) and exp(-11) unless you have a huge number of measurements at hand. Another way to put it: the error bar around your measurement of d(P_i,P_j) is much greater when your estimate is large than when it is small; in particular, at great enough distance you’ll have no real confidence in any upper bound for the distance.
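To put rough numbers on this, here is a back-of-the-envelope sketch under a toy assumption that is mine, not Roch's: each of m markers contributes an independent ±1 value whose mean is exp(-d), and we estimate d as -log of the sample average. The delta method then says the standard error of the distance estimate is roughly the standard error of the average divided by exp(-d). The function name distance_std_error is made up for illustration.

```python
import math


def distance_std_error(d, m):
    """Approximate standard error of d_hat = -log(c_hat).

    Toy assumption: m independent +/-1 values with mean c = exp(-d), so
    Var(c_hat) = (1 - c**2) / m. Delta method: SE(d_hat) ~ SE(c_hat) / c.
    """
    c = math.exp(-d)
    se_c = math.sqrt((1.0 - c * c) / m)
    return se_c / c


for d in (1, 5, 10):
    print(d, distance_std_error(d, 1000))
```

With m = 1000 markers this comes out to about 0.08 at d = 1 but about 4.7 at d = 5, so a measured 5 could easily be 4 or 10. At d = 5 the linear approximation is itself breaking down, since the noisy average is often negative and has no logarithm at all.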

So the problem of estimating the metric accurately seems impossible except in small neighborhoods.  But it isn’t.  Because metrics are not just arbitrary symmetric n x n matrices.  And ultrametrics are not just arbitrary metrics.  They satisfy the ultrametric inequality

d(x,y) <= max(d(x,z),d(y,z)).

And this helps a lot. For instance, suppose the number of measurements I have is sufficient to estimate with high confidence whether or not a distance is less than 1, but totally helpless with distances on the order of 5. So if my measurements give me an estimate d(P_1, P_2) = 5, I have no real idea whether that distance is actually 5, or maybe 4, or maybe 100; I can say, though, that it’s probably not 1.

So am I stuck? I am not stuck! Because the distances are not independent of each other; they are yoked together under the unforgiving harness of the ultrametric inequality. Let’s say, for instance, that I find 10 other points Q_1, …, Q_10 which I can confidently say are within 1 of P_1, and 10 other points R_1, …, R_10 which are within 1 of P_2. Since d(P_1, P_2) is probably bigger than 1, and in an ultrametric the two longest sides of any triangle are equal, the ultrametric inequality tells us that

d(Q_i, R_j) = d(P_1, P_2)

for any one of the 100 ordered pairs (i,j)!  So I have 100 times as many measurements as I thought I did — and this might be enough to confidently estimate d(P_1,P_2).
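Here is a toy simulation of the pooling trick, using a made-up model: each marker is an independent ±1 value with mean exp(-d), and d is estimated as -log of the average. It treats the markers from all 100 pairs (Q_i, R_j) as independent, which real genomes with shared ancestry won't be, so read it as an illustration of the counting, not as Roch's actual estimator. All function names are hypothetical.

```python
import math
import random


def simulate_markers(d, m, rng):
    """Toy model (assumption): m independent +/-1 values with mean exp(-d)."""
    c = math.exp(-d)
    p_plus = (1.0 + c) / 2.0  # P(+1) - P(-1) = c
    return [1 if rng.random() < p_plus else -1 for _ in range(m)]


def estimate_distance(samples):
    """Invert the correlation: d_hat = -log(mean). Returns None if the mean is <= 0."""
    c_hat = sum(samples) / len(samples)
    return -math.log(c_hat) if c_hat > 0 else None


def pooled_estimate(d, m, num_pairs, seed=0):
    """Pool the markers of num_pairs pairs (Q_i, R_j), all assumed to sit at distance d."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(num_pairs):
        pooled.extend(simulate_markers(d, m, rng))
    return estimate_distance(pooled)


print(pooled_estimate(3.0, 200, 1))    # one pair: often None or far off
print(pooled_estimate(3.0, 200, 100))  # 100 pairs pooled
```

With d = 3 and 200 markers per pair, a single pair's average has standard deviation about 0.07 against a true correlation of about 0.05, so it is frequently negative and gives no estimate at all; pooling 100 pairs cuts that standard deviation by a factor of 10.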

In biological terms:  if I look at a bunch of genetic markers in a shrimp and a dog, it may be hard to estimate how far back in time one has to go to find their common ancestor.  But the common ancestor of a shrimp and a dog is presumably also the common ancestor of a lobster and a wolf, or a clam and a jackal!  So even if we’re only measuring a few markers per species, we can still end up with a reasonable estimate for the age of the proto-shrimp-dog.

What do you need if you want this to work?  You need a reasonably large population of points which are close together.  In other words, you want small neighborhoods to have a lot of points in them.  And what Roch finds is that there’s a threshold effect; if the mutation rate is too fast relative to the amount of measurement per species you do, then you don’t hit “critical mass” and you can’t bootstrap your way up to a full high-confidence reconstruction of the metric.

This leads one to a host of interesting questions: interesting to me, that is, albeit not necessarily interesting for biology. What if you want to estimate a metric from pairwise distances but you don’t know it’s an ultrametric? Maybe instead you have some kind of hyperbolicity constraint; or maybe you have a prior on possible metrics which weights “closer to ultrametric” distances more highly. For that matter, is there a principled way to test the hypothesis that a measured distance is in fact an ultrametric in the first place? All of this is somehow related to this previous post about metric embeddings and the work of Eriksson, Dasarathy, Singh, and Nowak.
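In the noiseless case, at least, the check is easy: a matrix is an ultrametric exactly when it is symmetric, zero on the diagonal, positive off it, and every triple satisfies d(x,y) <= max(d(x,z),d(y,z)). A brute-force O(n^3) loop, sketched below, does this; the function name and the tol parameter are made up for illustration. It is not a statistical test, since noisy estimates will almost never pass exactly.

```python
def is_ultrametric(D, tol=0.0):
    """Return True if the square matrix D (list of lists) is an ultrametric.

    Checks zero diagonal, symmetry, positive off-diagonal entries, and
    d(x,y) <= max(d(x,z), d(y,z)) for every triple, up to tolerance tol.
    """
    n = len(D)
    for x in range(n):
        if abs(D[x][x]) > tol:
            return False
        for y in range(n):
            if abs(D[x][y] - D[y][x]) > tol:
                return False
            if x != y and D[x][y] <= 0:
                return False
            for z in range(n):
                if D[x][y] > max(D[x][z], D[y][z]) + tol:
                    return False
    return True


tree_like = [[0, 1, 6, 6], [1, 0, 6, 6], [6, 6, 0, 0.5], [6, 6, 0.5, 0]]
triangle = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]  # a metric, but not an ultrametric
print(is_ultrametric(tree_like), is_ultrametric(triangle))
```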

In re pigeons, Mr. Darwin is immense

The New York Times reviews The Origin of Species, March 1860.

Shall we frankly declare that, after the most deliberate consideration of Mr. DARWIN’s arguments, we remain unconvinced?

The book is full of a most interesting and impressive series of minor verifications; but he fails to show the points of junction between these, and no where rises to complete logical statement.

The difficulties, of course, are enormous. This he frankly acknowledges. “Some of them are so grave that to this day I can never reflect on them without being staggered.” Such are his own naive and noble words.

He thinks, however, they are more apparent than real. We fear they are very real. To us insurmountable.

Human(itie)s, aliens, and autism: Ian Hacking and Elliot Sober at Fluno Center tomorrow

Humanities at Wisconsin are said to be underfunded and demoralized, but you’d never know it from the excellent “What is Human?” symposium the Center for Humanities is holding tomorrow at Fluno Center. At 1:45, Ian Hacking will speak on “Humans, aliens, and autism” — perhaps he’ll expand on some of the material in this 2006 essay from the LRB. Hacking’s two books on the development of probability theory, The Emergence of Probability and The Taming of Chance, are probably the best I’ve read on the history of mathematics; to stay bound to the theme of this post, he is one of the only people writing really humanely about mathematical practice. (The late Thomas Tymoczko was another.)

Speaking at 11:15 is our own Elliot Sober, who is that most powerful of creatures, a philosopher who knows Bayes’ Theorem. (See also: Adam Elga, K. Anthony Appiah.) Sober’s title is TBA, but he may well talk about (or, more likely, against) the “design argument” against Darwinism. (He’s definitely giving a talk on that subject at 7:30 this Thursday night, in 1315 Chemistry.) A very vulgar version of the design argument looks like this. The probability that intelligent life would arise, if there were no divine guidance, is nonzero but spectacularly small. The probability that intelligent life would arise, if a divine being created it, is 1. Now Bayes says you should conclude that a divine origin of human life is very likely, even if it was very unlikely in your prior. Sober’s new book, Evidence and Evolution, takes on the design argument and its many more sophisticated variants, and more generally tries to work out what we mean by “evidence” about the origins of life. Bayes flies everywhere.
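For concreteness, here is the arithmetic of that vulgar argument as a two-hypothesis Bayes computation. The numbers are placeholders: 1.0 stands for the likelihood of life given design, and 1e-9 is a made-up stand-in for "nonzero but spectacularly small". The function name is hypothetical.

```python
def posterior_design(prior, p_life_given_design=1.0, p_life_given_chance=1e-9):
    """Posterior probability of 'design' given that intelligent life exists.

    Bayes' theorem with two hypotheses; the default likelihoods are
    made-up placeholders, not estimates of anything.
    """
    numerator = p_life_given_design * prior
    evidence = numerator + p_life_given_chance * (1.0 - prior)
    return numerator / evidence


print(posterior_design(1e-3))   # prior much larger than the tiny likelihood
print(posterior_design(1e-12))  # prior even tinier than the tiny likelihood
```

The posterior comes out near 1 only when the prior is large compared with the "spectacularly small" likelihood; with a prior of 1e-12 it is about 0.001.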

Sober is also credited with the following joke:

A boy is about to go on his first date, and is nervous about what to talk about. He asks his father for advice. The father replies: “My son, there are three subjects that always work. These are food, family, and philosophy.”

The boy picks up his date and they go to a soda fountain. Ice cream sodas in front of them, they stare at each other for a long time, as the boy’s nervousness builds. He remembers his father’s advice, and chooses the first topic. He asks the girl: “Do you like potato pancakes?” She says “No,” and the silence returns.

After a few more uncomfortable minutes, the boy thinks of his father’s suggestion and turns to the second item on the list. He asks, “Do you have a brother?” Again, the girl says “No” and there is silence once again.

The boy then plays his last card. He thinks of his father’s advice and asks the girl the following question: “If you had a brother, would he like potato pancakes?”

(This philosophy joke, along with many others, appears here.)

Bad predictions

Working today on the 6th floor of Memorial Library, surrounded by outdated religion books. I spent the afternoon laboriously checking the approximate exactness of some complex and trying to keep a lot of indices straight, and when my head started to swim I’d go pick a book off the shelf and see what it said.

So I opened up I Believe in God and in Evolution, by William W. Keen, M.D. The book is an expanded version of his 1922 commencement address at Crozer Theological Seminary, best known as the alma mater of Martin Luther King, Jr.

Keen writes:

Darwin’s Origin of Species was published in 1859, the year when I graduated at Brown University. The recrudescence of the warfare over Evolution, which for many years had subsided and almost disappeared, except sporadically, is a strange and surely only a passing phenomenon.

Nope!
