## Maps of candidates from coconsideration

G. Elliott Morris posted this embedding of the current Democratic presidential candidates in R^2 on Twitter:

where the edge weights (and thus the embedding) derive from YouGov data, which for each pair of candidates (i,j) tells you what proportion of voters who report they’re considering candidate i also say they’re considering candidate j.

Of course, this matrix is non-symmetric, which makes me wonder exactly how he derived distances from it. I also think his picture looks a little weird; Sanders and Bloomberg are quite ideologically distinct, and their coconsiderers few in number, but they end up neighbors in his embedding.

Here was my thought about how one might try to produce an embedding using the matrix above. Model voter ideology as a standard Gaussian f in R^2 (I know, I know…) and suppose each candidate is a point y in R^2. You can model propensity to consider y as a standard Gaussian centered at y, so that the number of voters who are considering candidate y is proportional to the integral

$\int f(x) f(y-x) dx$

and the number of voters considering both y and z is proportional to

$\int f(x) f(y-x) f(z-x) dx$

So the proportions in Morris’s table can be estimated by the ratio of the second integral to the first, which, if I computed it right (I’m very unsure about the constants), is

$(2/3) \exp(-(1/12) |y-2z|^2)$.
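As a sanity check, one can compare this closed form against a brute-force numerical integral. The sketch below (my own discretization, not Morris’s) takes the voter density to be the standard Gaussian and the propensity to consider a candidate at y to be the unnormalized kernel exp(-|y-x|^2/2), which is the convention under which the constant 2/3 comes out:

```python
import numpy as np

# Grid over a truncated R^2; Gaussian tails beyond |x| = 8 are negligible.
t = np.linspace(-8.0, 8.0, 401)
X1, X2 = np.meshgrid(t, t)
dA = (t[1] - t[0]) ** 2

def density(y1, y2):
    # Normalized standard Gaussian density centered at (y1, y2): voter ideology.
    return np.exp(-((X1 - y1) ** 2 + (X2 - y2) ** 2) / 2) / (2 * np.pi)

def propensity(y1, y2):
    # Unnormalized consideration kernel (equals 1 at the candidate's position).
    return np.exp(-((X1 - y1) ** 2 + (X2 - y2) ** 2) / 2)

f = density(0.0, 0.0)
y, z = (1.0, 0.5), (-0.3, 0.8)  # two arbitrary candidate positions

consider_y = np.sum(f * propensity(*y)) * dA                     # voters considering y
consider_both = np.sum(f * propensity(*y) * propensity(*z)) * dA # considering both
ratio = consider_both / consider_y

closed = (2 / 3) * np.exp(-((y[0] - 2 * z[0]) ** 2 + (y[1] - 2 * z[1]) ** 2) / 12)
print(ratio, closed)  # the two agree to several decimal places
```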

(The reason this is doable in closed form is that the product of Gaussian probability density functions is just exp(-Q) for some other quadratic form, and we know how to integrate those.) In other words, the candidate most likely to be considered by voters who are considering y is one who’s just like y but half as extreme. I think this is probably an artifact of the Gaussian I’m using, which doesn’t, for instance, really capture a scenario where there are multiple distinct clusters of voters; it posits a kind of center where ideological density is highest. Anyway, you can still try to find 8 points in R^2 making the function above approximate Morris’s numbers as closely as possible. I didn’t do this in a smart optimization way; I just initialized with random numbers and let the points walk around randomly, keeping changes that improved the error, until it stopped improving. I ended up here:

which agrees with Morris that Gabbard is way out there, that among the non-Gabbard candidates, Steyer and Klobuchar are hanging out there as vertices of the convex hull, and that Warren is reasonably central. But I think this picture more appropriately separates Bloomberg from Sanders.
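I don’t have the YouGov table in machine-readable form, so here is a sketch of that fitting procedure on a synthetic target generated from hidden “true” positions; the noise scale, step count, and target matrix are all stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def predicted(points):
    # Entry (i, j): predicted share of candidate i's considerers who also
    # consider candidate j, i.e. (2/3) * exp(-|y_i - 2 y_j|^2 / 12).
    diff = points[:, None, :] - 2.0 * points[None, :, :]
    return (2 / 3) * np.exp(-np.sum(diff ** 2, axis=-1) / 12)

def loss(points, target):
    # Squared error over off-diagonal entries (the diagonal is uninformative).
    off_diag = ~np.eye(len(points), dtype=bool)
    return np.sum((predicted(points) - target)[off_diag] ** 2)

# Synthetic stand-in for Morris's table, built from hidden "true" positions.
true_positions = rng.normal(size=(8, 2))
target = predicted(true_positions)

points = rng.normal(size=(8, 2))  # random initialization
best = loss(points, target)
init = best
for _ in range(20000):
    proposal = points + rng.normal(scale=0.1, size=points.shape)
    cand = loss(proposal, target)
    if cand < best:               # keep only improving random steps
        points, best = proposal, cand
print(init, best)                 # the error drops substantially
```

Note that the model only pins down the configuration up to rotations and reflections about the origin (the center of the voter distribution), so it’s the predicted matrices, not the raw coordinates, that should be compared.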

How would you turn the coconsideration numbers into an R^2 embedding?

## Political coordinates test

A popular political quiz on the internet purports to place you on a Cartesian plane with “left-right” on one axis and “libertarian-communitarian” on the other, by presenting you with 36 assertions you’re supposed to agree or disagree with. One of them is

“There are too many wasteful government programs.”

Well, of course there are! For this not to be the case, the government would have to be uniquely unwasteful among all large institutions. The quiz does not ask whether you agree that

“There are too many wasteful private enterprises.”

I would like to agree with both, but the test only allows me to agree with the first while remaining silent about the second, which makes me seem more of a free-market purist than I really am. Which questions you choose to ask affects which answers you’re able to get.

## Roch on phylogenetic trees, learning ultrametrics from noisy measurements, and the shrimp-dog

Sebastien Roch gave a beautiful and inspiring talk here yesterday about the problem of reconstructing an evolutionary tree given genetic data about present-day species.  It was generally thought that keeping track of pairwise comparisons between species would not be sufficient to determine the tree efficiently; Roch has proven just the opposite.  His talk gave me a lot to think about.  I’m going to try to record Roch’s idea, probably in corrupted form, and certainly filtered through my own viewpoint.

So let’s say we have n points P_1, …, P_n, which we believe are secretly the leaves of a tree.  In fact, let’s say that the edges of the tree are assigned lengths.  In other words, there is a secret ultrametric on the finite set P_1, …, P_n, which we wish to learn.  In the phylogenetic case, the points are species, and the ultrametric distance d(P_i, P_j) between P_i and P_j measures how far back in the evolutionary tree we need to go to find a common ancestor of species i and species j.

One way to estimate d(P_i, P_j) is to study the correlation between various markers on the genomes of the two species.  This correlation, in Roch’s model, is going to be on order

exp(-d(P_i,P_j))

which is to say that it is very close to 0 when P_i and P_j are far apart, and close to 1 when the two species have a recent common ancestor.  What that means is that short distances are way easier to measure than long distances — you have no chance of telling the difference between a correlation of exp(-10) and a correlation of exp(-11) unless you have a huge number of measurements at hand.  Another way to put it:  the error bar around your measurement of d(P_i,P_j) is much greater when your estimate is large than when it is small; in particular, at great enough distance you’ll have no real confidence in any upper bound for the distance.
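To make the asymmetry quantitative, under a simple additive-noise assumption on the measured correlation $\rho = \exp(-d(P_i,P_j))$:

$d = -\log \rho, \qquad \delta d \approx \frac{\delta \rho}{\rho} = e^{d} \, \delta \rho$

so a fixed measurement error $\delta \rho$ in the correlation inflates into an error in the distance that grows exponentially with d.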

So the problem of estimating the metric accurately seems impossible except in small neighborhoods.  But it isn’t.  Because metrics are not just arbitrary symmetric n x n matrices.  And ultrametrics are not just arbitrary metrics.  They satisfy the ultrametric inequality

d(x,y) <= max(d(x,z),d(y,z)).

And this helps a lot.  For instance, suppose the number of measurements I have is sufficient to estimate with high confidence whether or not a distance is less than 1, but totally helpless with distances on order 5.  So if my measurements give me an estimate d(P_1, P_2) = 5, I have no real idea whether that distance is actually 5, or maybe 4, or maybe 100 — I can say, though, that it’s probably not 1.

So am I stuck?  I am not stuck!  Because the distances are not independent of each other; they are yoked together under the unforgiving harness of the ultrametric inequality.  Let’s say, for instance, that I find 10 other points Q_1, …, Q_10 which I can confidently say are within 1 of P_1, and 10 other points R_1, …, R_10 which are within 1 of P_2.  Then the ultrametric inequality tells us that

d(Q_i, R_j) = d(P_1, P_2)

for any one of the 100 ordered pairs (i,j)!  (In an ultrametric, every triangle is isosceles with its two longest sides equal; since d(Q_i, P_1) and d(R_j, P_2) are at most 1, the long side d(Q_i, R_j) must equal d(P_1, P_2).)  So I have 100 times as many measurements as I thought I did — and this might be enough to confidently estimate d(P_1,P_2).
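A toy simulation of this pooling effect, under my own stand-in noise model (each measured correlation is the true exp(-d) plus Gaussian noise from finite sampling; the numbers are illustrative, not Roch’s):

```python
import numpy as np

rng = np.random.default_rng(1)

d_true = 4.0
rho = np.exp(-d_true)  # ~0.018: the signal in any one pair is tiny
sigma = 0.03           # per-pair measurement noise

trials = 2000
# One pair: a single noisy correlation measurement per trial.
single = rho + rng.normal(scale=sigma, size=trials)
# 100 pairs (Q_i, R_j) that the ultrametric inequality forces to share the
# same distance d(P_1, P_2): average their measurements in each trial.
pooled = rho + rng.normal(scale=sigma, size=(trials, 100)).mean(axis=1)

print((single > 0).mean())  # a lone pair often can't even see a positive correlation
d_est = -np.log(pooled)     # pooled estimates concentrate tightly near d_true
print(d_est.mean(), d_est.std())
```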

In biological terms:  if I look at a bunch of genetic markers in a shrimp and a dog, it may be hard to estimate how far back in time one has to go to find their common ancestor.  But the common ancestor of a shrimp and a dog is presumably also the common ancestor of a lobster and a wolf, or a clam and a jackal!  So even if we’re only measuring a few markers per species, we can still end up with a reasonable estimate for the age of the proto-shrimp-dog.

What do you need if you want this to work?  You need a reasonably large population of points which are close together.  In other words, you want small neighborhoods to have a lot of points in them.  And what Roch finds is that there’s a threshold effect; if the mutation rate is too fast relative to the amount of measurement per species you do, then you don’t hit “critical mass” and you can’t bootstrap your way up to a full high-confidence reconstruction of the metric.

This leads one to a host of interesting questions — interesting to me, that is, albeit not necessarily interesting for biology.  What if you want to estimate a metric from pairwise distances but you don’t know it’s an ultrametric?  Maybe instead you have some kind of hyperbolicity constraint; or maybe you have a prior on possible metrics which weights “closer to ultrametric” metrics more highly.  For that matter, is there a principled way to test the hypothesis that a measured metric is in fact an ultrametric in the first place?  All of this is somehow related to this previous post about metric embeddings and the work of Eriksson, Dasarathy, Singh, and Nowak.
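As a baseline for that last question, the exact, noise-free condition is at least easy to check: a naive checker of the definition, before any statistical test, might look like this (the example matrices are made up):

```python
import numpy as np
from itertools import combinations

def is_ultrametric(D, tol=1e-9):
    """Check d(x,y) <= max(d(x,z), d(y,z)) for every triple of points.
    Equivalently: in every triangle, the two largest sides are equal."""
    n = len(D)
    for x, y, z in combinations(range(n), 3):
        a, b, c = sorted((D[x][y], D[x][z], D[y][z]))
        if c - b > tol:  # two largest sides must agree (up to tolerance)
            return False
    return True

# An ultrametric from a small tree: A and B split most recently, then C, then D.
D_tree = np.array([
    [0, 2, 4, 6],
    [2, 0, 4, 6],
    [4, 4, 0, 6],
    [6, 6, 6, 0],
], dtype=float)
print(is_ultrametric(D_tree))  # True

D_bad = D_tree.copy()
D_bad[0, 2] = D_bad[2, 0] = 3.0  # breaks the isosceles condition
print(is_ultrametric(D_bad))   # False
```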