Monthly Archives: March 2012

Orioles pre-mortem 2012

That time of year!  When we ask:  are the Orioles going to finish last in the AL East this year?  Yes.  The Orioles are going to finish last in the AL East this year.  Camden Depot has them projected at 68-94, 11 games out of fourth place.  On the other hand, I think the error bars around the O’s 2012 record are pretty wide; so I have a higher estimate than most people of the chance of the Orioles having an OK year.  If somebody gives you the chance to bet on the Orioles not to finish last at 10:1 odds, I think you should take it.


Because the 2011 Orioles were a pretty much average team with the bat.  They were a bad team overall because they were lousy defensively, and, more importantly, because their young pitchers were uniformly terrible.

What about this year?  I don’t think the lineup is likely to get worse.  Much of it hasn’t changed.  J.J. Hardy isn’t likely to hit as well in 2012, but Wilson Betemit is likely to contribute useful AB.  And there’s one area where the team could improve a lot.  The Orioles had an absolutely terrible bench last year.  The good thing about giving a ton of at-bats to horrible players is that you can get a big improvement just by finding some mediocre players to replace them.  Between Jai Miller, Endy Chavez, Betemit, and Nick Johnson, I think those at-bats are not going to be as bad.  Overall, I think again you have a team that’ll score runs at or just a little below the league average.

As for the pitching, no one knows.  But there are so many pitchers on the 2012 team who could be effective that it’s hard to imagine there won’t be two or three who actually get there.  On the other hand, the only pitcher who was reasonably certain to be effective, Jeremy Guthrie, was traded.  Thus the large error bars.  If Matusz doesn’t recover, if Wada and Chen can’t transition to MLB, if Hammel can’t replace Guthrie’s innings, the pitching could actually get even worse than it was in 2011.  But there’s a non-tiny chance it could be a whole lot better.

Update:  But why is Jamie Moyer pitching for the Rockies instead of us?  It would have been a longshot, I know, but a pitching staff with valuable contributions from both Jamie Moyer and Dontrelle Willis would be a thing of glory and joy.

Tufts made me a nice poster

I just had the extremely enjoyable experience of giving the Norbert Wiener lectures at Tufts.  I’m not sure my talks lived up to the awesomeness of this poster:



In which I am sentimental about diagramming sentences

Enjoyable op-ed in the Times about the history of the soon-to-be-lost art:

By the latter half of the 19th century, chalkboards had become increasingly common in classrooms; for students, the impact of watching a sentence take shape on that large surface as a comprehensible, often elegant, and sometimes downright ingenious drawing must have been significant. It’s hard to believe anyone but the most dedicated pedant could have actually enjoyed parsing, but plenty of students — including me — loved diagramming.

Me too.  It’s funny:  I don’t have any feeling at all that today’s students need to learn the pencil-and-paper algorithm for long division or square root extraction.  But the vanishing of sentence diagrams makes me sad.  Presumably if I were a linguist instead of a mathematician I’d feel the opposite.


John Lackey stands up for himself

From today’s Shaughnessy:

Is he willing to acknowledge that mistakes were made? “I guess. Sure. They’re being made in every clubhouse in the big leagues, then. If we’d have made the playoffs, we’d have been a bunch of fun guys.”

He’s right!  We think we’re judging people’s behavior, but when we judge in retrospect, we approve and disapprove of the same behaviors, depending on the outcome.  See Phil Rosenzweig’s The Halo Effect for the same phenomenon described at book length.  A firm’s behavior will invariably be described as “daring” and “bold” when the company is doing well.  When the company’s fortunes turn sour, the same set of decisions are retrospectively reclassified as “reckless” and “foolhardy.”


Do required math courses increase your earnings?

So says Josh Goodman at the Kennedy School:

I identify the impact of math coursework on earnings using the differential timing of state-level increases in high school graduation requirements as a source of exogenous variation. The increased requirements induced large increases in both the completed math coursework and earnings of blacks, particularly black males. Two-sample instrumental variable estimates suggest that each additional year of math raised blacks’ earnings by 5-9%, accounting for a large fraction of the value of a year of schooling. Closer analysis suggests that much of this effect comes from black students who attend non-white schools and who will not attend college.


Gluten-free for Lent

I have a friend who has given up gluten for Lent.  We had an interesting discussion today about whether this would be annoying to someone suffering from celiac disease.  We considered this test case:  would it be OK, or not OK, to rent a wheelchair and give up walking for Lent?  Clearly not OK, it seems to me, but I’m having trouble formulating the correct decision principle.  Lots of people give up sweets for Lent, and this doesn’t seem insensitive in any way to the world’s diabetics.


Roch on phylogenetic trees, learning ultrametrics from noisy measurements, and the shrimp-dog

Sebastien Roch gave a beautiful and inspiring talk here yesterday about the problem of reconstructing an evolutionary tree given genetic data about present-day species.  It was generally thought that keeping track of pairwise comparisons between species was not going to be sufficient to determine the tree efficiently; Roch has proven that it’s just the opposite.  His talk gave me a lot to think about.  I’m going to try to record an account of Roch’s idea that is probably corrupted and certainly filtered through my own viewpoint.

So let’s say we have n points P_1, …, P_n, which we believe are secretly the leaves of a tree.  In fact, let’s say that the edges of the tree are assigned lengths.  In other words, there is a secret ultrametric on the finite set P_1, …, P_n, which we wish to learn.  In the phylogenetic case, the points are species, and the ultrametric distance d(P_i, P_j) between P_i and P_j measures how far back in the evolutionary tree we need to go to find a common ancestor between species i and species j.

One way to estimate d(P_i, P_j) is to study the correlation between various markers on the genomes of the two species.  This correlation, in Roch’s model, is going to be on order

exp(-d(P_i, P_j)),

which is to say that it is very close to 0 when P_i and P_j are far apart, and close to 1 when the two species have a recent common ancestor.  What that means is that short distances are way easier to measure than long distances:  you have no chance of telling the difference between a correlation of exp(-10) and exp(-11) unless you have a huge number of measurements at hand.  Another way to put it:  the error bar around your measurement of d(P_i,P_j) is much greater when your estimate is large than when your estimate is small; in particular, at great enough distance you’ll have no real confidence in any upper bound for the distance.
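
To put rough numbers on this, here is a minimal sketch.  It assumes (my simplification, not necessarily Roch’s exact model) that the correlation is exactly exp(-d) and that an empirical correlation computed from m markers has standard error about 1/sqrt(m).  Since d = -log(correlation), the delta method then gives an error in d of roughly exp(d)/sqrt(m).  The function names are made up for illustration.

```python
import math

def distance_std_error(d, m):
    """Approximate standard error of an estimate of the distance d from m markers.

    Assumption: correlation c = exp(-d), measured with standard error 1/sqrt(m).
    Delta method: |dd/dc| = 1/c = exp(d), so the error in d is exp(d)/sqrt(m).
    """
    return math.exp(d) / math.sqrt(m)

def markers_needed(d, target_error):
    """Number of markers needed to bring the standard error of d down to target_error."""
    return math.ceil((math.exp(d) / target_error) ** 2)

if __name__ == "__main__":
    for d in (1.0, 5.0, 10.0):
        print(d, markers_needed(d, 0.5))
```

Under these assumptions, pinning a distance down to within 0.5 takes about 30 markers at distance 1, but on the order of a billion at distance 10.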

So the problem of estimating the metric accurately seems impossible except in small neighborhoods.  But it isn’t.  Because metrics are not just arbitrary symmetric n x n matrices.  And ultrametrics are not just arbitrary metrics.  They satisfy the ultrametric inequality

d(x,y) <= max(d(x,z),d(y,z)).

And this helps a lot.  For instance, suppose the number of measurements I have is sufficient to estimate with high confidence whether or not a distance is less than 1, but totally helpless with distances on order 5.  So if my measurements give me an estimate d(P_1, P_2) = 5, I have no real idea whether that distance is actually 5, or maybe 4, or maybe 100.  I can say, though, that it’s probably not 1.

So am I stuck?  I am not stuck!  Because the distances are not independent of each other; they are yoked together under the unforgiving harness of the ultrametric inequality.  Let’s say, for instance, that I find 10 other points Q_1, …, Q_10 which I can confidently say are within 1 of P_1, and 10 other points R_1, …, R_10 which are within 1 of P_2.  Then, as long as d(P_1, P_2) is itself greater than 1, the ultrametric inequality tells us that

d(Q_i, R_j) = d(P_1, P_2)

for any one of the 100 ordered pairs (i,j)!  So I have 100 times as many measurements as I thought I did — and this might be enough to confidently estimate d(P_1,P_2).
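
Here is a toy simulation of that pooling effect, a sketch under loudly stated assumptions:  each pairwise measurement is the true distance plus independent Gaussian noise, and the values TRUE_D and NOISE_SD are invented for illustration.  Real measurements for pairs sharing a species would be correlated, so the actual gain would be smaller than what this shows.

```python
import random
import statistics

TRUE_D = 5.0     # hypothetical true value of d(P_1, P_2)
NOISE_SD = 2.0   # hypothetical noise level of a single pairwise measurement

def measure(rng):
    """One noisy measurement of a cross distance d(Q_i, R_j), which equals d(P_1, P_2)."""
    return TRUE_D + rng.gauss(0.0, NOISE_SD)

def pooled(rng, n_q=10, n_r=10):
    """Average the noisy measurements over all n_q * n_r pairs (Q_i, R_j)."""
    return statistics.mean(measure(rng) for _ in range(n_q * n_r))

def spread(estimator, rng, trials=200):
    """Empirical standard deviation of an estimator over repeated trials."""
    return statistics.stdev(estimator(rng) for _ in range(trials))

rng = random.Random(0)
single_spread = spread(measure, rng)   # error bar from one pair
pooled_spread = spread(pooled, rng)    # error bar from averaging 100 pairs
print(single_spread, pooled_spread)
```

With independent noise, averaging 100 pairs should shrink the error bar by about a factor of 10, which is the sense in which the extra pairs “might be enough.”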

In biological terms:  if I look at a bunch of genetic markers in a shrimp and a dog, it may be hard to estimate how far back in time one has to go to find their common ancestor.  But the common ancestor of a shrimp and a dog is presumably also the common ancestor of a lobster and a wolf, or a clam and a jackal!  So even if we’re only measuring a few markers per species, we can still end up with a reasonable estimate for the age of the proto-shrimp-dog.

What do you need if you want this to work?  You need a reasonably large population of points which are close together.  In other words, you want small neighborhoods to have a lot of points in them.  And what Roch finds is that there’s a threshold effect; if the mutation rate is too fast relative to the amount of measurement per species you do, then you don’t hit “critical mass” and you can’t bootstrap your way up to a full high-confidence reconstruction of the metric.

This leads one to a host of interesting questions (interesting to me, that is, albeit not necessarily interesting for biology).  What if you want to estimate a metric from pairwise distances but you don’t know it’s an ultrametric? Maybe instead you have some kind of hyperbolicity constraint; or maybe you have a prior on possible metrics which weights “closer to ultrametric” distances more highly.  For that matter, is there a principled way to test the hypothesis that a measured distance is in fact an ultrametric in the first place?  All of this is somehow related to this previous post about metric embeddings and the work of Eriksson, Dasarathy, Singh, and Nowak.
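
For what it’s worth, here is the most naive diagnostic I can think of for that last question.  It is emphatically not a principled statistical test, since it ignores measurement noise entirely; it just reports the worst violation of the ultrametric inequality over all triples of points.  The function name is my own.

```python
from itertools import permutations

def max_ultrametric_violation(d):
    """Largest value of d[x][y] - max(d[x][z], d[y][z]) over all triples of distinct points.

    d is a symmetric matrix of pairwise distances (a list of lists).
    A result of 0.0 means the ultrametric inequality holds for every triple.
    """
    worst = 0.0
    for x, y, z in permutations(range(len(d)), 3):
        worst = max(worst, d[x][y] - max(d[x][z], d[y][z]))
    return worst

# Three leaves of a tree: two close relatives and one distant cousin.
tree_like = [[0, 1, 5],
             [1, 0, 5],
             [5, 5, 0]]

# Three equally spaced points on a line: a metric, but not an ultrametric.
line_like = [[0, 1, 2],
             [1, 0, 1],
             [2, 1, 0]]

print(max_ultrametric_violation(tree_like), max_ultrametric_violation(line_like))
```

With noisy data one would presumably have to compare the size of the violation to the error bars on the distances involved, and that is where the real question starts.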





17th-century patent trolling

In 1685, Sir William Phips found caches of sunken treasure in Caribbean shipwrecks, setting off a burst of feverish speculation.  Between 1691 and 1693, 11 out of the 61 patents issued in England were for diving bells.  Many of these patenters apparently had no real invention to produce at all, but were simply seeking royal license to establish a diving-engine company to draw capital from credulous investors.  One of those investors was Daniel Defoe, who lost 200 pounds to a treasure-diving enterprise that evaporated as soon as it had cashed in on its patents.  From Defoe’s An Essay on Projects:

There are, and that too many, fair pretences of fine discoveries,
new inventions, engines, and I know not what, which–being advanced
in notion, and talked up to great things to be performed when such
and such sums of money shall be advanced, and such and such engines
are made–have raised the fancies of credulous people to such a
height that, merely on the shadow of expectation, they have formed
companies, chose committees, appointed officers, shares, and books,
raised great stocks, and cried up an empty notion to that degree
that people have been betrayed to part with their money for shares
in a new nothing; and when the inventors have carried on the jest
till they have sold all their own interest, they leave the cloud to
vanish of itself, and the poor purchasers to quarrel with one
another, and go to law about settlements, transferrings, and some
bone or other thrown among them by the subtlety of the author to lay
the blame of the miscarriage upon themselves. Thus the shares at
first begin to fall by degrees, and happy is he that sells in time;
till, like brass money, it will go at last for nothing at all. So
have I seen shares in joint-stocks, patents, engines, and
undertakings, blown up by the air of great words, and the name of
some man of credit concerned, to 100 pounds for a five-hundredth
part or share (some more), and at last dwindle away till it has been
stock-jobbed down to 10 pounds, 12 pounds, 9 pounds, 8 pounds a
share, and at last no buyer (that is, in short, the fine new word
for nothing-worth), and many families ruined by the purchase. If I
should name linen manufactures, saltpetre-works, copper mines,
diving engines, dipping, and the like, for instances of this, I
should, I believe, do no wrong to truth, or to some persons too
visibly guilty.

I might go on upon this subject to expose the frauds and tricks of
stock-jobbers, engineers, patentees, committees, with those Exchange
mountebanks we very properly call brokers, but I have not gall
enough for such a work; but as a general rule of caution to those
who would not be tricked out of their estates by such pretenders to
new inventions, let them observe that all such people who may be
suspected of design have assuredly this in their proposal: your
money to the author must go before the experiment. And here I could
give a very diverting history of a patent-monger whose cully was
nobody but myself, but I refer it to another occasion.

This story (as well as a general warning against taking patent statistics as a good measure of societal inventiveness) from Christine MacLeod’s “The 1690s Patents Boom:  Invention or Stock-Jobbing?”  (The Economic History Review, New Series, vol. 39, No. 4 (1986)), available on JSTOR if you have it.


Should we have a Reckoner General?

The United States has a Surgeon General, who serves as the public face of the government in matters concerning public health.  Should there be a single person who serves as a national spokesperson for quantitative issues?

What is the mathematical analogue of the warning on the cigarette package?


