## Tom Leinster on entropy, diversity, and cardinality

You might want to consider reading the n-category cafe even if you don’t know what an n-category is — even if, antique as this view may be, you don’t care what an n-category is!

For instance, it’s the best place to read about the curious case of M. El Naschie, who’s published 322 of his own papers in the journal he edits for Elsevier.

More substantively: Tom Leinster has a beautiful pair of posts (part I, part II) about varying notions of “diversity” in population biology, and a way to capture all these notions as special cases of a general mathematical construction.

Drastic oversimplification: you might start by defining the diversity of an island beetle population to be the number of different species of beetles living there. But that misses something — a population with three equinumerous beetle species is more diverse than one where a single dominant species accounts for 98% of the beetles, with the remainder split evenly between the other two species. Part I of Leinster’s post is devoted to various measures that capture this behavior. In particular, he’ll explain why on the former island the effective number of species is 3 (just as you’d expect) while on the latter the “number” of species is not 3, but about 1.12 — in other words, the second island is very close to having just one kind of beetle.
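The post doesn't say which measure gives the 1.12 figure. The exponential of Shannon entropy (the Hill number of order 1) does reproduce it, so we assume that's the one meant. Here is a minimal Python sketch of the Hill numbers of order q, a standard family of "effective number of species" measures, applied to the two islands. The helper name `hill_number` is our own. Order 0 just counts species, so both islands score 3. Orders 1 and 2 discount the two rare species.

```python
import math

def hill_number(p, q):
    """Effective number of species of order q (hypothetical helper name).

    p: relative abundances summing to 1. q = 0 counts species present,
    q = 1 is exp(Shannon entropy), q = 2 is the inverse Simpson index.
    """
    p = [x for x in p if x > 0]  # absent species contribute nothing
    if q == 1:
        # The q -> 1 limit of the general formula below.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1 / (1 - q))

even = [1 / 3, 1 / 3, 1 / 3]
skewed = [0.98, 0.01, 0.01]  # one dominant species, 98% of the beetles
for q in (0, 1, 2):
    print(q, round(hill_number(even, q), 2), round(hill_number(skewed, q), 2))
```

On the even island every order gives 3. On the skewed island order 1 gives about 1.12 and order 2 about 1.04.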

In part II, Leinster discusses what happens when you take into account that some pairs of species are more similar than others.

An island with three species of beetles and no other animal life ought to be rated less diverse than an island with one kind of mammal, one kind of beetle, and one kind of bird. The set of species can be thought of as a finite metric space, in which the distance between any two species is a measure of the biological “difference” between them. Leinster has developed an interesting notion of the “cardinality” of a metric space. Here’s how it works when the metric space has just two points, separated by distance d:

If $A$ has two points, distance $d$ apart, then

$$|A| = 1 + \tanh(d/2).$$

So when $d = 0$ we have $|A| = 1$; think of this as saying ‘there is effectively only one point’. As the points move further apart, $|A|$ increases, until when $d = \infty$ we have $|A| = 2$; think of this as saying ‘there are two completely distinguished points’. More generally, if $A$ is an $n$-point space with $d_{ij} = \infty$ whenever $i \neq j$, then $|A| = n$. You can think of cardinality as the ‘effective number of points’.
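Leinster's cardinality (also called magnitude) of a finite metric space can be sketched as follows, assuming the similarity convention Z_ij = e^(-d_ij). First find a weighting w with the sum over j of Z_ij w_j equal to 1 for every i. Then |A| is the sum of the w_i. The sketch below uses plain Gaussian elimination to stay dependency-free. The names `solve` and `magnitude` are our own. For two points at distance d it returns 2/(1 + e^(-d)), which is close to 1 for tiny d and approaches 2 as d grows.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("similarity matrix is singular: no unique weighting")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def magnitude(dist):
    """Cardinality |A| of a finite metric space from its distance matrix."""
    z = [[math.exp(-d) for d in row] for row in dist]  # similarity Z_ij = e^(-d_ij)
    w = solve(z, [1.0] * len(dist))  # weighting: Z w = (1, ..., 1)
    return sum(w)

inf = math.inf
for d in (0.01, 1.0, 10.0):
    print(d, magnitude([[0, d], [d, 0]]), 2 / (1 + math.exp(-d)))
print(magnitude([[0, inf, inf], [inf, 0, inf], [inf, inf, 0]]))  # 3.0
```

Distinct points at distance exactly 0 would make Z singular, so the d = 0 case is only reached as a limit here.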

The real fun comes when we try to write down a measure of diversity that keeps track of both of the above features: some pairs of species are more similar than others, and some species occur more frequently than others. In other words, if P is a probability distribution on a finite metric space, is there a good notion of “cardinality” |P|? Leinster explains how to construct such a notion, and shows that it (almost) agrees with a definition already given by the population biologists Ricotta and Szeidl in 2006.
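To make the idea concrete, here is one candidate measure. It is offered as an assumption, not necessarily the exact construction in Leinster's post. Take similarities Z_ij = e^(-d_ij) and abundances p, and set D_q = (sum over i of p_i (Zp)_i^(q-1))^(1/(1-q)), with the q → 1 limit exp(-sum of p_i log (Zp)_i). When all distances are infinite (Z is the identity) this reduces to the Hill numbers. When all species are at distance 0 it gives 1. The function name and the example distances (0.1 between beetle species, 10 between beetle, bird, and mammal) are hypothetical.

```python
import math

def diversity(p, dist, q):
    """Similarity-sensitive effective number of species of order q.

    A hypothetical sketch, not necessarily the exact construction in the post.
    p: relative abundances summing to 1; dist: species distance matrix.
    """
    n = len(p)
    z = [[math.exp(-dist[i][j]) for j in range(n)] for i in range(n)]
    # (Zp)_i: expected similarity between species i and a random individual
    zp = [sum(z[i][j] * p[j] for j in range(n)) for i in range(n)]
    present = [i for i in range(n) if p[i] > 0]
    if q == 1:  # limit q -> 1
        return math.exp(-sum(p[i] * math.log(zp[i]) for i in present))
    return sum(p[i] * zp[i] ** (q - 1) for i in present) ** (1 / (1 - q))

even = [1 / 3] * 3
beetles = [[0 if i == j else 0.1 for j in range(3)] for i in range(3)]
mixed = [[0 if i == j else 10.0 for j in range(3)] for i in range(3)]
print(diversity(even, beetles, 1))  # about 1.07: three very similar species
print(diversity(even, mixed, 1))    # about 3.0: beetle, bird, mammal
```

The three-beetle island scores barely above one species, while the beetle/bird/mammal island scores nearly three, which is the ordering the text asks for.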

The notion of cardinality of a metric space, by the way, is an outgrowth of Leinster’s earlier work defining a cardinality (or, better, an Euler characteristic) for certain categories. This is something you’ve grappled with before, even if you’re not a category theorist. For example: suppose you want to count the isomorphism classes of quadratic forms in a given genus. You quickly realize that’s not actually what you want to count — rather, you want to count the number of isomorphism classes of quadratic forms Q, where each form is counted with weight 1/|Aut(Q)|. And when you do that, you get a very clean answer from the Siegel mass formula.

What’s going on? The point is that the quadratic forms in the genus shouldn’t be thought of as a set, but as a category — in fact, a groupoid — whose objects are quadratic forms and whose morphisms are isomorphisms. (If you know the definition of a stack you know very well the terrible danger of thinking of a category as a mere set.) And the cardinality of this groupoid, in Leinster’s sense, is precisely the sum over isomorphism classes of 1/|Aut(Q)| — what number theorists would call the “total mass.”
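Leinster's Euler characteristic of a finite category can be sketched in its simplest case like this. Let zeta be the matrix of hom-set sizes, zeta[a][b] = |Hom(a, b)|. When zeta is invertible, the Euler characteristic is the sum of the entries of its inverse. For a skeletal groupoid (one object per isomorphism class) zeta is diagonal with entries |Aut(Q)|, so the answer is exactly the total mass, the sum of 1/|Aut(Q)|. The automorphism orders below are made up for illustration, not taken from any actual genus of quadratic forms, and the function name is our own.

```python
from fractions import Fraction

def euler_characteristic(zeta):
    """Sum of the entries of zeta^{-1}, computed exactly.

    zeta[a][b] = number of morphisms a -> b in a finite category. This sketch
    assumes zeta is invertible (true for skeletal finite groupoids and finite
    posets); then the weighting is unique.
    """
    n = len(zeta)
    # Augment zeta with the identity matrix and run Gauss-Jordan elimination.
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(zeta)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("zeta matrix is not invertible")
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [x / piv for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return sum(sum(row[n:]) for row in m)  # right half is now zeta^{-1}

# Skeletal groupoid: three isomorphism classes with |Aut| = 2, 4, 6 (made-up orders).
print(euler_characteristic([[2, 0, 0], [0, 4, 0], [0, 0, 6]]))  # 11/12 = 1/2 + 1/4 + 1/6
# The poset a < b: one non-identity arrow; Euler characteristic 1, like a point.
print(euler_characteristic([[1, 1], [0, 1]]))  # 1
```

Exact `Fraction` arithmetic keeps masses like 11/12 from turning into floating-point approximations.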

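From this point of view, you instantly derive the amusing fact that the “number” of finite sets is e: the isomorphism classes of finite sets are indexed by n = 0, 1, 2, ..., and the n-element set has automorphism group S_n of order n!, so the total mass is the sum of 1/n! over all n, which is e. It follows that a finite set chosen from the “uniform distribution” on the groupoid of finite sets has cardinality n with probability e^(-1)/n!; in other words, the cardinality of a random finite set obeys the Poisson distribution with mean 1.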
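A quick numerical check of both claims, as a sketch: weight each size n by 1/n! (one over the order of S_n), sum the weights to get e, then normalize them to get the “uniform distribution”, whose mean and variance should both be 1, as for Poisson(1). The cutoff N = 30 is an arbitrary truncation of the infinite sum.

```python
from fractions import Fraction
import math

N = 30  # truncation point; the neglected tail is smaller than 1/31!

# Each isomorphism class of finite sets is represented by {1, ..., n},
# whose automorphism group is the symmetric group S_n of order n!.
weights = [Fraction(1, math.factorial(n)) for n in range(N + 1)]
total_mass = sum(weights)  # partial sum of the series for e

# Normalize the weights to get the "uniform distribution" on the groupoid.
probs = [float(w / total_mass) for w in weights]
mean = sum(n * p for n, p in enumerate(probs))
variance = sum(n * n * p for n, p in enumerate(probs)) - mean ** 2

print(float(total_mass), math.e)   # both about 2.718281828
print(probs[:4])                   # about e^(-1)/n! for n = 0, 1, 2, 3
print(mean, variance)              # both about 1, as for Poisson(1)
```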

## 3 thoughts on “Tom Leinster on entropy, diversity, and cardinality”

1. Scott Carnahan says:

Another example of formulas working better when weighted by automorphisms is the number of supersingular elliptic curves over an algebraic closure of F_p. Silverman’s book has a messy unweighted formula, but by counting automorphisms, you get (p-1)/24. Bonus: if you know how the automorphism group depends on the j invariant, this formula uniquely determines the unweighted numbers, together with some j data.

2. JSE says:

This is the example I was originally going to use! It was in my mind since Michael Volpato just gave a beautiful seminar talk on this formula and its higher-dimensional variants here at Wisconsin. But I thought the quadratic forms example would be more familiar to more people.

3. […] if it carries no extra structure, should have cardinality obeying a Poisson distribution — the “uniform distribution” on the groupoid of sets.  (Though actually that uniform distribution is Poisson(1); I wonder what tweak is necessary to be […]