I really like persistent homology; it’s a very beautiful idea, a way to look for structure in data when you really don’t have any principled way to embed it in Euclidean space (or, even when it does come embedded in Euclidean space, to find the kind of structure that doesn’t depend too much on the embedding.)

But because I like it, I want to see it done well, so I have some minor complaints!

Complaint one: Persistent homology, applied to H_0 only, is clustering, and we know a lot about clustering already. (**Update:** As commenters point out, this is really only so for persistent homology computed on the Vietoris-Rips complex of a point cloud, the “classical case…”!) Not to say that the ideas of persistence can’t be useful here at all (I have some ideas about directed graphs I want to eventually work out) but my sense is that people are not craving new clustering algorithms. I really like the work that tries to grapple with the topology of the data in its fullness; I was really charmed, for instance, by Ezra Miller’s piece about the persistent homology of fruit fly wings. (There’s a lot of nice stuff about geometric probability theory, too — e.g., how do you take the “average” of a bunch of graded modules for k[x,y], which you may think of as noisy measurements of some true module you want to estimate?)

My second complaint is the lack of understanding of the null hypothesis. You have some point cloud, you make a barcode, you see some bars that look long, you say they’re features — but why are you so sure? How long would bars be under the null hypothesis that the data has no topological structure at all? You kind of have to know this in order to do good inference. Laura Balzano and I did a little numerical investigation of this years ago but now Omer Bobrowski, Matthew Kahle, and Primoz Skraba have proved a theorem! (Kahle’s cool work in probabilistic topology has appeared several times before on Quomodocumque…)

They show that if you sample points from a uniform Poisson process on the unit cube of intensity n (i.e. you expect n points) the longest bar in the H_k barcode has

(death radius / birth radius) ~ [(log n)/(log log n)]^(1/k).

That is really short! And it makes me feel like there actually is something going on, when you see a long barcode in practice.

### Like this:

Like Loading...