I really like persistent homology; it’s a very beautiful idea, a way to look for structure in data when you really don’t have any principled way to embed it in Euclidean space (or, even when it does come embedded in Euclidean space, to find the kind of structure that doesn’t depend too much on the embedding.)
But because I like it, I want to see it done well, so I have some minor complaints!
Complaint one: Persistent homology, applied to H_0 only, is clustering, and we know a lot about clustering already. (Update: As commenters point out, this is really only so for persistent homology computed on the Vietoris-Rips complex of a point cloud, the “classical case…”!) Not to say that the ideas of persistence can’t be useful here at all (I have some ideas about directed graphs I want to eventually work out) but my sense is that people are not craving new clustering algorithms. I really like the work that tries to grapple with the topology of the data in its fullness; I was really charmed, for instance, by Ezra Miller’s piece about the persistent homology of fruit fly wings. (There’s a lot of nice stuff about geometric probability theory, too — e.g., how do you take the “average” of a bunch of graded modules for k[x,y], which you may think of as noisy measurements of some true module you want to estimate?)
My second complaint is the lack of understanding of the null hypothesis. You have some point cloud, you make a barcode, you see some bars that look long, you say they’re features — but why are you so sure? How long would bars be under the null hypothesis that the data has no topological structure at all? You kind of have to know this in order to do good inference. Laura Balzano and I did a little numerical investigation of this years ago but now Omer Bobrowski, Matthew Kahle, and Primoz Skraba have proved a theorem! (Kahle’s cool work in probabilistic topology has appeared several times before on Quomodocumque…)
They show that if you sample points from a uniform Poisson process on the unit cube of intensity n (i.e. you expect n points) the longest bar in the H_k barcode has
(death radius / birth radius) ~ [(log n)/(log log n)]^(1/k).
That is really short! And it makes me feel like there actually is something going on, when you see a long barcode in practice.
One thing to keep in mind in this discussion (relevant both for the claim about H_0 barcodes being clustering and that the longest barcode is the key to inference) is that there are many ways of constructing a filtered simplicial complex, and hence barcode, from a data set. A common way of doing this is via the Vietoris-Rips complex, where one takes a point cloud and slowly expands the points into solid balls and builds a family of simplicial complexes encoding how and when the balls overlap. If you look at many papers on Topological Data Analysis (TDA) you will see this method, and it’s easy to get the false impression that this is the ONLY method, but really there are other interesting things people have done, even while sticking to one-dimensional persistence, and I suspect many more remain to be developed.
One useful example to consider is when data is not a discrete point cloud but instead some kind of solid blobby shape (pardon the imprecision). One can sweep across the data blob and build a simplicial complex approximating the chuck of the data uncovered as one sweeps. This is a terribly vague description, but the point is that as the sweep progresses, more data is uncovered and hence we have a nested family of simplicial complexes—and that’s really all one needs to do persistent homology/barcodes. Carlsson et al. applied this to handwriting recognition, and Ezra Miller et al. in (http://arxiv.org/abs/1411.6652) applies this to brain arteries which form a sort of 3-dimensional tree shape. As the latter paper nicely explains, here the H_0 barcode actually measures the windiness (or “tortuosity”) of the arteries, so really a novel geometric feature that is difficult to encompass using statistical notions, and something quite distinct from clustering. They then correlate the H_0 barcodes with the age of the person who’s brain was scanned and find a strong correlation, better in fact than previous statistical measures. One remarkable finding though is that this correlation was mostly strongly manifest not by the longest bar in the barcode, but by something like the 18th longest bar! Seems perverse at first, but as the authors explain, what happens is the longest bars somehow represent structure common to nearly all human brains (evidently), and the shortest bars are indeed noise of some kind, whereas the middle-length bars are the statistical goldilocks—they have the most potential to differentiate brain scans based on the relevant variables, such as age and gender in this case.
So the summary/moral is that TDA is much more than Vietoris-Rips applied to point clouds—though how much more remains to be seen, much of the story is still very much being explored—and also that one must be a little cautious in developing a theory (or even bold claims) about the best way of applying statistical methods to barcodes, since “best” will be largely determined by how the barcode was produced. My suspicion is that the most applicability will be found when very particular structure in the data is sought and captured in the barcode, as opposed to purely general methods that apply to all data sets equally well. Perhaps there has been an influx of pure mathematicians to TDA quixotically hoping for an elegant tool that is universally applicable, as opposed to the ad hoc methods commonly seen in applied math—but I fear such naivety might be limiting the potential power of TDA (and souring some people on the field as over-hyped). I, however, find a rather beautiful symmetry in the observation that the statistics we need to develop for barcodes depends largely on the method used to produce the barcodes, while simultaneously the methods we develop to produce barcodes is (or should be) shaped largely by what leads to the most potent statistics. It’s either a frustrating catch 22 or a delightfully inspiring feedback loop, depending your outlook, I suppose :)
Isn’t this the same sort of problem that Steve Smale and collaborators have been investigating?
Click to access topology_and_data_2.pdf
Could you elaborate on similarities and differences? (Sorry, I’m not a topologist, or even a mathematician.)
I’m one of the authors on the brain arteries paper that Noah very nicely and clearly summarizes up there. I guess I might muddy the waters a bit with three comments:
1) we actually used persistent H_1 as well (Loosely: magine the artery trees thickening up within the skull and keep track of persistent H_1 during this process. Precisely: subsample the trees and get a set of points, and then run 1-dim Rips on this, and appeal to the gods of stability theorems to say you’re doing the right thing). We found even stronger age correlations there (as well as a significant gender different). As was the case with H_0, it was the middle-length-range bars that did the trick. I would like to loudly and clearly proclaim that we have no particular notions why this is so!
2) In general, I think a very productive analytical pipeline is: take populations of filtered objects with some sort of ground-truth labels, compute persistence diagrams, extract features (how? Lots of ways, all very different, and we need much more theory!), and then “do machine-learning.” The brain artery paper is one (small) example of this pipeline, I think. A bunch of other folks are exploring signal analysis applications where the filtered objects are signal snippets (more precisely, functions f: I \to R and you then filter I by sublevel sets of f). The features you get from zero-dim persistence on this seem to given really interesting conclusions, and the features are clearly very different than things you’d get from, say, fourier or wavelet analysis. I don’t way to say “better,” but I think “different” is clear, and I think using them in combination will bear a lot of fruit.
3) All that said, the worry about the Null Hypothesis is extremely valid! The paper by Bobrowksi/Kahle/Skraba, which is really great work, attacks the length of the longest bar for this very specific type of filtered object. On the other hand, many of the exciting TDA/ML combo-papers are finding stuff using shorter bar lengths and very different filtering paradigms, so the theory/practice gap is vast. Of course, that’s true of almost all engineering that gets done “in practice,” and so I’d say Noah’s second paragraph is very much on point.
These comments are great — in a hurry now but let me remark that the discussion of sublevel sets in Paul’s comment 2) above is a good example of a persistence argument on H_0 which is not “just clustering.” (Because, as Noah points out, there are ways to get persistence diagrams which don’t pass through Vietoris-Rips.)