I’m talking tomorrow at the spanking-new SILO seminar (“Systems, Learning, Information, and Optimization”) in the spanking-new Wisconsin Institute for Discovery. As you may know, I don’t do any research in any of the four subjects appearing in the acronym! But I’ve been following some work in the area ever since last year’s very successful Topology and Data seminar. Here’s one semi-baked idea that will appear in the talk.

**World’s most hurried introduction to compressed sensing **(see Terry’s blog for a more thorough account.)

Think of an image on an N x N grid of pixels as a vector in R^(N^2). We know that images can be *compressed* — that is, we can specify an image with reasonable accuracy using much less information. That’s because most real-world images lie on or near a manifold in R^(N^2) of dimension much lower than N^2.

It’s already great that we can get away with fewer than N^2 coordinates to specify our image. But it gets better still. Maybe it’s expensive to record all N^2 pixels. If we’re want to end up with (say) only N log N numbers, maybe we should try making only N log N measurements of our image in the first place? If you just record a subset of N log N pixels, you’re probably in trouble; there may be localized features in the image that happen to be far from any of the pixels in your sparse set. But if you measure N log N *random* linear functions on R^(N^2), it turns out that you’re very likely to have pinned down a single point on the manifold of interesting images!

**Finite metric spaces**

Suppose you’ve got a set S of objects, and a function d: S x S -> R, which you think of as a *distance* or *dissimilarity* function. Could be texts, could be genomes, whatever. An arbitrary finite metric space is a hard thing to get one’s head around, so it’s desirable to construct an embedding f: S -> R^k such that the Euclidean distance between f(x) and f(y) agrees as closely as possible with d(x,y). (Think of k as fixed and small.) The standard algorithm for this is called multidimensional scaling. It works well. End of story.

Except — note that once again, we’ve compressed our data set. It takes N^2 coordinates to specify a distance function, but only kN to specify an embedding. So you can think of MDS as a compression algorithm for metric spaces! And why does it work? Because the set of N-point metrics that embed in R^k is a very low-dimensional submanifold of the N^2-dimensional space of general metrics.

So the example of compressed sensing leads us to ask — if you are willing to grant that your metric is approximately embeddable in R^k, can you reconstruct it from a sparse set of distances? The answer is yes — you can do this with “Landmark MDS,” developed by Vin de Silva and Josh Tenenbaum, and, just as one might want, one can reconstruct the metric from a number of distances that’s linear in N. I want to call this “Compressed MDS” — is that appropriate? Will a randomly chosen set of cN distances specify the embedding with high probability? Seems like it to me.

Multidimensional scaling is just one example of a *metric embedding problem* — you have a finite set S and a fixed metric space M (or, sometimes, a class of metrics from which M is to be drawn) and you want to find an embedding f: S -> M which respects distance as well as possible. (This survey by Indyk and Matousek is a good resource for the general problem; and here’s a Newton Institute conference on metric embeddings from two weeks ago, with all the talks online for streaming.) Of special interest is the case where M is required to be an ultrametric; to embed S in an ultrametric space is to identify S with the leaves of a weighted tree, which is more or less to impose a heirarchical clustering on S. The space of ultrametrics on S is again of dimension much smaller than N^2, so, the heirarchical construction can be thought of as a compressed representation of the distance function d. And again, one asks — can we reconstruct the ultrametric from a sparse set of distances?

Yes — a forthcoming paper of Eriksson, Dasarathy, Singh, and Nowak explains how. N log N distances is enough. (Update, Feb 2011: now on the arXiv.)

Can one call this “compressed clustering”?