Andrew Bridy, Lalit Jain, Ben Recht and I spent the weekend in Cambridge at Music Hack Day, organized by the Echo Nest and sponsored by just about every company you can think of that cares about both music and technology. We hacked in a somewhat different spirit than most of the folks there; for us, the Million Song Dataset isn’t a tool for app-building, but a playground where we can test ideas about massive networks and information retrieval.
(Re app-building: Bohemian Rhapsicord. Chrome-only.)
We’ve actually been playing with the MSD for a few weeks, and I’ll probably post some of those results later, but let’s start with what we did this weekend. We wanted to see what aspects of the rules of melody we could find in the dataset. Which notes like to follow which other notes? Which chords like to follow which other chords? If you took piano lessons as a kid you already know the answers to these questions. Which is kind of the point! When you start to dig into a giant dataset, the first thing you’d better do is check that it can tell you the things you already know.
We quickly found out that getting a handle on the melodies wasn’t so easy. The song files in the MSD aren’t transcribed from scores, and they don’t have notes: there’s pitch data, but it’s in the form of chromata; these keep good track of how the energy of a song segment is distributed across frequency bands, but they don’t necessarily correspond well to notes. (For instance, what does the chroma of a drum hit look like?) We found that only about 2% of the songs in the sample had chromata that were “clean” enough to let us infer notes.
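This isn't our actual hack code, but here's a minimal sketch of the kind of test one might use to call a chroma "clean": a 12-bin chroma vector counts as a note only if one pitch class clearly dominates the rest. The dominance threshold and the assumption that bins are normalized so the largest equals 1.0 (as in Echo Nest output) are both mine, not from the hack.

```python
import numpy as np

PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def infer_note(chroma, dominance=0.5):
    """Return the pitch-class name for a 12-bin chroma vector, or None
    if no single bin dominates (a drum hit smears energy across bins).

    `dominance` is a made-up threshold: the runner-up bin must be below
    dominance * (top bin) for us to call this segment a note.
    """
    chroma = np.asarray(chroma, dtype=float)
    order = np.argsort(chroma)          # indices from smallest to largest bin
    top, runner_up = chroma[order[-1]], chroma[order[-2]]
    if runner_up > dominance * top:
        return None                     # too muddy to call a note
    return PITCH_NAMES[order[-1]]
```

A pure C chord tone like `[1, 0, 0, …, 0]` passes; a flat vector of all ones (percussive noise) fails.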
But here’s the good thing about a million — 2% of a million is still a lot! Actually, to save time, we only analyzed about 100,000 songs — but that still gave us a couple of thousand songs’ worth of chroma to work with. We threw out all the songs Echo Nest thought were in minor keys, and transposed everything to C. Then we put all the bigrams, or pairs of successive notes, in a big bag, and computed the frequency of each one in the sample. And this is what we saw:
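The bigram computation is just bookkeeping. Here's a sketch of how it might go, assuming (my conventions, not necessarily the team's) that each song comes with an Echo Nest key as a pitch class 0–11 with 0 = C, and a note sequence as pitch classes 0–11:

```python
from collections import Counter

def bigram_probs(songs):
    """songs: iterable of (key, notes) pairs, where key is 0-11 (0 = C)
    and notes is a list of pitch classes 0-11.

    Returns a dict mapping (x, y) to the estimated probability that
    note y immediately follows note x, pooled over the whole sample.
    """
    counts = Counter()      # how often each ordered pair (x, y) occurs
    row_totals = Counter()  # how often each note x appears with a successor
    for key, notes in songs:
        transposed = [(n - key) % 12 for n in notes]  # move song to C
        for x, y in zip(transposed, transposed[1:]):
            counts[(x, y)] += 1
            row_totals[x] += 1
    return {(x, y): c / row_totals[x] for (x, y), c in counts.items()}
```

Transposing everything to C first is what lets bigrams from songs in different keys land in the same bag.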
Pretty nice, right? The size of the circle represents the frequency of the note. C (the tonic) and G (the dominant) are the most common notes, just as they should be. And the notes that are actually in the C-major scale are noticeably more frequent than those that aren’t. The arrow from note x to note y represents the probability that the note following an x will be y; the thicker and redder the arrow, the greater the transition probability. These, too, look just as they should. The biggest red arrow is the one from B to C, which is because a major seventh (correction from commenter: a leading tone) really wants to resolve to the tonic. And the strong “Louie Louie” clique joining C, F, and G is plain to see.
Once you have these numbers, you can start to play around. Lalit wrote a program that generated notes by random-walking along the graph above: the resulting “song” sounds kind of OK! You can hear it at the end of our 2-minute presentation:
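The random walk itself is a few lines: start on some note, then repeatedly sample the next note in proportion to the transition probabilities. A sketch (again my own reconstruction, not Lalit's program; the `probs` dict maps ordered pairs of pitch classes to transition probabilities):

```python
import random

def random_melody(probs, start=0, length=16, seed=None):
    """Random-walk a melody along the bigram graph.

    probs maps (x, y) to P(next note is y | current note is x);
    start and the returned melody are pitch classes 0-11 (0 = C).
    """
    rng = random.Random(seed)
    melody = [start]
    for _ in range(length - 1):
        current = melody[-1]
        successors = [(y, p) for (x, y), p in probs.items() if x == current]
        if not successors:
            break  # dead end: this note was never followed by anything
        notes, weights = zip(*successors)
        melody.append(rng.choices(notes, weights=weights)[0])
    return melody
```

Feed the resulting pitch classes to any synth and you get a "song" with the right local statistics, which is exactly why it sounds kind of OK: each step is plausible even though there's no global structure at all.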
Once you have this computation, you can do all kinds of fun things. For example, which songs in the database have the most “unusual” melodies from the point of view of this transition matrix? It turns out that many of the top scorers are indeed songs whose key Echo Nest has misclassified, or which use scales (like the blues scale) that Echo Nest doesn’t recognize. There’s also a lot of stuff like this:
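One natural way to score "unusualness" (I'm guessing at the exact scoring here; the floor value for unseen transitions is my assumption) is the average negative log-probability of a song's bigrams under the corpus transition model: a melody full of transitions the corpus never makes scores high.

```python
import math

def surprise(probs, key, notes, floor=1e-6):
    """Average negative log-probability of a song's note bigrams under
    the corpus transition model `probs`; higher means more 'unusual'.

    key is the song's Echo Nest key (0-11, 0 = C); notes are pitch
    classes 0-11. Transitions never seen in the corpus get a small
    floor probability instead of zero, so the log stays finite.
    """
    transposed = [(n - key) % 12 for n in notes]  # transpose to C first
    bigrams = list(zip(transposed, transposed[1:]))
    if not bigrams:
        return 0.0
    return -sum(math.log(probs.get(b, floor)) for b in bigrams) / len(bigrams)
```

A misclassified key inflates this score for an ordinary song, since transposing by the wrong amount turns common C-major transitions into rare ones, which is consistent with the misclassified songs bubbling to the top.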
Not exactly “Louie Louie.” Low scorers often sound like this Spiritualized song, with big dynamic shifts but not much tonal straying from the old I-IV-V (and in this case, I think it’s mostly the big red I-V).
A relevant paper: “Clustering beat-chroma patterns in a large music database,” by Thierry Bertin-Mahieux, Ron Weiss, and Daniel Ellis.
Here I am talking linear algebra with Vladimir Viro, who built the amazing Music N-gram Viewer.
DSC_0179 by thomasbonte, on Flickr
Note our team slogan, a bit hard to read on a slant: “DO THE STUPIDEST THING FIRST.”