On data and music


If I were a better musician, I’d be a musician.

Ever since I was young, I’ve loved music. And it’s changed so much since that time, in large part because of technology. Music was once entirely original. There was no recording, no broadcast, and a troubadour could earn a living traveling from place to place.

With the advent of sheet music, that changed. Smart composers introduced unique signatures that weren’t on the written page (Mozart’s flourishes, for example) to keep their live acts original. This was the first salvo in music’s battle of supply and demand, pitting artists’ livelihoods against frictionless distribution.

Once music was written, it could be automated (through steam calliopes) and eventually broadcast (through radio) and recorded (on vinyl). Each of these innovations turned the idea of a song into data of some kind, separating the act of listening from the act of performing.

With each innovation came more data. Sheet music had annotation; stations had playlists; albums had performers’ names. And listening activity had to be measured, because music was commercial, supported by advertising. Early on, this measurement came in the form of listening diaries from companies like Arbitron, which gave advertisers and station managers an idea of their demographic reach.

But every facet of music has been changed by the move from analog to digital, from atoms to bits. Napster upended the entire music distribution industry almost overnight, and let the MP3 genie out of the DRM bottle. Streaming music faltered, stalled, and now seems poised to catch on.

As a concrete example, consider Arbitron’s release of the Portable People Meter (PPM). This pager-sized device, released in 2008, listens for inaudible tones in broadcast radio. It replaced listening diaries with accurate data on who was listening to what. In doing so, it transformed the radio industry. Stations with a 50-percent market share crumbled as they learned they weren’t nearly as popular as they thought.

Far more importantly, though, programming changed, because radio programmers now knew which songs made listeners stick around and which made them leave. Before the PPM, a song in “heavy rotation” was played just over five times a day. By 2012, that song was played every fifty-five minutes, or roughly 26 times a day.

Publishers initially raced to stamp out the spread of illegal music. Kevin Kelly says the Internet is a copying machine, and labels tried everything from encryption to malware to protect their content. Fast-forward a few years, and the best way to listen to free music is YouTube: a grossly inefficient tool, but one that’s proven harder to squash.

Music data is all around us. Every repost on SoundCloud is more information for artists and labels. Digital channels let composers share the building blocks of their tracks, letting anyone create remixes or change live performance.

We’re all making data now, too. With millions of people armed with smartphones, anyone can identify a song “in the wild” using tools like Shazam, which in turn tells Shazam, and the publishers, which songs will succeed. When Katy Perry and Lady Gaga released tracks in Europe at around the same time, Shazam knew within hours that Perry’s track would soar and Gaga’s wouldn’t be a global hit.

Music is also a barrier to entry. The iTunes library, and its associated functionality, is a huge reason people stay loyal to Apple’s platforms. It’s important to remember that iTunes (January 2001) predates the iPod (October 2001) by most of a year. iTunes and its ilk are tools for managing metadata, and yet most music collections are messy enough to drive data scientists to distraction, despite the efforts of applications like TuneUp or MusicBrainz.

Music services tag information differently, too. Pandora’s “music genome” uses attributes of the music to describe it—letting you find similar tracks easily, but requiring more classification effort. On the other hand, Gracenote doesn’t know what the music sounds like, just whether it’s copied from somewhere.

I’m going to spend part of 2015 trying to understand how data is changing the music industry, and putting it into a report for O’Reilly Media as part of my work with them on Data. I’ll be talking to labels, startups, producers, and tool makers, and charting the course of where data-driven music is headed. The ideas are new right now, and it’ll be interesting to make sense of them. But it’s sobering to think, when we listen to Spotify, how far we’ve come from Mozart’s early attempts to keep music personal.