Alastair thesis review

Kragen Javier Sitaker, 2013-05-17 (1 minute)

Reading Alastair Porter's master's thesis.

It provides a comprehensive overview of how current audio fingerprinting algorithms work and what they are used for (both for good and for evil), followed by a more detailed comparison of three implementations: the open-source Echoprint and Chromaprint algorithms, and an open-source Matlab implementation of the Landmark (Shazam) algorithm.

It's a gold mine of references to things I didn't know about, or that I hadn't heard of before Alastair told me about them: mel-frequency cepstral coefficients, the freesound archive, the Shazam and Soundhound smartphone apps for song identification, the three open-source audio fingerprinting algorithms he evaluated, a list of other companies that do audio fingerprinting, the Fourier-Mellin transform, an adaptive FFT, Query By Humming, the Codaich dataset, the Acoustid and Echoprint servers, etc. Its overview of the history of audio fingerprinting for automated censorship in Napster, YouTube, and other applications may be shocking for those who aren't familiar with the situation.

The research involved making a fingerprint database of the 30,283 songs in the Codaich dataset and running 20,000 queries against it, each query an attempt to match some audio fragment. But they didn't just do that once; they did it many times, with different fingerprinting algorithms and different kinds of degradation applied to the queries.
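
To make the shape of that experiment concrete, here is a minimal Python sketch of such an evaluation loop. None of it comes from the thesis. The "songs" are random integer sequences, and the fingerprint (hashing runs of coarsely quantized samples) is a toy stand-in for Echoprint, Chromaprint, or Landmark. The degradations are plain additive noise, and every name and parameter (`fingerprint`, `add_noise`, `step`, `frag_len`, the song and query counts) is an assumption chosen only for illustration.

```python
import random
from collections import Counter

def make_song(rng, length=400):
    # Stand-in for audio: a sequence of integer "samples" in 0..255.
    return [rng.randrange(256) for _ in range(length)]

def fingerprint(samples, step, width=4):
    # Toy fingerprint (hypothetical): quantize each sample to step-sized
    # buckets, then treat every run of `width` buckets as one hash.
    q = [s // step for s in samples]
    return {tuple(q[i:i + width]) for i in range(len(q) - width + 1)}

def add_noise(samples, rng, amount):
    # Degradation: perturb each sample by up to +/- amount, clipped to 0..255.
    return [min(255, max(0, s + rng.randint(-amount, amount))) for s in samples]

def evaluate(songs, step, degrade, n_queries, rng, frag_len=40):
    # Build the database: hash -> set of song ids containing that hash.
    index = {}
    for song_id, samples in enumerate(songs):
        for h in fingerprint(samples, step):
            index.setdefault(h, set()).add(song_id)
    correct = 0
    for _ in range(n_queries):
        # A query is a degraded fragment cut from a randomly chosen song.
        song_id = rng.randrange(len(songs))
        start = rng.randrange(len(songs[song_id]) - frag_len + 1)
        fragment = degrade(songs[song_id][start:start + frag_len], rng)
        # Each hash shared with a song counts as one vote for that song.
        votes = Counter()
        for h in fingerprint(fragment, step):
            for candidate in index.get(h, ()):
                votes[candidate] += 1
        if votes and votes.most_common(1)[0][0] == song_id:
            correct += 1
    return correct / n_queries

# Two toy "algorithms" that differ only in quantization step size.
ALGORITHMS = {"fine (step 8)": 8, "coarse (step 32)": 32}
DEGRADATIONS = {
    "none": lambda samples, rng: list(samples),
    "light noise": lambda samples, rng: add_noise(samples, rng, 4),
    "heavy noise": lambda samples, rng: add_noise(samples, rng, 24),
}

songs = [make_song(random.Random(seed)) for seed in range(30)]
results = {}
for alg_name, step in ALGORITHMS.items():
    for deg_name, degrade in DEGRADATIONS.items():
        # Reuse one seed so every combination sees the same query fragments.
        accuracy = evaluate(songs, step, degrade, 200, random.Random(1))
        results[(alg_name, deg_name)] = accuracy
        print(f"{alg_name:18} {deg_name:12} {accuracy:.3f}")
```

The point is only the structure: one database and one set of query fragments, re-run for every combination of algorithm and degradation, producing a grid of accuracy figures that can be compared. The real experiments presumably used real audio, realistic degradations, and richer measures than a single accuracy number.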

Topics