Biblio
This paper proposes a high-performance audio fingerprint extraction method for identifying TV commercial advertisements. In the proposed method, salient audio peak-pair fingerprints based on the constant Q transform (CQT) are hashed and stored so that they can be compared to one another efficiently. Experimental results confirm that the proposed method is robust under different noise conditions and improves the accuracy of the audio fingerprinting system in real noisy environments.
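To make the peak-pair hashing idea in the abstract above concrete, here is a minimal sketch of a generic landmark-style scheme. It is not the paper's implementation: the function names, the bit layout (10 bits per frequency bin, 6 bits for the time gap), and the `fan_out` and `max_dt` parameters are all assumptions made for illustration, and the peaks are assumed to have already been picked from a CQT spectrogram as `(time_frame, freq_bin)` tuples.

```python
from collections import defaultdict


def peak_pair_hashes(peaks, fan_out=3, max_dt=63):
    """Yield (hash, anchor_time) for pairs of nearby spectral peaks.

    peaks: iterable of (time_frame, freq_bin) tuples, freq_bin < 1024.
    fan_out and max_dt are illustrative values, not taken from the paper.
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break  # peaks are time-sorted, so later ones are even farther
            if dt == 0:
                continue  # skip peaks in the same frame as the anchor
            # Pack f1 (10 bits), f2 (10 bits) and dt (6 bits) into one integer.
            yield (f1 << 16) | (f2 << 6) | dt, t1
            paired += 1
            if paired == fan_out:
                break


def build_index(tracks):
    """Map each hash to the (track_id, anchor_time) entries that produced it."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for h, t in peak_pair_hashes(peaks):
            index[h].append((track_id, t))
    return index


def best_match(index, query_peaks):
    """Vote on (track_id, time offset); return the winner and its vote count."""
    votes = defaultdict(int)
    for h, tq in peak_pair_hashes(query_peaks):
        for track_id, td in index.get(h, ()):
            votes[(track_id, td - tq)] += 1
    if not votes:
        return None
    (track_id, offset), count = max(votes.items(), key=lambda kv: kv[1])
    return track_id, offset, count


# Hypothetical toy data: two "commercials" with four peaks each.
tracks = {
    "ad_a": [(0, 10), (5, 20), (9, 30), (14, 15)],
    "ad_b": [(0, 50), (4, 60), (8, 70), (13, 55)],
}
index = build_index(tracks)
# The query is ad_a captured 100 frames later.
query = [(t + 100, f) for t, f in tracks["ad_a"]]
result = best_match(index, query)
```

Because each hash encodes only the two frequencies and the time gap between them, it does not depend on where the excerpt starts; the consistent time offset among matching hashes is what identifies the track.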
Suppose that you are at a music festival checking out an artist, and you would like to quickly learn about the song that is being played (e.g., title, lyrics, album). If you have a smartphone, you could record a sample of the live performance and compare it against a database of existing recordings from the artist. Services such as Shazam or SoundHound will not work here, because this is not the typical setting for audio fingerprinting or query-by-humming systems: a live performance is neither identical to its studio version (e.g., variations in instrumentation, key, tempo) nor a hummed or sung melody. We propose an audio fingerprinting system that can deal with live version identification by using image processing techniques. Compact fingerprints are derived using a log-frequency spectrogram and an adaptive thresholding method, and template matching is performed using the Hamming similarity and the Hough Transform.
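As a rough illustration of two of the steps named in the abstract above, the sketch below binarizes a small spectrogram-like matrix with a local-median adaptive threshold and compares two binary fingerprints with the Hamming similarity. The function names, the neighborhood `radius`, and the choice of the median as the local threshold are assumptions for illustration, not details confirmed by the abstract, and the Hough Transform matching stage is not shown.

```python
from statistics import median


def binarize(spec, radius=1):
    """Adaptive thresholding: a bin becomes 1 if it exceeds its local median.

    spec: list of rows (frequency bins), each a list of values over time.
    radius: half-width of the square neighborhood (an illustrative choice).
    """
    n_rows, n_cols = len(spec), len(spec[0])
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Collect the neighborhood, clipped at the matrix borders.
            neigh = [
                spec[i][j]
                for i in range(max(0, r - radius), min(n_rows, r + radius + 1))
                for j in range(max(0, c - radius), min(n_cols, c + radius + 1))
            ]
            row.append(1 if spec[r][c] > median(neigh) else 0)
        out.append(row)
    return out


def hamming_similarity(a, b):
    """Fraction of positions where two equal-shaped 0/1 matrices agree."""
    total = sum(len(row) for row in a)
    same = sum(x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return same / total


# Hypothetical toy spectrogram with two prominent bins.
spec = [
    [1, 9, 1],
    [1, 1, 1],
    [9, 1, 1],
]
fingerprint = binarize(spec)

# Doubling every value (a gain change) leaves the fingerprint unchanged,
# because each bin is compared only with its own neighborhood.
louder = [[2 * v for v in row] for row in spec]
same_song = hamming_similarity(fingerprint, binarize(louder))
```

Thresholding against a local statistic rather than a global one is what makes the binary image insensitive to overall level differences between a live recording and a studio reference.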