Biblio
A new dataset is presented composed of music identification matches from Gracenote, a leading global music metadata company. Matches from January 1, 2014 to December 31, 2014 have been curated and made available as a public dataset called Gracenote Music Identification 2014, or GNMID14, at the following address: https://developer.gracenote.com/mid2014. This collection is the first significant music identification dataset and one of the largest music related datasets available containing more than 110M matches in 224 countries for 3M unique tracks, and 509K unique artists. It features geotemporal information (i.e. country and match date), genre and mood metadata. In this paper, we characterize the dataset and demonstrate its utility for Information Retrieval (IR) research.
This paper presents a novel design of content fingerprints based on maximization of the mutual information across the distortion channel. We use the information bottleneck method to optimize the filters and quantizers that generate these fingerprints. A greedy optimization scheme is used to select filters from a dictionary and allocate fingerprint bits. We test the performance of this method for audio fingerprinting and show substantial improvements over existing learning based fingerprints.