Biblio

Filters: Keyword is Audio fingerprinting
2021-01-20
Zarazaga, P. P., Bäckström, T., Sigg, S..  2020.  Acoustic Fingerprints for Access Management in Ad-Hoc Sensor Networks. IEEE Access. 8:166083-166094.

Voice user interfaces can offer intuitive interaction with our devices, but usability and audio quality could be further improved if multiple devices collaborated to provide a distributed voice user interface. To ensure that users' voices are not shared with unauthorized devices, it is however necessary to design an access management system that adapts to the users' needs. Prior work has demonstrated that a combination of audio fingerprinting and fuzzy cryptography yields a robust pairing of devices without sharing the information that they record. However, the robustness of these systems relies in part on the long recordings required to obtain the fingerprint. This paper analyzes methods for the robust generation of acoustic fingerprints over short periods of time. Such methods enable responsive pairing of devices as the acoustic scenery changes and can be integrated into other typical speech processing tools.

2015-05-04
Jun-Yong Lee, Hyoung-Gook Kim.  2014.  Audio fingerprinting to identify TV commercial advertisement in real-noisy environment. Communications and Information Technologies (ISCIT), 2014 14th International Symposium on. :527-530.

This paper proposes a high-performance audio fingerprint extraction method for identifying TV commercial advertisements. In the proposed method, salient audio peak-pair fingerprints based on the constant Q transform (CQT) are hashed and stored so that they can be compared efficiently with one another. Experimental results confirm that the proposed method is robust under different noise conditions and improves the accuracy of the audio fingerprinting system in real noisy environments.
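The hashing and lookup of peak pairs described in this abstract can be illustrated with a minimal sketch. It assumes that salient peaks have already been extracted as (time frame, frequency bin) tuples, for example from a CQT spectrogram. The function names, the `fan_out` and `max_dt` parameters, and the offset-voting lookup are illustrative assumptions in the style of generic landmark fingerprinting, not the authors' exact scheme.

```python
from collections import defaultdict


def pair_hashes(peaks, fan_out=3, max_dt=50):
    """Yield (hash_key, anchor_time) for anchor/target peak pairs.

    peaks: list of (time_frame, freq_bin) tuples (hypothetical input format).
    fan_out and max_dt are illustrative parameters, not values from the paper.
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break  # peaks are time-sorted, so later ones are even further away
            if dt == 0:
                continue  # skip peaks in the same frame as the anchor
            # The key depends only on the two frequencies and the time gap,
            # so it is unchanged when the whole clip is shifted in time.
            yield (f1, f2, dt), t1
            paired += 1
            if paired == fan_out:
                break


def build_index(tracks):
    """Map each hash key to the (track_id, anchor_time) pairs that produced it."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for key, t in pair_hashes(peaks):
            index[key].append((track_id, t))
    return index


def best_match(index, query_peaks):
    """Return the track whose hashes agree with the query at one consistent offset."""
    votes = defaultdict(int)
    for key, tq in pair_hashes(query_peaks):
        for track_id, tr in index.get(key, ()):
            votes[(track_id, tr - tq)] += 1
    if not votes:
        return None
    (track_id, _offset), _count = max(votes.items(), key=lambda kv: kv[1])
    return track_id
```

Because each key encodes a time difference rather than an absolute time, a query recorded partway through an advertisement can still vote for the correct track.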

Rafii, Z., Coover, B., Jinyu Han.  2014.  An audio fingerprinting system for live version identification using image processing techniques. Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. :644-648.

Suppose that you are at a music festival checking on an artist, and you would like to quickly know about the song that is being played (e.g., title, lyrics, album, etc.). If you have a smartphone, you could record a sample of the live performance and compare it against a database of existing recordings from the artist. Services such as Shazam or SoundHound will not work here, because this is not the typical framework for audio fingerprinting or query-by-humming systems: a live performance is neither identical to its studio version (e.g., variations in instrumentation, key, tempo, etc.) nor is it a hummed or sung melody. We propose an audio fingerprinting system that can deal with live version identification by using image processing techniques. Compact fingerprints are derived using a log-frequency spectrogram and an adaptive thresholding method, and template matching is performed using the Hamming similarity and the Hough Transform.
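Two of the steps named in this abstract, adaptive thresholding and Hamming similarity, can be sketched as follows. This is a simplified illustration under stated assumptions: the spectrogram is a plain list of rows, the threshold is the median of a small neighborhood (the `radius` parameter is hypothetical), and matching is a plain sliding-window search over time. The Hough Transform stage used by the authors is not reproduced here.

```python
from statistics import median


def binarize(spec, radius=1):
    """Turn a spectrogram into a 0/1 image by local median thresholding.

    spec: list of rows (frequency) holding equal-length lists (time).
    A cell becomes 1 when it exceeds the median of its neighborhood.
    """
    n_rows, n_cols = len(spec), len(spec[0])
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            window = [
                spec[i][j]
                for i in range(max(0, r - radius), min(n_rows, r + radius + 1))
                for j in range(max(0, c - radius), min(n_cols, c + radius + 1))
            ]
            row.append(1 if spec[r][c] > median(window) else 0)
        out.append(row)
    return out


def hamming_similarity(a, b):
    """Fraction of positions where two equally sized binary images agree."""
    total = sum(len(row) for row in a)
    same = sum(x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return same / total


def best_offset(query, reference):
    """Slide the query along the reference's time axis; return (offset, score)."""
    q_cols = len(query[0])
    r_cols = len(reference[0])
    best_score, best_off = -1.0, None
    for off in range(r_cols - q_cols + 1):
        window = [row[off:off + q_cols] for row in reference]
        score = hamming_similarity(query, window)
        if score > best_score:
            best_score, best_off = score, off
    return best_off, best_score
```

Thresholding against a local statistic rather than a fixed value keeps the binary image unchanged when the overall recording level is scaled, which is one reason adaptive thresholds suit recordings made in uncontrolled conditions.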

Ghatak, S., Lodh, A., Saha, E., Goyal, A., Das, A., Dutta, S..  2014.  Development of a keyboardless social networking website for visually impaired: SocialWeb. Global Humanitarian Technology Conference - South Asia Satellite (GHTC-SAS), 2014 IEEE. :232-236.

Over the past decade, we have witnessed a huge upsurge in social networking, which continues to touch and transform our lives to the present day. Social networks help us communicate on a common platform with acquaintances and friends who share similar interests. Globally, there are more than 200 million visually impaired people. Visual impairment has many issues associated with it, but the one that stands out is the lack of accessibility to content for entertainment and safe socializing. This paper deals with the development of a keyboardless social networking website for the visually impaired. The term keyboardless signifies minimal use of the keyboard: the user explores the contents of the website using assistive technologies such as screen readers and speech-to-text (STT) conversion, which in turn provide a user-friendly experience for the target audience. As soon as a user with minimal computer proficiency opens the website, he/she identifies the username and password fields with the help of a screen reader. The user speaks his/her username, and it is entered through STT conversion (using the Web Speech API). Control then moves to the password field, where the user's password is obtained in the same way and matched with the one saved in the website database. The concept of acoustic fingerprinting has been implemented to validate the passwords of registered users and to foil the intentions of malicious attackers. On a successful password match, the user is able to enjoy the services of the website without any further hassle. Once the access obstacles associated with social networking sites are resolved and proper technologies are put in place, social networking sites can be a rewarding, fulfilling, and enjoyable experience for visually impaired people.

Naini, R., Moulin, P..  2014.  Fingerprint information maximization for content identification. Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. :3809-3813.

This paper presents a novel design of content fingerprints based on maximization of the mutual information across the distortion channel. We use the information bottleneck method to optimize the filters and quantizers that generate these fingerprints. A greedy optimization scheme is used to select filters from a dictionary and allocate fingerprint bits. We test the performance of this method for audio fingerprinting and show substantial improvements over existing learning-based fingerprints.

Moussallam, M., Daudet, L..  2014.  A general framework for dictionary based audio fingerprinting. Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. :3077-3081.

Fingerprint-based audio recognition systems must address concurrent objectives. Indeed, fingerprints must be both robust to distortions and discriminative, while their dimension must remain small to allow fast comparison. This paper proposes to restate these objectives as a penalized sparse representation problem. On top of this dictionary-based approach, we propose a structured sparsity model in the form of a probabilistic distribution for the sparse support. A practical suboptimal greedy algorithm is then presented and evaluated on robustness and recognition tasks. We show that some existing methods can be seen as particular cases of this algorithm and that the general framework makes it possible to reach other points of a Pareto-like continuum.
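As a rough illustration of what a greedy, dictionary-based fingerprint can look like, the sketch below uses plain matching pursuit and takes the set of selected atom indices (the sparse support) as the fingerprint. This is a swapped-in textbook technique under stated assumptions: atoms are unit-norm vectors, and the structured sparsity penalty and probabilistic support model proposed in the paper are not implemented. The function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def matching_pursuit(signal, atoms, n_atoms):
    """Greedily select up to n_atoms atoms that best explain the signal.

    atoms: list of unit-norm vectors (assumed, not checked here).
    Returns a list of (atom_index, coefficient) in selection order.
    """
    residual = list(signal)
    chosen = []
    for _ in range(n_atoms):
        scores = [dot(residual, atom) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        coef = scores[k]
        if coef == 0:
            break  # nothing left that any atom can explain
        chosen.append((k, coef))
        # Remove the selected atom's contribution before the next pick.
        residual = [r - coef * a for r, a in zip(residual, atoms[k])]
    return chosen


def fingerprint(signal, atoms, n_atoms):
    """Use the sorted sparse support (selected atom indices) as the fingerprint."""
    return sorted({k for k, _ in matching_pursuit(signal, atoms, n_atoms)})
```

Keeping only the support makes the fingerprint compact, and small perturbations of the signal tend to leave the dominant atoms unchanged, which is the robustness versus size trade-off the abstract refers to.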