
Filters: Keyword is audio signal processing
2021-01-20
Mehmood, Z., Qazi, K. Ashfaq, Tahir, M., Yousaf, R. Muhammad, Sardaraz, M..  2020.  Potential Barriers to Music Fingerprinting Algorithms in the Presence of Background Noise. 2020 6th Conference on Data Science and Machine Learning Applications (CDMA). :25–30.

An acoustic fingerprint is a condensed and powerful digital signature of an audio signal that is used for audio sample identification; a fingerprint is the pattern of a voice or audio sample. A large number of algorithms have been developed for generating such acoustic fingerprints, and these algorithms facilitate systems that perform song searching, song identification, and song duplication detection. In this study, a comprehensive survey of existing algorithms is conducted. Four major music fingerprinting algorithms are evaluated to identify and analyze the potential hurdles that can affect their results. Since background and environmental noise reduces the efficiency of music fingerprinting algorithms, a behavioral analysis of the fingerprinting algorithms is performed using audio samples in different languages and under different environmental conditions. Music fingerprint classification is more successful when deep learning techniques are used for classification. The acoustic feature modeling and music fingerprinting algorithms are tested using the standard iKala, MusicBrainz, and MIR-1K datasets.
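Editor's note: the behavioral analysis above hinges on mixing background noise into clean samples at controlled levels. As a point of reference, here is a minimal numpy sketch of such a mixing harness; the helper name `mix_at_snr` and its parameters are illustrative assumptions, not code from the paper.

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix background noise into a clean signal at a target SNR in dB."""
    noise = np.resize(noise, clean.shape)      # loop or trim noise to length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale noise so that 10*log10(p_clean / p_scaled_noise) == snr_db.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise
```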

Zarazaga, P. P., Bäckström, T., Sigg, S..  2020.  Acoustic Fingerprints for Access Management in Ad-Hoc Sensor Networks. IEEE Access. 8:166083–166094.

Voice user interfaces can offer intuitive interaction with our devices, but usability and audio quality could be further improved if multiple devices could collaborate to provide a distributed voice user interface. To ensure that users' voices are not shared with unauthorized devices, it is necessary to design an access management system that adapts to the users' needs. Prior work has demonstrated that a combination of audio fingerprinting and fuzzy cryptography yields a robust pairing of devices without sharing the information that they record. However, the robustness of these systems rests partly on the extensive duration of the recordings required to obtain the fingerprint. This paper analyzes methods for robustly generating acoustic fingerprints in short periods of time, enabling responsive pairing of devices according to changes in the acoustic scene, and the methods can be integrated into typical speech processing tools.
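Editor's note: the abstract does not spell out the fingerprint construction. As a rough illustration of how fingerprint bits can be derived from a single short frame, here is a Philips-style sketch that compares adjacent frequency-band energies (the actual scheme, with fuzzy cryptography on top, is more involved):

```python
import numpy as np

def short_fingerprint(frame: np.ndarray, n_bands: int = 16) -> np.ndarray:
    """Fingerprint bits from one short frame: sign of the energy
    difference between adjacent frequency bands."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    bands = np.array_split(spectrum, n_bands + 1)
    energies = np.array([b.sum() for b in bands])
    return (np.diff(energies) > 0).astype(np.uint8)   # n_bands bits
```

Two devices in the same room compute similar bit strings, and the fuzzy-cryptography layer absorbs the remaining bit errors.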

2020-12-11
Hassan, S. U., Khan, M. Zeeshan, Khan, M. U. Ghani, Saleem, S..  2019.  Robust Sound Classification for Surveillance using Time Frequency Audio Features. 2019 International Conference on Communication Technologies (ComTech). :13–18.

Over the years, technology has reshaped how the world perceives security concerns. To tackle security problems, we propose a system capable of detecting security alerts. The system treats the audio events of interest as outliers against the background of routine activity, and this ambiguity is handled by auditory classification. In this paper, we discuss two techniques for extracting features from sound data: time-based and signal-based features. The first technique preserves the time-series nature of the sound, while the second focuses on signal characteristics. A convolutional neural network is applied for sound categorization. Since the major aim of the research is addressing security challenges, we generated surveillance-related data in addition to using available datasets such as UrbanSound8K and ESC-50. We achieved 94.6% accuracy with the proposed methodology on the self-generated dataset; the improved accuracy on this locally prepared dataset demonstrates the novelty of the research.
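Editor's note: the abstract does not give the network architecture. Purely as an illustration of the kind of convolutional classifier used for such spectrogram-based sound categorization, here is a minimal PyTorch sketch (layer sizes and class count are assumptions):

```python
import torch
import torch.nn as nn

class SoundEventCNN(nn.Module):
    """Small CNN over log-mel spectrogram patches (1 x mels x frames)."""
    def __init__(self, n_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes),
        )

    def forward(self, x):                        # x: (batch, 1, mels, frames)
        return self.classifier(self.features(x))

model = SoundEventCNN(n_classes=10)              # e.g. the 10 UrbanSound8K classes
logits = model(torch.randn(4, 1, 64, 128))       # dummy batch -> (4, 10)
```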

2020-08-10
Akdeniz, Fulya, Becerikli, Yaşar.  2019.  Performance Comparison of Support Vector Machine, K-Nearest-Neighbor, Artificial Neural Networks, and Recurrent Neural Networks in Gender Recognition from Voice Signals. 2019 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT). :1–4.
Nowadays, biometric data is the most commonly used data in the field of security, and audio signals are one such biometric. Voice signals are frequently used in identification, banking systems, and the solution of forensic cases. The aim of this study is to determine the gender of voice signals, and several methods were used. First, Mel-frequency cepstrum coefficients (MFCC) were used to extract features from the audio signal; these attributes were then classified with support vector machines, the k-nearest-neighbor method, and artificial neural networks. In the second stage of the study, gender is determined from audio signals without a separate feature extraction step, using recurrent neural networks (RNN). Performance analyses of the methods are given. The best accuracy, precision, recall, and F-score were 87.04%, 86.32%, 88.58%, and 87.43%, respectively, obtained with the k-nearest-neighbor algorithm.
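Editor's note: to make the first pipeline concrete, here is a minimal sketch of the MFCC-plus-KNN stage using librosa and scikit-learn. The file names, label coding, and hyperparameters are placeholders, not the study's setup:

```python
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def mfcc_features(path: str, n_mfcc: int = 13) -> np.ndarray:
    """Mean MFCC vector as a fixed-length summary of one utterance."""
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

# Hypothetical file list and labels (0 = male, 1 = female).
paths = ["a.wav", "b.wav", "c.wav", "d.wav"]
X = np.stack([mfcc_features(p) for p in paths])
y = np.array([0, 1, 0, 1])

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict(X[:1]))
```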
2020-08-03
Al-Emadi, Sara, Al-Ali, Abdulla, Mohammad, Amr, Al-Ali, Abdulaziz.  2019.  Audio Based Drone Detection and Identification using Deep Learning. 2019 15th International Wireless Communications & Mobile Computing Conference (IWCMC). :459–464.
In recent years, unmanned aerial vehicles (UAVs) have become increasingly accessible to the public due to their high availability, affordable prices, and improving technology. However, this raises great concern from both the cyber and physical security perspectives, since UAVs can be used for malicious activities that exploit vulnerabilities, such as spying on private properties or critical areas, or carrying dangerous objects such as explosives, which makes them a great threat to society. Drone identification is considered the first step in a multi-procedural process for securing physical infrastructure against this threat. In this paper, we present drone detection and identification methods using deep learning techniques such as the Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and Convolutional Recurrent Neural Network (CRNN). These algorithms exploit the unique acoustic fingerprints of flying drones in order to detect and identify them. We compare the performance of the different neural networks on our dataset, which contains recorded audio samples of drone activities. The major contribution of our work is to validate these drone detection and identification methodologies in real-life scenarios and to provide a robust comparison of the performance of different deep neural network algorithms for this application. In addition, we are releasing the dataset of drone audio clips to the research community for further analysis.
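Editor's note: of the three architectures, the CRNN is the least standard. As a rough sketch of the idea, a convolutional front end feeding a recurrent layer over time, here is a minimal PyTorch version with assumed layer sizes, not the paper's configuration:

```python
import torch
import torch.nn as nn

class DroneCRNN(nn.Module):
    """Convolutional front end followed by a GRU over the time axis."""
    def __init__(self, n_mels: int = 64, n_classes: int = 2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.gru = nn.GRU(16 * (n_mels // 2), 32, batch_first=True)
        self.out = nn.Linear(32, n_classes)

    def forward(self, x):                    # x: (batch, 1, mels, frames)
        h = self.conv(x)                     # (batch, 16, mels/2, frames)
        b, c, f, t = h.shape
        h = h.permute(0, 3, 1, 2).reshape(b, t, c * f)
        _, last = self.gru(h)                # final hidden state (1, batch, 32)
        return self.out(last.squeeze(0))     # (batch, n_classes)

logits = DroneCRNN()(torch.randn(4, 1, 64, 100))   # dummy batch -> (4, 2)
```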
Zarazaga, Pablo Pérez, Bäckström, Tom, Sigg, Stephan.  2019.  Robust and Responsive Acoustic Pairing of Devices Using Decorrelating Time-Frequency Modelling. 2019 27th European Signal Processing Conference (EUSIPCO). :1–5.
Voice user interfaces have increased in popularity, as they enable natural interaction with different applications using one's voice. To improve their usability and audio quality, several devices could interact to provide a unified voice user interface. However, with devices cooperating and sharing voice-related information, user privacy may be at risk. Therefore, access management rules that preserve user privacy are important. State-of-the-art methods for acoustic pairing of devices derive fingerprints from the time-frequency representation of the acoustic signal and apply error correction. We propose to use such acoustic fingerprinting to authorise devices that are acoustically close. We aim to obtain fingerprints of ambient audio adapted to the requirements of voice user interfaces. Our experiments show that responsiveness and robustness are improved by combining overlapping windows and decorrelating transforms.
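Editor's note: as an illustration of the two ingredients named in the abstract, the sketch below derives fingerprint bits from heavily overlapping frames and decorrelates each log spectrum with a DCT before binarization; the frame sizes and bit counts are assumptions:

```python
import numpy as np
from scipy.fft import dct

def pairing_fingerprint(x: np.ndarray, frame: int = 1024, hop: int = 256,
                        n_coeffs: int = 16) -> np.ndarray:
    """Bits from overlapping frames: log spectrum -> DCT (a decorrelating
    transform) -> sign of the leading coefficients."""
    win = np.hanning(frame)
    bits = []
    for start in range(0, len(x) - frame + 1, hop):
        spec = np.abs(np.fft.rfft(x[start:start + frame] * win))
        cepstrum = dct(np.log(spec + 1e-9), norm="ortho")
        bits.append((cepstrum[1:n_coeffs + 1] > 0).astype(np.uint8))
    return np.concatenate(bits)
```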
2019-11-25
Sathiyamurthi, P., Ramakrishnan, S., Shobika, S., Subashri, N., Prakavi, M..  2018.  Speech and Audio Cryptography System using Chaotic Mapping and Modified Euler's System. 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT). :606–611.
Security often requires that data be kept safe from unauthorized access, and this holds especially for speech communication. However, most computers are openly interconnected, exposing both the machines and the communication channels their users rely on. Speech cryptography secures information by protecting its confidentiality; it can also be used to protect the integrity and authenticity of data. Stronger cryptographic techniques are needed to ensure the integrity of data stored on a machine that may be infected or under attack. Speech cryptography has so far been used in many forms, and applying it to audio files yields a stronger technique for transferring sensitive data securely. The audio file is encrypted using a Lorenz 3D chaotic map, the 3D map is reduced to a 2D map by Euler's numerical method, the algorithm is strengthened with a Hénon map, and decryption reverses the encryption. The resulting audio file is thus in a secured form.
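Editor's note: the core mechanism, a chaotic system integrated with Euler's method to produce a keystream, can be sketched in a few lines. The quantization step and parameters below are illustrative choices, and the paper's full scheme (2D reduction, Hénon strengthening) is more elaborate:

```python
import numpy as np

def lorenz_keystream(n: int, x=0.1, y=0.0, z=0.0,
                     sigma=10.0, rho=28.0, beta=8.0 / 3.0, dt=0.01):
    """Keystream bytes from a Lorenz system stepped with Euler's method."""
    out = np.empty(n, dtype=np.uint8)
    for i in range(n):
        dx, dy, dz = sigma * (y - x), x * (rho - z) - y, x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        out[i] = int(abs(x) * 1e6) % 256     # quantize one state variable
    return out

def xor_cipher(samples: np.ndarray) -> np.ndarray:
    """XOR 8-bit audio samples with the chaotic keystream (self-inverse)."""
    return samples ^ lorenz_keystream(len(samples))

audio = np.random.randint(0, 256, 1000, dtype=np.uint8)   # stand-in samples
assert np.array_equal(xor_cipher(xor_cipher(audio)), audio)
```

The initial conditions (x, y, z) act as the shared secret: both ends must use identical values to regenerate the keystream.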
2019-01-21
Lu, L., Yu, J., Chen, Y., Liu, H., Zhu, Y., Liu, Y., Li, M..  2018.  LipPass: Lip Reading-based User Authentication on Smartphones Leveraging Acoustic Signals. IEEE INFOCOM 2018 - IEEE Conference on Computer Communications. :1466–1474.

To prevent users' privacy from leaking, more and more mobile devices employ biometric-based authentication approaches, such as fingerprint, face recognition, and voiceprint authentication, to enhance privacy protection. However, these approaches are vulnerable to replay attacks. Although state-of-the-art solutions utilize liveness verification to combat such attacks, existing approaches are sensitive to ambient environments, such as ambient lights and surrounding audible noises. Towards this end, we explore liveness verification of user authentication leveraging users' lip movements, which are robust to noisy environments. In this paper, we propose a lip reading-based user authentication system, LipPass, which extracts unique behavioral characteristics of users' speaking lips, leveraging built-in audio devices on smartphones, for user authentication. We first investigate Doppler profiles of acoustic signals caused by users' speaking lips, and find that there are unique lip movement patterns for different individuals. To characterize the lip movements, we propose a deep learning-based method to extract efficient features from Doppler profiles, and employ Support Vector Machines and Support Vector Domain Description to construct binary classifiers and spoofer detectors for user identification and spoofer detection, respectively. Afterwards, we develop a binary tree-based authentication approach to accurately identify each individual, leveraging these binary classifiers and spoofer detectors with respect to registered users. Through extensive experiments involving 48 volunteers in four real environments, LipPass achieves 90.21% accuracy in user identification and 93.1% accuracy in spoofer detection.
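Editor's note: the Doppler profile at the heart of LipPass can be pictured as the per-frame energy in a narrow band around an inaudible carrier tone emitted by the phone. The sketch below shows that extraction step only; the carrier frequency, band width, and frame sizes are assumptions rather than the paper's settings:

```python
import numpy as np

def doppler_profile(mic: np.ndarray, sr: int = 48000, carrier: float = 20000.0,
                    frame: int = 4096, hop: int = 2048, span_hz: float = 200.0):
    """Per-frame spectrum in a narrow band around the emitted carrier;
    moving lips frequency-shift the reflection (Doppler effect)."""
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    band = (freqs > carrier - span_hz) & (freqs < carrier + span_hz)
    win = np.hanning(frame)
    profile = []
    for start in range(0, len(mic) - frame + 1, hop):
        spec = np.abs(np.fft.rfft(mic[start:start + frame] * win))
        profile.append(spec[band])           # one Doppler snapshot
    return np.array(profile)                 # (frames, band bins)
```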

Yao, S., Niu, B., Liu, J..  2018.  Enhancing Sampling and Counting Method for Audio Retrieval with Time-Stretch Resistance. 2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM). :1–5.

An ideal audio retrieval method should be not only highly efficient in identifying an audio track from a massive audio dataset, but also robust to any distortion. Unfortunately, none of the existing audio retrieval methods is robust to all types of distortion. An audio retrieval method depends on both the audio fingerprint and the matching strategy, and especially on how they are combined. We argue that the Sampling and Counting Method (SC), a state-of-the-art audio retrieval method, would be a promising step towards an ideal audio retrieval method if we could make it robust to time-stretch and pitch-stretch. Towards this objective, this paper proposes a turning point alignment method that enhances SC with resistance to time-stretch and makes Philips and Philips-like fingerprints resistant to time-stretch. Experimental results show that our approach can resist time-stretch from 70% to 130%, which is on a par with state-of-the-art methods. It also marginally improves retrieval performance under various noise distortions.

2019-01-16
Shrestha, P., Shrestha, B., Saxena, N..  2018.  Home Alone: The Insider Threat of Unattended Wearables and A Defense using Audio Proximity. 2018 IEEE Conference on Communications and Network Security (CNS). :1–9.

In this paper, we highlight and study the threat arising from unattended wearable devices pre-paired with a smartphone over a wireless communication medium. Most users may not lock their wearables due to their small form factor, and may often take these devices off, leaving or forgetting them unattended while away from home (or in shared office spaces). An “insider” attacker (potentially a disgruntled friend, roommate, colleague, or even a spouse) can therefore get hold of the wearable, take it near the user's phone (i.e., within radio communication range) at another location (e.g., the user's office), and surreptitiously use it across physical barriers for various nefarious purposes, including pulling sensitive information from the phone (such as messages, photos or emails) and pushing sensitive commands to the phone (such as making phone calls, sending text messages and taking pictures). The attacker can then safely restore the wearable, wait for it to be left unattended again, and repeat the process for maximum impact, while the victim remains completely oblivious to the ongoing attack activity. This malicious behavior is in sharp contrast to the threat of stolen wearables, where the victim would unpair the wearable as soon as the theft is detected. Considering the severity of this threat, we respond by building a defense based on audio proximity, which allows the wearable to interface with the phone only when it can pick up an active audio challenge produced by the phone.
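Editor's note: one simple way to realize such an audio-proximity check is for the phone to play a random challenge and accept the wearable only if its microphone recording correlates with that challenge. The sketch below illustrates the idea with normalized cross-correlation; the threshold and helper are assumptions, not the paper's protocol:

```python
import numpy as np

def is_proximate(challenge: np.ndarray, wearable_mic: np.ndarray,
                 threshold: float = 0.5) -> bool:
    """Accept only if the wearable's recording (at least as long as the
    challenge) actually contains the phone's audio challenge."""
    c = (challenge - challenge.mean()) / (challenge.std() + 1e-9)
    m = (wearable_mic - wearable_mic.mean()) / (wearable_mic.std() + 1e-9)
    corr = np.correlate(m, c, mode="valid") / len(c)
    return float(corr.max()) > threshold
```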

2018-11-19
Grinstein, E., Duong, N. Q. K., Ozerov, A., Pérez, P..  2018.  Audio Style Transfer. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :586–590.

“Style transfer” among images has recently emerged as a very active research topic, fuelled by the power of convolutional neural networks (CNNs), and has fast become a very popular technology in social media. This paper investigates the analogous problem in the audio domain: how to transfer the style of a reference audio signal to a target audio content? We propose a flexible framework for the task, which uses a sound texture model to extract statistics characterizing the reference audio style, followed by an optimization-based audio texture synthesis that modifies the target content. In contrast to mainstream optimization-based visual transfer methods, the proposed process is initialized with the target content instead of random noise, and the optimized loss concerns only texture, not structure. These differences proved key for audio style transfer in our experiments. To extract features of interest, we investigate different architectures, whether pre-trained on other tasks, as done in image style transfer, or engineered based on the human auditory system. Experimental results on different types of audio signals confirm the potential of the proposed approach.
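Editor's note: the optimization loop described in the abstract (texture-only loss, initialization from the content) can be sketched as below. The one-layer feature extractor and all sizes are stand-ins; the paper studies pre-trained and auditory-model extractors instead:

```python
import torch
import torch.nn as nn

def gram(f: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a (channels, frames) feature map: texture statistics
    that are insensitive to where in time each pattern occurs."""
    return f @ f.T / f.shape[1]

extractor = nn.Sequential(nn.Conv1d(1, 32, 512, stride=256), nn.ReLU())

def feats(x: torch.Tensor) -> torch.Tensor:     # (samples,) -> (32, frames)
    return extractor(x.view(1, 1, -1)).squeeze(0)

style = torch.randn(16000)                      # stand-in reference audio
content = torch.randn(16000)                    # stand-in target content
g_style = gram(feats(style)).detach()

x = content.clone().requires_grad_(True)        # init from content, not noise
opt = torch.optim.Adam([x], lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    loss = torch.sum((gram(feats(x)) - g_style) ** 2)   # texture-only loss
    loss.backward()
    opt.step()
```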

2018-04-11
Liu, Rui, Rawassizadeh, Reza, Kotz, David.  2017.  Toward Accurate and Efficient Feature Selection for Speaker Recognition on Wearables. Proceedings of the 2017 Workshop on Wearable Systems and Applications. :41–46.

Due to the user-interface limitations of wearable devices, voice-based interfaces are becoming more common; speaker recognition may then address the authentication requirements of wearable applications. Wearable devices have a small form factor, a limited energy budget, and limited computational capacity. In this paper, we examine the challenge of computing speaker recognition on small wearable platforms, and specifically of reducing resource use (energy use, response time) by trimming the input through careful feature selection. For our experiments, we analyze four different feature-selection algorithms and three different feature sets for speaker identification and speaker verification. Our results show that Principal Component Analysis (PCA) with frequency-domain features had the highest accuracy, Pearson Correlation (PC) with time-domain features had the lowest energy use, and Recursive Feature Elimination (RFE) with frequency-domain features had the least latency. Our results can guide developers in choosing feature sets and configurations for speaker-authentication algorithms on wearable platforms.
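Editor's note: the three winning combinations map directly onto scikit-learn. A minimal sketch with stand-in data follows; the feature matrix, estimator, and dimensions are assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE, SelectKBest, r_regression
from sklearn.svm import LinearSVC

X = np.random.randn(200, 40)        # stand-in feature matrix (e.g. spectral)
y = np.random.randint(0, 2, 200)    # stand-in speaker labels

X_pca = PCA(n_components=10).fit_transform(X)
X_rfe = RFE(LinearSVC(dual=False), n_features_to_select=10).fit_transform(X, y)
X_pc = SelectKBest(lambda X, y: np.abs(r_regression(X, y)), k=10).fit_transform(X, y)
```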

2015-05-06
Shimauchi, S., Ohmuro, H..  2014.  Accurate adaptive filtering in square-root Hann windowed short-time Fourier transform domain. Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. :1305–1309.

A novel short-time Fourier transform (STFT) domain adaptive filtering scheme is proposed that can be easily combined with nonlinear post-filters, such as residual echo or noise reduction, in acoustic echo cancellation. Unlike normal STFT subband adaptive filters, which suffer from aliasing artifacts due to their poor prototype filter, our scheme achieves good accuracy by exploiting the relationship between linear convolution and the poor prototype filter, i.e., the STFT window function. The effectiveness of our scheme was confirmed through simulations comparing it with conventional methods.
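Editor's note: the key property of the square-root Hann window is that, applied in both analysis and synthesis at 50% overlap, its square overlap-adds to exactly one, so the frame-domain filter sits inside a perfect-reconstruction loop. A minimal numpy round-trip sketch (the frame size is an arbitrary choice):

```python
import numpy as np

def sqrt_hann_roundtrip(x: np.ndarray, frame: int = 512) -> np.ndarray:
    """STFT analysis/synthesis with a square-root Hann window at 50%
    overlap; the squared window overlap-adds to 1, so the round trip
    reconstructs the signal and filtering can act on `spec` in between."""
    hop = frame // 2
    win = np.sqrt(np.hanning(frame + 1)[:frame])   # periodic sqrt-Hann
    y = np.zeros(len(x))
    for s in range(0, len(x) - frame + 1, hop):
        spec = np.fft.rfft(win * x[s:s + frame])   # analysis
        # ... adaptive filtering of `spec` would go here ...
        y[s:s + frame] += win * np.fft.irfft(spec) # weighted overlap-add
    return y                                       # equals x away from edges
```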

2015-05-04
Coover, B., Han, Jinyu.  2014.  A Power Mask based audio fingerprint. Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. :1394–1398.

The Philips audio fingerprint [1] has been used for years, but its robustness against external noise has not been studied accurately. This paper shows that the Philips fingerprint is noise resistant and is capable of recognizing music corrupted by noise at a -4 to -7 dB signal-to-noise ratio. In addition, the drawbacks of the Philips fingerprint are addressed by utilizing a “Power Mask” in conjunction with the Philips fingerprint during the matching process. This Power Mask is a weight matrix applied to the fingerprint bits, which allows mismatched bits to be penalized according to their relevance in the fingerprint. The effectiveness of the proposed fingerprint was evaluated in experiments using a database of 1030 songs and 1184 query files that were heavily corrupted by two types of noise at varying levels. Our experiments show that the proposed method significantly improves the noise resistance of the standard Philips fingerprint.
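Editor's note: the matching rule implied by the Power Mask is a weighted Hamming distance. A minimal sketch (the names and normalization are assumptions):

```python
import numpy as np

def masked_distance(query_bits: np.ndarray, db_bits: np.ndarray,
                    power_mask: np.ndarray) -> float:
    """Weighted Hamming distance: each mismatched fingerprint bit is
    penalized by its mask weight instead of counting uniformly as 1."""
    mismatches = (query_bits != db_bits).astype(float)
    return float((mismatches * power_mask).sum() / power_mask.sum())
```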

Lee, Jun-Yong, Kim, Hyoung-Gook.  2014.  Audio fingerprinting to identify TV commercial advertisement in real-noisy environment. Communications and Information Technologies (ISCIT), 2014 14th International Symposium on. :527–530.

This paper proposes a high-performance audio fingerprint extraction method for identifying TV commercial advertisements. In the proposed method, salient audio peak-pair fingerprints based on the constant Q transform (CQT) are hashed and stored so they can be efficiently compared to one another. Experimental results confirm that the proposed method is quite robust under different noise conditions and improves the accuracy of the audio fingerprinting system in real noisy environments.
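Editor's note: peak-pair hashing itself is a standard construction: each salient peak is paired with a few nearby later peaks, and the triple (frequency, frequency, time gap) becomes a hash key. A minimal sketch follows; the fan-out and gap limits are assumptions, and the paper's peaks come from a CQT rather than an FFT spectrogram:

```python
def peak_pair_hashes(peaks, fan_out: int = 5, max_dt: int = 64):
    """Hash pairs of spectral peaks; `peaks` is a time-sorted list of
    (frame, bin) salient points from a spectrogram."""
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            dt = t2 - t1
            if 0 < dt <= max_dt:
                hashes.append(((f1, f2, dt), t1))   # (hash key, anchor time)
    return hashes
```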

Rafii, Z., Coover, B., Han, Jinyu.  2014.  An audio fingerprinting system for live version identification using image processing techniques. Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. :644–648.

Suppose that you are at a music festival checking out an artist, and you would like to quickly know about the song that is being played (e.g., title, lyrics, album, etc.). If you have a smartphone, you could record a sample of the live performance and compare it against a database of existing recordings from the artist. Services such as Shazam or SoundHound will not work here, as this is not the typical framework for audio fingerprinting or query-by-humming systems: a live performance is neither identical to its studio version (e.g., variations in instrumentation, key, tempo, etc.) nor is it a hummed or sung melody. We propose an audio fingerprinting system that can deal with live version identification by using image processing techniques. Compact fingerprints are derived using a log-frequency spectrogram and an adaptive thresholding method, and template matching is performed using Hamming similarity and the Hough transform.
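Editor's note: the adaptive-thresholding step can be pictured as comparing each spectrogram cell to a local statistic. A minimal sketch using a median filter (the window size and the choice of median are assumptions, not necessarily the paper's method):

```python
import numpy as np
from scipy.ndimage import median_filter

def binary_fingerprint(log_spec: np.ndarray, size: int = 25) -> np.ndarray:
    """Binarize a log-frequency spectrogram against its local median
    (adaptive thresholding); the resulting binary image can then be
    compared with Hamming similarity."""
    return (log_spec > median_filter(log_spec, size=size)).astype(np.uint8)
```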

Alias T, E., Naveen, N., Mathew, D..  2014.  A Novel Acoustic Fingerprint Method for Audio Signal Pattern Detection. Advances in Computing and Communications (ICACC), 2014 Fourth International Conference on. :64–68.

This paper presents a novel and efficient audio signal recognition algorithm with limited computational complexity. Since the audio recognition system will be used in real-world environments with high background noise, conventional speech recognition techniques are not directly applicable, as they perform poorly in such environments. We therefore introduce a new audio recognition algorithm optimized for mechanical sounds such as car horns and telephone rings. It is a hybrid time-frequency approach that uses an acoustic fingerprint for recognizing audio signal patterns. The limited computational complexity is achieved through efficient use of the time domain and the frequency domain in two different processing phases, detection and recognition respectively, with the transition between the two phases carried out by a finite state machine (FSM) model. Simulation results show that the algorithm effectively recognizes audio signals in a noisy environment.
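Editor's note: the detection/recognition split can be pictured as a two-state machine in which a cheap time-domain gate triggers the expensive frequency-domain classifier. The sketch below is a schematic reading of the abstract, not the paper's FSM:

```python
import numpy as np

def run_recognizer(frames, energy_gate: float, classify):
    """Two-state FSM: DETECT applies a cheap time-domain energy test;
    RECOGNIZE runs the costly spectral classifier, then returns to DETECT."""
    state = "DETECT"
    for frame in frames:
        if state == "DETECT" and np.mean(frame ** 2) > energy_gate:
            state = "RECOGNIZE"
        if state == "RECOGNIZE":
            label = classify(frame)          # frequency-domain recognition
            if label is not None:
                yield label
            state = "DETECT"
```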

Moussallam, M., Daudet, L..  2014.  A general framework for dictionary based audio fingerprinting. Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. :3077–3081.

Fingerprint-based audio recognition systems must address concurrent objectives: fingerprints must be both robust to distortions and discriminative, while their dimension must remain small to allow fast comparison. This paper proposes to restate these objectives as a penalized sparse representation problem. On top of this dictionary-based approach, we propose a structured sparsity model in the form of a probabilistic distribution over the sparse support. A practical suboptimal greedy algorithm is then presented and evaluated on robustness and recognition tasks. We show that some existing methods can be seen as particular cases of this algorithm and that the general framework allows reaching other points of a Pareto-like continuum.
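Editor's note: a matching-pursuit-style greedy solver is the natural suboptimal algorithm for such a sparse representation problem. The sketch below shows the plain unstructured version; the paper's algorithm additionally places a probabilistic prior on the support:

```python
import numpy as np

def matching_pursuit(x: np.ndarray, D: np.ndarray, k: int) -> np.ndarray:
    """Greedy k-sparse code of a frame over dictionary D (columns are
    unit-norm atoms); the selected support can serve as the fingerprint."""
    residual, code = x.copy(), np.zeros(D.shape[1])
    for _ in range(k):
        scores = D.T @ residual
        j = int(np.argmax(np.abs(scores)))   # best-matching atom
        code[j] += scores[j]
        residual -= scores[j] * D[:, j]
    return code

D = np.linalg.qr(np.random.randn(64, 64))[0]   # stand-in orthonormal dictionary
code = matching_pursuit(np.random.randn(64), D, k=5)
```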