Biblio

Filters: Keyword is Acoustic signal processing
2021-01-20
Li, M., Chang, H., Xiang, Y., An, D.  2020.  A Novel Anti-Collusion Audio Fingerprinting Scheme Based on Fourier Coefficients Reversing. IEEE Signal Processing Letters. 27:1794–1798.

Most anti-collusion audio fingerprinting schemes aim at finding colluders from illegally redistributed audio copies. However, the loss caused by the redistributed versions is inevitable. In this letter, a novel fingerprinting scheme is proposed to eliminate the motivation for collusion attacks. The audio signal is transformed to the frequency domain by the Fourier transform, and the frequency-domain coefficients are reversed to different degrees according to the fingerprint sequence. Unlike other fingerprinting schemes, the proposed method excessively modifies the coefficients of the host media in order to significantly reduce the quality of the colluded version, while the imperceptibility of the fingerprint is well preserved. Experiments show that colluded audio cannot be reused because of its poor quality. In addition, the proposed method can also resist other common attacks. Various copyright risks and losses caused by illegal redistribution are effectively avoided, which is significant for protecting audio copyright.
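
The abstract describes the embedding step only at a high level; the following is a minimal sketch of that idea, assuming sign-flip "reversing" over per-bit frequency bands and an illustrative reversing degree, neither of which is specified here.

```python
import numpy as np

def embed_fingerprint(audio, bits, degree=0.8):
    """Reverse (sign-flip) one frequency band per fingerprint bit.

    `degree` in (0, 1] sets how strongly a band is reversed; a large degree
    ruins the quality of any averaged (colluded) copy while each individual
    marked copy can remain perceptually acceptable.
    """
    spectrum = np.fft.rfft(audio)
    bands = np.array_split(np.arange(1, len(spectrum)), len(bits))
    for bit, band in zip(bits, bands):
        if bit:  # partial reversal: coefficient -> coefficient * (1 - 2*degree)
            spectrum[band] *= (1.0 - 2.0 * degree)
    return np.fft.irfft(spectrum, n=len(audio))

fs = 16000
host = np.random.randn(fs)                  # stand-in for one second of host audio
fingerprint = np.random.randint(0, 2, 32)   # one user's binary fingerprint
marked = embed_fingerprint(host, fingerprint)
```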

Mehmood, Z., Qazi, K. Ashfaq, Tahir, M., Yousaf, R. Muhammad, Sardaraz, M.  2020.  Potential Barriers to Music Fingerprinting Algorithms in the Presence of Background Noise. 2020 6th Conference on Data Science and Machine Learning Applications (CDMA). :25–30.

An acoustic fingerprint is a condensed digital signature of an audio signal that is used for audio sample identification; it captures the characteristic pattern of a voice or audio sample. A large number of algorithms have been developed for generating such acoustic fingerprints, enabling systems that perform song searching, song identification, and song duplication detection. In this study, a comprehensive survey of existing algorithms is conducted. Four major music fingerprinting algorithms are evaluated to identify and analyze the potential hurdles that can affect their results. Since background and environmental noise reduces the efficiency of music fingerprinting algorithms, a behavioral analysis of the fingerprinting algorithms is performed using audio samples in different languages and under different environmental conditions. Music fingerprint classification is more successful when deep learning techniques are used for classification. The acoustic feature modeling and music fingerprinting algorithms are tested using the standard iKala, MusicBrainz, and MIR-1K datasets.
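
As a concrete point of reference for what such fingerprinting algorithms compute, here is a minimal spectral-peak fingerprint sketch (not any of the four surveyed algorithms specifically); additive noise displaces weak peaks, which is the failure mode the survey probes.

```python
import numpy as np
from scipy.signal import spectrogram

def peak_fingerprint(x, fs, nperseg=1024):
    """Strongest spectrogram bin per frame as a set of (frame, bin) landmarks."""
    f, t, S = spectrogram(x, fs, nperseg=nperseg, noverlap=nperseg // 2)
    logS = np.log(S + 1e-12)
    return {(ti, int(np.argmax(logS[:, ti]))) for ti in range(logS.shape[1])}

def similarity(fp_query, fp_reference):
    """Fraction of shared landmarks between a query and a database entry."""
    return len(fp_query & fp_reference) / max(len(fp_query), 1)
```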

Jiang, M., Lundgren, J., Pasha, S., Carratù, M., Liguori, C., Thungström, G.  2020.  Indoor Silent Object Localization using Ambient Acoustic Noise Fingerprinting. 2020 IEEE International Instrumentation and Measurement Technology Conference (I2MTC). :1–6.

Indoor localization has been a popular research subject in recent years. Usually, object localization using sound involves devices on the objects, acquiring data from stationary sound sources, or localizing objects with external sensors when the object itself generates sounds. Indoor localization systems using microphones have traditionally relied on several microphones, which limits cost efficiency and increases the required space. In this paper, the goal is to investigate whether a stationary system can localize a silent object in a room with only one microphone, using ambient noise as the information carrier. A subtraction method is combined with a fingerprinting technique to define and distinguish the noise-absorption characteristic of the silent object in the frequency domain for different object positions. The absorption characteristics for several positions of the object are taken as comparison references, serving as fingerprints of known positions. The experimental results verify the feasibility of the approach, showing that noise-based lateral localization of silent objects can be achieved.
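
A compact sketch of the subtraction-plus-fingerprinting idea, assuming a Welch PSD of the empty room as the baseline and nearest-neighbour matching over the reference grid; the paper's exact spectral estimator and distance metric are not given in the abstract.

```python
import numpy as np
from scipy.signal import welch

def absorption_fingerprint(mic_signal, baseline_psd, fs):
    """Difference (in dB) between the current noise spectrum and the
    empty-room baseline; the object's absorption shapes this difference."""
    _, psd = welch(mic_signal, fs, nperseg=2048)
    return 10 * np.log10(psd + 1e-15) - 10 * np.log10(baseline_psd + 1e-15)

def locate(fingerprint, reference_prints, positions):
    """Nearest-fingerprint lookup over the grid of known object positions."""
    dists = [np.linalg.norm(fingerprint - ref) for ref in reference_prints]
    return positions[int(np.argmin(dists))]
```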

Zarazaga, P. P., Bäckström, T., Sigg, S.  2020.  Acoustic Fingerprints for Access Management in Ad-Hoc Sensor Networks. IEEE Access. 8:166083–166094.

Voice user interfaces can offer intuitive interaction with our devices, but usability and audio quality could be further improved if multiple devices collaborated to provide a distributed voice user interface. To ensure that users' voices are not shared with unauthorized devices, it is necessary to design an access management system that adapts to the users' needs. Prior work has demonstrated that a combination of audio fingerprinting and fuzzy cryptography yields a robust pairing of devices without sharing the information that they record. However, the robustness of these systems rests partly on the extensive duration of the recordings required to obtain the fingerprint. This paper analyzes methods for robustly generating acoustic fingerprints in short periods of time, enabling responsive pairing of devices according to changes in the acoustic scenery; the methods can be integrated into other typical speech processing tools.
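
For context, a common way to derive such binary acoustic fingerprints is to binarize time-frequency energy differences (the Philips-style rule sketched below) and pair devices whose bit strings mostly agree; the fuzzy-cryptography layer the paper builds on is omitted here.

```python
import numpy as np
from scipy.signal import stft

def binary_fingerprint(x, fs, nperseg=512):
    """Bit = 1 where the energy difference between adjacent bands increases
    from one frame to the next (the classic Philips-style rule)."""
    _, _, Z = stft(x, fs, nperseg=nperseg)
    E = np.abs(Z) ** 2
    d = (E[1:, 1:] - E[1:, :-1]) - (E[:-1, 1:] - E[:-1, :-1])
    return (d > 0).ravel().astype(np.uint8)

def may_pair(fp_a, fp_b, max_mismatch=0.3):
    """Pair devices whose fingerprints mostly agree; in the paper the
    residual bit errors are absorbed by fuzzy cryptography instead."""
    return np.mean(fp_a != fp_b) <= max_mismatch
```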

2020-12-21
Pialov, K., Slutsky, R., Maizel, A.  2020.  Coupled calculation of hydrodynamic and acoustic characteristics in the far-field of the ship propulsor. 2020 International Conference on Dynamics and Vibroacoustics of Machines (DVM). :1–6.
This report provides an example calculation of the hydrodynamic and acoustic characteristics of a ship propulsor using numerical modelling: RANS models and eddy-resolving approaches for the hydrodynamics task, an acoustic analogy for the acoustics task, and harmonic analysis of the propulsor under hydrodynamic loads.
2020-12-11
Hassan, S. U., Khan, M. Zeeshan, Khan, M. U. Ghani, Saleem, S.  2019.  Robust Sound Classification for Surveillance using Time Frequency Audio Features. 2019 International Conference on Communication Technologies (ComTech). :13–18.

Over the years, technology has reshaped how the world perceives security concerns. To tackle security problems, we propose a system capable of detecting security alerts: it detects audio events that occur as outliers against the usual background activity. This kind of ambiguous behaviour can be handled by auditory classification. In this paper, we discuss two techniques for extracting features from sound data: time-based and signal-based features. The first technique preserves the time-series nature of sound, while the other focuses on signal characteristics. A convolutional neural network is applied for sound categorization. Since the major aim of the research is security challenges, we generated surveillance-related data in addition to the available UrbanSound8K and ESC-50 datasets. We achieved 94.6% accuracy for the proposed methodology on the self-generated dataset. The improved accuracy on the locally prepared dataset demonstrates the novelty of the research.
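
A small sketch of the two feature types the paper contrasts, assuming simple framing and a log-power spectrogram; the paper's exact feature pipeline and CNN topology are not given in the abstract.

```python
import numpy as np
from scipy.signal import spectrogram

def time_features(x, frame=1024, hop=512):
    """Time-based features: the framed waveform kept as a sequence."""
    n = 1 + (len(x) - frame) // hop
    return np.stack([x[i * hop:i * hop + frame] for i in range(n)])

def signal_features(x, fs):
    """Signal-based features: a log-power spectrogram 'image' for the CNN."""
    _, _, S = spectrogram(x, fs, nperseg=1024, noverlap=512)
    return np.log(S + 1e-12)
```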

2020-08-03
Dai, Haipeng, Liu, Alex X., Li, Zeshui, Wang, Wei, Zhang, Fengmin, Dong, Chao.  2019.  Recognizing Driver Talking Direction in Running Vehicles with a Smartphone. 2019 IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems (MASS). :10–18.
This paper addresses the fundamental problem of identifying driver talking directions using a single smartphone, which can help warn drivers of the distraction of having conversations with passengers and thus enhance safety. The basic idea of our system is to perform talking-status and direction identification using the two microphones on a smartphone. We first use the sound recorded by the two microphones to identify whether the driver is talking. If so, we extract a so-called channel fingerprint from the speech signal and classify it into one of three typical driver talking directions, namely front, right, and back, using a model trained in advance. The key novelty of our scheme is the channel fingerprint, which leverages the heavy multipath effects of the harsh in-vehicle environment and cancels the variability of the human voice, two factors that together invalidate traditional TDoA, DoA, and fingerprint-based sound source localization approaches. We conducted extensive experiments using two kinds of phones and two vehicles, with four phone placements in three representative scenarios, and collected 23 hours of voice data from 20 participants. The results show that our system achieves 95.0% classification accuracy on average.
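
A hedged sketch of the channel-fingerprint intuition: dividing the two microphones' spectra cancels the shared voice source and leaves the direction-dependent channels. The FFT size and the nearest-centroid classifier below are assumptions; the paper uses a model trained offline.

```python
import numpy as np

def channel_fingerprint(top_mic, bottom_mic, nfft=2048):
    """Log-ratio of the two microphones' magnitude spectra: the common voice
    source cancels, leaving the two direction-dependent channels."""
    a = np.abs(np.fft.rfft(top_mic, nfft)) + 1e-12
    b = np.abs(np.fft.rfft(bottom_mic, nfft)) + 1e-12
    return np.log(a / b)

def classify_direction(fp, centroids):
    """Nearest trained centroid among {'front', 'right', 'back'}."""
    return min(centroids, key=lambda d: np.linalg.norm(fp - centroids[d]))
```
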
Zarazaga, Pablo Pérez, Bäckström, Tom, Sigg, Stephan.  2019.  Robust and Responsive Acoustic Pairing of Devices Using Decorrelating Time-Frequency Modelling. 2019 27th European Signal Processing Conference (EUSIPCO). :1–5.
Voice user interfaces have increased in popularity, as they enable natural interaction with different applications using one's voice. To improve their usability and audio quality, several devices could interact to provide a unified voice user interface. However, with devices cooperating and sharing voice-related information, user privacy may be at risk. Therefore, access management rules that preserve user privacy are important. State-of-the-art methods for acoustic pairing of devices provide fingerprinting based on the time-frequency representation of the acoustic signal and error correction. We propose to use such acoustic fingerprinting to authorise devices that are acoustically close. We aim to obtain fingerprints of ambient audio adapted to the requirements of voice user interfaces. Our experiments show that responsiveness and robustness are improved by combining overlapping windows and decorrelating transforms.
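
A minimal sketch of the combination the experiments point to, assuming a DCT as the decorrelating transform and 75% window overlap; the parameter choices are illustrative.

```python
import numpy as np
from scipy.fft import dct

def decorrelated_fingerprint(x, frame=1024, hop=256):
    """Heavily overlapping windows give responsiveness; a DCT decorrelates
    each log-spectral frame before binarization."""
    bits = []
    for start in range(0, len(x) - frame, hop):
        w = x[start:start + frame] * np.hanning(frame)
        logmag = np.log(np.abs(np.fft.rfft(w)) + 1e-12)
        coeffs = dct(logmag, type=2, norm='ortho')[1:33]  # low-order terms
        bits.append((coeffs > 0).astype(np.uint8))
    return np.concatenate(bits)
```
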
Walczyński, Maciej, Ryba, Dagmara.  2019.  Effectiveness of the acoustic fingerprint in various acoustical environments. 2019 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA). :137–141.
This article presents and analyzes the effectiveness of an acoustic fingerprinting algorithm under various acoustic disturbances. The algorithm is designed to be stable and to identify music even in the presence of such disturbances. This was checked in a series of tests under four different conditions: silence, street noise, noise from a railway station, and noise from inside a moving car during rain. In the case of silence, 10 measurements of 7 seconds each were taken. For each of the remaining conditions, 21 attempts were made to identify the track, with a capture time of 7 seconds for each trial. The noise volume was changed every 7 attempts: the first 7 used disturbances at a volume lower than that of the captured song, the next 7 at a volume similar to the captured track, and the last 7 at a much higher volume. The effectiveness of the algorithm was calculated for the two different capture times, and overall as the average of the two results. The fingerprint database consisted of 20 previously analyzed music pieces belonging to different musical genres.
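
The noise-volume protocol translates naturally into a small mixing harness; a sketch follows, with the loudness-ratio interpretation of the three regimes being an assumption.

```python
import numpy as np

def mix_at_ratio(song, noise, ratio):
    """Mix noise at a loudness ratio relative to the captured song:
    ratio < 1 quieter, ~1 similar, > 1 much louder, matching the three
    noise-volume regimes used in the tests."""
    n = min(len(song), len(noise))
    song, noise = song[:n], noise[:n]
    gain = ratio * np.sqrt(np.mean(song ** 2) / (np.mean(noise ** 2) + 1e-12))
    return song + gain * noise
```
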
2019-01-21
Lu, L., Yu, J., Chen, Y., Liu, H., Zhu, Y., Liu, Y., Li, M.  2018.  LipPass: Lip Reading-based User Authentication on Smartphones Leveraging Acoustic Signals. IEEE INFOCOM 2018 - IEEE Conference on Computer Communications. :1466–1474.

To prevent leakage of users' privacy, more and more mobile devices employ biometric authentication approaches, such as fingerprint, face recognition, and voiceprint authentication, to enhance privacy protection. However, these approaches are vulnerable to replay attacks. Although state-of-the-art solutions utilize liveness verification to combat such attacks, existing approaches are sensitive to ambient environments, such as ambient light and surrounding audible noise. To this end, we explore liveness verification for user authentication leveraging users' lip movements, which are robust to noisy environments. In this paper, we propose a lip reading-based user authentication system, LipPass, which extracts unique behavioral characteristics of users' speaking lips, leveraging the built-in audio devices on smartphones, for user authentication. We first investigate Doppler profiles of acoustic signals caused by users' speaking lips and find that there are unique lip movement patterns for different individuals. To characterize the lip movements, we propose a deep learning-based method to extract efficient features from Doppler profiles, and employ Support Vector Machines and Support Vector Domain Description to construct binary classifiers and spoofer detectors for user identification and spoofer detection, respectively. Afterwards, we develop a binary tree-based authentication approach to accurately identify each individual, leveraging these binary classifiers and spoofer detectors with respect to registered users. Through extensive experiments involving 48 volunteers in four real environments, LipPass achieves 90.21% accuracy in user identification and 93.1% accuracy in spoofer detection.
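
A sketch of the Doppler-profile extraction step, assuming the phone emits an inaudible carrier near 20 kHz at a 48 kHz sampling rate; the carrier frequency and band width are assumptions, and the paper's deep feature extractor and classifiers operate on the result.

```python
import numpy as np
from scipy.signal import stft

def doppler_profile(mic, fs, carrier=20000.0, span=200.0):
    """Energy in a narrow band around the emitted carrier over time; lip
    motion shifts energy into neighbouring bins. Requires fs high enough
    (e.g. 48 kHz) to capture the near-ultrasonic carrier."""
    f, t, Z = stft(mic, fs, nperseg=4096)
    band = (f >= carrier - span) & (f <= carrier + span)
    return np.abs(Z[band, :]) ** 2   # rows: Doppler bins, columns: time
```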

Yao, S., Niu, B., Liu, J.  2018.  Enhancing Sampling and Counting Method for Audio Retrieval with Time-Stretch Resistance. 2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM). :1–5.

An ideal audio retrieval method should not only be highly efficient in identifying an audio track in a massive audio dataset, but also robust to any distortion. Unfortunately, no existing audio retrieval method is robust to all types of distortion. An audio retrieval method depends on both the audio fingerprint and the retrieval strategy, and especially on how they are combined. We argue that the Sampling and Counting method (SC), a state-of-the-art audio retrieval method, would be a promising step towards an ideal audio retrieval method if it could be made robust to time-stretch and pitch-stretch. Towards this objective, this paper proposes a turning point alignment method that enhances SC with resistance to time-stretch and makes Philips and Philips-like fingerprints time-stretch resistant. Experimental results show that our approach resists time-stretch from 70% to 130%, which is on a par with state-of-the-art methods. It also marginally improves retrieval performance under various noise distortions.
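
The abstract does not define the turning points; the sketch below assumes they are local extrema of the per-frame energy envelope, and shows only the flavor of the alignment: rescaling the query's frame axis onto the reference's anchors before fingerprint comparison.

```python
import numpy as np
from scipy.signal import find_peaks

def turning_points(x, frame=1024, hop=512):
    """Local maxima of the per-frame energy envelope serve as anchors."""
    n = 1 + (len(x) - frame) // hop
    energy = np.array([np.sum(x[i * hop:i * hop + frame] ** 2) for i in range(n)])
    peaks, _ = find_peaks(energy)
    return energy, peaks

def align_frames(n_frames, peaks, ref_peaks):
    """Map query frame indices onto the reference timeline so that matched
    turning points coincide, undoing a global time-stretch."""
    k = min(len(peaks), len(ref_peaks))
    return np.interp(np.arange(n_frames), peaks[:k], ref_peaks[:k])
```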

Thoen, B., Wielandt, S., Strycker, L. De.  2018.  Fingerprinting Method for Acoustic Localization Using Low-Profile Microphone Arrays. 2018 International Conference on Indoor Positioning and Indoor Navigation (IPIN). :1–7.

Indoor localization of unknown acoustic events with MEMS microphone arrays has huge potential in applications like home assisted living and surveillance. This article presents an Angle of Arrival (AoA) fingerprinting method for use in Wireless Acoustic Sensor Networks (WASNs) with low-profile microphone arrays. In a first research phase, acoustic measurements are performed in an anechoic room to evaluate two computationally efficient time-domain, delay-based AoA algorithms: one based on dot product calculations and another based on dot products with a PHAse Transform (PHAT). The algorithms are evaluated with two sound events: white noise and a female voice. They are able to calculate the AoA with Root Mean Square Errors (RMSEs) of 3.5° for white noise and 9.8° to 16° for female vocal sounds. In the second research phase, an AoA fingerprinting algorithm is developed for acoustic event localization. The proposed solution is experimentally verified in a room of 4.25 m by 9.20 m with 4 acoustic sensor nodes. Acoustic fingerprints of white noise, recorded along a predefined grid in the room, are used to localize white noise and vocal sounds. The localization errors are evaluated using one node at a time, resulting in mean localization errors between 0.65 m and 0.98 m for white noise and between 1.18 m and 1.52 m for vocal sounds.
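
The dot-product-with-PHAT algorithm is closely related to the standard GCC-PHAT delay estimator; a frequency-domain GCC-PHAT sketch for a two-microphone pair follows (far-field geometry assumed; the paper's time-domain variants differ in implementation detail).

```python
import numpy as np

def gcc_phat_aoa(sig_a, sig_b, fs, mic_dist, c=343.0):
    """Delay via GCC-PHAT, then AoA from far-field geometry."""
    n = len(sig_a) + len(sig_b)
    R = np.fft.rfft(sig_a, n) * np.conj(np.fft.rfft(sig_b, n))
    R /= np.abs(R) + 1e-12                      # PHAT weighting
    cc = np.fft.irfft(R, n)
    max_lag = int(fs * mic_dist / c)            # physically possible lags
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    tau = (np.argmax(np.abs(cc)) - max_lag) / fs
    return np.degrees(np.arcsin(np.clip(tau * c / mic_dist, -1.0, 1.0)))
```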

2018-12-10
Schönherr, L., Zeiler, S., Kolossa, D.  2017.  Spoofing detection via simultaneous verification of audio-visual synchronicity and transcription. 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). :591–598.

Acoustic speaker recognition systems are very vulnerable to spoofing attacks via replayed or synthesized utterances. One possible countermeasure is audio-visual speaker recognition. Nevertheless, the addition of the visual stream alone does not prevent spoofing attacks completely; it only provides further information to assess the authenticity of the utterance. Many systems consider the audio and video modalities independently and can easily be spoofed by imitating only a single modality or by a bimodal replay attack with a victim's photograph or video. Therefore, we propose the simultaneous verification of data synchronicity and transcription in a challenge-response setup. We use coupled hidden Markov models (CHMMs) for text-dependent spoofing detection and introduce new features that provide information about the transcription of the utterance and the synchronicity of both streams. We evaluate the features for various spoofing scenarios and show that combining the features leads to more robust recognition, also in comparison to the baseline method. Additionally, by evaluating the data on unseen speakers, we show that the spoofing detection is applicable in speaker-independent use cases.
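
The paper models synchronicity with CHMMs; as a simplified stand-in for that feature, one can correlate the audio energy envelope with a per-video-frame mouth-openness series (the series itself is assumed to come from a face tracker).

```python
import numpy as np

def sync_score(audio, fs, mouth_openness, video_fps):
    """Normalized correlation between the audio energy envelope and a
    per-frame mouth-openness series; a photo or mistimed replay tends to
    decorrelate the two streams."""
    hop = int(fs / video_fps)
    n = min(len(mouth_openness), len(audio) // hop)
    env = np.array([np.sum(audio[i * hop:(i + 1) * hop] ** 2) for i in range(n)])
    m = np.asarray(mouth_openness[:n], dtype=float)
    env = (env - env.mean()) / (env.std() + 1e-12)
    m = (m - m.mean()) / (m.std() + 1e-12)
    return float(np.mean(env * m))
```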

Khan, M., Reza, M. Q., Sirdeshmukh, S. P. S. M. A.  2017.  A prototype model development for classification of material using acoustic resonance spectroscopy. 2017 International Conference on Multimedia, Signal Processing and Communication Technologies (IMPACT). :128–131.

In this work, a measurement system based on acoustic resonance is developed that can be used for the classification of materials. Acoustic inspection methods, used in the field for container screening and for the identification of defective pills, hold high significance for health, security, and protection. However, such techniques are constrained by costly instrumentation, offline analysis, and complexities associated with the physical coupling of transducer holders. A simple, non-destructive, and extremely cost-effective technique based on acoustic resonance has therefore been formulated here for quick data acquisition and analysis of the acoustic signature of liquids, for constituent identification and classification. In this system, two ceramic-coated piezoelectric transducers are attached at the two ends of a V-shaped glass rod; one acts as transmitter and the other as receiver. The transmitter generates sound with the help of a white noise generator, and the pickup transducer at the other end of the rod detects the transmitted signal. Recording is done with an Arduino interfaced to a computer. The FFTs of the recorded signals are analyzed; the resonant frequencies observed for water, water+salt, and water+sugar are 4.8 kHz, 6.8 kHz, and 3.2 kHz, respectively. A different resonant frequency is observed for each sample, which shows that the developed prototype model effectively classifies the materials.
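
Since the abstract reports the resonant frequencies directly, the classification step reduces to peak picking; a sketch, with the windowing and nearest-resonance matching being implementation assumptions:

```python
import numpy as np

# Resonant frequencies reported in the abstract.
KNOWN = {"water": 4800.0, "water+salt": 6800.0, "water+sugar": 3200.0}

def classify_liquid(recording, fs):
    """FFT the received signal and match its dominant peak to the nearest
    known resonance."""
    spectrum = np.abs(np.fft.rfft(recording * np.hanning(len(recording))))
    freqs = np.fft.rfftfreq(len(recording), 1.0 / fs)
    peak = freqs[int(np.argmax(spectrum[1:])) + 1]   # skip the DC bin
    return min(KNOWN, key=lambda k: abs(KNOWN[k] - peak)), peak
```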

2017-03-08
Roth, J., Liu, X., Ross, A., Metaxas, D.  2015.  Investigating the Discriminative Power of Keystroke Sound. IEEE Transactions on Information Forensics and Security. 10:333–345.
The goal of this paper is to determine whether keystroke sound can be used to recognize a user. In this regard, we analyze the discriminative power of keystroke sound in the context of a continuous user authentication application. Motivated by the concept of digraphs used in modeling keystroke dynamics, a virtual alphabet is first learned from keystroke sound segments. Next, the digraph latency within the pairs of virtual letters, along with other statistical features, is used to generate match scores. The resultant scores are indicative of the similarities between two sound streams, and are fused to make a final authentication decision. Experiments on both static text-based and free text-based authentications on a database of 50 subjects demonstrate the potential as well as the limitations of keystroke sound.
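
A sketch of the first processing stage implied by the paper: keystroke onsets are detected from energy jumps, and the digraph latencies are read off as differences of onset times. The threshold rule is an assumption, and the virtual-alphabet learning and score fusion are omitted.

```python
import numpy as np

def keystroke_onsets(x, fs, frame_ms=10, k=5.0):
    """Onset times where the frame energy jumps above k times the median."""
    frame = int(fs * frame_ms / 1000)
    n = len(x) // frame
    e = np.array([np.sum(x[i * frame:(i + 1) * frame] ** 2) for i in range(n)])
    thresh = k * np.median(e) + 1e-12
    rising = np.where((e[1:] > thresh) & (e[:-1] <= thresh))[0] + 1
    return rising * frame / fs

def digraph_latencies(onsets):
    """Latencies between consecutive keystrokes, the raw material for the
    statistical match-score features described in the paper."""
    return np.diff(onsets)
```
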
2015-05-04
Luque, J., Anguera, X.  2014.  On the modeling of natural vocal emotion expressions through binary key. 2014 Proceedings of the 22nd European Signal Processing Conference (EUSIPCO). :1562–1566.

This work presents a novel method to estimate naturally expressed emotions in speech through binary acoustic modeling. Standard acoustic features are mapped to a binary value representation, and a support vector regression model is used to correlate them with the three continuous emotional dimensions. Three different sets of speech features, two based on spectral parameters and one on prosody, are compared on the VAM corpus, a set of spontaneous dialogues from a German TV talk show. The regression analysis, in terms of correlation coefficient and mean absolute error, shows that the binary key modeling successfully captures speaker emotion characteristics. The proposed algorithm obtains results comparable to those reported in the literature while relying on a much smaller set of acoustic descriptors. Furthermore, we also report preliminary results based on the combination of the binary models, which brings further performance improvements.
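
A toy sketch of the pipeline's shape: features are binarized against a set of anchor vectors (a simplified stand-in for the paper's binary key background model) and fed to a support vector regressor for one emotional dimension. The data here is synthetic, only to fix array shapes.

```python
import numpy as np
from sklearn.svm import SVR

def binarize(features, anchors):
    """Threshold projections onto anchor vectors to get a binary key."""
    return (features @ anchors.T > 0).astype(float)

rng = np.random.default_rng(0)
frames = rng.standard_normal((500, 13))    # e.g. 13 spectral features per frame
anchors = rng.standard_normal((64, 13))    # assumed 64-bit key layout
X = binarize(frames, anchors)
y = rng.uniform(-1, 1, 500)                # one emotional dimension, e.g. valence
model = SVR().fit(X, y)                    # regress binary keys -> emotion value
```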

Kaghaz-Garan, S., Umbarkar, A., Doboli, A.  2014.  Joint localization and fingerprinting of sound sources for auditory scene analysis. 2014 IEEE International Symposium on Robotic and Sensors Environments (ROSE). :49–54.

In the field of scene understanding, researchers have mainly focused on using video and images to extract the different elements of a scene. The computational as well as monetary cost associated with such implementations is high. This paper proposes a low-cost system that uses sound-based techniques to jointly perform localization and fingerprinting of sound sources. A network of embedded nodes is used to sense the sound inputs. Phase-based sound localization and Support Vector Machine classification are used to locate and classify elements of the scene, respectively, and the fusion of all this data presents a complete “picture” of the scene. The proposed concepts are applied to a vehicular-traffic case study. Experiments show that the system has a fingerprinting accuracy of up to 97.5%, a localization error of less than 4 degrees, and a scene prediction accuracy of 100%.
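
A sketch of a phase-based AoA estimate for one microphone pair, valid only when the microphones are closer than half a wavelength of the dominant frequency; the paper's node hardware and the feature set for the SVM stage are not specified in the abstract.

```python
import numpy as np

def phase_aoa(sig_a, sig_b, fs, mic_dist, c=343.0):
    """AoA from the inter-microphone phase difference at the dominant bin;
    unambiguous only if mic_dist is below half the dominant wavelength."""
    n = len(sig_a)
    A, B = np.fft.rfft(sig_a), np.fft.rfft(sig_b)
    k = int(np.argmax(np.abs(A[1:]))) + 1          # dominant non-DC bin
    f = k * fs / n
    dphi = np.angle(A[k] * np.conj(B[k]))          # wrapped phase difference
    s = np.clip(dphi * c / (2 * np.pi * f * mic_dist), -1.0, 1.0)
    return np.degrees(np.arcsin(s))
```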

Alias T, E., Naveen, N., Mathew, D.  2014.  A Novel Acoustic Fingerprint Method for Audio Signal Pattern Detection. 2014 Fourth International Conference on Advances in Computing and Communications (ICACC). :64–68.

This paper presents a novel and efficient audio signal recognition algorithm with limited computational complexity. Because the audio recognition system will be used in real-world environments with high background noise, conventional speech recognition techniques are not directly applicable, since they perform poorly in such environments. We therefore introduce a new audio recognition algorithm optimized for mechanical sounds such as car horns and telephone rings. It is a hybrid time-frequency approach that makes use of an acoustic fingerprint for the recognition of audio signal patterns. The limited computational complexity is achieved through efficient use of the time domain and the frequency domain in two different processing phases, detection and recognition respectively, with the transition between the two phases carried out by a finite state machine (FSM) model. Simulation results show that the algorithm effectively recognizes audio signals in a noisy environment.
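
A minimal version of the detection-to-recognition FSM: a cheap time-domain energy gate hands frames to the frequency-domain stage only when triggered. The gate threshold is an assumption and the fingerprint matcher itself is omitted.

```python
import numpy as np

def detect_then_recognize(stream, fs, frame=1024, energy_gate=0.01):
    """Two-state FSM: a cheap time-domain energy gate (DETECT) hands frames
    to the costlier frequency-domain stage (RECOGNIZE) only when triggered."""
    state = "DETECT"
    for i in range(0, len(stream) - frame, frame):
        x = stream[i:i + frame]
        if state == "DETECT":
            if np.mean(x ** 2) > energy_gate:     # time-domain test
                state = "RECOGNIZE"
        else:
            spectrum = np.abs(np.fft.rfft(x))     # frequency-domain features
            yield i / fs, spectrum                # hand off to the matcher
            state = "DETECT"
```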

Yuxi Liu, Hatzinakos, D.  2014.  Human acoustic fingerprints: A novel biometric modality for mobile security. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :3784–3788.

Recently, the demand for more robust protection against unauthorized use of mobile devices has been growing rapidly. This paper presents a novel biometric modality, Transient Evoked Otoacoustic Emission (TEOAE), for mobile security. Prior works have investigated TEOAE for biometrics in a setting where an individual is to be identified among a pre-enrolled identity gallery. However, this limits applicability to the mobile environment, where attacks in most cases come from imposters unknown to the system beforehand. Therefore, we employ an unsupervised learning approach based on an Autoencoder Neural Network to tackle this blind recognition problem. The learning model is trained on a generic dataset and used to verify an individual within a random population. We also introduce a framework for a mobile biometric system with practical application in mind. Experiments show the merits of the proposed method, and system performance is further evaluated by cross-validation, with an average EER of 2.41% achieved.
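
A loosely hedged sketch of the verification idea, substituting scikit-learn's MLPRegressor for the paper's autoencoder and comparing reconstructions for verification; sizes, threshold, and data are all illustrative stand-ins.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
generic = rng.standard_normal((1000, 256))   # stand-in generic TEOAE frames

# Train the network to reproduce its input through a narrow bottleneck.
ae = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500)
ae.fit(generic, generic)

def reconstruct(x):
    return ae.predict(x.reshape(1, -1))[0]

def verify(probe, enrolled, threshold=1.0):
    """Accept if the denoised probe lies close to the enrolled template."""
    return np.linalg.norm(reconstruct(probe) - reconstruct(enrolled)) < threshold
```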

Zurek, E.E., Gamarra, A.M.R., Escorcia, G.J.R., Gutierrez, C., Bayona, H., Perez, R., Garcia, X.  2014.  Spectral analysis techniques for acoustic fingerprints recognition. 2014 XIX Symposium on Image, Signal Processing and Artificial Vision (STSIVA). :1–5.

This article presents the results of a recognition process for acoustic fingerprints from a noise source using spectral characteristics of the signal. Principal Component Analysis (PCA) is applied to reduce the dimensionality of the extracted features, and a classifier is then implemented using the k-nearest neighbors (KNN) method to identify the pattern of the audio signal. This classifier is compared with an Artificial Neural Network (ANN) implementation. A filtering system must be applied to the acquired signals to reduce the 60 Hz noise generated by imperfections in the acquisition system. The methods described in this paper were used for vessel recognition.
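
The processing chain named in the abstract (60 Hz notch filtering, then PCA, then KNN) maps directly onto standard library calls; a sketch under assumed parameter values:

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def remove_hum(x, fs):
    """Notch out the 60 Hz noise attributed to the acquisition system."""
    b, a = iirnotch(60.0, Q=30.0, fs=fs)
    return filtfilt(b, a, x)

def train(spectral_features, labels, n_components=10, k=3):
    """PCA for dimensionality reduction, then a KNN classifier."""
    pca = PCA(n_components=n_components)
    X = pca.fit_transform(spectral_features)
    return pca, KNeighborsClassifier(n_neighbors=k).fit(X, labels)
```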

2015-05-01
Andrade Esquef, P.A., Apolinario, J.A., Biscainho, L.W.P.  2014.  Edit Detection in Speech Recordings via Instantaneous Electric Network Frequency Variations. IEEE Transactions on Information Forensics and Security. 9:2314–2326.

In this paper, an edit detection method for forensic audio analysis is proposed. It develops and improves a previous method through changes in the signal processing chain and a novel detection criterion. As with the original method, electrical network frequency (ENF) analysis is central to the novel edit detector, for it allows monitoring anomalous variations of the ENF related to audio edit events. Working in an unsupervised manner, the edit detector compares the extent of ENF variations, centered at the nominal frequency, with a variable threshold that defines the upper limit for normal variations observed in unedited signals. ENF variations caused by edits in the signal are likely to exceed the threshold, providing a mechanism for their detection. The proposed method is evaluated in both qualitative and quantitative terms on two distinct annotated databases. Results are reported for the originally noisy database signals as well as for versions of them further degraded under controlled conditions. A comparative performance evaluation, in terms of equal error rate (EER), reveals that, for one of the tested databases, an improvement from 7% to 4% EER is achieved from the original to the new edit detection method. When the signals are amplitude-clipped or corrupted by broadband background noise, the performance figures of the novel method follow the same profile as those of the original method.
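
A compact sketch of the ENF-deviation mechanism: band-pass around the nominal frequency, estimate the instantaneous frequency from the analytic signal, and flag excursions. The fixed threshold below replaces the paper's variable threshold, and the parameter values are illustrative.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def enf_deviation(x, fs, nominal=60.0, half_band=1.0):
    """Instantaneous frequency of the narrow ENF band via the analytic
    signal, returned as absolute deviation from the nominal frequency."""
    b, a = butter(4, [(nominal - half_band) / (fs / 2),
                      (nominal + half_band) / (fs / 2)], btype="band")
    phase = np.unwrap(np.angle(hilbert(filtfilt(b, a, x))))
    inst_f = np.diff(phase) * fs / (2 * np.pi)
    return np.abs(inst_f - nominal)

def flag_edits(x, fs, threshold=0.2):
    """Time instants (s) where the ENF deviation exceeds the threshold."""
    return np.where(enf_deviation(x, fs) > threshold)[0] / fs
```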