Biblio

Filters: Keyword is Mel frequency cepstral coefficient
2022-01-10
Agarwal, Shivam, Khatter, Kiran, Relan, Devanjali.  2021.  Security Threat Sounds Classification Using Neural Network. 2021 8th International Conference on Computing for Sustainable Global Development (INDIACom). :690–694.
Sound plays a key role in human life, and sound recognition systems therefore have a promising future. Sound classification and identification systems have many applications, such as personal security and critical surveillance. The main aim of this paper is to detect and classify security-related sound events using surveillance camera systems with integrated microphones, based on spectrograms generated from the sounds. This enables security events to be tracked in emergencies. The goal is to propose a security system that accurately detects sound events and thereby improves security sound event detection. We propose a convolutional neural network (CNN) to design a security sound detection system that detects a security event even from minimal sound data. We trained the CNN on spectrogram images. The network was trained on data for different security sounds, which was then used to detect security sound events during the testing phase. We used two datasets for our experiments: a training dataset and a testing dataset. Both contain three sound events (glass breaks, gunshots, and smoke alarms) to train and test the model, respectively. The proposed system yields good accuracy for sound event detection even with minimal available sound data. The designed system achieved 92% accuracy on the training dataset and 90% on the testing dataset using the CNN. We conclude that the proposed sound classification framework, which uses spectrogram images of sounds, can be used efficiently to develop sound classification and recognition systems.
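As an illustration of the pipeline this abstract describes, here is a minimal sketch: a short clip is converted to a log-mel spectrogram image and classified by a small CNN. The file name, layer sizes, and hyperparameters are assumptions for illustration, not the authors' implementation.

```python
# Sketch: spectrogram-image classification of security sound events.
import librosa
import numpy as np
import torch
import torch.nn as nn

def clip_to_spectrogram(path, sr=22050, n_mels=64):
    """Load a short clip and return a log-mel spectrogram as a 2-D array."""
    y, sr = librosa.load(path, sr=sr, duration=2.0)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

class SoundEventCNN(nn.Module):
    """Small CNN over spectrogram 'images'; three classes as in the paper
    (glass break, gunshot, smoke alarm)."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# spec = clip_to_spectrogram("glass_break.wav")   # hypothetical file
# x = torch.tensor(spec).unsqueeze(0).unsqueeze(0).float()
# logits = SoundEventCNN()(x)
```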
2021-08-17
Ouchi, Yumo, Okudera, Ryosuke, Shiomi, Yuya, Uehara, Kota, Sugimoto, Ayaka, Ohki, Tetsushi, Nishigaki, Masakatsu.  2020.  Study on Possibility of Estimating Smartphone Inputs from Tap Sounds. 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). :1425–1429.
Smartphone keystrokes are subject to side-channel attacks in which the input can be inferred from tapping sounds. Ilia et al. reported that keystrokes can be predicted with 61% accuracy from tapping sounds picked up by the built-in microphone of a legitimate user's device. Li et al. reported that by emitting sonar sounds from an attacker smartphone's built-in speaker and analyzing the waves reflected from a legitimate user's finger at the time of tap input, keystrokes can be estimated with 90% accuracy. However, the method proposed by Ilia et al. requires prior penetration of the target smartphone, and the attack scenario lacks plausibility: if the attacker can penetrate the smartphone, a keylogger can directly acquire the legitimate user's keystrokes. In addition, the method proposed by Li et al. is a side-channel attack in which the attacker actively interferes with the terminals of legitimate users, and can be described as an active attack scenario. Herein, we analyze the extent to which a user's keystrokes are leaked to the attacker in a passive attack scenario, where the attacker wiretaps the sounds of the legitimate user's keystrokes using an external microphone. First, we limited the keystrokes to personal identification number input. Subsequently, mel-frequency cepstrum coefficients of the tapping sound data were represented as image data. We found that a convolutional neural network can discriminate the key input with high accuracy.
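A minimal sketch of the feature step this abstract names: the MFCCs of a tap sound rendered as an image, as would be fed to a CNN. The file name and parameter choices here are hypothetical.

```python
# Sketch: represent the MFCCs of a tap recording as an image.
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("pin_tap.wav", sr=None)        # hypothetical recording
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)  # 20 cepstral coefficients

fig, ax = plt.subplots()
img = librosa.display.specshow(mfcc, x_axis="time", sr=sr, ax=ax)
fig.colorbar(img, ax=ax)
fig.savefig("pin_tap_mfcc.png")                     # image input for the CNN
```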
2020-12-11
Hassan, S. U., Khan, M. Zeeshan, Khan, M. U. Ghani, Saleem, S..  2019.  Robust Sound Classification for Surveillance using Time Frequency Audio Features. 2019 International Conference on Communication Technologies (ComTech). :13–18.
Over the years, technology has reshaped how the world perceives security concerns. To tackle security problems, we propose a system capable of detecting security alerts. The system covers audio events that occur as outliers against a background of usual activity. This ambiguous behaviour can be handled by auditory classification. In this paper, we discuss two techniques for extracting features from sound data: time-based features and signal-based features. The first technique preserves the time-series nature of the sound, while the second focuses on signal characteristics. A convolutional neural network is applied to categorize the sounds. Since the major aim of the research is security challenges, we generated surveillance-related data in addition to available datasets such as the UrbanSound8K and ESC-50 datasets. We achieved 94.6% accuracy for the proposed methodology on the self-generated dataset. The improved accuracy on the locally prepared dataset demonstrates the novelty of the research.
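To make the contrast between the two feature views concrete, here is a sketch under assumed parameters: one view keeps raw-waveform frames (time-series nature preserved), the other stacks per-frame signal characteristics. Frame lengths, feature choices, and the file name are assumptions.

```python
# Sketch: time-based vs. signal-based feature extraction from one clip.
import librosa
import numpy as np

y, sr = librosa.load("siren.wav", sr=22050)  # hypothetical clip

# Time-based view: fixed-length raw-waveform frames, time structure preserved.
frames = librosa.util.frame(y, frame_length=2048, hop_length=1024)

# Signal-based view: per-frame spectral characteristics (default hop of 512
# keeps all three feature matrices time-aligned).
spectral = np.vstack([
    librosa.feature.spectral_centroid(y=y, sr=sr),
    librosa.feature.zero_crossing_rate(y),
    librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13),
])
```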
2020-08-03
Liu, Meng, Wang, Longbiao, Dang, Jianwu, Nakagawa, Seiichi, Guan, Haotian, Li, Xiangang.  2019.  Replay Attack Detection Using Magnitude and Phase Information with Attention-based Adaptive Filters. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :6201–6205.
Automatic Speech Verification (ASV) systems are highly vulnerable to spoofing attacks, and the replay attack poses the greatest threat among them. In this paper, we propose a novel multi-channel feature extraction method with attention-based adaptive filters (AAF). Phase information, discarded by conventional feature extraction techniques after the Fast Fourier Transform (FFT), is promising for distinguishing genuine speech from replayed spoofed speech. Accordingly, phase and magnitude information are extracted as complementary phase-channel and magnitude-channel features in our system. First, we analyze the discriminative ability of the full frequency bands with F-ratio methods. Then, attention-based adaptive filters are implemented to maximize the capture of highly discriminative information in those frequency bands. Results on the ASVspoof 2017 challenge indicate that our proposed approach achieved relative error reduction rates of 78.7% and 59.8% on the development and evaluation datasets, respectively, compared with the baseline method.
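A minimal sketch of the two-channel idea described above (not the authors' pipeline): the complex STFT is split into the magnitude channel that conventional features keep and the phase channel they discard. The file name and FFT parameters are assumptions.

```python
# Sketch: extract magnitude and phase as two complementary feature channels.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=16000)   # hypothetical utterance
stft = librosa.stft(y, n_fft=512, hop_length=256)  # complex spectrogram

magnitude = np.abs(stft)   # magnitude channel (what MFCC-style features keep)
phase = np.angle(stft)     # phase channel, discarded by conventional features

# Stack as a two-channel input for a downstream classifier.
features = np.stack([magnitude, phase], axis=0)    # shape (2, freq, time)
```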
Liu, Fuxiang, Jiang, Qi.  2019.  Research on Recognition of Criminal Suspects Based on Foot Sounds. 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). :1347–1351.
There are two main contributions in this paper. First, by analyzing frequency-domain features and Mel-domain features, we can distinguish footstep events from non-footstep events. Second, we compared the frequency-domain representations of two footstep sound signals from the same person under different experimental conditions, finding that almost all of the peak and trough frequencies in the main frequency band correspond one-to-one. For two different people, however, even under the same experimental conditions, the peak and trough frequencies in the main frequency band of their footstep sound signals rarely coincide. Therefore, this feature of footstep sound signals can be used to identify different people.
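As a sketch of the comparison the abstract describes, the following locates the peak and trough frequencies in an assumed "main" band of a footstep's spectrum, so two recordings can be matched frequency by frequency. The band limits, prominence threshold, and file name are assumptions.

```python
# Sketch: find peak and trough frequencies of a footstep sound's spectrum.
import numpy as np
from scipy.io import wavfile
from scipy.signal import find_peaks

sr, y = wavfile.read("footstep.wav")   # hypothetical recording
if y.ndim > 1:
    y = y[:, 0]                        # use the first channel if stereo
y = y.astype(float)

spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)

band = (freqs > 50) & (freqs < 1000)   # assumed main frequency band
prom = spectrum[band].max() * 0.1      # assumed prominence threshold
peaks, _ = find_peaks(spectrum[band], prominence=prom)
troughs, _ = find_peaks(-spectrum[band], prominence=prom)

print("peak frequencies:", freqs[band][peaks])
print("trough frequencies:", freqs[band][troughs])
```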
2019-01-16
Kreuk, F., Adi, Y., Cisse, M., Keshet, J..  2018.  Fooling End-To-End Speaker Verification With Adversarial Examples. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :1962–1966.
Automatic speaker verification systems are increasingly used as the primary means to authenticate customers. Recently, it has been proposed to train speaker verification systems using end-to-end deep neural models. In this paper, we show that such systems are vulnerable to adversarial example attacks. Adversarial examples are generated by adding a carefully crafted noise to original speaker examples so that the two are almost indistinguishable to a human listener. Yet the generated waveforms, which sound like speaker A, can fool the system into accepting them as if they were uttered by speaker B. We present white-box attacks on a deep end-to-end network trained on either YOHO or NTIMIT. We also present two black-box attacks. In the first, we generate adversarial examples with a system trained on NTIMIT and perform the attack on a system trained on YOHO. In the second, we generate the adversarial examples with a system trained using Mel-spectrum features and perform the attack on a system trained using MFCCs. Our results show that one can significantly decrease the accuracy of a target system even when the adversarial examples are generated with a different system, potentially using different features.
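For illustration only, here is an FGSM-style sketch of the kind of waveform perturbation the abstract describes; the paper's exact attack may differ, and the model interface and epsilon are assumptions.

```python
# Sketch: targeted fast-gradient-sign perturbation of a waveform so a
# speaker-identification model leans toward a chosen target speaker while
# the audio stays nearly indistinguishable to a listener.
import torch
import torch.nn.functional as F

def fgsm_waveform(model, waveform, target_speaker, epsilon=1e-3):
    """Return an adversarial copy of `waveform` pushed toward `target_speaker`.

    `model` is assumed to map a waveform batch to per-speaker logits, and
    `target_speaker` is a LongTensor of target class indices.
    """
    x = waveform.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), target_speaker)
    loss.backward()
    # Step *down* the loss toward the target identity (targeted attack).
    return (x - epsilon * x.grad.sign()).detach()
```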