Biblio

Filters: Keyword is music
2023-08-03
Pardede, Hilman, Zilvan, Vicky, Ramdan, Ade, Yuliani, Asri R., Suryawati, Endang, Kusumowardani, Renni.  2022.  Adversarial Networks-Based Speech Enhancement with Deep Regret Loss. 2022 5th International Conference on Networking, Information Systems and Security: Envisage Intelligent Systems in 5g//6G-based Interconnected Digital Worlds (NISS). :1–6.
Speech enhancement is often applied in speech-based systems because speech signals are prone to additive background noise. While signal processing methods have traditionally been used for speech enhancement, advances in deep learning have prompted many efforts to apply it to this task. Using deep learning, the networks learn mapping functions from noisy data to clean data and then learn to reconstruct the clean speech signals. As a consequence, deep learning methods can reduce the so-called musical noise that is often found in traditional speech enhancement methods. Currently, one popular deep learning architecture for speech enhancement is the generative adversarial network (GAN). However, the cross-entropy loss employed in GAN often makes training unstable, so many GAN implementations replace the cross-entropy loss with the least-squares loss. In this paper, to improve the training stability of GAN with cross-entropy loss, we propose to use deep regret analytic generative adversarial networks (Dragan) for speech enhancement, which applies a gradient penalty to the cross-entropy loss. We also employ relativistic rules to stabilize the training of GAN and apply them to the least-squares and Dragan losses. Our experiments suggest that the proposed method improves speech quality more than the least-squares loss on several objective quality metrics.
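As a minimal PyTorch sketch (not the authors' implementation) of the kind of gradient penalty the abstract refers to, the function below perturbs real samples and penalizes discriminator gradients whose norm deviates from 1; the resulting term is added to the usual cross-entropy discriminator loss. All names (discriminator, real_batch, lambda_gp) and constants are illustrative assumptions.

```python
import torch

def dragan_gradient_penalty(discriminator, real_batch, lambda_gp=10.0):
    # DRAGAN-style penalty: perturb real samples only (unlike WGAN-GP, which
    # interpolates between real and fake samples) and push the discriminator's
    # gradient norm toward 1 in that neighbourhood.
    noise = 0.5 * real_batch.std() * torch.rand_like(real_batch)
    perturbed = (real_batch + noise).requires_grad_(True)

    scores = discriminator(perturbed)
    grads = torch.autograd.grad(
        outputs=scores.sum(), inputs=perturbed, create_graph=True
    )[0]
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```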
2023-07-21
Almutairi, Mishaal M., Apostolopoulou, Dimitra, Halikias, George, Abi Sen, Adnan Ahmed, Yamin, Mohammad.  2022.  Enhancing Privacy and Security in Crowds using Fog Computing. 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom). :57–62.
Thousands of crowded events take place every year. Often, event management does not properly implement and manage the privacy and security of data belonging to the participants and personnel of the events. Crowds are also prone to significant security issues and are vulnerable to terrorist attacks. The aim of this paper is to propose a privacy and security framework for large, crowded events such as the Hajj, Kumbh, Arba'een, and many sporting events and music concerts. The proposed framework uses the latest technologies, including the Internet of Things and fog computing, especially in location-based services environments. The proposed framework can also be adapted to many other scenarios and situations.
2022-06-30
Mathai, Angelo, Nirmal, Atharv, Chaudhari, Purva, Deshmukh, Vedant, Dhamdhere, Shantanu, Joglekar, Pushkar.  2021.  Audio CAPTCHA for Visually Impaired. 2021 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME). :1–5.
Completely Automated Public Turing tests (CAPTCHAs) have been used to differentiate between computers and humans for quite some time now. There are many varieties of CAPTCHAs: text-based, image-based, audio, video, arithmetic, and so on. However, not all varieties are suitable for the visually impaired. As spambots and APIs grow more accurate, CAPTCHA tests are constantly updated to stay relevant, but this has not happened with the audio CAPTCHA. An audio CAPTCHA intended for blind/visually impaired users exists, but many of them find it difficult to solve. We propose an alternative to the existing system that uses unique sound samples layered with music generated through GANs (Generative Adversarial Networks), along with noise and other layers of sound, to make it difficult to dissect. The user has to count the number of times the unique sound is heard in the sample and then input that number. Since there are no letters or numbers in the samples, speech-to-text bots/APIs cannot be used directly to decipher this system. Also, any user, regardless of their native language, can comfortably use this system.
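As a rough illustration of the counting scheme (not the authors' implementation), the sketch below overlays a short target sound a random number of times onto a music-plus-noise bed and keeps that count as the expected answer; the function name, mixing gains, and count range are all assumptions.

```python
import numpy as np

def build_audio_captcha(bed, unique_sound, max_count=5, rng=None):
    # bed: background track (e.g. GAN-generated music), 1-D float array;
    # unique_sound: the short clip the listener must count.
    if rng is None:
        rng = np.random.default_rng()
    mix = bed.copy()
    count = int(rng.integers(2, max_count + 1))            # ground-truth answer
    for _ in range(count):
        start = int(rng.integers(0, len(bed) - len(unique_sound)))
        mix[start:start + len(unique_sound)] += 0.8 * unique_sound
    mix += 0.05 * rng.standard_normal(len(mix))            # extra masking noise
    return mix / np.max(np.abs(mix)), count                # normalized audio, expected answer
```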
2022-05-23
Hyodo, Yasuhide, Sugai, Chihiro, Suzuki, Junya, Takahashi, Masafumi, Koizumi, Masahiko, Tomura, Asako, Mitsufuji, Yuki, Komoriya, Yota.  2021.  Psychophysiological Effect of Immersive Spatial Audio Experience Enhanced Using Sound Field Synthesis. 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII). :1–8.
Recent advances in spatial audio technologies for enhancing human emotional and immersive experiences are attracting attention. Many studies are clarifying the neural mechanisms of acoustic spatial perception; however, they are limited to evaluating those mechanisms with basic sound stimuli. It therefore remains challenging to evaluate the experience of actual music content and to verify the effects on higher-order neurophysiological responses, including the sense of immersive and realistic experience. To investigate the effects of spatial audio experience, we verified the psychophysiological responses to an immersive spatial audio experience using sound field synthesis (SFS) technology. Specifically, we evaluated alpha power as a central nervous system measure, and heart rate/heart rate variability and skin conductance as autonomic nervous system measures, during the acoustic experience of actual music content, comparing stereo and SFS conditions. As a result, statistically significant differences (p < 0.05) were detected between conditions in the changes in alpha wave power, the high-frequency power of heart rate variability (HF), and the skin conductance level (SCL). The SFS condition showed enhanced changes in alpha power in the frontal and parietal regions, suggesting an enhanced emotional experience. The results of the SFS condition also suggested that, in the presence of multiple sound sources, close objects are grouped and perceived on the basis of the spatial proximity of the sounds. This demonstrates that SFS technology can potentially enhance emotional and immersive experiences through spatial acoustic expression.
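As a minimal sketch of one of the reported measures (not the authors' pipeline), alpha-band power for a single EEG channel can be estimated from a Welch power spectral density; the sampling rate, window length, and band edges below are assumptions.

```python
import numpy as np
from scipy.signal import welch

def alpha_band_power(eeg, fs=250.0, band=(8.0, 13.0)):
    # eeg: one EEG channel as a 1-D array; fs: sampling rate in Hz.
    freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs))
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return np.trapz(psd[mask], freqs[mask])   # integrated alpha-band power
```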
2021-08-17
Belman, Amith K., Paul, Tirthankar, Wang, Li, Iyengar, S. S., Śniatała, Paweł, Jin, Zhanpeng, Phoha, Vir V., Vainio, Seppo, Röning, Juha.  2020.  Authentication by Mapping Keystrokes to Music: The Melody of Typing. 2020 International Conference on Artificial Intelligence and Signal Processing (AISP). :1–6.
Expressing Keystroke Dynamics (KD) in the form of sound opens new avenues for applying sound analysis techniques to KD. However, this mapping is not straightforward, as the varied feature space, differences in feature magnitudes, and the human interpretability of the music introduce complexities. We present a musical interface to KD by mapping keystroke features to music features. Music elements such as melody, harmony, rhythm, pitch, and tempo are varied with respect to the magnitude of their corresponding keystroke features. A pitch embedding technique makes the music discernible among users. Experiments using data from 30 users, who typed fixed strings multiple times on a desktop, show that these auditory signals are distinguishable between users by both standard classifiers (SVM, Random Forests, and Naive Bayes) and human listeners.
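A minimal sketch of the kind of mapping the abstract describes, assuming per-keystroke hold and flight times as the input features; the pitch and duration rules are illustrative and not taken from the paper.

```python
def keystrokes_to_notes(hold_times, flight_times, base_pitch=60):
    # hold_times / flight_times: per-keystroke timings in seconds.
    notes = []
    for hold, flight in zip(hold_times, flight_times):
        pitch = base_pitch + int(min(hold, 0.3) / 0.3 * 24)  # longer hold -> higher pitch
        duration = max(flight, 0.05)                         # flight time drives rhythm/tempo
        notes.append((pitch, duration))                      # (MIDI note number, seconds)
    return notes
```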
2021-01-20
Mehmood, Z., Qazi, K. Ashfaq, Tahir, M., Yousaf, R. Muhammad, Sardaraz, M..  2020.  Potential Barriers to Music Fingerprinting Algorithms in the Presence of Background Noise. 2020 6th Conference on Data Science and Machine Learning Applications (CDMA). :25–30.
An acoustic fingerprint is a condensed and powerful digital signature of an audio signal that is used for audio sample identification. A fingerprint is the pattern of a voice or audio sample. A large number of algorithms have been developed for generating such acoustic fingerprints; they facilitate systems that perform song searching, song identification, and song duplication detection. In this study, a comprehensive survey of already-developed algorithms is conducted. Four major music fingerprinting algorithms are evaluated to identify and analyze the potential hurdles that can affect their results. Since background and environmental noise reduces the efficiency of music fingerprinting algorithms, a behavioral analysis of the fingerprinting algorithms is performed using audio samples in different languages and under different environmental conditions. Music fingerprint classification is more successful when deep learning techniques are used for classification. The acoustic feature modeling and music fingerprinting algorithms are tested using the standard iKala, MusicBrainz, and MIR-1K datasets.
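To make the fingerprinting idea concrete, here is a generic landmark-style sketch in the spirit of the surveyed algorithms (not any specific one): pick a spectrogram peak per frame and hash pairs of nearby peaks into (hash, anchor-time) landmarks. The sampling rate, window parameters, and fan-out are illustrative.

```python
import numpy as np
from scipy.signal import spectrogram

def landmark_fingerprint(audio, fs=8000, fan_out=5):
    # Crude peak picking: strongest frequency bin in each frame, then hash
    # pairs of nearby peaks into (frequency pair, time delta) landmarks.
    _, _, sxx = spectrogram(audio, fs=fs, nperseg=512, noverlap=256)
    peaks = [(int(np.argmax(sxx[:, t])), t) for t in range(sxx.shape[1])]
    hashes = []
    for i, (f1, t1) in enumerate(peaks):
        for f2, t2 in peaks[i + 1:i + 1 + fan_out]:
            hashes.append(((f1, f2, t2 - t1), t1))   # landmark anchored at frame t1
    return hashes
```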

2020-08-03
Walczyński, Maciej, Ryba, Dagmara.  2019.  Effectiveness of the acoustic fingerprint in various acoustical environments. 2019 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA). :137–141.
This article presents and describes an analysis of the effectiveness of an acoustic fingerprint algorithm under various acoustic disturbances. The described algorithm is stable and should identify music even in the presence of acoustic disturbances. This was checked in a series of tests under four different conditions: silence, street noise, railway station noise, and noise from inside a moving car during rain. In the silent condition, 10 measurements of 7 seconds each were taken. For each of the remaining conditions, 21 identification attempts were made, each with a capture time of 7 seconds. The noise volume was changed every 7 attempts: the first 7 used disturbances at a volume lower than that of the captured song, the next 7 at a volume similar to the captured track, and the last 7 at a much higher volume. The effectiveness of the algorithm was calculated for two different capture times, and overall as the average of the two results. The fingerprint database consisted of 20 previously analyzed music pieces belonging to different musical genres.
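The test conditions vary the noise level relative to the captured song; a minimal sketch of such mixing, with the RMS-relative scaling convention as an assumption rather than the paper's method, could look like this:

```python
import numpy as np

def mix_with_noise(song, noise, level_db):
    # level_db: noise level relative to the song (negative = quieter,
    # 0 = similar volume, positive = louder), applied via RMS scaling.
    noise = np.resize(noise, song.shape)
    rms_song = np.sqrt(np.mean(song ** 2))
    rms_noise = np.sqrt(np.mean(noise ** 2))
    gain = (rms_song / rms_noise) * 10 ** (level_db / 20)
    return song + gain * noise
```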
2017-03-07
Summers, Cameron, Tronel, Greg, Cramer, Jason, Vartakavi, Aneesh, Popp, Phillip.  2016.  GNMID14: A Collection of 110 Million Global Music Identification Matches. Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. :693–696.
A new dataset is presented, composed of music identification matches from Gracenote, a leading global music metadata company. Matches from January 1, 2014 to December 31, 2014 have been curated and made available as a public dataset called Gracenote Music Identification 2014, or GNMID14, at the following address: https://developer.gracenote.com/mid2014. This collection is the first significant music identification dataset and one of the largest music-related datasets available, containing more than 110M matches in 224 countries for 3M unique tracks and 509K unique artists. It features geotemporal information (i.e., country and match date) as well as genre and mood metadata. In this paper, we characterize the dataset and demonstrate its utility for Information Retrieval (IR) research.
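As an illustration of the kind of geotemporal aggregation such a dataset supports, the sketch below counts matches per country and month with pandas; the file name and column names are hypothetical, since the actual schema is documented at the URL above.

```python
import pandas as pd

# Hypothetical file and column names for illustration only.
matches = pd.read_csv("gnmid14_matches.csv", parse_dates=["match_date"])

# Count matches per country and per month to expose the geotemporal structure.
monthly_by_country = (
    matches.groupby(["country", matches["match_date"].dt.to_period("M")])
           .size()
           .rename("match_count")
           .reset_index()
)
print(monthly_by_country.head())
```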

2015-05-04
Coover, B., Jinyu Han.  2014.  A Power Mask based audio fingerprint. Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. :1394–1398.
The Philips audio fingerprint [1] has been used for years, but its robustness against external noise has not been studied accurately. This paper shows that the Philips fingerprint is noise-resistant and is capable of recognizing music corrupted by noise at a -4 to -7 dB signal-to-noise ratio. In addition, the drawbacks of the Philips fingerprint are addressed by utilizing a “Power Mask” in conjunction with the Philips fingerprint during the matching process. This Power Mask is a weight matrix applied to the fingerprint bits, which allows mismatched bits to be penalized according to their relevance in the fingerprint. The effectiveness of the proposed fingerprint was evaluated in experiments using a database of 1030 songs and 1184 query files that were heavily corrupted by two types of noise at varying levels. Our experiments show that the proposed method significantly improves the noise resistance of the standard Philips fingerprint.
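A hedged sketch of how such a weighted match might look: instead of counting raw bit mismatches between fingerprint blocks, each mismatched bit is penalized by a per-bit weight. The array shapes and the normalization below are assumptions, not the paper's exact formulation.

```python
import numpy as np

def masked_bit_error_rate(query_bits, ref_bits, power_mask):
    # query_bits / ref_bits: binary arrays of shape (frames, 32), as in the
    # Philips fingerprint; power_mask: non-negative per-bit relevance weights.
    mismatches = (query_bits != ref_bits).astype(float)
    return float(np.sum(mismatches * power_mask) / np.sum(power_mask))
```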