Biblio

Filters: Keyword is Speech recognition
2020-04-06
Ahmed, Syed Umaid, Sabir, Arbaz, Ashraf, Talha, Ashraf, Usama, Sabir, Shahbaz, Qureshi, Usama.  2019.  Security Lock with Effective Verification Traits. 2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE). :164–169.
To manage and handle the issues of physical security in the modern world, there is a dire need for a multilevel security system to ensure the safety of precious belongings, whether money, military equipment, or life-saving medical drugs. A security locker is proposed: a multilayer security system consisting of several levels of authentication. In most cases, only the relevant persons should have access to their precious belongings. The box can be unlocked only when all of the security levels are successfully cleared. The five levels of security are password entry on an interactive GUI, thumbprint, facial recognition, speech pattern recognition, and vein pattern recognition. This project is unique and effective in that it incorporates five levels of security in a single prototype built from cost-effective equipment. Our assessment of the system shows that security is increased many times over, as it is nearly impossible to breach all five levels. Raspberry Pi microcomputers handle all the traits efficiently, making it easy to perform all the verification tasks. The traits involve checking, training, and verification processes that apply machine learning operations.
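The unlocking rule above (the box opens only if every level succeeds) can be sketched as an ordered chain of checks that stops at the first failure. This is a minimal sketch, not the authors' Raspberry Pi implementation: the level names mirror the abstract, but each verifier is a hypothetical stub that compares a captured sample with an enrolled value.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical verifier type: takes the captured sample for one trait, returns pass/fail.
Verifier = Callable[[str], bool]


def unlock(levels: List[Tuple[str, Verifier]], samples: Dict[str, str]) -> Tuple[bool, str]:
    """Check each security level in order and stop at the first failure.

    Returns (unlocked, name of the failed level or "" if all levels passed)."""
    for name, verify in levels:
        sample = samples.get(name)
        if sample is None or not verify(sample):
            return False, name
    return True, ""


# Stub verifiers standing in for real password and biometric matchers (assumed).
LEVELS: List[Tuple[str, Verifier]] = [
    ("password", lambda s: s == "correct-horse"),
    ("thumbprint", lambda s: s == "enrolled-thumb"),
    ("face", lambda s: s == "enrolled-face"),
    ("speech", lambda s: s == "enrolled-voice"),
    ("vein", lambda s: s == "enrolled-vein"),
]

good = {
    "password": "correct-horse",
    "thumbprint": "enrolled-thumb",
    "face": "enrolled-face",
    "speech": "enrolled-voice",
    "vein": "enrolled-vein",
}
print(unlock(LEVELS, good))                          # (True, '')
print(unlock(LEVELS, {**good, "face": "stranger"}))  # (False, 'face')
```

Placing the cheap password check before the biometric matchers lets a failed attempt stop early; whether the original prototype orders its levels this way is not stated in the abstract.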
2020-03-27
Tamura, Keiichi, Omagari, Akitada, Hashida, Shuichi.  2019.  Novel Defense Method against Audio Adversarial Example for Speech-to-Text Transcription Neural Networks. 2019 IEEE 11th International Workshop on Computational Intelligence and Applications (IWCIA). :115–120.
With the developments in deep learning, the security of neural networks against vulnerabilities has become one of the most urgent research topics in deep learning. There are many types of security countermeasures. Adversarial examples and their defense methods, in particular, have been well studied in recent years. An adversarial example is designed to make neural networks misclassify or produce inaccurate output. Audio adversarial examples are a type of adversarial example whose main target is a speech-to-text transcription neural network. In this study, we propose a new defense method against audio adversarial examples for speech-to-text transcription neural networks. It is difficult to determine whether input waveform data representing a voice is an audio adversarial example. Therefore, the main framework of the proposed defense method is based on a sandbox approach. To evaluate the proposed defense method, we used actual audio adversarial examples that were created against Deep Speech, a speech-to-text transcription neural network. We confirmed that our defense method can identify audio adversarial examples to protect speech-to-text systems.
2019-08-05
Thapliyal, H., Ratajczak, N., Wendroth, O., Labrado, C.  2018.  Amazon Echo Enabled IoT Home Security System for Smart Home Environment. 2018 IEEE International Symposium on Smart Electronic Systems (iSES) (Formerly iNiS). :31–36.

Ever-driven by technological innovation, the Internet of Things (IoT) is continuing its exceptional evolution and growth into the common consumer space. In the wake of these developments, this paper proposes a framework for an IoT home security system that is secure, expandable, and accessible. Congruent with the ideals of the IoT, we are proposing a system utilizing an ultra-low-power wireless sensor network that would interface with a central hub via Bluetooth 4, commonly referred to as Bluetooth Low Energy (BLE), to monitor the home. Additionally, the system would interface with an Amazon Echo to accept user voice commands. The central hub would also act as a web server and host an internet-accessible configuration page from which users could monitor and customize their system. Being internet-connected, the system would be able to notify users of system alarms via SMS or email. Finally, this proof of concept is intended to demonstrate expandability into other areas of home automation or building monitoring functions in general.

2019-02-08
Ivanova, M., Durcheva, M., Baneres, D., Rodríguez, M. E.  2018.  eAssessment by Using a Trustworthy System in Blended and Online Institutions. 2018 17th International Conference on Information Technology Based Higher Education and Training (ITHET). :1–7.

eAssessment uses technology to support online evaluation of students' knowledge and skills. However, challenging problems must be addressed, such as trust between students and teachers in blended and online settings. The TeSLA system proposes an innovative solution to guarantee correct authentication of students and to prove the authorship of their assessment tasks. Technologically, the system is based on the integration of five instruments: face recognition, voice recognition, keystroke dynamics, forensic analysis, and plagiarism detection. The paper analyzes and compares the results of the second pilot, performed in an online and a blended university, which show how trust-driven solutions for eAssessment can be realized.

2019-01-16
Carlini, N., Wagner, D.  2018.  Audio Adversarial Examples: Targeted Attacks on Speech-to-Text. 2018 IEEE Security and Privacy Workshops (SPW). :1–7.
We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar but transcribes as any phrase we choose (recognizing up to 50 characters per second of audio). We apply our white-box iterative optimization-based attack end-to-end to Mozilla's DeepSpeech implementation and show that it has a 100% success rate. The feasibility of this attack introduces a new domain to study adversarial examples.
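The "over 99.9% similar" claim is about how small the added perturbation is. The full paper (not this abstract) measures distortion in decibels relative to the original waveform, dB_x(δ) = dB(δ) − dB(x), where dB(x) = max_i 20·log10|x_i|. The sketch below assumes that definition and only computes the metric on a toy waveform; it does not implement the attack itself.

```python
import math
from typing import Sequence


def db(signal: Sequence[float]) -> float:
    """Peak loudness in decibels: max_i 20*log10(|x_i|).

    Raises ValueError if the signal is all zeros (log of 0 is undefined)."""
    peak = max(abs(v) for v in signal)
    return 20 * math.log10(peak)


def relative_distortion_db(original: Sequence[float], perturbation: Sequence[float]) -> float:
    """dB_x(delta) = dB(delta) - dB(x); more negative means a quieter perturbation."""
    return db(perturbation) - db(original)


# Toy waveform with peak amplitude 10000 and a perturbation with peak amplitude 100.
x = [10000.0, -8000.0, 3000.0, -10000.0]
delta = [100.0, -50.0, 20.0, -100.0]
print(round(relative_distortion_db(x, delta), 2))  # -40.0
```

A value of −40 dB means the perturbation's peak is 100 times smaller than the signal's peak; lower values correspond to less perceptible changes.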
2018-12-10
Schonherr, L., Zeiler, S., Kolossa, D.  2017.  Spoofing detection via simultaneous verification of audio-visual synchronicity and transcription. 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). :591–598.

Acoustic speaker recognition systems are very vulnerable to spoofing attacks via replayed or synthesized utterances. One possible countermeasure is audio-visual speaker recognition. Nevertheless, the addition of the visual stream alone does not prevent spoofing attacks completely and only provides further information to assess the authenticity of the utterance. Many systems consider audio and video modalities independently and can easily be spoofed by imitating only a single modality or by a bimodal replay attack with a victim's photograph or video. Therefore, we propose the simultaneous verification of the data synchronicity and the transcription in a challenge-response setup. We use coupled hidden Markov models (CHMMs) for text-dependent spoofing detection and introduce new features that provide information about the transcriptions of the utterance and the synchronicity of both streams. We evaluate the features for various spoofing scenarios and show that the combination of the features leads to more robust recognition, also in comparison with the baseline method. Additionally, by evaluating the data on unseen speakers, we show the spoofing detection to be applicable in speaker-independent use cases.
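The paper scores audio-visual synchronicity with coupled HMMs. As a much simpler stand-in (explicitly not the authors' method), synchronicity can be illustrated by correlating a per-frame audio energy track with a per-frame mouth-opening track and searching for the lag that maximizes the correlation. The feature tracks below are made-up toy values, offset by one frame.

```python
import math
from typing import Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def best_lag(audio: Sequence[float], video: Sequence[float], max_lag: int) -> Tuple[int, float]:
    """Find the lag maximizing corr(audio[t], video[t + lag]) for |lag| <= max_lag."""
    best = (0, -1.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, v = audio[: len(audio) - lag], video[lag:]
        else:
            a, v = audio[-lag:], video[: len(video) + lag]
        n = min(len(a), len(v))
        r = pearson(a[:n], v[:n])
        if r > best[1]:
            best = (lag, r)
    return best


# Toy per-frame tracks: audio energy and mouth opening, offset by one frame.
audio_energy = [0, 1, 3, 1, 0, 0, 2, 4, 2, 0]
mouth_opening = [0, 0, 1, 3, 1, 0, 0, 2, 4, 2]
lag, r = best_lag(audio_energy, mouth_opening, max_lag=3)
print(lag, round(r, 3))  # 1 1.0
```

A replay that pairs a victim's video with unrelated audio would typically yield a low peak correlation or an implausible lag; the CHMM features in the paper additionally tie this check to the expected transcription.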

2018-06-07
Jiao, X., Luo, M., Lin, J. H., Gupta, R. K.  2017.  An assessment of vulnerability of hardware neural networks to dynamic voltage and temperature variations. 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). :945–950.

As a problem-solving method, neural networks have shown broad applicability, from medical applications to speech recognition and natural language processing. This success has even led to the implementation of neural network algorithms in hardware. In this paper, we explore two questions: (a) to what extent microelectronic variations affect the quality of results produced by neural networks; and (b) whether the answer to the first question represents an opportunity to optimize the implementation of neural network algorithms. Regarding the first question, variations are now increasingly common in aggressive process nodes and typically manifest as an increased frequency of timing errors. Combating variations (due to process and/or operating conditions) usually results in increased guardbands in circuit and architectural design, thus reducing the gains from process technology advances. Given the inherent resilience of neural networks due to the adaptation of their learning parameters, one would expect the quality of results produced by neural networks to be relatively insensitive to the rising timing error rates caused by increased variations. On the contrary, using two frequently used neural networks (MLP and CNN), our results show that variations can significantly affect inference accuracy. This paper outlines our assessment methodology and its cross-layer evaluation approach, which extracts hardware-level errors from twenty different operating conditions and then injects those errors back into the software layer in an attempt to answer the second question posed above.
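The cross-layer approach injects hardware-level errors back into the software model. A minimal sketch of that injection step, assuming a hypothetical error model in which each 16-bit fixed-point activation suffers one random bit flip with probability `error_rate` (the paper instead derives its errors from twenty measured operating conditions):

```python
import random
from typing import List

WIDTH = 16      # bits per fixed-point value (assumed)
FRAC_BITS = 8   # fractional bits (assumed Q7.8-style format)


def to_fixed(x: float) -> int:
    """Quantize x to a WIDTH-bit two's-complement pattern (saturating), as an unsigned int."""
    q = int(round(x * (1 << FRAC_BITS)))
    q = max(-(1 << (WIDTH - 1)), min((1 << (WIDTH - 1)) - 1, q))
    return q & ((1 << WIDTH) - 1)


def from_fixed(u: int) -> float:
    """Interpret a WIDTH-bit pattern as a signed fixed-point number."""
    if u & (1 << (WIDTH - 1)):  # sign bit set
        u -= 1 << WIDTH
    return u / (1 << FRAC_BITS)


def inject_errors(values: List[float], error_rate: float, rng: random.Random) -> List[float]:
    """With probability error_rate per value, flip one uniformly chosen bit (hypothetical model)."""
    out = []
    for v in values:
        u = to_fixed(v)
        if rng.random() < error_rate:
            u ^= 1 << rng.randrange(WIDTH)
        out.append(from_fixed(u))
    return out


activations = [0.5, -1.25, 3.0, 0.0]
rng = random.Random(0)
print(inject_errors(activations, 0.0, rng))  # [0.5, -1.25, 3.0, 0.0]
print(inject_errors(activations, 1.0, rng))  # every value corrupted by one bit flip
```

Running a network's layers through such a function at several error rates and comparing inference accuracy with the error-free run is one way to measure the sensitivity the abstract describes.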

2018-02-27
Zhang, Guoming, Yan, Chen, Ji, Xiaoyu, Zhang, Tianchen, Zhang, Taimin, Xu, Wenyuan.  2017.  DolphinAttack: Inaudible Voice Commands. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. :103–117.

Speech recognition (SR) systems such as Siri or Google Now have become an increasingly popular human-computer interaction method and have turned various systems into voice controllable systems (VCS). Prior work on attacking VCS shows that hidden voice commands that are incomprehensible to people can control these systems. Hidden voice commands, though "hidden", are nonetheless audible. In this work, we design a totally inaudible attack, DolphinAttack, that modulates voice commands on ultrasonic carriers (e.g., f > 20 kHz) to achieve inaudibility. By leveraging the nonlinearity of the microphone circuits, the modulated low-frequency audio commands can be successfully demodulated, recovered, and, more importantly, interpreted by the speech recognition systems. We validated DolphinAttack on popular speech recognition systems, including Siri, Google Now, Samsung S Voice, Huawei HiVoice, Cortana, and Alexa. By injecting a sequence of inaudible voice commands, we show a few proof-of-concept attacks, which include activating Siri to initiate a FaceTime call on an iPhone, activating Google Now to switch the phone to airplane mode, and even manipulating the navigation system in an Audi automobile. We propose hardware and software defense solutions and suggest redesigning voice controllable systems to be resilient to inaudible voice command attacks.
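The demodulation effect can be checked numerically under a simplified model. In this sketch a single 400 Hz tone stands in for the voice command, it is amplitude-modulated onto a 25 kHz carrier, and the microphone is modeled as y = x + B·x² with a hypothetical coefficient B. The transmitted signal has no energy at 400 Hz, but the squared term produces a baseband copy of the tone with amplitude B·m (m is the modulation depth), which the microphone's downstream low-pass stage would pass on to the recognizer.

```python
import cmath
import math

FS = 192_000          # sample rate in Hz (assumed; high enough for all products below)
N = FS // 20          # 0.05 s of samples, so every tone completes a whole number of cycles
F_VOICE = 400.0       # a single audible tone standing in for the voice command (assumption)
F_CARRIER = 25_000.0  # ultrasonic carrier, above the ~20 kHz limit of human hearing
DEPTH = 0.5           # amplitude-modulation depth m
B = 0.5               # hypothetical second-order coefficient of the microphone front end


def transmitted_sample(n: int) -> float:
    """AM signal (1 + m*v(t)) * cos(2*pi*fc*t); all of its energy lies near 25 kHz."""
    t = n / FS
    voice = math.cos(2 * math.pi * F_VOICE * t)
    return (1 + DEPTH * voice) * math.cos(2 * math.pi * F_CARRIER * t)


def microphone(x: float) -> float:
    """Toy nonlinear microphone: y = x + B*x^2 (real circuits have more terms)."""
    return x + B * x * x


def tone_amplitude(samples: list, freq: float) -> float:
    """Amplitude of the component at `freq`, via a single-bin discrete Fourier transform."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * k / FS) for k, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)


sent = [transmitted_sample(n) for n in range(N)]
recorded = [microphone(x) for x in sent]

print(f"{tone_amplitude(sent, F_VOICE):.4f}")      # 0.0000: nothing audible is transmitted
print(f"{tone_amplitude(recorded, F_VOICE):.4f}")  # 0.2500: B*m, the recovered command tone
```

Expanding the square shows why: (1 + m·v)²·cos²(2πf_c t) contains the baseband term m·v(t), so any square-law component in the receive chain recreates the audible command.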

2017-12-20
Yamaguchi, M., Kikuchi, H.  2017.  Audio-CAPTCHA with distinction between random phoneme sequences and words spoken by multi-speaker. 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC). :3071–3076.
Audio-CAPTCHA prevents malicious bots from attacking Web services and provides Web accessibility for visually impaired persons. Most conventional methods employ statistical noise to distort sounds and require users to remember and spell the words, which is difficult and laborious for humans. In this paper, instead of distortion with statistical noise, we exploit the difficulty that ASR machines have with speaker-independent recognition. Our scheme synthesizes various voices by changing the voice speed, the pitch, and the native language of the speakers. Moreover, we employ semantic identification problems that distinguish random phoneme sequences from meaningful words; this frees users from remembering and spelling words and so improves both human accuracy and usability. We also evaluated our scheme in several experiments.
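The challenge format described above can be sketched as a generator that mixes meaningful words with random phoneme strings and randomizes the synthesis parameters of each item. Everything here is assumed for illustration: the word list, syllable inventory, parameter ranges, and the idea that the user answers by selecting which items are real words. Actual speech synthesis is out of scope and is represented only by the per-item parameters.

```python
import random
from dataclasses import dataclass
from typing import List, Set

WORDS = ["apple", "river", "candle", "window"]         # hypothetical vocabulary
SYLLABLES = ["ka", "mo", "ri", "te", "su", "no", "pa"]  # hypothetical phoneme inventory
SPEAKER_LANGS = ["en", "ja", "de", "fr"]                # hypothetical native languages of voices


@dataclass
class Item:
    text: str
    is_word: bool
    speed: float       # speaking-rate factor for a synthesizer (assumed range 0.8 to 1.3)
    pitch: float       # pitch shift in semitones (assumed range -4 to +4)
    speaker_lang: str


def random_phoneme_string(rng: random.Random, length: int = 3) -> str:
    """Concatenate random syllables, rejecting any result that happens to be a real word."""
    while True:
        s = "".join(rng.choice(SYLLABLES) for _ in range(length))
        if s not in WORDS:
            return s


def make_challenge(rng: random.Random, n_items: int = 4, n_words: int = 2) -> List[Item]:
    items = []
    for i in range(n_items):
        is_word = i < n_words
        text = rng.choice(WORDS) if is_word else random_phoneme_string(rng)
        items.append(Item(text, is_word, rng.uniform(0.8, 1.3), rng.uniform(-4.0, 4.0),
                          rng.choice(SPEAKER_LANGS)))
    rng.shuffle(items)  # hide which positions hold real words
    return items


def verify(items: List[Item], selected: Set[int]) -> bool:
    """Pass only if the user selected exactly the positions of the meaningful words."""
    return selected == {i for i, item in enumerate(items) if item.is_word}


challenge = make_challenge(random.Random(7))
correct = {i for i, item in enumerate(challenge) if item.is_word}
print(verify(challenge, correct), verify(challenge, set()))  # True False
```

Asking the user to classify items, rather than transcribe them, reflects the abstract's point that spelling words back is laborious for humans.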
2015-05-04
Luque, J., Anguera, X.  2014.  On the modeling of natural vocal emotion expressions through binary key. Signal Processing Conference (EUSIPCO), 2014 Proceedings of the 22nd European. :1562–1566.

This work presents a novel method to estimate naturally expressed emotions in speech through binary acoustic modeling. Standard acoustic features are mapped to a binary-valued representation, and a support vector regression model is used to correlate them with the three continuous emotional dimensions. Three different sets of speech features, two based on spectral parameters and one on prosody, are compared on the VAM corpus, a set of spontaneous dialogues from a German TV talk show. The regression analysis, in terms of correlation coefficient and mean absolute error, shows that binary key modeling is able to successfully capture speaker emotion characteristics. The proposed algorithm obtains results comparable to those reported in the literature while relying on a much smaller set of acoustic descriptors. Furthermore, we also report preliminary results based on the combination of the binary models, which brings further performance improvements.
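The abstract does not spell out how features are mapped to binary values. The sketch below assumes a simplified binary-key construction along the lines of earlier binary-key speaker modeling: a small set of anchor vectors stands in for the background model's components, Euclidean distance replaces likelihood, each frame votes for its nearest anchors, and the most-voted anchors are set to 1. The anchors, frames, and parameter values are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def binary_key(frames: List[Vector], anchors: List[Vector],
               top_per_frame: int = 2, key_ones: int = 2) -> List[int]:
    """Map a variable-length sequence of feature frames to a fixed-length binary vector.

    Each frame votes for its `top_per_frame` nearest anchors; the `key_ones`
    anchors with the most votes (ties broken by index, zero-vote anchors
    excluded) are set to 1."""
    votes = [0] * len(anchors)
    for f in frames:
        ranked = sorted(range(len(anchors)), key=lambda j: math.dist(f, anchors[j]))
        for j in ranked[:top_per_frame]:
            votes[j] += 1
    order = sorted(range(len(anchors)), key=lambda j: (-votes[j], j))
    winners = {j for j in order[:key_ones] if votes[j] > 0}
    return [1 if j in winners else 0 for j in range(len(anchors))]


# Toy 1-D "features" and five anchors standing in for background-model components.
anchors = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
low_frames = [(0.1,), (0.9,), (0.2,)]
high_frames = [(3.1,), (3.8,), (4.0,)]
print(binary_key(low_frames, anchors))   # [1, 1, 0, 0, 0]
print(binary_key(high_frames, anchors))  # [0, 0, 0, 1, 1]
```

Because every utterance maps to a vector of the same length regardless of duration, such keys can serve as fixed-size inputs to a regressor like the support vector regression mentioned in the abstract.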

Ghatak, S., Lodh, A., Saha, E., Goyal, A., Das, A., Dutta, S.  2014.  Development of a keyboardless social networking website for visually impaired: SocialWeb. Global Humanitarian Technology Conference - South Asia Satellite (GHTC-SAS), 2014 IEEE. :232–236.

Over the past decade, we have witnessed a huge upsurge in social networking, which continues to touch and transform our lives to the present day. Social networks help us communicate with acquaintances and friends with whom we share similar interests on a common platform. Globally, there are more than 200 million visually impaired people. Visual impairment brings many issues, but the one that stands out is the lack of accessible content for entertainment and for socializing safely. This paper deals with the development of a keyboardless social networking website for the visually impaired. The term keyboardless signifies minimal use of the keyboard: the user explores the contents of the website using assistive technologies such as screen readers and speech-to-text (STT) conversion, which in turn provides a user-friendly experience for the target audience. As soon as a user with minimal computer proficiency opens the website, he/she identifies the username and password fields with the help of a screen reader. The user speaks the username aloud, and with the help of STT conversion (using the Web Speech API), the username is entered. Control then moves to the password field; similarly, the user's password is obtained and matched with the one saved in the website database. Acoustic fingerprinting has been implemented to validate the passwords of registered users and to foil malicious attackers. On a successful password match, the user is able to enjoy the services of the website without any further hassle. Once the access obstacles associated with social networking sites are resolved and the proper technologies are put in place, social networking sites can be a rewarding, fulfilling, and enjoyable experience for visually impaired people.
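The abstract does not say how the acoustic fingerprint is computed. As an illustration only, the sketch below assumes a toy energy-difference fingerprint (one bit per pair of consecutive frames, set when frame energy rises) compared against the enrolled fingerprint by bit error rate. The frame length, threshold, and sample values are hypothetical.

```python
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Sum of squared samples in each complete, non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def fingerprint(samples: Sequence[float], frame_len: int = 4) -> List[int]:
    """One bit per consecutive frame pair: 1 if energy increases, else 0 (toy scheme)."""
    e = frame_energies(samples, frame_len)
    return [1 if b > a else 0 for a, b in zip(e, e[1:])]


def matches(enrolled: List[int], attempt: List[int], max_bit_error_rate: float = 0.2) -> bool:
    """Accept if fingerprints have equal length and differ in at most the given fraction of bits."""
    if len(enrolled) != len(attempt) or not enrolled:
        return False
    errors = sum(a != b for a, b in zip(enrolled, attempt))
    return errors / len(enrolled) <= max_bit_error_rate


enrolled_audio = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 3, 3, 3, 3, 0, 0, 0, 0]
louder_repeat = [1.1 * s for s in enrolled_audio]        # same utterance, recorded louder
other_utterance = list(reversed(enrolled_audio))         # a different energy pattern
stored = fingerprint(enrolled_audio)
print(stored)                                            # [1, 1, 0, 1, 0]
print(matches(stored, fingerprint(louder_repeat)))       # True
print(matches(stored, fingerprint(other_utterance)))     # False
```

Comparing bits with a tolerance rather than requiring exact equality allows for small variations between two recordings of the same spoken password.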