Biblio
Ever-driven by technological innovation, the Internet of Things (IoT) is continuing its exceptional evolution and growth into the common consumer space. In the wake of these developments, this paper proposes a framework for an IoT home security system that is secure, expandable, and accessible. Congruent with the ideals of the IoT, we propose a system utilizing an ultra-low-power wireless sensor network that would interface with a central hub via Bluetooth 4, commonly referred to as Bluetooth Low Energy (BLE), to monitor the home. Additionally, the system would interface with an Amazon Echo to accept user voice commands. The central hub would also act as a web server and host an internet-accessible configuration page from which users could monitor and customize their system. Being internet-connected, the system could notify users of system alarms via SMS or email. Finally, this proof of concept is intended to demonstrate expandability into other areas of home automation or building monitoring functions in general.
eAssessment uses technology to support online evaluation of students' knowledge and skills. However, challenging problems, such as trustworthiness among students and teachers in blended and online settings, must be addressed. The TeSLA system proposes an innovative solution to guarantee correct authentication of students and to prove the authorship of their assessment tasks. Technologically, the system is based on the integration of five instruments: face recognition, voice recognition, keystroke dynamics, forensic analysis, and plagiarism detection. The paper aims to analyze and compare the results achieved after the second pilot, performed at an online and a blended university, revealing the realization of trust-driven solutions for eAssessment.
Acoustic speaker recognition systems are very vulnerable to spoofing attacks via replayed or synthesized utterances. One possible countermeasure is audio-visual speaker recognition. Nevertheless, the addition of the visual stream alone does not prevent spoofing attacks completely and only provides further information to assess the authenticity of the utterance. Many systems consider audio and video modalities independently and can easily be spoofed by imitating only a single modality or by a bimodal replay attack with a victim's photograph or video. Therefore, we propose the simultaneous verification of the data synchronicity and the transcription in a challenge-response setup. We use coupled hidden Markov models (CHMMs) for text-dependent spoofing detection and introduce new features that provide information about the transcription of the utterance and the synchronicity of both streams. We evaluate the features for various spoofing scenarios and show that combining the features leads to more robust recognition, also in comparison to the baseline method. Additionally, by evaluating the data on unseen speakers, we show the spoofing detection to be applicable in speaker-independent use cases.
As a problem-solving method, neural networks have shown broad applicability, from medical applications to speech recognition and natural language processing. This success has even led to the implementation of neural network algorithms in hardware. In this paper, we explore two questions: (a) to what extent microelectronic variations affect the quality of results produced by neural networks; and (b) whether the answer to the first question represents an opportunity to optimize the implementation of neural network algorithms. Regarding the first question, variations are now increasingly common in aggressive process nodes and typically manifest as an increased frequency of timing errors. Combating variations (due to process and/or operating conditions) usually results in increased guardbands in circuit and architectural design, thus reducing the gains from process technology advances. Given the inherent resilience of neural networks due to adaptation of their learning parameters, one would expect the quality of results produced by neural networks to be relatively insensitive to the rising timing error rates caused by increased variations. On the contrary, using two frequently used neural networks (MLP and CNN), our results show that variations can significantly affect the inference accuracy. This paper outlines our assessment methodology and the use of a cross-layer evaluation approach that extracts hardware-level errors from twenty different operating conditions and then injects such errors back into the software layer in an attempt to answer the second question posed above.
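The idea of injecting hardware-level errors back into the software layer, as described in the abstract above, can be illustrated with a minimal sketch. The abstract does not specify the error model, so everything here is an assumption made for illustration: errors are modelled as single random bit flips in the 32-bit floating-point encoding of a weight, and the names `flip_bit`, `inject_errors`, and `dot` are hypothetical.

```python
import random
import struct


def flip_bit(value, bit):
    """Flip one bit (0 = lowest mantissa bit, 31 = sign) of a float32 encoding."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


def inject_errors(weights, error_rate, rng):
    """Return a copy of weights where each entry is corrupted with
    probability error_rate (assumed model: one random bit flip per error)."""
    return [
        flip_bit(w, rng.randrange(32)) if rng.random() < error_rate else w
        for w in weights
    ]


def dot(weights, inputs):
    """Output of a single linear neuron, standing in for a full network."""
    return sum(w * x for w, x in zip(weights, inputs))


if __name__ == "__main__":
    rng = random.Random(0)            # fixed seed so runs are repeatable
    weights = [0.5, -0.25, 0.75]      # hypothetical trained weights
    inputs = [1.0, 2.0, 3.0]
    clean = dot(weights, inputs)
    faulty = dot(inject_errors(weights, 0.3, rng), inputs)
    print("clean output:", clean)
    print("faulty output:", faulty)
```

Sweeping `error_rate` and comparing clean and faulty outputs on a real model is one way such a sensitivity study could be set up; a flip in an exponent bit changes a weight far more than a flip in a low mantissa bit, which is one intuition for why accuracy can degrade sharply.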
Speech recognition (SR) systems such as Siri or Google Now have become an increasingly popular human-computer interaction method, and have turned various systems into voice controllable systems (VCS). Prior work on attacking VCS shows that hidden voice commands that are incomprehensible to people can control the systems. Hidden voice commands, though "hidden", are nonetheless audible. In this work, we design a totally inaudible attack, DolphinAttack, that modulates voice commands on ultrasonic carriers (e.g., f > 20 kHz) to achieve inaudibility. By leveraging the nonlinearity of the microphone circuits, the modulated low-frequency audio commands can be successfully demodulated, recovered, and, more importantly, interpreted by the speech recognition systems. We validated DolphinAttack on popular speech recognition systems, including Siri, Google Now, Samsung S Voice, Huawei HiVoice, Cortana, and Alexa. By injecting a sequence of inaudible voice commands, we show a few proof-of-concept attacks, which include activating Siri to initiate a FaceTime call on an iPhone, activating Google Now to switch the phone to airplane mode, and even manipulating the navigation system in an Audi automobile. We propose hardware and software defense solutions, and suggest redesigning voice controllable systems to be resilient to inaudible voice command attacks.
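The demodulation mechanism described in the abstract above can be sketched numerically. This is a toy model under stated assumptions, not the authors' implementation: the voice command is replaced by a single 400 Hz tone, the carrier is 30 kHz, the microphone is modelled as `a1*x + a2*x**2`, and the low-pass stage is a plain moving average. All constants and function names are hypothetical.

```python
import math

SAMPLE_RATE = 192_000   # Hz, assumed; high enough to represent the carrier
CARRIER_HZ = 30_000     # assumed ultrasonic carrier (above 20 kHz)
TONE_HZ = 400           # assumed tone standing in for a voice command
NUM_SAMPLES = 19_200    # 0.1 s of signal
WINDOW = 32             # 32 samples = exactly 5 carrier periods at these rates


def command_tone(n):
    """Baseband 'command' sample at index n."""
    return math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE)


def am_modulate(depth=0.8):
    """Amplitude-modulate the command tone onto the ultrasonic carrier."""
    return [
        (1.0 + depth * command_tone(n))
        * math.cos(2 * math.pi * CARRIER_HZ * n / SAMPLE_RATE)
        for n in range(NUM_SAMPLES)
    ]


def microphone(signal, a1=1.0, a2=0.5):
    """Assumed microphone model: linear gain plus a quadratic term."""
    return [a1 * x + a2 * x * x for x in signal]


def low_pass(signal, window=WINDOW):
    """Moving average; suppresses the carrier and its second harmonic."""
    running = sum(signal[:window])
    out = [running / window]
    for i in range(window, len(signal)):
        running += signal[i] - signal[i - window]
        out.append(running / window)
    return out


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def recovered_correlation(a2):
    """Correlation between the filtered microphone output and the command."""
    recovered = low_pass(microphone(am_modulate(), a2=a2))
    # Align the reference with the centre of each averaging window.
    reference = [command_tone(i + WINDOW // 2) for i in range(len(recovered))]
    return pearson(recovered, reference)


if __name__ == "__main__":
    print("quadratic microphone:", recovered_correlation(0.5))
    print("purely linear microphone:", recovered_correlation(0.0))
```

Squaring the modulated signal produces a term proportional to the command tone at baseband, so in this toy setup the quadratic microphone yields an output strongly correlated with the command, while the purely linear one leaves only ultrasonic residue that the low-pass stage removes.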
This work presents a novel method to estimate naturally expressed emotions in speech through binary acoustic modeling. Standard acoustic features are mapped to a binary value representation, and a support vector regression model is used to correlate them with the three continuous emotional dimensions. Three different sets of speech features, two based on spectral parameters and one on prosody, are compared on the VAM corpus, a set of spontaneous dialogues from a German TV talk show. The regression analysis, in terms of correlation coefficient and mean absolute error, shows that the binary key modeling is able to successfully capture speaker emotion characteristics. The proposed algorithm obtains results comparable to those reported in the literature while relying on a much smaller set of acoustic descriptors. Furthermore, we report preliminary results based on the combination of the binary models, which brings further performance improvements.
Over the past decade, we have witnessed a huge upsurge in social networking, which continues to touch and transform our lives to the present day. Social networks help us communicate, on a common platform, with acquaintances and friends who share our interests. Globally, there are more than 200 million visually impaired people. Visual impairment has many issues associated with it, but the one that stands out is the lack of accessibility to content for entertainment and for socializing safely. This paper deals with the development of a keyboard-less social networking website for the visually impaired. The term keyboard-less signifies minimal use of the keyboard: the user explores the contents of the website using assistive technologies such as screen readers and speech-to-text (STT) conversion, which in turn provide a user-friendly experience for the target audience. As soon as a user with minimal computer proficiency opens this website, he/she identifies the username and password fields with the help of a screen reader. The user speaks his/her username, and it is entered with the help of STT conversion (using the Web Speech API). Control then moves to the password field and, similarly, the user's password is obtained and matched with the one saved in the website database. The concept of acoustic fingerprinting has been implemented to validate the passwords of registered users and to foil malicious attackers. On a successful match of the passwords, the user is able to enjoy the services of the website without any further hassle. Once the access obstacles associated with social networking sites are resolved and proper technologies are put in place, social networking sites can be a rewarding, fulfilling, and enjoyable experience for visually impaired people.