Biblio

Filters: Keyword is audio
2017-10-18
Gingold, Mathew, Schiphorst, Thecla, Pasquier, Philippe.  2017.  Never Alone: A Video Agents Based Generative Audio-Visual Installation. Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems. :1425–1430.

Never Alone (2016) is a generative large-scale urban-screen video and sound installation that presents generative choreographies among multiple video agents, or "digital performers". The installation questions how we navigate urban spaces, and the ubiquity and disruptive nature of encounters within the city's landscapes. The video agents explore precarious movement paths along the façade, inhabiting landscapes that are both architectural and emotional.

2017-03-07
Ruan, Wenjie, Sheng, Quan Z., Yang, Lei, Gu, Tao, Xu, Peipei, Shangguan, Longfei.  2016.  AudioGest: Enabling Fine-grained Hand Gesture Detection by Decoding Echo Signal. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing. :474–485.
Hand gestures are becoming an increasingly popular means of interacting with consumer electronic devices such as mobile phones, tablets, and laptops. In this paper, we present AudioGest, a device-free gesture recognition system that can accurately sense in-air hand movement around a user's device. Unlike the state of the art, AudioGest achieves fine-grained hand detection using only a single built-in speaker and microphone pair, without any extra hardware, infrastructure support, or training. Our system accurately recognizes various hand gestures and estimates the hand's in-air time, average moving speed, and waving range. We achieve this by transforming the device into an active sonar system that transmits an inaudible audio signal and decodes the hand's echoes at its microphone. We address several challenges, including cleaning the noisy reflected sound signal, interpreting the echo spectrogram as hand gestures, decoding Doppler frequency shifts into hand waving speed and range, and remaining robust to environmental motion and signal drift. We implement a proof-of-concept prototype on three different electronic devices and extensively evaluate the system in four real-world scenarios using 3,900 hand gestures collected by five users over more than two weeks. Our results show that AudioGest detects six hand gestures with up to 96% accuracy and, by distinguishing gesture attributes, can provide up to 162 control commands for various applications.
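The core sensing idea described in the abstract, turning the speaker and microphone into an active sonar and reading hand speed off the Doppler shift of the echo, can be illustrated with a short, hedged sketch. This is not the AudioGest implementation: the 19 kHz pilot tone, the STFT frame size, and the centroid-based shift estimate are all assumptions made for illustration.

    # Illustrative sketch of Doppler-based speed estimation from the echo
    # of a continuous inaudible tone. Not AudioGest's actual pipeline; the
    # 19 kHz tone, frame size, and centroid estimator are assumptions.
    import numpy as np
    from scipy.signal import stft

    SPEED_OF_SOUND = 343.0  # m/s in air at room temperature

    def doppler_speeds(mic, fs, f0=19000.0, nperseg=4096, band=200.0):
        # Short-time spectrum of the microphone signal recorded while the
        # speaker emits a steady tone at f0.
        freqs, _, Z = stft(mic, fs=fs, nperseg=nperseg)
        mag = np.abs(Z)
        # Keep only a narrow band around the tone; a moving hand shows up
        # as energy shifted away from f0.
        mask = (freqs > f0 - band) & (freqs < f0 + band)
        sub_f, sub_m = freqs[mask], mag[mask, :]
        # Spectral centroid per frame as a crude shift estimate; the sign
        # indicates whether the hand moves toward or away from the device.
        centroid = (sub_f[:, None] * sub_m).sum(axis=0) / (sub_m.sum(axis=0) + 1e-12)
        shift = centroid - f0
        # Round-trip Doppler: an echo off a reflector moving at speed v is
        # shifted by roughly 2 * v * f0 / c, so invert that relation.
        return shift * SPEED_OF_SOUND / (2.0 * f0)

In this toy version the spectral centroid near the pilot tone stands in for the paper's full echo-spectrogram interpretation; the denoising and gesture segmentation steps the abstract mentions are omitted.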
2015-05-04
Su, Hui, Hajj-Ahmad, Adi, Wu, Min, Oard, Douglas W..  2014.  Exploring the use of ENF for multimedia synchronization. Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. :4613–4617.

The electric network frequency (ENF) signal can be captured in multimedia recordings due to electromagnetic influences from the power grid at the time of recording. Recent work has exploited ENF signals for forensic applications, such as authenticating and detecting forgery of ENF-containing multimedia signals, and inferring their time and location of creation. In this paper, we explore a new use of ENF signals: automatic synchronization of audio and video. Because the ENF signal is a time-varying random process, it can serve as a timing fingerprint of multimedia signals, and audio and video recordings can be synchronized by aligning their embedded ENF signals. We demonstrate the proposed scheme with two applications: multi-view video synchronization and synchronization of historical audio recordings. The experimental results show that the ENF-based synchronization approach is effective and has the potential to solve problems that are intractable by other existing methods.
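A minimal sketch of the alignment idea follows, assuming each ENF trace is estimated as the per-frame spectral peak near the grid's nominal frequency and the two traces are then cross-correlated to find the time offset. The 60 Hz nominal frequency, the one-second frames, and both helper functions are illustrative assumptions, not details from the paper.

    # Sketch of ENF-based alignment of two recordings, assuming a 60 Hz
    # grid and one-second analysis frames. Illustrative only, not the
    # authors' implementation.
    import numpy as np

    def estimate_enf(audio, fs, nominal=60.0, frame_sec=1.0, band=1.0):
        # Per-frame ENF estimate: the peak FFT frequency within +/- band Hz
        # of the nominal grid frequency.
        frame_len = int(fs * frame_sec)
        n_frames = len(audio) // frame_len
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
        mask = (freqs > nominal - band) & (freqs < nominal + band)
        window = np.hanning(frame_len)
        enf = np.empty(n_frames)
        for i in range(n_frames):
            frame = audio[i * frame_len:(i + 1) * frame_len]
            spectrum = np.abs(np.fft.rfft(frame * window))
            enf[i] = freqs[mask][np.argmax(spectrum[mask])]
        return enf

    def enf_offset(enf_a, enf_b):
        # Frame lag of trace b relative to trace a that maximizes the
        # cross-correlation of the two zero-mean ENF traces.
        a, b = enf_a - enf_a.mean(), enf_b - enf_b.mean()
        xcorr = np.correlate(a, b, mode="full")
        return np.argmax(xcorr) - (len(b) - 1)

Multiplying the returned frame lag by the frame duration gives the estimated offset in seconds between the two recordings.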

2014-09-26
Bursztein, E., Bethard, S., Fabry, C., Mitchell, J.C., Jurafsky, D..  2010.  How Good Are Humans at Solving CAPTCHAs? A Large Scale Evaluation. Security and Privacy (SP), 2010 IEEE Symposium on. :399–413.

Captchas are designed to be easy for humans but hard for machines. However, most recent research has focused only on making them hard for machines. In this paper, we present what is, to the best of our knowledge, the first large-scale evaluation of captchas from the human perspective, with the goal of assessing how much friction captchas present to the average user. For the purposes of this study, we asked workers from Amazon's Mechanical Turk and an underground captcha-breaking service to solve more than 318,000 captchas issued by the 21 most popular captcha schemes (13 image schemes and 8 audio schemes). Analysis of the resulting data reveals that captchas are often difficult for humans, with audio captchas being particularly problematic. We also find some demographic trends indicating, for example, that non-native speakers of English are slower in general and less accurate on English-centric captcha schemes. Evidence from a week's worth of eBay captchas (14,000,000 samples) suggests that the solving accuracies found in our study are close to real-world values, and that improving audio captchas should become a priority, as nearly 1% of all captchas are delivered as audio rather than images. Finally, our study also reveals that it is more effective for an attacker to use Mechanical Turk to solve captchas than an underground service.