Title | Hello, Is It Me You're Looking For?: Differentiating Between Human and Electronic Speakers for Voice Interface Security |
Publication Type | Conference Paper |
Year of Publication | 2018 |
Authors | Blue, Logan; Vargas, Luis; Traynor, Patrick |
Conference Name | Proceedings of the 11th ACM Conference on Security & Privacy in Wireless and Mobile Networks |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-5731-9 |
Keywords | composability, Human Behavior, Internet of Things, IoT Security 2018, Metrics, pubcrawl, Resiliency, Voice interface |
Abstract | Voice interfaces are increasingly becoming integrated into a variety of Internet of Things (IoT) devices. Such systems can dramatically simplify interactions between users and devices with limited displays. Unfortunately, voice interfaces also create new opportunities for exploitation. Specifically, any sound-emitting device within range of the system implementing the voice interface (e.g., a smart television, an Internet-connected appliance, etc.) can potentially cause these systems to perform operations against the desires of their owners (e.g., unlock doors, make unauthorized purchases, etc.). We address this problem by developing a technique to recognize fundamental differences in audio created by humans and electronic speakers. We identify sub-bass over-excitation, or the presence of significant low-frequency signals that are outside the range of human voices but inherent to the design of modern speakers, as a strong differentiator between these two sources. After identifying this phenomenon, we demonstrate its use in preventing adversarial requests, replayed audio, and hidden commands with a 100%/1.72% TPR/FPR in quiet environments. In so doing, we demonstrate that commands injected via nearby audio devices can be effectively removed by voice interfaces. |
URL | http://doi.acm.org/10.1145/3212480.3212505 |
DOI | 10.1145/3212480.3212505 |
Citation Key | blue_hello_2018 |
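
The sub-bass over-excitation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 20-80 Hz sub-bass band, the 20-250 Hz reference band, and the 0.5 decision threshold are all assumptions chosen for illustration. The sketch only measures how much of a recording's low-frequency energy sits below the typical range of the human voice.

```python
import numpy as np


def sub_bass_energy_ratio(signal, sample_rate,
                          sub_bass_hz=(20.0, 80.0),
                          reference_hz=(20.0, 250.0)):
    """Fraction of reference-band energy that falls in the sub-bass band.

    Both band limits are illustrative assumptions, not values from the paper.
    """
    samples = np.asarray(signal, dtype=float)
    # Power spectrum of the real-valued signal.
    power = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)

    sub_mask = (freqs >= sub_bass_hz[0]) & (freqs < sub_bass_hz[1])
    ref_mask = (freqs >= reference_hz[0]) & (freqs < reference_hz[1])

    reference_energy = power[ref_mask].sum()
    if reference_energy == 0:
        return 0.0
    return float(power[sub_mask].sum() / reference_energy)


def looks_like_electronic_speaker(signal, sample_rate, threshold=0.5):
    """Flag audio whose sub-bass share exceeds a hypothetical threshold."""
    return sub_bass_energy_ratio(signal, sample_rate) > threshold
```

In practice, a threshold like this would have to be calibrated on recordings of real voices and real loudspeakers; the abstract reports its detection rates for quiet environments only.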