Title | Study on Possibility of Estimating Smartphone Inputs from Tap Sounds |
Publication Type | Conference Paper |
Year of Publication | 2020 |
Authors | Ouchi, Yumo, Okudera, Ryosuke, Shiomi, Yuya, Uehara, Kota, Sugimoto, Ayaka, Ohki, Tetsushi, Nishigaki, Masakatsu |
Conference Name | 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) |
Date Published | Dec. 2020 |
Publisher | IEEE |
ISBN Number | 978-988-14768-8-3 |
Keywords | Human Behavior, Keyboards, keystroke analysis, Libraries, Mel frequency cepstral coefficient, Metrics, microphones, Pins, pubcrawl, side-channel attacks, Training data |
Abstract | Smartphone keystrokes are subject to side-channel attacks in which the input can be inferred from tapping sounds. Ilia et al. reported that keystrokes can be predicted with 61% accuracy from tapping sounds captured by the built-in microphone of a legitimate user's device. Li et al. reported that by emitting sonar sounds from the built-in speaker of an attacker's smartphone and analyzing the waves reflected from the legitimate user's finger at the time of tap input, keystrokes can be estimated with 90% accuracy. However, the method proposed by Ilia et al. requires prior penetration of the target smartphone, and the attack scenario lacks plausibility; if the legitimate user's smartphone can be penetrated, a keylogger could directly acquire the user's keystrokes. In addition, the method proposed by Li et al. is a side-channel attack in which the attacker actively interferes with the legitimate user's device, which can be described as an active attack scenario. Herein, we analyze the extent to which a user's keystrokes are leaked to the attacker in a passive attack scenario, where the attacker wiretaps the sounds of the legitimate user's keystrokes using an external microphone. First, we limited the keystrokes to personal identification number (PIN) input. Subsequently, mel-frequency cepstral coefficients (MFCCs) of the tapping sound data were represented as image data. Consequently, we found that the input can be discriminated with high accuracy by using a convolutional neural network (CNN) to estimate the key input.
URL | https://ieeexplore.ieee.org/document/9306453 |
Citation Key | ouchi_study_2020 |
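
The abstract describes a pipeline of tap-sound audio, MFCC features rendered as images, and a CNN classifier over PIN digits. The following is a minimal sketch of that kind of pipeline, not the authors' implementation: the use of librosa and PyTorch, the sample rate, MFCC and frame counts, the network shape, and all names (`tap_to_mfcc_image`, `TapCNN`) are illustrative assumptions.

```python
# Sketch: tap-sound clip -> MFCC "image" -> small CNN over PIN digits 0-9.
# All parameters below are assumed for illustration, not taken from the paper.
import numpy as np
import librosa
import torch
import torch.nn as nn

SR = 44100       # assumed recording sample rate
N_MFCC = 20      # assumed number of MFCC coefficients
N_FRAMES = 32    # assumed fixed number of time frames per tap

def tap_to_mfcc_image(waveform: np.ndarray, sr: int = SR) -> np.ndarray:
    """Convert one tap-sound clip to a fixed-size MFCC 'image' (N_MFCC x N_FRAMES)."""
    mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=N_MFCC)
    # Pad or truncate along the time axis so every tap yields the same shape.
    if mfcc.shape[1] < N_FRAMES:
        mfcc = np.pad(mfcc, ((0, 0), (0, N_FRAMES - mfcc.shape[1])))
    return mfcc[:, :N_FRAMES].astype(np.float32)

class TapCNN(nn.Module):
    """Small CNN over MFCC images; 10 output classes for PIN digits 0-9."""
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * (N_MFCC // 4) * (N_FRAMES // 4), n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

if __name__ == "__main__":
    # Stand-in for one recorded tap: 50 ms of noise (a real attack would use
    # clips wiretapped with an external microphone, as in the paper).
    fake_tap = np.random.randn(int(0.05 * SR)).astype(np.float32)
    image = tap_to_mfcc_image(fake_tap)                        # shape (20, 32)
    logits = TapCNN()(torch.from_numpy(image)[None, None])     # shape (1, 10)
    print("predicted digit:", int(logits.argmax(dim=1)))
```

In this sketch the fixed-size MFCC matrix plays the role of the "image data" mentioned in the abstract; an actual attack would train the CNN on labeled tap recordings for each PIN key rather than random noise.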