Study on Possibility of Estimating Smartphone Inputs from Tap Sounds

Title: Study on Possibility of Estimating Smartphone Inputs from Tap Sounds
Publication Type: Conference Paper
Year of Publication: 2020
Authors: Ouchi, Yumo, Okudera, Ryosuke, Shiomi, Yuya, Uehara, Kota, Sugimoto, Ayaka, Ohki, Tetsushi, Nishigaki, Masakatsu
Conference Name: 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Date Published: Dec. 2020
Publisher: IEEE
ISBN Number: 978-988-14768-8-3
Keywords: Human Behavior, Keyboards, keystroke analysis, Libraries, Mel frequency cepstral coefficient, Metrics, microphones, Pins, side-channel attacks, Training data
Abstract: Smartphone keystrokes are vulnerable to side-channel attacks in which the input is inferred from the tapping sound. Ilia et al. reported that keystrokes can be predicted with 61% accuracy from tapping sounds captured by the built-in microphone of the legitimate user's device. Li et al. reported that by emitting sonar sounds from the attacker smartphone's built-in speaker and analyzing the waves reflected from the legitimate user's finger at the time of tap input, keystrokes can be estimated with 90% accuracy. However, the method proposed by Ilia et al. requires prior compromise of the target smartphone, which makes the attack scenario implausible: if the target smartphone can be compromised, a keylogger could directly acquire the legitimate user's keystrokes. In addition, the method proposed by Li et al. is a side-channel attack in which the attacker actively interferes with the legitimate user's device and therefore constitutes an active attack scenario. Herein, we analyze the extent to which a user's keystrokes are leaked to the attacker in a passive attack scenario, where the attacker eavesdrops on the sounds of the legitimate user's keystrokes using an external microphone. First, we limited the keystrokes to personal identification number (PIN) input. Subsequently, mel-frequency cepstral coefficients (MFCCs) of the tapping sound data were represented as image data. Consequently, we found that the key input can be discriminated with high accuracy using a convolutional neural network.
URL: https://ieeexplore.ieee.org/document/9306453
Citation Key: ouchi_study_2020
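
The abstract outlines a pipeline of the form tap recording -> MFCC "image" -> convolutional classifier over PIN digits. The sketch below is a minimal illustration of that kind of pipeline, not the authors' implementation; the file name, feature dimensions, and network architecture are assumptions made here for illustration.

# Minimal sketch (assumptions, not the paper's code): PIN-entry tap sounds are
# converted to fixed-size MFCC "images" and classified by a small CNN over 10 digits.
import librosa
import numpy as np
import torch
import torch.nn as nn

N_MFCC = 40        # assumed number of MFCC coefficients
N_FRAMES = 32      # assumed fixed number of time frames per tap clip
NUM_CLASSES = 10   # PIN digits 0-9

def tap_to_mfcc_image(wav_path: str) -> np.ndarray:
    """Load one recorded tap and convert it to a fixed-size MFCC 'image'."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)
    # Pad or crop along the time axis so every sample has the same shape.
    if mfcc.shape[1] < N_FRAMES:
        mfcc = np.pad(mfcc, ((0, 0), (0, N_FRAMES - mfcc.shape[1])))
    return mfcc[:, :N_FRAMES].astype(np.float32)

class TapCNN(nn.Module):
    """Small convolutional classifier over single-channel MFCC images."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * (N_MFCC // 4) * (N_FRAMES // 4), NUM_CLASSES)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Example: score a single tap recording (hypothetical path). In practice the
# network must first be trained on labeled tap recordings for the prediction
# to be meaningful.
model = TapCNN()
image = torch.from_numpy(tap_to_mfcc_image("tap_0001.wav")).unsqueeze(0).unsqueeze(0)
digit = model(image).argmax(dim=1).item()
print(f"predicted PIN digit: {digit}")

Treating the MFCC matrix as a single-channel image is one straightforward way to realize the "MFCCs represented as image data" step described in the abstract; the exact feature sizes, network depth, and training procedure used in the paper are not specified here.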