Removing Personally Identifiable Information from Shared Dataset for Keystroke Authentication Research

Title: Removing Personally Identifiable Information from Shared Dataset for Keystroke Authentication Research
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Huang, Jiaju; Klee, Bryan; Schuckers, Daniel; Hou, Daqing; Schuckers, Stephanie
Conference Name: 2019 IEEE 5th International Conference on Identity, Security, and Behavior Analysis (ISBA)
ISBN Number: 978-1-7281-0532-1
Keywords: Arrays, authentication, authorisation, bank card numbers, biometrics (access control), Collaboration, Continuous Authentication, conventional authentication methods, data privacy, Electronic mail, free-text authentication algorithm, free-text keystroke authentication, Human Behavior, identity theft, insider threat, insider threats, keystroke analysis, keystroke authentication research, keystroke dataset, keystroke dynamics, message authentication, Metrics, password, personally identifiable information, personally sensitive information, PII detection program, policy-based governance, privacy removal, pubcrawl, resilience, Resiliency, social security numbers, Task Analysis, text analysis, user authentication systems
Abstract

Research on keystroke dynamics has good potential to offer continuous authentication that complements conventional authentication methods in combating insider threats and identity theft before more harm can be done to genuine users. Unfortunately, the large amount of data required by free-text keystroke authentication often contains personally identifiable information (PII) and personally sensitive information, such as a user's first and last name, the username and password for an account, bank card numbers, and social security numbers. As a result, keystroke data carry privacy risks that must be mitigated before the data are shared with other researchers. We conduct a systematic study to remove PII from a recent large keystroke dataset. We find substantial amounts of PII in the dataset, including names, usernames and passwords, social security numbers, and bank card numbers, which, if leaked, may lead to various harms to the user, including personal embarrassment, blackmail, financial loss, and identity theft. We thoroughly evaluate the effectiveness of our detection program for each kind of PII. We demonstrate that our PII detection program can achieve near-perfect recall at the expense of losing some useful information (lower precision). Finally, we demonstrate that the removal of PII from the original dataset has only a negligible impact on the detection error tradeoff of the free-text authentication algorithm by Gunetti and Picardi. We hope that this experience report will be useful in informing the design of privacy removal in future keystroke-dynamics-based user authentication systems.
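
The recall-versus-precision tradeoff mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the detection program from the paper: the patterns, the function names (`find_pii_spans`, `redact`, `precision_recall`), and the assumption that keystrokes have already been reconstructed into plain text are all hypothetical, chosen only to show how deliberately loose patterns catch more PII while also masking some harmless digit strings.

```python
import re

# Hypothetical patterns that deliberately favor recall over precision.
# They are illustrative only and are not the rules used in the paper.
SSN_LIKE = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")
CARD_LIKE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii_spans(text: str) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, label) spans that look like SSNs or card numbers."""
    spans = []
    for m in SSN_LIKE.finditer(text):
        spans.append((m.start(), m.end(), "ssn_like"))
    for m in CARD_LIKE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):  # the checksum removes some false positives
            spans.append((m.start(), m.end(), "card_like"))
    return sorted(spans)


def redact(text: str, mask: str = "#") -> str:
    """Replace every character inside a detected span with the mask character."""
    chars = list(text)
    for start, end, _label in find_pii_spans(text):
        for i in range(start, end):
            chars[i] = mask
    return "".join(chars)


def precision_recall(predicted: set, actual: set) -> tuple[float, float]:
    """Compute precision and recall of predicted items against labeled items."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall
```

In this sketch, any nine-digit string such as an order number is flagged as SSN-like, so every true SSN in that format is caught (high recall) while some non-sensitive text is also masked (lower precision), which mirrors the tradeoff the abstract reports.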

URL: https://ieeexplore.ieee.org/document/8778628
DOI: 10.1109/ISBA.2019.8778628
Citation Key: huang_removing_2019