Biblio
Keystroke dynamics study the way in which users input text via their keyboards, which is unique to each individual, and can form a component of a behavioral biometric system to improve existing account security. Keystroke dynamics systems on free-text data use n-graphs that measure the timing between consecutive keystrokes to distinguish between users. Many algorithms require 500, 1,000, or more keystrokes to achieve EERs of below 10%. In this paper, we propose an instance-based graph comparison algorithm to reduce the number of keystrokes required to authenticate users. Commonly used features such as monographs and digraphs are investigated. Feature importance is determined and used to construct a fused classifier. Detection error tradeoff (DET) curves are produced with different numbers of keystrokes. The fused classifier outperforms the state-of-the-art with EERs of 7.9%, 5.7%, 3.4%, and 2.7% for test samples of 50, 100, 200, and 500 keystrokes.
One of the basic behavioural biometric methods is keystroke element. Being less expensive and not requiring any extra bit of equipment is the main advantage of keystroke element. The primary concentration of this paper is to give an inevitable review of behavioural biometrics strategies, measurements and different methodologies and difficulties and future bearings specially of keystroke analysis and mouse dynamics. Keystrokes elements frameworks utilize insights, e.g. time between keystrokes, word decisions, word mixes, general speed of writing and so on. Mouse Dynamics is termed as the course of actions captured from the moving mouse by an individual when interacting with a GUI. These are representative factors which may be called mouse dynamics signature of an individual, and may be used for verification of identity of an individual. In this paper, we compare the authentication system based on keystroke dynamics and mouse dynamics.
keystroke dynamics authenticates the system user by analyzing his typing rhythm. Given that each of us has his own typing rhythm and that the method is based on the keyboard makes it available in all computer machines, these two reasons (uniqueness and reduced cost) have made the method very solicit by administrators of security. In addition, the researchers used the method in different fields that are listed later in the paper.
Research on keystroke dynamics has the good potential to offer continuous authentication that complements conventional authentication methods in combating insider threats and identity theft before more harm can be done to the genuine users. Unfortunately, the large amount of data required by free-text keystroke authentication often contain personally identifiable information, or PII, and personally sensitive information, such as a user's first name and last name, username and password for an account, bank card numbers, and social security numbers. As a result, there are privacy risks associated with keystroke data that must be mitigated before they are shared with other researchers. We conduct a systematic study to remove PII's from a recent large keystroke dataset. We find substantial amounts of PII's from the dataset, including names, usernames and passwords, social security numbers, and bank card numbers, which, if leaked, may lead to various harms to the user, including personal embarrassment, blackmails, financial loss, and identity theft. We thoroughly evaluate the effectiveness of our detection program for each kind of PII. We demonstrate that our PII detection program can achieve near perfect recall at the expense of losing some useful information (lower precision). Finally, we demonstrate that the removal of PII's from the original dataset has only negligible impact on the detection error tradeoff of the free-text authentication algorithm by Gunetti and Picardi. We hope that this experience report will be useful in informing the design of privacy removal in future keystroke dynamics based user authentication systems.
The paper considers an issues of protecting data from unauthorized access by users' authentication through keystroke dynamics. It proposes to use keyboard pressure parameters in combination with time characteristics of keystrokes to identify a user. The authors designed a keyboard with special sensors that allow recording complementary parameters. The paper presents an estimation of the information value for these new characteristics and error probabilities of users' identification based on the perceptron algorithms, Bayes' rule and quadratic form networks. The best result is the following: 20 users are identified and the error rate is 0.6%.
Users search for multimedia content with different underlying motivations or intentions. Study of user search intentions is an emerging topic in information retrieval since understanding why a user is searching for a content is crucial for satisfying the user's need. In this paper, we aimed at automatically recognizing a user's intent for image search in the early stage of a search session. We designed seven different search scenarios under the intent conditions of finding items, re-finding items and entertainment. We collected facial expressions, physiological responses, eye gaze and implicit user interactions from 51 participants who performed seven different search tasks on a custom-built image retrieval platform. We analyzed the users' spontaneous and explicit reactions under different intent conditions. Finally, we trained machine learning models to predict users' search intentions from the visual content of the visited images, the user interactions and the spontaneous responses. After fusing the visual and user interaction features, our system achieved the F-1 score of 0.722 for classifying three classes in a user-independent cross-validation. We found that eye gaze and implicit user interactions, including mouse movements and keystrokes are the most informative features. Given that the most promising results are obtained by modalities that can be captured unobtrusively and online, the results demonstrate the feasibility of deploying such methods for improving multimedia retrieval platforms.
Acoustic emanations of computer keyboards represent a serious privacy issue. As demonstrated in prior work, physical properties of keystroke sounds might reveal what a user is typing. However, previous attacks assumed relatively strong adversary models that are not very practical in many real-world settings. Such strong models assume: (i) adversary's physical proximity to the victim, (ii) precise profiling of the victim's typing style and keyboard, and/or (iii) significant amount of victim's typed information (and its corresponding sounds) available to the adversary. This paper presents and explores a new keyboard acoustic eavesdropping attack that involves Voice-over-IP (VoIP), called Skype & Type (S&T), while avoiding prior strong adversary assumptions. This work is motivated by the simple observation that people often engage in secondary activities (including typing) while participating in VoIP calls. As expected, VoIP software acquires and faithfully transmits all sounds, including emanations of pressed keystrokes, which can include passwords and other sensitive information. We show that one very popular VoIP software (Skype) conveys enough audio information to reconstruct the victim's input – keystrokes typed on the remote keyboard. Our results demonstrate that, given some knowledge on the victim's typing style and keyboard model, the attacker attains top-5 accuracy of 91.7% in guessing a random key pressed by the victim. Furthermore, we demonstrate that S&T is robust to various VoIP issues (e.g., Internet bandwidth fluctuations and presence of voice over keystrokes), thus confirming feasibility of this attack. Finally, it applies to other popular VoIP software, such as Google Hangouts.
Web identifiers such as usernames, hashtags, and domain names serve important roles in online navigation, communication, and community building. Therefore the entities that choose such names must ensure that end-users are able to quickly and accurately enter them in applications. Uniqueness requirements, a desire for short strings, and an absence of delimiters often constrain this name selection process. To gain perspective on the speed and correctness of name entry, we crowdsource the typing of 51,000+ web identifiers. Surface level analysis reveals, for example, that typing speed is generally a linear function of identifier length. Examining keystroke dynamics at finer granularity proves more interesting. First, we identify features predictive of typing time/accuracy, finding: (1) the commonality of character bi-grams inside a name, and (2) the degree of ambiguity when tokenizing a name - to be most indicative. A machine-learning model built over 10 such features exhibits moderate predictive capability. Second, we evaluate our hypothesis that users subconsciously insert pauses in their typing cadence where text delimiters (e.g., spaces) would exist, if permitted. The data generally supports this claim, suggesting its application alongside algorithmic tokenization methods, and possibly in name suggestion frameworks.
In this paper, we propose a novel method, based on keystroke dynamics, to distinguish between fake and truthful personal information written via a computer keyboard. Our method does not need any prior knowledge about the user who is providing data. To our knowledge, this is the first work that associates the typing human behavior with the production of lies regarding personal information. Via experimental analysis involving 190 subjects, we assess that this method is able to distinguish between truth and lies on specific types of autobiographical information, with an accuracy higher than 75%. Specifically, for information usually required in online registration forms (e.g., name, surname and email), the typing behavior diverged significantly between truthful or untruthful answers. According to our results, keystroke analysis could have a great potential in detecting the veracity of self-declared information, and it could be applied to a large number of practical scenarios requiring users to input personal data remotely via keyboard.
Free text keystroke dynamics is a behavioral biometric that has the strong potential to offer unobtrusive and continuous user authentication. Unfortunately, due to the limited data availability, free text keystroke dynamics have not been tested adequately. Based on a novel large dataset of free text keystrokes from our ongoing data collection using behavior in natural settings, we present the first study to evaluate keystroke dynamics while respecting the temporal order of the data. Specifically, we evaluate the performance of different ways of forming a test sample using sessions, as well as a form of continuous authentication that is based on a sliding window on the keystroke time series. Instead of accumulating a new test sample of keystrokes, we update the previous sample with keystrokes that occur in the immediate past sliding window of n minutes. We evaluate sliding windows of 1 to 5, 10, and 30 minutes. Our best performer using a sliding window of 1 minute, achieves an FAR of 1% and an FRR of 11.5%. Lastly, we evaluate the sensitivity of the keystroke dynamics algorithm to short quick insider attacks that last only several minutes, by artificially injecting different portions of impostor keystrokes into the genuine test samples. For example, the evaluated algorithm is found to be able to detect insider attacks that last 2.5 minutes or longer, with a probability of 98.4%.
Keystroke Dynamics can be used as an unobtrusive method to enhance password authentication, by checking the typing rhythm of the user. Fixed passwords will give an attacker the possibility to try to learn to mimic the typing behaviour of a victim. In this paper we will investigate the performance of a keystroke dynamic (KD) system when the users have to type given (English) words. Under the assumption that it is easy to type words in your native language and difficult in a foreign language will we also test the performance of such a challenge-based KD system when the challenges are not common English words, but words in the native language of the user. We collected data from participants with 6 different native language backgrounds and had them type random 8-12 character words in each of the 6 languages. The participants also typed random English words and random French words. English was assumed to be a language familiar to all participants, while French was not a native language to any participant and most likely most participants were not fluent in French. Analysis showed that using language dependent words gave a better performance of the challenge-based KD compared to an all English challenge-based system. When using words in a native language, then the performance of the participants with their mother-tongue equal to that native language had a similar performance compared to the all English challenge-based system, but the non-native speakers had an FMR that was significantly lower than the native language speakers. We found that native Telugu speakers had an FMR of less than 1% when writing Spanish or Slovak words. We also found that duration features were best to recognize genuine users, but latency features performed best to recognize non-native impostor users.
In this paper, an innovative approach to keyboard user monitoring (authentication), using keyboard dynamics and founded on the concept of time series analysis, is presented. The work is motivated by the need for robust authentication mechanisms in the context of on-line assessment such as those featured in many online learning platforms. Four analysis mechanisms are considered: analysis of keystroke time series in their raw form (without any translation), analysis consequent to translating the time series into a more compact form using either the Discrete Fourier Transform or the Discrete Wavelet Transform, and a "benchmark" feature vector representation of the form typically used in previous related work. All four mechanisms are fully described and evaluated. A best authentication accuracy of 99% was obtained using the wavelet transform.
Third-party IME (Input Method Editor) apps are often the preference means of interaction for Android users' input. In this paper, we first discuss the insecurity of IME apps, including the Potentially Harmful Apps (PHA) and malicious IME apps, which may leak users' sensitive keystrokes. The current defense system, such as I-BOX, is vulnerable to the prefix-substitution attack and the colluding attack due to the post-IME nature. We provide a deeper understanding that all the designs with the post-IME nature are subject to the prefix-substitution and colluding attacks. To remedy the above post-IME system's flaws, we propose a new idea, pre-IME, which guarantees that "Is this touch event a sensitive keystroke?" analysis will always access user touch events prior to the execution of any IME app code. We designed an innovative TrustZone-based framework named IM-Visor which has the pre-IME nature. Specifically, IM-Visor creates the isolation environment named STIE as soon as a user intends to type on a soft keyboard, then the STIE intercepts, translates and analyzes the user's touch input. If the input is sensitive, the translation of keystrokes will be delivered to user apps through a trusted path. Otherwise, IM-Visor replays non-sensitive keystroke touch events for IME apps or replays non-keystroke touch events for other apps. A prototype of IM-Visor has been implemented and tested with several most popular IMEs. The experimental results show that IM-Visor has small runtime overheads.
Biometrics has become ubiquitous and spurred common use in many authentication mechanisms. Keystroke dynamics is a form of behavioral biometrics that can be used for user authentication while actively working at a terminal. The proposed mechanisms involve digraph, trigraph and n-graph analysis as separate solutions or suggest a fusion mechanism with certain limitations. However, deep learning can be used as a unifying machine learning technique that consolidates the power of all different features since it has shown tremendous results in image recognition and natural language processing. In this paper, we investigate the applicability of deep learning on three different datasets by using convolutional neural networks and Gaussian data augmentation technique. We achieve 10% higher accuracy and 7.3% lower equal error rate (EER) than existing methods. Also, our sensitivity analysis indicates that the convolution operation and the fully-connected layer are the most prominent factors that affect the accuracy and the convergence rate of a network trained with keystroke data.
Authenticating a user based on her unique behavioral bio-metric traits has been extensively researched over the past few years. The most researched behavioral biometrics techniques are based on keystroke and mouse dynamics. These schemes, however, have been shown to be vulnerable to human-based and robotic attacks that attempt to mimic the user's behavioral pattern to impersonate the user. In this paper, we aim to verify the user's identity through the use of active, cognition-based user interaction in the authentication process. Such interaction boasts to provide two key advantages. First, it may enhance the security of the authentication process as multiple rounds of active interaction would serve as a mechanism to prevent against several types of attacks, including zero-effort attack, expert trained attackers, and automated attacks. Second, it may enhance the usability of the authentication process by actively engaging the user in the process. We explore the cognitive authentication paradigm through very simplistic interactive challenges, called Dynamic Cognitive Games, which involve objects floating around within the images, where the user's task is to match the objects with their respective target(s) and drag/drop them to the target location(s). Specifically, we introduce, build and study Gametrics ("Game-based biometrics"), an authentication mechanism based on the unique way the user solves such simple challenges captured by multiple features related to her cognitive abilities and mouse dynamics. Based on a comprehensive data set collected in both online and lab settings, we show that Gametrics can identify the users with a high accuracy (false negative rates, FNR, as low as 0.02) while rejecting zero-effort attackers (false positive rates, FPR, as low as 0.02). Moreover, Gametrics shows promising results in defending against expert attackers that try to learn and later mimic the user's pattern of solving the challenges (FPR for expert human attacker as low as 0.03). Furthermore, we argue that the proposed biometrics is hard to be replayed or spoofed by automated means, such as robots or malware attacks.
The function of query auto-completion in modern search engines is to help users formulate queries fast and precisely. Conventional context-aware methods primarily rank candidate queries according to term- and query- relationships to the context. However, most sessions are extremely short. How to capture search intents with such relationships becomes difficult when the context generally contains only few queries. In this paper, we investigate the feasibility of discovering search intents within short context for query auto-completion. The class distribution of the search session (i.e., issued queries and click behavior) is derived as search intents. Several distribution-based features are proposed to estimate the proximity between candidates and search intents. Finally, we apply learning-to-rank to predict the user's intended query according to these features. Moreover, we also design an ensemble model to combine the benefits of our proposed features and term-based conventional approaches. Extensive experiments have been conducted on the publicly available AOL search engine log. The experimental results demonstrate that our approach significantly outperforms six competitive baselines. The performance of keystrokes is also evaluated in experiments. Furthermore, an in-depth analysis is made to justify the usability of search intent classification for query auto-completion.
Important information regarding the learning experience and relative preparedness of Computer Science students can be obtained by analyzing their coding activity at a fine-grained level, using an online IDE that records student code editing, compiling, and testing activities down to the individual keystroke. We report results from analyses of student coding patterns using such an online IDE. In particular, we gather data from a group of students performing an assigned programming lab, using the online IDE indicated to gather statistics. We extract high-level statistics from the student data, and apply supervised learning techniques to identify those that are the most salient prediction of student success as measured by later performance in the class. We use these results to make predictions of course performance for another student group, and report on the reliability of those predictions
In this paper, we consider side-channel mechanisms, specifically using smart device ambient light sensors, to capture information about user computing activity. We distinguish keyboard keystrokes using only the ambient light sensor readings from a smart watch worn on the user's non-dominant hand. Additionally, we investigate the feasibility of capturing screen emanations for determining user browser usage patterns. The experimental results expose privacy and security risks, as well as the potential for new mobile user interfaces and applications.
In traditional programming courses, students have usually been at least partly graded using pen and paper exams. One of the problems related to such exams is that they only partially connect to the practice conducted within such courses. Testing students in a more practical environment has been constrained due to the limited resources that are needed, for example, for authentication. In this work, we study whether students in a programming course can be identified in an exam setting based solely on their typing patterns. We replicate an earlier study that indicated that keystroke analysis can be used for identifying programmers. Then, we examine how a controlled machine examination setting affects the identification accuracy, i.e. if students can be identified reliably in a machine exam based on typing profiles built with data from students' programming assignments from a course. Finally, we investigate the identification accuracy in an uncontrolled machine exam, where students can complete the exam at any time using any computer they want. Our results indicate that even though the identification accuracy deteriorates when identifying students in an exam, the accuracy is high enough to reliably identify students if the identification is not required to be exact, but top k closest matches are regarded as correct.
Wearable devices, such as smartwatches, are furnished with state-of-the-art sensors that enable a range of context-aware applications. However, malicious applications can misuse these sensors, if access is left unaudited. In this paper, we demonstrate how applications that have access to motion or inertial sensor data on a modern smartwatch can recover text typed on an external QWERTY keyboard. Due to the distinct nature of the perceptible motion sensor data, earlier research efforts on emanation based keystroke inference attacks are not readily applicable in this scenario. The proposed novel attack framework characterizes wrist movements (captured by the inertial sensors of the smartwatch worn on the wrist) observed during typing, based on the relative physical position of keys and the direction of transition between pairs of keys. Eavesdropped keystroke characteristics are then matched to candidate words in a dictionary. Multiple evaluations show that our keystroke inference framework has an alarmingly high classification accuracy and word recovery rate. With the information recovered from the wrist movements perceptible by a smartwatch, we exemplify the risks associated with unaudited access to seemingly innocuous sensors (e.g., accelerometers and gyroscopes) of wearable devices. As part of our efforts towards preventing such side-channel attacks, we also develop and evaluate a novel context-aware protection framework which can be used to automatically disable (or downgrade) access to motion sensors, whenever typing activity is detected.
In this study, we present WindTalker, a novel and practical keystroke inference framework that allows an attacker to infer the sensitive keystrokes on a mobile device through WiFi-based side-channel information. WindTalker is motivated from the observation that keystrokes on mobile devices will lead to different hand coverage and the finger motions, which will introduce a unique interference to the multi-path signals and can be reflected by the channel state information (CSI). The adversary can exploit the strong correlation between the CSI fluctuation and the keystrokes to infer the user's number input. WindTalker presents a novel approach to collect the target's CSI data by deploying a public WiFi hotspot. Compared with the previous keystroke inference approach, WindTalker neither deploys external devices close to the target device nor compromises the target device. Instead, it utilizes the public WiFi to collect user's CSI data, which is easy-to-deploy and difficult-to-detect. In addition, it jointly analyzes the traffic and the CSI to launch the keystroke inference only for the sensitive period where password entering occurs. WindTalker can be launched without the requirement of visually seeing the smart phone user's input process, backside motion, or installing any malware on the tablet. We implemented Windtalker on several mobile phones and performed a detailed case study to evaluate the practicality of the password inference towards Alipay, the largest mobile payment platform in the world. The evaluation results show that the attacker can recover the key with a high successful rate.
Traditional authentication techniques such as static passwords are vulnerable to replay and guessing attacks. Recently, many studies have been conducted on keystroke dynamics as a promising behavioral biometrics for strengthening user authentication, however, current keystroke based solutions suffer from a numerous number of features with an insufficient number of samples which lead to a high verification error rate and high verification time. In this paper, a keystroke dynamics based authentication system is proposed for cloud environments that supports fixed and free text samples. The proposed system utilizes the ReliefF dimensionality reduction method, as a preprocessing step, to minimize the feature space dimensionality. The proposed system applies clustering to users' profile templates to reduce the verification time. The proposed system is applied to two different benchmark datasets. Experimental results prove the effectiveness and efficiency of the proposed system.
In recent years, simple password-based authentication systems have increasingly proven ineffective for many classes of real-world devices. As a result, many researchers have concentrated their efforts on the design of new biometric authentication systems. This trend has been further accelerated by the advent of mobile devices, which offer numerous sensors and capabilities to implement a variety of mobile biometric authentication systems. Along with the advances in biometric authentication, however, attacks have also become much more sophisticated and many biometric techniques have ultimately proven inadequate in face of advanced attackers in practice. In this paper, we investigate the effectiveness of sensor-enhanced keystroke dynamics, a recent mobile biometric authentication mechanism that combines a particularly rich set of features. In our analysis, we consider different types of attacks, with a focus on advanced attacks that draw from general population statistics. Such attacks have already been proven effective in drastically reducing the accuracy of many state-of-the-art biometric authentication systems. We implemented a statistical attack against sensor-enhanced keystroke dynamics and evaluated its impact on detection accuracy. On one hand, our results show that sensor-enhanced keystroke dynamics are generally robust against statistical attacks with a marginal equal-error rate impact (textless0.14%). On the other hand, our results show that, surprisingly, keystroke timing features non-trivially weaken the security guarantees provided by sensor features alone. Our findings suggest that sensor dynamics may be a stronger biometric authentication mechanism against recently proposed practical attacks.
Detecting early trends indicating cognitive decline can allow older adults to better manage their health, but current assessments present barriers precluding the use of such continuous monitoring by consumers. To explore the effects of cognitive status on computer interaction patterns, the authors collected typed text samples from older adults with and without pre-mild cognitive impairment (PreMCI) and constructed statistical models from keystroke and linguistic features for differentiating between the two groups. Using both feature sets, they obtained a 77.1 percent correct classification rate with 70.6 percent sensitivity, 83.3 percent specificity, and a 0.808 area under curve (AUC). These results are in line with current assessments for MC–a more advanced disease–but using an unobtrusive method. This research contributes a combination of features for text and keystroke analysis and enhances understanding of how clinicians or older adults themselves might monitor for PreMCI through patterns in typed text. It has implications for embedded systems that can enable healthcare providers and consumers to proactively and continuously monitor changes in cognitive function.