Biblio
The current authentication systems based on password and pin code are not enough to guarantee attacks from malicious users. For this reason, in the last years, several studies are proposed with the aim to identify the users basing on their typing dynamics. In this paper, we propose a deep neural network architecture aimed to discriminate between different users using a set of keystroke features. The idea behind the proposed method is to identify the users silently and continuously during their typing on a monitored system. To perform such user identification effectively, we propose a feature model able to capture the typing style that is specific to each given user. The proposed approach is evaluated on a large dataset derived by integrating two real-world datasets from existing studies. The merged dataset contains a total of 1530 different users each writing a set of different typing samples. Several deep neural networks, with an increasing number of hidden layers and two different sets of features, are tested with the aim to find the best configuration. The final best classifier scores a precision equal to 0.997, a recall equal to 0.99 and an accuracy equal to 99% using an MLP deep neural network with 9 hidden layers. Finally, the performances obtained by using the deep learning approach are also compared with the performance of traditional decision-trees machine learning algorithm, attesting the effectiveness of the deep learning-based classifiers in the domain of keystroke analysis.
In this paper we examine the use of covert channels based on CPU load in order to achieve persistent user identification through browser sessions. In particular, we demonstrate that an HTML5 video, a GIF image, or CSS animations on a webpage can be used to force the CPU to produce a sequence of distinct load levels, even without JavaScript or any client-side code. These load levels can be then captured either by another browsing session, running on the same or a different browser in parallel to the browsing session we want to identify, or by a malicious app installed on the device. To get a good estimation of the CPU load caused by the target session, the receiver can observe system statistics about CPU activity (app), or constantly measure time it takes to execute a known code segment (app and browser). Furthermore, for mobile devices we propose a sensor-based approach to estimate the CPU load, based on exploiting disturbances of the magnetometer sensor data caused by the high CPU activity. Captured loads can be decoded and translated into an identifying bit string, which is transmitted back to the attacker. Due to the way loads are produced, these methods are applicable even in highly restrictive browsers, such as the Tor Browser, and run unnoticeably to the end user. Therefore, unlike existing ways of web tracking, our methods circumvent most of the existing countermeasures, as they store the identifying information outside the browsing session being targeted. Finally, we also thoroughly evaluate and assess each presented method of generating and receiving the signal, and provide an overview of potential countermeasures.
The veil of anonymity provided by smartphones with pre-paid SIM cards, public Wi-Fi hotspots, and distributed networks like Tor has drastically complicated the task of identifying users of social media during forensic investigations. In some cases, the text of a single posted message will be the only clue to an author's identity. How can we accurately predict who that author might be when the message may never exceed 140 characters on a service like Twitter? For the past 50 years, linguists, computer scientists, and scholars of the humanities have been jointly developing automated methods to identify authors based on the style of their writing. All authors possess peculiarities of habit that influence the form and content of their written works. These characteristics can often be quantified and measured using machine learning algorithms. In this paper, we provide a comprehensive review of the methods of authorship attribution that can be applied to the problem of social media forensics. Furthermore, we examine emerging supervised learning-based methods that are effective for small sample sizes, and provide step-by-step explanations for several scalable approaches as instructional case studies for newcomers to the field. We argue that there is a significant need in forensics for new authorship attribution algorithms that can exploit context, can process multi-modal data, and are tolerant to incomplete knowledge of the space of all possible authors at training time.
The paper considers an issues of protecting data from unauthorized access by users' authentication through keystroke dynamics. It proposes to use keyboard pressure parameters in combination with time characteristics of keystrokes to identify a user. The authors designed a keyboard with special sensors that allow recording complementary parameters. The paper presents an estimation of the information value for these new characteristics and error probabilities of users' identification based on the perceptron algorithms, Bayes' rule and quadratic form networks. The best result is the following: 20 users are identified and the error rate is 0.6%.
The Internet of Things (IoT) paradigm, in conjunction with the one of smart cities, is pursuing toward the concept of smart buildings, i.e., “intelligent” buildings able to receive data from a network of sensors and thus to adapt the environment. IoT sensors can monitor a wide range of environmental features such as the energy consumption inside a building at fine-grained level (e.g., for a specific wall-socket). Some smart buildings already deploy energy monitoring in order to optimize the energy use for good purposes (e.g., to save money, to reduce pollution). Unfortunately, such measurements raise a significant amount of privacy concerns. In this paper, we investigate the feasibility of recognizing the pair laptop-user (i.e., a user using her own laptop) from the energy traces produced by her laptop. We design MTPlug, a framework that achieves this goal relying on supervised machine learning techniques as pattern recognition in multivariate time series. We present a comprehensive implementation of this system and run a thorough set of experiments. In particular, we collected data by monitoring the energy consumption of two groups of laptop users, some office employees and some intruders, for a total of 27 people. We show that our system is able to build an energy profile for a laptop user with accuracy above 80%, in less than 3.5 hours of laptop usage. To the best of our knowledge, this is the first research that assesses the feasibility of laptop users profiling relying uniquely on fine-grained energy traces collected using wall-socket smart meters.
As any veteran of the editor wars can attest, Unix users can be fiercely and irrationally attached to the commands they use and the manner in which they use them. In this work, we investigate the problem of identifying users out of a large set of candidates (25-97) through their command-line histories. Using standard algorithms and feature sets inspired by natural language authorship attribution literature, we demonstrate conclusively that individual users can be identified with a high degree of accuracy through their command-line behavior. Further, we report on the best performing feature combinations, from the many thousands that are possible, both in terms of accuracy and generality. We validate our work by experimenting on three user corpora comprising data gathered over three decades at three distinct locations. These are the Greenberg user profile corpus (168 users), Schonlau masquerading corpus (50 users) and Cal Poly command history corpus (97 users). The first two are well known corpora published in 1991 and 2001 respectively. The last is developed by the authors in a year-long study in 2014 and represents the most recent corpus of its kind. For a 50 user configuration, we find feature sets that can successfully identify users with over 90% accuracy on the Cal Poly, Greenberg and one variant of the Schonlau corpus, and over 87% on the other Schonlau variant.
This paper proposes and describes an active authentication model based on user profiles built from user-issued commands when interacting with GUI-based application. Previous behavioral models derived from user issued commands were limited to analyzing the user's interaction with the *Nix (Linux or Unix) command shell program. Human-computer interaction (HCI) research has explored the idea of building users profiles based on their behavioral patterns when interacting with such graphical interfaces. It did so by analyzing the user's keystroke and/or mouse dynamics. However, none had explored the idea of creating profiles by capturing users' usage characteristics when interacting with a specific application beyond how a user strikes the keyboard or moves the mouse across the screen. We obtain and utilize a dataset of user command streams collected from working with Microsoft (MS) Word to serve as a test bed. User profiles are first built using MS Word commands and identification takes place using machine learning algorithms. Best performance in terms of both accuracy and Area under the Curve (AUC) for Receiver Operating Characteristic (ROC) curve is reported using Random Forests (RF) and AdaBoost with random forests.