Evaluating Behavioral Biometrics for Continuous Authentication: Challenges and Metrics

Title: Evaluating Behavioral Biometrics for Continuous Authentication: Challenges and Metrics
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Eberz, Simon; Rasmussen, Kasper B.; Lenders, Vincent; Martinovic, Ivan
Conference Name: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4944-4
Keywords: biometrics, Continuous Authentication, Metrics
Abstract: In recent years, behavioral biometrics have become a popular approach to support continuous authentication systems. Most generally, a continuous authentication system can make two types of errors: false rejects and false accepts. Based on this, the most commonly reported metrics to evaluate systems are the False Reject Rate (FRR) and False Accept Rate (FAR). However, most papers only report the mean of these measures with little attention paid to their distribution. This is problematic as systematic errors allow attackers to perpetually escape detection while random errors are less severe. Using 16 biometric datasets we show that these systematic errors are very common in the wild. We show that some biometrics (such as eye movements) are particularly prone to systematic errors, while others (such as touchscreen inputs) show more even error distributions. Our results also show that the inclusion of some distinctive features lowers average error rates but significantly increases the prevalence of systematic errors. As such, blind optimization of the mean EER (through feature engineering or selection) can sometimes lead to lower security. Following this result we propose the Gini Coefficient (GC) as an additional metric to accurately capture different error distributions. We demonstrate the usefulness of this measure both to compare different systems and to guide researchers during feature selection. In addition to the selection of features and classifiers, some non-functional machine learning methodologies also affect error rates. The most notable examples of this are the selection of training data and the attacker model used to develop the negative class. 13 out of the 25 papers we analyzed either include impostor data in the negative class or randomly sample training data from the entire dataset, with a further 6 not giving any information on the methodology used. Using real-world data we show that both of these decisions lead to significant underestimation of error rates by 63% and 81%, respectively. This is an alarming result, as it suggests that researchers are either unaware of the magnitude of these effects or might even be purposefully attempting to over-optimize their EER without actually improving the system.
DOI: 10.1145/3052973.3053032
Citation Key: eberz_evaluating_2017
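
Notes: The abstract proposes the Gini Coefficient (GC) as a measure of how unevenly errors are distributed across users. As a rough illustration only, and not necessarily the exact normalization the authors use in the paper, a generic Gini computation over hypothetical per-user false-accept counts could look like the following Python sketch:

import numpy as np

def gini_coefficient(values):
    # Generic Gini coefficient of a non-negative 1-D array.
    # 0 means errors are spread evenly across users (random errors);
    # values near 1 mean a few users account for almost all errors
    # (systematic errors that let specific attackers escape detection).
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0.0:
        return 0.0
    # Mean-absolute-difference form: G = sum_i (2i - n - 1) * x_i / (n * sum(x)),
    # with x sorted in ascending order and i running from 1 to n.
    index = np.arange(1, n + 1)
    return float(np.sum((2 * index - n - 1) * x) / (n * x.sum()))

# Hypothetical per-user false-accept counts (illustrative values only):
even_errors = [4, 5, 5, 5, 5, 6]      # errors spread evenly  -> GC ~ 0.06
skewed_errors = [0, 0, 0, 1, 1, 28]   # errors concentrated   -> GC ~ 0.80
print(gini_coefficient(even_errors))
print(gini_coefficient(skewed_errors))

A lower GC indicates that residual errors are spread across the user population, while a higher GC at the same mean EER points to errors concentrated on a few users, which is the weaker security outcome the abstract warns about.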