SaTC: CORE: Small: Detecting Social Engineering Attacks Using Semantic Language Analysis

Submitted by Ian Harris on Wed, 03/06/2019 - 5:56pm

Project Details

Lead PI

Ian Harris

Performance Period

Aug 01, 2018 - Jul 31, 2021

Institution(s)

University of California-Irvine

Award Number

1813858

A critical threat to information security is social engineering, the psychological manipulation of people in order to gain access to a system for which the attacker lacks authorization. Cyberattackers target the weakest link, and people are often more vulnerable than a hardened computer system. Phishing emails, which fraudulently request private information, are a common version of the attack, but social engineering also takes many other, more complex conversational forms designed to exploit the psychological weaknesses of the target. This project will confront the problem of social engineering by developing automated approaches that detect social engineering attacks in real time and alert the victim before harm can occur. The approach will leverage question answering and natural language understanding techniques to identify conversational statements with malicious intent across multiple communication media, including email, text messages, and text transcribed from speech. As part of evaluating these techniques, the project will also develop a large corpus of non-phishing social engineering attacks, in the form of audio recordings and written transcripts, and make it widely available to the community of researchers studying these attacks. This addresses a key obstacle: the lack of data for research on social engineering. The project will also support the development of several short course modules on social engineering attacks, designed to be easily integrated into existing high school and undergraduate courses on cybersecurity and human behavior.

The project has two main goals: to develop approaches for detecting social engineering attacks, and to develop a social engineering attack corpus. Social engineering attacks will be identified by the fact that the attacker must always perform one of two dialog actions: asking a question whose answer is private, or issuing a command to perform a forbidden operation. Detection research will therefore involve two corresponding tasks: question evaluation, to detect malicious questions, and command evaluation, to detect malicious commands. Question evaluation will advance research in question-answering systems to determine the privacy status of the answers to questions posed by the attacker. Because only the privacy status, and not the actual answer, is required, the approach will tolerate the imprecision inherent in current question-answering techniques while still achieving high precision with respect to privacy status. Command evaluation will summarize the meaning of a request as the combination of the main verb and the object(s) of that verb in the sentence. Each verb-object pair will be compared against a blacklist of verb-object pairs known to describe forbidden operations. Reducing a command to its verb-object pair normalizes sentences that have different syntax but identical meaning.

In addition to the detection approaches, the project will develop a social engineering attack corpus and make it publicly available. The most important property of a social engineering attack is that it succeeds: it either gathers private information or coerces the target into performing an inappropriate operation. To ensure that the attacks in the corpus have this property, a validation experiment will be performed with human subjects. Each attack will be applied against several subjects to determine its rate of success, and the realism of the attacks in the corpus will be estimated from the frequency with which each attack succeeds against the subjects in the study.
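The command evaluation step described above can be illustrated with a minimal Python sketch. It assumes the verb-object pairs have already been extracted from each sentence (in practice a syntactic parser would do this; extraction is not shown). The blacklist entries and the verb synonym table (`BLACKLIST`, `VERB_SYNONYMS`) are hypothetical examples, not the project's data, and mapping verb variants such as "email" and "forward" onto one canonical verb is an assumed extra normalization step, not a method stated by the project.

```python
from typing import Iterable, NamedTuple


class VerbObject(NamedTuple):
    """A command reduced to its main verb and the object of that verb."""
    verb: str
    obj: str


# Hypothetical blacklist of verb-object pairs describing forbidden operations.
BLACKLIST = frozenset({
    VerbObject("send", "password"),
    VerbObject("transfer", "money"),
    VerbObject("disable", "antivirus"),
    VerbObject("install", "software"),
})

# Hypothetical synonym table mapping verb variants onto a canonical verb,
# so that differently worded requests reduce to the same pair.
VERB_SYNONYMS = {
    "email": "send",
    "forward": "send",
    "share": "send",
    "wire": "transfer",
    "deactivate": "disable",
}


def normalize(pair: VerbObject) -> VerbObject:
    """Lowercase and trim both parts, then replace the verb with its canonical form."""
    verb = pair.verb.strip().lower()
    obj = pair.obj.strip().lower()
    return VerbObject(VERB_SYNONYMS.get(verb, verb), obj)


def is_malicious_command(pairs: Iterable[VerbObject]) -> bool:
    """Flag the request if any normalized verb-object pair is on the blacklist."""
    return any(normalize(pair) in BLACKLIST for pair in pairs)


if __name__ == "__main__":
    # Pairs assumed to come from a parser, e.g. "Could you email me your password?"
    print(is_malicious_command([VerbObject("email", "password")]))  # True
    print(is_malicious_command([VerbObject("send", "report")]))     # False
```

Under these assumptions, "Could you email me your password?" and "Forward your password to this address" both reduce to ("send", "password") and are flagged, while a harmless request such as ("send", "report") is not. Using a set lookup keeps each check constant-time regardless of blacklist size.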
