Data Driven Security Models and Analysis - January 2015
Public Audience
Purpose: To highlight project progress. Information is presented at a high level that is accessible to the interested public. All information contained in this report (regions 1-3) is a Government Deliverable/CDRL.
PI(s): Ravi Iyer
Co-PI(s): Zbigniew Kalbarczyk and Adam Slagell
HARD PROBLEM(S) ADDRESSED
This refers to the Hard Problems released in November 2012.
- Predictive security metrics – design, development, and validation
- Resilient architectures – ultimately, we want to use the metrics to achieve a measurable enhancement in system resiliency, i.e., the ability to withstand attacks
- Human behavior – the data contain traces of the steps the attacker took, and hence inherently capture some aspects of human behavior (of both users and miscreants)
PUBLICATIONS
Papers published in this quarter as a result of this research. Include title, author(s), venue published/presented, and a short description or abstract. Identify which hard problem(s) the publication addressed. Papers that have not yet been published should be reported in region 2 below.
[1] Cuong Pham, Zachary Estrada, Phuong Cao, Zbigniew Kalbarczyk, and Ravishankar Iyer, “Building Reliable and Secure Virtual Machines Using Architectural Invariants,” IEEE Security & Privacy, vol. 12, no. 5, Sept.-Oct. 2014.
Paper addresses: Resilient architectures.
Abstract: In this article, we discuss how to address the challenge of resilient operation in the context of cloud computing, for which both reliability and security are growing concerns. Since cloud deployments are usually composed of commodity hardware and software, efficient monitoring is key to achieving resiliency. Although reliability and security monitoring may use different types of analytics, the same sensing infrastructure can provide inputs to both monitoring modules. We split the monitoring process into two phases: logging and auditing. We applied these principles when designing HyperTap, a hypervisor-level monitoring framework for virtual machines.
ACCOMPLISHMENT HIGHLIGHTS
This quarter we focused on evaluating the effectiveness of applying factor graphs (a type of probabilistic graphical model) to model and detect masquerade attacks, i.e., attacks that use stolen user credentials (such as username/password pairs or private keys) to deliver attack payloads. Automatically collected logs (e.g., network flows, syslogs, and IDS logs) corresponding to the attacks, combined with human-written incident reports, were used to evaluate our approach. Specifically, each log entry is automatically mapped to a discrete event, and additional events are manually extracted from the incident reports. Each event is associated with a user state (e.g., benign, suspicious, or malicious). Potentially malicious users can be detected by constructing and evaluating a factor graph model in which observed variables (events) and hidden variables (user states) are linked by factor functions representing functional relations among them.
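As a rough illustration of the log-to-event mapping step, the minimal Python sketch below translates raw log lines into discrete event symbols using regular-expression rules. The rule patterns, event names, and log formats are hypothetical stand-ins for exposition, not the actual rules or formats used in our pipeline.

import re

# Hypothetical regex rules; the real rules and log formats differ.
# Each rule maps a raw log line to a discrete event symbol.
EVENT_RULES = [
    (re.compile(r"Accepted (password|publickey) for \S+"), "remote_login"),
    (re.compile(r"wget\s+\S+\.(c|sh|pl)\b"), "download_sensitive_file"),
    (re.compile(r"uname -a|cat /proc/cpuinfo"), "query_system_info"),
]

def extract_events(log_lines):
    """Map each raw log line to a discrete event symbol; unmatched lines
    (treated here as benign background noise) are dropped."""
    events = []
    for line in log_lines:
        for pattern, event in EVENT_RULES:
            if pattern.search(line):
                events.append(event)
                break  # first matching rule wins
    return events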
For evaluation of this approach, we use data on real-world security attacks collected at the National Center for Supercomputing Applications (NCSA). The workflow of our experiment is depicted in Figure 1. The set of collected incidents is divided into two subsets: the first consists of all incidents that happened before a chosen point in time (Set-A), and the second consists of the remaining incidents (Set-B). The goal is to construct our factor graph model using events from Set-A and to evaluate the detection capability of the resulting model on the unseen incidents in Set-B. Specifically, the factor functions are manually constructed using Set-A. Based on these factor functions, a factor graph is built for each user in Set-B. The evolution of user states is inferred from the factor graph using Gibbs sampling. A compromised user is identified when the last label in the inferred sequence indicates that the user state is malicious.
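To make the inference step concrete, here is a minimal, self-contained sketch of Gibbs sampling over a chain-structured factor graph of hidden user states. The event names, factor weights, and smoothing constant are illustrative assumptions, not values from our model.

import random
from collections import Counter

STATES = ["benign", "suspicious", "malicious"]

# Illustrative factor tables (assumed values, not taken from Set-A).
EVENT_STATE = {  # compatibility of an observed event with a hidden state
    ("remote_login", "benign"): 1.0,
    ("login_inactive_account", "suspicious"): 1.5,
    ("login_inactive_account", "malicious"): 1.0,
    ("query_system_info", "suspicious"): 1.2,
    ("download_sensitive_file", "malicious"): 2.0,
}
TRANSITION = {  # compatibility of consecutive hidden states
    ("benign", "benign"): 1.0,
    ("benign", "suspicious"): 0.6,
    ("suspicious", "suspicious"): 1.0,
    ("suspicious", "malicious"): 0.8,
    ("malicious", "malicious"): 1.0,
}
SMOOTH = 0.05  # small default weight for unlisted pairs

def f_event(event, state):
    return EVENT_STATE.get((event, state), SMOOTH)

def f_trans(prev, cur):
    return TRANSITION.get((prev, cur), SMOOTH)

def gibbs_infer(events, n_iters=3000, burn_in=1000):
    """Infer the most likely hidden state per time step by Gibbs sampling."""
    n = len(events)
    states = [random.choice(STATES) for _ in range(n)]
    tallies = [Counter() for _ in range(n)]
    for it in range(n_iters):
        for t in range(n):
            # Unnormalized conditional of state_t given its Markov blanket:
            # the observed event plus the neighboring hidden states.
            weights = []
            for s in STATES:
                w = f_event(events[t], s)
                if t > 0:
                    w *= f_trans(states[t - 1], s)
                if t < n - 1:
                    w *= f_trans(s, states[t + 1])
                weights.append(w)
            states[t] = random.choices(STATES, weights=weights)[0]
        if it >= burn_in:  # collect samples only after burn-in
            for t, s in enumerate(states):
                tallies[t][s] += 1
    return [c.most_common(1)[0][0] for c in tallies]

# A user is flagged as compromised when the final inferred state is malicious.
events = ["login_inactive_account", "query_system_info", "download_sensitive_file"]
labels = gibbs_infer(events)
print(labels, "-> compromised" if labels[-1] == "malicious" else "-> clear")

On a short chain like this, exact inference would also be feasible; Gibbs sampling is sketched here because it is the inference method used in our workflow and it extends naturally to factor graphs with longer-range factors.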
Specific accomplishments for this reporting period include:
- Implemented the process and associated tools to produce the experimental datasets (Set-A and Set-B) by processing (i) raw logs (automatically, using regular-expression scripts) and (ii) written incident reports (manually).
- Defined factor functions based on the obtained datasets; each entry in a dataset represents a time sequence of events for a user, collected over the duration of a security incident (a minimal sketch of such a factor function follows this list).
- Constructed a factor graph for each user involved in the analyzed incidents and evaluated the detection capability of the factor-graph-based detection.
- Discovered six hidden malicious users not identified by NCSA security analysts; these users exhibited suspicious activities, e.g., logging in with the password of an inactive account (which had not been properly disabled), followed by anomalous commands (such as querying system information or downloading files with sensitive extensions).
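As an illustration of a manually constructed factor function, the hypothetical sketch below encodes the kind of domain knowledge an analyst might distill from Set-A incidents; the event names and weights are invented for exposition, not taken from our model.

def factor_event_state(event, state):
    """Manually written event-state factor: returns a compatibility weight
    between an observed event and a candidate hidden user state.
    Weights are illustrative, hand-picked to reflect (hypothetical)
    Set-A observations rather than learned automatically."""
    table = {
        # Logging in to an account that should be inactive is rarely benign.
        "login_inactive_account": {"benign": 0.1, "suspicious": 1.0, "malicious": 0.6},
        # Probing system information is mildly suspicious on its own.
        "query_system_info": {"benign": 0.5, "suspicious": 1.0, "malicious": 0.7},
        # Fetching files with sensitive extensions strongly suggests a payload.
        "download_sensitive_file": {"benign": 0.05, "suspicious": 0.5, "malicious": 1.0},
    }
    return table.get(event, {}).get(state, 0.3)  # default weight for unmodeled events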
Figure 1 – Workflow for data analysis, factor graph creation, and experimental evaluation