Data Driven Security Models and Analysis - July 2016
Public Audience
Purpose: To highlight project progress. Information is generally at a higher level which is accessible to the interested public. All information contained in the report (regions 1-3) is a Government Deliverable/CDRL.
PI(s): Ravi Iyer
Co-PI(s): Zbigniew Kalbarczyk and Adam Slagell
Researchers: Phuong Cao and Key-whan Chung
HARD PROBLEM(S) ADDRESSED
This refers to Hard Problems, released November 2012.
- Predictive security metrics - design, development, and validation
- Resilient architectures - in the end we want to use the metrics to achieve a measurable enhancement in system resiliency, i.e., the ability to withstand attacks
- Human behavior - data contain traces of the steps the attacker took, and hence inherently include some aspects of the human behavior (of both users and miscreants)
PUBLICATIONS
Papers published in this quarter as a result of this research. Include title, author(s), venue published/presented, and a short description or abstract. Identify which hard problem(s) the publication addressed. Papers that have not yet been published should be reported in region 2 below.
- P. Cao, E. C. Badger, Z. T. Kalbarczyk, R. K. Iyer, "A Framework for Generation, Replay, and Analysis of Real-World Attack Variants," in Symposium and Bootcamp on the Science of Security (HotSoS), Carnegie Mellon University, April 19-21, 2016.
- H. Lin, H. Alemzadeh, D. Chen, Z. Kalbarczyk, R. K. Iyer, "Safety-critical Cyber-physical Attacks: Analysis, Detection, and Mitigation," in Symposium and Bootcamp on the Science of Security (HotSoS), Carnegie Mellon University, April 19-21, 2016.
ACCOMPLISHMENT HIGHLIGHTS
This quarter we extended our study of probabilistic graphical models, specifically Factor Graphs, to another application domain (a cloud virtualization environment) and to a new large enterprise environment (the Blue Waters supercomputer at NCSA). Specific accomplishments include:
- Collection of operational log data, including network and system logs, from Blue Waters, a petascale supercomputer hosted at NCSA. We extended AttackTagger (an attack detection tool based on Factor Graphs) to detect attacks targeting Blue Waters.
- Extended evaluation of the detection capabilities of machine learning (ML) based techniques, using 648 unique attack variants generated by our framework, and comparison with AttackTagger. We found that the detection performance of the ML techniques is mixed: no single technique achieves outstanding detection accuracy across all attacks.
- Application of the Factor Graph based approach to security monitoring in a virtualized environment. We focused on detecting key-loggers in the Windows operating system. To this end, we built a Factor Graph model that provides an online estimate of the probability that a process is a key-logger.
To enrich our attack database, we continuously update our Factor Graph model with current threats. Specifically, we adapted AttackTagger to handle attacks on the Blue Waters network. Because Blue Waters is a new target system, it offers a new set of security-related events, which we use to create new factor functions in the Factor Graph model used by AttackTagger.
To evaluate how other machine learning techniques handle the attack variants generated by our framework (reported in the previous quarter), we compared the detection capabilities of AttackTagger with those of Support Vector Machines, Decision Tree, and Stratified Classifier approaches (Table 1). None of the techniques consistently detects all the attack variants. Only the Factor Graph based model detects at least half of the variants for each of the three attacks studied.
Technique | Attack 1 | Attack 2 | Attack 3 |
Factor Graph-based (AttackTagger) | 108/144 (75%) | 108/216 (50%) | 186/288 (64.6%) |
Support Vector Machines | 48 (33.3%) | 0 (0%) | 288 (100%) |
Decision Tree | 48 (33.3%) | 0 (0%) | 260 (90%) |
Stratified Classifier | 1 (0.6%) | 4 (1.8%) | 6 (2%) |
Total number of variants | 144 | 216 | 288 |
Table 1: Attack variant detection results
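The percentages in Table 1 follow from the detected counts and the per-attack totals. The short Python sketch below recomputes the rates and checks which techniques reach at least 50% on every attack. The counts are copied from the table; the dictionary layout and function names are illustrative assumptions, not part of the project's tooling.

```python
# Detected-variant counts and totals copied from Table 1.
# Variable and function names are illustrative only.
TOTALS = {"Attack 1": 144, "Attack 2": 216, "Attack 3": 288}
DETECTED = {
    "Factor Graph-based (AttackTagger)": [108, 108, 186],
    "Support Vector Machines": [48, 0, 288],
    "Decision Tree": [48, 0, 260],
    "Stratified Classifier": [1, 4, 6],
}


def detection_rates(detected, totals):
    """Return {technique: [rate per attack]} with each rate in [0, 1]."""
    denominators = list(totals.values())
    return {
        name: [count / total for count, total in zip(counts, denominators)]
        for name, counts in detected.items()
    }


def at_least_half_everywhere(rates):
    """List the techniques whose rate is >= 50% on every attack."""
    return [name for name, rs in rates.items() if all(r >= 0.5 for r in rs)]


RATES = detection_rates(DETECTED, TOTALS)

if __name__ == "__main__":
    for name, rs in RATES.items():
        print(name, " | ".join(f"{r:.1%}" for r in rs))
    print("At least half on all attacks:", at_least_half_everywhere(RATES))
```

Only the techniques that clear the 50% mark on all three attacks are returned by `at_least_half_everywhere`, which makes the comparison in the text easy to re-check if the counts are updated.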
To explore applications of Factor Graphs in other domains, we are extending the Factor Graph based approach to key-logger detection in the Windows operating system. Using a hypervisor-based monitor to observe Windows virtual machines, we detect when a process is scheduled to handle a key press event. For each key press event, an ordered list of scheduled processes is available for analysis. The challenge is to identify a key-logger among the legitimate system and application processes present in the system. Heuristic and threshold-based techniques that rely on a process's position in the list of all active processes usually do not work well, because the key-logger process is not consistently at the top of the process list. As a result, those techniques may produce many false positives.
We have built a model to perform online detection of a key-logger process. Given a list of key-press events and the corresponding positions of each process in the process list, we use a factor graph to model the probability that a process is a key-logger. The factor graph model updates this probability online: the probability increases when the process is at the top of the process list and decreases otherwise. The model is being evaluated and compared with heuristic-based approaches.
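The online update described above can be illustrated with a deliberately small sketch. It is not the model built in this project: it assumes the simplest possible factor graph, with one hidden binary variable per process (key-logger or benign) and one observation factor per key press (process at the top of the list or not). The prior, the factor values, and all names below are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical values (assumptions, not taken from the report).
PRIOR_KEYLOGGER = 0.05     # prior belief that a process is a key-logger
P_TOP_IF_KEYLOGGER = 0.7   # factor value: at top of list, given key-logger
P_TOP_IF_BENIGN = 0.2      # factor value: at top of list, given benign


class KeyloggerEstimator:
    """Tracks P(key-logger | observations) for one process in log-odds form."""

    def __init__(self, prior=PRIOR_KEYLOGGER):
        self.log_odds = math.log(prior / (1.0 - prior))

    def update(self, at_top):
        """Multiply in one observation factor and return the new probability."""
        if at_top:
            ratio = P_TOP_IF_KEYLOGGER / P_TOP_IF_BENIGN
        else:
            ratio = (1.0 - P_TOP_IF_KEYLOGGER) / (1.0 - P_TOP_IF_BENIGN)
        self.log_odds += math.log(ratio)
        return self.probability()

    def probability(self):
        return 1.0 / (1.0 + math.exp(-self.log_odds))


def estimate(events):
    """events: one ordered list of process names per key press.

    Returns {process name: estimated probability of being a key-logger}.
    Only processes that appear in an event's list are updated for that event.
    """
    estimators = {}
    for ordered_processes in events:
        for position, name in enumerate(ordered_processes):
            estimator = estimators.setdefault(name, KeyloggerEstimator())
            estimator.update(at_top=(position == 0))
    return {name: est.probability() for name, est in estimators.items()}


if __name__ == "__main__":
    # Made-up trace: "klog" is usually, but not always, scheduled first.
    trace = [["klog", "explorer"], ["explorer", "klog"],
             ["klog", "explorer"], ["klog", "explorer"]]
    for name, prob in estimate(trace).items():
        print(f"{name}: {prob:.3f}")
```

Because evidence accumulates over many key presses, a single event in which the key-logger is not at the top lowers its estimate without resetting it, which is the behavior a fixed position threshold cannot provide.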