Monitoring, Fusion, and Response for Cyber Resilience - October 2019
PI: William Sanders
Researchers: Brett Feddersen, Carmen Cheh, and Uttam Thakore
HARD PROBLEM(S) ADDRESSED
This refers to Hard Problems, released November 2012.
- Resilient Architectures - Experience suggests that even heavily defended systems can be breached by attackers given enough time, resources and talent. We propose the concept of a response and recovery engine (RRE) so that a system could "tolerate" an intrusion and provide a base level of service. RRE incorporates modules to monitor the current state of a system, detect intrusions, and respond to achieve resilience-specific goals. Our work focuses on a few example attacks: lateral movement within a network as part of an Advanced Persistent Threat (APT), tampering with monitoring data to hide attacker activity, and application-level distributed denial-of-service (DDoS) attacks.
- Policy-Governed Secure Collaboration - We analyzed the issues surrounding the software-defined networking (SDN) architecture from an accountability standpoint. We considered the various principals involved (e.g., controller software, network applications, administrators, end users, organizations), mechanisms for assurance about past network state (e.g., data provenance, replicated data stores, roots of trust), the standards against which accountability can be judged and assessed (e.g., legal, contractual, regulatory), and mechanisms for decentralized enforcement (e.g., blockchain-based smart contracts). We motivated the need for accountability through a network application use case, and we argued that an assured understanding of the past, sufficient for attribution, can lead to better responses that improve resiliency.
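One common building block for "assurance about past network state" is a tamper-evident, hash-chained log, in which each entry's hash commits to every earlier entry. The sketch below is illustrative only and is not the mechanism from our analysis. The helper names (`build_chain`, `verify_chain`) and the record fields are hypothetical, and the chain is only trustworthy if its latest hash is anchored somewhere the attacker cannot rewrite (e.g., a root of trust or a replicated store).

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical placeholder hash preceding the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous link together with a canonical JSON encoding of the record."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Return (record, hash) pairs; each hash commits to this and all earlier records."""
    chain, prev = [], GENESIS
    for rec in records:
        prev = entry_hash(prev, rec)
        chain.append((rec, prev))
    return chain


def verify_chain(chain) -> int:
    """Return the index of the first entry whose stored hash does not match, or -1 if intact."""
    prev = GENESIS
    for i, (rec, stored) in enumerate(chain):
        prev = entry_hash(prev, rec)
        if prev != stored:
            return i
    return -1


# Usage with hypothetical SDN controller events.
log = [{"t": 1, "event": "flow_mod", "switch": "s1"},
       {"t": 2, "event": "flow_mod", "switch": "s2"}]
chain = build_chain(log)
print(verify_chain(chain))    # intact chain
chain[0][0]["switch"] = "s9"  # tamper with the first record after the fact
print(verify_chain(chain))    # index of the first altered entry
```

An attacker who can rewrite the whole chain can also recompute every hash, so the design choice that matters is where the head hash is stored, not the hashing itself.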
PUBLICATIONS
Papers written as a result of your research from the current quarter only.
[1] U. Thakore, H. V. Ramasamy, and W. H. Sanders, "Coordinated Analysis of Heterogeneous Monitor Data in Enterprise Clouds for Incident Response," to appear in the 30th International Symposium on Software Reliability Engineering (ISSRE 2019).
Abstract: During incident analysis and response, enterprise cloud administrators want to use as much of their generated monitor data as possible. However, the reality is that decisions are often dictated by the tools actually available to automatically process the monitor data, rather than by an understanding of the relevance of the data for incident response. The significant manual effort and domain expertise required to process diverse cloud monitors means that much monitor data remain unexamined. We propose a framework for simplifying the complexity of data analysis for incident response. Our framework enables coordinated analysis of both metric (numerical) data and log (semi-structured, textual) data and exposes salient features within those data. As a foundation for the framework, we define a taxonomy for fields within monitor data based on insights gained from analyzing logs and metrics collected from all levels of an experimental platform-as-a-service (PaaS) cloud (EPC). Using the taxonomy, we lay out a method for semi-automated feature extraction and discovery across heterogeneous monitors. We then describe a method for feature clustering to promote effective analysis of the data, and to remove redundant and uninformative features. We discuss the application of our framework for incident response within the EPC, including root cause analysis.
[2] C. Cheh, U. Thakore, B. Chen, W. G. Temple, and W. H. Sanders, "Leveraging Physical Access Logs to Identify Tailgating: Limitations and Solutions," to appear in the European Dependable Computing Conference (EDCC).
Abstract: Critical infrastructure facilities use physical access systems to control movement in their facilities. However, the cyber logs collected from such systems are not representative of all human movement in real life, including "tailgating", which is an important problem because it potentially allows unauthorized physical access to critical equipment. In this paper, we identify physical constraints on human movement and use those constraints to motivate several approaches for inferring tailgating from card tap logs. In particular, using our approach, we found 3,999 instances of tailgating in a railway station during a 17-month period. However, certain movement scenarios are not visible in card tap logs. We overcome that limitation by leveraging additional physical data sources to provide information regarding the physical presence of people within a space. We support our findings with an observation experiment that we conducted in a railway station.
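To make "inferring tailgating from card tap logs using physical constraints" concrete, here is a minimal sketch built on one simple constraint that we assume for illustration: a person cannot leave a zone they were never recorded entering, and cannot enter a zone they are still recorded as being inside. It is not necessarily one of the approaches in [2]. The `Tap` schema, its field names, and the "in"/"out" direction labels are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Tap(NamedTuple):
    """Hypothetical card tap record."""
    card: str
    time: int        # seconds since start of the log (assumed)
    zone: str        # secured zone guarded by the reader
    direction: str   # "in" or "out"


def infer_tailgating(taps):
    """Return taps that violate the assumed entry/exit constraint.

    An "out" tap with no open "in" tap implies the holder entered without
    tapping (tailgated in); an "in" tap while already recorded inside implies
    the holder left without tapping (tailgated out).
    """
    inside = defaultdict(bool)  # (card, zone) -> currently recorded inside?
    suspects = []
    for tap in sorted(taps, key=lambda t: t.time):
        key = (tap.card, tap.zone)
        if tap.direction == "in":
            if inside[key]:
                suspects.append(tap)
            inside[key] = True
        elif tap.direction == "out":
            if not inside[key]:
                suspects.append(tap)
            inside[key] = False
    return suspects


# Usage: card B exits without entering, card C enters twice in a row.
taps = [Tap("A", 0, "platform", "in"), Tap("A", 10, "platform", "out"),
        Tap("B", 5, "platform", "out"), Tap("C", 1, "platform", "in"),
        Tap("C", 20, "platform", "in")]
print([(t.card, t.time) for t in infer_tailgating(taps)])
```

As the abstract notes, some movement scenarios leave no trace in tap logs at all (e.g., a tailgater who also leaves by tailgating), which is why this kind of log-only inference has to be complemented by other physical data sources.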
KEY HIGHLIGHTS
Each effort should submit one or two specific highlights. Each item should include a paragraph or two along with a citation if available. Write as if for the general reader of IEEE S&P.
The purpose of the highlights is to give our immediate sponsors a body of evidence that the funding they are providing (in the framework of the SoS lablet model) is delivering results that "more than justify" the investment they are making.
Our RRE work incorporates modules to monitor the current state of a system, detect intrusions, and respond to achieve resilience-specific goals. Intrusion detection in large-scale distributed systems, which is a necessary precondition for intrusion tolerance and resilience, is highly susceptible to malicious manipulation of the system data used for detection (e.g., using rootkits and log tampering), which we term "monitor compromise". Existing literature attempts to counteract the problem using reputation systems, which weight the trustworthiness of monitor data based on the data's past trustworthiness, but such systems are themselves subject to "betrayal attacks" and "sleeper attacks". We instead propose data-driven methods for detecting potential monitor compromise. We leverage the insight that systems usually contain multiple monitors that provide redundant information about system activity, so discrepancies between different monitors' observations of the same activity can be used to identify potential monitor compromise.
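The redundancy insight can be illustrated with a minimal sketch: bin the observations that two monitors report for the same kind of activity into fixed time windows and flag windows where their counts disagree. The monitor pairing (host authentication logs vs. network flow records), the window width, and the `tolerance` parameter are assumptions for illustration, not parameters of our method.

```python
from collections import Counter


def window_counts(timestamps, width):
    """Count observations per fixed-width time window (window index = t // width)."""
    return Counter(int(t // width) for t in timestamps)


def discrepant_windows(obs_a, obs_b, width=60, tolerance=0):
    """Return window indices where two redundant monitors disagree by more than `tolerance`.

    obs_a, obs_b: timestamps at which each monitor reported the same kind of
    activity (hypothetically, SSH logins in host auth logs vs. SSH sessions
    in network flow records). A persistent gap suggests that one monitor may
    be dropping or suppressing records.
    """
    ca, cb = window_counts(obs_a, width), window_counts(obs_b, width)
    windows = sorted(set(ca) | set(cb))
    return [w for w in windows if abs(ca[w] - cb[w]) > tolerance]


# Usage: the network monitor sees one more session in the second minute
# than the host monitor logged.
host_logins = [5, 20, 70, 130]
net_sessions = [6, 21, 71, 72, 131]
print(discrepant_windows(host_logins, net_sessions))
```

Real monitors rarely agree exactly, so in practice the tolerance (or a learned model of normal disagreement) determines the false-alarm rate.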
For monitor compromise detection, we have developed a data-driven ensemble method that combines evidential reasoning and data mining. To construct the model for our approach, we devised a method to mine meaningful correlations from heterogeneous historical system data, both between system activity (i.e., events) and the discrete data points produced by monitors (i.e., alerts) and between alerts of different types. We trained our evidential reasoning and association rule mining models on real data from an enterprise system and applied the ensemble detection method to that data with meaningful results. We also implemented the approach using Storm, a real-time stream processing framework, so that it runs in real time on online monitor data, and we ran experiments with different injected compromise scenarios on enterprise network and host data from the National Center for Supercomputing Applications (NCSA).
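The two ingredients named above can be sketched in simplified form. The sketch assumes (1) pairwise association rules "X implies Y" mined from time windows of historical events and alerts, using standard support/confidence thresholds, and (2) Dempster-Shafer theory as the form of evidential reasoning, with Dempster's rule of combination over the frame {compromised, intact}. It is not our ensemble. The item names, thresholds, and mass values are hypothetical.

```python
from itertools import permutations


def mine_pair_rules(windows, min_support=0.1, min_confidence=0.8):
    """Mine single-item rules X -> Y from windows (sets of event/alert types).

    Returns {(x, y): confidence} for rules meeting both thresholds.
    """
    n = len(windows)
    items = set().union(*windows)
    rules = {}
    for x, y in permutations(sorted(items), 2):
        n_x = sum(1 for w in windows if x in w)
        n_xy = sum(1 for w in windows if x in w and y in w)
        if n_x and n_xy / n >= min_support and n_xy / n_x >= min_confidence:
            rules[(x, y)] = n_xy / n_x
    return rules


def violated_rules(rules, window):
    """Rules whose antecedent was observed in `window` but whose consequent was not."""
    return [(x, y) for (x, y) in rules if x in window and y not in window]


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict; evidence cannot be combined")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


# Usage: learn that SSH logins are normally accompanied by an auth alert.
history = [{"ssh_login", "auth_alert"}] * 3 + [{"port_scan"}]
rules = mine_pair_rules(history)
print(violated_rules(rules, {"ssh_login"}))  # expected alert is missing

C, I = frozenset({"compromised"}), frozenset({"intact"})
THETA = C | I  # total ignorance
m_rule = {C: 0.6, THETA: 0.4}           # hypothetical mass from the violated rule
m_other = {C: 0.5, I: 0.2, THETA: 0.3}  # hypothetical mass from a second detector
print(dempster_combine(m_rule, m_other))
```

A violated rule is evidence of compromise rather than proof (the expected alert may be absent for benign reasons), which is why the sketch turns it into partial belief and fuses it with other evidence instead of alerting directly.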
To support coordinated analysis of the heterogeneous monitor data (ranging from numerical metrics to unstructured, textual log data) present in large-scale distributed systems such as enterprise and cloud systems, we have developed a framework that semi-automatically processes monitor data from multiple levels of those systems into a manageable set of meaningful time series features for further intrusion or incident analysis. Based on an analysis of how incident response teams in industry use monitor data, we developed a taxonomy of monitor data fields. Given monitor data whose fields have been annotated using the taxonomy, our approach automatically unstacks and aggregates the data into meaningful time series features and groups together redundant features across all monitors in the system. We have evaluated our approach on data from an industry partner's experimental PaaS cloud, covering eight different monitor types.
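The pipeline described above (annotate fields, unstack and aggregate into time series, group redundant features) can be sketched under simplifying assumptions. The taxonomy labels used here ("timestamp", "categorical", "numeric") are hypothetical stand-ins, not the taxonomy from [1], and the grouping step substitutes a simple greedy correlation threshold for our feature clustering method. Field names and data are invented for illustration.

```python
from collections import defaultdict
from math import sqrt


def to_features(records, annotations, bin_width, n_bins):
    """Unstack annotated records into per-bin time series features.

    annotations: field name -> hypothetical taxonomy label. Each categorical
    field becomes one count series per observed value; each numeric field
    becomes a per-bin mean (0.0 for empty bins).
    """
    ts_field = next(f for f, label in annotations.items() if label == "timestamp")
    counts = defaultdict(lambda: [0.0] * n_bins)
    sums = defaultdict(lambda: [0.0] * n_bins)
    hits = defaultdict(lambda: [0] * n_bins)
    for rec in records:
        b = int(rec[ts_field] // bin_width)
        if not 0 <= b < n_bins:
            continue  # drop records outside the analysis horizon
        for field, label in annotations.items():
            if label == "categorical":
                counts[f"{field}={rec[field]}"][b] += 1
            elif label == "numeric":
                sums[field][b] += rec[field]
                hits[field][b] += 1
    features = dict(counts)
    for field in sums:
        features[f"mean({field})"] = [s / h if h else 0.0
                                      for s, h in zip(sums[field], hits[field])]
    return features


def pearson(x, y):
    """Pearson correlation; 0.0 if either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def group_redundant(features, threshold=0.95):
    """Greedily group features whose |correlation| with a group's first member >= threshold."""
    groups = []
    for name in sorted(features):
        for group in groups:
            if abs(pearson(features[name], features[group[0]])) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Usage with a hypothetical web-server log.
annotations = {"t": "timestamp", "status": "categorical", "latency_ms": "numeric"}
records = [{"t": 0, "status": 500, "latency_ms": 100}, {"t": 1, "status": 500, "latency_ms": 120},
           {"t": 12, "status": 200, "latency_ms": 10}, {"t": 25, "status": 500, "latency_ms": 110},
           {"t": 26, "status": 200, "latency_ms": 20}]
feats = to_features(records, annotations, bin_width=10, n_bins=3)
print(group_redundant(feats))
```

Here the error count and the mean latency move together, so they land in one group and an analyst need only inspect one representative.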
We have also begun work on an approach to explicitly protect monitoring infrastructures against monitor compromise through defensive monitor placement, in a manner that does not require changing intrusion and incident analysis mechanisms already in place. We are developing a methodology to examine a system and network monitoring infrastructure in the context of incident and intrusion detection to quantify susceptibility to monitor compromise and suggest changes to the monitoring that can improve the resiliency of the monitoring infrastructure. As part of this work, we are methodically examining the motives an attacker might have to compromise monitors and the types of compromise that can occur, and considering the effects of each type of compromise on incident detection ability.
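One simple way to quantify susceptibility to monitor compromise, assumed here purely for illustration, is redundancy: if an event is observable by k deployed monitors, an attacker must compromise all k to hide it, so the events with the smallest k are the weakest points. The sketch computes that measure and greedily adds candidate monitors to raise it, which changes only monitor placement and leaves downstream analysis untouched. The coverage model, monitor names, and greedy strategy are hypothetical and are not our methodology.

```python
def redundancy(deployed, coverage, events):
    """Number of deployed monitors able to observe each event."""
    return {e: sum(1 for m in deployed if e in coverage[m]) for e in events}


def weakest_events(deployed, coverage, events):
    """Fewest monitors an attacker must compromise to hide some event, and which events."""
    r = redundancy(deployed, coverage, events)
    k = min(r.values())
    return k, sorted(e for e, v in r.items() if v == k)


def greedy_placement(deployed, candidates, coverage, events, budget):
    """Add up to `budget` candidate monitors, each time maximizing the minimum redundancy.

    Ties are broken by total redundancy across all events.
    """
    deployed, pool = list(deployed), list(candidates)
    for _ in range(budget):
        if not pool:
            break

        def score(m):
            r = redundancy(deployed + [m], coverage, events)
            return (min(r.values()), sum(r.values()))

        best = max(pool, key=score)
        deployed.append(best)
        pool.remove(best)
    return deployed


# Usage with a hypothetical coverage map (monitor -> observable attack steps).
events = ["login", "lateral_move", "exfil"]
coverage = {"auth_log": {"login"}, "netflow": {"lateral_move", "exfil"},
            "edr": {"login", "lateral_move"}, "dlp": {"exfil"}}
print(weakest_events(["auth_log", "netflow"], coverage, events))
plan = greedy_placement(["auth_log", "netflow"], ["edr", "dlp"], coverage, events, budget=2)
print(plan, weakest_events(plan, coverage, events))
```

A fuller treatment would weight monitors by how easily each can be compromised and model the partial compromise types discussed above, rather than treating every monitor as equally hard to subvert.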
COMMUNITY ENGAGEMENTS
No community engagements this quarter.
EDUCATIONAL ADVANCES
Carmen Cheh successfully defended her PhD thesis on June 4th and is now a postdoctoral researcher at the Singapore University of Technology and Design.