Biblio

Found 320 results

Filters: Keyword is anomaly detection
2018-07-18
Fauri, Davide, dos Santos, Daniel Ricardo, Costante, Elisa, den Hartog, Jerry, Etalle, Sandro, Tonetta, Stefano.  2017.  From System Specification to Anomaly Detection (and Back). Proceedings of the 2017 Workshop on Cyber-Physical Systems Security and PrivaCy. :13–24.

Industrial control systems have stringent safety and security demands. High safety assurance can be obtained by specifying the system with possible faults and monitoring it to ensure these faults are properly addressed. Addressing security requires considering unpredictable attacker behavior. Anomaly detection, with its data driven approach, can detect simple unusual behavior and system-based attacks like the propagation of malware; on the other hand, anomaly detection is less suitable to detect more complex process-based attacks and it provides little actionability in presence of an alert. The alternative to anomaly detection is to use specification-based intrusion detection, which is more suitable to detect process-based attacks, but is typically expensive to set up and less scalable. We propose to combine a lightweight formal system specification with anomaly detection, providing data-driven monitoring. The combination is based on mapping elements of the specification to elements of the network traffic. This allows extracting locations to monitor and relevant context information from the formal specification, thus semantically enriching the raised alerts and making them actionable. On the other hand, it also allows under-specification of data-based properties in the formal model; some predicates can be left uninterpreted and the monitoring can be used to learn a model for them. We demonstrate our methodology on a smart manufacturing use case.

2018-07-06
Kloft, Marius, Laskov, Pavel.  2012.  Security Analysis of Online Centroid Anomaly Detection. J. Mach. Learn. Res.. 13:3681–3724.

Security issues are crucial in a number of machine learning applications, especially in scenarios dealing with human activity rather than natural phenomena (e.g., information ranking, spam detection, malware detection, etc.). In such cases, learning algorithms may have to cope with manipulated data aimed at hampering decision making. Although some previous work addressed the issue of handling malicious data in the context of supervised learning, very little is known about the behavior of anomaly detection methods in such scenarios. In this contribution, we analyze the performance of a particular method–online centroid anomaly detection–in the presence of adversarial noise. Our analysis addresses the following security-related issues: formalization of learning and attack processes, derivation of an optimal attack, and analysis of attack efficiency and limitations. We derive bounds on the effectiveness of a poisoning attack against centroid anomaly detection under different conditions: attacker's full or limited control over the traffic and bounded false positive rate. Our bounds show that whereas a poisoning attack can be effectively staged in the unconstrained case, it can be made arbitrarily difficult (a strict upper bound on the attacker's gain) if external constraints are properly used. Our experimental evaluation, carried out on real traces of HTTP and exploit traffic, confirms the tightness of our theoretical bounds and the practicality of our protection mechanisms.
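
To make the method named in this entry concrete, here is a minimal sketch of online centroid anomaly detection: each point is scored by its Euclidean distance to a running mean, points outside a fixed radius are flagged, and only accepted points update the centroid. The class name, the fixed radius, and the incremental-mean update rule are assumptions chosen for illustration, not the exact formulation analyzed in the paper.

```python
import math


class OnlineCentroidDetector:
    """Hypothetical sketch: flags points far from a running centroid."""

    def __init__(self, initial_center, radius):
        self.center = list(initial_center)  # current centroid estimate
        self.radius = radius                # fixed acceptance radius (assumed)
        self.count = 1                      # points folded into the centroid

    def score(self, x):
        # Anomaly score is the Euclidean distance to the centroid.
        return math.dist(x, self.center)

    def observe(self, x):
        """Return True if x is anomalous; otherwise fold x into the centroid."""
        is_anomaly = self.score(x) > self.radius
        if not is_anomaly:
            self.count += 1
            step = 1.0 / self.count
            # Incremental mean update using accepted points only.
            self.center = [c + step * (xi - c) for c, xi in zip(self.center, x)]
        return is_anomaly


detector = OnlineCentroidDetector([0.0, 0.0], radius=1.0)
print(detector.observe([0.5, 0.0]))  # accepted, centroid moves toward the point
print(detector.observe([5.0, 5.0]))  # rejected, centroid is left unchanged
```

Because accepted points move the centroid, a sequence of points placed just inside the radius can drag it step by step; that is the kind of poisoning the entry above studies.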

Lampesberger, H..  2016.  An Incremental Learner for Language-Based Anomaly Detection in XML. 2016 IEEE Security and Privacy Workshops (SPW). :156–170.

The Extensible Markup Language (XML) is a complex language, and consequently, XML-based protocols are susceptible to entire classes of implicit and explicit security problems. Message formats in XML-based protocols are usually specified in XML Schema, and as a first-line defense, schema validation should reject malformed input. However, extension points in most protocol specifications break validation. Extension points are wildcards and considered best practice for loose composition, but they also enable an attacker to add unchecked content in a document, e.g., for a signature wrapping attack. This paper introduces datatyped XML visibly pushdown automata (dXVPAs) as language representation for mixed-content XML and presents an incremental learner that infers a dXVPA from example documents. The learner generalizes XML types and datatypes in terms of automaton states and transitions, and an inferred dXVPA converges to a good-enough approximation of the true language. The automaton is free from extension points and capable of stream validation, e.g., as an anomaly detector for XML-based protocols. For dealing with adversarial training data, two scenarios of poisoning are considered: a poisoning attack is either uncovered at a later time or remains hidden. Unlearning can therefore remove an identified poisoning attack from a dXVPA, and sanitization trims low-frequent states and transitions to get rid of hidden attacks. All algorithms have been evaluated in four scenarios, including a web service implemented in Apache Axis2 and Apache Rampart, where attacks have been simulated. In all scenarios, the learned automaton had zero false positives and outperformed traditional schema validation.

2018-06-20
Yadav, S., Trivedi, M. C., Singh, V. K., Kolhe, M. L..  2017.  Securing AODV routing protocol against black hole attack in MANET using outlier detection scheme. 2017 4th IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics (UPCON). :1–4.

Imposing security in MANETs is very challenging and has been a hot research topic for the last two decades because of its wide applicability in areas such as defense. A number of efforts have been made in this direction, but the available security algorithms, methods, models, and frameworks may not completely solve the problem. Motivated by various existing security methods and by outlier detection, this paper proposes a novel, simple but efficient outlier-detection-based security algorithm to protect the Ad hoc On-demand Distance Vector (AODV) reactive routing protocol from black hole attacks in mobile ad hoc environments. Simulation results obtained from a network simulator tool demonstrate the simplicity, robustness, and effectiveness of the proposed algorithm compared with the original AODV protocol and existing methods.

Petersen, E., To, M. A., Maag, S..  2017.  A novel online CEP learning engine for MANET IDS. 2017 IEEE 9th Latin-American Conference on Communications (LATINCOM). :1–6.

In recent years the use of wireless ad hoc networks has seen an increase of applications. A big part of the research has focused on Mobile Ad Hoc Networks (MANETs), due to their implementations in vehicular networks and battlefield communications, among others. These peer-to-peer networks usually test novel communications protocols, but leave out the network security part. A wide range of attacks can happen as in wired networks, some of them being more damaging in MANETs. Because of the characteristics of these networks, conventional methods for detection of attack traffic are ineffective. Intrusion Detection Systems (IDSs) are constructed on various detection techniques, but one of the most important is anomaly detection. IDSs based only on past attack signatures are less effective, even more so if these IDSs are centralized. Our work focuses on adding a novel Machine Learning technique to the detection engine, which recognizes attack traffic in an online way (rather than storing it and analyzing it afterwards), rewriting IDS rules on the fly. Experiments were done using the Dockemu emulation tool with Linux Containers, IPv6, and OLSR as the routing protocol, leading to promising results.

2018-06-07
Aygun, R. C., Yavuz, A. G..  2017.  Network Anomaly Detection with Stochastically Improved Autoencoder Based Models. 2017 IEEE 4th International Conference on Cyber Security and Cloud Computing (CSCloud). :193–198.

Intrusion detection systems do not perform well when it comes to detecting zero-day attacks, therefore improving their performance in that regard is an active research topic. In this study, to detect zero-day attacks with high accuracy, we proposed two deep learning based anomaly detection models using autoencoder and denoising autoencoder respectively. The key factor that directly affects the accuracy of the proposed models is the threshold value which was determined using a stochastic approach rather than the approaches available in the current literature. The proposed models were tested using the KDDTest+ dataset contained in NSL-KDD, and we achieved an accuracy of 88.28% and 88.65% respectively. The obtained results show that, as a singular model, our proposed anomaly detection models outperform any other singular anomaly detection methods and they perform almost the same as the newly suggested hybrid anomaly detection models.

Liang, Jingxi, Zhao, Wen, Ye, Wei.  2017.  Anomaly-Based Web Attack Detection: A Deep Learning Approach. Proceedings of the 2017 VI International Conference on Network, Communication and Computing. :80–85.
As the era of cloud technology arises, more and more people are beginning to migrate their applications and personal data to the cloud. This makes web-based applications an attractive target for cyber-attacks. As a result, web-based applications now need more protection than ever. However, current anomaly-based web attack detection approaches face difficulties such as unsatisfactory accuracy and a lack of generalization, and rule-based web attack detection can hardly fight unknown attacks and is relatively easy to bypass. Therefore, we propose a novel deep learning approach to detect anomalous requests. Our approach first trains two Recurrent Neural Networks (RNNs) with a complex recurrent unit (LSTM or GRU) in an unsupervised manner, using only normal requests, to learn the normal request patterns. It then trains, in a supervised manner, a neural network classifier that takes the output of the RNNs as input to discriminate between anomalous and normal requests. We tested our model on two datasets and the results showed that our model was competitive with the state of the art. Our approach frees us from feature selection. Also, to the best of our knowledge, this is the first time that an RNN has been applied to anomaly-based web attack detection systems.
2018-05-30
Price-Williams, M., Heard, N., Turcotte, M..  2017.  Detecting Periodic Subsequences in Cyber Security Data. 2017 European Intelligence and Security Informatics Conference (EISIC). :84–90.

Anomaly detection for cyber-security defence has garnered much attention in recent years providing an orthogonal approach to traditional signature-based detection systems. Anomaly detection relies on building probability models of normal computer network behaviour and detecting deviations from the model. Most data sets used for cyber-security have a mix of user-driven events and automated network events, which most often appears as polling behaviour. Separating these automated events from those caused by human activity is essential to building good statistical models for anomaly detection. This article presents a changepoint detection framework for identifying automated network events appearing as periodic subsequences of event times. The opening event of each subsequence is interpreted as a human action which then generates an automated, periodic process. Difficulties arising from the presence of duplicate and missing data are addressed. The methodology is demonstrated using authentication data from Los Alamos National Laboratory's enterprise computer network.

Moriano, Pablo, Pendleton, Jared, Rich, Steven, Camp, L Jean.  2017.  Insider Threat Event Detection in User-System Interactions. Proceedings of the 2017 International Workshop on Managing Insider Security Threats. :1–12.

Detection of insider threats relies on monitoring individuals and their interactions with organizational resources. Identification of anomalous insiders typically relies on supervised learning models that use labeled data. However, such labeled data is not easily obtainable. The labeled data that does exist is also limited by current insider threat detection methods and undetected insiders would not be included. These models also inherently assume that the insider threat is not rapidly evolving between model generation and use of the model in detection. Yet there is a large body of research that illustrates that the insider threat changes significantly after some types of precipitating events, such as layoffs, significant restructuring, and plant or facility closure. To capture this temporal evolution of user-system interactions, we use an unsupervised learning framework to evaluate whether potential insider threat events are triggered following precipitating events. The analysis leverages a bipartite graph of user and system interactions. The approach shows a clear correlation between precipitating events and the number of apparent anomalies. The results of our empirical analysis show a clear shift in behaviors after events which have previously been shown to increase insider activity, specifically precipitating events. We argue that this metadata about the level of insider threat behaviors validates the potential of the approach. We apply our method to a dataset that comprises interactions between engineers and software components in an enterprise version control system spanning more than 22 years. We use this unlabeled dataset and automatically detect statistically significant events. We show that there is statistically significant evidence that a subset of users diversify their committing behavior after precipitating events have been announced. 
Although these findings do not constitute detection of insider threat events per se, they do identify patterns of potentially malicious high-risk insider behavior. They reinforce the idea that insider operations can be motivated by the insiders' environment. Our proposed framework outperforms algorithms based on naive random approaches and algorithms using volume dependent statistics. This graph mining technique has potential for early detection of insider threat behavior in user-system interactions independent of the volume of interactions. The proposed method also enables organizations without a corpus of identified insider threats to train their own anomaly detection system.

2018-05-24
Sallam, A., Bertino, E..  2017.  Detection of Temporal Insider Threats to Relational Databases. 2017 IEEE 3rd International Conference on Collaboration and Internet Computing (CIC). :406–415.

The mitigation of insider threats against databases is a challenging problem as insiders often have legitimate access privileges to sensitive data. Therefore, conventional security mechanisms, such as authentication and access control, may be insufficient for the protection of databases against insider threats and need to be complemented with techniques that support real-time detection of access anomalies. The existing real-time anomaly detection techniques consider anomalies in references to the database entities and the amounts of accessed data. However, they are unable to track the access frequencies. According to recent security reports, an increase in the access frequency by an insider is an indicator of a potential data misuse and may be the result of malicious intents for stealing or corrupting the data. In this paper, we propose techniques for tracking users' access frequencies and detecting anomalous related activities in real-time. We present detailed algorithms for constructing accurate profiles that describe the access patterns of the database users and for matching subsequent accesses by these users to the profiles. Our methods report and log mismatches as anomalies that may need further investigation. We evaluated our techniques on the OLTP-Benchmark. The results of the evaluation indicate that our techniques are very effective in the detection of anomalies.
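
The idea of tracking access frequencies against a profile can be sketched with a sliding window per user. This is an illustrative simplification, not the authors' algorithms: the class name, the window length, and the `max_accesses` limit (which a real system would learn from training data) are all hypothetical.

```python
from collections import deque


class AccessFrequencyProfile:
    """Hypothetical per-user profile: at most max_accesses per time window."""

    def __init__(self, window_seconds, max_accesses):
        self.window = window_seconds
        self.max_accesses = max_accesses  # assumed to come from training data
        self.events = deque()             # timestamps inside the current window

    def record(self, timestamp):
        """Record one access; return True if the windowed rate is anomalous."""
        self.events.append(timestamp)
        # Drop accesses that have fallen out of the sliding window.
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()
        return len(self.events) > self.max_accesses


profile = AccessFrequencyProfile(window_seconds=60, max_accesses=3)
for t in [0, 10, 20, 30, 100]:
    print(t, profile.record(t))
```

In this run only the access at `t = 30` is reported, because it is the fourth access within one 60-second window; by `t = 100` the earlier accesses have expired.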

2018-05-09
Markman, Chen, Wool, Avishai, Cardenas, Alvaro A..  2017.  A New Burst-DFA Model for SCADA Anomaly Detection. Proceedings of the 2017 Workshop on Cyber-Physical Systems Security and PrivaCy. :1–12.

In Industrial Control Systems (ICS/SCADA), machine to machine data traffic is highly periodic. Past work showed that in many cases, it is possible to model the traffic between each individual Programmable Logic Controller (PLC) and the SCADA server by a cyclic Deterministic Finite Automaton (DFA), and to use the model to detect anomalies in the traffic. However, a recent analysis of network traffic in a water facility in the U.S. showed that cyclic-DFA models have limitations. In our research, we examine the same data corpus; our study shows that the communication on all of the channels in the network is done in bursts of packets, and that the bursts have semantic meaning: the order within a burst depends on the messages. Using these observations, we suggest a new burst-DFA model that fits the data much better than previous work. Our model treats the traffic on each channel as a series of bursts, and matches each burst to the DFA, taking the burst's beginning and end into account. Our burst-DFA model successfully explains between 95% and 99% of the packets in the data-corpus, and goes a long way toward the construction of a practical anomaly detection system.
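
The notion of treating a channel as a series of bursts can be illustrated with a much simpler stand-in than the burst-DFA itself: packets are split into bursts wherever the inter-packet gap exceeds a threshold, and each burst is checked against the set of bursts seen in training. The function names, the `max_gap` threshold, and the exact-match check (which treats packet order as significant) are assumptions for illustration only.

```python
def split_into_bursts(packets, max_gap):
    """Split (timestamp, symbol) pairs, sorted by time, into bursts of symbols."""
    bursts, current, prev_ts = [], [], None
    for ts, symbol in packets:
        # A gap longer than max_gap closes the current burst.
        if prev_ts is not None and ts - prev_ts > max_gap:
            bursts.append(tuple(current))
            current = []
        current.append(symbol)
        prev_ts = ts
    if current:
        bursts.append(tuple(current))
    return bursts


def learn_bursts(packets, max_gap):
    """Collect the distinct bursts observed in normal traffic."""
    return set(split_into_bursts(packets, max_gap))


def unexplained_bursts(packets, known, max_gap):
    """Return the bursts in new traffic that were never seen in training."""
    return [b for b in split_into_bursts(packets, max_gap) if b not in known]


training = [(0.0, "A"), (0.01, "B"), (1.0, "A"), (1.01, "B"), (2.0, "C")]
known = learn_bursts(training, max_gap=0.5)
live = [(0.0, "A"), (0.01, "B"), (1.0, "B"), (1.01, "A")]
print(unexplained_bursts(live, known, max_gap=0.5))
```

A set of exact sequences is far less expressive than an automaton, so this only shows the segmentation step and the idea of explaining traffic burst by burst.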

Jonsdottir, G., Wood, D., Doshi, R..  2017.  IoT network monitor. 2017 IEEE MIT Undergraduate Research Technology Conference (URTC). :1–5.
IoT Network Monitor is an intuitive and user-friendly interface for consumers to visualize vulnerabilities of IoT devices in their home. Running on a Raspberry Pi configured as a router, the IoT Network Monitor analyzes the traffic of connected devices in three ways. First, it detects devices with default passwords exploited by previous attacks such as the Mirai Botnet, changes default device passwords to randomly generated 12 character strings, and reports the new passwords to the user. Second, it conducts deep packet analysis on the network data from each device and notifies the user of potentially sensitive personal information that is being transmitted in cleartext. Lastly, it detects botnet traffic originating from an IoT device connected to the network and instructs the user to disconnect the device if it has been hacked. The user-friendly IoT Network Monitor will enable homeowners to maintain the security of their home network and better understand what actions are appropriate when a certain security vulnerability is detected. Wide adoption of this tool will make consumer home IoT networks more secure.
2018-04-11
Zeng, H., Wang, B., Deng, W., Gao, X..  2017.  CENTRA: CENtrally Trusted Routing vAlidation for IGP. 2017 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). :21–24.

Trusted routing is a hot spot in network security. Many efforts have been made on trusted routing validation for Interior Gateway Protocols (IGP), e.g., using Public Key Infrastructure (PKI) to enhance the security of protocols, or routing monitoring systems. However, the former is hard to deploy further in the practical Internet, and the latter depends on a complete, accurate, and fresh knowledge base, which is still a big challenge because Internet Service Providers (ISPs) are not willing to leak their routing policies. In this paper, inspired by the idea of centralized control in Software Defined Networking (SDN), we propose a CENtrally Trusted Routing vAlidation framework, named CENTRA, which can automatically collect routing information, centrally detect anomalies, and deliver secure routing policies. We implement the proposed framework using NETCONF as the communication protocol and YANG as the data model. The experimental results reveal that CENTRA can detect and block anomalous routing in real time. Compared to existing secure routing mechanisms, CENTRA significantly improves detection efficiency and real-time performance.

Yoon, Man-Ki, Mohan, Sibin, Choi, Jaesik, Christodorescu, Mihai, Sha, Lui.  2017.  Learning Execution Contexts from System Call Distribution for Anomaly Detection in Smart Embedded System. Proceedings of the Second International Conference on Internet-of-Things Design and Implementation. :191–196.

Existing techniques used for anomaly detection do not fully utilize the intrinsic properties of embedded devices. In this paper, we propose a lightweight method for detecting anomalous executions using a distribution of system call frequencies. We use a cluster analysis to learn the legitimate execution contexts of embedded applications and then monitor them at run-time to capture abnormal executions. Our prototype applied to a real-world open-source embedded application shows that the proposed method can effectively detect anomalous executions without relying on sophisticated analyses or affecting the critical execution paths.

2018-04-04
Nawaratne, R., Bandaragoda, T., Adikari, A., Alahakoon, D., Silva, D. De, Yu, X..  2017.  Incremental knowledge acquisition and self-learning for autonomous video surveillance. IECON 2017 - 43rd Annual Conference of the IEEE Industrial Electronics Society. :4790–4795.

The world is witnessing a remarkable increase in the usage of video surveillance systems. Besides fulfilling an imperative security and safety purpose, it also contributes towards operations monitoring, hazard detection and facility management in industry/smart factory settings. Most existing surveillance techniques use hand-crafted features analyzed using standard machine learning pipelines for action recognition and event detection. A key shortcoming of such techniques is the inability to learn from unlabeled video streams. The entire video stream is unlabeled when the requirement is to detect irregular, unforeseen and abnormal behaviors, anomalies. Recent developments in intelligent high-level video analysis have been successful in identifying individual elements in a video frame. However, the detection of anomalies in an entire video feed requires incremental and unsupervised machine learning. This paper presents a novel approach that incorporates high-level video analysis outcomes with incremental knowledge acquisition and self-learning for autonomous video surveillance. The proposed approach is capable of detecting changes that occur over time and separating irregularities from re-occurrences, without the prerequisite of a labeled dataset. We demonstrate the proposed approach using a benchmark video dataset and the results confirm its validity and usability for autonomous video surveillance.

2018-03-26
Pallaprolu, S. C., Sankineni, R., Thevar, M., Karabatis, G., Wang, J..  2017.  Zero-Day Attack Identification in Streaming Data Using Semantics and Spark. 2017 IEEE International Congress on Big Data (BigData Congress). :121–128.

Intrusion Detection Systems (IDS) have been in existence for many years now, but they fall short in efficiently detecting zero-day attacks. This paper presents an organic combination of Semantic Link Networks (SLN) and dynamic semantic graph generation for the on the fly discovery of zero-day attacks using the Spark Streaming platform for parallel detection. In addition, a minimum redundancy maximum relevance (MRMR) feature selection algorithm is deployed to determine the most discriminating features of the dataset. Compared to previous studies on zero-day attack identification, the described method yields better results due to the semantic learning and reasoning on top of the training data and due to the use of collaborative classification methods. We also verified the scalability of our method in a distributed environment.

Lu, Sixing, Lysecky, Roman.  2017.  Time and Sequence Integrated Runtime Anomaly Detection for Embedded Systems. ACM Trans. Embed. Comput. Syst.. 17:38:1–38:27.

Network-connected embedded systems grow on a large scale as a critical part of the Internet of Things, and these systems are under the risk of increasing malware. Anomaly-based detection methods can detect malware in embedded systems effectively and provide the advantage of detecting zero-day exploits relative to signature-based detection methods, but existing approaches incur significant performance overheads and are susceptible to mimicry attacks. In this article, we present a formal runtime security model that defines the normal system behavior including execution sequence and execution timing. The anomaly detection method in this article utilizes on-chip hardware to non-intrusively monitor system execution through the trace port of the processor and detect malicious activity at runtime. We further analyze the properties of the timing distribution for control flow events, and select a subset of monitoring targets by three selection metrics to meet hardware constraints. The designed detection method is evaluated by a network-connected pacemaker benchmark prototyped in FPGA and simulated in SystemC, with several mimicry attacks implemented at different levels. The resulting detection rate and false positive rate considering constraints on the number of monitored events supported in the on-chip hardware demonstrate good performance of our approach.

2018-02-27
Lighari, S. N., Hussain, D. M. A..  2017.  Hybrid Model of Rule Based and Clustering Analysis for Big Data Security. 2017 First International Conference on Latest Trends in Electrical Engineering and Computing Technologies (INTELLECT). :1–5.

Most organizations tend to accumulate security-related data, which can reach terabytes every month. They collect this data to meet security requirements. The data is mostly in the form of logs, such as DNS logs, pcap files, and firewall data. The data can relate to any communication network, such as a cloud, telecom, or smart grid network. Generally, these logs are stored in databases or warehouses, which ultimately become gigantic in size. Such a huge volume of data increases the importance of security analytics in big data. In surveys, security experts complain about the existing tools and recommend special tools and methods for big data security analysis. In this paper, we use a big data analysis tool known as Apache Spark. Although this tool is general purpose, we have used it for security analysis. It offers a very good library of machine learning algorithms, including clustering, which is the main algorithm used in our work. In this work, we have developed a novel model that combines rule-based and clustering analysis for the security analysis of big datasets. The dataset used in our experiment is KDD Cup 99, a widely used dataset for intrusion detection. It is only megabytes in size but can be used as a test case for big data security analysis.

Moore, Michael R., Bridges, Robert A., Combs, Frank L., Starr, Michael S., Prowell, Stacy J..  2017.  Modeling Inter-Signal Arrival Times for Accurate Detection of CAN Bus Signal Injection Attacks: A Data-Driven Approach to In-Vehicle Intrusion Detection. Proceedings of the 12th Annual Conference on Cyber and Information Security Research. :11:1–11:4.

Modern vehicles rely on hundreds of on-board electronic control units (ECUs) communicating over in-vehicle networks. As external interfaces to the car control networks (such as the on-board diagnostic (OBD) port, auxiliary media ports, etc.) become common, and vehicle-to-vehicle / vehicle-to-infrastructure technology is in the near future, the attack surface for vehicles grows, exposing control networks to potentially life-critical attacks. This paper addresses the need for securing the controller area network (CAN) bus by detecting anomalous traffic patterns via unusual refresh rates of certain commands. While previous works have identified signal frequency as an important feature for CAN bus intrusion detection, this paper provides the first such algorithm with experiments using three attacks in five (total) scenarios. Our data-driven anomaly detection algorithm requires only five seconds of training time (on normal data) and achieves true positive / false discovery rates of 0.9998/0.00298, respectively (micro-averaged across the five experimental tests).
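
The core idea of flagging unusual refresh rates can be sketched as follows: learn the mean and standard deviation of inter-arrival times per CAN ID from normal traffic, then flag frames whose gap deviates by more than `k` standard deviations. This is a simplified stand-in, not the paper's algorithm; the function names, the `k` multiplier, and the `min_std` floor are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def learn_intervals(frames):
    """frames: (timestamp_seconds, can_id) pairs sorted by time.

    Returns {can_id: (mean_gap, std_gap)} for IDs with at least two gaps.
    """
    last_seen = {}
    gaps = defaultdict(list)
    for ts, can_id in frames:
        if can_id in last_seen:
            gaps[can_id].append(ts - last_seen[can_id])
        last_seen[can_id] = ts
    return {cid: (mean(g), pstdev(g)) for cid, g in gaps.items() if len(g) >= 2}


def flag_anomalies(frames, model, k=3.0, min_std=1e-4):
    """Return (timestamp, can_id, gap) for gaps far from the learned mean."""
    last_seen = {}
    alerts = []
    for ts, can_id in frames:
        if can_id in last_seen and can_id in model:
            mu, sigma = model[can_id]
            gap = ts - last_seen[can_id]
            # min_std keeps perfectly periodic training data from
            # producing a zero-width acceptance band.
            if abs(gap - mu) > k * max(sigma, min_std):
                alerts.append((ts, can_id, gap))
        last_seen[can_id] = ts
    return alerts


normal = [(i * 0.01, 0x100) for i in range(100)]   # one frame every 10 ms
model = learn_intervals(normal)
traffic = [(0.0, 0x100), (0.01, 0x100), (0.012, 0x100), (0.02, 0x100)]
print(flag_anomalies(traffic, model))
```

The injected frame at 0.012 s raises two alerts: one for its own short gap and one for the shortened gap of the legitimate frame that follows it.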

2018-02-15
Bittner, Daniel M., Sarwate, Anand D., Wright, Rebecca N..  2017.  Differentially Private Noisy Search with Applications to Anomaly Detection (Abstract). Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. :53–53.
We consider the problem of privacy-sensitive anomaly detection - screening to detect individuals, behaviors, areas, or data samples of high interest. What defines an anomaly is context-specific; for example, a spoofed rather than genuine user attempting to log in to a web site, a fraudulent credit card transaction, or a suspicious traveler in an airport. The unifying assumption is that the number of anomalous points is quite small with respect to the population, so that deep screening of all individual data points would potentially be time-intensive, costly, and unnecessarily invasive of privacy. Such privacy violations can raise concerns due to the sensitive nature of the data being used, raise fears about violations of data use agreements, and make people uncomfortable with anomaly detection methods. Anomaly detection is well studied, but methods to provide anomaly detection along with privacy are less well studied. Our overall goal in this research is to provide a framework for identifying anomalous data while guaranteeing quantifiable privacy in a rigorous sense. Once identified, such anomalies could warrant further data collection and investigation, depending on the context and relevant policies. In this research, we focus on privacy protection during the deployment of anomaly detection. Our main contribution is a differentially private access mechanism for finding anomalies using a search algorithm based on adaptive noisy group testing. To achieve this, we take as our starting point the notion of group testing [1], which was most famously used to screen US military draftees for syphilis during World War II. In group testing, individuals are tested in groups to limit the number of tests. Using multiple rounds of screenings, a small number of positive individuals can be detected very efficiently.
Group testing has the added benefit of providing privacy to individuals through plausible deniability - since the group tests use aggregate data, individual contributions to the test are masked by the group. We follow on these concepts by demonstrating a search model utilizing adaptive queries on aggregated group data. Our work takes the first steps toward strengthening and formalizing these privacy concepts by achieving differential privacy [2]. Differential privacy is a statistical measure of disclosure risk that captures the intuition that an individual's privacy is protected if the results of a computation have at most a very small and quantifiable dependence on that individual's data. In the last decade, there has been practical adoption underway by high-profile companies such as Apple, Google, and Uber. In order to make differential privacy meaningful in the context of a task that seeks to specifically identify some (anomalous) individuals, we introduce the notion of anomaly-restricted differential privacy. Using ideas from information theory, we show that noise can be added to group query results in a way that provides differential privacy for non-anomalous individuals and still enables efficient and accurate detection of the anomalous individuals. Our method uses differentially private aggregation of groups of points, providing privacy to individuals within the group while refining the group selection to the point that we can probabilistically narrow attention to a small number of individuals or samples for further attention. To summarize: We introduce a new notion of anomaly-restricted differential privacy, which may be of independent interest. We provide a noisy group-based search algorithm that satisfies the anomaly-restricted differential privacy definition.
We provide both theoretical and empirical analysis of our noisy search algorithm, showing that it performs well in some cases, and exhibits the usual privacy/accuracy tradeoff of differentially private mechanisms. Potential anomaly detection applications for our work might include spatial search for outliers: this would rely on new sensing technologies that can perform queries in aggregate to reveal and isolate anomalous outliers. For example, this could lead to privacy-sensitive methods for searching for outlying cell phone activity patterns or Internet activity patterns in a geographic location.
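The idea of an adaptive search over noisy group queries can be illustrated with a minimal sketch. It is not the authors' algorithm and does not implement their anomaly-restricted definition: it assumes a single anomaly, a counting query with sensitivity 1, and Laplace noise of scale 1/epsilon per query. The names `noisy_count`, `noisy_binary_search`, and `epsilon` are hypothetical, and the accounting of the total privacy budget across rounds is left out.

```python
import random


def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(flags, group, epsilon, rng):
    # Aggregate query: number of anomalous members in the group,
    # perturbed with Laplace noise (assumed sensitivity of 1).
    true_count = sum(flags[i] for i in group)
    return true_count + laplace(rng, 1.0 / epsilon)


def noisy_binary_search(flags, epsilon, rng):
    # Adaptive search: halve the candidate group each round and keep
    # the half whose noisy count is larger (ties go to the left half).
    group = list(range(len(flags)))
    while len(group) > 1:
        mid = len(group) // 2
        left, right = group[:mid], group[mid:]
        left_score = noisy_count(flags, left, epsilon, rng)
        right_score = noisy_count(flags, right, epsilon, rng)
        group = left if left_score >= right_score else right
    return group[0]


if __name__ == "__main__":
    rng = random.Random(0)
    flags = [0] * 64
    flags[37] = 1  # hypothetical single anomaly
    print(noisy_binary_search(flags, epsilon=50.0, rng=rng))
```

Only group-level counts are ever queried, so individual records are never inspected directly. Lowering `epsilon` increases the noise and makes it more likely that the search follows the wrong half, which is the privacy/accuracy tradeoff the abstract mentions.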
2018-02-06
Eslami, M., Zheng, G., Eramian, H., Levchuk, G..  2017.  Anomaly Detection on Bipartite Graphs for Cyber Situational Awareness and Threat Detection. 2017 IEEE International Conference on Big Data (Big Data). :4741–4743.

Data from cyber logs can often be represented as a bipartite graph (e.g. internal IP-external IP, user-application, or client-server). State-of-the-art graph based anomaly detection often generalizes across all types of graphs — namely bipartite and non-bipartite. This confounds the interpretation and use of specific graph features such as degree, page rank, and eigencentrality that can provide a security analyst with rapid situational awareness of their network. Furthermore, graph algorithms applied to data collected from large, distributed enterprise scale networks require accompanying methods that allow them to scale to the data collected. In this paper, we provide a novel, scalable, directional graph projection framework that operates on cyber logs that can be represented as bipartite graphs. This framework computes directional graph projections and identifies a set of interpretable graph features that describe anomalies within each partite.
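As a rough illustration of what a graph projection computes, the sketch below builds a plain one-mode projection of a bipartite edge list: two left-side nodes are linked with a weight equal to the number of right-side nodes they share. This is an assumption-laden simplification, not the directional projection framework of the paper; the function names and the sample user-application edges are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project(edges):
    # edges: iterable of (left_node, right_node) pairs, e.g. (user, application).
    # Returns {(a, b): weight}, where weight counts shared right-side neighbors.
    lefts_by_right = defaultdict(set)
    for left, right in edges:
        lefts_by_right[right].add(left)
    weights = defaultdict(int)
    for lefts in lefts_by_right.values():
        for a, b in combinations(sorted(lefts), 2):
            weights[(a, b)] += 1
    return dict(weights)


def projected_degree(weights):
    # Degree of each node in the projected graph: one simple,
    # interpretable feature an analyst could rank hosts or users by.
    degree = defaultdict(int)
    for a, b in weights:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


if __name__ == "__main__":
    edges = [("u1", "app1"), ("u2", "app1"),
             ("u1", "app2"), ("u2", "app2"), ("u3", "app2")]
    weights = project(edges)
    print(weights)
    print(projected_degree(weights))
```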

Tiwari, T., Turk, A., Oprea, A., Olcoz, K., Coskun, A. K..  2017.  User-Profile-Based Analytics for Detecting Cloud Security Breaches. 2017 IEEE International Conference on Big Data (Big Data). :4529–4535.

While the growth of cloud-based technologies has benefited society tremendously, it has also increased the surface area for cyber attacks. Given that cloud services are prevalent today, it is critical to devise systems that detect intrusions. One form of security breach in the cloud is when cyber-criminals compromise Virtual Machines (VMs) of unwitting users and then utilize user resources to run time-consuming, malicious, or illegal applications for their own benefit. This work proposes a method to detect unusual resource usage trends and alert the user and the administrator in real time. We experiment with three categories of methods: simple statistical techniques, unsupervised classification, and regression. So far, our approach successfully detects anomalous resource usage when experimenting with typical trends synthesized from published real-world web server logs and cluster traces. We observe the best results with unsupervised classification, which gives an average F1-score of 0.83 for web server logs and 0.95 for the cluster traces.
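A "simple statistical technique" of the kind mentioned above could look like the following sketch, which flags a usage sample whose z-score against a trailing window exceeds a threshold. The abstract does not specify the detectors used, so the window size, threshold, function name, and sample data are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def flag_anomalies(usage, window=5, threshold=3.0, min_sigma=1e-9):
    # Compare each sample with the mean and standard deviation of the
    # `window` samples before it; flag it when the z-score is too large.
    flagged = []
    for t in range(window, len(usage)):
        history = usage[t - window:t]
        mu = mean(history)
        sigma = max(pstdev(history), min_sigma)  # avoid division by zero
        if abs(usage[t] - mu) / sigma > threshold:
            flagged.append(t)
    return flagged


if __name__ == "__main__":
    # Hypothetical CPU utilisation samples (percent) with one spike.
    cpu = [10, 11, 9, 10, 10, 11, 95, 10, 9]
    print(flag_anomalies(cpu))
```

A trailing window keeps the check cheap enough to run in real time, but a large spike inflates the window's variance and can mask anomalies that follow it closely.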

2018-02-02
Scofield, Daniel, Miles, Craig, Kuhn, Stephen.  2017.  Fast Model Learning for the Detection of Malicious Digital Documents. Proceedings of the 7th Software Security, Protection, and Reverse Engineering / Software Security and Protection Workshop. :3:1–3:8.

Modern cyber attacks are often conducted by distributing digital documents that contain malware. The approach detailed herein, which consists of a classifier that uses features derived from dynamic analysis of a document viewer as it renders the document in question, is capable of classifying the disposition of digital documents with greater than 98% accuracy even when its model is trained on just small amounts of data. To keep the classification model itself small and thereby to provide scalability, we employ an entity resolution strategy that merges syntactically disparate features that are thought to be semantically equivalent but vary due to programmatic randomness. Entity resolution enables construction of a comprehensive model of benign functionality using relatively few training documents, and the model does not improve significantly with additional training data.

2018-01-23
Eslami, M., Zheng, G., Eramian, H., Levchuk, G..  2017.  Deriving cyber use cases from graph projections of cyber data represented as bipartite graphs. 2017 IEEE International Conference on Big Data (Big Data). :4658–4663.

Graph analysis can capture relationships between network entities and can be used to identify and rank anomalous hosts, users, or applications from various types of cyber logs. It is often the case that the data in the logs can be represented as a bipartite graph (e.g. internal IP-external IP, user-application, or client-server). State-of-the-art graph-based anomaly detection often generalizes across all types of graphs, namely bipartite and non-bipartite. This confounds the interpretation and use of specific graph features such as degree, page rank, and eigencentrality that can provide a security analyst with situational awareness and even insights into potential attacks on enterprise scale networks. Furthermore, graph algorithms applied to data collected from large, distributed enterprise scale networks require accompanying methods that allow them to scale to the data collected. In this paper, we provide a novel, scalable, directional graph projection framework that operates on cyber logs that can be represented as bipartite graphs. We also present methodologies to further narrow returned results to anomalous/outlier cases that may be indicative of a cyber security event. This framework computes directional graph projections and identifies a set of interpretable graph features that describe anomalies within each partite.

2018-01-16
Zhou, Chong, Paffenroth, Randy C..  2017.  Anomaly Detection with Robust Deep Autoencoders. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :665–674.

Deep autoencoders, and other deep neural networks, have demonstrated their effectiveness in discovering non-linear features across many problem domains. However, in many real-world problems, large outliers and pervasive noise are commonplace, and one may not have access to clean training data as required by standard deep denoising autoencoders. Herein, we demonstrate novel extensions to deep autoencoders which not only maintain a deep autoencoder's ability to discover high quality, non-linear features but can also eliminate outliers and noise without access to any clean training data. Our model is inspired by Robust Principal Component Analysis, and we split the input data X into two parts, $X = L_D + S$, where $L_D$ can be effectively reconstructed by a deep autoencoder and $S$ contains the outliers and noise in the original data X. Since such splitting increases the robustness of standard deep autoencoders, we name our model a "Robust Deep Autoencoder (RDA)". Further, we present generalizations of our results to grouped sparsity norms which allow one to distinguish random anomalies from other types of structured corruptions, such as a collection of features being corrupted across many instances or a collection of instances having more corruptions than their fellows. Such "Group Robust Deep Autoencoders (GRDA)" give rise to novel anomaly detection approaches whose superior performance we demonstrate on a selection of benchmark problems.
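One way to write such a split as an optimization problem, in the style of Robust PCA, is sketched below. The encoder $E_\theta$, decoder $D_\theta$, and weight $\lambda$ are symbols assumed for this sketch and are not defined in the abstract; the grouped variant shown is one common choice of grouped sparsity norm, taken here over the columns (features) of $S$.

```latex
% Sketch: reconstruction error on the clean part plus a sparsity penalty on S
\[
  \min_{\theta,\,S}\;
  \bigl\lVert L_D - D_\theta\bigl(E_\theta(L_D)\bigr) \bigr\rVert_2
  + \lambda \lVert S \rVert_1
  \quad \text{subject to} \quad X - L_D - S = 0 .
\]
% Grouped variant: replace the element-wise l1 norm with an l2,1 norm,
% so that whole columns of S are switched on or off together
\[
  \lVert S \rVert_{2,1} = \sum_{j} \bigl\lVert S_{:,j} \bigr\rVert_2 .
\]
```

A larger $\lambda$ pushes more of the data into $L_D$, leaving fewer entries flagged as outliers in $S$; a smaller $\lambda$ does the opposite.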