Biblio
The veil of anonymity provided by smartphones with pre-paid SIM cards, public Wi-Fi hotspots, and distributed networks like Tor has drastically complicated the task of identifying users of social media during forensic investigations. In some cases, the text of a single posted message will be the only clue to an author's identity. How can we accurately predict who that author might be when the message may never exceed 140 characters on a service like Twitter? For the past 50 years, linguists, computer scientists, and scholars of the humanities have been jointly developing automated methods to identify authors based on the style of their writing. All authors possess peculiarities of habit that influence the form and content of their written works. These characteristics can often be quantified and measured using machine learning algorithms. In this paper, we provide a comprehensive review of the methods of authorship attribution that can be applied to the problem of social media forensics. Furthermore, we examine emerging supervised learning-based methods that are effective for small sample sizes, and provide step-by-step explanations for several scalable approaches as instructional case studies for newcomers to the field. We argue that there is a significant need in forensics for new authorship attribution algorithms that can exploit context, can process multi-modal data, and are tolerant to incomplete knowledge of the space of all possible authors at training time.
Automated server parameter tuning is crucial to performance and availability of Internet applications hosted in cloud environments. It is challenging due to high dynamics and burstiness of workloads, multi-tier service architecture, and virtualized server infrastructure. In this paper, we investigate automated and agile server parameter tuning for maximizing effective throughput of multi-tier Internet applications. A recent study proposed a reinforcement learning based server parameter tuning approach for minimizing average response time of multi-tier applications. Reinforcement learning is a decision making process determining the parameter tuning direction based on trial-and-error, instead of quantitative values for agile parameter tuning. It relies on a predefined adjustment value for each tuning action. However it is nontrivial or even infeasible to find an optimal value under highly dynamic and bursty workloads. We design a neural fuzzy control based approach that combines the strengths of fast online learning and self-adaptiveness of neural networks and fuzzy control. Due to the model independence, it is robust to highly dynamic and bursty workloads. It is agile in server parameter tuning due to its quantitative control outputs. We implemented the new approach on a testbed of virtualized data center hosting RUBiS and WikiBench benchmark applications. Experimental results demonstrate that the new approach significantly outperforms the reinforcement learning based approach for both improving effective system throughput and minimizing average response time.
In this paper, it is shown that the high automation level of the object-oriented modeling paradigm for physical systems can significantly rationalize the design procedure of fault detection and isolation (FDI) systems. Consequently, an object-oriented FDI method for complex engineering systems consisting of subsystems from different physical domains like mechatronic systems, commercial vehicles, and chemical process plants is developed. The mathematical composition of the objects corresponding to the subsystems results in a differential algebraic equation (DAE) that describes the overall system. This DAE is automatically analyzed and transferred into a set of residual generators that enable a two-stage FDI procedure for multiple fault modes.
Organizations are experiencing an ever-growing concern of how to identify and defend against insider threats. Those who have authorized access to sensitive organizational data are placed in a position of power that could well be abused and could cause significant damage to an organization. This could range from financial theft and intellectual property theft to the destruction of property and business reputation. Traditional intrusion detection systems are neither designed nor capable of identifying those who act maliciously within an organization. In this paper, we describe an automated system that is capable of detecting insider threats within an organization. We define a tree-structure profiling approach that incorporates the details of activities conducted by each user and each job role and then use this to obtain a consistent representation of features that provide a rich description of the user's behavior. Deviation can be assessed based on the amount of variance that each user exhibits across multiple attributes, compared against their peers. We have performed experimentation using ten synthetic data-driven scenarios and found that the system can identify anomalous behavior that may be indicative of a potential threat. We also show how our detection system can be combined with visual analytics tools to support further investigation by an analyst.
Bitcoin is popular not only with consumers, but also with cybercriminals (e.g., in ransomware and online extortion, and commercial online child exploitation). Given the potential of Bitcoin to be involved in a criminal investigation, the need to have an up-to-date and in-depth understanding on the forensic acquisition and analysis of Bitcoins is crucial. However, there has been limited forensic research of Bitcoin in the literature. The general focus of existing research is on postmortem analysis of specific locations (e.g. wallets on mobile devices), rather than a forensic approach that combines live data forensics and postmortem analysis to facilitate the identification, acquisition, and analysis of forensic traces relating to the use of Bitcoins on a system. Hence, the latter is the focus of this paper where we present an open source tool for live forensic and postmortem analysing automatically. Using this open source tool, we describe a list of target artifacts that can be obtained from a forensic investigation of popular Bitcoin clients and Web Wallets on different web browsers installed on Windows 7 and Windows 10 platforms.
Due to the extensive use of network services and emerging security threats, enterprise networks deploy varieties of security devices for controlling resource access based on organizational security requirements. These requirements need fine-grained access control rules based on heterogeneous isolation patterns like access denial, trusted communication, and payload inspection. Organizations are also seeking for usable and optimal security configurations that can harden the network security within enterprise budget constraints. In order to design a security architecture, i.e., the distribution of security devices along with their security policies, that satisfies the organizational security requirements as well as the business constraints, it is required to analyze various alternative security architectures considering placements of network security devices in the network and the corresponding access controls. In this paper, we present an automated formal framework for synthesizing network security configurations. The main design alternatives include different kinds of isolation patterns for network traffic flows. The framework takes security requirements and business constraints along with the network topology as inputs. Then, it synthesizes cost-effective security configurations satisfying the constraints and provides placements of different security devices, optimally distributed in the network, according to the given network topology. In addition, we provide a hypothesis testing-based security architecture refinement mechanism that explores various security design alternatives using ConfigSynth and improves the security architecture by systematically increasing the security requirements. We demonstrate the execution of ConfigSynth and the refinement mechanism using case studies. Finally, we evaluate their scalability using simulated experiments.
Traffic of Industrial Control System (ICS) between the Human Machine Interface (HMI) and the Programmable Logic Controller (PLC) is known to be highly periodic. However, it is sometimes multiplexed, due to asynchronous scheduling. Modeling the network traffic patterns of multiplexed ICS streams using Deterministic Finite Automata (DFA) for anomaly detection typically produces a very large DFA and a high false-alarm rate. In this article, we introduce a new modeling approach that addresses this gap. Our Statechart DFA modeling includes multiple DFAs, one per cyclic pattern, together with a DFA-selector that de-multiplexes the incoming traffic into sub-channels and sends them to their respective DFAs. We demonstrate how to automatically construct the statechart from a captured traffic stream. Our unsupervised learning algorithms first build a Discrete-Time Markov Chain (DTMC) from the stream. Next, we split the symbols into sets, one per multiplexed cycle, based on symbol frequencies and node degrees in the DTMC graph. Then, we create a sub-graph for each cycle and extract Euler cycles for each sub-graph. The final statechart is comprised of one DFA per Euler cycle. The algorithms allow for non-unique symbols, which appear in more than one cycle, and also for symbols that appear more than once in a cycle. We evaluated our solution on traces from a production ICS using the Siemens S7-0x72 protocol. We also stress-tested our algorithms on a collection of synthetically-generated traces that simulated multiplexed ICS traces with varying levels of symbol uniqueness and time overlap. The algorithms were able to split the symbols into sets with 99.6% accuracy. The resulting statechart modeled the traces with a median false-alarm rate of as low as 0.483%. In all but the most extreme scenarios, the Statechart model drastically reduced both the false-alarm rate and the learned model size in comparison with the naive single-DFA model.
The proliferation of digital devices in a networked industrial ecosystem, along with an exponential growth in complexity and scope, has resulted in elevated security concerns and management complexity issues. This paper describes a novel architecture utilizing concepts of autonomic computing and a simple object access protocol (SOAP)-based interface to metadata access points (IF-MAP) external communication layer to create a network security sensor. This approach simplifies integration of legacy software and supports a secure, scalable, and self-managed framework. The contribution of this paper is twofold: 1) A flexible two-level communication layer based on autonomic computing and service oriented architecture is detailed and 2) three complementary modules that dynamically reconfigure in response to a changing environment are presented. One module utilizes clustering and fuzzy logic to monitor traffic for abnormal behavior. Another module passively monitors network traffic and deploys deceptive virtual network hosts. These components of the sensor system were implemented in C++ and PERL and utilize a common internal D-Bus communication mechanism. A proof of concept prototype was deployed on a mixed-use test network showing the possible real-world applicability. In testing, 45 of the 46 network attached devices were recognized and 10 of the 12 emulated devices were created with specific operating system and port configurations. In addition, the anomaly detection algorithm achieved a 99.9% recognition rate. All output from the modules were correctly distributed using the common communication structure.
The proliferation of digital devices in a networked industrial ecosystem, along with an exponential growth in complexity and scope, has resulted in elevated security concerns and management complexity issues. This paper describes a novel architecture utilizing concepts of autonomic computing and a simple object access protocol (SOAP)-based interface to metadata access points (IF-MAP) external communication layer to create a network security sensor. This approach simplifies integration of legacy software and supports a secure, scalable, and self-managed framework. The contribution of this paper is twofold: 1) A flexible two-level communication layer based on autonomic computing and service oriented architecture is detailed and 2) three complementary modules that dynamically reconfigure in response to a changing environment are presented. One module utilizes clustering and fuzzy logic to monitor traffic for abnormal behavior. Another module passively monitors network traffic and deploys deceptive virtual network hosts. These components of the sensor system were implemented in C++ and PERL and utilize a common internal D-Bus communication mechanism. A proof of concept prototype was deployed on a mixed-use test network showing the possible real-world applicability. In testing, 45 of the 46 network attached devices were recognized and 10 of the 12 emulated devices were created with specific operating system and port configurations. In addition, the anomaly detection algorithm achieved a 99.9% recognition rate. All output from the modules were correctly distributed using the common communication structure.
With the increasing number of catastrophic weather events and resulting disruption in the energy supply to essential loads, the distribution grid operators’ focus has shifted from reliability to resiliency against high impact, low-frequency events. Given the enhanced automation to enable the smarter grid, there are several assets/resources at the disposal of electric utilities to enhances resiliency. However, with a lack of comprehensive resilience tools for informed operational decisions and planning, utilities face a challenge in investing and prioritizing operational control actions for resiliency. The distribution system resilience is also highly dependent on system attributes, including network, control, generating resources, location of loads and resources, as well as the progression of an extreme event. In this work, we present a novel multi-stage resilience measure called the Anticipate-Withstand-Recover (AWR) metrics. The AWR metrics are based on integrating relevant ‘system characteristics based factors’, before, during, and after the extreme event. The developed methodology utilizes a pragmatic and flexible approach by adopting concepts from the national emergency preparedness paradigm, proactive and reactive controls of grid assets, graph theory with system and component constraints, and multi-criteria decision-making process. The proposed metrics are applied to provide decision support for a) the operational resilience and b) planning investments, and validated for a real system in Alaska during the entirety of the event progression.
The malware detection arms race involves constant change: malware changes to evade detection and labels change as detection mechanisms react. Recognizing that malware changes over time, prior work has enforced temporally consistent samples by requiring that training binaries predate evaluation binaries. We present temporally consistent labels, requiring that training labels also predate evaluation binaries since training labels collected after evaluation binaries constitute label knowledge from the future. Using a dataset containing 1.1 million binaries from over 2.5 years, we show that enforcing temporal label consistency decreases detection from 91% to 72% at a 0.5% false positive rate compared to temporal samples alone.
The impact of temporal labeling demonstrates the potential of improved labels to increase detection results. Hence, we present a detector capable of selecting binaries for submission to an expert labeler for review. At a 0.5% false positive rate, our detector achieves a 72% true positive rate without an expert, which increases to 77% and 89% with 10 and 80 expert queries daily, respectively. Additionally, we detect 42% of malicious binaries initially undetected by all 32 antivirus vendors from VirusTotal used in our evaluation. For evaluation at scale, we simulate the human expert labeler and show that our approach is robust against expert labeling errors. Our novel contributions include a scalable malware detector integrating manual review with machine learning and the examination of temporal label consistency
The exponential growth of surveillance videos presents an unprecedented challenge for high-efficiency surveillance video coding technology. Compared with the existing coding standards that were basically developed for generic videos, surveillance video coding should be designed to make the best use of the special characteristics of surveillance videos (e.g., relative static background). To do so, this paper first conducts two analyses on how to improve the background and foreground prediction efficiencies in surveillance video coding. Following the analysis results, we propose a background-modeling-based adaptive prediction (BMAP) method. In this method, all blocks to be encoded are firstly classified into three categories. Then, according to the category of each block, two novel inter predictions are selectively utilized, namely, the background reference prediction (BRP) that uses the background modeled from the original input frames as the long-term reference and the background difference prediction (BDP) that predicts the current data in the background difference domain. For background blocks, the BRP can effectively improve the prediction efficiency using the higher quality background as the reference; whereas for foreground-background-hybrid blocks, the BDP can provide a better reference after subtracting its background pixels. Experimental results show that the BMAP can achieve at least twice the compression ratio on surveillance videos as AVC (MPEG-4 Advanced Video Coding) high profile, yet with a slightly additional encoding complexity. Moreover, for the foreground coding performance, which is crucial to the subjective quality of moving objects in surveillance videos, BMAP also obtains remarkable gains over several state-of-the-art methods.
The Internet is vulnerable to bandwidth distributed denial-of-service (BW-DDoS) attacks, wherein many hosts send a huge number of packets to cause congestion and disrupt legitimate traffic. So far, BW-DDoS attacks have employed relatively crude, inefficient, brute force mechanisms; future attacks might be significantly more effective and harmful. To meet the increasing threats, we must deploy more advanced defenses.
A key characteristic of simultaneous fault diagnosis is that the features extracted from the original patterns are strongly dependent. This paper proposes a new model of Bayesian classifier, which removes the fundamental assumption of naive Bayesian, i.e., the independence among features. In our model, the optimal bandwidth selection is applied to estimate the class-conditional probability density function (p.d.f.), which is the essential part of joint p.d.f. estimation. Three well-known indices, i.e., classification accuracy, area under ROC curve, and probability mean square error, are used to measure the performance of our model in simultaneous fault diagnosis. Simulations show that our model is significantly superior to the traditional ones when the dependence exists among features.
Despite decades of effort to combat spam, unwanted and even malicious emails, such as phish which aim to deceive recipients into disclosing sensitive information, still routinely find their way into one’s mailbox. To be sure, email filters manage to stop a large fraction of spam emails from ever reaching users, but spammers and phishers have mastered the art of filter evasion, or manipulating the content of email messages to avoid being filtered. We present a unique behavioral experiment designed to study email filter evasion. Our experiment is framed in somewhat broader terms: given the widespread use of machine learning methods for distinguishing spam and non-spam, we investigate how human subjects manipulate a spam template to evade a classification-based filter. We find that adding a small amount of noise to a filter significantly reduces the ability of subjects to evade it, observing that noise does not merely have a short-term impact, but also degrades evasion performance in the longer term. Moreover, we find that greater coverage of an email template by the classifier (filter) features significantly increases the difficulty of evading it. This observation suggests that aggressive feature reduction—a common practice in applied machine learning—can actually facilitate evasion. In addition to the descriptive analysis of behavior, we develop a synthetic model of human evasion behavior which closely matches observed behavior and effectively replicates experimental findings in simulation.
The delay-tolerant-network (DTN) model is becoming a viable communication alternative to the traditional infrastructural model for modern mobile consumer electronics equipped with short-range communication technologies such as Bluetooth, NFC, and Wi-Fi Direct. Proximity malware is a class of malware that exploits the opportunistic contacts and distributed nature of DTNs for propagation. Behavioral characterization of malware is an effective alternative to pattern matching in detecting malware, especially when dealing with polymorphic or obfuscated malware. In this paper, we first propose a general behavioral characterization of proximity malware which based on naive Bayesian model, which has been successfully applied in non-DTN settings such as filtering email spams and detecting botnets. We identify two unique challenges for extending Bayesian malware detection to DTNs ("insufficient evidence versus evidence collection risk" and "filtering false evidence sequentially and distributedly"), and propose a simple yet effective method, look ahead, to address the challenges. Furthermore, we propose two extensions to look ahead, dogmatic filtering, and adaptive look ahead, to address the challenge of "malicious nodes sharing false evidence." Real mobile network traces are used to verify the effectiveness of the proposed methods.
Computing systems and networks become increasingly large and complex with a variety of compromises and vulnerabilities. The network security and privacy are of great concern today, where self-defense against different kinds of attacks in an autonomous and holistic manner is a challenging topic. To address this problem, we developed an innovative technology called Bionic Autonomic Nervous System (BANS). The BANS is analogous to biological nervous system, which consists of basic modules like cyber axon, cyber neuron, peripheral nerve and central nerve. We also presented an innovative self-defense mechanism which utilizes the Fuzzy Logic, Neural Networks, and Entropy Awareness, etc. Equipped with the BANS, computer and network systems can intelligently self-defend against both known and unknown compromises/attacks including denial of services (DoS), spyware, malware, and virus. BANS also enabled multiple computers to collaboratively fight against some distributed intelligent attacks like DDoS. We have implemented the BANS in practice. Some case studies and experimental results exhibited the effectiveness and efficiency of the BANS and the self-defense mechanism.
We present a technique for bounded invariant verification of nonlinear networked dynamical systems with delayed interconnections. The underlying problem in precise boundedtime verification lies with computing bounds on the sensitivity of trajectories (or solutions) to changes in initial states and inputs of the system. For large networks, computing this sensitivity
with precision guarantees is challenging. We introduce the notion of input-to-state (IS) discrepancy of each module or subsystem in a larger nonlinear networked dynamical system. The IS discrepancy bounds the distance between two solutions or trajectories of a module in terms of their initial states and their inputs. Given the IS discrepancy functions of the modules, we show that it is possible to effectively construct a reduced (low dimensional) time-delayed dynamical system, such that the trajectory of this reduced model precisely bounds the distance between the trajectories of the complete network with changed initial states. Using the above results we develop a sound and relatively complete algorithm for bounded invariant verification of networked dynamical systems consisting of nonlinear modules interacting through possibly delayed signals. Finally, we introduce a local version of IS discrepancy and show that it is possible to compute them using only the Lipschitz constant and the Jacobian of the dynamic function of the modules.