Bibliography
Volume anomalies such as distributed denial-of-service (DDoS) attacks have been around for ages, but with advances in technology they have become stronger, shorter, and a weapon of choice for attackers. Digital forensic analysis of intrusions using alerts generated by existing intrusion detection systems (IDS) faces major challenges, especially for IDS deployed in large networks. In this paper, the concept of automatically sifting through a huge volume of alerts to distinguish the different stages of a DDoS attack is developed. The proposed novel framework is purpose-built to analyze multiple logs from the network for proactive forecasting and timely detection of DDoS attacks, through a combined approach of the Shannon-entropy concept and a clustering algorithm over relevant feature variables. Experimental studies on a cyber-range simulation dataset from the project's industrial partners show that the technique is able to distinguish precursor alerts for DDoS attacks, as well as the attack itself, with a false positive rate (FPR) of 22.5%. Application of this technique greatly assists security experts in network analysis to combat DDoS attacks.
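The abstract leaves the implementation unspecified; the following minimal Python sketch illustrates the general recipe of window-based Shannon entropy over alert fields combined with clustering. The alert fields, window length, and cluster count are illustrative assumptions, not the paper's settings.

```python
import math
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans

def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical alert log: (timestamp_s, source_ip, destination_port) tuples.
alerts = [(t, f"10.0.0.{t % 7}", 80 if t % 3 else 443) for t in range(600)]

window = 60  # seconds per analysis window (assumed)
features = []
for start in range(0, 600, window):
    chunk = [a for a in alerts if start <= a[0] < start + window]
    features.append([shannon_entropy([a[1] for a in chunk]),   # source-IP entropy
                     shannon_entropy([a[2] for a in chunk])])  # dest-port entropy

# Cluster the windows; windows whose entropy profile deviates from the bulk
# can be surfaced as precursor or attack stages for analyst review.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(np.array(features))
print(labels)
```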
In this paper, the principle of the kernel extreme learning machine (ELM) is analyzed. Building on that analysis, we introduce a multi-scale wavelet kernel extreme learning machine classifier and apply it to electroencephalographic (EEG) signal feature classification. Experiments show that our classifier achieves excellent performance.
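As a concrete illustration, the sketch below trains a closed-form kernel ELM with a multi-scale Morlet-style wavelet kernel, a common choice in the wavelet-kernel literature; the paper's exact kernel, scales, and regularization are not given in the abstract and are assumed here.

```python
import numpy as np

def wavelet_kernel(X, Y, scales=(1.0, 2.0, 4.0)):
    """Multi-scale Morlet-style wavelet kernel (one common choice; the paper's
    exact kernel may differ), summed over the given dilation scales."""
    K = np.zeros((len(X), len(Y)))
    for a in scales:
        D = X[:, None, :] - Y[None, :, :]               # pairwise differences
        K += np.prod(np.cos(1.75 * D / a) * np.exp(-D**2 / (2 * a**2)), axis=2)
    return K

def kelm_train(X, T, C=100.0):
    """Kernel ELM: closed-form output weights beta = (I/C + K)^-1 T."""
    K = wavelet_kernel(X, X)
    return np.linalg.solve(np.eye(len(X)) / C + K, T)

def kelm_predict(X_train, beta, X_test):
    return wavelet_kernel(X_test, X_train) @ beta

# Toy two-class data standing in for extracted EEG features.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = (X[:, 0] > 0).astype(int)
T = np.eye(2)[y]                                        # one-hot targets
beta = kelm_train(X, T)
pred = kelm_predict(X, beta, X).argmax(axis=1)
print((pred == y).mean())                               # training accuracy
```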
Language vector space models (VSMs) have recently proven to be effective across a variety of tasks. In VSMs, each word in a corpus is represented as a real-valued vector. These vectors can be used as features in many applications in machine learning and natural language processing. In this paper, we study the effect of vector space representations in cyber security. In particular, we consider a passive traffic analysis attack (website fingerprinting) that threatens users' navigation privacy on the web. By using anonymous communication, Internet users (such as online activists) may wish to hide the destinations of the web pages they access for various reasons, such as evading repressive governments. Traditional website fingerprinting studies collect packets from the users' network and extract features that are used by machine learning techniques to reveal the destinations of certain web pages. In this work, we propose the packet-to-vector (P2V) approach, in which we model the website fingerprinting attack using word vector representations. We show that the proposed model outperforms previous website fingerprinting work.
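A minimal sketch of the P2V idea using gensim's word2vec: each network trace is treated as a "sentence" whose "words" are packet tokens. The token scheme (signed packet sizes) and all hyperparameters are assumptions for illustration, not the paper's settings.

```python
import numpy as np
from gensim.models import Word2Vec

# Hypothetical traces: lists of packet tokens (direction sign x length) per page load.
traces = [
    ["+1500", "-60", "+1500", "-60", "+400"],
    ["+1500", "-60", "+1500", "-52", "+400"],
    ["-60", "+200", "-1500", "+52", "-1500"],
]

model = Word2Vec(sentences=traces, vector_size=32, window=3,
                 min_count=1, sg=1, epochs=50, seed=0)

# Represent a whole trace as the mean of its packet vectors; such trace
# embeddings can then feed a standard classifier for fingerprinting.
def trace_vector(trace):
    return np.mean([model.wv[tok] for tok in trace], axis=0)

print(trace_vector(traces[0])[:5])
```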
The increased use of Android devices and their open-source development framework has attracted many digital crime groups, which use Android devices as one of their key attack surfaces. Due to their extensive connectivity and multiple sources of network connections, Android devices are particularly susceptible to botnet-based malware attacks. This research focuses on developing a cloud-based Android botnet malware detection system. A prototype of the proposed system, which provides runtime Android malware analysis, has been deployed. The paper explains the architectural implementation of the developed system using a botnet detection learning dataset and the multi-layered algorithm used to predict the botnet family of a particular application.
Today's systems produce a rapidly exploding amount of data, and that data further derives more data, forming a complex data propagation network that we call the data's lineage. There are many reasons that users want systems to forget certain data, including its lineage. From a privacy perspective, users who become concerned with the new privacy risks of a system often want the system to forget their data and its lineage. From a security perspective, if an attacker pollutes an anomaly detector by injecting manually crafted data into the training data set, the detector must forget the injected data to regain security. From a usability perspective, a user can remove noise and incorrect entries so that a recommendation engine gives useful recommendations. Therefore, we envision forgetting systems, capable of forgetting certain data and their lineages completely and quickly. This paper focuses on making learning systems forget, a process we call machine unlearning, or simply unlearning. We present a general, efficient unlearning approach that transforms the learning algorithms used by a system into a summation form. To forget a training data sample, our approach simply updates a small number of summations, which is asymptotically faster than retraining from scratch. Our approach is general because the summation form derives from statistical query learning, in which many machine learning algorithms can be implemented. Our approach also applies to all stages of machine learning, including feature selection and modeling. Our evaluation, on four diverse learning systems and real-world workloads, shows that our approach is general, effective, fast, and easy to use.
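The summation-form idea can be illustrated with naive Bayes, whose sufficient statistics are simple counts: forgetting a sample is a constant-time decrement of a few summations rather than a full retrain. This is a conceptual sketch, not the authors' implementation.

```python
from collections import defaultdict

class UnlearnableNB:
    """Naive Bayes kept in summation form so single samples can be forgotten."""

    def __init__(self):
        self.class_counts = defaultdict(int)   # per-class sample sums
        self.feat_counts = defaultdict(int)    # (class, feature_idx, value) sums
        self.n = 0

    def learn(self, x, y, sign=+1):
        self.n += sign
        self.class_counts[y] += sign
        for i, v in enumerate(x):
            self.feat_counts[(y, i, v)] += sign

    def unlearn(self, x, y):                   # forget one sample in O(len(x))
        self.learn(x, y, sign=-1)

    def predict(self, x):
        def score(y):
            p = self.class_counts[y] / self.n
            for i, v in enumerate(x):          # Laplace-smoothed likelihoods
                p *= (self.feat_counts[(y, i, v)] + 1) / (self.class_counts[y] + 2)
            return p
        return max(self.class_counts, key=score)

nb = UnlearnableNB()
nb.learn((1, 0), "benign"); nb.learn((1, 1), "benign"); nb.learn((0, 1), "attack")
nb.unlearn((1, 1), "benign")                   # forgetting = decrementing counts
print(nb.predict((1, 0)))
```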
Spatial-multiplexing cameras have emerged as a promising alternative to classical imaging devices, often enabling acquisition of 'more for less'. One popular architecture for spatial multiplexing is the single-pixel camera (SPC), which acquires coded measurements of the scene with pseudo-random spatial masks. Significant theoretical developments over the past few years provide a means for reconstructing the original imagery from coded measurements at sub-Nyquist sampling rates. Yet, accurate reconstruction generally requires high measurement rates and high signal-to-noise ratios. In this paper, we ask whether one can perform high-level visual inference tasks (e.g., face recognition or action recognition) from compressive cameras without the need for image reconstruction. This is an interesting question since in many practical scenarios our goals extend beyond image reconstruction. However, most inference tasks require non-linear features, and it is not clear how to extract such features directly from compressed measurements. In this paper, we show that one can extract nontrivial correlational features directly, without reconstruction of the imagery. As a specific example, we consider the problem of face recognition beyond the visible spectrum, e.g., in the short-wave infrared (SWIR) region, where pixels are expensive. We base our framework on smashed filters, which suggest that inner products between high-dimensional signals can be computed in the compressive domain to a high degree of accuracy. We collect a new face image dataset of 30 subjects, obtained using an SPC. Using face recognition as an example, we show that one can indeed perform reconstruction-free inference with a very small loss of accuracy at very high compression ratios of 100 and more.
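The property that smashed filters rely on can be checked numerically: for a pseudo-random measurement matrix with appropriately scaled entries, inner products are approximately preserved in the compressive domain (a Johnson-Lindenstrauss-style argument). A small sketch, with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10_000, 500                      # ambient dimension, number of measurements
x, y = rng.normal(size=n), rng.normal(size=n)
x /= np.linalg.norm(x)
y /= np.linalg.norm(y)

# Pseudo-random +/-1 masks, scaled so inner products are preserved in expectation.
Phi = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)

print("true    <x, y>:", x @ y)
print("smashed <Phi x, Phi y>:", (Phi @ x) @ (Phi @ y))
```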
A robust appearance model is usually required in visual tracking to handle pose variation, illumination variation, occlusion, and the many other interferences occurring in video. To date, a number of tracking algorithms use image samples from previous frames to update their appearance models. That approach has several limitations: 1) at the beginning of tracking, there is not yet a sufficient amount of data for online updates, because these adaptive models are data-dependent; and 2) in many challenging situations, robustly updating the appearance model is difficult, which often results in drift. In this paper, we propose a tracking algorithm based on compressive sensing theory and the particle filter framework. Features are extracted by random projection with a data-independent basis. A particle filter is employed to estimate the target location more accurately and to make full use of the updated classifier. The robustness and effectiveness of our tracker are demonstrated in several experiments.
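A minimal sketch of the two ingredients named above, under assumed details: data-independent sparse random-projection features, and one particle-filter step that weights candidate locations by feature similarity to a template (a stand-in for the paper's classifier).

```python
import numpy as np

rng = np.random.default_rng(0)

# Sparse, data-independent measurement matrix (entries in {-1, 0, +1}).
R = rng.choice([-1.0, 0.0, 1.0], p=[1 / 6, 2 / 3, 1 / 6], size=(50, 32 * 32))

def features(patch):                       # patch: 32x32 grayscale window
    return R @ patch.ravel()

def likelihood(patch, template):           # appearance score (classifier stand-in)
    f, t = features(patch), features(template)
    return np.exp(-np.linalg.norm(f - t) / (np.linalg.norm(t) + 1e-9))

# One particle-filter step: propagate particles, weight by appearance.
frame = rng.random((240, 320))
template = frame[100:132, 150:182].copy()
particles = (np.array([100, 150]) + rng.normal(0, 4, size=(100, 2))).astype(int)
particles[:, 0] = np.clip(particles[:, 0], 0, 240 - 32)   # keep windows in frame
particles[:, 1] = np.clip(particles[:, 1], 0, 320 - 32)
weights = np.array([likelihood(frame[y:y + 32, x:x + 32], template)
                    for y, x in particles])
estimate = (particles * (weights / weights.sum())[:, None]).sum(axis=0)
print(estimate)   # weighted-mean target location (row, col)
```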
More and more advanced persistent threat (APT) attacks have occurred since 2009. Such attacks usually rely on one or more zero-day exploits to achieve their goals. In most cases, the target computer executes a malicious program after the user opens an infected compound document. Conventional detection methods become ineffective when attackers use a zero-day exploit to construct these compound documents. Inspired by detection methods based on structural entropy, we apply wavelet analysis to a malicious-document detection system. In our research, we use wavelet analysis to extract features from the raw data; these features are then used to detect whether a compound document has malicious code embedded in it.
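A minimal sketch of the feature pipeline suggested above, assuming PyWavelets and illustrative parameters: compute a Shannon-entropy series over sliding byte windows of the document, then take wavelet-decomposition coefficients of that series as classifier features. The file path, window size, and wavelet choice are assumptions.

```python
import math
from collections import Counter

import numpy as np
import pywt  # PyWavelets

def entropy_series(data: bytes, window: int = 256):
    """Shannon entropy (bits) of each non-overlapping byte window."""
    out = []
    for i in range(0, len(data) - window + 1, window):
        counts = Counter(data[i:i + window])
        out.append(-sum(c / window * math.log2(c / window)
                        for c in counts.values()))
    return np.array(out)

doc = open("sample.doc", "rb").read()        # hypothetical compound document
series = entropy_series(doc)
coeffs = pywt.wavedec(series, "haar", level=3)
features = np.concatenate([np.abs(c) for c in coeffs])  # feed to a classifier
print(features.shape)
```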
In this paper, we present the design, architecture, and implementation of a novel analysis engine, called Feature Collection and Correlation Engine (FCCE), that finds correlations across a diverse set of data types spanning over large time windows with very small latency and with minimal access to raw data. FCCE scales well to collecting, extracting, and querying features from geographically distributed large data sets. FCCE has been deployed in a large production network with over 450,000 workstations for 3 years, ingesting more than 2 billion events per day and providing low latency query responses for various analytics. We explore two security analytics use cases to demonstrate how we utilize the deployment of FCCE on large diverse data sets in the cyber security domain: 1) detecting fluxing domain names of potential botnet activity and identifying all the devices in the production network querying these names, and 2) detecting advanced persistent threat infection. Both evaluation results and our experience with real-world applications show that FCCE yields superior performance over existing approaches, and excels in the challenging cyber security domain by correlating multiple features and deriving security intelligence.
Among the cyber attacks that have occurred, the most drastic are advanced persistent threats (APTs). APTs differ from other attacks in that they have multiple phases, often remain silent for long periods of time, and are launched by persistent, well-funded adversaries. These targeted attacks mainly concentrate on government agencies and organizations in industry, particularly those involved in international trade or holding sensitive data. APTs can evade detection by antivirus solutions, intrusion detection and intrusion prevention systems, and firewalls. In this paper we propose a classification model for the detection of APTs that achieves 99.8% accuracy.
An intelligent recovery evaluation system is presented for objective assessment and performance monitoring of anterior cruciate ligament reconstructed (ACL-R) subjects. The system acquires 3-D kinematics of the tibiofemoral joint and electromyography (EMG) data from surrounding muscles during various ambulatory and balance-testing activities through wireless body-mounted inertial and EMG sensors, respectively. An integrated feature set is generated from the different features extracted from the data collected for each activity. Fuzzy clustering and adaptive neuro-fuzzy inference techniques are applied to these integrated feature sets in order to provide different recovery progress assessment indicators (e.g., current stage of recovery, percentage of recovery progress compared to a healthy group) for ACL-R subjects. The system was trained and tested on data collected from a group of healthy and ACL-R subjects. For recovery stage identification, the average testing accuracy of the system was above 95% (95-99%) for ambulatory activities and above 80% (80-84%) for balance-testing activities. The overall recovery evaluation performed by the proposed system was consistent with the assessment made by physiotherapists using standard subjective/objective scores. The validated system can potentially be used as a decision support tool by physiatrists, physiotherapists, and clinicians for quantitative rehabilitation analysis of ACL-R subjects in conjunction with existing recovery monitoring systems.
The dynamic nature of Web 2.0 and the heavy obfuscation of web-based attacks complicate the job of traditional protection systems such as firewalls, anti-virus solutions, and IDS systems. It has been witnessed that, using ready-made toolkits, cyber-criminals can launch sophisticated attacks such as cross-site scripting (XSS), cross-site request forgery (CSRF), and botnets, to name a few. In recent years, cyber-criminals have targeted legitimate websites and social networks to inject malicious scripts that compromise the security of the visitors of such websites. This involves performing actions using the victim's browser without his or her permission. This poses the need to develop effective mechanisms for protecting against Web 2.0 attacks that mainly target the end-user. In this paper, we address the above challenges from an information flow control perspective by developing a framework that restricts the flow of information on the client side to legitimate channels. The proposed model tracks sensitive information flow and prevents information leakage from happening. When applied in the context of client-side web-based attacks, the proposed model is expected to provide a more secure browsing environment for the end-user.
The numbers of vulnerabilities and reported attacks on Web systems show increasing trends, which clearly illustrates the need for a better understanding of malicious cyber activities. In this paper we use clustering to classify attacker activities aimed at Web systems. The empirical analysis is based on four datasets, each spanning several months, collected by high-interaction honeypots. The results show that behavioral clustering analysis can be used to distinguish between attack sessions and vulnerability scan sessions, though the performance heavily depends on the dataset. Furthermore, the results show that attacks differ from vulnerability scans in a small number of features (i.e., session characteristics). Specifically, for each dataset, the best feature selection method (in terms of high probability of detection and low probability of false alarm) selects only three features and results in three to four clusters, significantly improving the performance of clustering compared to the case when all features are used. The best subset of features and the extent of the improvement, however, also depend on the dataset.
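As an illustration of the session-clustering setup, the sketch below clusters sessions described by three hypothetical features with k-means; the abstract does not name the paper's selected features, so the ones here are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical session features:
# [num_requests, mean_inter-request_time_s, fraction_of_404_responses]
sessions = np.array([
    [400, 0.05, 0.80],   # scan-like: fast, broad, many 404s
    [350, 0.07, 0.75],
    [12, 4.00, 0.05],    # attack-like: slow, targeted
    [9, 5.50, 0.00],
])

X = StandardScaler().fit_transform(sessions)          # put features on one scale
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)                                         # session-to-cluster assignment
```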
Robust image hashing seeks to transform a given input image into a shorter hashed version using a key-dependent non-invertible transform. These hashes find extensive applications in content authentication, image indexing for database search, and watermarking. Modern robust hashing algorithms consist of feature extraction, a randomization stage to introduce non-invertibility, followed by quantization and binary encoding to produce a binary hash. This paper describes a novel algorithm for generating an image hash based on Log-Polar transform features. The Log-Polar transform is part of the Fourier-Mellin transformation, often used in image recognition and registration due to its invariance to geometric operations. First, we show that the proposed perceptual hash is resistant to content-preserving operations such as compression, noise addition, moderate geometric transformations, and filtering. Second, we illustrate the discriminative capability of our hash in rapidly distinguishing between two perceptually different images. Third, we study the security of our method for image authentication purposes. Finally, we show that the proposed hashing method provides both excellent security and robustness.
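A minimal sketch of a Log-Polar/Fourier-Mellin-style hash, assuming scikit-image's warp_polar for the log-polar mapping: the Fourier magnitude is translation invariant, the log-polar mapping turns rotation and scaling into shifts, and a keyed random projection followed by binarization yields a key-dependent binary hash. All parameters are illustrative, not the paper's.

```python
import numpy as np
from skimage import data
from skimage.transform import warp_polar

img = data.camera().astype(float)

# Translation-invariant spectrum, then log-polar mapping of its magnitude.
mag = np.abs(np.fft.fftshift(np.fft.fft2(img)))
lp = warp_polar(mag, scaling="log", output_shape=(64, 64))

# Keyed pseudo-random projection (the randomization stage), then binarize.
key = 12345                                            # secret hash key
rng = np.random.default_rng(key)
proj = rng.normal(size=(64, lp.size)) @ lp.ravel()
bits = (proj > np.median(proj)).astype(int)            # 64-bit binary hash
print("".join(map(str, bits)))
```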
Although current Internet operations generate voluminous data, they remain largely oblivious of traffic data semantics. This poses many inefficiencies and challenges due to emergent or anomalous behavior impacting the vast array of Internet elements such as services and protocols. In this paper, we propose a Data Semantics Management System (DSMS) for learning Internet traffic data semantics to enable smarter semantics-driven networking operations. We extract networking semantics and build and utilize a dynamic ontology of network concepts to better recognize and act upon emergent or abnormal behavior. Our DSMS utilizes: (1) the Latent Dirichlet Allocation (LDA) algorithm for latent feature extraction and semantics reasoning; (2) big tables as a cloud-like data storage technique to maintain large-scale data; and (3) the Locality Sensitive Hashing (LSH) algorithm for reducing data dimensionality. Our preliminary evaluation using real Internet traffic shows the efficacy of DSMS for learning the behavior of normal and abnormal traffic data and for accurately detecting anomalies at low cost.
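A minimal sketch of two of the named building blocks under assumed parameters: LDA over traffic records rendered as token "documents", and random-hyperplane LSH to compress the resulting topic vectors into short binary sketches.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical traffic "documents": tokens derived from flow attributes.
docs = ["tcp port80 syn syn ack", "udp port53 query response",
        "tcp port80 syn syn syn syn", "udp port53 query query response"]

counts = CountVectorizer().fit_transform(docs)
topics = LatentDirichletAllocation(n_components=2,
                                   random_state=0).fit_transform(counts)

# Random-hyperplane LSH: the sign pattern of projections is a compact sketch;
# similar topic vectors tend to share sketch bits.
rng = np.random.default_rng(0)
planes = rng.normal(size=(8, topics.shape[1]))
sketches = (topics @ planes.T > 0).astype(int)
print(sketches)
```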
Due to the outsourcing of design and fabrication to foundries, the problem of malicious modifications to integrated circuits, known as hardware Trojans, has attracted attention in academia as well as industry. To reduce the risks associated with Trojans, researchers have proposed different approaches to detect them. Among these, test-time detection approaches have drawn the greatest attention, and most assume the existence of a “golden model”. Prior works suggest using reverse-engineering to identify Trojan-free ICs for the golden model, but they do not state how to do this efficiently. In this paper, we propose an innovative and robust reverse-engineering approach to identify Trojan-free ICs. We adapt a well-studied machine learning method, the one-class support vector machine, to solve our problem. Simulation results using state-of-the-art tools on several publicly available circuits show that our approach can detect hardware Trojans with a high accuracy rate across different modeling and algorithm parameters.
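The one-class SVM step can be sketched directly with scikit-learn: fit on feature vectors from circuits presumed clean, then flag outliers. The features and parameters below are illustrative stand-ins for the paper's reverse-engineered circuit features.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
clean = rng.normal(0, 1, size=(200, 5))        # e.g., extracted layout features
trojan = rng.normal(4, 1, size=(5, 5))         # anomalous, modified circuits

clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(clean)
print(clf.predict(trojan))                     # -1 -> flagged as outlier
print(clf.predict(clean[:5]))                  # +1 -> consistent with training set
```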
Botnets are one of the most destructive threats to cyber security. Recently, the HTTP protocol has frequently been utilized by botnets as the command and control (C&C) protocol. In this work, we aim to detect HTTP-based botnet activity through botnet behaviour analysis using a machine learning approach. To achieve this, we employ flow-based network traffic captured with NetFlow (via Softflowd). The proposed botnet analysis system is implemented with two different machine learning algorithms, C4.5 and Naive Bayes. Our results show that the C4.5-based classifier achieves very promising performance in detecting HTTP-based botnet activity.
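A minimal sketch of the two classifiers on flow features: scikit-learn has no C4.5 implementation, so an entropy-criterion decision tree stands in for it alongside Gaussian Naive Bayes. The flow features and toy labels are assumptions for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical per-flow features: [duration_s, packets, bytes, distinct_dst_ports]
X = np.array([
    [0.2, 6, 900, 1], [0.3, 8, 1100, 1],       # beacon-like C&C flows (label 1)
    [12.0, 300, 4e5, 3], [30.0, 900, 2e6, 5],  # normal browsing flows (label 0)
])
y = np.array([1, 1, 0, 0])

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
nb = GaussianNB().fit(X, y)
flow = [[0.25, 7, 1000, 1]]
print(tree.predict(flow), nb.predict(flow))
```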
Botnet detection represents one of the most crucial prerequisites of successful botnet neutralization. This paper explores how accurate and timely detection can be achieved by using supervised machine learning as the tool for inferring malicious botnet traffic. To this end, the paper introduces a novel flow-based detection system that relies on supervised machine learning for identifying botnet network traffic. For use in the system we consider eight highly regarded machine learning algorithms and identify the best-performing one. Furthermore, the paper evaluates how much traffic needs to be observed per flow in order to capture the patterns of malicious traffic. The proposed system has been tested through a series of experiments using traffic traces originating from two well-known P2P botnets and diverse non-malicious applications. The results indicate that the system is able to detect botnet traffic accurately and in a timely manner using purely flow-based traffic analysis and supervised machine learning. Additionally, the results show that accurate detection requires monitoring traffic flows for only a limited time period and a limited number of packets per flow. This indicates a strong potential for using the proposed approach within a future on-line detection framework.
Dynamic firewalls with stateful inspection have added many security features over traditional stateless static filters, but dynamic firewalls also need to be adaptive. In this paper, we design a framework for dynamic firewalls based on a probabilistic ontology using Multi-Entity Bayesian Networks (MEBN) logic. MEBN extends ordinary Bayesian networks to allow representation of graphical models with repeated substructures, and can express a probability distribution over models of any consistent first-order theory. The motivation for our work is preventing novel attacks (i.e., attacks for which no signatures have yet been generated). The proposed framework has two main parts: the first is the data flow architecture, which extracts important connection-based features with the prime goal of explicit rule inclusion in the firewall's rule base; the second is the knowledge flow architecture, which uses a semantic threat graph as well as reasoning under uncertainty to provide a forward-looking threat prevention technique for dynamic firewalls.
A mobile ad hoc network (MANET) is a self-created and self-organized network of wireless mobile nodes. Due to the special characteristics of these networks, security is difficult to achieve, and applying current intrusion detection techniques developed for fixed networks is not sufficient for MANETs. In this paper, we propose an approach based on a genetic algorithm (GA) and an artificial immune system (AIS), called GAAIS, for dynamic intrusion detection in AODV-based MANETs. GAAIS is able to adapt itself to network topology changes using two updating methods: partial and total. Each normal feature vector extracted from network traffic is represented by a hypersphere with fixed radius. A set of spherical detectors is generated using the NicheMGA algorithm to cover the non-self space, and these spherical detectors are used to detect anomalies in network traffic. The performance of GAAIS is evaluated on several types of routing attacks simulated using the NS2 simulator, such as Flooding, Blackhole, Neighbor, Rushing, and Wormhole. Experimental results show that GAAIS is more efficient in comparison with similar approaches.
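The spherical-detector idea can be sketched with plain rejection sampling standing in for the NicheMGA detector generation: candidate detectors covering any "self" (normal) sample are discarded, and surviving detectors flag vectors that fall inside them. Dimensions, radius, and detector count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
self_set = rng.uniform(0.4, 0.6, size=(50, 3))   # normal traffic feature vectors
radius = 0.25                                    # fixed hypersphere radius (assumed)

# Rejection sampling: keep only detectors that do not cover the self space.
detectors = []
while len(detectors) < 100:
    d = rng.uniform(0, 1, size=3)
    if np.min(np.linalg.norm(self_set - d, axis=1)) > radius:
        detectors.append(d)
detectors = np.array(detectors)

def is_anomalous(x):
    """Flag x if it falls inside any spherical detector."""
    return bool(np.min(np.linalg.norm(detectors - x, axis=1)) <= radius)

print(is_anomalous(np.array([0.5, 0.5, 0.5])))   # inside self region -> False
print(is_anomalous(np.array([0.1, 0.9, 0.5])))   # far from self -> typically True
```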
Enhanced situational awareness is integral to risk management and response evaluation. Dynamic systems that incorporate both hard and soft data sources allow for comprehensive situational frameworks which can supplement physical models with conceptual notions of risk. The processing of widely available semi-structured textual data sources can produce soft information that is readily consumable by such a framework. In this paper, we augment the situational awareness capabilities of a recently proposed risk management framework (RMF) with the incorporation of soft data. We illustrate the beneficial role of the hard-soft data fusion in the characterization and evaluation of potential vessels in distress within Maritime Domain Awareness (MDA) scenarios. Risk features pertaining to maritime vessels are defined a priori and then quantified in real time using both hard (e.g., Automatic Identification System, Douglas Sea Scale) as well as soft (e.g., historical records of worldwide maritime incidents) data sources. A risk-aware metric to quantify the effectiveness of the hard-soft fusion process is also proposed. Though illustrated with MDA scenarios, the proposed hard-soft fusion methodology within the RMF can be readily applied to other domains.
Very high resolution satellite imagery used to be a rare commodity, with infrequent satellite pass-over times over a specific area of interest precluding many useful applications. Today, more and more such satellite systems are available, with visual analysis and interpretation of imagery still important for deriving relevant features and changes from satellite data. To allow efficient, robust, and routine image analysis for humanitarian purposes, semi-automated feature extraction is of increasing importance for operational emergency mapping tasks. Within the frame of the European Earth Observation programme COPERNICUS and related research activities under the European Union's Seventh Framework Programme, substantial scientific developments and mapping services are dedicated to satellite-based humanitarian mapping and monitoring. In this paper, recent results in methodological research and the development of routine services in satellite mapping for humanitarian situational awareness are reviewed and discussed. Ethical aspects of the sensitivity and security of humanitarian mapping are deliberated. Furthermore, methods for monitoring and analysis of camps for refugees and internally displaced persons in humanitarian settings are assessed. Advantages and limitations of object-based image analysis, sample-supervised segmentation, and feature extraction are presented and discussed.
With the rise of the underground Internet economy, automated malicious programs, popularly known as malware, have become a major threat to computers and information systems connected to the Internet. Properties such as self-healing, self-hiding, and the ability to deceive security devices make such software hard to detect and mitigate, so the detection and mitigation of malicious software is a major challenge for researchers and security personnel. Conventional systems for detecting and mitigating such threats are mostly signature-based. A major drawback of such systems is their inability to detect malware samples for which no signature is available in their signature database; such malware is known as zero-day malware. Moreover, more and more malware writers use obfuscation techniques such as polymorphism, metamorphism, packing, and encryption to avoid detection by antivirus software. The traditional signature-based detection system is therefore neither effective nor efficient for the detection of zero-day malware. Hence, to improve the effectiveness and efficiency of malware detection, we use a classification method based on structural information and behavioral specifications. In this paper we use both static and dynamic analysis approaches: in static analysis we extract the features of an executable file and then classify it; in dynamic analysis we take traces of executable files using NtTrace within a controlled environment. Experimental results indicate that our proposed algorithm is effective in extracting the malicious behavior of executables and can also be used to detect malware variants.
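As one possible static feature for the classification step, the sketch below uses byte-bigram frequencies of an executable, a common structural representation; the paper's actual feature set, the classifier, and the file paths here are assumptions.

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def byte_bigram_features(path):
    """Normalized byte-bigram histogram of a file (a 65536-dim static feature)."""
    data = open(path, "rb").read()
    vec = np.zeros(256 * 256)
    for (a, b), c in Counter(zip(data, data[1:])).items():
        vec[a * 256 + b] = c
    return vec / max(len(data) - 1, 1)

# Hypothetical labeled corpus; real work would use many samples of each class.
paths, labels = ["benign.exe", "malware.exe"], np.array([0, 1])
X = np.stack([byte_bigram_features(p) for p in paths])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict(X))
```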