Biblio
Relevance feedback can be considered a learning problem and has been used extensively to improve the performance of multimedia information retrieval. In this paper, after discussing relevance feedback in content-based image retrieval (CBIR), a hybrid learning scheme for multi-target retrieval (MTR) with relevance feedback is proposed. An object-level symbolic image database (SID) combining image metadata with a feature model is assumed to have been constructed. During interactive queries over remote sensing images, a similarity metric is computed to obtain the relevant image sets from the image library. To further improve retrieval precision, the parameters of the hybrid learning scheme also need to be chosen. The resulting hybrid learning scheme combines an expectation-maximization algorithm (EMA), used to retrieve the most relevant images from the SID, with a support vector machine (SVM) with relevance feedback, used to learn from the feedback information. Experimental results show that our hybrid learning scheme with relevance feedback for MTR improves performance and accuracy compared with the baseline algorithms.
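A minimal sketch of the two components named in the abstract, assuming synthetic feature vectors and feedback sets (the library, relevant and irrelevant arrays below are illustrative, not the paper's SID or its exact combination rule): a Gaussian mixture fitted by EM to the user's relevant feedback scores the library, and an SVM trained on the relevant/irrelevant labels refines the ranking.

    # Hedged sketch: EM-based re-ranking plus SVM relevance feedback (illustrative only).
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    library = rng.normal(size=(400, 8))        # stand-in for object-level SID feature vectors
    relevant = library[:12] + 2.0              # images the user marked relevant
    irrelevant = library[12:24] - 2.0          # images the user marked irrelevant

    # EM step: model the relevant set with a Gaussian mixture and score every library image.
    gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0).fit(relevant)
    em_scores = gmm.score_samples(library)

    # SVM step: learn the relevance labels from feedback and refine the ranking.
    X_fb = np.vstack([relevant, irrelevant])
    y_fb = np.r_[np.ones(len(relevant)), np.zeros(len(irrelevant))]
    svm = SVC(kernel="rbf", gamma="scale").fit(X_fb, y_fb)

    combined = em_scores + svm.decision_function(library)
    print("top-5 images returned to the user:", np.argsort(-combined)[:5])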
An adaptable agent-based IDS (AAIDS) inspired by the danger theory of artificial immune systems is proposed. The learning mechanism of AAIDS is designed by emulating how dendritic cells (DC) in the immune system detect and classify danger signals. The AG agent, DC agent and TC agent coordinate with one another and respond to system calls directly rather than analyzing network packets. Simulations show that AAIDS can identify several critical system-behavior scenarios in which packet analysis is impractical.
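A very rough sketch of the dendritic-cell-style classification idea, under the assumption that system-call events have already been tagged as danger or safe signals; the weights, event structure and labels below are illustrative, not the AAIDS design.

    # Illustrative dendritic-cell-style signal aggregation (weights and labels are assumptions).
    danger_weight, safe_weight = 1.0, -1.5

    def classify_context(system_call_events):
        # Each event contributes a danger signal (e.g. a failed privileged call) or a safe signal.
        score = sum(danger_weight if e["danger"] else safe_weight for e in system_call_events)
        return "mature (raise alert)" if score > 0 else "semi-mature (normal)"

    events = [{"danger": True}, {"danger": True}, {"danger": False}]
    print(classify_context(events))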
In this paper, we propose a deep learning framework for malware classification. The volume of malware has increased enormously in recent years, posing a serious security threat to financial institutions, businesses and individuals. To combat this proliferation, new strategies are essential to quickly identify and classify malware samples so that their behavior can be analyzed. Machine learning approaches are becoming popular for classifying malware; however, most existing methods rely on shallow learning algorithms (e.g. SVM). Recently, Convolutional Neural Networks (CNN), a deep learning approach, have shown superior performance compared to traditional learning algorithms, especially in tasks such as image classification. Motivated by this success, we propose a CNN-based architecture to classify malware samples. We convert malware binaries to grayscale images and subsequently train a CNN for classification. Experiments on two challenging malware classification datasets, Malimg and Microsoft malware, demonstrate that our method outperforms the state of the art, achieving 98.52% and 99.97% accuracy on the Malimg and Microsoft datasets respectively.
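A minimal PyTorch sketch of the binaries-to-grayscale-images idea: each byte becomes a pixel intensity and a small CNN classifies the resulting image. The image width, layer sizes and number of classes are illustrative assumptions, not the architecture reported in the paper.

    # Hedged sketch: malware bytes -> grayscale image -> small CNN (illustrative architecture).
    import numpy as np
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def bytes_to_image(raw: bytes, width: int = 64) -> np.ndarray:
        """Interpret each byte as a grayscale pixel and reshape into a width-column image."""
        arr = np.frombuffer(raw, dtype=np.uint8)
        rows = len(arr) // width
        return arr[: rows * width].reshape(rows, width).astype(np.float32) / 255.0

    class MalwareCNN(nn.Module):
        def __init__(self, num_classes: int):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
            self.pool = nn.AdaptiveAvgPool2d((8, 8))   # tolerates variable image heights
            self.fc = nn.Linear(32 * 8 * 8, num_classes)

        def forward(self, x):
            x = F.max_pool2d(F.relu(self.conv1(x)), 2)
            x = F.max_pool2d(F.relu(self.conv2(x)), 2)
            x = self.pool(x)
            return self.fc(x.flatten(1))

    # Toy usage: a fake 4 KB binary, one forward pass over a 25-way classifier (count is illustrative).
    img = bytes_to_image(bytes(np.random.randint(0, 256, 4096, dtype=np.uint8)))
    logits = MalwareCNN(num_classes=25)(torch.from_numpy(img)[None, None])
    print(logits.shape)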
Hacker forums and other social platforms may contain vital information about cyber security threats, but extracting relevant threat information from these sources through manual analysis is a time-consuming, error-prone process that requires a significant allocation of resources. In this paper, we explore the potential of machine learning methods to rapidly sift through hacker forums for relevant threat intelligence. Using text data from a real hacker forum, we compared the text classification performance of Convolutional Neural Network methods against more traditional machine learning approaches. We found that traditional machine learning methods, such as Support Vector Machines, can yield performance on par with Convolutional Neural Network algorithms.
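A small sketch of the traditional baseline compared in the abstract (TF-IDF features with a linear SVM); the toy posts and labels below are fabricated placeholders, not data from the hacker forum studied.

    # Hedged sketch: TF-IDF + linear SVM relevance classifier for forum posts (toy data).
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    posts = [
        "selling fresh dumps and fullz, PM for prices",
        "new 0day exploit for popular CMS, source included",
        "how do I configure my router firmware?",
        "looking for gaming laptop recommendations",
    ]
    labels = [1, 1, 0, 0]   # 1 = threat-relevant, 0 = not relevant

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit(posts, labels)
    print(clf.predict(["private botnet access for rent"]))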
To assure the cyber security of an enterprise, a SIEM (Security Information and Event Management) system is typically in place to normalize security events from different preventive technologies and flag alerts. Analysts in the security operation center (SOC) investigate the alerts to decide whether they are truly malicious. However, the number of alerts is generally overwhelming, with the majority being false positives, and exceeds the SOC's capacity to handle them all. Because of this, potential malicious attacks and compromised hosts may be missed. Machine learning is a viable approach to reduce the false positive rate and improve the productivity of SOC analysts. In this paper, we develop a user-centric machine learning framework for the cyber security operation center in a real enterprise environment. We discuss the typical data sources in a SOC, its workflow, and how to leverage and process these data sets to build an effective machine learning system. The paper is targeted towards two groups of readers. The first group is data scientists and machine learning researchers who do not have cyber security domain knowledge but want to build machine learning systems for security operations centers. The second group is cyber security practitioners who have deep knowledge and expertise in cyber security but no machine learning experience and wish to build such a system themselves. Throughout the paper, we use the system we built in the Symantec SOC production environment as an example to demonstrate the complete steps from data collection, label creation, feature engineering, machine learning algorithm selection, and model performance evaluation to risk score generation.
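A hedged sketch of the user-centric pipeline described above: aggregate alerts per user into features, train a supervised model on analyst-labelled users, and map predicted probabilities to risk scores. The column names, features and toy labels are assumptions for illustration, not the Symantec SOC schema or model.

    # Hedged sketch: per-user feature engineering -> classifier -> risk score (toy data).
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier

    alerts = pd.DataFrame({
        "user":      ["alice", "alice", "bob", "bob", "bob", "carol"],
        "severity":  [3, 5, 2, 2, 1, 4],
        "signature": ["c2_beacon", "malware", "port_scan", "port_scan", "policy", "malware"],
    })

    # Feature engineering: per-user alert volume, mean severity, distinct signatures.
    features = alerts.groupby("user").agg(
        n_alerts=("severity", "size"),
        mean_severity=("severity", "mean"),
        n_signatures=("signature", "nunique"),
    )

    labels = pd.Series({"alice": 1, "bob": 0, "carol": 1})   # toy analyst verdicts
    model = GradientBoostingClassifier().fit(features, labels.loc[features.index])
    features["risk_score"] = (model.predict_proba(features)[:, 1] * 100).round(1)
    print(features)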
Cloud computing has become prominent, offering seemingly unlimited storage and computation to users. Yet security remains a major issue that hampers the growth of the cloud. In this research we investigate a collaborative Intrusion Detection System (IDS) based on ensemble learning. It uses weak classifiers and allows untapped cloud resources to be used to detect various types of attacks on the cloud system. In the proposed system, tasks are distributed among available virtual machines (VM), and individual results are then merged for the final adaptation of the learning model. Performance evaluation is carried out using decision trees and fuzzy classifiers on KDD99, one of the largest datasets for IDS. The dataset is segmented in order to mimic the real-time data traffic that occurs in a real cloud environment. The experimental results show that the proposed approach reduces execution time with improved accuracy and is fault-tolerant when handling VM failures. The system is a proof-of-concept model for a scalable, cloud-based distributed system that is able to exploit untapped resources and may serve as a base model for a real-time hierarchical IDS.
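An illustrative sketch of the ensemble idea, assuming synthetic traffic in place of KDD99 and simple majority voting as the merge step: each "VM" trains a weak decision tree on its own segment of the data and the per-VM predictions are combined. The fuzzy classifiers and fault-tolerance logic of the paper are not reproduced here.

    # Hedged sketch: segment data across "VMs", train weak trees, merge by majority vote.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
    X_train, y_train, X_test, y_test = X[:2500], y[:2500], X[2500:], y[2500:]

    # Segment the training data as if it were distributed across virtual machines.
    n_vms = 5
    segments = zip(np.array_split(X_train, n_vms), np.array_split(y_train, n_vms))
    trees = [DecisionTreeClassifier(max_depth=5).fit(Xs, ys) for Xs, ys in segments]

    # Merge individual results: simple majority vote over the per-VM classifiers.
    votes = np.stack([t.predict(X_test) for t in trees])
    ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
    print("ensemble accuracy:", (ensemble_pred == y_test).mean())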
As more enterprises deploy cloud file-sharing services, a new channel is opened for potential insider threats to company data and intellectual property. In this paper, we introduce a two-stage machine learning system to detect such anomalies. In the first stage, we project the access logs of cloud file-sharing services onto relationship graphs and use three complementary graph-based unsupervised learning methods, OddBall, PageRank and Local Outlier Factor (LOF), to generate outlier indicators. In the second stage, we ensemble the outlier indicators, introduce the discrete wavelet transform (DWT), and propose a procedure that uses the wavelet coefficients of the Haar wavelet to identify outliers indicative of insider threat. The proposed system has been deployed in a real business environment and has demonstrated its effectiveness through selected case studies.
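A hedged sketch of the two stages, covering only one of the three first-stage indicators (LOF) and a one-level Haar transform in the second stage; the per-user features, daily score series and threshold are made-up illustrations, and OddBall/PageRank scoring is omitted.

    # Hedged sketch: LOF outlier indicator plus Haar detail coefficients over a score series.
    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    # Stage 1 (one of the three indicators): LOF over per-user access-graph features.
    user_features = np.random.default_rng(1).normal(size=(200, 4))
    lof_scores = -LocalOutlierFactor(n_neighbors=20).fit(user_features).negative_outlier_factor_
    print("stage-1 users with elevated LOF:", (lof_scores > 1.5).sum())

    # Stage 2: one-level Haar DWT of a user's daily ensemble score series (spike on days 6-7).
    daily_scores = np.array([1.0, 1.1, 0.9, 1.0, 1.0, 9.5, 9.8, 1.1])
    approx = (daily_scores[0::2] + daily_scores[1::2]) / np.sqrt(2)   # Haar approximation coefficients
    detail = (daily_scores[0::2] - daily_scores[1::2]) / np.sqrt(2)   # Haar detail coefficients
    flagged = np.where(np.abs(detail) > 1.0)[0]                       # illustrative fixed threshold
    print("day pairs flagged by the detail coefficients:", flagged)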
Insider threat is a significant security risk for information systems, and detecting it is a major concern for the organizations that run them. Existing work has mainly focused on single-pattern analysis of user behavior within a single domain, which is not suitable for analyzing user behavior patterns across multiple domains. However, fusing irrelevant features from multiple domains may hide the existence of anomalies, and previous feature learning methods suffer a relatively large amount of information loss during feature extraction. Therefore, this paper proposes a hybrid model based on the deep belief network (DBN) to detect insider threat. First, an unsupervised DBN is used to extract hidden features from the multi-domain features derived from audit logs. Second, a One-Class SVM (OCSVM) is trained on the features learned by the DBN. Experimental results on the CERT dataset demonstrate that the DBN can be used to identify insider threat events and that it offers a new approach to feature processing for insider threat detection.
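A hedged sketch of the hybrid pipeline, with a single RBM layer standing in for the paper's multi-layer DBN and synthetic behaviour features standing in for the CERT-derived multi-domain features: the unsupervised layer compresses the features, and a One-Class SVM is trained on the hidden representation of normal users.

    # Hedged sketch: unsupervised feature layer (single RBM as a DBN stand-in) + One-Class SVM.
    import numpy as np
    from sklearn.neural_network import BernoulliRBM
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    normal = rng.uniform(0.0, 0.4, size=(300, 24))    # multi-domain features of normal behaviour
    insider = rng.uniform(0.5, 1.0, size=(10, 24))    # held-out anomalous behaviour

    rbm = BernoulliRBM(n_components=8, learning_rate=0.05, n_iter=30, random_state=0).fit(normal)
    ocsvm = OneClassSVM(nu=0.05, gamma="scale").fit(rbm.transform(normal))

    flagged = (ocsvm.predict(rbm.transform(insider)) == -1).sum()
    print("flagged insiders:", flagged, "of", len(insider))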
Modern smart surveillance systems can not only record the monitored environment but also identify targeted objects and detect anomalous activities. These advanced functions are often facilitated by deep neural networks, which achieve very high accuracy and large data-processing throughput. However, inappropriate design of the neural network may expose such smart systems to the risk of leaking the target being searched for, or even the adopted learning model itself, to attackers. In this talk, we will present the security challenges in the design of smart surveillance systems. We will also discuss possible solutions that leverage the unique properties of emerging nano-devices, including the incurred design and performance costs and optimization methods for minimizing these overheads.
Today's systems produce a rapidly exploding amount of data, and the data further derives more data, forming a complex data propagation network that we call the data's lineage. There are many reasons that users want systems to forget certain data including its lineage. From a privacy perspective, users who become concerned with new privacy risks of a system often want the system to forget their data and lineage. From a security perspective, if an attacker pollutes an anomaly detector by injecting manually crafted data into the training data set, the detector must forget the injected data to regain security. From a usability perspective, a user can remove noise and incorrect entries so that a recommendation engine gives useful recommendations. Therefore, we envision forgetting systems, capable of forgetting certain data and their lineages, completely and quickly. This paper focuses on making learning systems forget, the process of which we call machine unlearning, or simply unlearning. We present a general, efficient unlearning approach by transforming learning algorithms used by a system into a summation form. To forget a training data sample, our approach simply updates a small number of summations – asymptotically faster than retraining from scratch. Our approach is general, because the summation form comes from statistical query learning, in which many machine learning algorithms can be implemented. Our approach also applies to all stages of machine learning, including feature selection and modeling. Our evaluation, on four diverse learning systems and real-world workloads, shows that our approach is general, effective, fast, and easy to use.
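A small sketch of the summation idea, assuming a multinomial naive Bayes classifier kept purely as class and feature count summations (an illustrative example of a statistical-query-style learner, not the paper's systems): forgetting one training sample only subtracts its contribution from the summations instead of retraining from scratch.

    # Hedged sketch: naive Bayes in summation form, with learn/unlearn as add/subtract of counts.
    import numpy as np

    class SummationNB:
        def __init__(self, n_classes, n_features, alpha=1.0):
            self.class_counts = np.zeros(n_classes)
            self.feature_counts = np.zeros((n_classes, n_features))
            self.alpha = alpha

        def learn(self, x, y):       # add one sample's contribution to the summations
            self.class_counts[y] += 1
            self.feature_counts[y] += x

        def unlearn(self, x, y):     # forget the sample: subtract the same contribution
            self.class_counts[y] -= 1
            self.feature_counts[y] -= x

        def predict(self, x):
            log_prior = np.log(self.class_counts + self.alpha)
            smoothed = self.feature_counts + self.alpha
            log_like = x @ np.log(smoothed / smoothed.sum(axis=1, keepdims=True)).T
            return int(np.argmax(log_prior + log_like))

    nb = SummationNB(n_classes=2, n_features=3)
    samples = [(np.array([5, 0, 1]), 0), (np.array([0, 4, 2]), 1), (np.array([9, 9, 9]), 1)]
    for x, y in samples:
        nb.learn(x, y)
    nb.unlearn(*samples[-1])         # forget the last (e.g. injected) sample without retraining
    print(nb.predict(np.array([4, 1, 0])))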
In this paper, we propose a decomposition-based multiobjective evolutionary algorithm that extracts information from an external archive to guide the evolutionary search for continuous optimization problems. The proposed algorithm uses a mechanism that identifies promising regions (subproblems) by learning from the external archive, and this information guides the evolutionary search process. To demonstrate the performance of the algorithm, we conduct experiments comparing it with other decomposition-based approaches. The results confirm that our proposed algorithm is highly competitive.
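A minimal sketch of the decomposition setting, assuming a simple bi-objective test function, uniformly spread weight vectors and the Tchebycheff scalarization; the rule used below to rank "promising" subproblems from the archive is an illustrative assumption, not the paper's mechanism.

    # Hedged sketch: Tchebycheff decomposition with an external archive ranking subproblems.
    import numpy as np

    def objectives(x):                    # a simple bi-objective test function
        return np.array([x[0] ** 2, (x[0] - 2.0) ** 2])

    def tchebycheff(f, weight, ideal):
        return np.max(weight * np.abs(f - ideal))

    weights = np.stack([np.linspace(0, 1, 11), np.linspace(1, 0, 11)], axis=1)   # 11 subproblems
    archive = [np.array([v]) for v in np.random.default_rng(0).uniform(0.0, 2.0, 50)]
    ideal = np.min([objectives(x) for x in archive], axis=0)

    # Illustrative "promising" rule: subproblems whose best Tchebycheff value over the archive is lowest.
    scores = np.array([[tchebycheff(objectives(x), w, ideal) for x in archive] for w in weights])
    promising = np.argsort(scores.min(axis=1))[:3]
    print("subproblem indices to focus the search on:", promising)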