Biblio
Deep machine learning techniques have shown promising results in network traffic classification, however, the robustness of these techniques under adversarial threats is still in question. Deep machine learning models are found vulnerable to small carefully crafted adversarial perturbations posing a major question on the performance of deep machine learning techniques. In this paper, we propose a black-box adversarial attack on network traffic classification. The proposed attack successfully evades deep machine learning-based classifiers which highlights the potential security threat of using deep machine learning techniques to realize autonomous networks.
The anonymity and decentralization of Bitcoin make it widely accepted in illegal transactions, such as money laundering, drug and weapon trafficking, gambling, to name a few, which has already caused significant security risk all around the world. The obvious de-anonymity approach that matches transaction addresses and users is not possible in practice due to limited annotated data set. In this paper, we divide addresses into four types, exchange, gambling, service, and general, and propose targeted addresses identification algorithms with high fault tolerance which may be employed in a wide range of applications. We use network representation learning to extract features and train imbalanced multi-classifiers. Experimental results validated the effectiveness of the proposed method.
Severe class imbalance between the majority and minority classes in large datasets can prejudice Machine Learning classifiers toward the majority class. Our work uniquely consolidates two case studies, each utilizing three learners implemented within an Apache Spark framework, six sampling methods, and five sampling distribution ratios to analyze the effect of severe class imbalance on big data analytics. We use three performance metrics to evaluate this study: Area Under the Receiver Operating Characteristic Curve, Area Under the Precision-Recall Curve, and Geometric Mean. In the first case study, models were trained on one dataset (POST) and tested on another (SlowlorisBig). In the second case study, the training and testing dataset roles were switched. Our comparison of performance metrics shows that Area Under the Precision-Recall Curve and Geometric Mean are sensitive to changes in the sampling distribution ratio, whereas Area Under the Receiver Operating Characteristic Curve is relatively unaffected. In addition, we demonstrate that when comparing sampling methods, borderline-SMOTE2 outperforms the other methods in the first case study, and Random Undersampling is the top performer in the second case study.
Continuous and adaptive learning is an effective learning approach when dealing with highly dynamic and changing scenarios, where concept drift often happens. In a continuous, stream or adaptive learning setup, new measurements arrive continuously and there are no boundaries for learning, meaning that the learning model has to decide how and when to (re)learn from these new data constantly. We address the problem of adaptive and continual learning for network security, building dynamic models to detect network attacks in real network traffic. The combination of fast and big network measurements data with the re-training paradigm of adaptive learning imposes complex challenges in terms of data processing speed, which we tackle by relying on big data platforms for parallel stream processing. We build and benchmark different adaptive learning models on top of a novel big data analytics platform for network traffic monitoring and analysis tasks, and show that high speed-up computations (as high as × 6) can be achieved by parallelizing off-the-shelf stream learning approaches.
In the open network environment, the network offensive information is implanted in big data environment, so it is necessary to carry out accurate location marking of network offensive information, to realize network attack detection, and to implement the process of accurate location marking of network offensive information. Combined with big data analysis method, the location of network attack nodes is realized, but when network attacks cross in series, the performance of attack information tagging is not good. An accurate marking technique for network attack information is proposed based on big data fusion tracking recognition. The adaptive learning model combined with big data is used to mark and sample the network attack information, and the feature analysis model of attack information chain is designed by extracting the association rules. This paper classifies the data types of the network attack nodes, and improves the network attack detection ability by the task scheduling method of the network attack information nodes, and realizes the accurate marking of the network attacking information. Simulation results show that the proposed algorithm can effectively improve the accuracy of marking offensive information in open network environment, the efficiency of attack detection and the ability of intrusion prevention is improved, and it has good application value in the field of network security defense.
The task of attack attribution, i.e., identifying the entity responsible for an attack, is complicated and usually requires the involvement of an experienced security expert. Prior attempts to automate attack attribution apply various machine learning techniques on features extracted from the malware's code and behavior in order to identify other similar malware whose authors are known. However, the same malware can be reused by multiple actors, and the actor who performed an attack using a malware might differ from the malware's author. Moreover, information collected during an incident may contain many clues about the identity of the attacker in addition to the malware used. In this paper, we propose a method of attack attribution based on textual analysis of threat intelligence reports, using state of the art algorithms and models from the fields of machine learning and natural language processing (NLP). We have developed a new text representation algorithm which captures the context of the words and requires minimal feature engineering. Our approach relies on vector space representation of incident reports derived from a small collection of labeled reports and a large corpus of general security literature. Both datasets have been made available to the research community. Experimental results show that the proposed representation can attribute attacks more accurately than the baselines' representations. In addition, we show how the proposed approach can be used to identify novel previously unseen threat actors and identify similarities between known threat actors.
Intentionally deceptive content presented under the guise of legitimate journalism is a worldwide information accuracy and integrity problem that affects opinion forming, decision making, and voting patterns. Most so-called `fake news' is initially distributed over social media conduits like Facebook and Twitter and later finds its way onto mainstream media platforms such as traditional television and radio news. The fake news stories that are initially seeded over social media platforms share key linguistic characteristics such as making excessive use of unsubstantiated hyperbole and non-attributed quoted content. In this paper, the results of a fake news identification study that documents the performance of a fake news classifier are presented. The Textblob, Natural Language, and SciPy Toolkits were used to develop a novel fake news detector that uses quoted attribution in a Bayesian machine learning system as a key feature to estimate the likelihood that a news article is fake. The resultant process precision is 63.333% effective at assessing the likelihood that an article with quotes is fake. This process is called influence mining and this novel technique is presented as a method that can be used to enable fake news and even propaganda detection. In this paper, the research process, technical analysis, technical linguistics work, and classifier performance and results are presented. The paper concludes with a discussion of how the current system will evolve into an influence mining system.