Bibliography
Nowadays, communication networks are highly relevant in every field. Because of this, it is necessary to keep them working properly and with an adequate level of security. In many fields, and in anomaly detection in communication networks in particular, the use of early detection methods is very convenient. Therefore, adequate metrics must be defined to allow the correct evaluation of the applied methods with respect to the time delay in the detection. This thesis addresses the definition of time-aware metrics for the evaluation of early detection anomaly techniques.
It is well known that distributed cyber attacks launched simultaneously from many hosts have caused the most serious problems in recent years, including privacy leakage and denial of service. Thus, how to detect those attacks at an early stage has become an important and urgent topic in the cyber security community. For this purpose, recognizing C&C (Command & Control) communication between compromised bots and the C&C server becomes a crucially important issue, because C&C communication occurs in the preparation phase of distributed attacks. Although signature-based attack detection has long been applied in practice, it is well known that it cannot efficiently deal with new kinds of attacks. In recent years, ML (machine learning)-based detection methods have been studied widely. In those methods, feature selection is obviously very important to the detection performance. We previously utilized up to 55 features to pick out C&C traffic in order to accomplish early detection of DDoS attacks. In this work, we try to answer the question: are all of those features really necessary? We mainly investigate how the detection performance changes as features are removed, starting from those with the lowest importance, and we try to clarify which features deserve attention for early detection of distributed attacks. We use honeypot data collected during the period from 2008 to 2013. SVM (Support Vector Machine) and PCA (Principal Component Analysis) are utilized for feature selection, and SVM and RF (Random Forest) are used for building the classifier. We find that the detection performance generally improves as more features are utilized. However, after the number of features reaches around 40, the detection performance does not change much even if more features are used. It is also verified that, in some specific cases, more features do not always mean better detection performance. We also discuss the 10 important features which have the biggest influence on classification.
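As an illustration of the feature-reduction procedure described above, the following is a minimal sketch (not the authors' code) of ranking features with a Random Forest and re-evaluating the classifier as the least important features are dropped; the estimator settings, scoring choice, and function name are assumptions made for the example.

```python
# Minimal sketch (not the authors' code): rank features with a Random Forest
# and track detection performance as the least important features are removed.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def importance_curve(X, y, min_features=5):
    """X: (n_samples, n_features) traffic feature matrix, y: C&C vs. benign labels."""
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    rf.fit(X, y)
    # Order features from most to least important.
    order = np.argsort(rf.feature_importances_)[::-1]
    scores = {}
    for k in range(X.shape[1], min_features - 1, -1):
        kept = order[:k]
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        scores[k] = cross_val_score(clf, X[:, kept], y, cv=5, scoring="f1").mean()
    return scores  # inspect where the curve flattens (around 40 features in the paper)
```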
Distributed attacks originating from botnet-infected machines (bots), such as large-scale malware propagation campaigns orchestrated via spam emails, can quickly affect other network infrastructures. As these attacks succeed only because hundreds of infected machines engage in them collectively, their damage can be avoided if machines infected with a common botnet can be detected early rather than after an attack is launched. Prior studies have suggested that outgoing bot attacks are often preceded by other ``tell-tale'' malicious behaviour, such as communication with botnet controllers (C&C servers) that command botnets to carry out attacks. We postulate that observing similar behaviour occurring in a synchronised manner across multiple machines is an early indicator of a widespread infection by a single botnet, potentially leading to a large-scale, distributed attack. Intuitively, if we can detect such synchronised behaviour early enough on a few machines in the network, we can quickly contain the threat before an attack does any serious damage. In this work we present a measurement-driven analysis to validate this intuition. We empirically analyse the various stages of malicious behaviour observed in real botnet traffic, and carry out the first systematic study of the network behaviour that typically precedes outgoing bot attacks and is synchronised across multiple infected machines. We then implement, as a proof of concept, a set of analysers that monitor synchronisation in botnet communication to generate early infection and attack alerts. We show that with this approach we can quickly detect nearly 80% of real-world spamming and port scanning attacks, and even demonstrate a novel capability of preventing these attacks altogether by predicting them before they are launched.
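To illustrate the synchronisation idea, the sketch below (not the authors' analysers) flags destinations that several internal hosts contact within the same time window; the flow record format, window length, and host threshold are assumptions made for the example.

```python
# Minimal sketch (not the authors' analysers): flag destinations contacted by
# several internal hosts within the same time window, as a crude proxy for
# synchronised (possibly C&C-related) behaviour across machines.
from collections import defaultdict

def synchronised_destinations(flows, window=300, min_hosts=3):
    """flows: iterable of (timestamp, src_host, dst_endpoint) tuples."""
    buckets = defaultdict(set)  # (window index, dst_endpoint) -> set of source hosts
    for ts, src, dst in flows:
        buckets[(int(ts // window), dst)].add(src)
    alerts = []
    for (win, dst), hosts in buckets.items():
        if len(hosts) >= min_hosts:
            alerts.append((win * window, dst, sorted(hosts)))
    return alerts  # each alert: (window start time, destination, synchronised hosts)
```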
Software security is an important aspect of ensuring software quality. Early detection of vulnerable code during development is essential for developers to make software testing cost- and time-effective. Traditional software metrics have been used for early detection of software vulnerabilities, but they are not directly related to code constructs and do not specify any particular granularity level. The goal of this study is to help developers evaluate software security using class-level traceable patterns, called micro patterns, to reduce security risks. The concept of micro patterns is similar to that of design patterns, but micro patterns can be automatically recognized and mined from source code. If micro patterns can better predict vulnerable classes than traditional software metrics, they can be used in developing a vulnerability prediction model. This study explores the performance of class-level patterns in vulnerability prediction and compares them with traditional class-level software metrics. We studied security vulnerabilities as reported for one major release of Apache Tomcat, Apache Camel, and three stand-alone Java web applications. We used machine learning techniques for predicting vulnerabilities, using micro patterns and class-level metrics as features. We found that micro patterns have higher recall in detecting vulnerable classes than the software metrics.
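As a rough illustration of how such a comparison could be run, the sketch below (not the study's pipeline) estimates recall for a classifier trained on either micro-pattern features or traditional class-level metrics; the feature matrices, estimator, and function name are hypothetical.

```python
# Minimal sketch (not the study's pipeline): compare recall of a classifier
# trained on micro-pattern features versus traditional class-level metrics.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def mean_recall(features, labels):
    """features: per-class feature matrix (micro patterns or metrics),
    labels: 1 if the class was reported vulnerable, else 0."""
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    return cross_val_score(clf, features, labels, cv=5, scoring="recall").mean()

# Usage with hypothetical arrays: mean_recall(micro_pattern_X, y) vs. mean_recall(metric_X, y)
```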