Feature Selection for Machine Learning-Based Early Detection of Distributed Cyber Attacks
Title | Feature Selection for Machine Learning-Based Early Detection of Distributed Cyber Attacks |
Publication Type | Conference Paper |
Year of Publication | 2018 |
Authors | Feng, Y., Akiyama, H., Lu, L., Sakurai, K. |
Conference Name | 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech) |
Date Published | Aug |
Keywords | attack vectors, C&C communication, Command & Control communication, cyber security community, cyberattack, data privacy, DDoS Attacks, denial of services, distributed cyber attacks, early detection, Electronic mail, feature extraction, feature selection, Human Behavior, learning (artificial intelligence), machine learning, machine learning algorithms, machine learning-based detection methods, principal component analysis, privacy leakage, pubcrawl, Random Forest, Resiliency, Scalability, security of data, Servers, support vector machine, Support vector machines |
Abstract | It is well known that distributed cyber attacks launched simultaneously from many hosts have caused the most serious problems in recent years, including privacy leakage and denial of service. Thus, how to detect those attacks at an early stage has become an important and urgent topic in the cyber security community. For this purpose, recognizing C&C (Command & Control) communication between compromised bots and the C&C server becomes a crucially important issue, because C&C communication occurs in the preparation phase of distributed attacks. Although signature-based attack detection has been applied in practice for a long time, it is well known that it cannot efficiently deal with new kinds of attacks. In recent years, ML (Machine Learning)-based detection methods have been studied widely. In those methods, feature selection is obviously very important to detection performance. We previously utilized up to 55 features to pick out C&C traffic in order to accomplish early detection of DDoS attacks. In this work, we try to answer the question, "Are all of those features really necessary?" We mainly investigate how detection performance changes as features are removed, starting from those with the lowest importance, and we try to clarify which features deserve attention for early detection of distributed attacks. We use honeypot data collected from 2008 to 2013. SVM (Support Vector Machine) and PCA (Principal Component Analysis) are used for feature selection, and SVM and RF (Random Forest) are used to build the classifier. We find that detection performance generally improves as more features are used. However, once the number of features reaches around 40, detection performance does not change much even if more features are used. It is also verified that, in some specific cases, more features do not always mean better detection performance.
We also discuss the 10 important features that have the biggest influence on classification. |
URL | https://ieeexplore.ieee.org/document/8511883 |
DOI | 10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.00040 |
Citation Key | feng_feature_2018 |
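The core experiment described in the abstract, ranking features by importance and then removing them from the least important end while tracking detection performance, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' method: a Fisher score stands in for the paper's SVM- and PCA-based importance ranking, a nearest-centroid classifier stands in for SVM and Random Forest, and the data come from a hypothetical `make_synthetic` generator (3 informative features plus 7 noise features), not from honeypot traffic.

```python
import random
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Per-feature Fisher score for binary labels 0/1: (mu1 - mu0)^2 / (var1 + var0).

    Stand-in importance measure (assumption); the paper uses SVM and PCA instead.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        denom = pvariance(pos) + pvariance(neg)
        scores.append((mean(pos) - mean(neg)) ** 2 / denom if denom > 0 else 0.0)
    return scores


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, features):
    """Fit per-class centroids on the selected features and return test accuracy."""
    centroids = {}
    for label in (0, 1):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]
    correct = 0
    for row, label in zip(X_test, y_test):
        # Squared Euclidean distance to each class centroid, on kept features only.
        dist = {c: sum((row[j] - m) ** 2 for j, m in zip(features, cent))
                for c, cent in centroids.items()}
        correct += min(dist, key=dist.get) == label
    return correct / len(y_test)


def backward_elimination_curve(X_train, y_train, X_test, y_test):
    """Drop features from lowest to highest importance; return (n_kept, accuracy) pairs."""
    scores = fisher_scores(X_train, y_train)
    # Feature indices ordered from most to least important.
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    curve = []
    for k in range(len(ranked), 0, -1):
        kept = ranked[:k]  # the len(ranked) - k lowest-ranked features are removed
        curve.append((k, nearest_centroid_accuracy(X_train, y_train, X_test, y_test, kept)))
    return curve


def make_synthetic(n, n_informative, n_noise, rng):
    """Hypothetical toy data: label 1 shifts the informative features by +2."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_synthetic(200, 3, 7, rng)
X_test, y_test = make_synthetic(200, 3, 7, rng)
for k, acc in backward_elimination_curve(X_train, y_train, X_test, y_test):
    print(f"{k:2d} features -> accuracy {acc:.2f}")
```

On this toy data the noise features carry almost no signal, so accuracy should stay roughly flat while they are being removed and drop once informative features start to go. This only qualitatively echoes the plateau the abstract reports; it reproduces none of the paper's numbers.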