Automated Generation and Selection of Interpretable Features for Enterprise Security
Title | Automated Generation and Selection of Interpretable Features for Enterprise Security |
Publication Type | Conference Paper |
Year of Publication | 2018 |
Authors | Duan, J., Zeng, Z., Oprea, A., Vasudevan, S. |
Conference Name | 2018 IEEE International Conference on Big Data (Big Data) |
Date Published | December |
Keywords | big data security, Boolean functions, classifiers, Clustering algorithms, cyber security, DNF formulas, enterprise security logs, feature extraction, Fourier analysis, Fourier transforms, learning (artificial intelligence), machine learning method, malicious activity detection, Malware, Metrics, pattern clustering, pubcrawl, resilience, Resiliency, Scalability, security, security of data, Training data |
Abstract | We present an effective machine learning method for malicious activity detection in enterprise security logs. Our method involves feature engineering, or generating new features by applying operators on features of the raw data. We generate DNF formulas from raw features, extract Boolean functions from them, and leverage Fourier analysis to generate new parity features and rank them based on their highest Fourier coefficients. We demonstrate on real enterprise data sets that the engineered features enhance the performance of a wide range of classifiers and clustering algorithms. As compared to classification of raw data features, the engineered features achieve up to 50.6% improvement in malicious recall, while sacrificing no more than 0.47% in accuracy. We also observe better isolation of malicious clusters, when performing clustering on engineered features. In general, a small number of engineered features achieve higher performance than raw data features according to our metrics of interest. Our feature engineering method also retains interpretability, an important consideration in cyber security applications. |
URL | https://ieeexplore.ieee.org/document/8621986 |
DOI | 10.1109/BigData.2018.8621986 |
Citation Key | duan_automated_2018 |
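
The abstract describes ranking parity features by the Fourier coefficients of Boolean functions extracted from DNF formulas. As a rough illustration of that idea only, and not the authors' implementation, the sketch below computes exact Fourier coefficients of a small Boolean function by brute-force enumeration and sorts the parity features by coefficient magnitude. The function names, the example formula, and the +1/-1 encoding are all assumptions made for this example; the record does not specify how the paper estimates coefficients on real log data.

```python
from itertools import combinations, product


def parity(x, subset):
    """Parity character chi_S(x): product of x[i] for i in S, with x in {-1, +1}^n."""
    result = 1
    for i in subset:
        result *= x[i]
    return result


def fourier_coefficient(f, n, subset):
    """Exact coefficient E_x[f(x) * chi_S(x)] under the uniform distribution.

    Enumerates all 2^n inputs, so this is only practical for small n.
    """
    total = sum(f(x) * parity(x, subset) for x in product((-1, 1), repeat=n))
    return total / 2 ** n


def rank_parity_features(f, n, max_degree):
    """Return (subset, coefficient) pairs sorted by decreasing |coefficient|."""
    subsets = [s for d in range(1, max_degree + 1)
               for s in combinations(range(n), d)]
    scored = [(s, fourier_coefficient(f, n, s)) for s in subsets]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


def example_dnf(x):
    """Hypothetical DNF (x0 AND x1) OR x2; +1 encodes True, -1 encodes False."""
    is_true = (x[0] == 1 and x[1] == 1) or x[2] == 1
    return 1 if is_true else -1


ranking = rank_parity_features(example_dnf, n=3, max_degree=3)
for subset, coeff in ranking:
    print(subset, coeff)
```

Each subset of variable indices corresponds to one candidate parity feature, so the top of the ranking names the parities most correlated with the formula. Exhaustive enumeration is used here purely for clarity; on realistic feature counts, coefficients would presumably have to be estimated from samples instead.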