Biblio
To assure cyber security of an enterprise, typically SIEM (Security Information and Event Management) system is in place to normalize security events from different preventive technologies and flag alerts. Analysts in the security operation center (SOC) investigate the alerts to decide if it is truly malicious or not. However, generally the number of alerts is overwhelming with majority of them being false positive and exceeding the SOC's capacity to handle all alerts. Because of this, potential malicious attacks and compromised hosts may be missed. Machine learning is a viable approach to reduce the false positive rate and improve the productivity of SOC analysts. In this paper, we develop a user-centric machine learning framework for the cyber security operation center in real enterprise environment. We discuss the typical data sources in SOC, their work flow, and how to leverage and process these data sets to build an effective machine learning system. The paper is targeted towards two groups of readers. The first group is data scientists or machine learning researchers who do not have cyber security domain knowledge but want to build machine learning systems for security operations center. The second group of audiences are those cyber security practitioners who have deep knowledge and expertise in cyber security, but do not have machine learning experiences and wish to build one by themselves. Throughout the paper, we use the system we built in the Symantec SOC production environment as an example to demonstrate the complete steps from data collection, label creation, feature engineering, machine learning algorithm selection, model performance evaluations, to risk score generation.
In the last couple of years, organizations have demonstrated an increased willingness to participate in threat intelligence sharing platforms. The open exchange of information and knowledge regarding threats, vulnerabilities, incidents and mitigation strategies results from the organizations' growing need to protect against today's sophisticated cyber attacks. To investigate data quality challenges that might arise in threat intelligence sharing, we conducted focus group discussions with ten expert stakeholders from security operations centers of various globally operating organizations. The study addresses several factors affecting shared threat intelligence data quality at multiple levels, including collecting, processing, sharing and storing data. As expected, the study finds that the main factors that affect shared threat intelligence data stem from the limitations and complexities associated with integrating and consolidating shared threat intelligence from different sources while ensuring the data's usefulness for an inhomogeneous group of participants.Data quality is extremely important for shared threat intelligence. As our study has shown, there are no fundamentally new data quality issues in threat intelligence sharing. However, as threat intelligence sharing is an emerging domain and a large number of threat intelligence sharing tools are currently being rushed to market, several data quality issues – particularly related to scalability and data source integration – deserve particular attention.