MR-TRIAGE: Scalable multi-criteria clustering for big data security intelligence applications

Title: MR-TRIAGE: Scalable multi-criteria clustering for big data security intelligence applications
Publication Type: Conference Paper
Year of Publication: 2014
Authors: Shen, Y., Thonnard, O.
Conference Name: Big Data (Big Data), 2014 IEEE International Conference on
Date Published: Oct
Keywords: Algorithm design and analysis, attack attribution, Big Data, Big Data security intelligence applications, Clustering algorithms, commodity hardware, computational complexity, Computer crime, data mining, distributed algorithms, Electronic mail, graph theory, Internet attacks, large security data sets, large security datasets, MapReduce, MR-TRIAGE workflow, multicriteria evaluation techniques, Open wireless architecture, parallel algorithms, pattern clustering, Prototypes, scalable data summarisation, scalable graph clustering algorithms, scalable multicriteria data clustering, security, security companies, security data mining, security events, situational understanding, threat level
Abstract

Security companies have recently realised that mining massive amounts of security data can help generate actionable intelligence and improve their understanding of Internet attacks. In particular, attack attribution and situational understanding are considered critical to dealing effectively with emerging, increasingly sophisticated Internet attacks. This requires highly scalable analysis tools that help analysts classify, correlate and prioritise security events according to their likely impact and threat level. However, this security data mining process typically involves a considerable number of features interacting in non-obvious ways, which makes it inherently complex. To address this challenge, we introduce MR-TRIAGE, a set of distributed algorithms built on MapReduce that perform scalable multi-criteria data clustering on large security data sets and identify complex relationships hidden in massive datasets. The MR-TRIAGE workflow consists of a scalable data summarisation step, followed by scalable graph clustering algorithms into which we integrate multi-criteria evaluation techniques. The theoretical computational complexity of the proposed parallel algorithms is discussed and analysed. The experimental results demonstrate that the algorithms scale well and efficiently process large security datasets on commodity hardware. Our approach can effectively cluster any type of security event (e.g., spam emails, spear-phishing attacks, etc.) whose instances share at least some commonalities across a number of predefined features.
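The abstract describes the workflow only at a high level. The Python sketch below illustrates the general shape of such a pipeline on toy data; it is not the authors' implementation. Assumptions: the event features, the weights (`WEIGHTS`), the threshold (`THRESHOLD`) and all function names are hypothetical; the map, shuffle and reduce phases are simulated in memory rather than run on a MapReduce cluster; a plain weighted mean over exact-match feature similarities stands in for the paper's multi-criteria evaluation techniques, and connected components stand in for its graph clustering algorithms.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy security events: (event_id, {feature: value}).
EVENTS = [
    ("e1", {"sender_ip": "10.0.0.1", "subject": "Invoice", "attach": "h1"}),
    ("e2", {"sender_ip": "10.0.0.1", "subject": "Invoice", "attach": "h2"}),
    ("e3", {"sender_ip": "10.0.0.9", "subject": "Invoice", "attach": "h2"}),
    ("e4", {"sender_ip": "10.0.0.7", "subject": "Hello", "attach": "h9"}),
]

# Hypothetical per-feature weights (summing to 1) and edge threshold.
WEIGHTS = {"sender_ip": 0.4, "subject": 0.3, "attach": 0.3}
THRESHOLD = 0.5


def map_phase(events):
    """Map: emit ((feature, value), event_id) for every feature value."""
    for event_id, features in events:
        for feature, value in features.items():
            yield (feature, value), event_id


def shuffle(pairs):
    """Shuffle: group event ids by their (feature, value) key."""
    groups = defaultdict(list)
    for key, event_id in pairs:
        groups[key].append(event_id)
    return groups


def reduce_phase(groups):
    """Reduce: for every shared value, record which features each pair shares."""
    shared = defaultdict(set)
    for (feature, _value), ids in groups.items():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)].add(feature)
    return shared


def score(shared_features, weights):
    """Weighted mean of exact-match similarities (1 if shared, else 0)."""
    return sum(weights[f] for f in sorted(shared_features))


def connected_components(nodes, edges):
    """Union-find over the thresholded similarity graph."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    clusters = defaultdict(set)
    for n in nodes:
        clusters[find(n)].add(n)
    return sorted(sorted(c) for c in clusters.values())


def cluster_events(events, weights=WEIGHTS, threshold=THRESHOLD):
    """Summarise into candidate pairs, score them, then cluster the graph."""
    shared = reduce_phase(shuffle(map_phase(events)))
    edges = [pair for pair, feats in shared.items()
             if score(feats, weights) >= threshold]
    return connected_components([eid for eid, _ in events], edges)


print(cluster_events(EVENTS))
```

On the toy events, `e1` and `e2` share a sender IP and subject (score 0.7) and `e2` and `e3` share a subject and attachment (score 0.6), so `e1`, `e2` and `e3` end up in one cluster while `e4` stays alone. Generating candidate pairs only from shared feature values keeps the similarity graph sparse, because pairs with no commonality are never materialised; a value shared by very many events still yields a quadratic number of pairs, which a real deployment would need to handle.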

URL: https://ieeexplore.ieee.org/document/7004285/
DOI: 10.1109/BigData.2014.7004285
Citation Key: 7004285