Bibliography
Traditional security controls, such as firewalls, anti-virus and IDS, are ill-equipped to help IT security and response teams keep pace with the rapid evolution of the cyber threat landscape. Cyber Threat Intelligence (CTI) can help remediate this problem by exploiting non-traditional information sources, such as hacker forums and "dark-web" social platforms. Security and response teams can use the collected intelligence to identify emerging threats. Unfortunately, manual extraction of CTI from non-traditional sources is a time-consuming, error-prone and resource-intensive process. We address these issues with a hybrid Machine Learning model that automatically searches through hacker forum posts, identifies the posts that are most relevant to cyber security, and then clusters the relevant posts into estimations of the topics that the hackers are discussing. The first (identification) stage uses Support Vector Machines and the second (clustering) stage uses Latent Dirichlet Allocation. We tested our model on data from an actual hacker forum, automatically extracting information about various threats such as leaked credentials, malicious proxy servers, and malware that evades AV detection. The results demonstrate that our method is an effective means of quickly extracting relevant and actionable intelligence that can be integrated with traditional security controls to increase their effectiveness.
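As a concrete illustration of the two-stage pipeline described in this abstract, the sketch below (not the authors' implementation; all posts, labels and parameters are invented) uses scikit-learn's LinearSVC for the relevance-identification stage and LatentDirichletAllocation for the topic-clustering stage:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical labelled forum posts: 1 = security-relevant, 0 = noise.
train_posts = [
    "selling database of leaked credentials, fresh dump",
    "new crypter bypasses AV detection, fully undetectable",
    "anyone watch the game last night",
    "fast socks5 proxy list, bulletproof hosting",
    "looking for a cheap gaming laptop",
]
train_labels = [1, 1, 0, 1, 0]

# Stage 1 (identification): SVM relevance filter on TF-IDF features.
tfidf = TfidfVectorizer()
svm = LinearSVC().fit(tfidf.fit_transform(train_posts), train_labels)

new_posts = ["combo list 10k emails and passwords", "best pizza place in town"]
relevant = [p for p in new_posts if svm.predict(tfidf.transform([p]))[0] == 1]

# Stage 2 (clustering): LDA topic estimation over the relevant posts,
# pooled here with the known-relevant training posts for a usable corpus.
security_posts = [p for p, y in zip(train_posts, train_labels) if y == 1] + relevant
counts = CountVectorizer()
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts.fit_transform(security_posts))
vocab = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    print(f"topic {k}:", [vocab[i] for i in topic.argsort()[::-1][:3]])
```

In practice the SVM would be trained on a much larger labelled corpus and the number of LDA topics tuned to the forum being monitored.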
Software Defined Networking (SDN) and Network Function Virtualisation (NFV) are transforming modern networks towards a service-oriented architecture. At the same time, the cybersecurity industry is rapidly adopting Machine Learning (ML) algorithms to improve the detection and mitigation of complex attacks. Traditional intrusion detection systems perform signature-based detection, relying on well-known malicious traffic patterns that signify potential attacks. The main drawback of this method is that attack patterns need to be known in advance and signatures must be preconfigured; hence, typical systems fail to detect a zero-day attack or an attack with an unknown signature. This work considers the use of machine learning for advanced anomaly detection, and specifically deploys the Apache Spot ML framework on an SDN/NFV-enabled testbed running cybersecurity services as Virtual Network Functions (VNFs). VNFs are used to capture traffic for ingestion by the ML algorithm and to apply mitigation measures when an anomaly is detected. Apache Spot utilises Latent Dirichlet Allocation to identify anomalous traffic patterns in NetFlow, DNS and proxy data. The overall performance of Apache Spot is evaluated by deploying Denial-of-Service attacks (Slowloris, BoNeSi) and a data exfiltration attack (iodine).
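Apache Spot's actual pipeline runs LDA at scale on Spark, so the sketch below is only an illustrative analogue in scikit-learn, not Spot's API: each source IP is treated as a "document" whose "words" are discretised flow features, and flow words with low probability under the fitted model indicate anomalous traffic. All IPs, tokens and bucketing choices are hypothetical.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Mock flow "words": destination port plus a log-bucketed byte count,
# one document per source IP. The last host mimics an iodine-style
# DNS tunnel (unusually large payloads on port 53).
flows = {
    "10.0.0.5":  ["p80_b3", "p443_b4", "p80_b3", "p443_b3"],
    "10.0.0.6":  ["p443_b4", "p80_b3", "p53_b1"],
    "10.0.0.99": ["p53_b9", "p53_b9", "p53_b9", "p53_b9"],
}
docs = [" ".join(words) for words in flows.values()]

vec = CountVectorizer(token_pattern=r"\S+")
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Score each flow word by its probability under the host's topic mixture;
# low-probability words point at anomalous traffic.
theta = lda.transform(X)                                      # host-topic mix
phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
vocab = list(vec.get_feature_names_out())
for (ip, words), mix in zip(flows.items(), theta):
    for w in sorted(set(words)):
        print(f"{ip} {w}: p={mix @ phi[:, vocab.index(w)]:.3f}")
```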
There is a growing interest in modeling and predicting the behavior of financial systems and supply chains. In this paper, we focus on the analysis of the resMBS supply chain; it is associated with the US residential mortgage backed securities and subprime mortgages that were central to the 2008 US financial crisis. We develop models based on financial institutions (FI), and their participation, described by their roles (Role), in financial contracts (FC). Our models rest on the intuitive assumption that FIs form communities within an FC, and that FIs within a community are more likely to collaborate with other FIs in that community, and to play the same role, in another FC. Inspired by Latent Dirichlet Allocation (LDA) and topic models, we develop two probabilistic financial community models. In FI-Comm, each FC (document) is a mix of topics, where a topic is a distribution over FIs (words). In Role-FI-Comm, each topic is a distribution over Role-FI pairs (words). Experimental results over 5000+ financial prospectuses demonstrate the effectiveness of our models.
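A minimal sketch of the FI-Comm idea under the stated assumptions (the contracts and institution names below are invented, not drawn from the paper's prospectus corpus): each FC is a document whose words are the participating FIs, and LDA recovers communities of FIs that tend to co-occur across contracts. Role-FI-Comm would simply replace each token with a role:FI pair.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Each row lists the FIs named on one hypothetical contract; for
# Role-FI-Comm, tokens would instead look like "trustee:bank_a".
contracts = [
    "bank_a trustee_x servicer_m",
    "bank_a trustee_x servicer_m underwriter_q",
    "bank_b trustee_y servicer_n",
    "bank_b trustee_y underwriter_r",
]
vec = CountVectorizer(token_pattern=r"\S+")   # keep full FI identifiers
X = vec.fit_transform(contracts)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
fis = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    community = [fis[i] for i in topic.argsort()[::-1][:3]]
    print(f"community {k}: {community}")
```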
An increasing number of people are using online social networking services (SNSs), and a significant amount of information related to consumption experiences is shared in this new media form. Text mining is an emerging technique for mining useful information from the web. We aim to discover semantic patterns in consumers' discussions on social media, focusing in particular on tweets. Specifically, the purposes of this study are twofold: 1) finding the similarity and dissimilarity between two sets of textual documents with opposite sentiment polarities, i.e., positive vs. negative opinions, and 2) deriving the actual content carrying a semantic trend from the textual data. The considered tweets include consumers' opinions on US retail companies (e.g., Amazon, Walmart). Cosine similarity and K-means clustering are used to achieve the former goal, and Latent Dirichlet Allocation (LDA), a popular topic modeling algorithm, is used for the latter. This is the first study to discover semantic properties of textual data in a consumption context beyond sentiment analysis. In addition to the major findings, we applied LDA to the same data and derived latent topics that represent consumers' positive and negative opinions on social media.
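A hedged sketch of the study's toolchain on toy tweets (all data and parameter choices are illustrative, not the paper's corpus): TF-IDF with cosine similarity to compare the positive and negative sets, K-means to cluster the tweets, and LDA to surface latent topics.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented example tweets split by sentiment polarity.
pos = ["amazon delivery was fast and cheap", "love the walmart prices this week"]
neg = ["amazon lost my package again", "walmart checkout lines are terrible"]
tweets = pos + neg

# Goal 1: similarity/dissimilarity between the polarity sets
# (mean pairwise cosine in TF-IDF space), plus K-means clustering.
tfidf = TfidfVectorizer().fit(tweets)
sim = cosine_similarity(tfidf.transform(pos), tfidf.transform(neg)).mean()
print("pos/neg similarity:", sim)
km = KMeans(n_clusters=2, n_init=10, random_state=0)
print("clusters:", km.fit_predict(tfidf.transform(tweets)))

# Goal 2: latent topics via LDA over raw term counts.
counts = CountVectorizer()
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts.fit_transform(tweets))
vocab = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    print(f"topic {k}:", [vocab[i] for i in topic.argsort()[::-1][:3]])
```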