Bibliography
The advanced persistent threat (APT) landscape has been studied without quantifiable data from which indicators of compromise (IoCs) may be uniformly analyzed, replicated, or used to support security mechanisms. This work consolidates extensive academic and industry APT analysis, not as an incremental step in existing approaches to APT detection, but as a new benchmark for APT research. We collect 15,259 APT IoC hashes and retrieve the corresponding sandbox execution logs across 41 different file types, with an initial focus on Windows-based threat detection. We present a novel Windows APT executable (APT-EXE) dataset, made available to the research community. Manual and statistical analysis of the APT-EXE dataset is conducted, along with supporting feature analysis. We draw upon repeated and common path accesses, file types, and operations within the APT-EXE dataset to generalize APT execution footprints. A baseline case analysis successfully identifies 117 of 152 live APT samples from campaigns across 2018 and 2019.
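A minimal sketch of how repeated path accesses and file operations might be mined from sandbox logs to form an execution footprint and then matched against a new sample; the JSON layout and field names (`operations`, `op`, `path`) are assumptions for illustration, not the APT-EXE schema.

```python
import json
from collections import Counter
from pathlib import Path

def build_footprint(log_dir, top_n=20):
    """Count (operation, path) pairs across sandbox logs and keep the most
    frequent ones as a generalized execution footprint (illustrative only)."""
    counts = Counter()
    for log_file in Path(log_dir).glob("*.json"):
        events = json.loads(log_file.read_text())
        for ev in events.get("operations", []):
            counts[(ev["op"], ev["path"].lower())] += 1
    return counts.most_common(top_n)

def matches_footprint(sample_events, footprint, threshold=0.5):
    """Flag a sample if it reproduces at least `threshold` of the footprint."""
    observed = {(ev["op"], ev["path"].lower()) for ev in sample_events}
    hits = sum(1 for key, _ in footprint if key in observed)
    return hits / max(len(footprint), 1) >= threshold
```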
Ransomware attacks are taking advantage of the ongoing pandemic and targeting vulnerable systems in the business, healthcare, education, insurance, banking, and government sectors. Various approaches have been proposed to combat ransomware, but the dynamic tactics of malware writers often bypass security checkpoints. Commercial tools for ransomware analysis and detection are available, but their performance is questionable. This paper proposes an AI-based ransomware detection framework and a detection tool (AIRaD) that combines static and dynamic malware analysis techniques. Dynamic binary instrumentation is performed with the Pin tool, and function call traces are analyzed using the Cuckoo sandbox and Ghidra. Features extracted at the DLL, function-call, and assembly levels are processed with NLP and association rule mining techniques and fed to different machine learning classifiers. A support vector machine and AdaBoost with J48 achieved the highest accuracy of 99.54% with a 0.005 false-positive rate for a multi-level combined term frequency approach.
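A hedged sketch of the kind of term-frequency pipeline described: function-call traces are treated as documents, vectorized by term frequency, and fed to an SVM and AdaBoost. The traces, labels, and classifier settings below are illustrative assumptions, not the AIRaD implementation (scikit-learn's AdaBoost uses CART trees rather than J48).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Each sample is the function-call trace of one binary, flattened into a
# space-separated "document" (made-up traces for illustration).
traces = ["CreateFileW WriteFile CryptEncrypt DeleteFileW",
          "RegOpenKeyW RegQueryValueW CloseHandle",
          "CreateFileW CryptEncrypt CryptEncrypt MoveFileW",
          "GetSystemTimeAsFileTime CloseHandle"]
labels = [1, 0, 1, 0]  # 1 = ransomware, 0 = benign

X = CountVectorizer().fit_transform(traces)   # term-frequency features
for clf in (SVC(kernel="linear"), AdaBoostClassifier()):
    print(clf.__class__.__name__,
          cross_val_score(clf, X, labels, cv=2).mean())
```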
Enforcing security and resilience in a cloud platform is an essential but challenging problem due to the large number of heterogeneous applications running on shared resources. A security analysis system that can detect threats or malware must exist inside the cloud infrastructure. Much research has been done on machine learning-driven malware analysis, but existing approaches are limited in computational complexity and detection accuracy. To overcome these drawbacks, we propose a new malware detection system based on clustering and Trend Micro Locality Sensitive Hashing (TLSH). We use the Cuckoo sandbox, which produces dynamic analysis reports by executing files in an isolated environment, and a novel feature extraction algorithm to extract essential features from the resulting malware reports. The most important features are then selected using principal component analysis (PCA), random forest, and Chi-square feature selection methods. Experimental results are obtained for clustering and non-clustering approaches with three classifiers: Decision Tree, Random Forest, and Logistic Regression. The model achieves better classification accuracy and false positive rate (FPR) than state-of-the-art works and the non-clustering approach, at significantly lower computational cost.
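A minimal sketch of TLSH-based grouping of report feature strings using the py-tlsh package (`tlsh.hash` and `tlsh.diff`); the greedy clustering rule and the distance threshold are illustrative assumptions, not the paper's method.

```python
import tlsh  # py-tlsh package

def greedy_tlsh_clusters(reports, threshold=100):
    """Greedily cluster Cuckoo report feature strings by TLSH distance.
    A report joins the first cluster whose representative digest is within
    `threshold`; otherwise it starts a new cluster (illustrative only)."""
    clusters = []  # list of (representative_digest, [report indices])
    for idx, text in enumerate(reports):
        digest = tlsh.hash(text.encode())  # TLSH needs ~50+ bytes of input
        for rep, members in clusters:
            if tlsh.diff(rep, digest) <= threshold:
                members.append(idx)
                break
        else:
            clusters.append((digest, [idx]))
    return clusters
```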
Whenever an internet user visits a website, a scripting language known as JavaScript runs in the background. Embedding malicious activity within such scripts poses a great threat to the cyber world. Attackers take advantage of the dynamic nature of JavaScript and embed malicious code within websites to download malware and damage the host, obfuscating the script to shield it from malware detectors. In this paper, we propose a novel technique for analysing and detecting malicious JavaScript using a sandbox-assisted ensemble model. We extract the payload using the malware-jail sandbox to recover the real script, then analyse it to define the features needed to build the dataset, computing Pearson's r between every pair of features during feature extraction. An ensemble model consisting of Sequential Minimal Optimization (SMO), Voted Perceptron and AdaBoost, combined by voting, is used to detect malicious JavaScript. Experimental results show that our proposed model detects obfuscated and de-obfuscated malicious JavaScript with an accuracy of 99.6% and a detection time of 0.03 s, outperforming other state-of-the-art models in accuracy as well as training and detection time.
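A hedged sketch of the pairwise Pearson's r step using pandas; the feature names and the redundancy cutoff are assumptions made for illustration, not the paper's feature set.

```python
import pandas as pd

# Illustrative features extracted from de-obfuscated scripts.
df = pd.DataFrame({
    "eval_count":      [3, 0, 5, 1, 0],
    "string_entropy":  [5.1, 3.2, 5.6, 4.0, 3.1],
    "doc_write_count": [2, 0, 4, 1, 0],
    "script_length":   [900, 120, 1500, 400, 150],
})

corr = df.corr(method="pearson")   # pairwise Pearson's r between features
# Flag one feature from any pair that is almost perfectly correlated.
redundant = {c2 for c1 in corr.columns for c2 in corr.columns
             if c1 < c2 and abs(corr.loc[c1, c2]) > 0.95}
print(corr.round(2), "\nredundant:", redundant)
```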
Data mining is used to analyze data to discover previously unknown patterns and to classify and group records. Several important, scalable data mining platforms have been developed in recent years. RapidMiner is a popular open-source software package for advanced analytics; Weka and Orange are important machine learning tools for classifying patterns with clustering and regression techniques; and KNIME is often used for data preprocessing tasks such as extraction, transformation, and loading. This article summarizes the most important and robust platforms.
The dynamicity and complexity of clouds highlight the importance of automated root cause analysis solutions for explaining what might have caused a security incident. Most existing works focus on either locating malfunctioning cloud components, e.g., switches, or tracing changes at lower abstraction levels, e.g., system calls. A management-level solution, by contrast, can provide a big picture of the root cause in a more scalable manner. In this paper, we propose DOMINOCATCHER, a novel provenance-based solution for explaining the root cause of security incidents in terms of management operations in clouds. Specifically, we first define a provenance model to capture the interdependencies between cloud management operations, virtual resources, and inputs. Based on this model, we design a framework to intercept cloud management operations and to extract and prune provenance metadata. We implement DOMINOCATCHER on the OpenStack platform as an attached middleware and validate its effectiveness using security incidents based on real-world attacks. We also evaluate its performance through experiments on our testbed; the results demonstrate that DOMINOCATCHER incurs insignificant overhead and is scalable for clouds.
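A minimal sketch of the kind of provenance model described, using a networkx graph over management operations, resources, and inputs; the node and edge labels below are assumptions, not DOMINOCATCHER's actual schema.

```python
import networkx as nx

# Provenance graph: management operations, virtual resources, and inputs are
# nodes; edges record which operation used or produced which resource.
G = nx.DiGraph()
G.add_node("op:create_port", kind="operation", user="admin")
G.add_node("op:update_port", kind="operation", user="tenant_a")
G.add_node("res:port_42", kind="resource")
G.add_node("in:allowed_address_pairs", kind="input")

G.add_edge("op:create_port", "res:port_42", relation="created")
G.add_edge("in:allowed_address_pairs", "op:update_port", relation="used_by")
G.add_edge("op:update_port", "res:port_42", relation="modified")

# Root cause query: which operations and inputs can reach the affected resource?
print(nx.ancestors(G, "res:port_42"))
```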
Network security policies contain requirements, including system and software features as well as expected and desired actions of human actors. In this paper, we present a framework for evaluating textual network security policies as requirements documents in order to identify areas for improvement. Specifically, our framework concentrates on completeness. We use topic modeling coupled with expert evaluation to learn the complete list of important topics that should be addressed in a network security policy. Using these topics as a checklist, we evaluate a collection of network security policies for completeness, i.e., the degree to which these topics are present in the text. We developed three methods for topic recognition to identify missing or poorly addressed topics. We examine network security policies and report the results of our analysis, which show preliminary success of our approach.
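A hedged sketch of topic modeling over policy text with scikit-learn's LDA; the snippets, topic count, and top-term listing are illustrative assumptions, not the paper's corpus or its three topic-recognition methods.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Illustrative policy snippets standing in for a real policy corpus.
policies = ["Passwords must be rotated every 90 days and never shared.",
            "Remote access requires VPN and two-factor authentication.",
            "Incident response procedures must be tested annually.",
            "All removable media must be encrypted before use."]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(policies)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-4:][::-1]]
    print(f"topic {i}: {top}")   # candidate checklist entries
```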
Internet Service Providers (ISPs) have an economic and operational interest in detecting malicious network activity relating to their subscribers. However, it is unclear what kind of traffic data an ISP has available for cyber-security research, and under which legal conditions it can be used. This paper gives an overview of the challenges posed by legislation and of the data sources available to a European ISP. DNS and NetFlow logs are identified as relevant data sources and the state of the art in anonymization and fingerprinting techniques is discussed. Based on legislation, data availability and privacy considerations, a practically applicable anonymization policy is presented.
The electrical power system is the backbone of our nation's critical infrastructure. It has been designed to withstand single component failures based on a set of reliability metrics that have proven acceptable during normal operating conditions. However, in recent years there has been an increasing frequency of extreme weather events, many of which have resulted in widespread, long-term power outages, proving that reliability metrics alone do not provide adequate energy security. As a result, researchers have focused their efforts on resilience metrics to ensure efficient operation of power systems during extreme events. A resilient system has the ability to resist, adapt, and recover from disruptions; resilience has therefore emerged as a promising concept for the challenges currently facing power distribution systems. In this work, we propose an operational resilience metric for modern power distribution systems, based on the aggregation of system assets' adaptive capacity in real and reactive power. The metric conveys the magnitude and duration of a disturbance the system can withstand. We demonstrate the metric in a case study under normal operation and during a power contingency on a microgrid. In the future, this information can be used by operators to make more informed, resilience-based decisions in an effort to prevent power outages.
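A hedged sketch of the aggregation idea: summing the real and reactive power headroom (adaptive capacity) of each asset at a point in time. The asset values and the simple-sum rule are assumptions for illustration, not the paper's exact formulation.

```python
# Each asset reports its remaining real (kW) and reactive (kVAR) headroom
# at a given time step (made-up values).
assets = {
    "diesel_genset": {"p_headroom": 40.0, "q_headroom": 25.0},
    "battery_ess":   {"p_headroom": 30.0, "q_headroom": 15.0},
    "pv_inverter":   {"p_headroom": 10.0, "q_headroom": 12.0},
}

def operational_resilience(assets):
    """Aggregate adaptive capacity in real and reactive power (simple sum)."""
    p = sum(a["p_headroom"] for a in assets.values())
    q = sum(a["q_headroom"] for a in assets.values())
    return p, q

p_cap, q_cap = operational_resilience(assets)
print(f"system can absorb a disturbance of up to {p_cap} kW / {q_cap} kVAR")
```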
Machine learning algorithms used to detect attacks are limited by the fact that they cannot incorporate the background knowledge that an analyst has, which limits their suitability for detecting new attacks. Reinforcement learning differs from the traditional machine learning algorithms used in the cybersecurity domain: it does not need a mapping of the input-output space or a specific user-defined metric to compare data points. This is important for cybersecurity, especially for malware detection and mitigation, as not all problems have a single, known, correct answer, and security researchers often resort to guided trial and error to understand the presence of malware and mitigate it. In this paper, we incorporate prior knowledge, represented as Cybersecurity Knowledge Graphs (CKGs), to guide the exploration of an RL algorithm to detect malware. CKGs capture semantic relationships between cyber-entities, including relationships mined from open sources. Instead of making random guesses and observing the change in the environment, we use verified knowledge about cyber-attacks to guide our reinforcement learning algorithm to effectively identify malicious filenames so that they can be deleted to mitigate a cyber-attack. We show that such a guided system outperforms a base RL system in detecting malware.
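A minimal sketch of knowledge-guided exploration: a bandit-style Q-learning loop whose exploitation step blends learned values with a prior score distilled from a hypothetical knowledge graph. The filenames, prior scores, reward function, and blend rule are all illustrative assumptions, not the paper's CKG or algorithm.

```python
import random
from collections import defaultdict

# Prior knowledge distilled from a (hypothetical) cybersecurity knowledge
# graph: how strongly each filename is associated with known malware.
ckg_prior = {"svch0st.exe": 0.9, "update.exe": 0.4, "notepad.exe": 0.05}

q_values = defaultdict(float)   # Q(filename) = expected reward of deleting it
alpha, epsilon, episodes = 0.5, 0.2, 200

def reward(filename):
    # Stand-in environment: deleting the actual malware yields +1, else -1.
    return 1.0 if filename == "svch0st.exe" else -1.0

for _ in range(episodes):
    if random.random() < epsilon:
        action = random.choice(list(ckg_prior))            # explore
    else:
        # Exploit: combine the learned Q-value with the CKG prior instead of
        # relying on the Q-table alone (the blend weight is an assumption).
        action = max(ckg_prior, key=lambda f: q_values[f] + ckg_prior[f])
    q_values[action] += alpha * (reward(action) - q_values[action])

print(max(q_values, key=q_values.get))   # expected: "svch0st.exe"
```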
In the past decades, learning an effective distance metric between pairs of instances has played an important role in classification and retrieval tasks, for example person identification or malware retrieval in IoT services. Recent efforts have focused on improving the metric forms and have shown promising results on various applications. However, such models often fail to produce a reliable metric on an ambiguous test set, mainly because the sampling process of the training set is not representative of the distribution of negative samples, especially examples close to the boundary between categories (so-called hard negative samples). In this paper, we address this problem and propose an adaptive margin deep adversarial metric learning (AMDAML) framework. It exploits numerous common negative samples to generate potential hard (adversarial) negatives and applies them to facilitate robust metric learning. Unlike previous approaches, which typically depend on search or data augmentation to find hard negative samples, generating adversarial negative instances avoids the limitations of domain knowledge and the limited number of constraint pairs. Specifically, to prevent overfitting or underfitting during training, we propose an adaptive margin loss that preserves a flexible margin between the negative (both adversarial and original) and positive samples. We train the adversarial negative generator and the conventional metric objective simultaneously in an adversarial manner, learning feature representations that are more precise and robust. Experimental results on practical datasets clearly demonstrate the superiority of AMDAML over representative state-of-the-art metric learning models.
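A hedged sketch of a triplet-style loss whose margin adapts to how hard the negative is; the adaptation rule and constants below are illustrative assumptions, not AMDAML's actual loss or its adversarial generator.

```python
import numpy as np

def adaptive_margin_loss(anchor, positive, negative, base_margin=0.2, scale=0.5):
    """Triplet-style loss with a margin that grows for harder negatives
    (smaller anchor-negative distance); illustrative, not AMDAML's formula."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    margin = base_margin + scale / (1.0 + d_neg)   # adaptive margin
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.1, 0.9])
p = np.array([0.2, 0.8])
n = np.array([0.15, 0.85])   # a hard (close) negative
print(adaptive_margin_loss(a, p, n))
```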