Biblio
Modern detection systems use sensor outputs available in the deployment environment to probabilistically identify attacks. These systems are trained on past or synthetic feature vectors to create a model of anomalous or normal behavior. Thereafter, sensor outputs collected at run-time are compared to the model to identify attacks (or the lack of attack). While this approach to detection has proven effective in many environments, it is limited to training on only those features that can be reliably collected at detection time. Hence, such systems fail to leverage the often vast amount of ancillary information available from past forensic analysis and post-mortem data. In short, detection systems do not train on (and thus do not learn from) features that are unavailable or too costly to collect at run-time. Recent work proposed an alternate model construction approach that integrates forensic "privileged" information (features reliably available at training time, but not at run-time) to improve the accuracy and resilience of detection systems. In this paper, we further evaluate two of the proposed techniques for model training with privileged information: knowledge transfer and model influence. We explore the cultivation of privileged features, the efficiency of those processes and their influence on the detection accuracy. We observe that the improved integration of privileged features makes the resulting detection models more accurate. Our evaluation shows that the use of privileged information leads to up to an 8.2% relative decrease in detection error for fast-flux bot detection over a system with no privileged information, and 5.5% for malware classification.
Compromised smart meters reporting false power consumption data in Advanced Metering Infrastructure (AMI) may have drastic consequences on a smart grid's operations. Most existing works only deal with electricity theft from customers. However, several other types of data falsification attacks are possible, when meters are compromised by organized rivals. In this paper, we first propose a taxonomy of possible data falsification strategies such as additive, deductive, camouflage and conflict, in AMI micro-grids. Then, we devise a statistical anomaly detection technique to identify the incidence of proposed attack types, by studying their impact on the observed data. Subsequently, a trust model based on Kullback-Leibler divergence is proposed to identify compromised smart meters for additive and deductive attacks. The resultant detection rates and false alarms are minimized through a robust aggregate measure that is calculated based on the detected attack type and successfully discriminating legitimate changes from malicious ones. For conflict and camouflage attacks, a generalized linear model and Weibull function based kernel trick is used over the trust score to facilitate more accurate classification. Using real data sets collected from AMI, we investigate several trade-offs that occur between attacker's revenue and costs, as well as the margin of false data and fraction of compromised nodes. Experimental results show that our model has a high true positive detection rate, while the average false alarm rate is just 8%, for most practical attack strategies, without depending on the expensive hardware based monitoring.
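As a rough illustration of how a Kullback-Leibler divergence can feed a trust score, the following Python sketch compares the histogram of one meter's readings against a reference histogram. The bin edges, the add-one smoothing, and the exp(-D) mapping to a trust value are assumptions made for this example only; the abstract does not specify how the authors construct their trust model.

```python
import math

def histogram(values, edges):
    """Smoothed probability histogram of values over the given bin edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # The last bin is closed on the right so the top edge is counted.
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    # Add-one smoothing (an assumption) keeps every bin probability non-zero.
    total = sum(counts) + len(counts)
    return [(c + 1) / total for c in counts]

def kl_divergence(p, q):
    """D(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def trust_score(meter_readings, reference_readings, edges):
    """Hypothetical trust value in (0, 1]: 1.0 means identical histograms."""
    p = histogram(meter_readings, edges)
    q = histogram(reference_readings, edges)
    return math.exp(-kl_divergence(p, q))

if __name__ == "__main__":
    edges = [0, 1, 2, 3, 4]                      # kWh bins, made up
    reference = [0.5, 1.5, 1.6, 2.5, 2.4, 3.5]   # illustrative readings
    additive = [v + 1.0 for v in reference if v + 1.0 <= 4]
    print(trust_score(reference, reference, edges))
    print(trust_score(additive, reference, edges))
```

A meter whose readings are shifted upward (an additive falsification) produces a histogram that diverges from the reference, so its score drops below 1.0.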
Insider misuse has become a major risk for many organizations. One of the most common forms of misuse is data leakage. Such threats have turned into a real challenge to overcome and mitigate. Whilst prevention is important, incidents will inevitably occur, and as such attribution of the leakage is key to ensuring appropriate recourse. Although digital forensics capability for analyzing digital evidence has grown rapidly, a key barrier is often being able to associate the evidence back to the individual who leaked the data. Stolen credentials and the Trojan defense are two commonly cited arguments used to complicate the issue of attribution. Furthermore, a digital certificate or user ID only associates the evidence with the account, not with the individual. This paper proposes a more proactive model whereby a user's biometric information is transparently captured (during normal interactions) and embedded within the digital objects they interact with (thereby providing a direct link to the last user of any document or object). An investigation into the possibility of embedding individuals' biometric signals into image files is presented, with a particular focus upon the ability to recover the biometric information under varying degrees of modification attack. The experimental results show that even when the watermarked object is significantly modified (e.g. only 25% of the image is available) it is still possible to recover the embedded biometric information.
Investigations on the charge of possessing child pornography usually require manual forensic image inspection in order to collect evidence. When storage devices are confiscated, law enforcement authorities are hence often faced with massive image datasets which have to be screened within a limited time frame. As the ability to concentrate and time are highly limited factors of a human investigator, we believe that intelligent algorithms can effectively assist the inspection process by rearranging images based on their content. Thus, more relevant images can be discovered within a shorter time frame, which is of special importance in time-critical investigations of triage character. While currently employed techniques are based on black- and whitelisting of known images, we propose to use deep learning algorithms trained for the detection of pornographic imagery, as they are able to identify new content. In our approach, we evaluated three state-of-the-art neural networks for the detection of pornographic images and employed them to rearrange simulated datasets of 1 million images containing a small fraction of pornographic content. The rearrangement of images according to their content allows a much earlier detection of relevant images during the actual manual inspection of the dataset, especially when the percentage of relevant images is low. With our approach, the first relevant image could be discovered between positions 8 and 9 in the rearranged list on average. Without using our approach of image rearrangement, the first relevant image was discovered at position 1,463 on average.
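The effect of content-based rearrangement can be sketched in a few lines of Python. The scores below stand in for the output of some classifier and are invented for illustration; the function names are hypothetical and the sketch makes no claim about the networks evaluated in the paper.

```python
def rearrange(scores):
    """Return item indices sorted by descending classifier score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def first_relevant_position(order, relevant):
    """1-based position of the first relevant item in an inspection order."""
    for position, index in enumerate(order, start=1):
        if index in relevant:
            return position
    return None

if __name__ == "__main__":
    scores = [0.10, 0.05, 0.90, 0.20, 0.70]   # made-up classifier outputs
    relevant = {2, 4}                          # made-up ground truth
    original_order = list(range(len(scores)))
    print(first_relevant_position(original_order, relevant))
    print(first_relevant_position(rearrange(scores), relevant))
```

The metric reported in the abstract, the average position of the first relevant image, corresponds to `first_relevant_position` averaged over many simulated datasets.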
Although several methods have been proposed for the detection of resampling operations in multimedia signals and the estimation of the resampling factor, the fundamental limits for this forensic task leave open research questions. In this work, we explore the effects that a downsampling operation introduces in the statistics of a 1D signal as a function of the parameters used. We quantify the statistical distance between an original signal and its downsampled version by means of the Kullback-Leibler Divergence (KLD) in case of a wide-sense stationary 1st-order autoregressive signal model. Values of the KLD are derived for different signal parameters, resampling factors and interpolation kernels, thus predicting the achievable hypothesis distinguishability in each case. Our analysis reveals unexpected detectability in case of strong downsampling due to the local correlation structure of the original signal. Moreover, since existing detection methods generally leverage the cyclostationarity of resampled signals, we also address the case where the autocovariance values are estimated directly by means of the sample autocovariance from the signal under investigation. Under the considered assumptions, the Wishart distribution models the sample covariance matrix of a signal segment and the KLD under different hypotheses is derived.
There has been growing interest in using convolutional neural networks (CNNs) in image forensics, and some excellent methods have been proposed. Training a randomly initialized model from scratch requires a large amount of training data and computational time. To address this issue, we present a new method of training an image forensic model using prior knowledge transferred from an existing steganalysis model. We also find that CNN models tend to perform poorly when tested on a different database. With knowledge transfer, we are able to easily train an excellent model for a new database with a small amount of training data from that database. The performance of our models is evaluated on Bossbase and BOW by detecting five forensic types: median filtering, resampling, JPEG compression, contrast enhancement and additive Gaussian noise. Through a series of experiments, we demonstrate that our proposed method is very effective in the two scenarios mentioned above, and that our method based on transfer learning can greatly accelerate the convergence of the CNN model. The results of these experiments show that our proposed method can detect five different manipulations with an average accuracy of 97.36%.
As modern attacks become more stealthy and persistent, detecting or preventing them at their early stages becomes virtually impossible. Instead, an attack investigation or provenance system aims to continuously monitor and log interesting system events with minimal overhead. Later, if the system observes any anomalous behavior, it analyzes the log to identify who initiated the attack and which resources were affected by the attack and then assess and recover from any damage incurred. However, because of a fundamental tradeoff between log granularity and system performance, existing systems typically record system-call events without detailed program-level activities (e.g., memory operation) required for accurately reconstructing attack causality or demand that every monitored program be instrumented to provide program-level information. To address this issue, we propose RAIN, a Refinable Attack INvestigation system based on a record-replay technology that records system-call events during runtime and performs instruction-level dynamic information flow tracking (DIFT) during on-demand process replay. Instead of replaying every process with DIFT, RAIN conducts system-call-level reachability analysis to filter out unrelated processes and to minimize the number of processes to be replayed, making inter-process DIFT feasible. Evaluation results show that RAIN effectively prunes out unrelated processes and determines attack causality with negligible false positive rates. In addition, the runtime overhead of RAIN is similar to existing system-call level provenance systems and its analysis overhead is much smaller than full-system DIFT.
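The pruning step can be pictured as a reachability search over a graph built from system-call events. The Python sketch below is a simplification under stated assumptions: events are reduced to directed (source, destination) pairs, node names carry a hypothetical "proc:" or "file:" prefix, and event timestamps are ignored. It is not RAIN's implementation.

```python
from collections import deque

def reachable(edges, start):
    """Breadth-first search: all nodes reachable from start along edges."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def processes_to_replay(edges, start):
    """Only processes reachable from the starting point need replay with DIFT."""
    return {n for n in reachable(edges, start) if n.startswith("proc:")}

if __name__ == "__main__":
    # Made-up event log: a write is process -> file, a read is file -> process.
    events = [
        ("proc:browser", "file:payload"),
        ("file:payload", "proc:dropper"),
        ("proc:dropper", "file:secret"),
        ("proc:editor", "file:notes"),
    ]
    print(sorted(processes_to_replay(events, "proc:browser")))
```

In this toy log the editor process is never reachable from the browser, so it is filtered out before any costly replay.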
Besides its enormous benefits to industry and the community, the Internet of Things (IoT) has introduced unique security challenges to its enablers and adopters. As the trend in cybersecurity threats continues to grow, it is likely to influence IoT deployments. Therefore, it is imperative that, besides strengthening the security of IoT systems, we develop effective digital forensics techniques so that when breaches occur we can track the sources of attacks and bring perpetrators to due process with reliable digital evidence. The biggest challenge in this regard is the heterogeneous nature of devices in IoT systems and the lack of unified standards. In this paper we investigate digital forensics from an IoT perspective. We argue that besides traditional digital forensics practices it is important to have application-specific forensics in place to ensure the collection of evidence in the context of specific IoT applications. We consider the top three IoT applications and introduce a model which deals not just with traditional forensics but is applicable in digital as well as application-specific forensics processes. We believe that the proposed model will enable collection, examination, analysis and reporting of forensically sound evidence in an IoT application-specific digital forensics investigation.
This paper describes the applications of deep learning-based image recognition in the DARPA Memex program and its repository of 1.4 million weapons-related images collected from the Deep web. We develop a fast, efficient, and easily deployable framework for integrating Google's Tensorflow framework with Apache Tika for automatically performing image forensics on the Memex data. Our framework and its integration are evaluated qualitatively and quantitatively and our work suggests that automated, large-scale, and reliable image classification and forensics can be widely used and deployed in bulk analysis for answering domain-specific questions.
A filesystem capable of curtailing data theft and ensuring file integrity protection through deception is introduced and evaluated. The deceptive filesystem transparently creates multiple levels of stacking to protect the base filesystem and monitor file accesses, hide and redact sensitive files with baits, and inject decoys onto fake system views purveyed to untrusted subjects, all while maintaining a pristine state to legitimate processes. Our prototype implementation leverages a kernel hot-patch to seamlessly integrate the new filesystem module into live and existing environments. We demonstrate the utility of our approach with a use case on the nefarious Erebus ransomware. We also show that the filesystem adds no I/O overhead for legitimate users.
In recent cyber incidents, ransom software (ransomware) has posed a major threat to the security of computer systems. Consequently, ransomware detection has become a hot topic in computer security. Unfortunately, current signature-based and static detection models are often easily evaded through obfuscation, polymorphism, compression, and encryption. To overcome the shortcomings of signature-based and static ransomware detection approaches, we propose a dynamic ransomware detection system that uses data mining techniques such as Random Forest (RF), Support Vector Machine (SVM), Simple Logistic (SL) and Naive Bayes (NB) algorithms to detect known and unknown ransomware. We monitor the actual (dynamic) behaviors of software to generate API call flow graphs (CFG) and transform them into a feature space. Thereafter, data normalization and feature selection are applied to select the informative features that best discriminate between various categories of software and benign software. Finally, the data mining algorithms are used to build the detection model that judges whether software is benign or ransomware. Our experimental results show that the proposed system improves ransomware detection performance. In particular, the accuracy and detection rate of the proposed system with the Simple Logistic (SL) algorithm reach 98.2% and 97.6%, respectively, while the false positive rate is reduced to 1.2%.
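For readers unfamiliar with the three reported figures, the sketch below shows the standard definitions of accuracy, detection rate (true positive rate) and false positive rate computed from a confusion matrix. The counts are invented for illustration and are not the paper's data.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts.

    tp: ransomware flagged as ransomware   fn: ransomware missed
    tn: benign software passed as benign   fp: benign flagged as ransomware
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "detection_rate": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }

if __name__ == "__main__":
    # Made-up counts: 100 ransomware samples and 100 benign samples.
    for name, value in detection_metrics(tp=90, fp=5, tn=95, fn=10).items():
        print(f"{name}: {value:.3f}")
```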
Recently, Ransomware has been rapidly increasing and is becoming far more dangerous than other common malware types. Unlike previous versions of Ransomware that infect through email attachments or visits to certain sites, new Ransomware such as WannaCryptor corrupts data even when the PC is merely connected to the Internet. Therefore, many studies are being conducted on detecting and defending against Ransomware. However, existing studies on Ransomware detection cannot effectively detect and defend against the new Ransomware because they detect Ransomware using signature databases or by monitoring specific activities of processes. In this paper, we propose a method for making decoy files to detect Ransomware efficiently. The proposed method is based on an analysis of the behaviors of existing Ransomware at the source code level.
With the continued advancement of the internet and relevant programs, the number of exploitable loopholes in security systems increases. One such exploit that is plaguing the software scene is ransomware, a type of malware that weaves its way through these security loopholes and denies access to intellectual property and documents via encryption. The culprits then demand a ransom as the price for data decryption. Many businesses lack security measures stringent enough to negate the threat of ransomware. This jeopardizes the availability of sensitive data, as corporations and individuals risk losing data crucial to business or personal operations. Although certain countermeasures against ransomware exist, the fact that a plethora of new ransomware cases keeps appearing every year suggests that they are not effective enough. This paper aims to conceptualize practical solutions that can be used as foundations to build on, in the hope that more effective and proactive countermeasures to ransomware can be developed in the future.
The Subscriber Identity Module (SIM) is the backbone of modern mobile communication. A SIM can store a range of sensitive user information such as contacts, SMS messages, and banking information (some banking applications store user credentials on the SIM). Unfortunately, the current SIM model has a major weakness. When the mobile device is lost, an adversary can simply steal the user's SIM and use it. He/she can then extract the user's sensitive information stored on the SIM. Moreover, the adversary can pose as the user and communicate with the contacts stored on the SIM. This opens up the avenue to a large number of social engineering techniques. Additionally, if the user has provided his/her number as a recovery option for some accounts, the adversary can get access to them. The current methodology to deal with a stolen SIM is to contact the service provider and report the theft. The service provider then blocks the services on the SIM, but the adversary still has access to the data stored on it. Therefore, a secure scheme is required to ensure that only legal users are able to access and utilize their SIM.
Bitcoin, a peer-to-peer payment system and digital currency, is often involved in illicit activities such as scamming, ransomware attacks, illegal goods trading, and thievery. At the time of writing, the Bitcoin ecosystem has not yet been mapped and as such there is no estimate of the share of illicit activities. This paper provides the first estimation of the portion of cyber-criminal entities in the Bitcoin ecosystem. Our dataset consists of 854 observations categorised into 12 classes (out of which 5 are cybercrime-related) and a total of 100,000 uncategorised observations. The dataset was obtained from the data provider who applied three types of clustering of Bitcoin transactions to categorise entities: co-spend, intelligence-based, and behaviour-based. Thirteen supervised learning classifiers were then tested, of which four prevailed with a cross-validation accuracy of 77.38%, 76.47%, 78.46%, 80.76% respectively. From the top four classifiers, Bagging and Gradient Boosting classifiers were selected based on their weighted average and per class precision on the cybercrime-related categories. Both models were used to classify 100,000 uncategorised entities, showing that the share of cybercrime-related is 29.81% according to Bagging, and 10.95% according to Gradient Boosting with number of entities as the metric. With regard to the number of addresses and current coins held by this type of entities, the results are: 5.79% and 10.02% according to Bagging; and 3.16% and 1.45% according to Gradient Boosting.
Ransomware techniques have evolved over time, with the most resilient attacks making data recovery practically impossible. This has driven countermeasures to shift towards recovery rather than prevention, but in this paper we model ransomware attacks from an infection vector point of view. We follow the basic infection chain of crypto ransomware and use Bayesian network statistics to infer some of the most common ransomware infection vectors. We also use attack and sensor nodes to capture uncertainty in the Bayesian network.
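The simplest case of this kind of inference is a two-node network in which one attack node (the infection vector) influences one sensor node (an alert). The Python sketch below applies Bayes' rule to that case. The vector names, priors and likelihoods are invented for illustration; the paper's network structure and probabilities are not given in the abstract.

```python
def posterior(prior, likelihood):
    """P(vector | sensor fired) from P(vector) and P(sensor fired | vector)."""
    joint = {v: prior[v] * likelihood[v] for v in prior}
    evidence = sum(joint.values())  # P(sensor fired)
    return {v: j / evidence for v, j in joint.items()}

if __name__ == "__main__":
    # Hypothetical prior over infection vectors (attack node).
    prior = {"phishing_email": 0.5, "exploit_kit": 0.3, "rdp_bruteforce": 0.2}
    # Hypothetical probability that an email-gateway sensor fires per vector.
    likelihood = {"phishing_email": 0.8, "exploit_kit": 0.1, "rdp_bruteforce": 0.05}
    for vector, p in posterior(prior, likelihood).items():
        print(f"{vector}: {p:.3f}")
```

A sensor that fires imperfectly (likelihood below 1 for the true vector, above 0 for the others) is one way uncertainty enters the model.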
Crypto-ransomware is a challenging threat that ciphers a user's files while hiding the decryption key until a ransom is paid by the victim. This type of malware is a lucrative business for cybercriminals, generating millions of dollars annually. The spread of ransomware is increasing as traditional detection-based protection, such as antivirus and anti-malware, has proven ineffective at preventing attacks. Additionally, this form of malware is incorporating advanced encryption algorithms and expanding the number of file types it targets. Cybercriminals have found a lucrative market and no one is safe from being the next victim. Encrypting ransomware targets businesses small and large as well as the regular home user. This paper discusses ransomware methods of infection, the technology behind it, and what can be done to help prevent becoming the next victim. The paper investigates the most common types of crypto-ransomware, various payload methods of infection, the typical behavior of crypto-ransomware, its tactics, how an attack is ordinarily carried out, and which files are most commonly targeted on a victim's computer; recommendations for prevention and safeguards are listed as well.
Ransomware, a class of self-propagating malware that uses encryption to hold the victims' data ransom, has emerged in recent years as one of the most dangerous cyber threats, with widespread damage; e.g., zero-day ransomware WannaCry has caused world-wide catastrophe, from knocking U.K. National Health Service hospitals offline to shutting down a Honda Motor Company in Japan [1]. Our close collaboration with security operations of large enterprises reveals that defense against ransomware relies on tedious analysis from high-volume systems logs of the first few infections. Sandbox analysis of freshly captured malware is also commonplace in operation. We introduce a method to identify and rank the most discriminating ransomware features from a set of ambient (non-attack) system logs and at least one log stream containing both ambient and ransomware behavior. These ranked features reveal a set of malware actions that are produced automatically from system logs, and can help automate tedious manual analysis. We test our approach using WannaCry and two polymorphic samples by producing logs with Cuckoo Sandbox during both ambient, and ambient plus ransomware executions. Our goal is to extract the features of the malware from the logs with only knowledge that malware was present. We compare outputs with a detailed analysis of WannaCry allowing validation of the algorithm's feature extraction and provide analysis of the method's robustness to variations of input data—changing quality/quantity of ambient data and testing polymorphic ransomware. Most notably, our patterns are accurate and unwavering when generated from polymorphic WannaCry copies, on which 63 (of 63 tested) antivirus (AV) products fail.
Ransomware attacks are becoming prevalent nowadays with the flourishing of crypto-currencies. As the most harmful variant of ransomware, crypto-ransomware encrypts the victim's valuable data and asks for ransom money. Paying the ransom, however, may not guarantee recovery of the encrypted data. Most existing work on ransomware defense focuses purely on ransomware detection. A few works consider data recovery from ransomware attacks, but they are not able to defend against ransomware that can obtain a high system privilege. In this work, we design RDS3, a novel Ransomware Defense Strategy, in which we Stealthily back up data in the Spare space of a computing device, such that the data encrypted by ransomware can be restored. Our key idea is that the spare space which stores the backup data is fully isolated from the ransomware. In this way, the ransomware is not able to "touch" the backup data regardless of what privilege it can obtain. Security analysis and experimental evaluation show that RDS3 can mitigate ransomware attacks with an acceptable overhead.
Power grid operations rely on the trustworthy operation of critical control center functionalities, including the so-called Economic Dispatch (ED) problem. The ED problem is a large-scale optimization problem that is periodically solved by the system operator to ensure the balance of supply and load while maintaining reliability constraints. In this paper, we propose a semantics-based attack generation and implementation approach to study the security of the ED problem. Firstly, we generate optimal attack vectors to transmission line ratings to induce maximum congestion in the critical lines, resulting in the violation of capacity limits. We formulate a bilevel optimization problem in which the attacker chooses manipulations of line capacity ratings to maximize the percentage line capacity violations under linear power flows. We reformulate the bilevel problem as a mixed integer linear program that can be solved efficiently. Secondly, we describe how the optimal attack vectors can be implemented in commercial energy management systems (EMSs). The attack explores the dynamic memory space of the EMS, and replaces the true line capacity ratings stored in data regions with the optimal attack vectors. In contrast to the well-known false data injection attacks on control systems that require compromising distributed sensors, our approach directly implements attacks on the control center server. Our experimental results on benchmark power systems and five widely utilized EMSs show the practical feasibility of our attack generation and implementation approach.
It is difficult to assess the security of modern enterprise networks because they are usually dynamic with configuration changes (such as changes in topology, firewall rules, etc). Graphical security models (e.g., Attack Graphs and Attack Trees) and security metrics (e.g., attack cost, shortest attack path) are widely used to systematically analyse the security posture of network systems. However, there are problems using them to assess the security of dynamic networks. First, the existing graphical security models are unable to capture dynamic changes occurring in the networks over time. Second, the existing security metrics are not designed for dynamic networks such that their effectiveness to the dynamic changes in the network is still unknown. In this paper, we conduct a comprehensive analysis via simulations to evaluate the effectiveness of security metrics using a Temporal Hierarchical Attack Representation Model. Further, we investigate the varying effects of security metrics when changes are observed in the dynamic networks. Our experimental analysis shows that different security metrics have varying security posture changes with respect to changes in the network.
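One of the metrics named above, the shortest attack path, can be recomputed on each snapshot of a changing network to see how it reacts. The Python sketch below does this with a breadth-first search over made-up snapshots; it is a toy illustration and does not reproduce the Temporal Hierarchical Attack Representation Model, and the host names are hypothetical.

```python
from collections import deque

def shortest_attack_path(edges, source, target):
    """Number of hops on the shortest path from source to target, or None."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None  # target unreachable in this snapshot

def metric_over_time(snapshots, source, target):
    """Evaluate the metric on each network snapshot in order."""
    return [shortest_attack_path(edges, source, target) for edges in snapshots]

if __name__ == "__main__":
    t0 = [("attacker", "web"), ("web", "app"), ("app", "db")]
    t1 = t0 + [("web", "db")]            # new connection opened
    t2 = [("web", "app"), ("app", "db")]  # firewall rule blocks the entry point
    print(metric_over_time([t0, t1, t2], "attacker", "db"))
```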
Software-defined networks provide new facilities for deploying security mechanisms dynamically. In particular, it is possible to build and adjust security chains to protect the infrastructures, by combining different security functions, such as firewalls, intrusion detection systems and services for preventing data leakage. It is important to ensure that these security chains, in view of their complexity and dynamics, are consistent and do not include security violations. We propose in this paper an automated strategy for supporting the verification of security chains in software-defined networks. It relies on an architecture integrating formal verification methods for checking both the control and data planes of these chains, before their deployment. We describe algorithms for translating specifications of security chains into formal models that can then be verified by SMT solving or model checking. Our solution is prototyped as a package, named Synaptic, built as an extension of the Frenetic family of SDN programming languages. The performance of our approach is evaluated through extensive experiments based on the CVC4, veriT, and nuXmv checkers.
The majority of business activity of our integrated and connected world takes place in networks based on cloud computing infrastructure that cross national, geographic and jurisdictional boundaries. Such efficient entity interconnection is made possible through an emerging networking paradigm, Software Defined Networking (SDN), that intends to vastly simplify policy enforcement and network reconfiguration in a dynamic manner. However, despite the obvious advantages this novel networking paradigm introduces, its increased attack surface compared to traditional networking deployments has proved to be a thorny issue that creates skepticism when safety-critical applications are considered. Especially when SDN is used to support Internet-of-Things (IoT)-related networking elements, additional security concerns arise, due to the elevated vulnerability of such deployments to specific types of attacks and the necessity of inter-cloud communication any IoT application would require. The overall number of connected nodes makes the efficient monitoring of all entities a real challenge that must be tackled to prevent system degradation and service outage. This position paper provides an overview of common security issues of SDN when linked to IoT clouds, describes the design principles of the recently introduced Blockchain paradigm and sets out the reasons that render Blockchain a significant security factor for solutions where SDN and IoT are involved.