Bibliography
Intrusion detection methods based on the One-Class Support Vector Machine (OCSVM) cannot detect outliers within industrial data, which causes the decision function to deviate from the training samples. To address this problem, an anomaly intrusion detection algorithm based on Robust Incremental Principal Component Analysis (RIPCA) combined with OCSVM is proposed in this paper. The method uses the RIPCA algorithm to remove outliers from industrial data sets and to reduce dimensionality. Exploiting the strength of OCSVM on one-class classification problems, an anomaly detection model is established, and Improved Particle Swarm Optimization (IPSO) is used to optimize the model parameters. Simulation results show that the method can efficiently and accurately identify attacks and abnormal behaviors while meeting the real-time requirements of industrial control systems (ICS).
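A minimal sketch of the reduce-then-detect pipeline described above, using scikit-learn: standard PCA stands in for RIPCA, the OCSVM hyperparameters are illustrative placeholders rather than values found by IPSO, and the data are synthetic.

```python
# Sketch of a dimensionality-reduction + one-class-SVM pipeline.
# Standard PCA stands in for RIPCA; hyperparameters are illustrative,
# not values obtained by IPSO.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 20))          # hypothetical normal-operation records
X_test = np.vstack([rng.normal(size=(50, 20)),
                    rng.normal(loc=5.0, size=(10, 20))])  # a few injected anomalies

scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=5).fit(scaler.transform(X_train))

Z_train = pca.transform(scaler.transform(X_train))
Z_test = pca.transform(scaler.transform(X_test))

ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(Z_train)
pred = ocsvm.predict(Z_test)                  # +1 = normal, -1 = anomaly
print("flagged anomalies:", int((pred == -1).sum()))
```

In a real setup, the quantities an optimizer such as IPSO would tune are exactly the ones hard-coded here: the number of retained components and the OCSVM parameters nu and gamma.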
As network threats evolve, identifying threats that originate from within an organization is becoming increasingly difficult. To detect malicious insiders, we take a further step and propose a novel attribute-classification insider threat detection method based on long short-term memory recurrent neural networks (LSTM-RNNs). To achieve a high detection rate, an event aggregator, a feature extractor, several attribute classifiers and an anomaly calculator are seamlessly integrated into an end-to-end detection framework. Using the CERT insider threat dataset v6.2 and threat detection recall as the performance metric, experimental results show that the proposed method greatly outperforms threat detection methods based on k-Nearest Neighbor, Isolation Forest, Support Vector Machine and Principal Component Analysis.
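A minimal PyTorch sketch of an LSTM sequence classifier over per-user event features; the single binary output and the layer sizes are assumptions for illustration, not the multi-attribute-classifier architecture described in the paper.

```python
# Sketch of an LSTM-based sequence classifier over aggregated event features.
import torch
import torch.nn as nn

class InsiderLSTM(nn.Module):
    def __init__(self, n_features=16, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)      # one anomaly score per sequence

    def forward(self, x):                     # x: (batch, time, n_features)
        _, (h_n, _) = self.lstm(x)
        return torch.sigmoid(self.head(h_n[-1]))

model = InsiderLSTM()
events = torch.randn(8, 50, 16)               # hypothetical per-user event sequences
scores = model(events)                        # values near 1 would indicate suspicion
print(scores.shape)                           # torch.Size([8, 1])
```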
Collaborative filtering (CF) recommender systems have been widely used because they perform well in personalized recommendation, but they are vulnerable to shilling attacks, in which attackers inject attack profiles into the system to bias its recommendations. Designing robust recommender systems and proposing attack detection methods are the main research directions for handling shilling attacks. Among detection methods, unsupervised PCA is particularly effective in experiments, but it suffers when the number of shilling attack profiles is unknown. In this paper, a new unsupervised detection method that combines PCA and data complexity is proposed to detect shilling attacks. In the proposed method, PCA is used to select suspected attack profiles, and data complexity is used to pick out the authentic profiles from among them. Compared with traditional PCA, the proposed method performs well without requiring the number of shilling attack profiles to be determined in advance.
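A hedged sketch of the PCA-based selection step: user profiles whose ratings contribute least to the leading principal components are flagged as suspects. The number of components, the cutoff, and the synthetic rating matrix are illustrative, and the data-complexity filtering stage is omitted.

```python
# Sketch of PCA-based ranking of user profiles in a user-item rating matrix.
import numpy as np

rng = np.random.default_rng(1)
ratings = rng.integers(1, 6, size=(200, 100)).astype(float)  # hypothetical users x items

# Treat users as variables: z-score each user's ratings, then PCA via SVD.
Z = (ratings - ratings.mean(axis=1, keepdims=True)) / (ratings.std(axis=1, keepdims=True) + 1e-9)
U, S, Vt = np.linalg.svd(Z.T, full_matrices=False)   # rows of Vt: components; columns: users

k = 3                                                 # leading components to use
contribution = (Vt[:k, :] ** 2).sum(axis=0)           # per-user contribution
suspects = np.argsort(contribution)[:20]              # 20 lowest-contribution profiles
print("suspected attack profiles:", suspects[:5])
```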
Security research has made extensive use of exhaustive Internet-wide scans over the recent years, as they can provide significant insights into the overall state of security of the Internet, and ZMap made scanning the entire IPv4 address space practical. However, the IPv4 address space is exhausted, and a switch to IPv6, the only accepted long-term solution, is inevitable. In turn, to better understand the security of devices connected to the Internet, including in particular Internet of Things devices, it is imperative to include IPv6 addresses in security evaluations and scans. Unfortunately, it is practically infeasible to iterate through the entire IPv6 address space, as it is 2^96 times larger than the IPv4 address space. Therefore, enumeration of active hosts prior to scanning is necessary. Without it, we will be unable to investigate the overall security of Internet-connected devices in the future. In this paper, we introduce a novel technique to enumerate an active part of the IPv6 address space by walking DNSSEC-signed IPv6 reverse zones. Subsequently, by scanning the enumerated addresses, we uncover significant security problems: the exposure of sensitive data, and incorrectly controlled access to hosts, such as access to routing infrastructure via administrative interfaces, all of which were accessible via IPv6. Furthermore, from our analysis of the differences between accessing dual-stack hosts via IPv6 and IPv4, we hypothesize that the root cause is that machines automatically and by default take on globally routable IPv6 addresses. This is a practice that the affected system administrators appear unaware of, as the respective services are almost always properly protected from unauthorized access via IPv4. Our findings indicate (i) that enumerating active IPv6 hosts is practical without a preferential network position, contrary to common belief, (ii) that the security of active IPv6 hosts is currently still lagging behind the security state of IPv4 hosts, and (iii) that unintended IPv6 connectivity is a major security issue for unaware system administrators.
Transition noise and remanence noise are the two most important types of media noise in heat-assisted magnetic recording. We examine two methods (spatial splitting and principal components analysis) to distinguish them: both techniques show similar trends with respect to applied field and grain pitch (GP). It was also found that PW50 can be affected by GP and reader design, but is almost independent of write field and bit length (larger than 50 nm). Interestingly, our simulation shows a linear relationship between jitter and PW50·NSRrem, which agrees qualitatively with experimental results.
It is well known that distributed cyber attacks simultaneously launched from many hosts have caused the most serious problems in recent years, including privacy leakage and denial of service. Thus, how to detect such attacks at an early stage has become an important and urgent topic in the cyber security community. For this purpose, recognizing C&C (Command & Control) communication between compromised bots and the C&C server is crucial, because C&C communication takes place in the preparation phase of distributed attacks. Although signature-based attack detection has long been applied in practice, it is well known that it cannot efficiently deal with new kinds of attacks. In recent years, ML (Machine Learning)-based detection methods have been studied widely. In those methods, feature selection is obviously very important to the detection performance. In earlier work, we utilized up to 55 features to pick out C&C traffic in order to accomplish early detection of DDoS attacks. In this work, we try to answer the question: are all of those features really necessary? We mainly investigate how the detection performance changes as features are removed, starting from those with the lowest importance, and we try to clarify which features deserve attention for early detection of distributed attacks. We use honeypot data collected during the period from 2008 to 2013. SVM (Support Vector Machine) and PCA (Principal Component Analysis) are utilized for feature selection, and SVM and RF (Random Forest) are used to build the classifiers. We find that the detection performance generally improves as more features are utilized; however, after the number of features reaches around 40, the detection performance changes little even if more features are used. It is also verified that, in some specific cases, more features do not always mean better detection performance. We also discuss the 10 important features that have the biggest influence on classification.
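A small sketch of the importance-driven pruning experiment described above: rank features with a Random Forest, then drop the least important ones and observe how cross-validated performance changes. The dataset and feature count are synthetic stand-ins for the honeypot-derived features.

```python
# Sketch of importance-based feature pruning with a Random Forest.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=55, n_informative=20, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
order = np.argsort(rf.feature_importances_)           # lowest importance first

for n_drop in (0, 15, 30, 45):
    keep = order[n_drop:]                             # remove the n_drop least important
    score = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                            X[:, keep], y, cv=5).mean()
    print(f"features kept: {len(keep):2d}  accuracy: {score:.3f}")
```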
Hardware Trojans (HTs) are malicious modifications of original circuits intended to leak information or cause malfunction. Based on Side Channel Analysis (SCA) technology, a hardware Trojan detection platform is designed for RTL circuits on the basis of HSPICE power-consumption simulation. The Principal Component Analysis (PCA) algorithm is used to reduce the dimensionality of the power-consumption data. A neural network (NN) algorithm based on Particle Swarm Optimization (PSO) is introduced to recognize HTs. Experimental results show that the detection accuracy of the PSO-NN method is much better than that of the traditional BP NN method.
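A hedged sketch of the detection flow: PCA compresses simulated power traces, and a neural network classifies them. A gradient-trained MLP stands in for the PSO-optimized network, and the traces are synthetic.

```python
# Sketch of PCA dimensionality reduction on power traces followed by an NN classifier.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
clean = rng.normal(size=(300, 1000))                    # hypothetical Trojan-free traces
trojan = rng.normal(loc=0.02, size=(300, 1000))         # traces with a tiny extra load
X = np.vstack([clean, trojan])
y = np.array([0] * 300 + [1] * 300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=2)
pca = PCA(n_components=10).fit(X_tr)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=2)
clf.fit(pca.transform(X_tr), y_tr)
print("test accuracy:", clf.score(pca.transform(X_te), y_te))
```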
Statistical prediction models can be an effective technique to identify vulnerable components in large software projects. Two aspects of vulnerability prediction models have a profound impact on their performance: 1) the features (i.e., the characteristics of the software) that are used as predictors and 2) the way those features are used in the setup of the statistical learning machinery. In previous work, we compared models based on two different types of features: software metrics and term frequencies (text mining features). In this paper, we broaden the set of models we compare by investigating an array of techniques for the manipulation of said features. These techniques fall under the umbrella of dimensionality reduction and have the potential to improve the ability of a prediction model to localize vulnerabilities. We explore the role of dimensionality reduction through a series of cross-validation and cross-project prediction experiments. Our results show that, in the case of software metrics, a dimensionality reduction technique based on confirmatory factor analysis provided an advantage when performing cross-project prediction, yielding the best F-measure for the predictions in five out of six cases. In the case of text mining, feature selection can make the prediction computationally faster, but no dimensionality reduction technique provided any other notable advantage.
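A rough sketch of dimensionality reduction feeding a vulnerability predictor; scikit-learn's exploratory FactorAnalysis stands in for the confirmatory factor analysis used in the paper, the classifier choice is arbitrary, and the metrics data are synthetic.

```python
# Sketch: reduce software-metric features with factor analysis, then predict and
# score with the F-measure used in the paper.
from sklearn.datasets import make_classification
from sklearn.decomposition import FactorAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=2000, n_features=40, weights=[0.9], random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)

model = make_pipeline(FactorAnalysis(n_components=8, random_state=3),
                      RandomForestClassifier(n_estimators=200, random_state=3))
model.fit(X_tr, y_tr)
print("F-measure:", round(f1_score(y_te, model.predict(X_te)), 3))
```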
In vehicular networks, each message is signed by the generating node to ensure accountability for the contents of that message. For privacy reasons, each vehicle uses a collection of certificates, which for accountability reasons are linked at a central authority. One such design is the Security Credential Management System (SCMS) [1], the leading credential management system in the US. The SCMS is composed of multiple components, each of which performs a different, logically separated key-management task. The SCMS is designed to ensure privacy against a single insider compromise, or against outside adversaries. In this paper, we demonstrate that the current SCMS design fails to achieve its design goal, showing that a compromised authority can gain substantial information about certificate linkages. We propose a solution that accommodates threshold-based detection but uses relabeling and noise to limit the information that can be learned from a single insider adversary. We also analyze our solution using techniques from differential privacy and validate it using traffic-simulator-based experiments. Our results show that our proposed solution prevents the leakage of private information to a compromised authority colluding with outside attackers.
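As a point of reference for the noise-based defence, here is a minimal sketch of the Laplace mechanism from differential privacy, which adds noise scaled to sensitivity/epsilon before a count is released; this illustrates the general noise-addition idea only, not the specific relabeling-plus-noise construction proposed for the SCMS.

```python
# Sketch of the Laplace mechanism: release a count with calibrated noise.
import numpy as np

def laplace_count(true_count, sensitivity=1.0, epsilon=0.5, rng=np.random.default_rng(4)):
    """Return a differentially private version of an integer count (illustrative only)."""
    return true_count + rng.laplace(scale=sensitivity / epsilon)

print(laplace_count(42))   # noisy count; repeated calls give different values
```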
More and more medical data are being shared, which can lead to the disclosure of private personal information. Therefore, constructing a privacy-preserving publishing model for medical data is of great value: it must not only break the correspondence between the released information and personal identity, but also maintain data utility after anonymization. However, there is an inherent contradiction between anonymity and data utility. In this paper, a Principal Component Analysis-Grey Relational Analysis (PCA-GRA) K-anonymity algorithm is proposed to improve data utility effectively under the premise of anonymity, in which the association between quasi-identifiers and the sensitive information is used as a criterion to control the generalization hierarchy. Compared with previous anonymity algorithms, the results show that the proposed PCA-GRA K-anonymity algorithm achieves a significant improvement in data utility in three respects: information loss, feature maintenance and classification evaluation performance.
Distributed Denial of Service (DDoS) attacks are a popular and inexpensive form of cyber attack. Application layer DDoS attacks utilize legitimate application layer requests to overwhelm a web server. These attacks are a major threat to Internet applications and web services. Their main goal is to make services unavailable to legitimate users by exhausting the resources of a web server. They look valid in terms of connection and protocol characteristics, which makes them difficult to detect. In this paper, we propose a detection method for application layer DDoS attacks based on user behavior anomaly detection. We extract instances of user behavior requesting resources from HTTP web server logs. We apply the Principal Component Analysis (PCA) subspace anomaly detection method to detect anomalous behavior instances. Web server logs from a web server hosting a student resource portal were collected as experimental data. We also generated nine different HTTP DDoS attacks through penetration testing. Our performance results on the collected data show that applying PCA subspace anomaly detection to user behavior data can detect application layer DDoS attacks, even if they attempt to mimic a normal user's behavior to some degree.
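A minimal sketch of PCA-subspace anomaly detection: fit PCA on normal behavior vectors and flag instances whose squared prediction error (residual energy outside the principal subspace) exceeds a threshold. The percentile-based threshold and the synthetic features are simplifications of the paper's setup.

```python
# Sketch of PCA-subspace anomaly detection with a squared-prediction-error (SPE) test.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
normal = rng.normal(size=(1000, 12))                    # hypothetical behavior features
attack = rng.normal(loc=3.0, size=(20, 12))

pca = PCA(n_components=4).fit(normal)

def spe(X):
    """Squared prediction error of X against the principal subspace."""
    reconstructed = pca.inverse_transform(pca.transform(X))
    return ((X - reconstructed) ** 2).sum(axis=1)

threshold = np.percentile(spe(normal), 99)
print("attacks flagged:", int((spe(attack) > threshold).sum()), "of", len(attack))
```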
Software-defined networks offer a promising framework for the implementation of cross-layer data-centric security policies in military systems. An important aspect of the design process for such advanced security solutions is the thorough experimental assessment and validation of proposed technical concepts prior to their deployment in operational military systems. In this paper, we describe an OpenFlow-based testbed, which was developed with a specific focus on validation of SDN security mechanisms - including both the mechanisms for protecting the software-defined network layer and the cross-layer enforcement of higher level policies, such as data-centric security policies. We also present initial experimentation results obtained using the testbed, which confirm its ability to validate simulation and analytic predictions. Our objective is to provide a sufficiently detailed description of the configuration used in our testbed so that it can be easily replicated and reused by other security researchers in their experiments.
The smart grid is an electrical grid with duplex communication between the utility and the consumer. It comprises digital, automation, computer and control systems, and finds applications in a wide variety of settings; some of these applications are designed to reduce the risk of power-system blackouts. Dynamic vulnerability assessment is performed to identify, quantify, and prioritize the vulnerabilities in a system. This paper presents a novel approach for classifying data as vulnerable or non-vulnerable by carrying out Dynamic Vulnerability Assessment (DVA) based on data mining techniques such as Multichannel Singular Spectrum Analysis (MSSA) and Principal Component Analysis (PCA), and a machine learning tool, the Support Vector Machine Classifier (SVM-C). The developed methodology is tested on the IEEE 57-bus system, where the cause of vulnerability is transient instability. The results show that data mining tools can effectively analyze the patterns of the electrical signals, and that SVM-C can use those patterns to classify the system data as vulnerable or non-vulnerable and determine the system vulnerability status.
Standard classification procedures of both data mining and multivariate statistics are sensitive to the presence of outlying values. In this paper, we propose new algorithms for computing regularized versions of linear discriminant analysis for data with small sample sizes in each group. Further, we propose a highly robust version of regularized linear discriminant analysis. The new method, denoted MWCD-L2-LDA, is based on the idea of implicit weights assigned to individual observations, inspired by the minimum weighted covariance determinant estimator. The classification performance of the new method is illustrated through a detailed analysis of our pilot study of computer authentication based on individual typing characteristics, i.e., keystroke dynamics.
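A brief sketch of a standard regularized LDA using scikit-learn's shrinkage estimator on simulated keystroke-timing features; it illustrates regularization for small per-group sample sizes, not the robust MWCD-based weighting scheme proposed in the paper.

```python
# Sketch of shrinkage-regularized LDA on small-sample-per-class data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
n_users, samples_per_user, n_features = 5, 12, 30      # few samples per group
X = np.vstack([rng.normal(loc=u, size=(samples_per_user, n_features)) for u in range(n_users)])
y = np.repeat(np.arange(n_users), samples_per_user)

lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
print("CV accuracy:", cross_val_score(lda, X, y, cv=3).mean().round(3))
```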
Protecting the privacy of user-identification data is fundamental to protecting information systems from attacks and vulnerabilities. Providing access to such data only to limited and legitimate users is the key motivation for `Biometrics'. In `Biometric Systems', reliably confirming a user's claim of his/her identity is more important than focusing on `what he/she really possesses' or `what he/she remembers'. In this paper, the use of face images for biometric access is proposed using two multistage face recognition algorithms that employ biometric facial features to validate the user's claim. The proposed algorithms use standard algorithms and classifiers such as EigenFaces, PCA and LDA in stages. Performance evaluation of both proposed algorithms is carried out using two standard datasets, the Extended Yale database and the AT&T database. Results using the proposed multistage algorithms are better than those using other standard algorithms. Current limitations and possible applications of the proposed algorithms are also discussed, along with further scope for making them robust to pose, illumination and noise variations.
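A hedged sketch of a two-stage Eigenfaces-style pipeline (PCA for a compact face representation, LDA as the classifier). This is a generic PCA+LDA arrangement rather than the paper's exact multistage algorithms; scikit-learn's Olivetti faces (the AT&T database) stand in for the full experimental setup.

```python
# Sketch of an Eigenfaces-style PCA + LDA face recognition pipeline.
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

faces = fetch_olivetti_faces()                          # downloads on first use
X_tr, X_te, y_tr, y_te = train_test_split(faces.data, faces.target,
                                          test_size=0.25, random_state=7,
                                          stratify=faces.target)
model = make_pipeline(PCA(n_components=80, whiten=True, random_state=7),
                      LinearDiscriminantAnalysis())
model.fit(X_tr, y_tr)
print("recognition accuracy:", round(model.score(X_te, y_te), 3))
```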
Nearest neighbor search is a fundamental problem in research fields such as machine learning, data mining and pattern recognition. Recently, hashing-based approaches, for example locality sensitive hashing (LSH), have proven effective for scalable high-dimensional nearest neighbor search. Many hashing algorithms have their theoretical roots in random projection. Since these algorithms generate the hash tables (projections) randomly, a large number of hash tables (i.e., long codewords) is required in order to achieve both high precision and recall. To address this limitation, we propose a novel hashing algorithm called density sensitive hashing (DSH) in this paper. DSH can be regarded as an extension of LSH. By exploring the geometric structure of the data, DSH avoids purely random projection selection and uses those projective functions which best agree with the distribution of the data. Extensive experimental results on real-world data sets show that the proposed method achieves better performance than state-of-the-art hashing approaches.
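A minimal sketch of the random-projection LSH baseline that DSH extends: each hash bit is the sign of a dot product with a random direction, so nearby points tend to share code bits. The code length and data are illustrative; DSH would replace the random directions with projections chosen to match the data distribution.

```python
# Sketch of random-projection (sign) LSH codes.
import numpy as np

rng = np.random.default_rng(8)
n_bits, dim = 16, 64
projections = rng.normal(size=(dim, n_bits))            # purely random directions

def hash_code(x):
    """Binary LSH code: sign of each random projection."""
    return (x @ projections > 0).astype(np.uint8)

query = rng.normal(size=dim)
neighbor = query + 0.05 * rng.normal(size=dim)           # a nearby point
far = rng.normal(size=dim)

print("bits shared with neighbor:", int((hash_code(query) == hash_code(neighbor)).sum()))
print("bits shared with far point:", int((hash_code(query) == hash_code(far)).sum()))
```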
Multichannel sensor systems are widely used in condition monitoring for effective failure prevention of critical equipment or processes. However, loss of sensor readings due to malfunctions of sensors and/or communication has long been a hurdle to reliable operations of such integrated systems. Moreover, asynchronous data sampling and/or limited data transmission are usually seen in multiple sensor channels. To reliably perform fault diagnosis and prognosis in such operating environments, a data recovery method based on functional principal component analysis (FPCA) can be utilized. However, traditional FPCA methods are not robust to outliers and their capabilities are limited in recovering signals with strongly skewed distributions (i.e., lack of symmetry). This paper provides a robust data-recovery method based on functional data analysis to enhance the reliability of multichannel sensor systems. The method not only considers the possibly skewed distribution of each channel of signal trajectories, but is also capable of recovering missing data for both individual and correlated sensor channels with asynchronous data that may be sparse as well. In particular, grand median functions, rather than classical grand mean functions, are utilized for robust smoothing of sensor signals. Furthermore, the relationship between the functional scores of two correlated signals is modeled using multivariate functional regression to enhance the overall data-recovery capability. An experimental flow-control loop that mimics the operation of coolant-flow loop in a multimodular integral pressurized water reactor is used to demonstrate the effectiveness and adaptability of the proposed data-recovery method. The computational results illustrate that the proposed method is robust to outliers and more capable than the existing FPCA-based method in terms of the accuracy in recovering strongly skewed signals. In addition, turbofan engine data are also analyzed to verify the capability of the proposed method in recovering non-skewed signals.
In many Twitter applications, developers collect only a limited sample of tweets and a local portion of the Twitter network. Given such Twitter applications with limited data, how can we classify Twitter users as either bots or humans? We develop a collection of network-, linguistic-, and application-oriented variables that could be used as possible features, and identify specific features that distinguish well between humans and bots. In particular, by analyzing a large dataset relating to the 2014 Indian election, we show that a number of sentiment-related factors are key to the identification of bots, significantly increasing the Area under the ROC Curve (AUROC). The same method may be used for other applications as well.
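A small sketch of the evaluation idea: train the same classifier with and without an additional feature group and compare AUROC. All data are synthetic; the "sentiment-like" columns are placeholders, not the paper's actual features.

```python
# Sketch of comparing feature sets by AUROC for bot-vs-human classification.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=30, n_informative=12, random_state=9)
base, extra = X[:, :20], X                               # "extra" adds 10 more columns
for name, feats in [("base features", base), ("base + sentiment-like features", extra)]:
    X_tr, X_te, y_tr, y_te = train_test_split(feats, y, test_size=0.3, random_state=9)
    clf = RandomForestClassifier(n_estimators=200, random_state=9).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: AUROC = {auc:.3f}")
```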