Biblio
Insider threat is a significant security risk for information systems, and insider threat detection is a major concern for the organizations that operate them. Existing work has mainly focused on single-pattern analysis of single-domain user behavior, which is not suitable for analyzing user behavior patterns in multi-domain scenarios. However, the fusion of irrelevant multi-domain features may hide the existence of anomalies. Previous feature learning methods also suffer a relatively large amount of information loss during feature extraction. Therefore, this paper proposes a hybrid model based on the deep belief network (DBN) to detect insider threats. First, an unsupervised DBN is used to extract hidden features from the multi-domain features derived from audit logs. Second, a One-Class SVM (OCSVM) is trained on the features learned by the DBN. Experimental results on the CERT dataset demonstrate that the DBN can be used to identify insider threat events and that it offers a new approach to feature processing for insider threat detection.
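A minimal sketch of the two-stage pipeline described above, using scikit-learn's BernoulliRBM stack as a stand-in for the unsupervised DBN and OneClassSVM for the anomaly detector; feature names, shapes and hyperparameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import OneClassSVM

# Hypothetical multi-domain behaviour features aggregated per user-day from audit logs.
rng = np.random.default_rng(0)
X_train = rng.random((500, 64))   # assumed normal-behaviour feature matrix
X_test = rng.random((100, 64))    # assumed mixed normal/anomalous samples

# Unsupervised feature learner: two stacked RBMs approximate a small DBN.
dbn = Pipeline([
    ("scale", MinMaxScaler()),
    ("rbm1", BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20, random_state=0)),
])
H_train = dbn.fit_transform(X_train)

# One-Class SVM trained only on (assumed) normal behaviour in the learned feature space.
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(H_train)
pred = ocsvm.predict(dbn.transform(X_test))   # -1 flags a potential insider threat
print("flagged anomalies:", int((pred == -1).sum()))
```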
In this paper, a novel feature selection method for detecting botnets in their Command and Control (C&C) phase is presented. A major problem is that researchers have proposed features based on their expertise, but there is no method to evaluate these features, and some of them yield a lower detection rate than others. To this end, we find the feature set, based on connections of botnets in their C&C phase, that maximizes the detection rate of these botnets. A Genetic Algorithm (GA) was used to select the set of features that gives the highest detection rate, and the machine learning algorithm C4.5 was used to classify connections as belonging to a botnet or not. The datasets used in this paper were extracted from the ISOT and ISCX repositories. Tests were performed to find the best parameters for the GA and for C4.5. We also ran experiments to obtain the best feature set for each analyzed botnet (specific) and for each type of botnet (general). The results, presented at the end of the paper, show a considerable reduction in the number of features and a higher detection rate than related work.
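A minimal sketch of GA-driven feature selection wrapped around a C4.5-style classifier; DecisionTreeClassifier with the entropy criterion stands in for C4.5, and the flow features, population size and GA operators below are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.random((1000, 30))          # hypothetical per-connection flow features
y = rng.integers(0, 2, 1000)        # 1 = botnet C&C connection, 0 = benign

def fitness(mask):
    # Detection-rate proxy: cross-validated accuracy on the selected feature subset.
    if mask.sum() == 0:
        return 0.0
    clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, (20, X.shape[1]))           # binary chromosomes = feature masks
for _ in range(15):                                   # generations
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]           # truncation selection
    cut = X.shape[1] // 2
    children = np.vstack([np.r_[parents[i, :cut], parents[(i + 1) % 10, cut:]]
                          for i in range(10)])        # one-point crossover
    flip = rng.random(children.shape) < 0.05          # bit-flip mutation
    pop = np.vstack([parents, np.where(flip, 1 - children, children)])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
```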
Software components that are vulnerable to exploitation need to be identified and patched. Employing prevention techniques that detect vulnerable software components at an early stage can significantly reduce the expense of the software testing process and thus help build a more reliable and robust software system. Although previous studies have demonstrated the effectiveness of adapting prediction techniques to vulnerability detection, the feasibility of those techniques is limited mainly by insufficient training data sets. This paper proposes a prediction technique targeting early identification of potentially vulnerable software components. In the proposed scheme, potentially vulnerable components are viewed as mislabeled data that may contain true but not yet observed vulnerabilities. The proposed hybrid technique combines the support vector machine algorithm with an ensemble learning strategy to better identify potentially vulnerable components. The proposed vulnerability detection scheme is evaluated using several Java Android applications. The results demonstrate that the proposed hybrid technique can identify potentially vulnerable classes with high precision and relatively acceptable accuracy and recall.
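A minimal sketch of the SVM-plus-ensemble idea above: a bagged ensemble of support vector machines votes on whether a component is potentially vulnerable. The synthetic code-metric features and labels are illustrative placeholders, not the paper's data or its exact combination strategy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Hypothetical per-class metrics; label 1 = potentially vulnerable component.
X, y = make_classification(n_samples=800, n_features=25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Ensemble of SVMs trained on bootstrap samples of the (possibly mislabeled) data.
hybrid = BaggingClassifier(SVC(kernel="rbf"), n_estimators=10, random_state=0).fit(X_tr, y_tr)
pred = hybrid.predict(X_te)
print("precision:", precision_score(y_te, pred), "recall:", recall_score(y_te, pred))
```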
Defending key network infrastructure, such as Internet backbone links or the communication channels of critical infrastructure, is paramount, yet challenging. The inherently complex nature and quantity of network data impedes detecting attacks in real-world settings. In this paper, we utilize features of network flows, characterized by their entropy, together with an extended version of the original Replicator Neural Network (RNN) and deep learning techniques to learn models of normality. This combination allows us to apply anomaly-based intrusion detection on arbitrarily large amounts of data and, consequently, large networks. Our approach is unsupervised and requires no labeled data. It also accurately detects network-wide anomalies without presuming that the training data is completely free of attacks. The evaluation of our intrusion detection method on real network data indicates that it can accurately detect resource exhaustion attacks and network profiling techniques of varying intensities. The developed method is efficient because a normality model can be learned by training an RNN in only a few seconds.
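A minimal sketch of the entropy-plus-replicator idea: per-window Shannon entropies of flow attributes form the input, and an autoencoder-style MLP (a stand-in for the paper's extended Replicator Neural Network) flags windows with high reconstruction error. Window size, attribute names and the threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def shannon_entropy(values):
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(2)
# Hypothetical flows: columns = (src_ip, dst_ip, dst_port, packet_count) per time window.
windows = [rng.integers(0, 50, (200, 4)) for _ in range(300)]
X = np.array([[shannon_entropy(w[:, j]) for j in range(4)] for w in windows])

# Replicator network: learn to reproduce its own input on (mostly) normal traffic.
rep = MLPRegressor(hidden_layer_sizes=(8, 3, 8), max_iter=2000, random_state=0)
rep.fit(X, X)
recon_err = np.mean((rep.predict(X) - X) ** 2, axis=1)
threshold = np.percentile(recon_err, 99)     # assumed operating point
print("anomalous windows:", np.flatnonzero(recon_err > threshold))
```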
Power system security is one of the key issues in the operation of a smart grid. Evaluating power system security over all contingencies is a major challenge due to the huge computational effort involved. Phasor measurement units (PMUs) play a vital role in real-time power system monitoring and control. This paper presents a static security assessment scheme for large-scale interconnected power systems with PMUs, using an artificial neural network (ANN). Voltage magnitudes and phase angles are used as the input variables of the ANN. The optimal PMU locations under the base case and critical contingency cases are determined using a genetic algorithm. The performance of the proposed optimization model was tested on the standard IEEE 30-bus system, incorporating zero-injection buses, and successful results were obtained.
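A minimal sketch of the ANN-based assessment step: PMU voltage magnitudes and phase angles feed a neural network that labels the operating state secure or insecure. The bus count matches the IEEE 30-bus case mentioned above, but the data, labeling rule and network size are illustrative; the GA-based PMU placement is not shown.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
n_buses = 30                                         # IEEE 30-bus system
V = rng.uniform(0.9, 1.1, (400, n_buses))            # voltage magnitudes (p.u.)
theta = rng.uniform(-0.5, 0.5, (400, n_buses))       # phase angles (rad)
X = np.hstack([V, theta])
y = (V.mean(axis=1) > 1.0).astype(int)               # toy secure/insecure label

ann = MLPClassifier(hidden_layer_sizes=(40,), max_iter=1000, random_state=0).fit(X, y)
print("predicted security status:", ann.predict(X[:5]))
```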
Traditional text classification methods usually follow this process: a sentence is treated as a bag of words (BOW) and transformed into a sentence feature vector, which is then classified by methods such as maximum entropy (ME), Naive Bayes (NB) or support vector machines (SVM). However, when these methods are applied to text classification, the results are often far from ideal. The most important reason is that the semantic relations between words are very important for text categorization, yet traditional methods cannot capture them. Sentiment classification, a special case of text classification, is binary classification (positive or negative). Inspired by sentiment analysis, we use a deep learning-based recurrent neural network (RNN) model for the automatic security audit of short messages from prisons, classifying them as secure or non-secure. In this paper, short-message features are extracted with word2vec, which captures word order information, and each sentence is mapped to a feature vector. In particular, words with similar meanings are mapped to nearby positions in the vector space and are then classified by the RNN, whose network structure makes it well suited to processing sequence data. We preprocess the short messages, extract typical features from existing secure and non-secure short messages via word2vec, and classify the messages with an RNN that accepts a fixed-size vector as input and produces a fixed-size vector as output. The experimental results show that the RNN model achieves an average accuracy of 92.7%, which is higher than the SVM.
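A minimal sketch of the feature side of this pipeline: word2vec embeddings of tokenized short messages, averaged into a fixed-size sentence vector. For brevity the paper's RNN classifier is replaced here by logistic regression; the toy corpus and labels are purely illustrative.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

messages = [["meet", "me", "at", "noon"], ["send", "the", "package", "tonight"],
            ["happy", "birthday", "brother"], ["do", "not", "tell", "the", "guard"]]
labels = [0, 1, 0, 1]                     # 0 = secure, 1 = non-secure (hypothetical)

w2v = Word2Vec(sentences=messages, vector_size=32, min_count=1, window=2, seed=0)

def sentence_vector(tokens):
    # Average the word vectors of the tokens present in the vocabulary.
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X = np.vstack([sentence_vector(m) for m in messages])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(sentence_vector(["tell", "the", "guard"]).reshape(1, -1)))
```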
The demand for trained cybersecurity operators is growing more quickly than traditional higher-education programs can fill it. At the same time, unemployment among returning military veterans has become a nationally discussed problem. We describe the design and launch of New Skills for a New Fight (NSNF), an intensive, one-year program to train military veterans for the cybersecurity field. This non-traditional program, which leverages the experience veterans gained in military service, includes recruitment and selection, a base of knowledge in the form of four university courses taken in a simultaneous cohort mode, a period of hands-on cybersecurity training, industry certifications, and a practical internship in a Security Operations Center (SOC). Twenty veterans entered this pilot program in January 2016 and will complete it in less than a year. Initially funded by a global financial services company, the program provides veterans with an expense-free preparation for an entry-level cybersecurity job.
This project presents a procedure-training simulator targeted at the operation and maintenance of overland distribution power lines. The simulator focuses on workplace safety and risk assessment for common daily operations such as fuse replacement and power cut activities. The training system is implemented using VR goggles (Oculus Rift) and a real scenario matched precisely with its Virtual Reality counterpart. The real scenario is composed of a real "basket" and a stick, both of which are the actual equipment used in daily training. Both are tracked by a high-precision infrared camera system (OptiTrack), providing a high degree of immersion and realism. In addition to the scenario, the user is fully tracked: head, shoulders, arms and hands. This tracking allows a faithful reproduction of the participant's movements in the virtual world and enables precise evaluation of movements as well as ergonomics. The virtual scenario was carefully designed to reproduce accurately and coherently all relevant spatial, architectural and natural features typical of the urban environment, reflecting the variety of challenges that real cities might impose on the activity. The system consists of two modules: the Instructor Interface, which helps create and control different challenging scenarios and follow the student's reactions and behavior; and the simulator itself, which is presented to the student through the VR goggles. The training session can also be viewed on a projected screen by other students, enabling learning through observation of their peers' mistakes and successes, much like in a martial arts dojo. The simulator features various risk scenarios, such as different climates (sun, rain and wind), different lighting conditions (day, night and artificial), different types of electrical structures, transformer fire and explosion, short-circuit and electric arc, defective equipment, and many obstacles (trees, cars, windows, swarms of bees, etc.).
In this paper we study keystroke dynamics as an authentication mechanism for touch screen based devices. The authentication process decides whether the identity of a given person is accepted or rejected. This can be easily implemented by using a two-class classifier which operates with the help of positive samples (belonging to the authentic person) and negative ones. However, collecting negative samples is not always a viable option. In such cases a one-class classification algorithm can be used to characterize the target class and distinguish it from the outliers. We implemented an authentication test-framework that is capable of working with both one-class and two-class classification algorithms. The framework was evaluated on our dataset containing keystroke samples from 42 users, collected from touch screen-based Android devices. Experimental results yield an Equal Error Rate (EER) of 3% (two-class) and 7% (one-class) respectively.
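A minimal sketch of how the Equal Error Rate (EER) reported above can be computed from classifier scores: the operating point where the false accept rate equals the false reject rate on the ROC curve. The genuine and impostor score distributions below are synthetic placeholders.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(3)
genuine_scores = rng.normal(0.8, 0.1, 200)    # scores for the authentic user
impostor_scores = rng.normal(0.5, 0.1, 200)   # scores for other users

y_true = np.r_[np.ones(200), np.zeros(200)]
y_score = np.r_[genuine_scores, impostor_scores]

fpr, tpr, _ = roc_curve(y_true, y_score)
fnr = 1 - tpr                                  # false reject rate
eer_index = np.argmin(np.abs(fpr - fnr))       # point where FAR ~= FRR
print(f"EER ~ {(fpr[eer_index] + fnr[eer_index]) / 2:.3f}")
```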
In this research, we focus on context-independent continuous authentication that reacts to every separate action performed by a user. The experimental data was collected under completely uncontrolled conditions from 53 users with our data collection software. In our analysis, we considered both keystroke and mouse usage behaviour patterns to prevent a situation where an attacker avoids detection by restricting themselves to one input device because the continuous authentication system only checks the other. The best result obtained from this research is that, for 47 biometric subjects, an impostor is detected after an average of 275 actions, while these subjects are never locked out of the system.
The aim of this research is to advance active user authentication using keystroke dynamics. Through this research, we assess the performance and influence of various keystroke features on keystroke dynamics authentication systems. In particular, we investigate the performance of keystroke features on a subset of the most frequently used English words. The performance of four features is analyzed: i) key duration, ii) flight time latency, iii) digraph time latency, and iv) word total time duration. Two machine learning techniques are employed for assessing keystroke authentication: the support vector machine (SVM) and the k-nearest neighbor classifier (k-NN). The logged experimental data were captured for 28 users. The experimental results show that key duration offers the best performance among the four keystroke features, followed by word total time.
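A minimal sketch of the four timing features studied above, extracted from a sequence of (key, press_time, release_time) events for one typed word; the timestamps are invented for illustration and the resulting vector would be fed to the SVM or k-NN classifier.

```python
# Hypothetical events for the word "pass"; times in seconds.
events = [("p", 0.00, 0.09), ("a", 0.15, 0.22), ("s", 0.30, 0.38), ("s", 0.45, 0.52)]

key_durations = [r - p for _, p, r in events]                       # hold time per key
flight_times = [events[i + 1][1] - events[i][2]                     # release -> next press
                for i in range(len(events) - 1)]
digraph_latencies = [events[i + 1][1] - events[i][1]                # press -> next press
                     for i in range(len(events) - 1)]
word_total_time = events[-1][2] - events[0][1]                      # first press -> last release

feature_vector = key_durations + flight_times + digraph_latencies + [word_total_time]
print(feature_vector)
```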
Biometric systems have been applied to improve the security of several computational systems. These systems analyse physiological or behavioural features obtained from the users in order to perform authentication. Biometric features should ideally meet a number of requirements, including permanence. In biometrics, permanence means that the analysed biometric feature will not change over time. However, recent studies have shown that this is not the case for several biometric modalities. Adaptive biometric systems deal with this issue by adapting the user model over time. Some algorithms for adaptive biometrics have been investigated and compared in the literature. In machine learning, several studies show that the combination of individual techniques in ensembles may lead to more accurate and stable decision models. This paper investigates the usage of some ensemble approaches to combine the output of current adaptive algorithms for biometrics. The experiments are carried out on keystroke dynamics, a biometric modality known to be subject to change over time.
Keystroke dynamics analysis has been applied successfully to the verification of passwords and other fixed short texts as a means to reduce their inherent security limitations, because their length and the fact that they are typed often make their characteristic timings fairly stable. Free text analysis, on the other hand, has been neglected until recent years due to the inherent difficulties of dealing with short-term behavioral noise and long-term effects on the typing rhythm. In this paper we examine finite context modeling of keystroke dynamics in free text and report promising results for user verification over an extensive data set, collected from a real-world environment outside the laboratory setting, that we make publicly available.
Automation systems are gaining popularity around the world, and their use for home security has been proposed, with some systems already developed. Other implementations give the user a central role in providing and receiving updates to the system. We propose a system that uses an Android-based smartphone as the user control point. Our Android application provides dual-factor (facial recognition and secret PIN) authentication in order to protect the privacy of the user. The system successfully implements facial recognition on the limited resources of a smartphone by making use of the Eigenfaces algorithm. The system was designed for home automation but uses technologies that allow it to be applied in any environment. This opens the possibility of more research into dual-factor authentication, and the architecture of our system provides a blueprint for the implementation of home-based automation systems. With minimal modifications, this system can be applied in industrial settings.
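A minimal sketch of Eigenfaces-style recognition: PCA learns a low-dimensional "face space" from enrolled images, and a probe image is matched to the nearest enrolled user in that space. The random arrays stand in for grayscale face images; image sizes, user counts and component numbers are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
faces = rng.random((40, 64 * 64))        # 40 enrolled images, flattened 64x64 pixels
user_ids = np.repeat(np.arange(8), 5)    # 8 users, 5 images each

pca = PCA(n_components=20).fit(faces)    # principal components = eigenfaces
embedded = pca.transform(faces)

matcher = KNeighborsClassifier(n_neighbors=1).fit(embedded, user_ids)
probe = pca.transform(rng.random((1, 64 * 64)))
print("best-matching enrolled user:", matcher.predict(probe)[0])
```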
We present the novel concept of Controllable Face Privacy. Existing methods that alter face images to conceal identity inadvertently also destroy other facial attributes such as gender, race or age. This all-or-nothing approach is too harsh. Instead, we propose a flexible method that can independently control the amount of identity alteration while keeping other facial attributes unchanged. To achieve this flexibility, we apply a subspace decomposition to our face encoding scheme, effectively decoupling facial attributes such as gender, race, age, and identity into mutually orthogonal subspaces, which in turn enables independent control of these attributes. Our method is thus useful for nuanced face de-identification, in which only facial identity is altered while other attributes, such as gender, race and age, are retained. These altered face images protect identity privacy, and yet allow other computer vision analyses, such as gender detection, to proceed unimpeded. Controllable Face Privacy is therefore useful for reaping the benefits of surveillance cameras while preventing privacy abuse. Our proposal also permits privacy to be applied not just to identity, but to other facial attributes as well. Furthermore, privacy-protection mechanisms, such as k-anonymity, l-diversity, and t-closeness, may be readily incorporated into our method. Extensive experiments with commercial facial analysis software show that our alteration method is indeed effective.
In this paper, the principle of the kernel extreme learning machine (ELM) is analyzed. Based on that, we introduce a multi-scale wavelet kernel extreme learning machine classifier and apply it to electroencephalographic (EEG) signal feature classification. Experiments show that our classifier achieves excellent performance.
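A minimal sketch of a kernel extreme learning machine with a Morlet-type wavelet kernel, in the spirit of the classifier described above. The multi-scale aspect is approximated here by summing two kernel scales; the scales, regularization constant C and the toy EEG-like data are illustrative assumptions.

```python
import numpy as np

def wavelet_kernel(X, Y, a):
    # Morlet-style wavelet kernel: product over dimensions of cos(1.75*d/a)*exp(-d^2/(2a^2)).
    D = X[:, None, :] - Y[None, :, :]
    return np.prod(np.cos(1.75 * D / a) * np.exp(-D**2 / (2 * a**2)), axis=2)

rng = np.random.default_rng(5)
X_train = rng.random((120, 16))                             # hypothetical EEG features
y_train = rng.integers(0, 2, 120)
T = np.eye(2)[y_train]                                      # one-hot targets

scales, C = (0.5, 2.0), 10.0                                # assumed "multi-scale" mix
Omega = sum(wavelet_kernel(X_train, X_train, a) for a in scales)
beta = np.linalg.solve(np.eye(len(X_train)) / C + Omega, T) # kernel ELM output weights

X_test = rng.random((10, 16))
K_test = sum(wavelet_kernel(X_test, X_train, a) for a in scales)
print("predicted classes:", np.argmax(K_test @ beta, axis=1))
```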
Attributing the culprit of a cyber-attack is widely considered one of the major technical and policy challenges of cyber-security. The lack of ground truth for an individual responsible for a given attack has limited previous studies. Here, we overcome this limitation by leveraging DEFCON capture-the-flag (CTF) exercise data where the actual ground-truth is known. In this work, we use various classification techniques to identify the culprit in a cyberattack and find that deceptive activities account for the majority of misclassified samples. We also explore several heuristics to alleviate some of the misclassification caused by deception.
With the rapid development of information technology, more and more high-speed networks have emerged. The 4G LTE network, as a recently emerging network, has gradually entered the mainstream of communication networks. This paper proposes an effective content-based information filtering scheme for the 4G LTE high-speed network by combining a content-based filter with a traditional simple filter. First, raw information is pre-processed by a five-tuple filter. Second, we determine the topics and character of the source data by k-nearest-neighbor text classification after minimum-risk Bayesian classification. Finally, an improved AdaBoost algorithm performs the four-level content-based information filtering. The experiments reveal that the proposed filtering method can be applied to network security, big data analysis and other fields, and that it has high research and market value.
Nowadays, a typical household owns multiple digital devices that can be connected to the Internet. Advertising companies want to reach the consumers behind those devices rather than the devices themselves. However, the identity of a consumer becomes fragmented as they switch from one device to another. A naive attempt is to use deterministic features such as user name, telephone number and email address; however, consumers may refrain from giving away such personal information for privacy and security reasons. The challenge in the ICDM 2015 contest was to develop an accurate probabilistic model for predicting cross-device consumer identity without using deterministic user information. In this paper we present an accurate and scalable cross-device solution using an ensemble of Gradient Boosting Decision Trees (GBDT) and Random Forests. Our final solution ranks 9th on both the public and private leaderboards with an F0.5 score of 0.855.
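A minimal sketch of the ensemble idea above: average the predicted probabilities of gradient boosted trees and a random forest, then score with F0.5 as in the contest. The synthetic data, model settings and the 0.5 decision threshold are assumptions, not the authors' tuned solution.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

# Simple probability averaging of the two tree ensembles.
proba = (gbdt.predict_proba(X_te)[:, 1] + rf.predict_proba(X_te)[:, 1]) / 2
pred = (proba >= 0.5).astype(int)
print("F0.5:", fbeta_score(y_te, pred, beta=0.5))
```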
Among the cyber attacks that occur, the most severe are advanced persistent threats (APTs). APTs differ from other attacks in that they have multiple phases, often remain silent for long periods of time, and are launched by determined, well-funded opponents. These targeted attacks mainly concentrate on government agencies and organizations in industry, such as those involved in international trade or holding sensitive data. APTs evade detection by antivirus solutions, intrusion detection and intrusion prevention systems, and firewalls. In this paper we propose a classification model for APT detection that achieves 99.8% accuracy.
An identity authentication scheme for the cloud computing environment is proposed that combines biometric encryption, homomorphic public-key cryptography and predicate encryption. The scheme is based on voice biometrics and homomorphic encryption and is divided into four stages: a registration and template training stage, a voice login and authentication stage, an authorization stage, and an audit stage. The results show that the scheme has certain advantages in these four aspects.
In a modern software system, when a program fails, a crash report containing an execution trace is sent to the software vendor for diagnosis. A crash report corresponding to a failure could be caused by multiple types of faults simultaneously. Many large companies, such as Baidu, organize a team to analyze these failures and classify them into multiple labels (i.e., multiple types of faults). However, it is time-consuming and difficult for developers to analyze these failures manually and assign appropriate fault labels. In this paper, we automatically classify a failure into multiple types of faults using a composite algorithm named MLL-GA, which combines various multi-label learning algorithms by leveraging a genetic algorithm (GA). To evaluate the effectiveness of MLL-GA, we perform experiments on 6 open source programs and show that MLL-GA achieves average F-measures of 0.6078 to 0.8665. We also compare our algorithm with ML.KNN and show that, on average across the 6 datasets, MLL-GA improves the average F-measure of ML.KNN by 14.43%.
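A minimal sketch of the multi-label failure classification setting above: each crash (represented here by synthetic features) can carry several fault labels at once. A one-vs-rest multi-label learner stands in for the individual algorithms that MLL-GA combines; the GA-based combination itself is not shown.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Hypothetical crash-report features with 5 possible fault labels per failure.
X, Y = make_multilabel_classification(n_samples=600, n_features=30, n_classes=5,
                                      random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, Y_tr)
Y_pred = clf.predict(X_te)
print("average F-measure:", f1_score(Y_te, Y_pred, average="samples"))
```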
Data mining is the process of finding correlations in relational databases, and there are different techniques for identifying malicious database transactions. While many existing approaches profile SQL query structures and database user activities to detect intrusions, the log mining approach automatically discovers rules for identifying anomalous database transactions. Data mining is very helpful to end users for extracting useful business information from large databases. Multi-level and multi-dimensional data mining are employed to discover data item dependency rules, data sequence rules, domain dependency rules, and domain sequence rules from a database log containing legitimate transactions. Database transactions that do not comply with the rules are identified as malicious. The log mining approach can achieve the desired true and false positive rates when the confidence and support thresholds are set appropriately. The implemented system incrementally maintains the data dependency rule sets and optimizes the performance of the intrusion detection process.
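A minimal sketch of the data dependency rule idea above: from a log of legitimate transactions, count how often a write to one item follows a read of another and keep rules whose support and confidence exceed thresholds; a transaction that skips a required write is then flagged. The items, log and thresholds are illustrative assumptions, not the paper's rule types.

```python
from collections import Counter

log = [  # each legitimate transaction: (items read, items written)
    ({"balance"}, {"balance", "audit"}),
    ({"balance"}, {"balance", "audit"}),
    ({"price"}, {"price"}),
    ({"balance"}, {"balance", "audit"}),
]
min_support, min_confidence = 2, 0.8

read_counts, rule_counts = Counter(), Counter()
for reads, writes in log:
    for r in reads:
        read_counts[r] += 1
        for w in writes:
            rule_counts[(r, w)] += 1

# Keep "read r implies write w" rules meeting support and confidence thresholds.
rules = {(r, w) for (r, w), c in rule_counts.items()
         if c >= min_support and c / read_counts[r] >= min_confidence}

def is_malicious(reads, writes):
    required = {w for (r, w) in rules if r in reads}
    return not required.issubset(writes)

print("rules:", rules)
print("malicious:", is_malicious({"balance"}, {"balance"}))  # True: audit write missing
```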
Similarity search plays an important role in many applications involving high-dimensional data. Due to the known dimensionality curse, the performance of most existing indexing structures degrades quickly as the feature dimensionality increases. Hashing methods, such as locality sensitive hashing (LSH) and its variants, have been widely used to achieve fast approximate similarity search by trading search quality for efficiency. However, most existing hashing methods make use of randomized algorithms to generate hash codes without considering the specific structural information in the data. In this paper, we propose a novel hashing method, namely, robust hashing with local models (RHLM), which learns a set of robust hash functions to map the high-dimensional data points into binary hash codes by effectively utilizing local structural information. In RHLM, for each individual data point in the training dataset, a local hashing model is learned and used to predict the hash codes of its neighboring data points. The local models from all the data points are globally aligned so that an optimal hash code can be assigned to each data point. After obtaining the hash codes of all the training data points, we design a robust method by employing ℓ2,1-norm minimization on the loss function to learn effective hash functions, which are then used to map each database point into its hash code. Given a query data point, the search process first maps it into the query hash code by the hash functions and then explores the buckets, which have similar hash codes to the query hash code. Extensive experimental results conducted on real-life datasets show that the proposed RHLM outperforms the state-of-the-art methods in terms of search quality and efficiency.
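A minimal sketch of the hash-based search process described above: points are mapped to binary codes by linear hash functions, and a query retrieves candidates whose codes lie within a small Hamming radius before exact re-ranking. Random projections stand in for RHLM's learned hash functions; the data sizes, code length and radius are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
data = rng.random((5000, 128))              # high-dimensional database points
W = rng.normal(size=(128, 16))              # 16 linear hash functions (random stand-ins)

codes = (data @ W > 0).astype(np.uint8)     # binary hash codes for the database

def search(query, radius=2):
    q_code = (query @ W > 0).astype(np.uint8)
    hamming = (codes != q_code).sum(axis=1)             # Hamming distance to every code
    candidates = np.flatnonzero(hamming <= radius)      # points in "nearby buckets"
    dists = np.linalg.norm(data[candidates] - query, axis=1)  # exact re-ranking
    return candidates[np.argsort(dists)[:10]]

print(search(rng.random(128)))
```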