Biblio
Botnet has been evolving over time since its birth. Nowadays, P2P (Peer-to-Peer) botnet has become a main threat to cyberspace security, owing to its strong concealment and easy expansibility. In order to effectively detect P2P botnet, researchers often focus on the analysis of network traffic. For the sake of enriching P2P botnet detection methods, the author puts forward a new sight of applying distributed threat intelligence sharing system to P2P botnet detection. This system aims to fight against distributed botnet by using distributed methods itself, and then to detect botnet in real time. To fulfill the goal of botnet detection, there are 3 important parts: the threat intelligence sharing and evaluating system, the BAV quantitative TI model, and the AHP and HMM based analysis algorithm. Theoretically, this method should work on different types of distributed cyber threat besides P2P botnet.
The failure prediction method of virtual machines (VM) guarantees reliability to cloud platforms. However, the uncertainty of VM security state will affect the reliability and task processing capabilities of the entire cloud platform. In this study, a failure prediction method of VM based on AdaBoost-Hidden Markov Model was proposed to improve the reliability of VMs and overall performance of cloud platforms. This method analyzed the deep relationship between the observation state and the hidden state of the VM through the hidden Markov model, proved the influence of the AdaBoost algorithm on the hidden Markov model (HMM), and realized the prediction of the VM failure state. Results show that the proposed method adapts to the complex dynamic cloud platform environment, can effectively predict the failure state of VMs, and improve the predictive ability of VM security state.
Short messages usage has been tremendously increased such as SMS, tweets and status updates. Due to its popularity and ease of use, many companies use it for advertisement purpose. Hackers also use SMS to defraud users and steal personal information. In this paper, the use of Graphs centrality metrics is proposed for spam SMS detection. The graph centrality measures: degree, closeness, and eccentricity are used for classification of SMS. Graphs for each class are created using labeled SMS and then unlabeled SMS is classified using the centrality scores of the token available in the unclassified SMS. Our results show that highest precision and recall is achieved by using degree centrality. Degree centrality achieved the highest precision i.e. 0.81 and recall i.e., 0.76 for spam messages.
The Internet has gradually penetrated into the national economy, politics, culture, military, education and other fields. Due to its openness, interconnectivity and other characteristics, the Internet is vulnerable to all kinds of malicious attacks. The research uses a honeynet to collect attacker information, and proposes a network penetration recognition technology based on interactive behavior analysis. Using Sebek technology to capture the attacker's keystroke record, time series modeling of the keystroke sequences of the interaction behavior is proposed, using a Recurrent Neural Network. The attack recognition method is constructed by using Long Short-Term Memory that solves the problem of gradient disappearance, gradient explosion and long-term memory shortage in ordinary Recurrent Neural Network. Finally, the experiment verifies that the short-short time memory network has a high accuracy rate for the recognition of penetration attacks.
Recommender systems try to predict the preferences of users for specific items. These systems suffer from profile injection attacks, where the attackers have some prior knowledge of the system ratings and their goal is to promote or demote a particular item introducing abnormal (anomalous) ratings. The detection of both cases is a challenging problem. In this paper, we propose a framework to spot anomalous rating profiles (outliers), where the outliers hurriedly create a profile that injects into the system either random ratings or specific ratings, without any prior knowledge of the existing ratings. The proposed detection method is based on the unpredictable behavior of the outliers in a validation set, on the user-item rating matrix and on the similarity between users. The proposed system is totally unsupervised, and in the last step it uses the k-means clustering method automatically spotting the spurious profiles. For the cases where labeling sample data is available, a random forest classifier is trained to show how supervised methods outperforms unsupervised ones. Experimental results on the MovieLens 100k and the MovieLens 1M datasets demonstrate the high performance of the proposed schemata.
An important source of cyber-attacks is malware, which proliferates in different forms such as botnets. The botnet malware typically looks for vulnerable devices across the Internet, rather than targeting specific individuals, companies or industries. It attempts to infect as many connected devices as possible, using their resources for automated tasks that may cause significant economic and social harm while being hidden to the user and device. Thus, it becomes very difficult to detect such activity. A considerable amount of research has been conducted to detect and prevent botnet infestation. In this paper, we attempt to create a foundation for an anomaly-based intrusion detection system using a statistical learning method to improve network security and reduce human involvement in botnet detection. We focus on identifying the best features to detect botnet activity within network traffic using a lightweight logistic regression model. The network traffic is processed by Bro, a popular network monitoring framework which provides aggregate statistics about the packets exchanged between a source and destination over a certain time interval. These statistics serve as features to a logistic regression model responsible for classifying malicious and benign traffic. Our model is easy to implement and simple to interpret. We characterized and modeled 8 different botnet families separately and as a mixed dataset. Finally, we measured the performance of our model on multiple parameters using F1 score, accuracy and Area Under Curve (AUC).
Acoustic speaker recognition systems are very vulnerable to spoofing attacks via replayed or synthesized utterances. One possible countermeasure is audio-visual speaker recognition. Nevertheless, the addition of the visual stream alone does not prevent spoofing attacks completely and only provides further information to assess the authenticity of the utterance. Many systems consider audio and video modalities independently and can easily be spoofed by imitating only a single modality or by a bimodal replay attack with a victim's photograph or video. Therefore, we propose the simultaneous verification of the data synchronicity and the transcription in a challenge-response setup. We use coupled hidden Markov models (CHMMs) for a text-dependent spoofing detection and introduce new features that provide information about the transcriptions of the utterance and the synchronicity of both streams. We evaluate the features for various spoofing scenarios and show that the combination of the features leads to a more robust recognition, also in comparison to the baseline method. Additionally, by evaluating the data on unseen speakers, we show the spoofing detection to be applicable in speaker-independent use-cases.
Malware technology makes it difficult for malware analyst to detect same malware files with different obfuscation technique. In this paper we are trying to tackle that problem by analyzing the sequence of system call from an executable file. Malware files which actually are the same should have almost identical or at least a similar sequence of system calls. In this paper, we are going to create a model for each malware class consists of malwares from different families based on its sequence of system calls. Method/algorithm that's used in this paper is profile hidden markov model which is a very well-known tool in the biological informatics field for comparing DNA and protein sequences. Malware classes that we are going to build are trojan and worm class. Accuracy for these classes are pretty high, it's above 90% with also a high false positive rate around 37%.
SDN is a new network framework which can be controlled and defined by software programming, and OpenFlow is the communication protocol between SDN controller plane and data plane. With centralized control of SDN, the network is more vulnerable encounter APT than traditional network. After deeply analyzing the process of APT at each stage in SDN, this paper proposes the APT detection method based on HMM, which can fully reflect the relationship between attack behavior and APT stage. Experiment shows that the method is more accurate to detect APT in SDN, and less overhead.
With the progressive development of network applications and software dependency, we need to discover more advanced methods for protecting our systems. Each industry is equally affected, and regardless of whether we consider the vulnerability of the government or each individual household or company, we have to find a sophisticated and secure way to defend our systems. The starting point is to create a reliable intrusion detection mechanism that will help us to identify the attack at a very early stage; otherwise in the cyber security space the intrusion can affect the system negatively, which can cause enormous consequences and damage the system's privacy, security or financial stability. This paper proposes a concise, and easy to use statistical learning procedure, abbreviated NASCA, which is a four-stage intrusion detection method that can successfully detect unwanted intrusion to our systems. The model is static, but it can be adapted to a dynamic set up.
This paper presents a framework for privacy-preserving video delivery system to fulfill users' privacy demands. The proposed framework leverages the inference channels in sensitive behavior prediction and object tracking in a video surveillance system for the sequence privacy protection. For such a goal, we need to capture different pieces of evidence which are used to infer the identity. The temporal, spatial and context features are extracted from the surveillance video as the observations to perceive the privacy demands and their correlations. Taking advantage of quantifying various evidence and utility, we let users subscribe videos with a viewer-dependent pattern. We implement a prototype system for off-line and on-line requirements in two typical monitoring scenarios to construct extensive experiments. The evaluation results show that our system can efficiently satisfy users' privacy demands while saving over 25% more video information compared to traditional video privacy protection schemes.
In this paper, we compare the effectiveness of Hidden Markov Models (HMMs) with that of Profile Hidden Markov Models (PHMMs), where both are trained on sequences of API calls. We compare our results to static analysis using HMMs trained on sequences of opcodes, and show that dynamic analysis achieves significantly stronger results in many cases. Furthermore, in comparing our two dynamic analysis approaches, we find that using PHMMs consistently outperforms our technique based on HMMs.
Arabic handwritten documents present specific challenges due to the cursive nature of the writing and the presence of diacritical marks. Moreover, one of the largest labeled database of Arabic handwritten documents, the OpenHart-NIST database includes specific noise, namely guidelines, that has to be addressed. We propose several approaches to process these documents. First a guideline detection approach has been developed, based on K-means, that detects the documents that include guidelines. We then propose a series of preprocessing at text-line level to reduce the noise effects. For text-lines including guidelines, a guideline removal preprocessing is described and existing keystroke restoration approaches are assessed. In addition, we propose a preprocessing that combines noise removal and deskewing by removing line fragments from neighboring text lines, while searching for the principal orientation of the text-line. We provide recognition results, showing the significant improvement brought by the proposed processings.
RFID is widely used for object tracking in indoor environments, e.g., airport baggage tracking. Analyzing RFID data offers insight into the underlying tracking systems as well as the associated business processes. However, the inherent uncertainty in RFID data, including noise (cross readings) and incompleteness (missing readings), pose challenges to high-level RFID data querying and analysis. In this paper, we address these challenges by proposing a learning-based data cleansing approach that, unlike existing approaches, requires no detailed prior knowledge about the spatio-temporal properties of the indoor space and the RFID reader deployment. Requiring only minimal information about RFID deployment, the approach learns relevant knowledge from raw RFID data and uses it to cleanse the data. In particular, we model raw RFID readings as time series that are sparse because the indoor space is only partly covered by a limited number of RFID readers. We propose the Indoor RFID Multi-variate Hidden Markov Model (IR-MHMM) to capture the uncertainties of indoor RFID data as well as the correlation of moving object locations and object RFID readings. We propose three state space design methods for IR-MHMM that enable the learning of parameters while contending with raw RFID data time series. We solely use raw uncleansed RFID data for the learning of model parameters, requiring no special labeled data or ground truth. The resulting IR-MHMM based RFID data cleansing approach is able to recover missing readings and reduce cross readings with high effectiveness and efficiency, as demonstrated by extensive experimental studies with both synthetic and real data. Given enough indoor RFID data for learning, the proposed approach achieves a data cleansing accuracy comparable to or even better than state-of-the-art techniques requiring very detailed prior knowledge, making our solution superior in terms of both effectiveness and employability.
What you see is not definitely believable is not a rare case in the cyber security monitoring. However, due to various tricks of camouflages, such as packing or virutal private network (VPN), detecting "advanced persistent threat"(APT) by only signature based malware detection system becomes more and more intractable. On the other hand, by carefully modeling users' subsequent behaviors of daily routines, probability for one account to generate certain operations can be estimated and used in anomaly detection. To the best of our knowledge so far, a novel behavioral analytic framework, which is dedicated to analyze Active Directory domain service logs and to monitor potential inside threat, is now first proposed in this project. Experiments on real dataset not only show that the proposed idea indeed explores a new feasible direction for cyber security monitoring, but also gives a guideline on how to deploy this framework to various environments.
Keeping up with rapid advances in research in various fields of Engineering and Technology is a challenging task. Decision makers including academics, program managers, venture capital investors, industry leaders and funding agencies not only need to be abreast of latest developments but also be able to assess the effect of growth in certain areas on their core business. Though analyst agencies like Gartner, McKinsey etc. Provide such reports for some areas, thought leaders of all organisations still need to amass data from heterogeneous collections like research publications, analyst reports, patent applications, competitor information etc. To help them finalize their own strategies. Text mining and data analytics researchers have been looking at integrating statistics, text analytics and information visualization to aid the process of retrieval and analytics. In this paper, we present our work on automated topical analysis and insight generation from large heterogeneous text collections of publications and patents. While most of the earlier work in this area provides search-based platforms, ours is an integrated platform for search and analysis. We have presented several methods and techniques that help in analysis and better comprehension of search results. We have also presented methods for generating insights about emerging and popular trends in research along with contextual differences between academic research and patenting profiles. We also present novel techniques to present topic evolution that helps users understand how a particular area has evolved over time.
Enhanced situational awareness is integral to risk management and response evaluation. Dynamic systems that incorporate both hard and soft data sources allow for comprehensive situational frameworks which can supplement physical models with conceptual notions of risk. The processing of widely available semi-structured textual data sources can produce soft information that is readily consumable by such a framework. In this paper, we augment the situational awareness capabilities of a recently proposed risk management framework (RMF) with the incorporation of soft data. We illustrate the beneficial role of the hard-soft data fusion in the characterization and evaluation of potential vessels in distress within Maritime Domain Awareness (MDA) scenarios. Risk features pertaining to maritime vessels are defined a priori and then quantified in real time using both hard (e.g., Automatic Identification System, Douglas Sea Scale) as well as soft (e.g., historical records of worldwide maritime incidents) data sources. A risk-aware metric to quantify the effectiveness of the hard-soft fusion process is also proposed. Though illustrated with MDA scenarios, the proposed hard-soft fusion methodology within the RMF can be readily applied to other domains.