Biblio

List
Filter

Found 871 results

Filters: Keyword is feature extraction [Clear All Filters]

2018-04-04

Ullah, I., Mahmoud, Q. H.. 2017. A hybrid model for anomaly-based intrusion detection in SCADA networks. 2017 IEEE International Conference on Big Data (Big Data). :2160–2167.

Supervisory Control and Data Acquisition (SCADA) systems complexity and interconnectivity increase in recent years have exposed the SCADA networks to numerous potential vulnerabilities. Several studies have shown that anomaly-based Intrusion Detection Systems (IDS) achieves improved performance to identify unknown or zero-day attacks. In this paper, we propose a hybrid model for anomaly-based intrusion detection in SCADA networks using machine learning approach. In the first part, we present a robust hybrid model for anomaly-based intrusion detection in SCADA networks. Finally, we present a feature selection model for anomaly-based intrusion detection in SCADA networks by removing redundant and irrelevant features. Irrelevant features in the dataset can affect modeling power and reduce predictive accuracy. These models were evaluated using an industrial control system dataset developed at the Distributed Analytics and Security Institute Mississippi State University Starkville, MS, USA. The experimental results show that our proposed model has a key effect in reducing the time and computational complexity and achieved improved accuracy and detection rate. The accuracy of our proposed model was measured as 99.5 % for specific-attack-labeled.

Ran, L., Lu, L., Lin, H., Han, M., Zhao, D., Xiang, J., Yu, H., Ma, X.. 2017. An Experimental Study of Four Methods for Homology Analysis of Firmware Vulnerability. 2017 International Conference on Dependable Systems and Their Applications (DSA). :42–50.

In the production process of embedded device, due to the frequent reuse of third-party libraries or development kits, there are large number of same vulnerabilities that appear in more than one firmware. Homology analysis is often used in detecting this kind of vulnerabilities caused by code reuse or third-party reuse and in the homology analysis, the widely used methods are mainly Binary difference analysis, Normalized compression distance, String feature matching and Fuzz hash. But when we use these methods for homology analysis, we found that the detection result is not ideal and there is a high false positive rate. Focusing on this problem, we analyzed the application scenarios of these four methods and their limitations by combining different methods and different types of files and the experiments show that the combination of methods and files have a better performance in homology analysis.

Nawaratne, R., Bandaragoda, T., Adikari, A., Alahakoon, D., Silva, D. De, Yu, X.. 2017. Incremental knowledge acquisition and self-learning for autonomous video surveillance. IECON 2017 - 43rd Annual Conference of the IEEE Industrial Electronics Society. :4790–4795.

The world is witnessing a remarkable increase in the usage of video surveillance systems. Besides fulfilling an imperative security and safety purpose, it also contributes towards operations monitoring, hazard detection and facility management in industry/smart factory settings. Most existing surveillance techniques use hand-crafted features analyzed using standard machine learning pipelines for action recognition and event detection. A key shortcoming of such techniques is the inability to learn from unlabeled video streams. The entire video stream is unlabeled when the requirement is to detect irregular, unforeseen and abnormal behaviors, anomalies. Recent developments in intelligent high-level video analysis have been successful in identifying individual elements in a video frame. However, the detection of anomalies in an entire video feed requires incremental and unsupervised machine learning. This paper presents a novel approach that incorporates high-level video analysis outcomes with incremental knowledge acquisition and self-learning for autonomous video surveillance. The proposed approach is capable of detecting changes that occur over time and separating irregularities from re-occurrences, without the prerequisite of a labeled dataset. We demonstrate the proposed approach using a benchmark video dataset and the results confirm its validity and usability for autonomous video surveillance.

Parchami, M., Bashbaghi, S., Granger, E.. 2017. CNNs with cross-correlation matching for face recognition in video surveillance using a single training sample per person. 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). :1–6.

In video surveillance, face recognition (FR) systems seek to detect individuals of interest appearing over a distributed network of cameras. Still-to-video FR systems match faces captured in videos under challenging conditions against facial models, often designed using one reference still per individual. Although CNNs can achieve among the highest levels of accuracy in many real-world FR applications, state-of-the-art CNNs that are suitable for still-to-video FR, like trunk-branch ensemble (TBE) CNNs, represent complex solutions for real-time applications. In this paper, an efficient CNN architecture is proposed for accurate still-to-video FR from a single reference still. The CCM-CNN is based on new cross-correlation matching (CCM) and triplet-loss optimization methods that provide discriminant face representations. The matching pipeline exploits a matrix Hadamard product followed by a fully connected layer inspired by adaptive weighted cross-correlation. A triplet-based training approach is proposed to optimize the CCM-CNN parameters such that the inter-class variations are increased, while enhancing robustness to intra-class variations. To further improve robustness, the network is fine-tuned using synthetically-generated faces based on still and videos of non-target individuals. Experiments on videos from the COX Face and Chokepoint datasets indicate that the CCM-CNN can achieve a high level of accuracy that is comparable to TBE-CNN and HaarNet, but with a significantly lower time and memory complexity. It may therefore represent the better trade-off between accuracy and complexity for real-time video surveillance applications.

Gajjar, V., Khandhediya, Y., Gurnani, A.. 2017. Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach. 2017 IEEE International Conference on Computer Vision Workshops (ICCVW). :2805–2809.

With crimes on the rise all around the world, video surveillance is becoming more important day by day. Due to the lack of human resources to monitor this increasing number of cameras manually, new computer vision algorithms to perform lower and higher level tasks are being developed. We have developed a new method incorporating the most acclaimed Histograms of Oriented Gradients, the theory of Visual Saliency and the saliency prediction model Deep Multi-Level Network to detect human beings in video sequences. Furthermore, we implemented the k - Means algorithm to cluster the HOG feature vectors of the positively detected windows and determined the path followed by a person in the video. We achieved a detection precision of 83.11% and a recall of 41.27%. We obtained these results 76.866 times faster than classification on normal images.

Rupasinghe, R. A. A., Padmasiri, D. A., Senanayake, S. G. M. P., Godaliyadda, G. M. R. I., Ekanayake, M. P. B., Wijayakulasooriya, J. V.. 2017. Dynamic clustering for event detection and anomaly identification in video surveillance. 2017 IEEE International Conference on Industrial and Information Systems (ICIIS). :1–6.

This work introduces concepts and algorithms along with a case study validating them, to enhance the event detection, pattern recognition and anomaly identification results in real life video surveillance. The motivation for the work underlies in the observation that human behavioral patterns in general continuously evolve and adapt with time, rather than being static. First, limitations in existing work with respect to this phenomena are identified. Accordingly, the notion and algorithms of Dynamic Clustering are introduced in order to overcome these drawbacks. Correspondingly, we propose the concept of maintaining two separate sets of data in parallel, namely the Normal Plane and the Anomaly Plane, to successfully achieve the task of learning continuously. The practicability of the proposed algorithms in a real life scenario is demonstrated through a case study. From the analysis presented in this work, it is evident that a more comprehensive analysis, closely following human perception can be accomplished by incorporating the proposed notions and algorithms in a video surveillance event.

Babiker, M., Khalifa, O. O., Htike, K. K., Hassan, A., Zaharadeen, M.. 2017. Automated daily human activity recognition for video surveillance using neural network. 2017 IEEE 4th International Conference on Smart Instrumentation, Measurement and Application (ICSIMA). :1–5.

Surveillance video systems are gaining increasing attention in the field of computer vision due to its demands of users for the seek of security. It is promising to observe the human movement and predict such kind of sense of movements. The need arises to develop a surveillance system that capable to overcome the shortcoming of depending on the human resource to stay monitoring, observing the normal and suspect event all the time without any absent mind and to facilitate the control of huge surveillance system network. In this paper, an intelligent human activity system recognition is developed. Series of digital image processing techniques were used in each stage of the proposed system, such as background subtraction, binarization, and morphological operation. A robust neural network was built based on the human activities features database, which was extracted from the frame sequences. Multi-layer feed forward perceptron network used to classify the activities model in the dataset. The classification results show a high performance in all of the stages of training, testing and validation. Finally, these results lead to achieving a promising performance in the activity recognition rate.

Nguyen-Meidine, L. T., Granger, E., Kiran, M., Blais-Morin, L. A.. 2017. A comparison of CNN-based face and head detectors for real-time video surveillance applications. 2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA). :1–7.

Detecting faces and heads appearing in video feeds are challenging tasks in real-world video surveillance applications due to variations in appearance, occlusions and complex backgrounds. Recently, several CNN architectures have been proposed to increase the accuracy of detectors, although their computational complexity can be an issue, especially for realtime applications, where faces and heads must be detected live using high-resolution cameras. This paper compares the accuracy and complexity of state-of-the-art CNN architectures that are suitable for face and head detection. Single pass and region-based architectures are reviewed and compared empirically to baseline techniques according to accuracy and to time and memory complexity on images from several challenging datasets. The viability of these architectures is analyzed with real-time video surveillance applications in mind. Results suggest that, although CNN architectures can achieve a very high level of accuracy compared to traditional detectors, their computational cost can represent a limitation for many practical real-time applications.

2018-04-02

Vhaduri, S., Poellabauer, C.. 2017. Wearable Device User Authentication Using Physiological and Behavioral Metrics. 2017 IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC). :1–6.

Wearables, such as Fitbit, Apple Watch, and Microsoft Band, with their rich collection of sensors, facilitate the tracking of healthcare- and wellness-related metrics. However, the assessment of the physiological metrics collected by these devices could also be useful in identifying the user of the wearable, e.g., to detect unauthorized use or to correctly associate the data to a user if wearables are shared among multiple users. Further, researchers and healthcare providers often rely on these smart wearables to monitor research subjects and patients in their natural environments over extended periods of time. Here, it is important to associate the sensed data with the corresponding user and to detect if a device is being used by an unauthorized individual, to ensure study compliance. Existing one-time authentication approaches using credentials (e.g., passwords, certificates) or trait-based biometrics (e.g., face, fingerprints, iris, voice) might fail, since such credentials can easily be shared among users. In this paper, we present a continuous and reliable wearable-user authentication mechanism using coarse-grain minute-level physical activity (step counts) and physiological data (heart rate, calorie burn, and metabolic equivalent of task). From our analysis of 421 Fitbit users from a two-year long health study, we are able to statistically distinguish nearly 100% of the subject-pairs and to identify subjects with an average accuracy of 92.97%.

Yadav, S., Howells, G.. 2017. Analysis of ICMetrics Features/Technology for Wearable Devices IOT Sensors. 2017 Seventh International Conference on Emerging Security Technologies (EST). :175–178.

This paper investigates the suitability of employing various measurable features derived from multiple wearable devices (Apple Watch), for the generation of unique authentication and encryption keys related to the user. This technique is termed as ICMetrics. The ICMetrics technology requires identifying the suitable features in an environment for key generation most useful for online services. This paper presents an evaluation of the feasibility of identifying a unique user based on desirable feature set and activity data collected over short and long term and explores how the number of samples being factored into the ICMetrics system affects uniqueness of the key.

Cai, H., Yun, T., Hester, J., Venkatasubramanian, K. K.. 2017. Deploying Data-Driven Security Solutions on Resource-Constrained Wearable IoT Systems. 2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW). :199–204.

Wearable Internet-of-Things (WIoT) environments have demonstrated great potential in a broad range of applications in healthcare and well-being. Security is essential for WIoT environments. Lack of security in WIoTs not only harms user privacy, but may also harm the user's safety. Though devices in the WIoT can be attacked in many ways, in this paper we focus on adversaries who mount what we call sensor-hijacking attacks, which prevent the constituent medical devices from accurately collecting and reporting the user's health state (e.g., reporting old or wrong physiological measurements). In this paper we outline some of our experiences in implementing a data-driven security solution for detecting sensor-hijacking attack on a secure wearable internet-of-things (WIoT) base station called the Amulet. Given the limited capabilities (computation, memory, battery power) of the Amulet platform, implementing such a security solution is quite challenging and presents several trade-offs with respect to detection accuracy and resources requirements. We conclude the paper with a list of insights into what capabilities constrained WIoT platforms should provide developers so as to make the inclusion of data-driven security primitives in such systems.

Yousefi-Azar, M., Varadharajan, V., Hamey, L., Tupakula, U.. 2017. Autoencoder-Based Feature Learning for Cyber Security Applications. 2017 International Joint Conference on Neural Networks (IJCNN). :3854–3861.

This paper presents a novel feature learning model for cyber security tasks. We propose to use Auto-encoders (AEs), as a generative model, to learn latent representation of different feature sets. We show how well the AE is capable of automatically learning a reasonable notion of semantic similarity among input features. Specifically, the AE accepts a feature vector, obtained from cyber security phenomena, and extracts a code vector that captures the semantic similarity between the feature vectors. This similarity is embedded in an abstract latent representation. Because the AE is trained in an unsupervised fashion, the main part of this success comes from appropriate original feature set that is used in this paper. It can also provide more discriminative features in contrast to other feature engineering approaches. Furthermore, the scheme can reduce the dimensionality of the features thereby signicantly minimising the memory requirements. We selected two different cyber security tasks: networkbased anomaly intrusion detection and Malware classication. We have analysed the proposed scheme with various classifiers using publicly available datasets for network anomaly intrusion detection and malware classifications. Several appropriate evaluation metrics show improvement compared to prior results.

Chen, Y., Chen, W.. 2017. Finger ECG-Based Authentication for Healthcare Data Security Using Artificial Neural Network. 2017 IEEE 19th International Conference on E-Health Networking, Applications and Services (Healthcom). :1–6.

Wearable and mobile medical devices provide efficient, comfortable, and economic health monitoring, having a wide range of applications from daily to clinical scenarios. Health data security becomes a critically important issue. Electrocardiogram (ECG) has proven to be a potential biometric in human recognition over the past decade. Unlike conventional authentication methods using passwords, fingerprints, face, etc., ECG signal can not be simply intercepted, duplicated, and enables continuous identification. However, in many of the studies, algorithms developed are not suitable for practical application, which usually require long ECG data for authentication. In this work, we introduce a two-phase authentication using artificial neural network (NN) models. This algorithm enables fast authentication within only 3 seconds, meanwhile achieves reasonable performance in recognition. We test the proposed method in a controlled laboratory experiment with 50 subjects. Finger ECG signals are collected using a mobile device at different times and physical statues. At the first stage, a ``General'' NN model is constructed based on data from the cohort and used for preliminary screening, while at the second stage ``Personal'' NN models constructed from single individual's data are applied as fine-grained identification. The algorithm is tested on the whole data set, and on different sizes of subsets (5, 10, 20, 30, and 40). Results proved that the proposed method is feasible and reliable for individual authentication, having obtained average False Acceptance Rate (FAR) and False Rejection Rate (FRR) below 10% for the whole data set.

Alkhateeb, E. M. S.. 2017. Dynamic Malware Detection Using API Similarity. 2017 IEEE International Conference on Computer and Information Technology (CIT). :297–301.

Hackers create different types of Malware such as Trojans which they use to steal user-confidential information (e.g. credit card details) with a few simple commands, recent malware however has been created intelligently and in an uncontrolled size, which puts malware analysis as one of the top important subjects of information security. This paper proposes an efficient dynamic malware-detection method based on API similarity. This proposed method outperform the traditional signature-based detection method. The experiment evaluated 197 malware samples and the proposed method showed promising results of correctly identified malware.

Yusof, M., Saudi, M. M., Ridzuan, F.. 2017. A New Mobile Botnet Classification Based on Permission and API Calls. 2017 Seventh International Conference on Emerging Security Technologies (EST). :122–127.

Currently, mobile botnet attacks have shifted from computers to smartphones due to its functionality, ease to exploit, and based on financial intention. Mostly, it attacks Android due to its popularity and high usage among end users. Every day, more and more malicious mobile applications (apps) with the botnet capability have been developed to exploit end users' smartphones. Therefore, this paper presents a new mobile botnet classification based on permission and Application Programming Interface (API) calls in the smartphone. This classification is developed using static analysis in a controlled lab environment and the Drebin dataset is used as the training dataset. 800 apps from the Google Play Store have been chosen randomly to test the proposed classification. As a result, 16 permissions and 31 API calls that are most related with mobile botnet have been extracted using feature selection and later classified and tested using machine learning algorithms. The experimental result shows that the Random Forest Algorithm has achieved the highest detection accuracy of 99.4% with the lowest false positive rate of 16.1% as compared to other machine learning algorithms. This new classification can be used as the input for mobile botnet detection for future work, especially for financial matters.

2018-03-26

Pallaprolu, S. C., Sankineni, R., Thevar, M., Karabatis, G., Wang, J.. 2017. Zero-Day Attack Identification in Streaming Data Using Semantics and Spark. 2017 IEEE International Congress on Big Data (BigData Congress). :121–128.

Intrusion Detection Systems (IDS) have been in existence for many years now, but they fall short in efficiently detecting zero-day attacks. This paper presents an organic combination of Semantic Link Networks (SLN) and dynamic semantic graph generation for the on the fly discovery of zero-day attacks using the Spark Streaming platform for parallel detection. In addition, a minimum redundancy maximum relevance (MRMR) feature selection algorithm is deployed to determine the most discriminating features of the dataset. Compared to previous studies on zero-day attack identification, the described method yields better results due to the semantic learning and reasoning on top of the training data and due to the use of collaborative classification methods. We also verified the scalability of our method in a distributed environment.

2018-03-19

Ditzler, G., Prater, A.. 2017. Fine Tuning Lasso in an Adversarial Environment against Gradient Attacks. 2017 IEEE Symposium Series on Computational Intelligence (SSCI). :1–7.

Machine learning and data mining algorithms typically assume that the training and testing data are sampled from the same fixed probability distribution; however, this violation is often violated in practice. The field of domain adaptation addresses the situation where this assumption of a fixed probability between the two domains is violated; however, the difference between the two domains (training/source and testing/target) may not be known a priori. There has been a recent thrust in addressing the problem of learning in the presence of an adversary, which we formulate as a problem of domain adaption to build a more robust classifier. This is because the overall security of classifiers and their preprocessing stages have been called into question with the recent findings of adversaries in a learning setting. Adversarial training (and testing) data pose a serious threat to scenarios where an attacker has the opportunity to ``poison'' the training or ``evade'' on the testing data set(s) in order to achieve something that is not in the best interest of the classifier. Recent work has begun to show the impact of adversarial data on several classifiers; however, the impact of the adversary on aspects related to preprocessing of data (i.e., dimensionality reduction or feature selection) has widely been ignored in the revamp of adversarial learning research. Furthermore, variable selection, which is a vital component to any data analysis, has been shown to be particularly susceptible under an attacker that has knowledge of the task. In this work, we explore avenues for learning resilient classification models in the adversarial learning setting by considering the effects of adversarial data and how to mitigate its effects through optimization. Our model forms a single convex optimization problem that uses the labeled training data from the source domain and known- weaknesses of the model for an adversarial component. We benchmark the proposed approach on synthetic data and show the trade-off between classification accuracy and skew-insensitive statistics.

Wang, A., Mohaisen, A., Chen, S.. 2017. An Adversary-Centric Behavior Modeling of DDoS Attacks. 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). :1126–1136.

Distributed Denial of Service (DDoS) attacks are some of the most persistent threats on the Internet today. The evolution of DDoS attacks calls for an in-depth analysis of those attacks. A better understanding of the attackers' behavior can provide insights to unveil patterns and strategies utilized by attackers. The prior art on the attackers' behavior analysis often falls in two aspects: it assumes that adversaries are static, and makes certain simplifying assumptions on their behavior, which often are not supported by real attack data. In this paper, we take a data-driven approach to designing and validating three DDoS attack models from temporal (e.g., attack magnitudes), spatial (e.g., attacker origin), and spatiotemporal (e.g., attack inter-launching time) perspectives. We design these models based on the analysis of traces consisting of more than 50,000 verified DDoS attacks from industrial mitigation operations. Each model is also validated by testing its effectiveness in accurately predicting future DDoS attacks. Comparisons against simple intuitive models further show that our models can more accurately capture the essential features of DDoS attacks.

Chen, Z., Tondi, B., Li, X., Ni, R., Zhao, Y., Barni, M.. 2017. A Gradient-Based Pixel-Domain Attack against SVM Detection of Global Image Manipulations. 2017 IEEE Workshop on Information Forensics and Security (WIFS). :1–6.

We present a gradient-based attack against SVM-based forensic techniques relying on high-dimensional SPAM features. As opposed to prior work, the attack works directly in the pixel domain even if the relationship between pixel values and SPAM features can not be inverted. The proposed method relies on the estimation of the gradient of the SVM output with respect to pixel values, however it departs from gradient descent methodology due to the necessity of preserving the integer nature of pixels and to reduce the effect of the attack on image quality. A fast algorithm to estimate the gradient is also introduced to reduce the complexity of the attack. We tested the proposed attack against SVM detection of histogram stretching, adaptive histogram equalization and median filtering. In all cases the attack succeeded in inducing a decision error with a very limited distortion, the PSNR between the original and the attacked images ranging from 50 to 70 dBs. The attack is also effective in the case of attacks with Limited Knowledge (LK) when the SVM used by the attacker is trained on a different dataset with respect to that used by the analyst.

Liu, B., Zhu, Z., Yang, Y.. 2017. Convolutional Neural Networks Based Scale-Adaptive Kernelized Correlation Filter for Robust Visual Object Tracking. 2017 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC). :423–428.

Visual object tracking is challenging when the object appearances occur significant changes, such as scale change, background clutter, occlusion, and so on. In this paper, we crop different sizes of multiscale templates around object and input these multiscale templates into network to pretrain the network adaptive the size change of tracking object. Different from previous the tracking method based on deep convolutional neural network (CNN), we exploit deep Residual Network (ResNet) to offline train a multiscale object appearance model on the ImageNet, and then the features from pretrained network are transferred into tracking tasks. Meanwhile, the proposed method combines the multilayer convolutional features, it is robust to disturbance, scale change, and occlusion. In addition, we fuse multiscale search strategy into three kernelized correlation filter, which strengthens the ability of adaptive scale change of object. Unlike the previous methods, we directly learn object appearance change by integrating multiscale templates into the ResNet. We compared our method with other CNN-based or correlation filter tracking methods, the experimental results show that our tracking method is superior to the existing state-of-the-art tracking method on Object Tracking Benchmark (OTB-2015) and Visual Object Tracking Benchmark (VOT-2015).

Rocha, A., Scheirer, W. J., Forstall, C. W., Cavalcante, T., Theophilo, A., Shen, B., Carvalho, A. R. B., Stamatatos, E.. 2017. Authorship Attribution for Social Media Forensics. IEEE Transactions on Information Forensics and Security. 12:5–33.

The veil of anonymity provided by smartphones with pre-paid SIM cards, public Wi-Fi hotspots, and distributed networks like Tor has drastically complicated the task of identifying users of social media during forensic investigations. In some cases, the text of a single posted message will be the only clue to an author's identity. How can we accurately predict who that author might be when the message may never exceed 140 characters on a service like Twitter? For the past 50 years, linguists, computer scientists, and scholars of the humanities have been jointly developing automated methods to identify authors based on the style of their writing. All authors possess peculiarities of habit that influence the form and content of their written works. These characteristics can often be quantified and measured using machine learning algorithms. In this paper, we provide a comprehensive review of the methods of authorship attribution that can be applied to the problem of social media forensics. Furthermore, we examine emerging supervised learning-based methods that are effective for small sample sizes, and provide step-by-step explanations for several scalable approaches as instructional case studies for newcomers to the field. We argue that there is a significant need in forensics for new authorship attribution algorithms that can exploit context, can process multi-modal data, and are tolerant to incomplete knowledge of the space of all possible authors at training time.

Thankaraj, A., Nair, A. J., Vasudevan, N., Pathari, V.. 2017. Misclassifications: The Missing Link. 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI). :1719–1722.

The notion of style is pivotal to literature. The choice of a certain writing style moulds and enhances the overall character of a book. Stylometry uses statistical methods to analyze literary style. This work aims to build a recommendation system based on the similarity in stylometric cues of various authors. The problem at hand is in close proximity to the author attribution problem. It follows a supervised approach with an initial corpus of books labelled with their respective authors as training set and generate recommendations based on the misclassified books. Results in book similarity are substantiated by domain experts.

Shahid, U., Farooqi, S., Ahmad, R., Shafiq, Z., Srinivasan, P., Zaffar, F.. 2017. Accurate Detection of Automatically Spun Content via Stylometric Analysis. 2017 IEEE International Conference on Data Mining (ICDM). :425–434.

Spammers use automated content spinning techniques to evade plagiarism detection by search engines. Text spinners help spammers in evading plagiarism detectors by automatically restructuring sentences and replacing words or phrases with their synonyms. Prior work on spun content detection relies on the knowledge about the dictionary used by the text spinning software. In this work, we propose an approach to detect spun content and its seed without needing the text spinner's dictionary. Our key idea is that text spinners introduce stylometric artifacts that can be leveraged for detecting spun documents. We implement and evaluate our proposed approach on a corpus of spun documents that are generated using a popular text spinning software. The results show that our approach can not only accurately detect whether a document is spun but also identify its source (or seed) document - all without needing the dictionary used by the text spinner.

Faust, C., Dozier, G., Xu, J., King, M. C.. 2017. Adversarial Authorship, Interactive Evolutionary Hill-Climbing, and Author CAAT-III. 2017 IEEE Symposium Series on Computational Intelligence (SSCI). :1–8.

We are currently witnessing the development of increasingly effective author identification systems (AISs) that have the potential to track users across the internet based on their writing style. In this paper, we discuss two methods for providing user anonymity with respect to writing style: Adversarial Stylometry and Adversarial Authorship. With Adversarial Stylometry, a user attempts to obfuscate their writing style by consciously altering it. With Adversarial Authorship, a user can select an author cluster target (ACT) and write toward this target with the intention of subverting an AIS so that the user's writing sample will be misclassified Our results show that Adversarial Authorship via interactive evolutionary hill-climbing outperforms Adversarial Stylometry.

2018-03-05

Chen, Q., Bridges, R. A.. 2017. Automated Behavioral Analysis of Malware: A Case Study of WannaCry Ransomware. 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA). :454–460.

Ransomware, a class of self-propagating malware that uses encryption to hold the victims' data ransom, has emerged in recent years as one of the most dangerous cyber threats, with widespread damage; e.g., zero-day ransomware WannaCry has caused world-wide catastrophe, from knocking U.K. National Health Service hospitals offline to shutting down a Honda Motor Company in Japan [1]. Our close collaboration with security operations of large enterprises reveals that defense against ransomware relies on tedious analysis from high-volume systems logs of the first few infections. Sandbox analysis of freshly captured malware is also commonplace in operation. We introduce a method to identify and rank the most discriminating ransomware features from a set of ambient (non-attack) system logs and at least one log stream containing both ambient and ransomware behavior. These ranked features reveal a set of malware actions that are produced automatically from system logs, and can help automate tedious manual analysis. We test our approach using WannaCry and two polymorphic samples by producing logs with Cuckoo Sandbox during both ambient, and ambient plus ransomware executions. Our goal is to extract the features of the malware from the logs with only knowledge that malware was present. We compare outputs with a detailed analysis of WannaCry allowing validation of the algorithm's feature extraction and provide analysis of the method's robustness to variations of input data—changing quality/quantity of ambient data and testing polymorphic ransomware. Most notably, our patterns are accurate and unwavering when generated from polymorphic WannaCry copies, on which 63 (of 63 tested) antivirus (AV) products fail.