Biblio

List
Filter

Found 350 results

Filters: Keyword is data mining [Clear All Filters]

2018-04-02

Alkhateeb, E. M. S.. 2017. Dynamic Malware Detection Using API Similarity. 2017 IEEE International Conference on Computer and Information Technology (CIT). :297–301.

Hackers create different types of Malware such as Trojans which they use to steal user-confidential information (e.g. credit card details) with a few simple commands, recent malware however has been created intelligently and in an uncontrolled size, which puts malware analysis as one of the top important subjects of information security. This paper proposes an efficient dynamic malware-detection method based on API similarity. This proposed method outperform the traditional signature-based detection method. The experiment evaluated 197 malware samples and the proposed method showed promising results of correctly identified malware.

2018-06-20

Lee, Y., Choi, S. S., Choi, J., Song, J.. 2017. A Lightweight Malware Classification Method Based on Detection Results of Anti-Virus Software. 2017 12th Asia Joint Conference on Information Security (AsiaJCIS). :5–9.

With the development of cyber threats on the Internet, the number of malware, especially unknown malware, is also dramatically increasing. Since all of malware cannot be analyzed by analysts, it is very important to find out new malware that should be analyzed by them. In order to cope with this issue, the existing approaches focused on malware classification using static or dynamic analysis results of malware. However, the static and the dynamic analyses themselves are also too costly and not easy to build the isolated, secure and Internet-like analysis environments such as sandbox. In this paper, we propose a lightweight malware classification method based on detection results of anti-virus software. Since the proposed method can reduce the volume of malware that should be analyzed by analysts, it can be used as a preprocess for in-depth analysis of malware. The experimental showed that the proposed method succeeded in classification of 1,000 malware samples into 187 unique groups. This means that 81% of the original malware samples do not need to analyze by analysts.

2018-05-24

Huyn, Joojay. 2017. A Scalable Real-Time Framework for DDoS Traffic Monitoring and Characterization. Proceedings of the Fourth IEEE/ACM International Conference on Big Data Computing, Applications and Technologies. :265–266.

Volumetric DDoS attacks continue to inflict serious damage. Many proposed defenses for mitigating such attacks assume that a monitoring system has already detected the attack. However, many proposed DDoS monitoring systems do not focus on efficiently analyzing high volume network traffic to provide important characterizations of the attack in real-time to downstream traffic filtering systems. We propose a scalable real-time framework for an effective volumetric DDoS monitoring system that leverages modern big data technologies for streaming analytics of high volume network traffic to accurately detect and characterize attacks.

2018-11-14

Teoh, T. T., Nguwi, Y. Y., Elovici, Y., Cheung, N. M., Ng, W. L.. 2017. Analyst Intuition Based Hidden Markov Model on High Speed, Temporal Cyber Security Big Data. 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). :2080–2083.

Hidden Markov Models (HMM) are probabilistic models that can be used for forecasting time series data. It has seen success in various domains like finance [1-5], bioinformatics [6-8], healthcare [9-11], agriculture [12-14], artificial intelligence[15-17]. However, the use of HMM in cyber security found to date is numbered. We believe the properties of HMM being predictive, probabilistic, and its ability to model different naturally occurring states form a good basis to model cyber security data. It is hence the motivation of this work to provide the initial results of our attempts to predict security attacks using HMM. A large network datasets representing cyber security attacks have been used in this work to establish an expert system. The characteristics of attacker's IP addresses can be extracted from our integrated datasets to generate statistical data. The cyber security expert provides the weight of each attribute and forms a scoring system by annotating the log history. We applied HMM to distinguish between a cyber security attack, unsure and no attack by first breaking the data into 3 cluster using Fuzzy K mean (FKM), then manually label a small data (Analyst Intuition) and finally use HMM state-based approach. By doing so, our results are very encouraging as compare to finding anomaly in a cyber security log, which generally results in creating huge amount of false detection.

Teoh, T. T., Zhang, Y., Nguwi, Y. Y., Elovici, Y., Ng, W. L.. 2017. Analyst Intuition Inspired High Velocity Big Data Analysis Using PCA Ranked Fuzzy K-Means Clustering with Multi-Layer Perceptron (MLP) to Obviate Cyber Security Risk. 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). :1790–1793.

The growing prevalence of cyber threats in the world are affecting every network user. Numerous security monitoring systems are being employed to protect computer networks and resources from falling victim to cyber-attacks. There is a pressing need to have an efficient security monitoring system to monitor the large network datasets generated in this process. A large network datasets representing Malware attacks have been used in this work to establish an expert system. The characteristics of attacker's IP addresses can be extracted from our integrated datasets to generate statistical data. The cyber security expert provides to the weight of each attribute and forms a scoring system by annotating the log history. We adopted a special semi supervise method to classify cyber security log into attack, unsure and no attack by first breaking the data into 3 cluster using Fuzzy K mean (FKM), then manually label a small data (Analyst Intuition) and finally train the neural network classifier multilayer perceptron (MLP) base on the manually labelled data. By doing so, our results is very encouraging as compare to finding anomaly in a cyber security log, which generally results in creating huge amount of false detection. The method of including Artificial Intelligence (AI) and Analyst Intuition (AI) is also known as AI2. The classification results are encouraging in segregating the types of attacks.

2018-09-05

Teusner, R., Matthies, C., Giese, P.. 2017. Should I Bug You? Identifying Domain Experts in Software Projects Using Code Complexity Metrics 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS). :418–425.

In any sufficiently complex software system there are experts, having a deeper understanding of parts of the system than others. However, it is not always clear who these experts are and which particular parts of the system they can provide help with. We propose a framework to elicit the expertise of developers and recommend experts by analyzing complexity measures over time. Furthermore, teams can detect those parts of the software for which currently no, or only few experts exist and take preventive actions to keep the collective code knowledge and ownership high. We employed the developed approach at a medium-sized company. The results were evaluated with a survey, comparing the perceived and the computed expertise of developers. We show that aggregated code metrics can be used to identify experts for different software components. The identified experts were rated as acceptable candidates by developers in over 90% of all cases.

Li, C., Palanisamy, B., Joshi, J.. 2017. Differentially Private Trajectory Analysis for Points-of-Interest Recommendation. 2017 IEEE International Congress on Big Data (BigData Congress). :49–56.

Ubiquitous deployment of low-cost mobile positioning devices and the widespread use of high-speed wireless networks enable massive collection of large-scale trajectory data of individuals moving on road networks. Trajectory data mining finds numerous applications including understanding users' historical travel preferences and recommending places of interest to new visitors. Privacy-preserving trajectory mining is an important and challenging problem as exposure of sensitive location information in the trajectories can directly invade the location privacy of the users associated with the trajectories. In this paper, we propose a differentially private trajectory analysis algorithm for points-of-interest recommendation to users that aims at maximizing the accuracy of the recommendation results while protecting the privacy of the exposed trajectories with differential privacy guarantees. Our algorithm first transforms the raw trajectory dataset into a bipartite graph with nodes representing the users and the points-of-interest and the edges representing the visits made by the users to the locations, and then extracts the association matrix representing the bipartite graph to inject carefully calibrated noise to meet έ-differential privacy guarantees. A post-processing of the perturbed association matrix is performed to suppress noise prior to performing a Hyperlink-Induced Topic Search (HITS) on the transformed data that generates an ordered list of recommended points-of-interest. Extensive experiments on a real trajectory dataset show that our algorithm is efficient, scalable and demonstrates high recommendation accuracy while meeting the required differential privacy guarantees.

2018-05-24

Tosh, D. K., Shetty, S., Liang, X., Kamhoua, C. A., Kwiat, K. A., Njilla, L.. 2017. Security Implications of Blockchain Cloud with Analysis of Block Withholding Attack. 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). :458–467.

The blockchain technology has emerged as an attractive solution to address performance and security issues in distributed systems. Blockchain's public and distributed peer-to-peer ledger capability benefits cloud computing services which require functions such as, assured data provenance, auditing, management of digital assets, and distributed consensus. Blockchain's underlying consensus mechanism allows to build a tamper-proof environment, where transactions on any digital assets are verified by set of authentic participants or miners. With use of strong cryptographic methods, blocks of transactions are chained together to enable immutability on the records. However, achieving consensus demands computational power from the miners in exchange of handsome reward. Therefore, greedy miners always try to exploit the system by augmenting their mining power. In this paper, we first discuss blockchain's capability in providing assured data provenance in cloud and present vulnerabilities in blockchain cloud. We model the block withholding (BWH) attack in a blockchain cloud considering distinct pool reward mechanisms. BWH attack provides rogue miner ample resources in the blockchain cloud for disrupting honest miners' mining efforts, which was verified through simulations.

2017-12-20

Ishio, T., Sakaguchi, Y., Ito, K., Inoue, K.. 2017. Source File Set Search for Clone-and-Own Reuse Analysis. 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR). :257–268.

Clone-and-own approach is a natural way of source code reuse for software developers. To assess how known bugs and security vulnerabilities of a cloned component affect an application, developers and security analysts need to identify an original version of the component and understand how the cloned component is different from the original one. Although developers may record the original version information in a version control system and/or directory names, such information is often either unavailable or incomplete. In this research, we propose a code search method that takes as input a set of source files and extracts all the components including similar files from a software ecosystem (i.e., a collection of existing versions of software packages). Our method employs an efficient file similarity computation using b-bit minwise hashing technique. We use an aggregated file similarity for ranking components. To evaluate the effectiveness of this tool, we analyzed 75 cloned components in Firefox and Android source code. The tool took about two hours to report the original components from 10 million files in Debian GNU/Linux packages. Recall of the top-five components in the extracted lists is 0.907, while recall of a baseline using SHA-1 file hash is 0.773, according to the ground truth recorded in the source code repositories.

2018-01-23

Kolosnjaji, B., Eraisha, G., Webster, G., Zarras, A., Eckert, C.. 2017. Empowering convolutional networks for malware classification and analysis. 2017 International Joint Conference on Neural Networks (IJCNN). :3838–3845.

Performing large-scale malware classification is increasingly becoming a critical step in malware analytics as the number and variety of malware samples is rapidly growing. Statistical machine learning constitutes an appealing method to cope with this increase as it can use mathematical tools to extract information out of large-scale datasets and produce interpretable models. This has motivated a surge of scientific work in developing machine learning methods for detection and classification of malicious executables. However, an optimal method for extracting the most informative features for different malware families, with the final goal of malware classification, is yet to be found. Fortunately, neural networks have evolved to the state that they can surpass the limitations of other methods in terms of hierarchical feature extraction. Consequently, neural networks can now offer superior classification accuracy in many domains such as computer vision and natural language processing. In this paper, we transfer the performance improvements achieved in the area of neural networks to model the execution sequences of disassembled malicious binaries. We implement a neural network that consists of convolutional and feedforward neural constructs. This architecture embodies a hierarchical feature extraction approach that combines convolution of n-grams of instructions with plain vectorization of features derived from the headers of the Portable Executable (PE) files. Our evaluation results demonstrate that our approach outperforms baseline methods, such as simple Feedforward Neural Networks and Support Vector Machines, as we achieve 93% on precision and recall, even in case of obfuscations in the data.

2018-03-19

Ditzler, G., Prater, A.. 2017. Fine Tuning Lasso in an Adversarial Environment against Gradient Attacks. 2017 IEEE Symposium Series on Computational Intelligence (SSCI). :1–7.

Machine learning and data mining algorithms typically assume that the training and testing data are sampled from the same fixed probability distribution; however, this violation is often violated in practice. The field of domain adaptation addresses the situation where this assumption of a fixed probability between the two domains is violated; however, the difference between the two domains (training/source and testing/target) may not be known a priori. There has been a recent thrust in addressing the problem of learning in the presence of an adversary, which we formulate as a problem of domain adaption to build a more robust classifier. This is because the overall security of classifiers and their preprocessing stages have been called into question with the recent findings of adversaries in a learning setting. Adversarial training (and testing) data pose a serious threat to scenarios where an attacker has the opportunity to ``poison'' the training or ``evade'' on the testing data set(s) in order to achieve something that is not in the best interest of the classifier. Recent work has begun to show the impact of adversarial data on several classifiers; however, the impact of the adversary on aspects related to preprocessing of data (i.e., dimensionality reduction or feature selection) has widely been ignored in the revamp of adversarial learning research. Furthermore, variable selection, which is a vital component to any data analysis, has been shown to be particularly susceptible under an attacker that has knowledge of the task. In this work, we explore avenues for learning resilient classification models in the adversarial learning setting by considering the effects of adversarial data and how to mitigate its effects through optimization. Our model forms a single convex optimization problem that uses the labeled training data from the source domain and known- weaknesses of the model for an adversarial component. We benchmark the proposed approach on synthetic data and show the trade-off between classification accuracy and skew-insensitive statistics.

2018-09-28

Tsou, Y., Chen, H., Chen, J., Huang, Y., Wang, P.. 2017. Differential privacy-based data de-identification protection and risk evaluation system. 2017 International Conference on Information and Communication Technology Convergence (ICTC). :416–421.

As more and more technologies to store and analyze massive amount of data become available, it is extremely important to make privacy-sensitive data de-identified so that further analysis can be conducted by different parties. For example, data needs to go through data de-identification process before being transferred to institutes for further value added analysis. As such, privacy protection issues associated with the release of data and data mining have become a popular field of study in the domain of big data. As a strict and verifiable definition of privacy, differential privacy has attracted noteworthy attention and widespread research in recent years. Nevertheless, differential privacy is not practical for most applications due to its performance of synthetic dataset generation for data query. Moreover, the definition of data protection by randomized noise in native differential privacy is abstract to users. Therefore, we design a pragmatic DP-based data de-identification protection and risk of data disclosure estimation system, in which a DP-based noise addition mechanism is applied to generate synthetic datasets. Furthermore, the risk of data disclosure to these synthetic datasets can be evaluated before releasing to buyers/consumers.

2018-08-06

L. Chen, Y. Ye, T. Bourlai. 2017. Adversarial Machine Learning in Malware Detection: Arms Race between Evasion Attack and Defense. 2017 European Intelligence and Security Informatics Conference (EISIC). :99-106.

Since malware has caused serious damages and evolving threats to computer and Internet users, its detection is of great interest to both anti-malware industry and researchers. In recent years, machine learning-based systems have been successfully deployed in malware detection, in which different kinds of classifiers are built based on the training samples using different feature representations. Unfortunately, as classifiers become more widely deployed, the incentive for defeating them increases. In this paper, we explore the adversarial machine learning in malware detection. In particular, on the basis of a learning-based classifier with the input of Windows Application Programming Interface (API) calls extracted from the Portable Executable (PE) files, we present an effective evasion attack model (named EvnAttack) by considering different contributions of the features to the classification problem. To be resilient against the evasion attack, we further propose a secure-learning paradigm for malware detection (named SecDefender), which not only adopts classifier retraining technique but also introduces the security regularization term which considers the evasion cost of feature manipulations by attackers to enhance the system security. Comprehensive experimental results on the real sample collections from Comodo Cloud Security Center demonstrate the effectiveness of our proposed methods.

2018-01-23

Kilgallon, S., Rosa, L. De La, Cavazos, J.. 2017. Improving the effectiveness and efficiency of dynamic malware analysis with machine learning. 2017 Resilience Week (RWS). :30–36.

As the malware threat landscape is constantly evolving and over one million new malware strains are being generated every day [1], early automatic detection of threats constitutes a top priority of cybersecurity research, and amplifies the need for more advanced detection and classification methods that are effective and efficient. In this paper, we present the application of machine learning algorithms to predict the length of time malware should be executed in a sandbox to reveal its malicious intent. We also introduce a novel hybrid approach to malware classification based on static binary analysis and dynamic analysis of malware. Static analysis extracts information from a binary file without executing it, and dynamic analysis captures the behavior of malware in a sandbox environment. Our experimental results show that by turning the aforementioned problems into machine learning problems, it is possible to get an accuracy of up to 90% on the prediction of the malware analysis run time and up to 92% on the classification of malware families.

2023-03-31

Rousseaux, Francis, Saurel, Pierre. 2016. The legal debate about personal data privacy at a time of big data mining and searching: Making big data researchers cooperating with lawmakers to find solutions for the future. 2016 First IEEE International Conference on Computer Communication and the Internet (ICCCI). :354–357.

At the same time as Big Data technologies are being constantly refined, the legislation relating to data privacy is changing. The invalidation by the Court of Justice of the European Union on October 6, 2015, of the agreement known as “Safe Harbor”, negotiated by the European Commission on behalf of the European Union with the United States has two consequences. The first is to announce its replacement by a new, still fragile, program, the “Privacy Shield”, which isn't yet definitive and which could also later be repealed by the Court of Justice of the European Union. For example, we are expecting to hear the opinion in mid-April 2016 of the group of data protection authorities for the various states of the European Union, known as G29. The second is to mobilize the Big Data community to take control of the question of data privacy management and to put in place an adequate internal program.

2017-11-03

Baravalle, A., Lopez, M. S., Lee, S. W.. 2016. Mining the Dark Web: Drugs and Fake Ids. 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW). :350–356.

In the last years, governmental bodies have been futilely trying to fight against dark web marketplaces. Shortly after the closing of "The Silk Road" by the FBI and Europol in 2013, new successors have been established. Through the combination of cryptocurrencies and nonstandard communication protocols and tools, agents can anonymously trade in a marketplace for illegal items without leaving any record. This paper presents a research carried out to gain insights on the products and services sold within one of the larger marketplaces for drugs, fake ids and weapons on the Internet, Agora. Our work sheds a light on the nature of the market, there is a clear preponderance of drugs, which accounts for nearly 80% of the total items on sale. The ready availability of counterfeit documents, while they make up for a much smaller percentage of the market, raises worries. Finally, the role of organized crime within Agora is discussed and presented.

2017-11-27

Parate, M., Tajane, S., Indi, B.. 2016. Assessment of System Vulnerability for Smart Grid Applications. 2016 IEEE International Conference on Engineering and Technology (ICETECH). :1083–1088.

The smart grid is an electrical grid that has a duplex communication. This communication is between the utility and the consumer. Digital system, automation system, computers and control are the various systems of Smart Grid. It finds applications in a wide variety of systems. Some of its applications have been designed to reduce the risk of power system blackout. Dynamic vulnerability assessment is done to identify, quantify, and prioritize the vulnerabilities in a system. This paper presents a novel approach for classifying the data into one of the two classes called vulnerable or non-vulnerable by carrying out Dynamic Vulnerability Assessment (DVA) based on some data mining techniques such as Multichannel Singular Spectrum Analysis (MSSA), and Principal Component Analysis (PCA), and a machine learning tool such as Support Vector Machine Classifier (SVM-C) with learning algorithms that can analyze data. The developed methodology is tested in the IEEE 57 bus, where the cause of vulnerability is transient instability. The results show that data mining tools can effectively analyze the patterns of the electric signals, and SVM-C can use those patterns for analyzing the system data as vulnerable or non-vulnerable and determines System Vulnerability Status.

2017-09-15

Dhulipala, Laxman, Kabiljo, Igor, Karrer, Brian, Ottaviano, Giuseppe, Pupyrev, Sergey, Shalita, Alon. 2016. Compressing Graphs and Indexes with Recursive Graph Bisection. Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :1535–1544.

Graph reordering is a powerful technique to increase the locality of the representations of graphs, which can be helpful in several applications. We study how the technique can be used to improve compression of graphs and inverted indexes. We extend the recent theoretical model of Chierichetti et al. (KDD 2009) for graph compression, and show how it can be employed for compression-friendly reordering of social networks and web graphs and for assigning document identifiers in inverted indexes. We design and implement a novel theoretically sound reordering algorithm that is based on recursive graph bisection. Our experiments show a significant improvement of the compression rate of graph and indexes over existing heuristics. The new method is relatively simple and allows efficient parallel and distributed implementations, which is demonstrated on graphs with billions of vertices and hundreds of billions of edges.

2017-03-07

Agnihotri, Lalitha, Mojarad, Shirin, Lewkow, Nicholas, Essa, Alfred. 2016. Educational Data Mining with Python and Apache Spark: A Hands-on Tutorial. Proceedings of the Sixth International Conference on Learning Analytics & Knowledge. :507–508.

Enormous amount of educational data has been accumulated through Massive Open Online Courses (MOOCs), as well as commercial and non-commercial learning platforms. This is in addition to the educational data released by US government since 2012 to facilitate disruption in education by making data freely available. The high volume, variety and velocity of collected data necessitate use of big data tools and storage systems such as distributed databases for storage and Apache Spark for analysis. This tutorial will introduce researchers and faculty to real-world applications involving data mining and predictive analytics in learning sciences. In addition, the tutorial will introduce statistics required to validate and accurately report results. Topics will cover how big data is being used to transform education. Specifically, we will demonstrate how exploratory data analysis, data mining, predictive analytics, machine learning, and visualization techniques are being applied to educational big data to improve learning and scale insights driven from millions of student's records. The tutorial will be held over a half day and will be hands on with pre-posted material. Due to the interdisciplinary nature of work, the tutorial appeals to researchers from a wide range of backgrounds including big data, predictive analytics, learning sciences, educational data mining, and in general, those interested in how big data analytics can transform learning. As a prerequisite, attendees are required to have familiarity with at least one programming language.

2017-09-19

Jahan, Thanveer, Narsimha, G., Rao, C. V. Guru. 2016. Multiplicative Data Perturbation Using Fuzzy Logic in Preserving Privacy. Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies. :38:1–38:5.

In Data mining is the method of extracting the knowledge from huge amount of data and interesting patterns. With the rapid increase of data storage, cloud and service-based computing, the risk of misuse of data has become a major concern. Protecting sensitive information present in the data is crucial and critical. Data perturbation plays an important role in privacy preserving data mining. The major challenge of privacy preserving is to concentrate on factors to achieve privacy guarantee and data utility. We propose a data perturbation method that perturbs the data using fuzzy logic and random rotation. It also describes aspects of comparable level of quality over perturbed data and original data. The comparisons are illustrated on different multivariate datasets. Experimental study has proved the model is better in achieving privacy guarantee of data, as well as data utility.

2017-03-07

Urbanowicz, Ryan J., Olson, Randal S., Moore, Jason H.. 2016. Pareto Inspired Multi-objective Rule Fitness for Adaptive Rule-based Machine Learning. Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion. :1403–1403.

Learning classifier systems (LCSs) are rule-based evolutionary algorithms uniquely suited to classification and data mining in complex, multi-factorial, and heterogeneous problems. LCS rule fitness is commonly based on accuracy, but this metric alone is not ideal for assessing global rule `value' in noisy problem domains, and thus impedes effective knowledge extraction. Multi-objective fitness functions are promising but rely on knowledge of how to weigh objective importance. Prior knowledge would be unavailable in most real-world problems. The Pareto-front concept offers a strategy for multi-objective machine learning that is agnostic to objective importance. We propose a Pareto-inspired multi-objective rule fitness (PIMORF) for LCS, and combine it with a complimentary rule-compaction approach (SRC). We implemented these strategies in ExSTraCS, a successful supervised LCS and evaluated performance over an array of complex simulated noisy and clean problems (i.e. genetic and multiplexer) that each concurrently model pure interaction effects and heterogeneity. While evaluation over multiple performance metrics yielded mixed results, this work represents an important first step towards efficiently learning complex problem spaces without the advantage of prior problem knowledge. Overall the results suggest that PIMORF paired with SRC improved rule set interpretability, particularly with regard to heterogeneous patterns.

2017-08-02

Wu, Zhaoming, Aggarwal, Charu C., Sun, Jimeng. 2016. The Troll-Trust Model for Ranking in Signed Networks. Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. :447–456.

Signed social networks have become increasingly important in recent years because of the ability to model trust-based relationships in review sites like Slashdot, Epinions, and Wikipedia. As a result, many traditional network mining problems have been re-visited in the context of networks in which signs are associated with the links. Examples of such problems include community detection, link prediction, and low rank approximation. In this paper, we will examine the problem of ranking nodes in signed networks. In particular, we will design a ranking model, which has a clear physical interpretation in terms of the sign of the edges in the network. Specifically, we propose the Troll-Trust model that models the probability of trustworthiness of individual data sources as an interpretation for the underlying ranking values. We will show the advantages of this approach over a variety of baselines.

2016-04-25

Eric Yuan, Sam Malek. 2016. Mining Software Component Interactions to Detect Security Threats at the Architectural Level. 13th Working IEEE/IFIP Conference on Software Architecture (WICSA 2016).

Conventional security mechanisms at network, host, and source code levels are no longer sufficient in detecting and responding to increasingly dynamic and sophisticated cyber threats today. Detecting anomalous behavior at the architectural level can help better explain the intent of the threat and strengthen overall system security posture. To that end, we present a framework that mines software component interactions from system execution history and applies a detection algorithm to identify anomalous behavior. The framework uses unsupervised learning at runtime, can perform fast anomaly detection “on the fly”, and can quickly adapt to system load fluctuations and user behavior shifts. Our evaluation of the approach against a real Emergency Deployment System has demonstrated very promising results, showing the framework can effectively detect covert attacks, including insider threats, that may be easily missed by traditional intrusion detection methods.

2017-11-03

Park, A. J., Beck, B., Fletche, D., Lam, P., Tsang, H. H.. 2016. Temporal analysis of radical dark web forum users. 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). :880–883.

Extremist groups have turned to the Internet and social media sites as a means of sharing information amongst one another. This research study analyzes forum posts and finds people who show radical tendencies through the use of natural language processing and sentiment analysis. The forum data being used are from six Islamic forums on the Dark Web which are made available for security research. This research project uses a POS tagger to isolate keywords and nouns that can be utilized with the sentiment analysis program. Then the sentiment analysis program determines the polarity of the post. The post is scored as either positive or negative. These scores are then divided into monthly radical scores for each user. Once these time clusters are mapped, the change in opinions of the users over time may be interpreted as rising or falling levels of radicalism. Each user is then compared on a timeline to other radical users and events to determine possible connections or relationships. The ability to analyze a forum for an overall change in attitude can be an indicator of unrest and possible radical actions or terrorism.

2017-09-15

Alabdulmohsin, Ibrahim, Han, YuFei, Shen, Yun, Zhang, XiangLiang. 2016. Content-Agnostic Malware Detection in Heterogeneous Malicious Distribution Graph. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. :2395–2400.

Malware detection has been widely studied by analysing either file dropping relationships or characteristics of the file distribution network. This paper, for the first time, studies a global heterogeneous malware delivery graph fusing file dropping relationship and the topology of the file distribution network. The integration offers a unique ability of structuring the end-to-end distribution relationship. However, it brings large heterogeneous graphs to analysis. In our study, an average daily generated graph has more than 4 million edges and 2.7 million nodes that differ in type, such as IPs, URLs, and files. We propose a novel Bayesian label propagation model to unify the multi-source information, including content-agnostic features of different node types and topological information of the heterogeneous network. Our approach does not need to examine the source codes nor inspect the dynamic behaviours of a binary. Instead, it estimates the maliciousness of a given file through a semi-supervised label propagation procedure, which has a linear time complexity w.r.t. the number of nodes and edges. The evaluation on 567 million real-world download events validates that our proposed approach efficiently detects malware with a high accuracy.

Cyber-Physical Systems Virtual Organization

Read-only archive of site from September 29, 2023.

Biblio