Biblio

Found 350 results

Filters: Keyword is data mining
2019-03-04
Gugelmann, D., Sommer, D., Lenders, V., Happe, M., Vanbever, L..  2018.  Screen watermarking for data theft investigation and attribution. 2018 10th International Conference on Cyber Conflict (CyCon). :391–408.
Organizations not only need to defend their IT systems against external cyber attackers, but also from malicious insiders, that is, agents who have infiltrated an organization or malicious members stealing information for their own profit. In particular, malicious insiders can leak a document by simply opening it and taking pictures of the document displayed on the computer screen with a digital camera. Using a digital camera allows a perpetrator to easily avoid a log trail that results from using traditional communication channels, such as sending the document via email. This makes it difficult to identify and prove the identity of the perpetrator. Even a policy prohibiting the use of any device containing a camera cannot eliminate this threat since tiny cameras can be hidden almost everywhere. To address this leakage vector, we propose a novel screen watermarking technique that embeds hidden information on computer screens displaying text documents. The watermark is imperceptible during regular use, but can be extracted from pictures of documents shown on the screen, which allows an organization to reconstruct the place and time of the data leak from recovered leaked pictures. Our approach takes advantage of the fact that the human eye is less sensitive to small luminance changes than digital cameras. We devise a symbol shape that is invisible to the human eye, but still robust to the image artifacts introduced when taking pictures. We complement this symbol shape with an error correction coding scheme that can handle very high bit error rates and retrieve watermarks from cropped and compressed pictures. We show in an experimental user study that our screen watermarks are not perceivable by humans and analyze the robustness of our watermarks against image modifications.
2020-05-08
Katasev, Alexey S., Emaletdinova, Lilia Yu., Kataseva, Dina V..  2018.  Neural Network Spam Filtering Technology. 2018 International Conference on Industrial Engineering, Applications and Manufacturing (ICIEAM). :1–5.

In this paper we address the development of a neural network technology for classifying e-mail messages. We analyze basic methods of spam filtering such as sender IP-address analysis, detection of repeated spam messages, and word-based Bayesian filtering. We offer the neural network technology for solving this problem because neural networks are universal approximators and are effective in addressing classification problems. Also, we offer the scheme of this technology for classifying e-mail messages as “spam” or “not spam”. The creation of an effective neural network model of spam filtering is performed within the knowledge discovery in databases technology. For this, a training set is formed, the neural network model is trained, and its value and classifying ability are estimated. The experimental studies have shown that the developed artificial neural network model is adequate and can be effectively used for e-mail message classification. Thus, in this paper we have shown that an effective neural network model can be used for e-mail message filtering and have presented a scheme for using an artificial neural network model as part of an intelligent e-mail spam filtering system.

2020-10-12
Chung, Wingyan, Liu, Jinwei, Tang, Xinlin, Lai, Vincent S. K..  2018.  Extracting Textual Features of Financial Social Media to Detect Cognitive Hacking. 2018 IEEE International Conference on Intelligence and Security Informatics (ISI). :244–246.
Social media are increasingly reflecting and influencing the behavior of humans and financial markets. Cognitive hacking leverages the influence of social media to spread deceptive information with an intent to gain abnormal profits illegally or to cause losses. Measuring the information content in financial social media can be useful for identifying these attacks. In this paper, we developed an approach to identifying social media features that correlate with abnormal returns of the stocks of companies vulnerable to being targets of cognitive hacking. To test the approach, we collected price data and 865,289 social media messages on four technology companies from July 2017 to June 2018, and extracted features that contributed to abnormal stock movements. Preliminary results show that terms that are simple, motivate actions, incite emotion, and use exaggeration are ranked high in the features of messages associated with abnormal price movements. We also provide selected messages to illustrate the use of these features in potential cognitive hacking attacks.
2019-12-16
Lin, Jerry Chun-Wei, Zhang, Yuyu, Chen, Chun-Hao, Wu, Jimmy Ming-Tai, Chen, Chien-Ming, Hong, Tzung-Pei.  2018.  A Multiple Objective PSO-Based Approach for Data Sanitization. 2018 Conference on Technologies and Applications of Artificial Intelligence (TAAI). :148–151.
In this paper, a multi-objective particle swarm optimization (MOPSO)-based framework is presented to find multiple solutions rather than a single one. The presented grid-based algorithm is used to assign the probability of the non-dominated solution for the next iteration. Based on the designed algorithm, it is unnecessary to pre-define the weights of the side effects for evaluation; instead, the non-dominated solutions can be discovered as alternatives for data sanitization. Extensive experiments are carried out on two datasets to show that the designed grid-based algorithm achieves better performance than the traditional single-objective evolutionary algorithms.
2019-03-04
Husari, G., Niu, X., Chu, B., Al-Shaer, E..  2018.  Using Entropy and Mutual Information to Extract Threat Actions from Cyber Threat Intelligence. 2018 IEEE International Conference on Intelligence and Security Informatics (ISI). :1–6.
With the rapid growth of cyber attacks, cyber threat intelligence (CTI) sharing becomes essential for providing advance threat notice and enabling timely response to cyber attacks. Our goal in this paper is to develop an approach to extract low-level cyber threat actions from publicly available CTI sources in an automated manner to enable timely defense decision making. Specifically, we innovatively and successfully used the metrics of entropy and mutual information from Information Theory to analyze the text in the cybersecurity domain. Combined with some basic NLP techniques, our framework, called ActionMiner, has achieved higher precision and recall than the state-of-the-art Stanford typed dependency parser, which usually works well on general English but not on cybersecurity texts.
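The abstract above names entropy and mutual information as its core metrics without giving formulas. As a rough illustration only, and not ActionMiner's actual implementation, the sketch below computes Shannon entropy and mutual information over (verb, object) pairs. The sample pairs and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def mutual_information(pairs):
    """Mutual information I(X;Y) in bits, estimated from (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# Hypothetical (verb, object) pairs, as might be pulled from threat report text.
pairs = [("delete", "file"), ("delete", "file"),
         ("send", "packet"), ("send", "packet")]
verb_entropy = entropy([v for v, _ in pairs])
verb_object_mi = mutual_information(pairs)
```

In this toy sample each verb fully determines its object, so the mutual information equals the verb entropy. Pairs drawn independently would give a value near zero.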
2019-01-21
Ayoade, G., Chandra, S., Khan, L., Hamlen, K., Thuraisingham, B..  2018.  Automated Threat Report Classification over Multi-Source Data. 2018 IEEE 4th International Conference on Collaboration and Internet Computing (CIC). :236–245.

With an increase in targeted attacks such as advanced persistent threats (APTs), enterprise system defenders require comprehensive frameworks that allow them to collaborate and evaluate their defense systems against such attacks. MITRE has developed a framework which includes a database of different kill-chains, tactics, techniques, and procedures that attackers employ to perform these attacks. In this work, we leverage natural language processing techniques to extract attacker actions from threat report documents generated by different organizations and automatically classify them into standardized tactics and techniques, while providing relevant mitigation advisories for each attack. A naïve method to achieve this is by training a machine learning model to predict labels that associate the reports with relevant categories. In practice, however, sufficient labeled data for model training is not always readily available, so that training and test data come from different sources, resulting in bias. A naïve model would typically underperform in such a situation. We address this major challenge by incorporating an importance weighting scheme called bias correction that efficiently utilizes available labeled data, given threat reports, whose categories are to be automatically predicted. We empirically evaluated our approach on 18,257 real-world threat reports generated between year 2000 and 2018 from various computer security organizations to demonstrate its superiority by comparing its performance with an existing approach.

2018-11-19
Chelaramani, S., Jha, A., Namboodiri, A. M..  2018.  Cross-Modal Style Transfer. 2018 25th IEEE International Conference on Image Processing (ICIP). :2157–2161.

We, humans, have the ability to easily imagine scenes that depict sentences such as “Today is a beautiful sunny day” or “There is a Christmas feel, in the air”. While it is hard to precisely describe what one person may imagine, the essential high-level themes associated with such sentences largely remain the same. The ability to synthesize novel images that depict the feel of a sentence is very useful in a variety of applications such as education, advertisement, and entertainment. While existing papers tackle this problem given a style image, we aim to provide a far more intuitive and easy to use solution that synthesizes novel renditions of an existing image, conditioned on a given sentence. We present a method for cross-modal style transfer between an English sentence and an image, to produce a new image that imbibes the essential theme of the sentence. We do this by modifying the style transfer mechanism used in image style transfer to incorporate a style component derived from the given sentence. We demonstrate promising results using the YFCC100m dataset.

2020-11-04
Deng, Y., Lu, D., Chung, C., Huang, D., Zeng, Z..  2018.  Personalized Learning in a Virtual Hands-on Lab Platform for Computer Science Education. 2018 IEEE Frontiers in Education Conference (FIE). :1–8.

This Innovate Practice full paper presents a cloud-based personalized learning lab platform. Personalized learning is gaining popularity in online computer science education due to its characteristics of pacing the learning progress and adapting the instructional approach to each individual learner from a diverse background. Among various instructional methods in computer science education, hands-on labs have unique requirements of understanding learners' behavior and assessing learners' performance for personalization. However, this is rarely addressed in existing research. In this paper, we propose a personalized learning platform called ThoTh Lab specifically designed for computer science hands-on labs in a cloud environment. ThoTh Lab can identify the learning style from student activities and adapt learning material accordingly. With the awareness of student learning styles, instructors are able to use techniques more suitable for the specific student, and hence, improve the speed and quality of the learning process. With that in mind, ThoTh Lab also provides student performance prediction, which allows the instructors to change the learning progress and take other measures to help the students in a timely manner. For example, instructors may provide more detailed instructions to help slow starters, while assigning more challenging labs to quick learners in the same class. To evaluate ThoTh Lab, we conducted an experiment and collected data from an upper-division cybersecurity class for undergraduate students at Arizona State University in the US. The results show that ThoTh Lab can identify learning style with reasonable accuracy. By leveraging the personalized lab platform for a senior level cybersecurity course, our lab-use study also shows that the presented solution improves student engagement with better understanding of lab assignments, spending more effort on hands-on projects, and thus greatly enhancing learning outcomes.

2019-03-04
Kannavara, R., Vangore, J., Roberts, W., Lindholm, M., Shrivastav, P..  2018.  Automating Threat Intelligence for SDL. 2018 IEEE Cybersecurity Development (SecDev). :137–137.
Threat intelligence is very important in order to execute a well-informed Security Development Lifecycle (SDL). Although there are many readily available solutions supporting tactical threat intelligence focusing on enterprise Information Technology (IT) infrastructure, the lack of threat intelligence solutions focusing on SDL is a known gap which is acknowledged by the security community. To address this shortcoming, we present a solution to automate the process of mining open source threat information sources to deliver product specific threat indicators designed to strategically inform the SDL while continuously monitoring for disclosures of relevant potential vulnerabilities during product design, development, and beyond deployment.
2017-12-20
Hirotomo, M., Nishio, Y., Kamizono, M., Fukuta, Y., Mohri, M., Shiraishi, Y..  2017.  Efficient Method for Analyzing Malicious Websites by Using Multi-Environment Analysis System. 2017 12th Asia Joint Conference on Information Security (AsiaJCIS). :48–54.
The malicious websites used by drive-by download attacks change their behavior according to the web client environment. Single-environment analysis cannot obtain sufficient information about the behavior of malicious websites. Hence, it is difficult to analyze the whole aspect of malicious websites. Also, code obfuscation and cloaking are used in malicious websites to prevent their behavior from being analyzed. In this paper, we propose an analysis method that combines decoding of the obfuscated code with dynamic analysis using a multi-environment analysis system in order to analyze the behavior of malicious websites in detail. Furthermore, we present two approaches to improve the multi-environment analysis. The first one is automation of traffic log analysis to reduce the cost of analyzing huge traffic logs between the environments and malicious websites. The second one is multimodal analysis for finding the URL of malicious websites.
2018-01-10
Gupta, P., Goswami, A., Koul, S., Sartape, K..  2017.  IQS-intelligent querying system using natural language processing. 2017 International conference of Electronics, Communication and Aerospace Technology (ICECA). 2:410–413.
Modern databases contain an enormous amount of information stored in a structured format. This information is processed to acquire knowledge. However, the process of information extraction from a Database System is cumbersome for non-expert users as it requires an extensive knowledge of DBMS languages. Therefore, an inevitable need arises to bridge the gap between user requirements and the provision of a simple information retrieval system whereby the role of a specialized Database Administrator is annulled. In this paper, we propose a methodology for building an Intelligent Querying System (IQS) by which a user can fire queries in his own (natural) language. The system first parses the input sentences and then generates SQL queries from the natural language expressions of the input. These queries are in turn mapped with the desired information to generate the required output. Hence, it makes the information retrieval process simple, effective and reliable.
2020-07-20
Shi, Yang, Wang, Xiaoping, Fan, Hongfei.  2017.  Light-weight white-box encryption scheme with random padding for wearable consumer electronic devices. IEEE Transactions on Consumer Electronics. 63:44–52.
Wearable devices can be potentially captured or accessed in an unauthorized manner because of their physical nature. In such cases, they are in white-box attack contexts, where the adversary may have total visibility on the implementation of the built-in cryptosystem, with full control over its execution platform. Dealing with white-box attacks on wearable devices is undoubtedly a challenge. To serve as a countermeasure against threats in such contexts, we propose a lightweight encryption scheme to protect the confidentiality of data against white-box attacks. We constructed the scheme's encryption and decryption algorithms on a substitution-permutation network that consisted of random secret components. Moreover, the encryption algorithm uses random padding that does not need to be correctly decrypted as part of the input. This feature enables non-bijective linear transformations to be used in each encryption round to achieve strong security. The required storage for static data is relatively small and the algorithms perform well on various devices, which indicates that the proposed scheme satisfies the requirements of wearable computing in terms of limited memory and low computational power.
2017-12-20
Li, S., Wang, B..  2017.  A Method for Hybrid Bayesian Network Structure Learning from Massive Data Using MapReduce. 2017 IEEE 3rd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS). :272–276.
The Bayesian network is a popular and important data mining model for representing uncertain knowledge. For large-scale data it is often too costly to learn the accurate structure. To resolve this problem, much work has been done on migrating the structure learning algorithms to the MapReduce framework. In this paper, we introduce a distributed hybrid structure learning algorithm by combining the advantages of constraint-based and score-and-search-based algorithms. By reusing the intermediate results of MapReduce, the algorithm greatly simplifies the computing work and achieves good results in both efficiency and accuracy.
2018-09-12
Nagaratna, M., Sowmya, Y..  2017.  M-sanit: Computing misusability score and effective sanitization of big data using Amazon elastic MapReduce. 2017 International Conference on Computation of Power, Energy Information and Commuincation (ICCPEIC). :029–035.
The advent of distributed programming frameworks like Hadoop paved the way for processing voluminous data known as big data. Due to the exponential growth of data, enterprises started to exploit the availability of cloud infrastructure for storing and processing big data. Insider attacks on outsourced data cause leakage of sensitive data. Therefore, it is essential to sanitize data so as to preserve privacy or non-disclosure of sensitive data. Privacy Preserving Data Publishing (PPDP) and Privacy Preserving Data Mining (PPDM) are the areas in which data sanitization plays a vital role in preserving privacy. The existing anonymization techniques for MapReduce programming can be improved to have a misusability measure for determining the level of sanitization to be applied to big data. To overcome this limitation we propose a framework known as M-Sanit, which has mechanisms to exploit the misusability score of big data prior to performing sanitization using the MapReduce programming paradigm. Our empirical study using a real-world cloud ecosystem, namely Amazon Elastic Compute Cloud (EC2) and Amazon Elastic MapReduce (EMR), reveals the effectiveness of misusability-score-based sanitization of big data prior to publishing or mining it.
2018-05-01
Wang, X., Zhou, S..  2017.  Accelerated Stochastic Gradient Method for Support Vector Machines Classification with Additive Kernel. 2017 First International Conference on Electronics Instrumentation Information Systems (EIIS). :1–6.

Support vector machines (SVMs) have been widely used for classification in machine learning and data mining. However, SVMs face a huge challenge in large-scale classification tasks. Recent progress has enabled additive-kernel versions of SVM to solve such large-scale problems efficiently, nearly as fast as a linear classifier. This paper proposes a new accelerated mini-batch stochastic gradient descent algorithm for SVM classification with additive kernel (AK-ASGD). On the one hand, the gradient is approximated by the sum of a scalar polynomial function for each feature dimension; on the other hand, Nesterov's acceleration strategy is used. The experimental results on benchmark large-scale classification data sets show that our proposed algorithm achieves higher testing accuracy and a faster convergence rate.
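To make the Nesterov-accelerated mini-batch update concrete, here is a minimal sketch for a plain linear SVM with hinge loss. It is an assumption-laden simplification, not the paper's AK-ASGD: the additive-kernel polynomial approximation is omitted, and the hyperparameters, toy data, and function names are hypothetical.

```python
import random

def hinge_subgradient(w, batch, lam):
    """Subgradient of (lam/2)*||w||^2 + mean hinge loss over the batch."""
    grad = [lam * wi for wi in w]
    for x, y in batch:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin < 1.0:  # only margin violators contribute
            for j, xj in enumerate(x):
                grad[j] -= y * xj / len(batch)
    return grad

def nesterov_sgd(data, dim, lam=0.01, lr=0.1, momentum=0.9,
                 epochs=50, batch_size=2, seed=0):
    """Mini-batch SGD with Nesterov momentum (gradient taken at a look-ahead point)."""
    rng = random.Random(seed)
    w = [0.0] * dim
    v = [0.0] * dim
    for _ in range(epochs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for i in range(0, len(shuffled), batch_size):
            batch = shuffled[i:i + batch_size]
            # Nesterov step: evaluate the gradient where momentum would carry us.
            lookahead = [wi + momentum * vi for wi, vi in zip(w, v)]
            g = hinge_subgradient(lookahead, batch, lam)
            v = [momentum * vi - lr * gi for vi, gi in zip(v, g)]
            w = [wi + vi for wi, vi in zip(w, v)]
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

# Hypothetical, linearly separable toy data; the last feature is a constant bias term.
data = [((2.0, 1.0, 1.0), 1), ((1.0, 2.0, 1.0), 1), ((2.0, 2.0, 1.0), 1),
        ((-2.0, -1.0, 1.0), -1), ((-1.0, -2.0, 1.0), -1), ((-2.0, -2.0, 1.0), -1)]
weights = nesterov_sgd(data, dim=3)
```

The only difference from classical momentum is that the subgradient is evaluated at `w + momentum * v` rather than at `w`.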

2018-06-07
Uwagbole, S. O., Buchanan, W. J., Fan, L..  2017.  An applied pattern-driven corpus to predictive analytics in mitigating SQL injection attack. 2017 Seventh International Conference on Emerging Security Technologies (EST). :12–17.

Emerging computing relies heavily on secure backend storage for the massive size of big data originating from the Internet of Things (IoT) smart devices to the Cloud-hosted web applications. Structured Query Language (SQL) Injection Attack (SQLIA) remains an intruder's exploit of choice to pilfer confidential data from the back-end database with damaging ramifications. The existing approaches all predate this emerging computing in the context of Internet big data mining and as such lack the ability to cope with new signatures concealed in a large volume of web requests over time. Also, these existing approaches were string-lookup approaches aimed at the on-premise application domain boundary, not applicable to roaming Cloud-hosted services' edge Software-Defined Network (SDN) to application endpoints with large web request hits. Using a Machine Learning (ML) approach provides scalable big data mining for SQLIA detection and prevention. Unfortunately, the absence of a corpus to train a classifier is a well-known issue in SQLIA research when applying Artificial Intelligence (AI) techniques. This paper presents an application context pattern-driven corpus to train a supervised learning model. The model is trained with the ML algorithms of Two-Class Support Vector Machine (TC SVM) and Two-Class Logistic Regression (TC LR) implemented on Microsoft Azure Machine Learning (MAML) studio to mitigate SQLIA. The scheme presented here then forms the subject of the empirical evaluation using the Receiver Operating Characteristic (ROC) curve.

2018-03-19
Lee, M., Choi, J., Choi, C., Kim, P..  2017.  APT Attack Behavior Pattern Mining Using the FP-Growth Algorithm. 2017 14th IEEE Annual Consumer Communications Networking Conference (CCNC). :1–4.

Hacking incidents and social concerns regarding Advanced Persistent Threat (APT) attacks continue, and a number of antivirus businesses and researchers are making efforts to analyze such attacks. To prevent or cope with APT attacks, host PC security technologies such as firewalls and intrusion detection systems are used. In this study, malicious behavior patterns were extracted by using an API of PE files. Moreover, the FP-Growth algorithm was used to extract behavior information generated in the host PC in order to overcome the limitations of previous signature-based intrusion detection systems. We will utilize this study as fundamental research for a future system that extracts malicious behavior patterns within networks and APIs.
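The goal of the frequent-pattern step can be illustrated with a brute-force stand-in. The sketch below is not FP-Growth (it builds no FP-tree and is exponential in trace length), but for a given support threshold it returns the same frequent itemsets that FP-Growth would find. The API-call traces and the function name are hypothetical.

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions.

    Brute-force enumeration for illustration; FP-Growth reaches the same
    result without enumerating every subset.
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

# Hypothetical API-call traces, one per observed process.
traces = [
    ["CreateFile", "WriteFile", "RegSetValue"],
    ["CreateFile", "WriteFile"],
    ["CreateFile", "RegSetValue"],
    ["WriteFile", "Connect"],
]
patterns = frequent_itemsets(traces, min_support=2)
```

With `min_support=2`, the pair `("CreateFile", "WriteFile")` survives while anything involving `"Connect"` is dropped.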

2018-03-05
Chen, Q., Bridges, R. A..  2017.  Automated Behavioral Analysis of Malware: A Case Study of WannaCry Ransomware. 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA). :454–460.

Ransomware, a class of self-propagating malware that uses encryption to hold the victims' data ransom, has emerged in recent years as one of the most dangerous cyber threats, with widespread damage; e.g., zero-day ransomware WannaCry has caused world-wide catastrophe, from knocking U.K. National Health Service hospitals offline to shutting down a Honda Motor Company in Japan [1]. Our close collaboration with security operations of large enterprises reveals that defense against ransomware relies on tedious analysis from high-volume systems logs of the first few infections. Sandbox analysis of freshly captured malware is also commonplace in operation. We introduce a method to identify and rank the most discriminating ransomware features from a set of ambient (non-attack) system logs and at least one log stream containing both ambient and ransomware behavior. These ranked features reveal a set of malware actions that are produced automatically from system logs, and can help automate tedious manual analysis. We test our approach using WannaCry and two polymorphic samples by producing logs with Cuckoo Sandbox during both ambient, and ambient plus ransomware executions. Our goal is to extract the features of the malware from the logs with only knowledge that malware was present. We compare outputs with a detailed analysis of WannaCry allowing validation of the algorithm's feature extraction and provide analysis of the method's robustness to variations of input data—changing quality/quantity of ambient data and testing polymorphic ransomware. Most notably, our patterns are accurate and unwavering when generated from polymorphic WannaCry copies, on which 63 (of 63 tested) antivirus (AV) products fail.

Chen, Zhi-Guo, Kang, Ho-Seok, Yin, Shang-Nan, Kim, Sung-Ryul.  2017.  Automatic Ransomware Detection and Analysis Based on Dynamic API Calls Flow Graph. Proceedings of the International Conference on Research in Adaptive and Convergent Systems. :196–201.

In recent cyber incidents, ransom software (ransomware) poses a major threat to the security of computer systems. Consequently, ransomware detection has become a hot topic in computer security. Unfortunately, current signature-based and static detection models are often easily evaded by obfuscation, polymorphism, compression, and encryption. To overcome the shortcomings of signature-based and static ransomware detection approaches, we propose a dynamic ransomware detection system using data mining techniques such as Random Forest (RF), Support Vector Machine (SVM), Simple Logistic (SL) and Naive Bayes (NB) algorithms for detecting known and unknown ransomware. We monitor the actual (dynamic) behaviors of software to generate API calls flow graphs (CFG) and transform them into a feature space. Thereafter, data normalization and feature selection were applied to select informative features which are the best for discriminating between various categories of software and benign software. Finally, the data mining algorithms were used for building the detection model for judging whether the software is benign software or ransomware. Our experimental results show that our proposed system can effectively improve the performance of ransomware detection. In particular, the accuracy and detection rate of our proposed system with the Simple Logistic (SL) algorithm reach 98.2% and 97.6%, respectively. Meanwhile, the false positive rate can be reduced to 1.2%.
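The three reported figures (accuracy, detection rate, false positive rate) all derive from a binary confusion matrix. The sketch below shows the standard definitions, with detection rate taken to mean the true positive rate. The counts are hypothetical and are not the paper's data.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "detection_rate": tp / (tp + fn),        # true positive rate (recall)
        "false_positive_rate": fp / (fp + tn),   # benign samples flagged as ransomware
    }

# Hypothetical counts: 100 ransomware samples and 100 benign samples.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
```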

2018-02-06
Vimalkumar, K., Radhika, N..  2017.  A Big Data Framework for Intrusion Detection in Smart Grids Using Apache Spark. 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI). :198–204.

Technological advancement creates the need for internet access everywhere. The power industry is not an exception to the technological advancement which makes everything smarter. The smart grid is the advanced version of the traditional grid, which makes the system more efficient and self-healing. A synchrophasor is a device used in smart grids to measure the values of electric waves, voltages and current. The phasor measurement unit produces an immense volume of current and voltage data that is used to monitor and control the performance of the grid. These data are huge in size and vulnerable to attacks. Intrusion detection is a common technique for finding intrusions in the system. In this paper, a big data framework is designed using various machine learning techniques, and intrusions are detected based on the classifications applied on the synchrophasor dataset. In this approach, various machine learning techniques such as deep neural networks, support vector machines, random forest, decision trees and naive Bayes classification are applied to the synchrophasor dataset, and the results are compared using metrics of accuracy, recall, false rate, specificity, and prediction time. Feature selection and dimensionality reduction algorithms are used to reduce the prediction time taken by the proposed approach. This paper uses Apache Spark as a platform which is suitable for the implementation of an intrusion detection system in smart grids using big data analytics.

2018-05-01
Erdem, Ö, Turan, M..  2017.  A Case Study for Automatic Detection of Steganographic Images in Network Traffic. 2017 10th International Conference on Electrical and Electronics Engineering (ELECO). :885–889.

Detection and prevention of data breaches in corporate networks is one of the most important security problems of today's world. The techniques and applications proposed for solution are not successful when attackers attempt to steal data using steganography. Steganography is the art of storing data in a file called cover, such as picture, sound and video. The concealed data cannot be directly recognized in the cover. Steganalysis is the process of revealing the presence of embedded messages in these files. There are many statistical and signature based steganalysis algorithms. In this work, the detection of steganographic images with steganalysis techniques is reviewed and a system has been developed which automatically detects steganographic images in network traffic by using open source tools.

2017-12-12
Kimmig, A., Memory, A., Miller, R. J., Getoor, L..  2017.  A Collective, Probabilistic Approach to Schema Mapping. 2017 IEEE 33rd International Conference on Data Engineering (ICDE). :921–932.

We propose a probabilistic approach to the problem of schema mapping. Our approach is declarative, scalable, and extensible. It builds upon recent results in both schema mapping and probabilistic reasoning and contributes novel techniques in both fields. We introduce the problem of mapping selection, that is, choosing the best mapping from a space of potential mappings, given both metadata constraints and a data example. As selection has to reason holistically about the inputs and the dependencies between the chosen mappings, we define a new schema mapping optimization problem which captures interactions between mappings. We then introduce Collective Mapping Discovery (CMD), our solution to this problem using state-of-the-art probabilistic reasoning techniques, which allows for inconsistencies and incompleteness. Using hundreds of realistic integration scenarios, we demonstrate that the accuracy of CMD is more than 33% above that of metadata-only approaches already for small data examples, and that CMD routinely finds perfect mappings even if a quarter of the data is inconsistent.

2018-01-16
Bhaya, W., EbadyManaa, M..  2017.  DDoS attack detection approach using an efficient cluster analysis in large data scale. 2017 Annual Conference on New Trends in Information Communications Technology Applications (NTICT). :168–173.

Distributed Denial of Service (DDoS) attack is a congestion-based attack that makes both the network and host-based resources unavailable for legitimate users by sending flooding attack packets to the victim's resources. The non-existence of predefined rules to correctly identify genuine network flow makes the task of DDoS attack detection very difficult. In this paper, a combination of unsupervised data mining techniques is introduced as an intrusion detection system. The entropy concept, in terms of windowing the incoming packets, is applied with a data mining technique using Clustering Using Representatives (CURE) as cluster analysis to detect DDoS attacks in network flow. The data is mainly collected from the DARPA2000, CAIDA2007 and CAIDA2008 datasets. The proposed approach has been evaluated and compared with several existing approaches in terms of accuracy, false alarm rate, detection rate, F-measure and Phi coefficient. Results indicate the superiority of the proposed approach with four out of five detected phases, more than 99% accuracy rate, 96.29% detection rate, around 0% false alarm rate, 97.98% F-measure, and 97.98% Phi coefficient.
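The entropy-over-windows idea can be sketched as follows, assuming fixed-size, non-overlapping windows over packet source addresses. The window size, the choice of source address as the feature, and the sample stream are assumptions; the abstract does not specify them. The CURE clustering step that would consume these values is not shown.

```python
import math
from collections import Counter

def window_entropies(sources, window_size):
    """Shannon entropy (bits) of source addresses in consecutive, non-overlapping windows.

    A trailing partial window is ignored.
    """
    result = []
    for start in range(0, len(sources) - window_size + 1, window_size):
        window = sources[start:start + window_size]
        counts = Counter(window)
        h = -sum((c / window_size) * math.log2(c / window_size)
                 for c in counts.values())
        result.append(h)
    return result

# Hypothetical packet stream: four distinct senders, then one sender flooding.
stream = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4",
          "10.9.9.9", "10.9.9.9", "10.9.9.9", "10.9.9.9"]
entropies = window_entropies(stream, window_size=4)
```

Here the first window has maximal entropy and the flooding window has zero. A flood with spoofed, randomized sources would push entropy up instead, so in practice a detector looks for deviation from a baseline in either direction.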

2018-06-20
Tran, H., Nguyen, A., Vo, P., Vu, T..  2017.  DNS graph mining for malicious domain detection. 2017 IEEE International Conference on Big Data (Big Data). :4680–4685.

Because malicious domains are a vital component of a variety of cyber attacks, malicious domain detection has become a hot topic in cyber security. Several recent techniques have been proposed to identify malicious domains through analysis of DNS data, because DNS data contain much global information that cannot be affected by attackers. Attackers always recycle resources, so they frequently change domain-IP resolutions and create new domains to avoid detection. Therefore, multiple malicious domains are hosted by the same IPs, and multiple IPs simultaneously host the same malicious domains, which creates intrinsic associations among them. Hence, labeled domains, which can be traced back through the query history of all domains, can be used to verify and work out these associations. Graphs seem the best candidate to represent this relationship, and many high-performance algorithms have been developed on graphs. A graph-based interface can be developed and transformed into the graph mining task of inferring graph nodes' reputation scores using improvements of the belief propagation algorithm. The higher the reputation score a node receives, the more likely it is to be malicious. For demonstration, this paper proposes a malicious domain detection technique and evaluates it on a real-world dataset. The dataset is collected from DNS data servers and is used for building a DNS graph. The proposed technique achieves high performance, with accuracy over 98.3% and precision and recall of 99.1% and 98.6%, respectively. In particular, with a small set of labeled domains (legitimate and malicious), the technique can discover a large set of potentially malicious domains. The results indicate that the method is strongly effective in detecting malicious domains.

2017-12-28
Stuckman, J., Walden, J., Scandariato, R..  2017.  The Effect of Dimensionality Reduction on Software Vulnerability Prediction Models. IEEE Transactions on Reliability. 66:17–37.

Statistical prediction models can be an effective technique to identify vulnerable components in large software projects. Two aspects of vulnerability prediction models have a profound impact on their performance: 1) the features (i.e., the characteristics of the software) that are used as predictors and 2) the way those features are used in the setup of the statistical learning machinery. In a previous work, we compared models based on two different types of features: software metrics and term frequencies (text mining features). In this paper, we broaden the set of models we compare by investigating an array of techniques for the manipulation of said features. These techniques fall under the umbrella of dimensionality reduction and have the potential to improve the ability of a prediction model to localize vulnerabilities. We explore the role of dimensionality reduction through a series of cross-validation and cross-project prediction experiments. Our results show that in the case of software metrics, a dimensionality reduction technique based on confirmatory factor analysis provided an advantage when performing cross-project prediction, yielding the best F-measure for the predictions in five out of six cases. In the case of text mining, feature selection can make the prediction computationally faster, but no dimensionality reduction technique provided any other notable advantage.