Visible to the public Biblio

Filters: Keyword is pattern clustering  [Clear All Filters]
2017-02-14
K. F. Hong, C. C. Chen, Y. T. Chiu, K. S. Chou.  2015.  "Scalable command and control detection in log data through UF-ICF analysis". 2015 International Carnahan Conference on Security Technology (ICCST). :293-298.

During an advanced persistent threat (APT), an attacker group usually establish more than one C&C server and these C&C servers will change their domain names and corresponding IP addresses over time to be unseen by anti-virus software or intrusion prevention systems. For this reason, discovering and catching C&C sites becomes a big challenge in information security. Based on our observations and deductions, a malware tends to contain a fixed user agent string, and the connection behaviors generated by a malware is different from that by a benign service or a normal user. This paper proposed a new method comprising filtering and clustering methods to detect C&C servers with a relatively higher coverage rate. The experiments revealed that the proposed method can successfully detect C&C Servers, and the can provide an important clue for detecting APT.

2015-05-06
Vollmer, T., Manic, M., Linda, O..  2014.  Autonomic Intelligent Cyber-Sensor to Support Industrial Control Network Awareness. Industrial Informatics, IEEE Transactions on. 10:1647-1658.

The proliferation of digital devices in a networked industrial ecosystem, along with an exponential growth in complexity and scope, has resulted in elevated security concerns and management complexity issues. This paper describes a novel architecture utilizing concepts of autonomic computing and a simple object access protocol (SOAP)-based interface to metadata access points (IF-MAP) external communication layer to create a network security sensor. This approach simplifies integration of legacy software and supports a secure, scalable, and self-managed framework. The contribution of this paper is twofold: 1) A flexible two-level communication layer based on autonomic computing and service oriented architecture is detailed and 2) three complementary modules that dynamically reconfigure in response to a changing environment are presented. One module utilizes clustering and fuzzy logic to monitor traffic for abnormal behavior. Another module passively monitors network traffic and deploys deceptive virtual network hosts. These components of the sensor system were implemented in C++ and PERL and utilize a common internal D-Bus communication mechanism. A proof of concept prototype was deployed on a mixed-use test network showing the possible real-world applicability. In testing, 45 of the 46 network attached devices were recognized and 10 of the 12 emulated devices were created with specific operating system and port configurations. In addition, the anomaly detection algorithm achieved a 99.9% recognition rate. All output from the modules were correctly distributed using the common communication structure.

Vollmer, T., Manic, M., Linda, O..  2014.  Autonomic Intelligent Cyber-Sensor to Support Industrial Control Network Awareness. Industrial Informatics, IEEE Transactions on. 10:1647-1658.

The proliferation of digital devices in a networked industrial ecosystem, along with an exponential growth in complexity and scope, has resulted in elevated security concerns and management complexity issues. This paper describes a novel architecture utilizing concepts of autonomic computing and a simple object access protocol (SOAP)-based interface to metadata access points (IF-MAP) external communication layer to create a network security sensor. This approach simplifies integration of legacy software and supports a secure, scalable, and self-managed framework. The contribution of this paper is twofold: 1) A flexible two-level communication layer based on autonomic computing and service oriented architecture is detailed and 2) three complementary modules that dynamically reconfigure in response to a changing environment are presented. One module utilizes clustering and fuzzy logic to monitor traffic for abnormal behavior. Another module passively monitors network traffic and deploys deceptive virtual network hosts. These components of the sensor system were implemented in C++ and PERL and utilize a common internal D-Bus communication mechanism. A proof of concept prototype was deployed on a mixed-use test network showing the possible real-world applicability. In testing, 45 of the 46 network attached devices were recognized and 10 of the 12 emulated devices were created with specific operating system and port configurations. In addition, the anomaly detection algorithm achieved a 99.9% recognition rate. All output from the modules were correctly distributed using the common communication structure.

2015-05-05
Lomotey, R.K., Deters, R..  2014.  Terms Mining in Document-Based NoSQL: Response to Unstructured Data. Big Data (BigData Congress), 2014 IEEE International Congress on. :661-668.

Unstructured data mining has become topical recently due to the availability of high-dimensional and voluminous digital content (known as "Big Data") across the enterprise spectrum. The Relational Database Management Systems (RDBMS) have been employed over the past decades for content storage and management, but, the ever-growing heterogeneity in today's data calls for a new storage approach. Thus, the NoSQL database has emerged as the preferred storage facility nowadays since the facility supports unstructured data storage. This creates the need to explore efficient data mining techniques from such NoSQL systems since the available tools and frameworks which are designed for RDBMS are often not directly applicable. In this paper, we focused on topics and terms mining, based on clustering, in document-based NoSQL. This is achieved by adapting the architectural design of an analytics-as-a-service framework and the proposal of the Viterbi algorithm to enhance the accuracy of the terms classification in the system. The results from the pilot testing of our work show higher accuracy in comparison to some previously proposed techniques such as the parallel search.

Eun Hee Ko, Klabjan, D..  2014.  Semantic Properties of Customer Sentiment in Tweets. Advanced Information Networking and Applications Workshops (WAINA), 2014 28th International Conference on. :657-663.

An increasing number of people are using online social networking services (SNSs), and a significant amount of information related to experiences in consumption is shared in this new media form. Text mining is an emerging technique for mining useful information from the web. We aim at discovering in particular tweets semantic patterns in consumers' discussions on social media. Specifically, the purposes of this study are twofold: 1) finding similarity and dissimilarity between two sets of textual documents that include consumers' sentiment polarities, two forms of positive vs. negative opinions and 2) driving actual content from the textual data that has a semantic trend. The considered tweets include consumers' opinions on US retail companies (e.g., Amazon, Walmart). Cosine similarity and K-means clustering methods are used to achieve the former goal, and Latent Dirichlet Allocation (LDA), a popular topic modeling algorithm, is used for the latter purpose. This is the first study which discover semantic properties of textual data in consumption context beyond sentiment analysis. In addition to major findings, we apply LDA (Latent Dirichlet Allocations) to the same data and drew latent topics that represent consumers' positive opinions and negative opinions on social media.

Craig, P., Roa Seiler, N., Olvera Cervantes, A.D..  2014.  Animated Geo-temporal Clusters for Exploratory Search in Event Data Document Collections. Information Visualisation (IV), 2014 18th International Conference on. :157-163.

This paper presents a novel visual analytics technique developed to support exploratory search tasks for event data document collections. The technique supports discovery and exploration by clustering results and overlaying cluster summaries onto coordinated timeline and map views. Users can also explore and interact with search results by selecting clusters to filter and re-cluster the data with animation used to smooth the transition between views. The technique demonstrates a number of advantages over alternative methods for displaying and exploring geo-referenced search results and spatio-temporal data. Firstly, cluster summaries can be presented in a manner that makes them easy to read and scan. Listing representative events from each cluster also helps the process of discovery by preserving the diversity of results. Also, clicking on visual representations of geo-temporal clusters provides a quick and intuitive way to navigate across space and time simultaneously. This removes the need to overload users with the display of too many event labels at any one time. The technique was evaluated with a group of nineteen users and compared with an equivalent text based exploratory search engine.
 

Kaci, A., Kamwa, I., Dessaint, L.A., Guillon, S..  2014.  Synchrophasor Data Baselining and Mining for Online Monitoring of Dynamic Security Limits. Power Systems, IEEE Transactions on. 29:2681-2695.

When the system is in normal state, actual SCADA measurements of power transfers across critical interfaces are continuously compared with limits determined offline and stored in look-up tables or nomograms in order to assess whether the network is secure or insecure and inform the dispatcher to take preventive action in the latter case. However, synchrophasors could change this paradigm by enabling new features, the phase-angle differences, which are well-known measures of system stress, with the added potential to increase system visibility. The paper develops a systematic approach to baseline the phase-angles versus actual transfer limits across system interfaces and enable synchrophasor-based situational awareness (SBSA). Statistical methods are first used to determine seasonal exceedance levels of angle shifts that can allow real-time scoring and detection of atypical conditions. Next, key buses suitable for SBSA are identified using correlation and partitioning around medoid (PAM) clustering. It is shown that angle shifts of this subset of 15% of the network backbone buses can be effectively used as features in ensemble decision tree-based forecasting of seasonal security margins across critical interfaces.
 

Hang Shao, Japkowicz, N., Abielmona, R., Falcon, R..  2014.  Vessel track correlation and association using fuzzy logic and Echo State Networks. Evolutionary Computation (CEC), 2014 IEEE Congress on. :2322-2329.

Tracking moving objects is a task of the utmost importance to the defence community. As this task requires high accuracy, rather than employing a single detector, it has become common to use multiple ones. In such cases, the tracks produced by these detectors need to be correlated (if they belong to the same sensing modality) or associated (if they were produced by different sensing modalities). In this work, we introduce Computational-Intelligence-based methods for correlating and associating various contacts and tracks pertaining to maritime vessels in an area of interest. Fuzzy k-Nearest Neighbours will be used to conduct track correlation and Fuzzy C-Means clustering will be applied for association. In that way, the uncertainty of the track correlation and association is handled through fuzzy logic. To better model the state of the moving target, the traditional Kalman Filter will be extended using an Echo State Network. Experimental results on five different types of sensing systems will be discussed to justify the choices made in the development of our approach. In particular, we will demonstrate the judiciousness of using Fuzzy k-Nearest Neighbours and Fuzzy C-Means on our tracking system and show how the extension of the traditional Kalman Filter by a recurrent neural network is superior to its extension by other methods.

Lei Xu, Pham Dang Khoa, Seung Hun Kim, Won Woo Ro, Weidong Shi.  2014.  LUT based secure cloud computing #x2014; An implementation using FPGAs. ReConFigurable Computing and FPGAs (ReConFig), 2014 International Conference on. :1-6.

Cloud computing is widely deployed to handle challenges such as big data processing and storage. Due to the outsourcing and sharing feature of cloud computing, security is one of the main concerns that hinders the end users to shift their businesses to the cloud. A lot of cryptographic techniques have been proposed to alleviate the data security issues in cloud computing, but most of these works focus on solving a specific security problem such as data sharing, comparison, searching, etc. At the same time, little efforts have been done on program security and formalization of the security requirements in the context of cloud computing. We propose a formal definition of the security of cloud computing, which captures the essence of the security requirements of both data and program. Analysis of some existing technologies under the proposed definition shows the effectiveness of the definition. We also give a simple look-up table based solution for secure cloud computing which satisfies the given definition. As FPGA uses look-up table as its main computation component, it is a suitable hardware platform for the proposed secure cloud computing scheme. So we use FPGAs to implement the proposed solution for k-means clustering algorithm, which shows the effectiveness of the proposed solution.
 

2015-05-04
Honghui Dong, Xiaoqing Ding, Mingchao Wu, Yan Shi, Limin Jia, Yong Qin, Lianyu Chu.  2014.  Urban traffic commuting analysis based on mobile phone data. Intelligent Transportation Systems (ITSC), 2014 IEEE 17th International Conference on. :611-616.

With the urban traffic planning and management development, it is a highly considerable issue to analyze and estimate the original-destination data in the city. Traditional method to acquire the OD information usually uses household survey, which is inefficient and expensive. In this paper, the new methodology proposed that using mobile phone data to analyze the mechanism of trip generation, trip attraction and the OD information. The mobile phone data acquisition is introduced. A pilot study is implemented on Beijing by using the new method. And, much important traffic information can be extracted from the mobile phone data. We use the K-means clustering algorithm to divide the traffic zone. The attribution of traffic zone is identified using the mobile phone data. Then the OD distribution and the commuting travel are analyzed. At last, an experiment is done to verify availability of the mobile phone data, that analyzing the "Traffic tide phenomenon" in Beijing. The results of the experiments in this paper show a great correspondence to the actual situation. The validated results reveal the mobile phone data has tremendous potential on OD analysis.
 

Yun Shen, Thonnard, O..  2014.  MR-TRIAGE: Scalable multi-criteria clustering for big data security intelligence applications. Big Data (Big Data), 2014 IEEE International Conference on. :627-635.

Security companies have recently realised that mining massive amounts of security data can help generate actionable intelligence and improve their understanding of Internet attacks. In particular, attack attribution and situational understanding are considered critical aspects to effectively deal with emerging, increasingly sophisticated Internet attacks. This requires highly scalable analysis tools to help analysts classify, correlate and prioritise security events, depending on their likely impact and threat level. However, this security data mining process typically involves a considerable amount of features interacting in a non-obvious way, which makes it inherently complex. To deal with this challenge, we introduce MR-TRIAGE, a set of distributed algorithms built on MapReduce that can perform scalable multi-criteria data clustering on large security data sets and identify complex relationships hidden in massive datasets. The MR-TRIAGE workflow is made of a scalable data summarisation, followed by scalable graph clustering algorithms in which we integrate multi-criteria evaluation techniques. Theoretical computational complexity of the proposed parallel algorithms are discussed and analysed. The experimental results demonstrate that the algorithms can scale well and efficiently process large security datasets on commodity hardware. Our approach can effectively cluster any type of security events (e.g., spam emails, spear-phishing attacks, etc) that are sharing at least some commonalities among a number of predefined features.
 

2015-04-30
Saoud, Z., Faci, N., Maamar, Z., Benslimane, D..  2014.  A Fuzzy Clustering-Based Credibility Model for Trust Assessment in a Service-Oriented Architecture. WETICE Conference (WETICE), 2014 IEEE 23rd International. :56-61.

This paper presents a credibility model to assess trust of Web services. The model relies on consumers' ratings whose accuracy can be questioned due to different biases. A category of consumers known as strict are usually excluded from the process of reaching a majority consensus. We demonstrated that this exclusion should not be. The proposed model reduces the gap between these consumers' ratings and the current majority rating. Fuzzy clustering is used to compute consumers' credibility. To validate this model a set of experiments are carried out.