Visible to the public Biblio

Filters: Keyword is k-means  [Clear All Filters]
2021-12-21
Li, Yan, Lu, Yifei, Li, Shuren.  2021.  EZAC: Encrypted Zero-Day Applications Classification Using CNN and K-Means. 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD). :378–383.
With the rapid development of traffic encryption technology and the continuous emergence of various network services, the classification of encrypted zero-day applications has become a major challenge in network supervision. More seriously, many attackers will utilize zero-day applications to hide their attack behaviors and make attack undetectable. However, there are very few existing studies on zero-day applications. Existing works usually select and label zero-day applications from unlabeled datasets, and these are not true zero-day applications classification. To address the classification of zero-day applications, this paper proposes an Encrypted Zero-day Applications Classification (EZAC) method that combines Convolutional Neural Network (CNN) and K-Means, which can effectively classify zero-day applications. We first use CNN to classify the flows, and for the flows that may be zero-day applications, we use K-Means to divide them into several categories, which are then manually labeled. Experimental results show that the EZAC achieves 97.4% accuracy on a public dataset (CIC-Darknet2020), which outperforms the state-of-the-art methods.
2021-11-29
Gao, Hongjun, Liu, Youbo, Liu, Zhenyu, Xu, Song, Wang, Renjun, Xiang, Enmin, Yang, Jie, Qi, Mohan, Zhao, Yinbo, Pan, Hongjin et al..  2020.  Optimal Planning of Distribution Network Based on K-Means Clustering. 2020 IEEE 4th Conference on Energy Internet and Energy System Integration (EI2). :2135–2139.
The reform of electricity marketization has bred multiple market agents. In order to maximize the total social benefits on the premise of ensuring the security of the system and taking into account the interests of multiple market agents, a bi-level optimal allocation model of distribution network with multiple agents participating is proposed. The upper level model considers the economic benefits of energy and service providers, which are mainly distributed power investors, energy storage operators and distribution companies. The lower level model considers end-user side economy and actively responds to demand management to ensure the highest user satisfaction. The K-means multi scenario analysis method is used to describe the time series characteristics of wind power, photovoltaic power and load. The particle swarm optimization (PSO) algorithm is used to solve the bi-level model, and IEEE33 node system is used to verify that the model can effectively consider the interests of multiple agents while ensuring the security of the system.
2021-06-30
Xu, Hui, Zhang, Wei, Gao, Man, Chen, Hongwei.  2020.  Clustering Analysis for Big Data in Network Security Domain Using a Spark-Based Method. 2020 IEEE 5th International Symposium on Smart and Wireless Systems within the Conferences on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS-SWS). :1—4.
Considering the problem of network security under the background of big data, the clustering analysis algorithms can be utilized to improve the correctness of network intrusion detection models for security management. As a kind of iterative clustering analysis algorithm, K-means algorithm is not only simple but also efficient, so it is widely used. However, the traditional K-means algorithm cannot well solve the network security problem when facing big data due to its high complexity and limited processing ability. In this case, this paper proposes to optimize the traditional K-means algorithm based on the Spark platform and deploy the optimized clustering analysis algorithm in the distributed architecture, so as to improve the efficiency of clustering algorithm for network intrusion detection in big data environment. The experimental result shows that, compared with the traditional K-means algorithm, the efficiency of the optimized K-means algorithm using a Spark-based method is significantly improved in the running time.
2020-05-22
Chen, Yalin, Li, Zhiyang, Shi, Jia, Liu, Zhaobin, Qu, Wenyu.  2018.  Stacked K-Means Hashing Quantization for Nearest Neighbor Search. 2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM). :1—4.
Nowadays, with such a huge amount of information available online, one key challenge is how to retrieve target data efficiently. A recent state-of-art solution, k-means hashing (KMH), codes data via a string of binary code obtained by iterative k-means clustering and binary code optimizing. To deal with high dimensional data, KMH divides the space into low-dimensional subspaces, places a hypercube in each subspace and finds its proper location by the mentioned optimizing process. However, the complexity of the optimization increases rapidly when the dimension of the hypercube increases. To address this issue, we propose an improved hashing method stacked k-means hashing (SKMH). The main idea is to increase the approximation by a coarse-to-fine multi-layer lower-dimensional cubes. With these kinds of lower-dimensional cubes, SKMH can achieve a similar approximation ability via a less optimizing time, compared with KMH method using higher-dimensional cubes. Extensive experiments have been conducted on two public databases, demonstrating the performance of our method by some common metrics in fast nearest neighbor search.
2020-05-18
Sel, Slhami, Hanbay, Davut.  2019.  E-Mail Classification Using Natural Language Processing. 2019 27th Signal Processing and Communications Applications Conference (SIU). :1–4.
Thanks to the rapid increase in technology and electronic communications, e-mail has become a serious communication tool. In many applications such as business correspondence, reminders, academic notices, web page memberships, e-mail is used as primary way of communication. If we ignore spam e-mails, there remain hundreds of e-mails received every day. In order to determine the importance of received e-mails, the subject or content of each e-mail must be checked. In this study we proposed an unsupervised system to classify received e-mails. Received e-mails' coordinates are determined by a method of natural language processing called as Word2Vec algorithm. According to the similarities, processed data are grouped by k-means algorithm with an unsupervised training model. In this study, 10517 e-mails were used in training. The success of the system is tested on a test group of 200 e-mails. In the test phase M3 model (window size 3, min. Word frequency 10, Gram skip) consolidated the highest success (91%). Obtained results are evaluated in section VI.
2020-03-23
Naik, Nitin, Jenkins, Paul, Gillett, Jonathan, Mouratidis, Haralambos, Naik, Kshirasagar, Song, Jingping.  2019.  Lockout-Tagout Ransomware: A Detection Method for Ransomware using Fuzzy Hashing and Clustering. 2019 IEEE Symposium Series on Computational Intelligence (SSCI). :641–648.

Ransomware attacks are a prevalent cybersecurity threat to every user and enterprise today. This is attributed to their polymorphic behaviour and dispersion of inexhaustible versions due to the same ransomware family or threat actor. A certain ransomware family or threat actor repeatedly utilises nearly the same style or codebase to create a vast number of ransomware versions. Therefore, it is essential for users and enterprises to keep well-informed about this threat landscape and adopt proactive prevention strategies to minimise its spread and affects. This requires a technique to detect ransomware samples to determine the similarity and link with the known ransomware family or threat actor. Therefore, this paper presents a detection method for ransomware by employing a combination of a similarity preserving hashing method called fuzzy hashing and a clustering method. This detection method is applied on the collected WannaCry/WannaCryptor ransomware samples utilising a range of fuzzy hashing and clustering methods. The clustering results of various clustering methods are evaluated through the use of the internal evaluation indexes to determine the accuracy and consistency of their clustering results, thus the effective combination of fuzzy hashing and clustering method as applied to the particular ransomware corpus. The proposed detection method is a static analysis method, which requires fewer computational overheads and performs rapid comparative analysis with respect to other static analysis methods.

2019-07-01
Meryem, Amar, Samira, Douzi, Bouabid, El Ouahidi.  2018.  Enhancing Cloud Security Using Advanced MapReduce K-means on Log Files. Proceedings of the 2018 International Conference on Software Engineering and Information Management. :63–67.

Many customers ranked cloud security as a major challenge that threaten their work and reduces their trust on cloud service's provider. Hence, a significant improvement is required to establish better adaptations of security measures that suit recent technologies and especially distributed architectures. Considering the meaningful recorded data in cloud generated log files, making analysis on them, mines insightful value about hacker's activities. It identifies malicious user behaviors and predicts new suspected events. Not only that, but centralizing log files, prevents insiders from causing damage to system. In this paper, we proposed to take away sensitive log files into a single server provider and combining both MapReduce programming and k-means on the same algorithm to cluster observed events into classes having similar features. To label unknown user behaviors and predict new suspected activities this approach considers cosine distances and deviation metrics.

2018-09-28
Song, Youngho, Shin, Young-sung, Jang, Miyoung, Chang, Jae-Woo.  2017.  Design and implementation of HDFS data encryption scheme using ARIA algorithm on Hadoop. 2017 IEEE International Conference on Big Data and Smart Computing (BigComp). :84–90.

Hadoop is developed as a distributed data processing platform for analyzing big data. Enterprises can analyze big data containing users' sensitive information by using Hadoop and utilize them for their marketing. Therefore, researches on data encryption have been widely done to protect the leakage of sensitive data stored in Hadoop. However, the existing researches support only the AES international standard data encryption algorithm. Meanwhile, the Korean government selected ARIA algorithm as a standard data encryption scheme for domestic usages. In this paper, we propose a HDFS data encryption scheme which supports both ARIA and AES algorithms on Hadoop. First, the proposed scheme provides a HDFS block-splitting component that performs ARIA/AES encryption and decryption under the Hadoop distributed computing environment. Second, the proposed scheme provides a variable-length data processing component that can perform encryption and decryption by adding dummy data, in case when the last data block does not contains 128-bit data. Finally, we show from performance analysis that our proposed scheme is efficient for various applications, such as word counting, sorting, k-Means, and hierarchical clustering.

2018-06-11
Hu, Qinghao, Wu, Jiaxiang, Bai, Lu, Zhang, Yifan, Cheng, Jian.  2017.  Fast K-means for Large Scale Clustering. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. :2099–2102.

K-means algorithm has been widely used in machine learning and data mining due to its simplicity and good performance. However, the standard k-means algorithm would be quite slow for clustering millions of data into thousands of or even tens of thousands of clusters. In this paper, we propose a fast k-means algorithm named multi-stage k-means (MKM) which uses a multi-stage filtering approach. The multi-stage filtering approach greatly accelerates the k-means algorithm via a coarse-to-fine search strategy. To further speed up the algorithm, hashing is introduced to accelerate the assignment step which is the most time-consuming part in k-means. Extensive experiments on several massive datasets show that the proposed algorithm can obtain up to 600X speed-up over the k-means algorithm with comparable accuracy.

2018-05-16
Angelidakis, Haris, Makarychev, Konstantin, Makarychev, Yury.  2017.  Algorithms for Stable and Perturbation-resilient Problems. Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. :438–451.

We study the notion of stability and perturbation resilience introduced by Bilu and Linial (2010) and Awasthi, Blum, and Sheffet (2012). A combinatorial optimization problem is α-stable or α-perturbation-resilient if the optimal solution does not change when we perturb all parameters of the problem by a factor of at most α. In this paper, we give improved algorithms for stable instances of various clustering and combinatorial optimization problems. We also prove several hardness results. We first give an exact algorithm for 2-perturbation resilient instances of clustering problems with natural center-based objectives. The class of clustering problems with natural center-based objectives includes such problems as k-means, k-median, and k-center. Our result improves upon the result of Balcan and Liang (2016), who gave an algorithm for clustering 1+√2≈2.41 perturbation-resilient instances. Our result is tight in the sense that no polynomial-time algorithm can solve (2−ε)-perturbation resilient instances of k-center unless NP = RP, as was shown by Balcan, Haghtalab, and White (2016). We then give an exact algorithm for (2−2/k)-stable instances of Minimum Multiway Cut with k terminals, improving the previous result of Makarychev, Makarychev, and Vijayaraghavan (2014), who gave an algorithm for 4-stable instances. We also give an algorithm for (2−2/k+δ)-weakly stable instances of Minimum Multiway Cut. Finally, we show that there are no robust polynomial-time algorithms for n1−ε-stable instances of Set Cover, Minimum Vertex Cover, and Min 2-Horn Deletion (unless P = NP).

2017-06-05
Kirchler, Matthias, Herrmann, Dominik, Lindemann, Jens, Kloft, Marius.  2016.  Tracked Without a Trace: Linking Sessions of Users by Unsupervised Learning of Patterns in Their DNS Traffic. Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security. :23–34.

Behavior-based tracking is an unobtrusive technique that allows observers to monitor user activities on the Internet over long periods of time – in spite of changing IP addresses. Previous work has employed supervised classifiers in order to link the sessions of individual users. However, classifiers need labeled training sessions, which are difficult to obtain for observers. In this paper we show how this limitation can be overcome with an unsupervised learning technique. We present a modified k-means algorithm and evaluate it on a realistic dataset that contains the Domain Name System (DNS) queries of 3,862 users. For this purpose, we simulate an observer that tries to track all users, and an Internet Service Provider that assigns a different IP address to every user on every day. The highest tracking accuracy is achieved within the subgroup of highly active users. Almost all sessions of 73% of the users in this subgroup can be linked over a period of 56 days. 19% of the highly active users can be traced completely, i.e., all their sessions are assigned to a single cluster. This fraction increases to 40% for shorter periods of seven days. As service providers may engage in behavior-based tracking to complement their existing profiling efforts, it constitutes a severe privacy threat for users of online services. Users can defend against behavior-based tracking by changing their IP address frequently, but this is cumbersome at the moment.

2017-03-08
Chammas, E., Mokbel, C., Likforman-Sulem, L..  2015.  Arabic handwritten document preprocessing and recognition. 2015 13th International Conference on Document Analysis and Recognition (ICDAR). :451–455.

Arabic handwritten documents present specific challenges due to the cursive nature of the writing and the presence of diacritical marks. Moreover, one of the largest labeled database of Arabic handwritten documents, the OpenHart-NIST database includes specific noise, namely guidelines, that has to be addressed. We propose several approaches to process these documents. First a guideline detection approach has been developed, based on K-means, that detects the documents that include guidelines. We then propose a series of preprocessing at text-line level to reduce the noise effects. For text-lines including guidelines, a guideline removal preprocessing is described and existing keystroke restoration approaches are assessed. In addition, we propose a preprocessing that combines noise removal and deskewing by removing line fragments from neighboring text lines, while searching for the principal orientation of the text-line. We provide recognition results, showing the significant improvement brought by the proposed processings.

2015-05-06
Sumit, S., Mitra, D., Gupta, D..  2014.  Proposed Intrusion Detection on ZRP based MANET by effective k-means clustering method of data mining. Optimization, Reliabilty, and Information Technology (ICROIT), 2014 International Conference on. :156-160.

Mobile Ad-Hoc Networks (MANET) consist of peer-to-peer infrastructure less communicating nodes that are highly dynamic. As a result, routing data becomes more challenging. Ultimately routing protocols for such networks face the challenges of random topology change, nature of the link (symmetric or asymmetric) and power requirement during data transmission. Under such circumstances both, proactive as well as reactive routing are usually inefficient. We consider, zone routing protocol (ZRP) that adds the qualities of the proactive (IARP) and reactive (IERP) protocols. In ZRP, an updated topological map of zone centered on each node, is maintained. Immediate routes are available inside each zone. In order to communicate outside a zone, a route discovery mechanism is employed. The local routing information of the zones helps in this route discovery procedure. In MANET security is always an issue. It is possible that a node can turn malicious and hamper the normal flow of packets in the MANET. In order to overcome such issue we have used a clustering technique to separate the nodes having intrusive behavior from normal behavior. We call this technique as effective k-means clustering which has been motivated from k-means. We propose to implement Intrusion Detection System on each node of the MANET which is using ZRP for packet flow. Then we will use effective k-means to separate the malicious nodes from the network. Thus, our Ad-Hoc network will be free from any malicious activity and normal flow of packets will be possible.

Sumit, S., Mitra, D., Gupta, D..  2014.  Proposed Intrusion Detection on ZRP based MANET by effective k-means clustering method of data mining. Optimization, Reliabilty, and Information Technology (ICROIT), 2014 International Conference on. :156-160.

Mobile Ad-Hoc Networks (MANET) consist of peer-to-peer infrastructure less communicating nodes that are highly dynamic. As a result, routing data becomes more challenging. Ultimately routing protocols for such networks face the challenges of random topology change, nature of the link (symmetric or asymmetric) and power requirement during data transmission. Under such circumstances both, proactive as well as reactive routing are usually inefficient. We consider, zone routing protocol (ZRP) that adds the qualities of the proactive (IARP) and reactive (IERP) protocols. In ZRP, an updated topological map of zone centered on each node, is maintained. Immediate routes are available inside each zone. In order to communicate outside a zone, a route discovery mechanism is employed. The local routing information of the zones helps in this route discovery procedure. In MANET security is always an issue. It is possible that a node can turn malicious and hamper the normal flow of packets in the MANET. In order to overcome such issue we have used a clustering technique to separate the nodes having intrusive behavior from normal behavior. We call this technique as effective k-means clustering which has been motivated from k-means. We propose to implement Intrusion Detection System on each node of the MANET which is using ZRP for packet flow. Then we will use effective k-means to separate the malicious nodes from the network. Thus, our Ad-Hoc network will be free from any malicious activity and normal flow of packets will be possible.