Biblio
Ransomware attacks are a prevalent cybersecurity threat to every user and enterprise today. This is attributed to their polymorphic behaviour and dispersion of inexhaustible versions due to the same ransomware family or threat actor. A certain ransomware family or threat actor repeatedly utilises nearly the same style or codebase to create a vast number of ransomware versions. Therefore, it is essential for users and enterprises to keep well-informed about this threat landscape and adopt proactive prevention strategies to minimise its spread and affects. This requires a technique to detect ransomware samples to determine the similarity and link with the known ransomware family or threat actor. Therefore, this paper presents a detection method for ransomware by employing a combination of a similarity preserving hashing method called fuzzy hashing and a clustering method. This detection method is applied on the collected WannaCry/WannaCryptor ransomware samples utilising a range of fuzzy hashing and clustering methods. The clustering results of various clustering methods are evaluated through the use of the internal evaluation indexes to determine the accuracy and consistency of their clustering results, thus the effective combination of fuzzy hashing and clustering method as applied to the particular ransomware corpus. The proposed detection method is a static analysis method, which requires fewer computational overheads and performs rapid comparative analysis with respect to other static analysis methods.
Many customers ranked cloud security as a major challenge that threaten their work and reduces their trust on cloud service's provider. Hence, a significant improvement is required to establish better adaptations of security measures that suit recent technologies and especially distributed architectures. Considering the meaningful recorded data in cloud generated log files, making analysis on them, mines insightful value about hacker's activities. It identifies malicious user behaviors and predicts new suspected events. Not only that, but centralizing log files, prevents insiders from causing damage to system. In this paper, we proposed to take away sensitive log files into a single server provider and combining both MapReduce programming and k-means on the same algorithm to cluster observed events into classes having similar features. To label unknown user behaviors and predict new suspected activities this approach considers cosine distances and deviation metrics.
Hadoop is developed as a distributed data processing platform for analyzing big data. Enterprises can analyze big data containing users' sensitive information by using Hadoop and utilize them for their marketing. Therefore, researches on data encryption have been widely done to protect the leakage of sensitive data stored in Hadoop. However, the existing researches support only the AES international standard data encryption algorithm. Meanwhile, the Korean government selected ARIA algorithm as a standard data encryption scheme for domestic usages. In this paper, we propose a HDFS data encryption scheme which supports both ARIA and AES algorithms on Hadoop. First, the proposed scheme provides a HDFS block-splitting component that performs ARIA/AES encryption and decryption under the Hadoop distributed computing environment. Second, the proposed scheme provides a variable-length data processing component that can perform encryption and decryption by adding dummy data, in case when the last data block does not contains 128-bit data. Finally, we show from performance analysis that our proposed scheme is efficient for various applications, such as word counting, sorting, k-Means, and hierarchical clustering.
K-means algorithm has been widely used in machine learning and data mining due to its simplicity and good performance. However, the standard k-means algorithm would be quite slow for clustering millions of data into thousands of or even tens of thousands of clusters. In this paper, we propose a fast k-means algorithm named multi-stage k-means (MKM) which uses a multi-stage filtering approach. The multi-stage filtering approach greatly accelerates the k-means algorithm via a coarse-to-fine search strategy. To further speed up the algorithm, hashing is introduced to accelerate the assignment step which is the most time-consuming part in k-means. Extensive experiments on several massive datasets show that the proposed algorithm can obtain up to 600X speed-up over the k-means algorithm with comparable accuracy.
We study the notion of stability and perturbation resilience introduced by Bilu and Linial (2010) and Awasthi, Blum, and Sheffet (2012). A combinatorial optimization problem is α-stable or α-perturbation-resilient if the optimal solution does not change when we perturb all parameters of the problem by a factor of at most α. In this paper, we give improved algorithms for stable instances of various clustering and combinatorial optimization problems. We also prove several hardness results. We first give an exact algorithm for 2-perturbation resilient instances of clustering problems with natural center-based objectives. The class of clustering problems with natural center-based objectives includes such problems as k-means, k-median, and k-center. Our result improves upon the result of Balcan and Liang (2016), who gave an algorithm for clustering 1+â2â2.41 perturbation-resilient instances. Our result is tight in the sense that no polynomial-time algorithm can solve (2âε)-perturbation resilient instances of k-center unless NP = RP, as was shown by Balcan, Haghtalab, and White (2016). We then give an exact algorithm for (2â2/k)-stable instances of Minimum Multiway Cut with k terminals, improving the previous result of Makarychev, Makarychev, and Vijayaraghavan (2014), who gave an algorithm for 4-stable instances. We also give an algorithm for (2â2/k+δ)-weakly stable instances of Minimum Multiway Cut. Finally, we show that there are no robust polynomial-time algorithms for n1âε-stable instances of Set Cover, Minimum Vertex Cover, and Min 2-Horn Deletion (unless P = NP).
Behavior-based tracking is an unobtrusive technique that allows observers to monitor user activities on the Internet over long periods of time – in spite of changing IP addresses. Previous work has employed supervised classifiers in order to link the sessions of individual users. However, classifiers need labeled training sessions, which are difficult to obtain for observers. In this paper we show how this limitation can be overcome with an unsupervised learning technique. We present a modified k-means algorithm and evaluate it on a realistic dataset that contains the Domain Name System (DNS) queries of 3,862 users. For this purpose, we simulate an observer that tries to track all users, and an Internet Service Provider that assigns a different IP address to every user on every day. The highest tracking accuracy is achieved within the subgroup of highly active users. Almost all sessions of 73% of the users in this subgroup can be linked over a period of 56 days. 19% of the highly active users can be traced completely, i.e., all their sessions are assigned to a single cluster. This fraction increases to 40% for shorter periods of seven days. As service providers may engage in behavior-based tracking to complement their existing profiling efforts, it constitutes a severe privacy threat for users of online services. Users can defend against behavior-based tracking by changing their IP address frequently, but this is cumbersome at the moment.
Arabic handwritten documents present specific challenges due to the cursive nature of the writing and the presence of diacritical marks. Moreover, one of the largest labeled database of Arabic handwritten documents, the OpenHart-NIST database includes specific noise, namely guidelines, that has to be addressed. We propose several approaches to process these documents. First a guideline detection approach has been developed, based on K-means, that detects the documents that include guidelines. We then propose a series of preprocessing at text-line level to reduce the noise effects. For text-lines including guidelines, a guideline removal preprocessing is described and existing keystroke restoration approaches are assessed. In addition, we propose a preprocessing that combines noise removal and deskewing by removing line fragments from neighboring text lines, while searching for the principal orientation of the text-line. We provide recognition results, showing the significant improvement brought by the proposed processings.
Mobile Ad-Hoc Networks (MANET) consist of peer-to-peer infrastructure less communicating nodes that are highly dynamic. As a result, routing data becomes more challenging. Ultimately routing protocols for such networks face the challenges of random topology change, nature of the link (symmetric or asymmetric) and power requirement during data transmission. Under such circumstances both, proactive as well as reactive routing are usually inefficient. We consider, zone routing protocol (ZRP) that adds the qualities of the proactive (IARP) and reactive (IERP) protocols. In ZRP, an updated topological map of zone centered on each node, is maintained. Immediate routes are available inside each zone. In order to communicate outside a zone, a route discovery mechanism is employed. The local routing information of the zones helps in this route discovery procedure. In MANET security is always an issue. It is possible that a node can turn malicious and hamper the normal flow of packets in the MANET. In order to overcome such issue we have used a clustering technique to separate the nodes having intrusive behavior from normal behavior. We call this technique as effective k-means clustering which has been motivated from k-means. We propose to implement Intrusion Detection System on each node of the MANET which is using ZRP for packet flow. Then we will use effective k-means to separate the malicious nodes from the network. Thus, our Ad-Hoc network will be free from any malicious activity and normal flow of packets will be possible.
Mobile Ad-Hoc Networks (MANET) consist of peer-to-peer infrastructure less communicating nodes that are highly dynamic. As a result, routing data becomes more challenging. Ultimately routing protocols for such networks face the challenges of random topology change, nature of the link (symmetric or asymmetric) and power requirement during data transmission. Under such circumstances both, proactive as well as reactive routing are usually inefficient. We consider, zone routing protocol (ZRP) that adds the qualities of the proactive (IARP) and reactive (IERP) protocols. In ZRP, an updated topological map of zone centered on each node, is maintained. Immediate routes are available inside each zone. In order to communicate outside a zone, a route discovery mechanism is employed. The local routing information of the zones helps in this route discovery procedure. In MANET security is always an issue. It is possible that a node can turn malicious and hamper the normal flow of packets in the MANET. In order to overcome such issue we have used a clustering technique to separate the nodes having intrusive behavior from normal behavior. We call this technique as effective k-means clustering which has been motivated from k-means. We propose to implement Intrusion Detection System on each node of the MANET which is using ZRP for packet flow. Then we will use effective k-means to separate the malicious nodes from the network. Thus, our Ad-Hoc network will be free from any malicious activity and normal flow of packets will be possible.