Biblio
Cryptojacking (also called malicious cryptocurrency mining or cryptomining) is a new threat model using CPU resources covertly “mining” a cryptocurrency in the browser. The impact is a surge in CPU Usage and slows the system performance. In this research, in-browsercryptojacking mitigation has been built as an extension in Google Chrome using Taint analysis method. The method used in this research is attack modeling with abuse case using the Man-In-The-Middle (MITM) attack as a testing for mitigation. The proposed model is designed so that users will be notified if a cryptojacking attack occurs. Hence, the user is able to check the script characteristics that run on the website background. The results of this research show that the taint analysis is a promising method to mitigate cryptojacking attacks. From 100 random sample websites, the taint analysis method can detect 19 websites that are infcted by cryptojacking.
With the recent boom in the cryptocurrency market, hackers have been on the lookout to find novel ways of commandeering users' machine for covert and stealthy mining operations. In an attempt to expose such under-the-hood practices, this paper explores the issue of browser cryptojacking, whereby miners are secretly deployed inside browser code without the knowledge of the user. To this end, we analyze the top 50k websites from Alexa and find a noticeable percentage of sites that are indulging in this exploitative exercise often using heavily obfuscated code. Furthermore, mining prevention plug-ins, such as NoMiner, fail to flag such cleverly concealed instances. Hence, we propose a machine learning solution based on hardware-assisted profiling of browser code in real-time. A fine-grained micro-architectural footprint allows us to classify mining applications with \textbackslashtextgreater99% accuracy and even flags them if the mining code has been heavily obfuscated or encrypted. We build our own browser extension and show that it outperforms other plug-ins. The proposed design has negligible overhead on the user's machine and works for all standard off-the-shelf CPUs.
Event logs that originate from information systems enable comprehensive analysis of business processes, e.g., by process model discovery. However, logs potentially contain sensitive information about individual employees involved in process execution that are only partially hidden by an obfuscation of the event data. In this paper, we therefore address the risk of privacy-disclosure attacks on event logs with pseudonymized employee information. To this end, we introduce PRETSA, a novel algorithm for event log sanitization that provides privacy guarantees in terms of k-anonymity and t-closeness. It thereby avoids disclosure of employee identities, their membership in the event log, and their characterization based on sensitive attributes, such as performance information. Through step-wise transformations of a prefix-tree representation of an event log, we maintain its high utility for discovery of a performance-annotated process model. Experiments with real-world data demonstrate that sanitization with PRETSA yields event logs of higher utility compared to methods that exploit frequency-based filtering, while providing the same privacy guarantees.
Network based attacks on ecommerce websites can have serious economic consequences. Hence, anomaly detection in dynamic network traffic has become an increasingly important research topic in recent years. This paper proposes a novel dynamic Graph and sparse Autoencoder based Anomaly Detection algorithm named GAAD. In GAAD, the network traffic over contiguous time intervals is first modelled as a series of dynamic bipartite graph increments. One mode projection is performed on each bipartite graph increment and the adjacency matrix derived. Columns of the resultant adjacency matrix are then used to train a sparse autoencoder to reconstruct it. The sum of squared errors between the reconstructed approximation and original adjacency matrix is then calculated. An online learning algorithm is then used to estimate a Gaussian distribution that models the error distribution. Outlier error values are deemed to represent anomalous traffic flows corresponding to possible attacks. In the experiment, a network emulator was used to generate representative ecommerce traffic flows over a time period of 225 minutes with five attacks injected, including SYN scans, host emulation and DDoS attacks. ROC curves were generated to investigate the influence of the autoencoder hyper-parameters. It was found that increasing the number of hidden nodes and their activation level, and increasing sparseness resulted in improved performance. Analysis showed that the sparse autoencoder was unable to encode the highly structured adjacency matrix structures associated with attacks, hence they were detected as anomalies. In contrast, SVD and variants, such as the compact matrix decomposition, were found to accurately encode the attack matrices, hence they went undetected.
This paper presents a methodology for utilizing Phasor Measurement units (PMUs) for procuring real time synchronized measurements for assessing the security of the power system dynamically. The concept of wide-area dynamic security assessment considers transient instability in the proposed methodology. Intelligent framework based approach for online dynamic security assessment has been suggested wherein the database consisting of critical features associated with the system is generated for a wide range of contingencies, which is utilized to build the data mining model. This data mining model along with the synchronized phasor measurements is expected to assist the system operator in assessing the security of the system pertaining to a particular contingency, thereby also creating possibility of incorporating control and preventive measures in order to avoid any unforeseen instability in the system. The proposed technique has been implemented on IEEE 39 bus system for accurately indicating the security of the system and is found to be quite robust in the case of noise in the measurement data obtained from the PMUs.
Nowadays, mobile devices have become one of the most popular instruments used by a person on its regular life, mainly due to the importance of their applications. In that context, mobile devices store user's personal information and even more data, becoming a personal tracker for daily activities that provides important information about the user. Derived from this gathering of information, many tools are available to use on mobile devices, with the restrain that each tool only provides isolated information about a specific application or activity. Therefore, the present work proposes a tool that allows investigators to obtain a complete report and timeline of the activities that were performed on the device. This report incorporates the information provided by many sources into a unique set of data. Also, by means of an example, it is presented the operation of the solution, which shows the feasibility in the use of this tool and shows the way in which investigators have to apply the tool.
The era of information technology has, unfortunately, contributed to the tremendous rise in the number of criminal activities. However, digital artifacts can be utilized in convicting cybercriminal and exposing their activities. The digital forensics science concerns about all aspects related to cybercrimes. It seeks digital evidence by following standard methodologies to be admitted in court rooms. This paper concerns about memory forensics for the unique artifacts it holds. Memory contains information about the current state of systems and applications. Moreover, an application's data explains how a criminal has been interacting the application just before the memory is acquired. Memory forensics at the application level is currently random and cumbersome. Targeting specific applications is what forensic researchers and practitioner are currently striving to provide. This paper suggests a general solution to investigate any application. Our solution aims to utilize an application's data structures and variables' information in the investigation process. This is because an application's data has to be stored and retrieved in the means of variables. Data structures and variables' information can be generated by compilers for debugging purposes. We show that an application's information is a valuable resource to the investigator.
This article shows the analogy between natural language texts and quantum-like systems on the example of the Bell test calculating. The applicability of the well-known Bell test for texts in Russian is investigated. The possibility of using this test for the text separation on the topics corresponding to the user query in information retrieval system is shown.
Social media has been one of the most efficacious and precise by speakers of public opinion. A strategy which sanctions the utilization and illustration of twitter data to conclude public conviction is discussed in this paper. Sentiments on exclusive entities with diverse strengths and intenseness are stated by public, where these sentiments are strenuously cognate to their personal mood and emotions. To examine the sentiments from natural language texts, addressing various opinions, a lot of methods and lexical resources have been propounded. A path for boosting twitter sentiment classification using various sentiment proportions as meta-level features has been proposed by this article. Analysis of tweets was done on the product iPhone 6.
Every day, huge amounts of unstructured text is getting generated. Most of this data is in the form of essays, research papers, patents, scholastic articles, book chapters etc. Many plagiarism softwares are being developed to be used in order to reduce the stealing and plagiarizing of Intellectual Property (IP). Current plagiarism softwares are mainly using string matching algorithms to detect copying of text from another source. The drawback of some of such plagiarism softwares is their inability to detect plagiarism when the structure of the sentence is changed. Replacement of keywords by their synonyms also fails to be detected by these softwares. This paper proposes a new method to detect such plagiarism using semantic knowledge graphs. The method uses Named Entity Recognition as well as semantic similarity between sentences to detect possible cases of plagiarism. The doubtful cases are visualized using semantic Knowledge Graphs for thorough analysis of authenticity. Rules for active and passive voice have also been considered in the proposed methodology.
One of the challenges in supplying the communities with wider access to scientific databases is the need for knowledge of database languages like Structured Query Language (SQL). Although the SQL language has been published in many forms, not everybody is able to write SQL queries. Another challenge is that it might not be practical to make the public aware of the structure of databases. There is a need for novice users to query relational databases using their natural language. To solve this problem, many natural language interfaces to structured databases have been developed. The goal is to provide a more intuitive method for generating database queries and delivering responses. Through social media, which makes it possible to interact with a wide section of the population, and with the help of natural language processing, researchers at the Atmospheric Radiation Measurement (ARM) Data Center at Oak Ridge National Laboratory (ORNL) have developed a concept to enable easy search and retrieval of data from several environmental data centers for the scientific community through social media.Using a machine learning framework that maps natural language text to thousands of datasets, instruments, variables, and data streams, the prototype system would allow users to request data through Twitter and receive a link (via tweet) to applicable data results on the project's search catalog tailored to their key words. This automated identification of relevant data from various petascale archives at ORNL could increase convenience, access, and use of the project's data by the broader community. In this paper we discuss how some data-intensive projects at ORNL are using innovative ways to help in data discovery.
In this paper we solve the problem of neural network technology development for e-mail messages classification. We analyze basic methods of spam filtering such as a sender IP-address analysis, spam messages repeats detection and the Bayesian filtering according to words. We offer the neural network technology for solving this problem because the neural networks are universal approximators and effective in addressing the problems of classification. Also, we offer the scheme of this technology for e-mail messages “spam”/“not spam” classification. The creation of effective neural network model of spam filtering is performed within the databases knowledge discovery technology. For this training set is formed, the neural network model is trained, its value and classifying ability are estimated. The experimental studies have shown that a developed artificial neural network model is adequate and it can be effectively used for the e-mail messages classification. Thus, in this paper we have shown the possibility of the effective neural network model use for the e-mail messages filtration and have shown a scheme of artificial neural network model use as a part of the e-mail spam filtering intellectual system.
Inference of unknown opinions with uncertain, adversarial (e.g., incorrect or conflicting) evidence in large datasets is not a trivial task. Without proper handling, it can easily mislead decision making in data mining tasks. In this work, we propose a highly scalable opinion inference probabilistic model, namely Adversarial Collective Opinion Inference (Adv-COI), which provides a solution to infer unknown opinions with high scalability and robustness under the presence of uncertain, adversarial evidence by enhancing Collective Subjective Logic (CSL) which is developed by combining SL and Probabilistic Soft Logic (PSL). The key idea behind the Adv-COI is to learn a model of robust ways against uncertain, adversarial evidence which is formulated as a min-max problem. We validate the out-performance of the Adv-COI compared to baseline models and its competitive counterparts under possible adversarial attacks on the logic-rule based structured data and white and black box adversarial attacks under both clean and perturbed semi-synthetic and real-world datasets in three real world applications. The results show that the Adv-COI generates the lowest mean absolute error in the expected truth probability while producing the lowest running time among all.
Modern Browsers have become sophisticated applications, providing a portal to the web. Browsers host a complex mix of interpreters such as HTML and JavaScript, allowing not only useful functionality but also malicious activities, known as browser-hijacking. These attacks can be particularly difficult to detect, as they usually operate within the scope of normal browser behaviour. CryptoJacking is a form of browser-hijacking that has emerged as a result of the increased popularity and profitability of cryptocurrencies, and the introduction of new cryptocurrencies that promote CPU-based mining. This paper proposes MANiC (Multi-step AssessmeNt for Crypto-miners), a system to detect CryptoJacking websites. It uses regular expressions that are compiled in accordance with the API structure of different miner families. This allows the detection of crypto-mining scripts and the extraction of parameters that could be used to detect suspicious behaviour associated with CryptoJacking. When MANiC was used to analyse the Alexa top 1m websites, it detected 887 malicious URLs containing miners from 11 different families and demonstrated favourable results when compared to related CryptoJacking research. We demonstrate that MANiC can be used to provide insights into this new threat, to identify new potential features of interest and to establish a ground-truth dataset, assisting future research.
Phishing is the major problem of the internet era. In this era of internet the security of our data in web is gaining an increasing importance. Phishing is one of the most harmful ways to unknowingly access the credential information like username, password or account number from the users. Users are not aware of this type of attack and later they will also become a part of the phishing attacks. It may be the losses of financial found, personal information, reputation of brand name or trust of brand. So the detection of phishing site is necessary. In this paper we design a framework of phishing detection using URL.