Biblio
In recent years, there is a surge of interest in approaches pertaining to security issues of Internet of Things deployments and applications that leverage machine learning and deep learning techniques. A key prerequisite for enabling such approaches is the development of scalable infrastructures for collecting and processing security-related datasets from IoT systems and devices. This paper introduces such a scalable and configurable data collection infrastructure for data-driven IoT security. It emphasizes the collection of (security) data from different elements of IoT systems, including individual devices and smart objects, edge nodes, IoT platforms, and entire clouds. The scalability of the introduced infrastructure stems from the integration of state of the art technologies for large scale data collection, streaming and storage, while its configurability relies on an extensible approach to modelling security data from a variety of IoT systems and devices. The approach enables the instantiation and deployment of security data collection systems over complex IoT deployments, which is a foundation for applying effective security analytics algorithms towards identifying threats, vulnerabilities and related attack patterns.
The technological development of the energy sector also produced complex data. In this study, the relationship between smart grid and big data approaches have been investigated. After analyzing which areas of the smart grid system use big data technologies and technologies, big data technologies for detecting smart grid attacks have received attention. Big data analytics can produce efficient solutions and it is especially important to choose which algorithms and metrics to use. For this reason, an application prototype has been proposed that uses a big data method to detect attacks on the smart grid. The algorithm with high accuracy was determined to be 92% for random forests and 87% for decision trees.
In recent years, various cloud-based services have been introduced in our daily lives, and information security is now an important topic for protecting the users. In the literature, many technologies have been proposed and incorporated into different services. Data hiding or steganography is a data protection technology, and images are often used as the cover data. On the other hand, steganalysis is an important tool to test the security strength of a steganography technique. So far, steganalysis has been used mainly for detecting the existence of secret data given an image, i.e., to classify if the given image is a normal or a stego image. In this paper, we investigate the possibility of identifying the locations of the embedded data if the a given image is suspected to be a stego image. The purpose is of two folds. First, we would like to confirm the decision made by the first level steganalysis; and the second is to provide a way to guess the size of the embedded data. Our experimental results show that in most cases the embedding positions can be detected. This result can be useful for developing more secure steganography technologies.
The early discovery of security bugs in JavaScript (JS) engines is crucial for protecting Internet users from adversaries abusing zero-day vulnerabilities. Browser vendors, bug bounty hunters, and security researchers have been eager to find such security bugs by leveraging state-of-the-art fuzzers as well as their domain expertise. They report a bug when observing a crash after executing their JS test since a crash is an early indicator of a potential bug. However, it is difficult to identify whether such a crash indeed invokes security bugs in JS engines. Thus, unskilled bug reporters are unable to assess the security severity of their new bugs with JS engine crashes. Today, this classification of a reported security bug is completely manual, depending on the verdicts from JS engine vendors. We investigated the feasibility of applying various machine learning classifiers to determine whether an observed crash triggers a security bug. We designed and implemented CRScope, which classifies security and non-security bugs from given crash-dump files. Our experimental results on 766 crash instances demonstrate that CRScope achieved 0.85, 0.89, and 0.93 Area Under Curve (AUC) for Chakra, V8, and SpiderMonkey crashes, respectively. CRScope also achieved 0.84, 0.89, and 0.95 precision for Chakra, V8, and SpiderMonkey crashes, respectively. This outperforms the previous study and existing tools including Exploitable and AddressSanitizer. CRScope is capable of learning domain-specific expertise from the past verdicts on reported bugs and automatically classifying JS engine security bugs, which helps improve the scalable classification of security bugs.
With the spread of wireless application, huge amount of data is generated every day. Thanks to its elasticity, machine learning is becoming a fundamental brick in this field, and many of applications are developed with the use of it and the several techniques that it offers. However, machine learning suffers on different problems and people that use it often are not aware of the possible threats. Often, an adversary tries to exploit these vulnerabilities in order to obtain benefits; because of this, adversarial machine learning is becoming wide studied in the scientific community. In this paper, we show state-of-the-art adversarial techniques and possible countermeasures, with the aim of warning people regarding sensible argument related to the machine learning.
Collaboration between humans and machines does not necessarily lead to better outcomes.
In recent years, cyber attack techniques are increasingly sophisticated, and blocking the attack is more and more difficult, even if a kind of counter measure or another is taken. In order for a successful handling of this situation, it is crucial to have a prediction of cyber attacks, appropriate precautions, and effective utilization of cyber intelligence that enables these actions. Malicious hackers share various kinds of information through particular communities such as the dark web, indicating that a great deal of intelligence exists in cyberspace. This paper focuses on forums on the dark web and proposes an approach to extract forums which include important information or intelligence from huge amounts of forums and identify traits of each forum using methodologies such as machine learning, natural language processing and so on. This approach will allow us to grasp the emerging threats in cyberspace and take appropriate measures against malicious activities.
This study has built a simulation of a smart home system by the Alibaba ECS. The architecture of hardware was based on edge computing technology. The whole method would design a clear classifier to find the boundary between regular and mutation codes. It could be applied in the detection of the mutation code of network. The project has used the dataset vector to divide them into positive and negative type, and the final result has shown the RBF-function SVM method perform best in this mission. This research has got a good network security detection in the IoT systems and increased the applications of machine learning.
Machine learning has been adopted widely to perform prediction and classification. Implementing machine learning increases security risks when computation process involves sensitive data on training and testing computations. We present a proposed system to protect machine learning engines in IoT environment without modifying internal machine learning architecture. Our proposed system is designed for passwordless and eliminated the third-party in executing machine learning transactions. To evaluate our a proposed system, we conduct experimental with machine learning transactions on IoT board and measure computation time each transaction. The experimental results show that our proposed system can address security issues on machine learning computation with low time consumption.
In the modern day and age, credential based authentication systems no longer provide the level of security that many organisations and their services require. The level of trust in passwords has plummeted in recent years, with waves of cyber attacks predicated on compromised and stolen credentials. This method of authentication is also heavily reliant on the individual user's choice of password. There is the potential to build levels of security on top of credential based authentication systems, using a risk based approach, which preserves the seamless authentication experience for the end user. One method of adding this security to a risk based authentication framework, is keystroke dynamics. Monitoring the behaviour of the users and how they type, produces a type of digital signature which is unique to that individual. Learning this behaviour allows dynamic flags to be applied to anomalous typing patterns that are produced by attackers using stolen credentials, as a potential risk of fraud. Methods from statistics and machine learning have been explored to try and implement such solutions. This paper will look at an Autoencoder model for learning the keystroke dynamics of specific users. The results from this paper show an improvement over the traditional tried and tested statistical approaches with an Equal Error Rate of 6.51%, with the additional benefits of relatively low training times and less reliance on feature engineering.
Recent work proposed the concept of backdoor attacks on deep neural networks (DNNs), where misclassification rules are hidden inside normal models, only to be triggered by very specific inputs. However, these "traditional" backdoors assume a context where users train their own models from scratch, which rarely occurs in practice. Instead, users typically customize "Teacher" models already pretrained by providers like Google, through a process called transfer learning. This customization process introduces significant changes to models and disrupts hidden backdoors, greatly reducing the actual impact of backdoors in practice. In this paper, we describe latent backdoors, a more powerful and stealthy variant of backdoor attacks that functions under transfer learning. Latent backdoors are incomplete backdoors embedded into a "Teacher" model, and automatically inherited by multiple "Student" models through transfer learning. If any Student models include the label targeted by the backdoor, then its customization process completes the backdoor and makes it active. We show that latent backdoors can be quite effective in a variety of application contexts, and validate its practicality through real-world attacks against traffic sign recognition, iris identification of volunteers, and facial recognition of public figures (politicians). Finally, we evaluate 4 potential defenses, and find that only one is effective in disrupting latent backdoors, but might incur a cost in classification accuracy as tradeoff.
In this paper, we explore the authorship attribution of The Golden Lotus using the traditional machine learning method of text classification. There are four candidate authors: Shizhen Wang, Wei Xu, Kaixian Li and Zhideng Wang. We choose The Golden Lotus's poems and four candidate authors' poems as data set. According to the characteristics of Chinese ancient poem, we choose Chinese character, rhyme, genre and overlapped word as features. We use six supervised machine learning algorithms, including Logistic Regression, Random Forests, Decision Tree and Naive Bayes, SVM and KNN classifiers respectively for text binary classification and multi-classification. According to two experiments results, the style of writing of Wei Xu's poems is the most similar to that of The Golden Lotus. It is proved that among four authors, Wei Xu most likely be the author of The Golden Lotus.