Bibliography
Intellectual property (IP) and integrated circuit (IC) piracy are of increasing concern to IP/IC providers because of the globalization of the IC design flow and supply chains. This globalization is driven by the costs associated with designing, fabricating, and testing integrated circuits, and it opens avenues for piracy. To protect designs against IC piracy, we propose a fingerprinting scheme based on side-channel power analysis and machine learning methods. The proposed method distinguishes ICs that realize a modified netlist but the same functionality, and it incurs no hardware overhead. We specifically focus on the ability to detect minimal design variations, quantified by the number of logic gates changed. The accuracy of the proposed scheme is greater than 96 percent, and typically 99 percent, in detecting one or more gate-level netlist changes. Additionally, the effect of temperature is investigated as part of this work. Results show 95.4 percent accuracy in detecting the exact number of gate changes when the data and the classifier use the same temperature, whereas training at a different temperature yields 33.6 percent accuracy. This demonstrates the value of building temperature-dependent classifiers from simulations at known operating temperatures.
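The abstract gives no implementation detail; the sketch below is only a rough illustration of the kind of pipeline it describes, training a classifier on power traces labeled by the number of gate changes. The trace shapes, the synthetic data, and the SVM choice are assumptions for illustration, not details from the paper.

```python
# Hypothetical sketch: fingerprinting netlist variants from power traces.
# The traces here are synthetic placeholders; a real study would use
# simulated or measured side-channel power traces.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_traces, trace_len = 600, 1000
# Label = number of gates changed (0..3); gate changes shift the trace slightly.
labels = rng.integers(0, 4, size=n_traces)
traces = rng.normal(size=(n_traces, trace_len)) + labels[:, None] * 0.05

X_train, X_test, y_train, y_test = train_test_split(
    traces, labels, test_size=0.25, random_state=0)

clf = SVC(kernel="rbf").fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```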
In this study, delays between data packets were read using different window sizes to detect data transmitted over a covert timing channel in computer networks. Feature vectors were extracted from these delays, and hidden data was detected by several classification algorithms with a high performance rate.
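As a minimal sketch of this window-based approach, the code below computes simple statistics over fixed-size windows of inter-packet delays and trains a classifier. The window size, the statistics, and the synthetic traffic are illustrative assumptions, not the study's exact setup.

```python
# Hypothetical sketch: windowed features over inter-packet delays for
# covert timing channel detection. Data is synthetic for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(delays, window=50):
    """Split a delay sequence into windows and compute simple statistics."""
    feats = []
    for i in range(0, len(delays) - window + 1, window):
        w = delays[i:i + window]
        feats.append([w.mean(), w.std(), w.min(), w.max()])
    return np.array(feats)

rng = np.random.default_rng(1)
normal = rng.exponential(1.0, 5000)      # ordinary traffic delays
covert = rng.choice([0.5, 1.5], 5000)    # delays encoding hidden bits

X = np.vstack([window_features(normal), window_features(covert)])
y = np.array([0] * 100 + [1] * 100)      # 0 = normal, 1 = covert channel
clf = RandomForestClassifier(random_state=0).fit(X, y)
# Real evaluation would score held-out traffic, not the training windows.
```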
Modern industrial control systems (ICS) have increasingly become victims of cyber attacks in recent years. These attacks are hard to detect, and their consequences can be catastrophic. Cyber attacks can cause anomalies in the operation of the ICS and its technological equipment. The presence of mutual interference and noise in this equipment significantly complicates anomaly detection. Moreover, the traditional means of protection used in corporate solutions require updating with each change in the structure of the industrial process. An approach based on machine learning for anomaly detection was used to overcome these problems. It complements traditional methods and allows one to detect signal correlations and use them for anomaly detection. The Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation dataset was analyzed as an example of an industrial process. In the course of the research, correlations between sensor signals were detected and preliminary data processing was carried out. Algorithms from the most common machine learning techniques (decision trees, linear algorithms, support vector machines) and deep learning models (neural networks) were investigated for the industrial process anomaly detection task. It is shown that linear algorithms are the least demanding on computational resources, but they do not achieve an acceptable result and allow a significant number of errors. Decision tree-based algorithms provided acceptable accuracy, but the amount of RAM required for their operation grows polynomially with the training sample volume. Deep neural networks provided the greatest accuracy, but they require considerable computing power for internal calculations.
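A minimal sketch of the comparison described above follows, training one representative of each algorithm family on placeholder sensor data. The 52-channel shape echoes the Tennessee Eastman process, but the data, labels, and hyperparameters are invented for illustration; the actual dataset is not loaded here.

```python
# Hypothetical sketch comparing the algorithm families named above on
# synthetic "sensor" data (52 channels, echoing Tennessee Eastman).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 52))                    # placeholder sensor readings
y = (X[:, 0] + X[:, 1] > 0).astype(int)            # 0 = normal, 1 = anomaly

models = {
    "linear": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(),
    "neural net": MLPClassifier(max_iter=500, random_state=0),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=3).mean())
```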
Text-based CAPTCHAs are still commonly used to attempt to prevent automated access to web services. By displaying an image of distorted text, they attempt to create a challenge image that OCR software cannot interpret correctly but to which a human user can easily determine the correct response. This work focuses on a CAPTCHA used by a popular Chinese-language question-and-answer website and how resilient it is to modern machine learning methods. While the majority of text-based CAPTCHAs focus on transcription tasks, the CAPTCHA solved in this work is based on localizing inverted symbols in a distorted image. A convolutional neural network (CNN) was created to evaluate the likelihood that a region of the image belongs to an inverted character. It is used with a feature map and clustering to identify potential locations of inverted characters. The CNN was trained using curriculum learning, which was compared to other potential training methods. The proposed method determined the correct response in 95.2% of cases on simulated CAPTCHAs and 67.6% on a set of real CAPTCHAs. Potential methods to increase the difficulty of the CAPTCHA and the success rate of the automated solver are considered.
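To make the region-scoring idea concrete, here is a tiny CNN that outputs the probability that an image patch contains an inverted character. The architecture, the 32x32 patch size, and the class name InvertedCharScorer are all assumptions for illustration; the paper's actual network is not specified in the abstract.

```python
# Hypothetical sketch: a small CNN scoring a patch for the likelihood
# of containing an inverted character (architecture is illustrative).
import torch
import torch.nn as nn

class InvertedCharScorer(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 1),        # assumes 32x32 grayscale patches
        )

    def forward(self, x):
        # Probability that the patch belongs to an inverted character.
        return torch.sigmoid(self.net(x))

patch = torch.randn(1, 1, 32, 32)            # one placeholder patch
print(InvertedCharScorer()(patch))
```

Sliding such a scorer over the CAPTCHA produces the kind of feature map that clustering could then reduce to candidate character locations.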
E-mail communication is one of today's indispensable means of communication. The widespread use of e-mail has brought about some problems. The most important of these problems is spam (unwanted) e-mail, often composed of advertisements or offensive content, sent without the recipient's request. In this study, we aim to analyze the content of e-mails written in Turkish with the help of the Naive Bayes Classifier and the Vector Space Model from among machine learning methods, to determine whether these e-mails are spam, and to classify them. Both methods are subjected to different evaluation criteria and their performances are compared.
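A minimal sketch of the two approaches named above follows: TF-IDF vectors feeding a Naive Bayes classifier, and the same vectors with a nearest-centroid rule standing in for a vector space model. The toy Turkish e-mails are placeholders, not data from the study.

```python
# Hypothetical sketch: Naive Bayes vs. a vector-space-style classifier
# over TF-IDF features. The four toy e-mails are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import make_pipeline

emails = ["ucuz kampanya firsati", "toplanti yarin saat onda",
          "bedava hediye kazandiniz", "rapor ekte bilginize"]
labels = [1, 0, 1, 0]                        # 1 = spam, 0 = legitimate

nb = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(emails, labels)
vsm = make_pipeline(TfidfVectorizer(), NearestCentroid()).fit(emails, labels)
print(nb.predict(["bedava kampanya"]), vsm.predict(["bedava kampanya"]))
```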
Software discovery is a key management function for ensuring that systems are free of vulnerabilities and comply with licensing requirements, and for supporting advanced search for systems containing given software. Today, software is predominantly discovered by querying package management tools or by using rules that check file metadata or contents. These approaches are inadequate because not all software is installed through package managers, and agile development practices lead to frequent deployment of software. Other approaches to software discovery use machine learning methods that require a training phase, or require maintaining knowledge bases. Columbus draws on knowledge of software packaging practices that have evolved over time and uses the information embedded in the file-system impression created by a software package to discover it. Columbus is able to discover software in 92% of all official Docker images. Further, Columbus can be used in problem diagnosis and drift detection to compare two different systems, or to determine the evolution of a system over time.
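As a much-simplified sketch of discovering software from its file-system impression, the code below matches files found on disk against invented per-package fingerprints. The FINGERPRINTS table and the matching rule are assumptions for illustration; Columbus's actual use of packaging practices is more involved than this.

```python
# Hypothetical sketch: match files on disk against known per-package
# file sets. The fingerprints below are invented examples.
import os

FINGERPRINTS = {
    "nginx": {"nginx.conf", "mime.types"},
    "redis": {"redis.conf", "redis-server"},
}

def discover(root):
    """Return packages whose known files all appear somewhere under root."""
    present = {name for _, _, files in os.walk(root) for name in files}
    return [pkg for pkg, files in FINGERPRINTS.items() if files <= present]

print(discover("/etc"))
```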
Keystroke dynamics is a form of behavioral biometrics that can be used for continuous authentication of computer users. Many classifiers have been proposed for analyzing acquired user patterns and verifying users at computer terminals. The underlying machine learning methods that use a Gaussian density estimator for outlier detection typically assume that the digraph patterns in keystroke data are generated from a single Gaussian distribution. In this paper, we relax this assumption by allowing digraphs to fit more than one distribution via the Gaussian Mixture Model (GMM). We conducted an experiment with a public data set collected in a controlled environment. For 30 users typing dynamic text, we obtain a 0.08% Equal Error Rate (EER) with 2 components using the GMM, while a pure Gaussian yields a 1.3% EER on the same data set (an improvement in EER of 93.8%). Our results show that the GMM can recognize keystroke dynamics more precisely and authenticate users with a higher confidence level.
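To illustrate why relaxing the single-Gaussian assumption can help, the sketch below fits a one-component and a two-component Gaussian mixture to synthetic bimodal digraph timings and compares their average log-likelihoods. The timing values are invented; real keystroke data would replace them.

```python
# Hypothetical sketch: single Gaussian vs. 2-component GMM on bimodal
# digraph latencies. Higher average log-likelihood = better fit to the user.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Synthetic digraph latencies (ms): two typing "modes" for the same user.
times = np.concatenate([rng.normal(90, 10, 300),
                        rng.normal(160, 15, 300)]).reshape(-1, 1)

single = GaussianMixture(n_components=1, random_state=0).fit(times)
gmm = GaussianMixture(n_components=2, random_state=0).fit(times)
print("single Gaussian:", single.score(times))
print("2-component GMM:", gmm.score(times))
```

In a verification setting, such per-user likelihood scores would be thresholded to decide whether a new sample belongs to the claimed user, which is where the EER comparison above comes from.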