Visible to the public Biblio

Filters: Keyword is machine learning methods  [Clear All Filters]
2020-12-14
Dong, D., Ye, Z., Su, J., Xie, S., Cao, Y., Kochan, R..  2020.  A Malware Detection Method Based on Improved Fireworks Algorithm and Support Vector Machine. 2020 IEEE 15th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET). :846–851.
The increasing of malwares has presented a serious threat to the security of computer systems in recent years. Traditional signature-based anti-virus systems are not able to detect metamorphic and previously unseen malwares and it inspires people to use machine learning methods such as Naive Bayes and Decision Tree to identity malicious executables. Among these methods, detecting malwares by using Support Vector Machine (SVM) is one of the most effective approaches. However, the parameters of SVM have serious impacts on its classification performance. In order to find the optimal parameter combination and avoid the problem of falling into local optimal solution, many methods based on evolutionary algorithms are proposed, including Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Differential Evolution (DE) and others. But these algorithms still face the problem of being trapped into local solution spaces in different degree. In this paper, an improved fireworks algorithm is presented and applied to search parameters of SVM: penalty factor c and kernel function parameter g. To research the performance of the proposed algorithm, numeric experiments are made and compared with some typical algorithms, the experimental results demonstrate it outperforms other algorithms.
2020-07-30
Shey, James, Karimi, Naghmeh, Robucci, Ryan, Patel, Chintan.  2018.  Design-Based Fingerprinting Using Side-Channel Power Analysis for Protection Against IC Piracy. 2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). :614—619.

Intellectual property (IP) and integrated circuit (IC) piracy are of increasing concern to IP/IC providers because of the globalization of IC design flow and supply chains. Such globalization is driven by the cost associated with the design, fabrication, and testing of integrated circuits and allows avenues for piracy. To protect the designs against IC piracy, we propose a fingerprinting scheme based on side-channel power analysis and machine learning methods. The proposed method distinguishes the ICs which realize a modified netlist, yet same functionality. Our method doesn't imply any hardware overhead. We specifically focus on the ability to detect minimal design variations, as quantified by the number of logic gates changed. Accuracy of the proposed scheme is greater than 96 percent, and typically 99 percent in detecting one or more gate-level netlist changes. Additionally, the effect of temperature has been investigated as part of this work. Results depict 95.4 percent accuracy in detecting the exact number of gate changes when data and classifier use the same temperature, while training with different temperatures results in 33.6 percent accuracy. This shows the effectiveness of building temperature-dependent classifiers from simulations at known operating temperatures.

2020-07-16
Karadoğan, İsmail, Karci, Ali.  2019.  Detection of Covert Timing Channels with Machine Learning Methods Using Different Window Sizes. 2019 International Artificial Intelligence and Data Processing Symposium (IDAP). :1—5.

In this study, delays between data packets were read by using different window sizes to detect data transmitted from covert timing channel in computer networks, and feature vectors were extracted from them and detection of hidden data by some classification algorithms was achieved with high performance rate.

2019-06-10
Sokolov, A. N., Pyatnitsky, I. A., Alabugin, S. K..  2018.  Research of Classical Machine Learning Methods and Deep Learning Models Effectiveness in Detecting Anomalies of Industrial Control System. 2018 Global Smart Industry Conference (GloSIC). :1-6.

Modern industrial control systems (ICS) act as victims of cyber attacks more often in last years. These attacks are hard to detect and their consequences can be catastrophic. Cyber attacks can cause anomalies in the work of the ICS and its technological equipment. The presence of mutual interference and noises in this equipment significantly complicates anomaly detection. Moreover, the traditional means of protection, which used in corporate solutions, require updating with each change in the structure of the industrial process. An approach based on the machine learning for anomaly detection was used to overcome these problems. It complements traditional methods and allows one to detect signal correlations and use them for anomaly detection. Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation dataset was analyzed as example of industrial process. In the course of the research, correlations between the signals of the sensors were detected and preliminary data processing was carried out. Algorithms from the most common techniques of machine learning (decision trees, linear algorithms, support vector machines) and deep learning models (neural networks) were investigated for industrial process anomaly detection task. It's shown that linear algorithms are least demanding on computational resources, but they don't achieve an acceptable result and allow a significant number of errors. Decision tree-based algorithms provided an acceptable accuracy, but the amount of RAM, required for their operations, relates polynomially with the training sample volume. The deep neural networks provided the greatest accuracy, but they require considerable computing power for internal calculations.

2019-04-01
Stein, G., Peng, Q..  2018.  Low-Cost Breaking of a Unique Chinese Language CAPTCHA Using Curriculum Learning and Clustering. 2018 IEEE International Conference on Electro/Information Technology (EIT). :0595–0600.

Text-based CAPTCHAs are still commonly used to attempt to prevent automated access to web services. By displaying an image of distorted text, they attempt to create a challenge image that OCR software can not interpret correctly, but a human user can easily determine the correct response to. This work focuses on a CAPTCHA used by a popular Chinese language question-and-answer website and how resilient it is to modern machine learning methods. While the majority of text-based CAPTCHAs focus on transcription tasks, the CAPTCHA solved in this work is based on localization of inverted symbols in a distorted image. A convolutional neural network (CNN) was created to evaluate the likelihood of a region in the image belonging to an inverted character. It is used with a feature map and clustering to identify potential locations of inverted characters. Training of the CNN was performed using curriculum learning and compared to other potential training methods. The proposed method was able to determine the correct response in 95.2% of cases of a simulated CAPTCHA and 67.6% on a set of real CAPTCHAs. Potential methods to increase difficulty of the CAPTCHA and the success rate of the automated solver are considered.

2019-02-25
Karamollaoglu, H., Dogru, İ A., Dorterler, M..  2018.  Detection of Spam E-mails with Machine Learning Methods. 2018 Innovations in Intelligent Systems and Applications Conference (ASYU). :1–5.

E-mail communication is one of today's indispensable communication ways. The widespread use of email has brought about some problems. The most important one of these problems are spam (unwanted) e-mails, often composed of advertisements or offensive content, sent without the recipient's request. In this study, it is aimed to analyze the content information of e-mails written in Turkish with the help of Naive Bayes Classifier and Vector Space Model from machine learning methods, to determine whether these e-mails are spam e-mails and classify them. Both methods are subjected to different evaluation criteria and their performances are compared.

2017-12-12
Nadgowda, S., Duri, S., Isci, C., Mann, V..  2017.  Columbus: Filesystem Tree Introspection for Software Discovery. 2017 IEEE International Conference on Cloud Engineering (IC2E). :67–74.

Software discovery is a key management function to ensure that systems are free of vulnerabilities, comply with licensing requirements, and support advanced search for systems containing given software. Today, software is predominantly discovered through querying package management tools, or using rules that check for file metadata or contents. These approaches are inadequate as not every software is installed through package managers, and agile development practices lead to frequent deployment of software. Other approaches to software discovery use machine learning methods requiring training phase, or require maintaining knowledge bases. Columbus uses the knowledge of the software packaging practices that evolved over time, and uses the information embedded in the file system impression created by a software package to discover it. Columbus is able to discover software in 92% of all official Docker images. Further, Columbus can be used in problem diagnosis and drift detection situations to compare two different systems, or to determine the evolution of a system overtime.

2017-03-08
Çeker, H., Upadhyaya, S..  2015.  Enhanced recognition of keystroke dynamics using Gaussian mixture models. MILCOM 2015 - 2015 IEEE Military Communications Conference. :1305–1310.

Keystroke dynamics is a form of behavioral biometrics that can be used for continuous authentication of computer users. Many classifiers have been proposed for the analysis of acquired user patterns and verification of users at computer terminals. The underlying machine learning methods that use Gaussian density estimator for outlier detection typically assume that the digraph patterns in keystroke data are generated from a single Gaussian distribution. In this paper, we relax this assumption by allowing digraphs to fit more than one distribution via the Gaussian Mixture Model (GMM). We have conducted an experiment with a public data set collected in a controlled environment. Out of 30 users with dynamic text, we obtain 0.08% Equal Error Rate (EER) with 2 components by using GMM, while pure Gaussian yields 1.3% EER for the same data set (an improvement of EER by 93.8%). Our results show that GMM can recognize keystroke dynamics more precisely and authenticate users with higher confidence level.