Biblio

Found 789 results

Filters: Keyword is learning (artificial intelligence)
2019-06-10
Sokolov, A. N., Pyatnitsky, I. A., Alabugin, S. K..  2018.  Research of Classical Machine Learning Methods and Deep Learning Models Effectiveness in Detecting Anomalies of Industrial Control System. 2018 Global Smart Industry Conference (GloSIC). :1-6.

Modern industrial control systems (ICS) have become victims of cyber attacks more often in recent years. These attacks are hard to detect and their consequences can be catastrophic. Cyber attacks can cause anomalies in the operation of the ICS and its technological equipment. The presence of mutual interference and noise in this equipment significantly complicates anomaly detection. Moreover, the traditional means of protection, which are used in corporate solutions, require updating with each change in the structure of the industrial process. An approach based on machine learning for anomaly detection was used to overcome these problems. It complements traditional methods and allows one to detect signal correlations and use them for anomaly detection. The Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation dataset was analyzed as an example of an industrial process. In the course of the research, correlations between the sensor signals were detected and preliminary data processing was carried out. Algorithms from the most common machine learning techniques (decision trees, linear algorithms, support vector machines) and deep learning models (neural networks) were investigated for the industrial process anomaly detection task. It is shown that linear algorithms are the least demanding of computational resources, but they do not achieve an acceptable result and make a significant number of errors. Decision tree-based algorithms provided acceptable accuracy, but the amount of RAM required for their operation grows polynomially with the training sample volume. The deep neural networks provided the greatest accuracy, but they require considerable computing power for internal calculations.

Farooq, H. M., Otaibi, N. M..  2018.  Optimal Machine Learning Algorithms for Cyber Threat Detection. 2018 UKSim-AMSS 20th International Conference on Computer Modelling and Simulation (UKSim). :32-37.

With the exponential hike in cyber threats, organizations are now striving for better data mining techniques to analyze security logs received from their IT infrastructures and ensure effective, automated cyber threat detection. Machine Learning (ML) based analytics for security machine data is the next emerging trend in cyber security, aimed at mining security data to uncover advanced targeted cyber threat actors and at minimizing the operational overhead of maintaining static correlation rules. However, the selection of an optimal machine learning algorithm for security log analytics remains an impediment to the success of data science in cyber security because of the risk of a large number of false-positive detections, especially in large-scale or global Security Operations Center (SOC) environments. This creates a dire need for an efficient machine learning based cyber threat detection model capable of minimizing false detection rates. In this paper, we propose optimal machine learning algorithms with their implementation framework, based on analytical and empirical evaluations of gathered results, using various prediction, classification and forecasting algorithms.

Eziama, E., Jaimes, L. M. S., James, A., Nwizege, K. S., Balador, A., Tepe, K..  2018.  Machine Learning-Based Recommendation Trust Model for Machine-to-Machine Communication. 2018 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). :1-6.

Machine Type Communication Devices (MTCDs) are usually based on the Internet Protocol (IP), which can cause billions of connected objects to be part of the Internet. The enormous amount of data coming from these devices is quite heterogeneous in nature, which can lead to security issues such as injection attacks, ballot stuffing, and bad mouthing. Consequently, this work considers machine learning trust evaluation as an effective and accurate option for solving the issues associated with security threats. In this paper, a comparative analysis is carried out with five different machine learning approaches: Naive Bayes (NB), Decision Tree (DT), Linear and Radial Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest (RF). As a critical element of the research, the recommendations consider different Machine-to-Machine (M2M) communication nodes with regard to their ability to identify malicious and honest information. To validate the performance of these models, two trust computation measures were used: Receiver Operating Characteristics (ROCs), and Precision and Recall. The malicious data was formulated in Matlab. A scenario was created where 50% of the information was modified to be malicious. The proportion of malicious nodes was varied over 10%, 20%, 30%, and 40%, and the results were carefully analyzed.

Su, H., Zwolinski, M., Halak, B..  2018.  A Machine Learning Attacks Resistant Two Stage Physical Unclonable Functions Design. 2018 IEEE 3rd International Verification and Security Workshop (IVSW). :52-55.

Physical Unclonable Functions (PUFs) have been designed for many security applications such as identification, authentication of devices and key generation, especially for lightweight electronics. Traditional approaches to enhancing security, such as hash functions, may be expensive and resource dependent. However, modelling attacks using machine learning (ML) show the vulnerability of most PUFs. In this paper, a combination of a 32-bit current mirror and 16-bit arbiter PUFs in 65nm CMOS technology is proposed to improve resilience against modelling attacks. Both PUFs are vulnerable to machine learning attacks and we reduce the output prediction rate from 99.2% and 98.8% individually, to 60%.

Liu, D., Li, Y., Tang, Y., Wang, B., Xie, W..  2018.  VMPBL: Identifying Vulnerable Functions Based on Machine Learning Combining Patched Information and Binary Comparison Technique by LCS. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :800-807.

Nowadays, most vendors apply the same open source code to their products, which is dangerous. In addition, when manufacturers release patches, they generally hide the exact location of the vulnerabilities. So, identifying vulnerabilities in binaries is crucial. However, searching the source program alone identifies vulnerabilities with lower accuracy, which requires operators to further differentiate the search results. In this context, we propose VMPBL to enhance the accuracy of vulnerability identification with the help of patch files. Compared with other proposed schemes, VMPBL uses the patched functions that correspond to vulnerable functions in the patch file to further distinguish results. We establish a prototype of VMPBL, which can effectively identify vulnerable function types and remove safe functions from the results. First, we obtain the potential vulnerable-patched function pairs with a binary comparison technique based on the K-Trace algorithm. Then we combine the functions with a vulnerability and patch knowledge database to classify these function pairs and identify the possible vulnerable functions and the vulnerability types. Finally, we test some programs containing real-world CWE vulnerabilities, and one of the experimental results, for CWE415, shows that searching the source program alone returns about twice as many results as VMPBL. We can see that using VMPBL can significantly reduce the false positive rate of discovering vulnerabilities compared with analyzing source files alone.

Ponmaniraj, S., Rashmi, R., Anand, M. V..  2018.  IDS Based Network Security Architecture with TCP/IP Parameters Using Machine Learning. 2018 International Conference on Computing, Power and Communication Technologies (GUCON). :111-114.

The computer era leads humans to interact with computers and networks, but there is no solution that eliminates security problems. Security threats undermine the Internet, and we sometimes lose hope and confidence in server-based access. Even though many more cryptographic algorithms are emerging to protect the integrity and authenticity of data, unreliable threats still penetrate vulnerabilities in networks. Vulnerable sites take control of the user's computer and perform harmful actions without the user's permission. Though firewalls and protocols may support our browsers by setting certain rules, our systems still cannot guarantee data reliability and confidentiality. Since these problems are based on network access, we consider TCP/IP parameters as a dataset for analysis. By preprocessing TCP/IP packets, we can build an independent model on the data set and form clusters. The data set is then classified into regular traffic patterns and anonymous patterns using the KNN classification algorithm. Based on the patterns obtained for normal and threat data sets, security devices and systems set rules and guidelines, learning from them to take the needed action. This paper analyzes how a computer can learn security actions from given data sets drawn from previous incidents.

2019-05-08
Meng, F., Lou, F., Fu, Y., Tian, Z..  2018.  Deep Learning Based Attribute Classification Insider Threat Detection for Data Security. 2018 IEEE Third International Conference on Data Science in Cyberspace (DSC). :576–581.

With the evolution of network threats, identifying threats from inside is getting more and more difficult. To detect malicious insiders, we move a step forward and propose a novel attribute classification insider threat detection method based on long short term memory recurrent neural networks (LSTM-RNNs). To achieve a high detection rate, an event aggregator, a feature extractor, several attribute classifiers and an anomaly calculator are seamlessly integrated into an end-to-end detection framework. Using the CERT insider threat dataset v6.2 and threat detection recall as our performance metric, experimental results validate that the proposed threat detection method greatly outperforms k-Nearest Neighbor, Isolation Forest, Support Vector Machine and Principal Component Analysis based threat detection methods.

Basu, S., Chua, Y. H. Victoria, Lee, M. Wah, Lim, W. G., Maszczyk, T., Guo, Z., Dauwels, J..  2018.  Towards a data-driven behavioral approach to prediction of insider-threat. 2018 IEEE International Conference on Big Data (Big Data). :4994–5001.

Insider threats pose a challenge to all companies and organizations. Identifying the culprit after an attack is often too late and results in detrimental consequences for the organization. The majority of past research on insider threats has focused on post-hoc personality analysis of known insider threats to identify personality vulnerabilities. It has been proposed that certain personality vulnerabilities place individuals at risk of perpetrating insider threats should the environment and opportunity arise. To that end, this study utilizes a game-based approach to simulate a scenario of intellectual property theft and investigate behavioral and personality differences of individuals who exhibit insider-threat related behavior. Features were extracted from games, text collected through implicit and explicit measures, simultaneous facial expression recordings, and personality variables (HEXACO, Dark Triad and Entitlement Attitudes) calculated from a questionnaire. We applied ensemble machine learning algorithms and show that they produce an acceptable balance of precision and recall. Our results showcase the possibility of harnessing personality variables, facial expressions and linguistic features in the modeling and prediction of insider threats.

2019-05-01
Li, P., Liu, Q., Zhao, W., Wang, D., Wang, S..  2018.  Chronic Poisoning against Machine Learning Based IDSs Using Edge Pattern Detection. 2018 IEEE International Conference on Communications (ICC). :1-7.

In the big data era, machine learning is one of the fundamental techniques in intrusion detection systems (IDSs). The poisoning attack, which is one of the most recognized security threats to machine learning-based IDSs, injects adversarial samples into the training phase, inducing drift in the training data and a significant performance decrease of the target IDSs on testing data. In this paper, we adopt the Edge Pattern Detection (EPD) algorithm to design a novel poisoning method that attacks several machine learning algorithms used in IDSs. Specifically, we propose a boundary pattern detection algorithm to efficiently generate points that are near abnormal data but considered normal by current classifiers. Then, we introduce a Batch-EPD Boundary Pattern (BEBP) detection algorithm to overcome the limitation on the number of edge pattern points generated by EPD and to obtain more useful adversarial samples. Based on BEBP, we further present a moderate but effective poisoning method called the chronic poisoning attack. Extensive experiments on synthetic data and three real network data sets demonstrate the performance of the proposed poisoning method against several well-known machine learning algorithms and a practical intrusion detection method named FMIFS-LSSVM-IDS.

Lu, X., Wan, X., Xiao, L., Tang, Y., Zhuang, W..  2018.  Learning-Based Rogue Edge Detection in VANETs with Ambient Radio Signals. 2018 IEEE International Conference on Communications (ICC). :1-6.

Edge computing for mobile devices in vehicular ad hoc networks (VANETs) has to address rogue edge attacks, in which a rogue edge node claims to be the serving edge in the vehicle to steal user secrets and help launch other attacks such as man-in-the-middle attacks. Rogue edge detection in VANETs is more challenging than spoofing detection in indoor wireless networks due to the high mobility of onboard units (OBUs) and the large-scale network infrastructure with roadside units (RSUs). In this paper, we propose a physical (PHY)-layer rogue edge detection scheme for VANETs based on the shared ambient radio signals observed along the same moving trace by the mobile device and the serving edge in the same vehicle. In this scheme, the edge node under test has to send the physical properties of the ambient radio signals, including the received signal strength indicator (RSSI) of the ambient signals with the corresponding source media access control (MAC) address during a given time slot. The mobile device can choose to compare the received ambient signal properties with its own record or apply the RSSI of the received signals to detect rogue edge attacks, and determines the test threshold used in the detection. We adopt a reinforcement learning technique to enable the mobile device to achieve the optimal detection policy in the dynamic VANET without being aware of the VANET model and the attack model. Simulation results show that the Q-learning based detection scheme can significantly reduce the detection error rate and increase the utility compared with existing schemes.
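The threshold selection described in this abstract can be illustrated with a minimal tabular Q-learning sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-valued state (whether the previous test flagged the edge), the candidate thresholds, the reward values, and the toy environment `toy_step` that stands in for a VANET.

```python
import random

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; returns the updated Q-value."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice over the indices of the candidate thresholds."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return row.index(max(row))

def toy_step(threshold, rng, attack_prob=0.3):
    """Hypothetical environment: returns (utility, flagged).

    A similarity score between the reported ambient-signal record and the
    device's own record is drawn at random; rogue edges score lower.
    """
    is_rogue = rng.random() < attack_prob
    similarity = rng.gauss(0.3 if is_rogue else 0.8, 0.1)
    flagged = similarity < threshold
    if is_rogue:
        reward = 1.0 if flagged else -2.0   # a missed attack is costly
    else:
        reward = -1.0 if flagged else 1.0   # a false alarm is penalized
    return reward, flagged

def train(thresholds, episodes=5000, epsilon=0.1, seed=0):
    """Learn Q-values for each (state, threshold) pair in the toy setting."""
    rng = random.Random(seed)
    q = [[0.0] * len(thresholds) for _ in range(2)]  # state: last test flagged?
    state = 0
    for _ in range(episodes):
        action = choose_action(q, state, epsilon, rng)
        reward, flagged = toy_step(thresholds[action], rng)
        next_state = int(flagged)
        q_update(q, state, action, reward, next_state)
        state = next_state
    return q
```

Calling `train([0.1, 0.55, 0.9])` returns one row of Q-values per state; the threshold with the largest value in a row is the one the learned policy would pick in that state.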
2019-04-05
Vastel, A., Laperdrix, P., Rudametkin, W., Rouvoy, R..  2018.  FP-STALKER: Tracking Browser Fingerprint Evolutions. 2018 IEEE Symposium on Security and Privacy (SP). :728-741.

Browser fingerprinting has emerged as a technique to track users without their consent. Unlike cookies, fingerprinting is a stateless technique that does not store any information on devices, but instead exploits unique combinations of attributes handed over freely by browsers. The uniqueness of fingerprints allows them to be used for identification. However, browser fingerprints change over time and the effectiveness of tracking users over longer durations has not been properly addressed. In this paper, we show that browser fingerprints tend to change frequently, from every few hours to days, due to, for example, software updates or configuration changes. Yet, despite these frequent changes, we show that browser fingerprints can still be linked, thus enabling long-term tracking. FP-STALKER is an approach to link browser fingerprint evolutions. It compares fingerprints to determine if they originate from the same browser. We created two variants of FP-STALKER, a rule-based variant that is faster, and a hybrid variant that exploits machine learning to boost accuracy. To evaluate FP-STALKER, we conduct an empirical study using 98,598 fingerprints we collected from 1,905 distinct browser instances. We compare our algorithm with the state of the art and show that, on average, we can track browsers for 54.48 days, and 26% of browsers can be tracked for more than 100 days.

Bapat, R., Mandya, A., Liu, X., Abraham, B., Brown, D. E., Kang, H., Veeraraghavan, M..  2018.  Identifying Malicious Botnet Traffic Using Logistic Regression. 2018 Systems and Information Engineering Design Symposium (SIEDS). :266-271.

An important source of cyber-attacks is malware, which proliferates in different forms such as botnets. The botnet malware typically looks for vulnerable devices across the Internet, rather than targeting specific individuals, companies or industries. It attempts to infect as many connected devices as possible, using their resources for automated tasks that may cause significant economic and social harm while being hidden to the user and device. Thus, it becomes very difficult to detect such activity. A considerable amount of research has been conducted to detect and prevent botnet infestation. In this paper, we attempt to create a foundation for an anomaly-based intrusion detection system using a statistical learning method to improve network security and reduce human involvement in botnet detection. We focus on identifying the best features to detect botnet activity within network traffic using a lightweight logistic regression model. The network traffic is processed by Bro, a popular network monitoring framework which provides aggregate statistics about the packets exchanged between a source and destination over a certain time interval. These statistics serve as features to a logistic regression model responsible for classifying malicious and benign traffic. Our model is easy to implement and simple to interpret. We characterized and modeled 8 different botnet families separately and as a mixed dataset. Finally, we measured the performance of our model on multiple parameters using F1 score, accuracy and Area Under Curve (AUC).
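As a rough illustration of the pipeline this abstract describes (per-flow statistics fed to a logistic regression classifier and scored with accuracy and F1), here is a minimal pure-Python sketch. It is not the authors' implementation: the training routine is plain batch gradient descent, and the feature vectors are whatever numeric per-flow statistics the caller supplies, so any specific feature choice is an assumption.

```python
import math

def sigmoid(z):
    """Logistic function, with the input clamped to avoid overflow."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 (malicious) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 score, treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return acc, f1
```

Because the model is a single weight per feature plus a bias, the learned weights can be read directly as an indication of which flow statistics push a prediction toward the malicious class, which matches the interpretability point made in the abstract.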

Dong, X., Hu, J., Cui, Y..  2018.  Overview of Botnet Detection Based on Machine Learning. 2018 3rd International Conference on Mechanical, Control and Computer Engineering (ICMCCE). :476-479.

With the rapid development of the information industry, applications of the Internet of Things, cloud computing and artificial intelligence have greatly affected people's lives, and the amount of network equipment has grown explosively. At the same time, a more complex network environment has led to more serious network security problems. Traditional security solutions become inefficient in this new situation. Therefore, it is an important task for the security industry to seek technical progress and improve its detection and protection capabilities. Botnets have been one of the most important network security problems, especially in the last one or two years, and China has become one of the countries most endangered by botnets; the huge worldwide impact of botnets has drawn renewed attention to the detection problem. This paper, on the topic of botnet detection, focuses on the latest research achievements in botnet detection based on machine learning technology. It first describes how machine learning technology is applied in cyberspace security research, then introduces the structural characteristics of botnets and the use of machine learning in botnet detection. The security features of these solutions and the commonly used machine learning algorithms are analyzed and summarized in detail. Finally, it summarizes the problems in existing solutions, and the future development directions and challenges of machine learning technology in cyberspace security research.

Chen, S., Chen, Y., Tzeng, W..  2018.  Effective Botnet Detection Through Neural Networks on Convolutional Features. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :372-378.

Botnets are one of the major threats on the Internet for committing cybercrimes, such as DDoS attacks, stealing sensitive information, and spreading spam. It is challenging to detect modern botnets, which are continuously improved to evade detection. In this paper, we propose a machine learning based botnet detection system that is shown to be effective in identifying P2P botnets. Our approach extracts convolutional versions of effective flow-based features and trains a classification model using a feed-forward artificial neural network. The experimental results show that detection using the convolutional features is more accurate than detection using the traditional features. It achieves a detection accuracy of 94.7% and a false positive rate of 2.2% on the known P2P botnet datasets. Furthermore, our system provides an additional confidence testing stage to enhance botnet detection performance. This stage further classifies the network traffic for which the neural network has insufficient confidence. The experiment shows that this stage can increase the detection accuracy to as much as 98.6% and decrease the false positive rate to as little as 0.5%.

2019-04-01
Wang, M., Yang, Y., Zhu, M., Liu, J..  2018.  CAPTCHA Identification Based on Convolution Neural Network. 2018 2nd IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). :364–368.

The CAPTCHA is an effective method commonly used in live interactive proofs on the Internet. The widely used CAPTCHAs are text-based schemes. In this paper, we document how we have broken such a text-based scheme used by a website CAPTCHA. We use a sliding window to segment 1001 CAPTCHA images into 5900 images containing useful single-character information, in a total of 25 categories. In order to make the convolution neural network learn more image features, we augmented the data set to get 129924 pictures. The data set is trained and tested in AlexNet and GoogLeNet to get accuracies of 87.45% and 98.92%, respectively. The experiment shows that the optimized network parameters can raise the accuracy to 92.7% in AlexNet and 98.96% in GoogLeNet.
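The sliding-window segmentation step mentioned in this abstract can be sketched as follows. This is a generic illustration, not the authors' procedure: the binarized image representation (a list of rows of 0/1 pixels), the fixed window width and stride, and the `min_ink` filter for discarding nearly empty crops are all assumptions.

```python
def sliding_windows(image, width, stride, min_ink=1):
    """Slide a fixed-width window across a binarized image.

    image: list of rows, each a list of 0/1 pixels (1 = ink).
    Returns (left_column, crop) pairs for every window that contains at
    least min_ink ink pixels, so mostly blank windows are dropped.
    """
    n_cols = len(image[0])
    crops = []
    for left in range(0, n_cols - width + 1, stride):
        crop = [row[left:left + width] for row in image]
        ink = sum(sum(row) for row in crop)
        if ink >= min_ink:
            crops.append((left, crop))
    return crops
```

Each retained crop would then be labeled and passed to the classifier; a stride smaller than the window width produces overlapping crops, which is one simple way to obtain more training images from each CAPTCHA.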
Stein, G., Peng, Q..  2018.  Low-Cost Breaking of a Unique Chinese Language CAPTCHA Using Curriculum Learning and Clustering. 2018 IEEE International Conference on Electro/Information Technology (EIT). :0595–0600.

Text-based CAPTCHAs are still commonly used to attempt to prevent automated access to web services. By displaying an image of distorted text, they attempt to create a challenge image that OCR software cannot interpret correctly, but to which a human user can easily determine the correct response. This work focuses on a CAPTCHA used by a popular Chinese language question-and-answer website and how resilient it is to modern machine learning methods. While the majority of text-based CAPTCHAs focus on transcription tasks, the CAPTCHA solved in this work is based on localization of inverted symbols in a distorted image. A convolutional neural network (CNN) was created to evaluate the likelihood of a region in the image belonging to an inverted character. It is used with a feature map and clustering to identify potential locations of inverted characters. Training of the CNN was performed using curriculum learning and compared to other potential training methods. The proposed method was able to determine the correct response in 95.2% of cases of a simulated CAPTCHA and 67.6% on a set of real CAPTCHAs. Potential methods to increase difficulty of the CAPTCHA and the success rate of the automated solver are considered.

Hu, Y., Chen, L., Cheng, J..  2018.  A CAPTCHA recognition technology based on deep learning. 2018 13th IEEE Conference on Industrial Electronics and Applications (ICIEA). :617–620.

Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA) is an important human-machine distinction technology that websites use to prevent attacks by automated malicious programs. CAPTCHA recognition studies can find security weaknesses in CAPTCHAs and improve CAPTCHA technology; they can also advance license plate recognition and handwriting recognition technologies. This paper proposes a method based on a Convolutional Neural Network (CNN) model to identify CAPTCHAs while avoiding traditional image processing steps such as localization and segmentation. An adaptive learning rate is introduced to accelerate the convergence of the model, and the problems of over-fitting and local optima are solved. A multi-task joint training model is used to improve the accuracy and generalization ability of model recognition. The experimental results show that the model recognizes CAPTCHAs with background noise and character adhesion distortion well.

Zhang, T., Zheng, H., Zhang, L..  2018.  Verification CAPTCHA Based on Deep Learning. 2018 37th Chinese Control Conference (CCC). :9056–9060.

At present, the CAPTCHA is widely used on the Internet. This paper introduces a method of CAPTCHA recognition using convolutional neural networks. For CAPTCHAs that are easy to segment, a simply trained convolutional neural network model is applied, with a network structure that imitates the VGGNet model, and the accuracy can reach more than 90%. For CAPTCHAs that are more difficult to segment, an end-to-end approach can be used that trains on the CAPTCHA as a whole. In this way, the recognition rate for the harder-to-segment CAPTCHAs can reach about 85%.

2019-03-28
McDermott, C. D., Petrovski, A. V., Majdani, F..  2018.  Towards Situational Awareness of Botnet Activity in the Internet of Things. 2018 International Conference On Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA). :1-8.

The following topics are dealt with: security of data; risk management; decision making; computer crime; invasive software; critical infrastructures; data privacy; insurance; Internet of Things; learning (artificial intelligence).

2019-03-25
Mamdouh, M., Elrukhsi, M. A. I., Khattab, A..  2018.  Securing the Internet of Things and Wireless Sensor Networks via Machine Learning: A Survey. 2018 International Conference on Computer and Applications (ICCA). :215–218.

The Internet of Things (IoT) is the network where physical devices, sensors, appliances and other different objects can communicate with each other without the need for human intervention. Wireless Sensor Networks (WSNs) are main building blocks of the IoT. Both the IoT and WSNs have many critical and non-critical applications that touch almost every aspect of our modern life. Unfortunately, these networks are prone to various types of security threats. Therefore, the security of IoT and WSNs became crucial. Furthermore, the resource limitations of the devices used in these networks complicate the problem. One of the most recent and effective approaches to address such challenges is machine learning. Machine learning inspires many solutions to secure the IoT and WSNs. In this paper, we survey the different threats that can attack both IoT and WSNs and the machine learning techniques developed to counter them.

Ali-Tolppa, J., Kocsis, S., Schultz, B., Bodrog, L., Kajo, M..  2018.  Self-Healing and Resilience in Future 5G Cognitive Autonomous Networks. 2018 ITU Kaleidoscope: Machine Learning for a 5G Future (ITU K). :1–8.

In the Self-Organizing Networks (SON) concept, self-healing functions are used to detect, diagnose and correct degraded states in the managed network functions or other resources. Such methods are increasingly important in future network deployments, since ultra-high reliability is one of the key requirements for the future 5G mobile networks, e.g. in critical machine-type communication. In this paper, we discuss the considerations for improving the resiliency of future cognitive autonomous mobile networks. In particular, we present an automated anomaly detection and diagnosis function for SON self-healing based on multi-dimensional statistical methods, case-based reasoning and active learning techniques. Insights from both the human expert and sophisticated machine learning methods are combined in an iterative way. Additionally, we present how a more holistic view on mobile network self-healing can improve its performance.

2019-03-22
Teoh, T. T., Chiew, G., Franco, E. J., Ng, P. C., Benjamin, M. P., Goh, Y. J..  2018.  Anomaly Detection in Cyber Security Attacks on Networks Using MLP Deep Learning. 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE). :1-5.

Malicious traffic has garnered more attention in recent years, owing to the rapid growth of information technology in today's world. In 2007 alone, malware attacks caused an estimated loss of 13 billion dollars. Malware data in today's context is massive. To understand such information using primitive methods would be a tedious task. In this publication we demonstrate some of the most advanced deep learning techniques available, multilayer perceptron (MLP) and J48 (also known as C4.5 or ID3), on our selected dataset, Advanced Security Network Metrics & Non-Payload-Based Obfuscations (ASNM-NPBO), to show that the answer to managing cyber security threats lies in the aforementioned methodologies.

Obert, J., Chavez, A., Johnson, J..  2018.  Behavioral Based Trust Metrics and the Smart Grid. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :1490-1493.

To ensure reliable and predictable service in the electrical grid it is important to gauge the level of trust present within critical components and substations. Although trust throughout a smart grid is temporal and dynamically varies according to measured states, it is possible to accurately formulate communications and service level strategies based on such trust measurements. Utilizing an effective set of machine learning and statistical methods, it is shown that establishment of trust levels between substations using behavioral pattern analysis is possible. It is also shown that the establishment of such trust can facilitate simple secure communications routing between substations.

Duan, J., Zeng, Z., Oprea, A., Vasudevan, S..  2018.  Automated Generation and Selection of Interpretable Features for Enterprise Security. 2018 IEEE International Conference on Big Data (Big Data). :1258-1265.

We present an effective machine learning method for malicious activity detection in enterprise security logs. Our method involves feature engineering, or generating new features by applying operators on features of the raw data. We generate DNF formulas from raw features, extract Boolean functions from them, and leverage Fourier analysis to generate new parity features and rank them based on their highest Fourier coefficients. We demonstrate on real enterprise data sets that the engineered features enhance the performance of a wide range of classifiers and clustering algorithms. As compared to classification of raw data features, the engineered features achieve up to 50.6% improvement in malicious recall, while sacrificing no more than 0.47% in accuracy. We also observe better isolation of malicious clusters, when performing clustering on engineered features. In general, a small number of engineered features achieve higher performance than raw data features according to our metrics of interest. Our feature engineering method also retains interpretability, an important consideration in cyber security applications.
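The parity-feature ranking described in this abstract rests on a standard fact from Boolean Fourier analysis: for a function f from {0,1}^n to {+1,-1}, the coefficient on a subset S is the average of f(x) multiplied by the parity (-1)^(sum of x_i for i in S). The sketch below estimates those coefficients from samples and ranks small subsets by magnitude. It is an illustration under stated assumptions, not the authors' method: it skips the DNF generation step, treats the ±1 labels directly as the Boolean function values, and the `max_size` cap on subset size is hypothetical.

```python
from itertools import combinations

def parity(x, subset):
    """Parity character chi_S(x): +1 if an even number of bits in S are set."""
    return -1 if sum(x[i] for i in subset) % 2 else 1

def fourier_coefficient(samples, labels, subset):
    """Empirical Fourier coefficient: mean of label * chi_S(x), labels in {+1, -1}."""
    total = sum(y * parity(x, subset) for x, y in zip(samples, labels))
    return total / len(samples)

def rank_parity_features(samples, labels, max_size=2):
    """Score every subset of up to max_size features; largest magnitude first."""
    n = len(samples[0])
    scored = []
    for k in range(1, max_size + 1):
        for subset in combinations(range(n), k):
            coeff = fourier_coefficient(samples, labels, subset)
            scored.append((abs(coeff), subset))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored
```

A subset near the top of the ranking corresponds to a new feature (the XOR of the listed raw Boolean features) that correlates strongly with the label, and because it names the raw features involved, the engineered feature stays interpretable.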

2019-03-15
Xue, M., Bian, R., Wang, J., Liu, W..  2018.  A Co-Training Based Hardware Trojan Detection Technique by Exploiting Unlabeled ICs and Inaccurate Simulation Models. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :1452-1457.

Integrated circuits (ICs) are becoming vulnerable to hardware Trojans. Most existing works require golden chips to provide references for hardware Trojan detection. However, a golden chip is extremely difficult to obtain. In previous work, we proposed a classification-based, golden chip-free hardware Trojan detection technique. However, the algorithm in that work is trained on simulated ICs without considering that there may be a shift between the simulation and the silicon fabrication. It is necessary to learn from actual silicon fabrication in order to obtain an accurate and effective classification model. We propose a co-training based hardware Trojan detection technique that exploits unlabeled fabricated ICs and inaccurate simulation models to provide reliable detection capability when facing fabricated ICs, while eliminating the need for fabricated golden chips. First, we train two classification algorithms using simulated ICs. At test time, the two algorithms can identify different patterns in the unlabeled ICs, and each is thus able to label some of these ICs for the further training of the other algorithm. Moreover, we use a statistical examination to choose which labeled ICs to pass to the other algorithm, in order to help prevent a degradation in performance due to the increased noise in the labeled ICs. We also use a statistical technique for combining the hypotheses from the two classification algorithms to obtain the final decision. The theoretical basis of why the co-training method can work is also described. Experimental results on benchmark circuits show that the proposed technique can detect unknown Trojans with high accuracy (92% to 97%) and recall (88% to 95%).