Biblio

Filters: Keyword is feature engineering
2023-08-23
Liang, Chenjun, Deng, Li, Zhu, Jincan, Cao, Zhen, Li, Chao.  2022.  Cloud Storage I/O Load Prediction Based on XB-IOPS Feature Engineering. 2022 IEEE 8th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS). :54–60.
With the popularization of cloud computing and the deepening of its application, more and more cloud block storage systems have been put into use. Performance optimization of cloud block storage systems has become an important challenge, manifested in the reduced system performance caused by unbalanced resource load. Accurately predicting the I/O load status of a cloud block storage system can effectively avoid the load imbalance problem. However, cloud block storage systems are characterized by frequent random reads and writes and a large volume of I/O requests, which makes prediction difficult. Therefore, we propose a novel I/O load prediction method based on XB-IOPS feature engineering. The feature engineering is designed according to the I/O request pattern, I/O size, and I/O interference, and predicts both the actual load value at a given future moment and the average load value over a continuous future time interval. Validated on a real dataset from the Alibaba Cloud block storage system, the results show that the XB-IOPS feature engineering prediction model performs better on Alibaba Cloud block storage devices where random I/O and small I/O dominate: its prediction performance is better and its prediction time is shorter than those of other prediction models.
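The abstract names two prediction targets: the load at a future moment and the average load over a future interval. A minimal sketch of how such targets might be derived from an IOPS series follows. The function name `build_targets` and the parameters `horizon` and `window` are illustrative assumptions, not taken from the paper, and the XB-IOPS features themselves are not reproduced here.

```python
from statistics import mean


def build_targets(iops, horizon, window):
    """Build two illustrative target series from a list of IOPS samples.

    For each time index t that has enough future data:
      point target = iops[t + horizon]              (load at one future moment)
      mean target  = mean(iops[t+1 : t+1+window])   (average load over a future interval)
    `horizon` and `window` are hypothetical parameters measured in samples.
    """
    last = len(iops) - max(horizon, window)  # number of indices with both targets defined
    point = [iops[t + horizon] for t in range(last)]
    avg = [mean(iops[t + 1 : t + 1 + window]) for t in range(last)]
    return point, avg


# Usage: six samples, predict 2 steps ahead and the mean of the next 3 steps.
point_targets, mean_targets = build_targets([10, 20, 30, 40, 50, 60], horizon=2, window=3)
```

Either target list could then be paired with features computed at index t to train a regressor; which model the authors use is not stated in this listing.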
2023-07-21
Su, Xiangjing, Zhu, Zheng, Xiao, Shiqu, Fu, Yang, Wu, Yi.  2022.  Deep Neural Network Based Efficient Data Fusion Model for False Data Detection in Power System. 2022 IEEE 6th Conference on Energy Internet and Energy System Integration (EI2). :1462–1466.
Cyberattacks on power systems bring new challenges to the development of modern power systems. Hackers may implement a false data injection attack (FDIA) to cause unstable operating conditions in the power system. However, data from different power Internet of Things sources usually contain a lot of redundancy, making it difficult for current efficient discriminant models to identify FDIA precisely. To address this problem, we propose a deep learning network-based data fusion model to handle features from measurement data in the power system. The proposed model includes a data enrichment module and a data fusion module. We first employ feature engineering techniques to enrich features from power system operation in the time dimension. Subsequently, a long short-term memory based autoencoder (LSTM-AE) is designed to efficiently avoid the feature space explosion problem during the data enrichment process. Extensive experiments are performed on several classical attack detection models over the load data set from the IEEE 14-bus system, and simulation results demonstrate that fused data from the proposed model yield higher detection accuracy than the raw data.
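As a rough illustration of what enriching features "in the time dimension" can look like, the sketch below derives lag, rolling-mean, and first-difference features from a single measurement series. The specific features, the function name `enrich_time_features`, and its parameters are assumptions chosen for illustration; the abstract does not list the features the authors use, and the LSTM-AE fusion step is not shown.

```python
def enrich_time_features(series, lags=(1, 2), window=3):
    """Return one feature dict per time step that has enough history.

    Assumes `lags` is a non-empty tuple of positive integers and window >= 1.
    Feature names (lag_k, rolling_mean, diff_1) are hypothetical.
    """
    rows = []
    start = max(max(lags), window - 1)  # first index with all features defined
    for t in range(start, len(series)):
        row = {"value": series[t]}
        for k in lags:
            row[f"lag_{k}"] = series[t - k]           # value k steps earlier
        recent = series[t - window + 1 : t + 1]       # last `window` values, including t
        row["rolling_mean"] = sum(recent) / window
        row["diff_1"] = series[t] - series[t - 1]     # change since the previous step
        rows.append(row)
    return rows


# Usage: four load measurements yield two enriched rows (for t = 2 and t = 3).
rows = enrich_time_features([1.0, 2.0, 4.0, 7.0])
```

Each added lag or window multiplies the feature count, which is the kind of growth the abstract's autoencoder is said to keep in check.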
2022-12-23
Huo, Da, Li, Xiaoyong, Li, Linghui, Gao, Yali, Li, Ximing, Yuan, Jie.  2022.  The Application of 1D-CNN in Microsoft Malware Detection. 2022 7th International Conference on Big Data Analytics (ICBDA). :181–187.
In the computer field, cybersecurity has always been a focus of attention. How to detect malware effectively is one of the key difficulties in network security research. Existing malware detection schemes can be mainly divided into two categories: database matching and machine learning methods. With the rise of deep learning, more and more deep learning methods are applied in the field of malware detection. Deeper semantic features can be extracted via deep neural networks. The main tasks of this paper are as follows: (1) using machine learning methods and one-dimensional convolutional neural networks to detect malware; (2) proposing a method that combines machine learning and deep learning for detection. With machine learning, LGBM obtains an accuracy of 67.16%, and the one-dimensional CNN obtains an accuracy of 72.47%. In (2), LGBM is used to screen features by importance before a one-dimensional convolutional neural network is applied, which further improves the detection result to an accuracy of 78.64%.
2020-01-20
Huang, Yongjie, Yang, Qiping, Qin, Jinghui, Wen, Wushao.  2019.  Phishing URL Detection via CNN and Attention-Based Hierarchical RNN. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :112–119.
Phishing websites have long been a serious threat to cyber security. For decades, many researchers have been devoted to developing novel techniques to detect phishing websites automatically. While state-of-the-art solutions can achieve superior performances, they require substantial manual feature engineering and are not adept at detecting newly emerging phishing attacks. Therefore, developing techniques that can detect phishing websites automatically and handle zero-day phishing attacks swiftly is still an open challenge in this area. In this work, we propose PhishingNet, a deep learning-based approach for timely detection of phishing Uniform Resource Locators (URLs). Specifically, we use a Convolutional Neural Network (CNN) module to extract character-level spatial feature representations of URLs; meanwhile, we employ an attention-based hierarchical Recurrent Neural Network (RNN) module to extract word-level temporal feature representations of URLs. We then fuse these feature representations via a three-layer CNN to build accurate feature representations of URLs, on which we train a phishing URL classifier. Extensive experiments on a verified dataset collected from the Internet demonstrate that the feature representations extracted automatically are conducive to the improvement of the generalization ability of our approach on newly emerging URLs, which makes our approach achieve competitive performance against other state-of-the-art approaches.
2018-02-15
Chanyaswad, T., Al, M., Chang, J. M., Kung, S. Y.  2017.  Differential mutual information forward search for multi-kernel discriminant-component selection with an application to privacy-preserving classification. 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP). :1–6.

In machine learning, feature engineering has been a pivotal stage in building a high-quality predictor. Particularly, this work explores the multiple Kernel Discriminant Component Analysis (mKDCA) feature-map and its variants. However, seeking the right subset of kernels for the mKDCA feature-map can be challenging. Therefore, we consider the problem of kernel selection, and propose an algorithm based on Differential Mutual Information (DMI) and incremental forward search. DMI serves as an effective metric for selecting kernels, as is theoretically supported by mutual information and Fisher's discriminant analysis. On the other hand, incremental forward search plays a role in removing redundancy among kernels. Finally, we illustrate the potential of the method via an application in privacy-aware classification, and show on three mobile-sensing datasets that selecting an effective set of kernels for mKDCA feature-maps can enhance the utility classification performance while successfully preserving data privacy. Specifically, the results show that the proposed DMI forward search method can perform better than the state-of-the-art, and, with much smaller computational cost, can perform as well as the optimal, yet computationally expensive, exhaustive search.
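The incremental forward search mentioned above follows a familiar greedy pattern: start from an empty set and repeatedly add the candidate that improves a subset score the most. The sketch below shows that generic pattern only. The `score` callable is a stand-in supplied by the caller; it is not the paper's DMI metric, whose definition is not given in this abstract, and the function name and `max_size` parameter are illustrative assumptions.

```python
def forward_search(candidates, score, max_size=None):
    """Greedy forward selection over `candidates`.

    `score(subset)` is any caller-supplied function mapping a list of
    candidates to a number (a placeholder for a metric such as DMI).
    Stops when no remaining candidate improves the score, or when
    `max_size` items have been selected.
    """
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining and (max_size is None or len(selected) < max_size):
        # Score gain from adding each remaining candidate to the current subset.
        gains = [(score(selected + [c]) - best, c) for c in remaining]
        gain, choice = max(gains, key=lambda g: g[0])
        if gain <= 0:  # a redundant candidate adds nothing, so stop
            break
        selected.append(choice)
        remaining.remove(choice)
        best = best + gain
    return selected


# Usage with a toy score: how many distinct items the chosen "kernels" cover.
covers = {"a": {1, 2}, "b": {2}, "c": {3}}
chosen = forward_search(["a", "b", "c"], lambda s: len(set().union(*(covers[k] for k in s))))
```

In the toy usage, candidate "b" is never chosen because everything it covers is already covered by "a", which mirrors the redundancy-removal role the abstract assigns to forward search. The search evaluates on the order of n² subsets rather than the 2ⁿ an exhaustive search would need.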

2018-02-14
Feng, C., Wu, S., Liu, N.  2017.  A user-centric machine learning framework for cyber security operations center. 2017 IEEE International Conference on Intelligence and Security Informatics (ISI). :173–175.

To assure the cyber security of an enterprise, a SIEM (Security Information and Event Management) system is typically in place to normalize security events from different preventive technologies and flag alerts. Analysts in the security operations center (SOC) investigate the alerts to decide whether they are truly malicious. However, the number of alerts is generally overwhelming, with the majority of them being false positives, and exceeds the SOC's capacity to handle all alerts. Because of this, potential malicious attacks and compromised hosts may be missed. Machine learning is a viable approach to reduce the false positive rate and improve the productivity of SOC analysts. In this paper, we develop a user-centric machine learning framework for the cyber security operations center in a real enterprise environment. We discuss the typical data sources in a SOC, their workflow, and how to leverage and process these data sets to build an effective machine learning system. The paper is targeted towards two groups of readers. The first group is data scientists or machine learning researchers who do not have cyber security domain knowledge but want to build machine learning systems for security operations centers. The second group is cyber security practitioners who have deep knowledge and expertise in cyber security, but do not have machine learning experience and wish to build such a system themselves. Throughout the paper, we use the system we built in the Symantec SOC production environment as an example to demonstrate the complete steps from data collection, label creation, feature engineering, machine learning algorithm selection, and model performance evaluation to risk score generation.