Visible to the public Biblio

Filters: Author is Ban, Tao  [Clear All Filters]
2023-08-03
Ndichu, Samuel, Ban, Tao, Takahashi, Takeshi, Inoue, Daisuke.  2022.  Security-Alert Screening with Oversampling Based on Conditional Generative Adversarial Networks. 2022 17th Asia Joint Conference on Information Security (AsiaJCIS). :1–7.
Imbalanced class distribution can cause information loss and missed/false alarms for deep learning and machine-learning algorithms. The detection performance of traditional intrusion detection systems tend to degenerate due to skewed class distribution caused by the uneven allocation of observations in different kinds of attacks. To combat class imbalance and improve network intrusion detection performance, we adopt the conditional generative adversarial network (CTGAN) that enables the generation of samples of specific classes of interest. CTGAN builds on the generative adversarial networks (GAN) architecture to model tabular data and generate high quality synthetic data by conditionally sampling rows from the generated model. Oversampling using CTGAN adds instances to the minority class such that both data in the majority and the minority class are of equal distribution. The generated security alerts are used for training classifiers that realize critical alert detection. The proposed scheme is evaluated on a real-world dataset collected from security operation center of a large enterprise. The experiment results show that detection accuracy can be substantially improved when CTGAN is adopted to produce a balanced security-alert dataset. We believe the proposed CTGAN-based approach can cast new light on building effective systems for critical alert detection with reduced missed/false alarms.
ISSN: 2765-9712
2022-05-19
Ndichu, Samuel, Ban, Tao, Takahashi, Takeshi, Inoue, Daisuke.  2021.  A Machine Learning Approach to Detection of Critical Alerts from Imbalanced Multi-Appliance Threat Alert Logs. 2021 IEEE International Conference on Big Data (Big Data). :2119–2127.
The extraordinary number of alerts generated by network intrusion detection systems (NIDS) can desensitize security analysts tasked with incident response. Security information and event management systems (SIEMs) perform some rudimentary automation but cannot replicate the decision-making process of a skilled analyst. Machine learning and artificial intelligence (AI) can detect patterns in data with appropriate training. In practice, the majority of the alert data comprises false alerts, and true alerts form only a small proportion. Consequently, a naive engine that classifies all security alerts into the majority class can yield a superficial high accuracy close to 100%. Without any correction for the class imbalance, the false alerts will dominate algorithmic predictions resulting in poor generalization performance. We propose a machine-learning approach to address the class imbalance problem in multi-appliance security alert data and automate the security alert analysis process performed in security operations centers (SOCs). We first used the neighborhood cleaning rule (NCR) to identify and remove ambiguous, noisy, and redundant false alerts. Then, we applied the support vector machine synthetic minority oversampling technique (SVMSMOTE) to generate synthetic training true alerts. Finally, we fit and evaluated the decision tree and random forest classifiers. In the experiments, using alert data from eight security appliances, we demonstrated that the proposed method can significantly reduce the need for manual auditing, decreasing the number of uninspected alerts and achieving a performance of 99.524% in recall.
2021-09-21
Lee, Yen-Ting, Ban, Tao, Wan, Tzu-Ling, Cheng, Shin-Ming, Isawa, Ryoichi, Takahashi, Takeshi, Inoue, Daisuke.  2020.  Cross Platform IoT-Malware Family Classification Based on Printable Strings. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :775–784.
In this era of rapid network development, Internet of Things (IoT) security considerations receive a lot of attention from both the research and commercial sectors. With limited computation resource, unfriendly interface, and poor software implementation, legacy IoT devices are vulnerable to many infamous mal ware attacks. Moreover, the heterogeneity of IoT platforms and the diversity of IoT malware make the detection and classification of IoT malware even more challenging. In this paper, we propose to use printable strings as an easy-to-get but effective cross-platform feature to identify IoT malware on different IoT platforms. The discriminating capability of these strings are verified using a set of machine learning algorithms on malware family classification across different platforms. The proposed scheme shows a 99% accuracy on a large scale IoT malware dataset consisted of 120K executable fils in executable and linkable format when the training and test are done on the same platform. Meanwhile, it also achieves a 96% accuracy when training is carried out on a few popular IoT platforms but test is done on different platforms. Efficient malware prevention and mitigation solutions can be enabled based on the proposed method to prevent and mitigate IoT malware damages across different platforms.
2020-07-03
Jia, Guanbo, Miller, Paul, Hong, Xin, Kalutarage, Harsha, Ban, Tao.  2019.  Anomaly Detection in Network Traffic Using Dynamic Graph Mining with a Sparse Autoencoder. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :458—465.

Network based attacks on ecommerce websites can have serious economic consequences. Hence, anomaly detection in dynamic network traffic has become an increasingly important research topic in recent years. This paper proposes a novel dynamic Graph and sparse Autoencoder based Anomaly Detection algorithm named GAAD. In GAAD, the network traffic over contiguous time intervals is first modelled as a series of dynamic bipartite graph increments. One mode projection is performed on each bipartite graph increment and the adjacency matrix derived. Columns of the resultant adjacency matrix are then used to train a sparse autoencoder to reconstruct it. The sum of squared errors between the reconstructed approximation and original adjacency matrix is then calculated. An online learning algorithm is then used to estimate a Gaussian distribution that models the error distribution. Outlier error values are deemed to represent anomalous traffic flows corresponding to possible attacks. In the experiment, a network emulator was used to generate representative ecommerce traffic flows over a time period of 225 minutes with five attacks injected, including SYN scans, host emulation and DDoS attacks. ROC curves were generated to investigate the influence of the autoencoder hyper-parameters. It was found that increasing the number of hidden nodes and their activation level, and increasing sparseness resulted in improved performance. Analysis showed that the sparse autoencoder was unable to encode the highly structured adjacency matrix structures associated with attacks, hence they were detected as anomalies. In contrast, SVD and variants, such as the compact matrix decomposition, were found to accurately encode the attack matrices, hence they went undetected.