Visible to the public Biblio

Filters: Keyword is data augmentation  [Clear All Filters]
2023-02-03
Lu, Dongzhe, Fei, Jinlong, Liu, Long, Li, Zecun.  2022.  A GAN-based Method for Generating SQL Injection Attack Samples. 2022 IEEE 10th Joint International Information Technology and Artificial Intelligence Conference (ITAIC). 10:1827–1833.
Due to the simplicity of implementation and high threat level, SQL injection attacks are one of the oldest, most prevalent, and most destructive types of security attacks on Web-based information systems. With the continuous development and maturity of artificial intelligence technology, it has been a general trend to use AI technology to detect SQL injection. The selection of the sample set is the deciding factor of whether AI algorithms can achieve good results, but dataset with tagged specific category labels are difficult to obtain. This paper focuses on data augmentation to learn similar feature representations from the original data to improve the accuracy of classification models. In this paper, deep convolutional generative adversarial networks combined with genetic algorithms are applied to the field of Web vulnerability attacks, aiming to solve the problem of insufficient number of SQL injection samples. This method is also expected to be applied to sample generation for other types of vulnerability attacks.
ISSN: 2693-2865
2023-02-02
Wang, Zirui, Duan, Shaoming, Wu, Chengyue, Lin, Wenhao, Zha, Xinyu, Han, Peiyi, Liu, Chuanyi.  2022.  Generative Data Augmentation for Non-IID Problem in Decentralized Clinical Machine Learning. 2022 4th International Conference on Data Intelligence and Security (ICDIS). :336–343.
Swarm learning (SL) is an emerging promising decentralized machine learning paradigm and has achieved high performance in clinical applications. SL solves the problem of a central structure in federated learning by combining edge computing and blockchain-based peer-to-peer network. While there are promising results in the assumption of the independent and identically distributed (IID) data across participants, SL suffers from performance degradation as the degree of the non-IID data increases. To address this problem, we propose a generative augmentation framework in swarm learning called SL-GAN, which augments the non-IID data by generating the synthetic data from participants. SL-GAN trains generators and discriminators locally, and periodically aggregation via a randomly elected coordinator in SL network. Under the standard assumptions, we theoretically prove the convergence of SL-GAN using stochastic approximations. Experimental results demonstrate that SL-GAN outperforms state-of-art methods on three real world clinical datasets including Tuberculosis, Leukemia, COVID-19.
2022-06-14
Su, Liyilei, Fu, Xianjun, Hu, Qingmao.  2021.  A convolutional generative adversarial framework for data augmentation based on a robust optimal transport metric. 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys). :1155–1162.
Enhancement of the vanilla generative adversarial network (GAN) to preserve data variability in the presence of real world noise is of paramount significance in deep learning. In this study, we proposed a new distance metric of cosine distance in the framework of optimal transport (OT), and presented and validated a convolutional neural network (CNN) based GAN framework. In comparison with state-of-the-art methods based on Graphics Processing Units (GPU), the proposed framework could maintain the data diversity and quality best in terms of inception score (IS), Fréchet inception distance (FID) and enhancing the classification network of bone age, and is robust to noise degradation. The proposed framework is independent of hardware and thus could also be extended to more advanced hardware such as specialized Tensor Processing Units (TPU), and could be a potential built-in component of a general deep learning networks for such applications as image classification, segmentation, registration, and object detection.
2022-06-07
Gayathri, R G, Sajjanhar, Atul, Xiang, Yong, Ma, Xingjun.  2021.  Anomaly Detection for Scenario-based Insider Activities using CGAN Augmented Data. 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :718–725.
Insider threats are the cyber attacks from the trusted entities within an organization. An insider attack is hard to detect as it may not leave a footprint and potentially cause huge damage to organizations. Anomaly detection is the most common approach for insider threat detection. Lack of real-world data and the skewed class distribution in the datasets makes insider threat analysis an understudied research area. In this paper, we propose a Conditional Generative Adversarial Network (CGAN) to enrich under-represented minority class samples to provide meaningful and diverse data for anomaly detection from the original malicious scenarios. Comprehensive experiments performed on benchmark dataset demonstrates the effectiveness of using CGAN augmented data, and the capability of multi-class anomaly detection for insider activity analysis. Moreover, the method is compared with other existing methods against different parameters and performance metrics.
2022-05-19
Qing-chao, Ni, Cong-jue, Yin, Dong-hua, Zhao.  2021.  Research on Small Sample Text Classification Based on Attribute Extraction and Data Augmentation. 2021 IEEE 6th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA). :53–57.
With the development of deep learning and the progress of natural language processing technology, as well as the continuous disclosure of judicial data such as judicial documents, legal intelligence has gradually become a research hot spot. The crime classification task is an important branch of text classification, which can help people related to the law to improve their work efficiency. However, in the actual research, the sample data is small and the distribution of crime categories is not balanced. To solve these two problems, BERT was used as the encoder to solve the problem of small data volume, and attribute extraction network was added to solve the problem of unbalanced distribution. Finally, the accuracy of 90.35% on small sample data set could be achieved, and F1 value was 67.62, which was close to the best model performance under sufficient data. Finally, a text enhancement method based on back-translation technology is proposed. Different models are used to conduct experiments. Finally, it is found that LSTM model is improved to some extent, but BERT is not improved to some extent.
2022-04-13
Issifu, Abdul Majeed, Ganiz, Murat Can.  2021.  A Simple Data Augmentation Method to Improve the Performance of Named Entity Recognition Models in Medical Domain. 2021 6th International Conference on Computer Science and Engineering (UBMK). :763–768.
Easy Data Augmentation is originally developed for text classification tasks. It consists of four basic methods: Synonym Replacement, Random Insertion, Random Deletion, and Random Swap. They yield accuracy improvements on several deep neural network models. In this study we apply these methods to a new domain. We augment Named Entity Recognition datasets from medical domain. Although the augmentation task is much more difficult due to the nature of named entities which consist of word or word groups in the sentences, we show that we can improve the named entity recognition performance.
2021-12-20
Chang, Sungkyun, Lee, Donmoon, Park, Jeongsoo, Lim, Hyungui, Lee, Kyogu, Ko, Karam, Han, Yoonchang.  2021.  Neural Audio Fingerprint for High-Specific Audio Retrieval Based on Contrastive Learning. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :3025–3029.
Most of existing audio fingerprinting systems have limitations to be used for high-specific audio retrieval at scale. In this work, we generate a low-dimensional representation from a short unit segment of audio, and couple this fingerprint with a fast maximum inner-product search. To this end, we present a contrastive learning framework that derives from the segment-level search objective. Each update in training uses a batch consisting of a set of pseudo labels, randomly selected original samples, and their augmented replicas. These replicas can simulate the degrading effects on original audio signals by applying small time offsets and various types of distortions, such as background noise and room/microphone impulse responses. In the segment-level search task, where the conventional audio fingerprinting systems used to fail, our system using 10x smaller storage has shown promising results. Our code and dataset are available at https://mimbres.github.io/neural-audio-fp/.
2021-03-29
Olaimat, M. Al, Lee, D., Kim, Y., Kim, J., Kim, J..  2020.  A Learning-based Data Augmentation for Network Anomaly Detection. 2020 29th International Conference on Computer Communications and Networks (ICCCN). :1–10.
While machine learning technologies have been remarkably advanced over the past several years, one of the fundamental requirements for the success of learning-based approaches would be the availability of high-quality data that thoroughly represent individual classes in a problem space. Unfortunately, it is not uncommon to observe a significant degree of class imbalance with only a few instances for minority classes in many datasets, including network traffic traces highly skewed toward a large number of normal connections while very small in quantity for attack instances. A well-known approach to addressing the class imbalance problem is data augmentation that generates synthetic instances belonging to minority classes. However, traditional statistical techniques may be limited since the extended data through statistical sampling should have the same density as original data instances with a minor degree of variation. This paper takes a learning-based approach to data augmentation to enable effective network anomaly detection. One of the critical challenges for the learning-based approach is the mode collapse problem resulting in a limited diversity of samples, which was also observed from our preliminary experimental result. To this end, we present a novel "Divide-Augment-Combine" (DAC) strategy, which groups the instances based on their characteristics and augments data on a group basis to represent a subset independently using a generative adversarial model. Our experimental results conducted with two recently collected public network datasets (UNSW-NB15 and IDS-2017) show that the proposed technique enhances performances up to 21.5% for identifying network anomalies.
2020-12-11
Mikołajczyk, A., Grochowski, M..  2019.  Style transfer-based image synthesis as an efficient regularization technique in deep learning. 2019 24th International Conference on Methods and Models in Automation and Robotics (MMAR). :42—47.

These days deep learning is the fastest-growing area in the field of Machine Learning. Convolutional Neural Networks are currently the main tool used for the image analysis and classification purposes. Although great achievements and perspectives, deep neural networks and accompanying learning algorithms have some relevant challenges to tackle. In this paper, we have focused on the most frequently mentioned problem in the field of machine learning, that is relatively poor generalization abilities. Partial remedies for this are regularization techniques e.g. dropout, batch normalization, weight decay, transfer learning, early stopping and data augmentation. In this paper we have focused on data augmentation. We propose to use a method based on a neural style transfer, which allows to generate new unlabeled images of high perceptual quality that combine the content of a base image with the appearance of another one. In a proposed approach, the newly created images are described with pseudo-labels, and then used as a training dataset. Real, labeled images are divided into the validation and test set. We validated proposed method on a challenging skin lesion classification case study. Four representative neural architectures are examined. Obtained results show the strong potential of the proposed approach.

2020-07-03
Feng, Ri-Chen, Lin, Daw-Tung, Chen, Ken-Min, Lin, Yi-Yao, Liu, Chin-De.  2019.  Improving Deep Learning by Incorporating Semi-automatic Moving Object Annotation and Filtering for Vision-based Vehicle Detection*. 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). :2484—2489.

Deep learning has undergone tremendous advancements in computer vision studies. The training of deep learning neural networks depends on a considerable amount of ground truth datasets. However, labeling ground truth data is a labor-intensive task, particularly for large-volume video analytics applications such as video surveillance and vehicles detection for autonomous driving. This paper presents a rapid and accurate method for associative searching in big image data obtained from security monitoring systems. We developed a semi-automatic moving object annotation method for improving deep learning models. The proposed method comprises three stages, namely automatic foreground object extraction, object annotation in subsequent video frames, and dataset construction using human-in-the-loop quick selection. Furthermore, the proposed method expedites dataset collection and ground truth annotation processes. In contrast to data augmentation and data generative models, the proposed method produces a large amount of real data, which may facilitate training results and avoid adverse effects engendered by artifactual data. We applied the constructed annotation dataset to train a deep learning you-only-look-once (YOLO) model to perform vehicle detection on street intersection surveillance videos. Experimental results demonstrated that the accurate detection performance was improved from a mean average precision (mAP) of 83.99 to 88.03.

2018-02-27
[Anonymous].  2017.  Sensitivity Analysis in Keystroke Dynamics Using Convolutional Neural Networks. 2017 IEEE Workshop on Information Forensics and Security (WIFS). :1–6.

Biometrics has become ubiquitous and spurred common use in many authentication mechanisms. Keystroke dynamics is a form of behavioral biometrics that can be used for user authentication while actively working at a terminal. The proposed mechanisms involve digraph, trigraph and n-graph analysis as separate solutions or suggest a fusion mechanism with certain limitations. However, deep learning can be used as a unifying machine learning technique that consolidates the power of all different features since it has shown tremendous results in image recognition and natural language processing. In this paper, we investigate the applicability of deep learning on three different datasets by using convolutional neural networks and Gaussian data augmentation technique. We achieve 10% higher accuracy and 7.3% lower equal error rate (EER) than existing methods. Also, our sensitivity analysis indicates that the convolution operation and the fully-connected layer are the most prominent factors that affect the accuracy and the convergence rate of a network trained with keystroke data.