Biblio
Semi-supervised learning has recently gained increasing attention because it can combine abundant unlabeled data with carefully labeled data to train deep neural networks. However, common semi-supervised methods rely heavily on the quality of pseudo labels. In this paper, we propose a new semi-supervised learning method based on the Generative Adversarial Network (GAN), which uses the discriminator to learn the features of both labeled and unlabeled data instead of generating pseudo labels that cannot all be correct. Our approach, semi-supervised conditional GAN (SCGAN), builds upon the conditional GAN model, extending it to semi-supervised learning by changing the discriminator's output to a classification output and a real-or-fake output. We evaluate our approach against a basic semi-supervised model on the MNIST dataset. Our approach achieves a classification accuracy of 84.15%, outperforming the basic semi-supervised model at 72.94%, when labeled data make up 1/600 of all data.
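The two-output discriminator described in the SCGAN abstract above could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the architecture or loss, so the linear heads, the cross-entropy terms, and all names (`discriminator_heads`, `discriminator_loss`, `class_weights`, `real_weights`) are assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminator_heads(features, class_weights, real_weights):
    """Map one shared feature vector to (class probabilities, P(real)).

    Assumption: both heads are plain linear layers on shared features.
    """
    class_logits = [sum(w * f for w, f in zip(row, features))
                    for row in class_weights]
    real_logit = sum(w * f for w, f in zip(real_weights, features))
    return softmax(class_logits), sigmoid(real_logit)


def discriminator_loss(class_probs, p_real, is_real, label=None):
    """Adversarial loss for every sample, plus a classification loss
    only when a label is available (assumed loss form)."""
    eps = 1e-12
    if is_real:
        adversarial = -math.log(p_real + eps)
    else:
        adversarial = -math.log(1.0 - p_real + eps)
    classification = 0.0
    if label is not None:
        classification = -math.log(class_probs[label] + eps)
    return adversarial + classification
```

Under these assumptions, unlabeled samples still contribute to training through the real-or-fake term, which is how the discriminator can learn from both labeled and unlabeled data without pseudo labels.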
Phishing is typically deployed as an attack vector in the initial stages of a hacking endeavour. Due to its low-risk, high-reward nature, it has seen widespread adoption, and detecting it has become a challenge in recent times. This paper proposes a novel means of detecting phishing websites using a Generative Adversarial Network. Taking into account the internal structure and external metadata of a website, the proposed approach uses a generator network that generates both legitimate and synthetic phishing features to train a discriminator network. The latter then determines whether the features belong to normal or phishing websites, before improving its detection accuracy based on the classification error. The proposed approach is evaluated using two different phishing datasets and is found to achieve a detection accuracy of up to 94%.
The increasing number of malware variants seen in the wild is causing problems for antivirus software vendors, who are unable to keep up by creating signatures for each. The methods used to develop a signature, static and dynamic analysis, have various limitations. Machine learning has been used by antivirus vendors to detect malware based on the information gathered from the analysis process. However, adversarial examples can cause machine learning algorithms to misclassify new data. In this paper we describe a method for malware analysis by converting malware binaries to images and then preparing those images for training within a Generative Adversarial Network. These unsupervised deep neural networks are not susceptible to adversarial examples. The conversion of malware binaries to images should be faster than dynamic analysis, and it would still be possible to link malware families together. Using the Generative Adversarial Network, malware detection could be much more effective and reliable.
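One plausible way to perform the binary-to-image conversion mentioned in the abstract above is to treat each byte as one grayscale pixel. The sketch below assumes that convention; the abstract does not specify the encoding, so the fixed row width, the zero padding, the scaling to [-1, 1], and the function names are illustrative assumptions only.

```python
def bytes_to_image(data: bytes, width: int = 16, pad: int = 0):
    """Lay out raw bytes as rows of grayscale pixels (0..255).

    Assumption: one byte per pixel, fixed row width, and the last
    row padded with `pad` so that every row has the same length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([pad] * (width - len(row)))
        rows.append(row)
    return rows


def scale_pixels(image):
    """Rescale pixel values from [0, 255] to [-1.0, 1.0], a range
    often used for GAN inputs (an assumption here)."""
    return [[p / 127.5 - 1.0 for p in row] for row in image]
```

For example, `bytes_to_image(open(path, "rb").read(), width=256)` would produce a 256-pixel-wide image from a file at a hypothetical `path`. Because this only reads the bytes and never executes the sample, it is consistent with the abstract's expectation that conversion is faster than dynamic analysis.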
Previous saliency detection methods usually focus on extracting features to deal with the complex background in an image. However, these methods cannot effectively capture the semantic information of images. In recent years, the Generative Adversarial Network (GAN) has become a prevalent research topic. Experiments show that GANs have the ability to generate high-quality images that look like natural images. Inspired by the effectiveness of GAN feature learning, we propose a novel multi-scale adversarial feature learning (MAFL) model for saliency detection. In particular, we base the complete saliency detection framework on two deep CNN modules: the multi-scale G-network takes natural images as inputs and generates the corresponding synthetic saliency maps, and the D-network contains a novel correlation layer, which is used to determine whether an image is a synthetic saliency map or a ground-truth saliency map. Quantitative and qualitative experiments on three benchmark datasets demonstrate that our method outperforms seven state-of-the-art methods.
Imbalanced big data refers to big data in which the proportion of a certain class is relatively small compared to other classes. When a machine learning model is trained on imbalanced big data, its performance drops for the minority class. For this reason, various oversampling methodologies have been proposed, but simple oversampling leads to overfitting. In this paper, we propose a meta learning methodology for efficient analysis of imbalanced big data. The proposed meta learning methodology uses the meta information of the data generated by a generative model based on Generative Adversarial Networks. It prevents the generated data from becoming too similar to the real data in the minority class. Compared to simple oversampling for analyzing imbalanced big data, it is less likely to cause overfitting. Experimental results show that the proposed method can efficiently analyze imbalanced big data.
Existing generative adversarial network (GAN) methods use different criteria to distinguish between real and fake samples, such as probability [9], energy [44], or other losses [30]. In this paper, by employing the merits of deep metric learning, we propose a novel metric-based generative adversarial network (MBGAN), which uses a distance criterion to distinguish between real and fake samples. Specifically, the discriminator of MBGAN adopts a triplet structure and learns a deep nonlinear transformation, which maps input samples into a new feature space. In the transformed space, the distance between real samples is minimized, while the distance between real and fake samples is maximized. Similar to the adversarial procedure of existing GANs, a generator is trained to produce synthesized examples that are close to real examples, while a discriminator is trained to separate real and fake samples by a large margin. Meanwhile, instead of using a fixed margin, we adopt a data-dependent margin [30], so that the generator can focus on improving the synthesized samples of poor quality instead of wasting effort on well-produced samples. Our proposed method is verified on various benchmarks, such as CIFAR-10, SVHN and CelebA, and generates high-quality samples.
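The distance criterion in the MBGAN abstract above resembles a standard triplet hinge loss, which could be sketched as follows. This is an assumption-laden illustration rather than the paper's formulation: the Euclidean distance, the hinge form, and the function names are chosen for the example, and the data-dependent margin from [30] is not reproduced, so `margin` is simply passed in by the caller.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def discriminator_triplet_loss(anchor_real, positive_real, negative_fake, margin):
    """Hinge-style triplet loss on embedded samples (assumed form).

    Pulls two real embeddings together and pushes the fake embedding
    at least `margin` farther from the anchor than the positive is.
    """
    d_pos = euclidean(anchor_real, positive_real)
    d_neg = euclidean(anchor_real, negative_fake)
    return max(0.0, d_pos - d_neg + margin)


def generator_loss(anchor_real, fake):
    """The generator tries to move fake embeddings close to real ones."""
    return euclidean(anchor_real, fake)
```

With a hinge of this kind, a fake sample that is already far enough from the real anchor contributes zero discriminator loss, so the margin controls which samples still drive training.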
In this paper, we propose an autoencoder-based generative adversarial network (GAN) for automatic image generation, which is called the "stylized adversarial autoencoder". Unlike existing generative autoencoders, which typically impose a prior distribution over the latent vector, the proposed approach splits the latent variable into two components, a style feature and a content feature, both encoded from real images. Splitting the latent vector enables us to adjust the content and the style of the generated image arbitrarily by choosing different exemplary images. In addition, a multiclass classifier is adopted as the discriminator in the GAN, which makes the generated images more realistic. We performed experiments on handwritten digit, scene text and face datasets, in which the stylized adversarial autoencoder achieves superior results for image generation and remarkably improves the corresponding supervised recognition task.