Biblio
Over the last few years, there has been an increasing number of studies about facial emotion recognition because of the importance and the impact that it has in the interaction of humans with computers. With the growing number of challenging datasets, the application of deep learning techniques have all become necessary. In this paper, we study the challenges of Emotion Recognition Datasets and we also try different parameters and architectures of the Conventional Neural Networks (CNNs) in order to detect the seven emotions in human faces, such as: anger, fear, disgust, contempt, happiness, sadness and surprise. We have chosen iCV MEFED (Multi-Emotion Facial Expression Dataset) as the main dataset for our study, which is relatively new, interesting and very challenging.
As a part of body language, facial expression is a psychological state that reflects the current emotional state of the person. Recognition of facial expressions can help to understand others and enhance communication with others. We propose a facial expression recognition method based on convolutional neural network ensemble learning in this paper. Our model is composed of three sub-networks, and uses the SVM classifier to Integrate the output of the three networks to get the final result. The recognition accuracy of the model's expression on the FER2013 dataset reached 71.27%. The results show that the method has high test accuracy and short prediction time, and can realize real-time, high-performance facial recognition.
Facial expressions are one of the most powerful, natural and immediate means for human being to present their emotions and intensions. In this paper, we present a novel method for fully automatic facial expression recognition. The facial landmarks are detected for characterizing facial expressions. A graph convolutional neural network is proposed for feature extraction and facial expression recognition classification. The experiments were performed on the three facial expression databases. The result shows that the proposed FER method can achieve good recognition accuracy up to 95.85% using the proposed method.
A THz image edge detection approach based on wavelet and neural network is proposed in this paper. First, the source image is decomposed by wavelet, the edges in the low-frequency sub-image are detected using neural network method and the edges in the high-frequency sub-images are detected using wavelet transform method on the coarsest level of the wavelet decomposition, the two edge images are fused according to some fusion rules to obtain the edge image of this level, it then is projected to the next level. Afterwards the final edge image of L-1 level is got according to some fusion rule. This process is repeated until reaching the 0 level thus to get the final integrated and clear edge image. The experimental results show that our approach based on fusion technique is superior to Canny operator method and wavelet transform method alone.
Recently, the field of adversarial machine learning has been garnering attention by showing that state-of-the-art deep neural networks are vulnerable to adversarial examples, stemming from small perturbations being added to the input image. Adversarial examples are generated by a malicious adversary by obtaining access to the model parameters, such as gradient information, to alter the input or by attacking a substitute model and transferring those malicious examples over to attack the victim model. Specifically, one of these attack algorithms, Robust Physical Perturbations (RP2), generates adversarial images of stop signs with black and white stickers to achieve high targeted misclassification rates against standard-architecture traffic sign classifiers. In this paper, we propose BlurNet, a defense against the RP2 attack. First, we motivate the defense with a frequency analysis of the first layer feature maps of the network on the LISA dataset, which shows that high frequency noise is introduced into the input image by the RP2 algorithm. To remove the high frequency noise, we introduce a depthwise convolution layer of standard blur kernels after the first layer. We perform a blackbox transfer attack to show that low-pass filtering the feature maps is more beneficial than filtering the input. We then present various regularization schemes to incorporate this lowpass filtering behavior into the training regime of the network and perform white-box attacks. We conclude with an adaptive attack evaluation to show that the success rate of the attack drops from 90% to 20% with total variation regularization, one of the proposed defenses.
Since Gatys et al. proved that the convolution neural network (CNN) can be used to generate new images with artistic styles by separating and recombining the styles and contents of images. Neural Style Transfer has attracted wide attention of computer vision researchers. This paper aims to provide an overview of the style transfer application deep learning network development process, and introduces the classical style migration model, on the basis of the research on the migration of style of the deep learning network for collecting and organizing, and put forward related to gathered during the investigation of the problem solution, finally some classical model in the image style to display and compare the results of migration.
Most of the data manipulation attacks on deep neural networks (DNNs) during the training stage introduce a perceptible noise that can be catered by preprocessing during inference, or can be identified during the validation phase. There-fore, data poisoning attacks during inference (e.g., adversarial attacks) are becoming more popular. However, many of them do not consider the imperceptibility factor in their optimization algorithms, and can be detected by correlation and structural similarity analysis, or noticeable (e.g., by humans) in multi-level security system. Moreover, majority of the inference attack rely on some knowledge about the training dataset. In this paper, we propose a novel methodology which automatically generates imperceptible attack images by using the back-propagation algorithm on pre-trained DNNs, without requiring any information about the training dataset (i.e., completely training data-unaware). We present a case study on traffic sign detection using the VGGNet trained on the German Traffic Sign Recognition Benchmarks dataset in an autonomous driving use case. Our results demonstrate that the generated attack images successfully perform misclassification while remaining imperceptible in both “subjective” and “objective” quality tests.
The world is fundamentally compositional, so it is natural to think of visual recognition as the recognition of basic visually primitives that are composed according to well-defined rules. This strategy allows us to recognize unseen complex concepts from simple visual primitives. However, the current trend in visual recognition follows a data greedy approach where huge amounts of data are required to learn models for any desired visual concept. In this paper, we build on the compositionality principle and develop an "algebra" to compose classifiers for complex visual concepts. To this end, we learn neural network modules to perform boolean algebra operations on simple visual classifiers. Since these modules form a complete functional set, a classifier for any complex visual concept defined as a boolean expression of primitives can be obtained by recursively applying the learned modules, even if we do not have a single training sample. As our experiments show, using such a framework, we can compose classifiers for complex visual concepts outperforming standard baselines on two well-known visual recognition benchmarks. Finally, we present a qualitative analysis of our method and its properties.
Deep neural networks (DNNs) provide good performance for image recognition, speech recognition, and pattern recognition. However, a poisoning attack is a serious threat to DNN's security. The poisoning attack is a method to reduce the accuracy of DNN by adding malicious training data during DNN training process. In some situations such as a military, it may be necessary to drop only a chosen class of accuracy in the model. For example, if an attacker does not allow only nuclear facilities to be selectively recognized, it may be necessary to intentionally prevent UAV from correctly recognizing nuclear-related facilities. In this paper, we propose a selective poisoning attack that reduces the accuracy of only chosen class in the model. The proposed method reduces the accuracy of a chosen class in the model by training malicious training data corresponding to a chosen class, while maintaining the accuracy of the remaining classes. For experiment, we used tensorflow as a machine learning library and MNIST and CIFAR10 as datasets. Experimental results show that the proposed method can reduce the accuracy of the chosen class to 43.2% and 55.3% in MNIST and CIFAR10, while maintaining the accuracy of the remaining classes.
Safety is one of basic human needs so we need a security system that able to prevent crime happens. Commonly, we use surveillance video to watch environment and human behaviour in a location. However, the surveillance video can only used to record images or videos with no additional information. Therefore we need more advanced camera to get another additional information such as human position and movement. This research were able to extract those information from surveillance video footage by using human detection and tracking algorithm. The human detection framework is based on Deep Learning Convolutional Neural Networks which is a very popular branch of artificial intelligence. For tracking algorithms, channel and spatial correlation filter is used to track detected human. This system will generate and export tracked movement on footage as an additional information. This tracked movement can be analysed furthermore for another research on surveillance video problems.
In this paper, a merged convolution neural network (MCNN) is proposed to improve the accuracy and robustness of real-time facial expression recognition (FER). Although there are many ways to improve the performance of facial expression recognition, a revamp of the training framework and image preprocessing renders better results in applications. When the camera is capturing images at high speed, however, changes in image characteristics may occur at certain moments due to the influence of light and other factors. Such changes can result in incorrect recognition of human facial expression. To solve this problem, we propose a statistical method for recognition results obtained from previous images, instead of using the current recognition output. Experimental results show that the proposed method can satisfactorily recognize seven basic facial expressions in real time.