Biblio
This paper deals with the problem of image forgery detection because of the problems it causes. Where The Fake im-ages can lead to social problems, for example, misleading the public opinion on political or religious personages, de-faming celebrities and people, and Presenting them in a law court as evidence, may Doing mislead the court. This work proposes a deep learning approach based on Deep CNN (Convolutional Neural Network) Architecture, to detect fake images. The network is based on a modified structure of Xception net, CNN based on depthwise separable convolution layers. After extracting the feature maps, pooling layers are used with dense connection with Xception output, to in-crease feature maps. Inspired by the idea of a densenet network. On the other hand, the work uses the YCbCr color system for images, which gave better Accuracy of %99.93, more than RGB, HSV, and Lab or other color systems.
ISSN: 2831-753X
Industrial control systems (ICSs) are used in various infrastructures and industrial plants for realizing their control operation and ensuring their safety. Concerns about the cybersecurity of industrial control systems have raised due to the increased number of cyber-attack incidents on critical infrastructures in the light of the advancement in the cyber activity of ICSs. Nevertheless, the operation of the industrial control systems is bind to vital aspects in life, which are safety, economy, and security. This paper presents a semi-supervised, hybrid attack detection approach for industrial control systems by combining Isolation Forest and Convolutional Neural Network (CNN) models. The proposed framework is developed using the normal operational data, and it is composed of a feature extraction model implemented using a One-Dimensional Convolutional Neural Network (1D-CNN) and an isolation forest model for the detection. The two models are trained independently such that the feature extraction model aims to extract useful features from the continuous-time signals that are then used along with the binary actuator signals to train the isolation forest-based detection model. The proposed approach is applied to a down-scaled industrial control system, which is a water treatment plant known as the Secure Water Treatment (SWaT) testbed. The performance of the proposed method is compared with the other works using the same testbed, and it shows an improvement in terms of the detection capability.
One of the most efficient tool for human face recognition is neural networks. However, the result of recognition can be spoiled by facial expressions and other deviation from the canonical face representation. In this paper, we propose a resampling method of human faces represented by 3D point clouds. The method is based on a non-rigid Iterative Closest Point (ICP) algorithm. To improve the facial recognition performance, we use a combination of the proposed method and convolutional neural network (CNN). Computer simulation results are provided to illustrate the performance of the proposed method.
Human emotion recognition plays a vital role in interpersonal communication and human-machine interaction domain. Emotions are expressed through speech, hand gestures and by the movements of other body parts and through facial expression. Facial emotions are one of the most important factors in human communication that help us to understand, what the other person is trying to communicate. People understand only one-third of the message verbally, and two-third of it is through non-verbal means. There are many face emotion recognition (FER) systems present right now, but in real-life scenarios, they do not perform efficiently. Though there are many which claim to be a near-perfect system and to achieve the results in favourable and optimal conditions. The wide variety of expressions shown by people and the diversity in facial features of different people will not aid in the process of coming up with a system that is definite in nature. Hence developing a reliable system without any flaws showed by the existing systems is a challenging task. This paper aims to build an enhanced system that can analyse the exact facial expression of a user at that particular time and generate the corresponding emotion. Datasets like JAFFE and FER2013 were used for performance analysis. Pre-processing methods like facial landmark and HOG were incorporated into a convolutional neural network (CNN), and this has achieved good accuracy when compared with the already existing models.
In this work, we applied a deep Convolutional Neural Network (CNN) with Xception model to perform malware image classification. The Xception model is a recently developed special CNN architecture that is more powerful with less over- fitting problems than the current popular CNN models such as VGG16. However only a few use cases of the Xception model can be found in literature, and it has never been used to solve the malware classification problem. The performance of our approach was compared with other methods including KNN, SVM, VGG16 etc. The experiments on two datasets (Malimg and Microsoft Malware Dataset) demonstrated that the Xception model can achieve the highest training accuracy than all other approaches including the champion approach, and highest validation accuracy than all other approaches including VGG16 model which are using image-based malware classification (except the champion solution as this information was not provided). Additionally, we proposed a novel ensemble model to combine the predictions from .bytes files and .asm files, showing that a lower logloss can be achieved. Although the champion on the Microsoft Malware Dataset achieved a bit lower logloss, our approach does not require any features engineering, making it more effective to adapt to any future evolution in malware, and very much less time consuming than the champion's solution.
We investigate a deep learning model for action recognition that simultaneously extracts spatio-temporal information from a raw RGB input data. The proposed multiple spatio-temporal scales recurrent neural network (MSTRNN) model is derived by combining multiple timescale recurrent dynamics with a conventional convolutional neural network model. The architecture of the proposed model imposes both spatial and temporal constraints simultaneously on its neural activities. The constraints vary, with multiple scales in different layers. As suggested by the principle of upward and downward causation, it is assumed that the network can develop a functional hierarchy using its constraints during training. To evaluate and observe the characteristics of the proposed model, we use three human action datasets consisting of different primitive actions and different compositionality levels. The performance capabilities of the MSTRNN model on these datasets are compared with those of other representative deep learning models used in the field. The results show that the MSTRNN outperforms baseline models while using fewer parameters. The characteristics of the proposed model are observed by analyzing its internal representation properties. The analysis clarifies how the spatio-temporal constraints of the MSTRNN model aid in how it extracts critical spatio-temporal information relevant to its given tasks.
Facial expression recognition is a challenging problem in the field of computer vision. In this paper, we propose a deep learning approach that can learn the joint low-level and high-level features of human face to resolve this problem. Our deep neural networks utilize convolution and downsampling to extract the abstract and local features of human face, and reconstruct the raw input images to learn global features as supplementary information at the same time. We also add an adjustable weight in the networks when combining the two kinds of features for the final classification. The experimental results show that the proposed method can achieve good results, which has an average recognition accuracy of 93.65% on the test datasets.
In the past three years, Emotion Recognition in the Wild (EmotiW) Grand Challenge has drawn more and more attention due to its huge potential applications. In the fourth challenge, aimed at the task of video based emotion recognition, we propose a multi-clue emotion fusion (MCEF) framework by modeling human emotion from three mutually complementary sources, facial appearance texture, facial action, and audio. To extract high-level emotion features from sequential face images, we employ a CNN-RNN architecture, where face image from each frame is first fed into the fine-tuned VGG-Face network to extract face feature, and then the features of all frames are sequentially traversed in a bidirectional RNN so as to capture dynamic changes of facial textures. To attain more accurate facial actions, a facial landmark trajectory model is proposed to explicitly learn emotion variations of facial components. Further, audio signals are also modeled in a CNN framework by extracting low-level energy features from segmented audio clips and then stacking them as an image-like map. Finally, we fuse the results generated from three clues to boost the performance of emotion recognition. Our proposed MCEF achieves an overall accuracy of 56.66% with a large improvement of 16.19% with respect to the baseline.