Bibliography
In image forensics, features extracted from a suspicious image are examined to determine whether the image has been tampered with. The features typically extracted for forgery detection are numerical values, which makes them awkward to feed directly into a CNN designed for image classification. In this paper, the feature vector is extracted using a least-squares solution: the suspicious image is treated as a matrix, and the coefficients of the least-squares solution serve as the feature vector. Two solutions are obtained, one from the original image and one from its median filter residual (MFR). The two feature vectors are then combined into a visualized pattern and fed into a CNN to classify the various transformed images. A new CNN architecture, a hybrid of the inception module and the residual block, was also designed to classify the visualized feature-vector patterns. The performance of the proposed image forensics detection (IFD) scheme was measured on seven image types: average filtered (window size 3 × 3), Gaussian filtered (window size 3 × 3), JPEG compressed (quality factors 90 and 70), median filtered (window sizes 3 × 3 and 5 × 5), and unaltered. The visualized patterns are fed into the image input layer of the designed hybrid CNN model. Throughout the experiments, the accuracy of median filtering detection was over 98%. The area under the curve (AUC), computed from sensitivity (TP: true positive rate) and 1-specificity (FP: false positive rate), approached 1 for the proposed IFD scheme on the hybrid CNN model. Experimental results show high efficiency and performance in classifying the various transformed images; the grade evaluation of the proposed scheme is therefore "Excellent (A)".
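The abstract leaves the exact least-squares construction implicit. One plausible reading, sketched below in NumPy/SciPy, treats the image columns as an overdetermined system Ax ≈ b and takes the minimiser x as the feature vector; the helper `ls_feature` and the column split are our assumptions, not the paper's:

```python
import numpy as np
from scipy.ndimage import median_filter

def ls_feature(img: np.ndarray) -> np.ndarray:
    """One reading of the paper's idea: all but the last column of the
    image matrix form A, the last column forms b, and the least-squares
    minimiser of ||Ax - b|| serves as the feature vector."""
    A, b = img[:, :-1].astype(np.float64), img[:, -1].astype(np.float64)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

img = np.random.randint(0, 256, (64, 64)).astype(np.float64)
mfr = img - median_filter(img, size=3)                     # median filter residual (MFR)
feat = np.concatenate([ls_feature(img), ls_feature(mfr)])  # the two solutions
pattern = feat.reshape(2, -1)                              # visualised pattern fed to the CNN
```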
Motions of facial components convey significant information about facial expressions. Although remarkable progress has been made, the dynamics of facial topology have not been fully exploited. In this paper, a novel facial expression recognition (FER) algorithm called the Spatial Temporal Semantic Graph Network (STSGN) is proposed to automatically learn spatial and temporal patterns through end-to-end feature learning from the facial topology structure. The proposed algorithm not only has greater discriminative power to capture the dynamic patterns of facial expression and stronger generalization capability to handle different variations, but also offers higher interpretability. Experimental evaluation on two popular datasets, CK+ and Oulu-CASIA, shows that our algorithm achieves more competitive results than other state-of-the-art methods.
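The abstract does not spell out the graph operations. As a generic illustration (not the paper's exact STSGN), one spatial graph-convolution step over N facial landmarks, H' = ReLU(Â H W), could look like this, with a hypothetical adjacency matrix standing in for the facial topology:

```python
import torch
import torch.nn as nn

N, d_in, d_out = 68, 2, 16                      # 68 landmarks with (x, y) coordinates
A = torch.eye(N) + torch.rand(N, N).round()     # hypothetical landmark graph + self-loops
A_hat = A / A.sum(dim=1, keepdim=True)          # row-normalised adjacency
H = torch.randn(4, N, d_in)                     # (batch, landmarks, features)
W = nn.Linear(d_in, d_out, bias=False)          # learnable projection
H_next = torch.relu(A_hat @ W(H))               # one graph-conv step, shape (4, 68, 16)
```

Stacking such steps over landmark sequences, combined with a temporal model, is the general recipe the abstract alludes to.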
The rapid growth of artificial intelligence has contributed greatly to the technology world. As traditional algorithms failed to meet human needs in real time, machine learning and deep learning algorithms have achieved great success in applications such as classification systems, recommendation systems, and pattern recognition. Emotion plays a vital role in determining the thoughts, behaviour, and feelings of a human. An emotion recognition system can be built by exploiting the benefits of deep learning, and applications such as feedback analysis and face unlocking can be implemented with good accuracy. The main focus of this work is to create a Deep Convolutional Neural Network (DCNN) model that classifies five different human facial emotions. The model is trained, tested, and validated on a manually collected image dataset.
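The abstract does not give the architecture; a minimal PyTorch sketch of a plausible DCNN for five emotion classes on 48 × 48 grey-scale faces (all layer sizes are assumptions) might be:

```python
import torch
import torch.nn as nn

class EmotionDCNN(nn.Module):
    """Plausible small DCNN for 5 emotion classes; sizes are illustrative."""
    def __init__(self, n_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(128 * 6 * 6, 256), nn.ReLU(),
            nn.Dropout(0.5), nn.Linear(256, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = EmotionDCNN()(torch.randn(8, 1, 48, 48))   # batch of 8 grey-scale faces
```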
Because of noise in images, the edges extracted from noisy images by traditional operators are often discontinuous and inaccurate. To address these problems, this paper proposes a multi-direction edge detection operator for noisy images. The new operator is designed by introducing the shear transformation into the traditional operator. On the one hand, the shear transformation provides a more favorable treatment of directions, allowing the new operator to detect edges in different directions and overcoming the directional limitation of the traditional operator. On the other hand, the single-pixel edge images from the different directions can be fused so that their edge information complements each other. The experimental results indicate that the new operator is superior to traditional ones in both the effectiveness of edge detection and the ability to reject noise.
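The shear-then-detect idea can be made concrete: shear the image, run a traditional operator (Sobel here), shear back, and fuse the directional edge maps by taking the pixelwise maximum. The shear values and the fusion rule below are our assumptions:

```python
import numpy as np
from scipy import ndimage

def sheared_edges(img: np.ndarray, shears=(-0.5, 0.0, 0.5)) -> np.ndarray:
    """Shear, detect with Sobel, shear back, and fuse by maximum."""
    fused = np.zeros(img.shape, dtype=np.float64)
    for s in shears:
        fwd = np.array([[1.0, s], [0.0, 1.0]])                  # shear matrix
        sheared = ndimage.affine_transform(img.astype(np.float64), fwd)
        mag = np.hypot(ndimage.sobel(sheared, axis=0),
                       ndimage.sobel(sheared, axis=1))          # gradient magnitude
        back = ndimage.affine_transform(mag, np.linalg.inv(fwd))
        fused = np.maximum(fused, back)   # directional edge maps complement each other
    return fused
```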
Style transfer is a research hotspot in computer vision. Despite much research on the topic, high-quality style transfer remains a challenge. In this work, we propose ASTCNN, a real-time Arbitrary Style Transfer Convolutional Neural Network. The ASTCNN consists of two independent encoders and a decoder: the encoders extract style and content features from the style and content images, respectively, and the decoder generates the style-transferred image. Experimental results show that ASTCNN produces higher-quality output images than state-of-the-art style transfer algorithms while requiring 23.3% less floating-point computation.
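A skeleton of the two-encoder/one-decoder layout the abstract describes (layer sizes and the concatenation fusion are ours; the paper's exact ASTCNN is not given) could be:

```python
import torch
import torch.nn as nn

enc_style = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                          nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
enc_content = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
decoder = nn.Sequential(nn.ConvTranspose2d(256, 64, 4, stride=2, padding=1), nn.ReLU(),
                        nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1))

style, content = torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256)
out = decoder(torch.cat([enc_style(style), enc_content(content)], dim=1))  # (1, 3, 256, 256)
```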
Over the years, technology has reshaped how the world perceives security concerns. To tackle security problems, we propose a system capable of detecting security alerts. The system treats audio events that stand out as outliers against the background as indicators of unusual activity, and this ambiguity is handled through auditory classification. In this paper, we discuss two techniques for extracting features from sound data: time-based and signal-based features. The first technique preserves the time-series nature of sound, while the second focuses on signal characteristics. A convolutional neural network is applied to categorize the sounds. Since the major aim of this research is security, we generated surveillance-related data in addition to using available datasets such as UrbanSound8K and ESC-50. We achieved 94.6% accuracy with the proposed methodology on the self-generated dataset; the improved accuracy on this locally prepared dataset demonstrates the novelty of the research.
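The two feature-extraction routes can be sketched with librosa (the file name is hypothetical; frame and MFCC sizes are assumptions):

```python
import librosa

y, sr = librosa.load("siren.wav", sr=22050)     # hypothetical surveillance clip

# Technique 1: time-based -- keep the raw time series, framed for a 1-D CNN.
frames = librosa.util.frame(y, frame_length=2048, hop_length=512).T

# Technique 2: signal-based -- summarise spectral characteristics (MFCCs here).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
```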
Jamming attacks on wireless communication networks can provoke denial of service. Depending on the communication system affected, the consequences can be more or less critical. In this paper, we develop an algorithm that could be implemented at the reception stage of a communication terminal to detect the presence of jamming signals. The work is performed on Wi-Fi communication signals and demonstrates the need for specific signal processing at the reception stage to detect jamming signals.
Neural style transfer has drawn broad attention in recent years. However, most existing methods aim to explicitly model the transformation between different styles, so the learned model does not generalize to new styles. We attempt to separate the representations of style and content, and propose a generalized style transfer network consisting of a style encoder, a content encoder, a mixer, and a decoder. The style encoder and content encoder extract the style and content factors from the style reference images and content reference images, respectively. The mixer employs a bilinear model to integrate the two factors and feeds the result into a decoder to generate images with the target style and content. To separate the style features and content features, we leverage the conditional dependence of styles and contents given an image. During training, the encoder network learns to extract styles and contents from two sets of reference images of limited size, one with shared style and the other with shared content. This learning framework allows simultaneous style transfer among multiple styles and can be deemed a special 'multi-task' learning scenario. The encoders are expected to capture underlying features for different styles and contents that are generalizable to new styles and contents. For validation, we applied the proposed algorithm to the Chinese typeface transfer problem. Extensive experimental results on character generation demonstrate the effectiveness and robustness of our method.
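The bilinear mixer can be written in one einsum: a learned three-way tensor W combines a style code s and a content code c into the joint code fed to the decoder (the dimensions below are illustrative):

```python
import torch

d_s, d_c, d_out = 128, 128, 256
W = torch.randn(d_out, d_s, d_c, requires_grad=True)  # learnable bilinear weights
s = torch.randn(8, d_s)                               # style codes from the style encoder
c = torch.randn(8, d_c)                               # content codes from the content encoder
mixed = torch.einsum('osc,bs,bc->bo', W, s, c)        # bilinear combination, shape (8, 256)
```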
Humans have created pioneering works of art since the beginning of time, yet there are few notable achievements by artificial intelligence in creating something visually captivating. However, breakthroughs have been made in the past few years by learning the differences between the content and style of an image using convolutional neural networks and texture synthesis. Most of these approaches are limited in processing time, in the choice of style image, or in the ability to alter the weight ratio of the style image. We address these restrictions and provide a system that allows any style image to be selected, with a user-defined style weight ratio, in the minimum time possible.
As an emerging paradigm for energy-efficient design, approximate computing can reduce power consumption by simplifying logic circuits. Although approximate computing introduces calculation errors, their impact on the final results can be negligible in error-resilient applications such as Convolutional Neural Networks (CNNs). Approximate computing has therefore been applied to CNNs to reduce their high demand for computing resources and energy. In contrast to traditional methods such as reducing data precision, this paper investigates the effect of approximate computing on the accuracy and power consumption of CNNs. To optimize the approximate computing applied to CNNs, we propose a method for quantifying the error resilience of each neuron through theoretical analysis, and we observe that error resilience varies widely across neurons. On the basis of this quantified error resilience, we propose dynamic adaptation of the approximate bit-width together with a corresponding configurable adder to fully exploit the error resilience of CNNs. Experimental results show that the proposed method further reduces power consumption while maintaining high accuracy. By adopting the optimal approximate bit-width for each layer, found by our proposed algorithm, dynamic bit-width adaptation reduces power consumption by more than 30% with less than 1% accuracy loss on LeNet-5.
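The abstract does not specify the configurable adder; a well-known stand-in is a lower-part-OR adder, in which the low `approx_bits` bits are OR-ed instead of added (no carry propagation) and only the upper bits use an exact adder, so the bit-width knob trades accuracy for energy:

```python
def approx_add(a: int, b: int, approx_bits: int) -> int:
    """Lower-part-OR approximate adder: error is bounded by 2**approx_bits."""
    mask = (1 << approx_bits) - 1
    low = (a & mask) | (b & mask)                         # approximate lower part
    high = ((a >> approx_bits) + (b >> approx_bits)) << approx_bits
    return high | low

print(100 + 77, approx_add(100, 77, 4))   # 177 exact vs 173 approximate
```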
We investigate a deep learning model for action recognition that simultaneously extracts spatio-temporal information from raw RGB input data. The proposed multiple spatio-temporal scales recurrent neural network (MSTRNN) model is derived by combining multiple-timescale recurrent dynamics with a conventional convolutional neural network model. The architecture imposes both spatial and temporal constraints simultaneously on its neural activities, and the constraints vary across layers with multiple scales. As suggested by the principle of upward and downward causation, the network is assumed to develop a functional hierarchy through these constraints during training. To evaluate and observe the characteristics of the proposed model, we use three human action datasets consisting of different primitive actions and different compositionality levels. The performance of the MSTRNN model on these datasets is compared with that of other representative deep learning models in the field. The results show that the MSTRNN outperforms baseline models while using fewer parameters. We observe the characteristics of the proposed model by analyzing its internal representations; the analysis clarifies how the spatio-temporal constraints of the MSTRNN help it extract the spatio-temporal information critical to its tasks.
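One way to realise the multiple-timescale constraint (our formulation, not necessarily the paper's exact update rule) is a leaky-integrator convolutional recurrent cell whose time constant tau sets how fast the hidden state reacts; stacking cells with increasing tau imposes the fast-to-slow hierarchy:

```python
import torch
import torch.nn as nn

class LeakyConvRNNCell(nn.Module):
    """Leaky-integrator ConvRNN cell; large tau means slow dynamics."""
    def __init__(self, ch: int, tau: float):
        super().__init__()
        self.tau = tau
        self.conv = nn.Conv2d(2 * ch, ch, 3, padding=1)

    def forward(self, x, h):
        u = torch.tanh(self.conv(torch.cat([x, h], dim=1)))
        return (1 - 1 / self.tau) * h + (1 / self.tau) * u  # slow update for large tau

cell = LeakyConvRNNCell(16, tau=4.0)
h = torch.zeros(1, 16, 28, 28)
h = cell(torch.randn(1, 16, 28, 28), h)   # one timestep
```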
Answer selection is one of the most important tasks in automatic question answering; it aims to automatically find accurate answers to questions. Traditional methods represent texts with manually engineered features based on tf-idf and n-gram models, and then select the right answers according to the similarity between the representations of the questions and the candidate answers. Nowadays, many question answering systems adopt deep neural networks such as the convolutional neural network (CNN) to generate text features automatically, obtaining better performance than traditional methods. A CNN extracts consecutive fixed-length n-gram features by sliding fixed-length convolutional kernels over the whole word sequence. However, due to the complex semantic compositionality of natural language, many phrases have variable lengths or are composed of non-consecutive words, such as phrases whose constituents are separated by other words within the same sentence. The traditional CNN cannot extract such variable-length or non-consecutive n-gram features. In this paper, we propose a multi-scale deformable convolutional neural network that captures non-consecutive n-gram features by adding offsets to the convolutional kernel, and we stack multiple deformable convolutional layers to mine multi-scale n-gram features, generating longer n-grams in higher layers. We apply the proposed model to the answer selection task, and experimental results on a public dataset demonstrate its effectiveness.
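A minimal 1-D deformable convolution over word embeddings (our sketch of the mechanism, with illustrative sizes) predicts a per-position offset for each kernel tap and gathers features by linear interpolation, so a 'trigram' can span non-consecutive words:

```python
import torch
import torch.nn as nn

class DeformConv1d(nn.Module):
    """1-D deformable convolution: learned offsets shift each kernel tap."""
    def __init__(self, ch: int, k: int = 3):
        super().__init__()
        self.k = k
        self.offset = nn.Conv1d(ch, k, k, padding=k // 2)   # one offset per tap
        self.weight = nn.Conv1d(ch * k, ch, 1)              # combine gathered taps

    def forward(self, x):                                   # x: (B, C, T)
        B, C, T = x.shape
        off = self.offset(x)                                # (B, k, T)
        base = torch.arange(T, device=x.device).float()
        taps = []
        for i in range(self.k):
            pos = (base + i - self.k // 2 + off[:, i]).clamp(0, T - 1)
            lo, frac = pos.floor().long(), pos - pos.floor()
            hi = (lo + 1).clamp(max=T - 1)
            g_lo = x.gather(2, lo.unsqueeze(1).expand(B, C, T))
            g_hi = x.gather(2, hi.unsqueeze(1).expand(B, C, T))
            taps.append(g_lo * (1 - frac.unsqueeze(1)) + g_hi * frac.unsqueeze(1))
        return self.weight(torch.cat(taps, dim=1))          # (B, C, T)

out = DeformConv1d(300)(torch.randn(2, 300, 20))            # 300-d embeddings, 20 words
```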
While recent studies have shed light on the expressivity, complexity, and compositionality of convolutional networks, the real inductive bias of the family of functions reachable by gradient descent on natural data is still unknown. By exploiting symmetries in the preactivation space of convolutional layers, we present preliminary empirical evidence of regularities in the preimage of trained rectifier networks, in terms of arrangements of polytopes, and relate these regularities to the nonlinear transformations the network applies to its input.
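The polytope picture is easy to visualise on a toy example: a random one-hidden-layer ReLU net partitions its 2-D input into linear regions, and points sharing an activation pattern lie in the same polytope (the net and grid below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)     # 8 hidden ReLU units
xs = np.stack(np.meshgrid(np.linspace(-3, 3, 200),
                          np.linspace(-3, 3, 200)), axis=-1)
pre = xs @ W1.T + b1                                     # preactivations on the grid
patterns = (pre > 0).astype(int)                         # ReLU activation pattern per point
region_id = patterns.dot(1 << np.arange(8))              # one integer per polytope
print(len(np.unique(region_id)), "linear regions intersect this square")
```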