Bibliography
Robot Operating System (ROS) is becoming increasingly important and is widely used by developers and researchers in various domains. One of the most important fields where it is being used is the self-driving car industry. However, the framework is far from fully secure, and its known security breaches lack robust solutions. In this paper we focus on camera vulnerabilities, as the camera is often the most important source for environment perception and decision-making. We propose an unsupervised anomaly detection tool for flagging suspicious frames in camera streams. Our solution is based on spatio-temporal autoencoders trained to faithfully reconstruct camera frames, detecting abnormal frames by measuring the reconstruction error against the input. We test our approach on a real-world dataset, i.e. streams from the embedded cameras of self-driving cars. Our solution outperforms existing work across different scenarios.
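The core mechanism described here, reconstructing each frame and scoring it by reconstruction error, can be sketched as below. This is a minimal illustrative sketch: the convolutional architecture, input size, and threshold are assumptions, not the paper's actual spatio-temporal model.

```python
# Minimal sketch of reconstruction-error anomaly scoring (illustrative; the
# paper's actual spatio-temporal architecture and threshold are not reproduced).
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_scores(model, frames):
    """Per-frame reconstruction error; high values suggest suspicious frames."""
    model.eval()
    with torch.no_grad():
        recon = model(frames)
        return ((frames - recon) ** 2).mean(dim=(1, 2, 3))

model = ConvAutoencoder()
frames = torch.rand(8, 3, 64, 64)   # batch of camera frames scaled to [0, 1]
scores = anomaly_scores(model, frames)
threshold = 0.01                    # hypothetical, tuned on normal traffic data
print(scores > threshold)           # True marks a frame as anomalous
```

In practice the model is trained only on normal frames, so attacked or injected frames reconstruct poorly and stand out by their score.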
Since neural networks can extract information from an image, Gatys et al. found that they could separate the content and style of images and recombine them into a new image, a task called style transfer. Many feed-forward networks have since been proposed to speed up the original method and make style transfer practical. However, this comes at a price: these feed-forward networks are tied to fixed parameters, so they can transfer only a single style in real time rather than arbitrary ones. Several approaches have been proposed to relieve this dilemma, such as a style-swap layer and an adaptive instance normalization (AdaIN) layer. It is worth noting that the AdaIN layer only aligns the means and variances of the content feature maps with those of the style feature maps. Our method aims to provide a practical approach to arbitrary style transfer in real time that preserves more statistical information through histogram matching, delivers more reliable texture clarity, and offers finer user control. We achieve better results than existing approaches without adding computational complexity, at a speed comparable to the fastest style transfer methods, while providing more flexible user control and trustworthy quality and stability.
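For reference, the AdaIN operation mentioned above normalizes each channel of the content feature maps and rescales it with the style statistics. This is the standard formulation from Huang and Belongie; a minimal sketch in PyTorch, assuming NCHW feature tensors:

```python
# Standard AdaIN: match channel-wise mean/std of content features to style.
import torch

def adain(content_feat, style_feat, eps=1e-5):
    # content_feat, style_feat: (N, C, H, W) feature maps from an encoder
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean
```

Histogram matching, as proposed here, goes further by aligning the full per-channel feature distributions rather than only these first two moments.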
Re-drawing an image in a given artistic style is considered a complicated task for a computer, whereas humans can easily compose and describe the style differences between images. Researchers studying deep neural networks have found an appropriate representation of artistic style using perceptual loss and style reconstruction loss. In earlier work, Gatys et al. proposed an artificial system based on convolutional neural networks that creates artistic images of high perceptual quality. In terms of running speed, however, it is relatively time-consuming and therefore cannot be applied to video style transfer. More recently, feed-forward CNN approaches have shown the potential for fast style transfer: end-to-end systems that avoid hundreds of optimization iterations per image. We combine the benefits of both approaches, optimizing the feed-forward network and defining a temporal loss function to make real-time video style transfer possible. In contrast to previous methods, ours runs in real time at higher resolution while producing competitive, visually pleasing, and temporally consistent results.
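The abstract does not spell out the temporal loss; a common formulation in video style transfer (e.g. Ruder et al., given here as an assumed reference point rather than this paper's exact definition) penalizes differences between the current stylized frame and the previous stylized frame warped by optical flow:

$$\mathcal{L}_{\mathrm{temporal}} = \frac{1}{D} \sum_{k=1}^{D} c_k \left( x_t^{(k)} - \omega_t\!\left(x_{t-1}\right)^{(k)} \right)^2$$

where $x_t$ is the stylized frame at time $t$, $\omega_t$ warps the previous stylized frame to time $t$ using optical flow, $D$ is the number of pixels, and $c_k \in \{0, 1\}$ masks out occluded or unreliable pixels.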
Transferring the style of an image is a fundamental problem in computer vision: extract the features of a content image and a style image, then combine them to produce a new image exhibiting features of both inputs. In this paper, we introduce an artificial system to separate and recombine the content and style of arbitrary images, providing a neural algorithm for the creation of artistic images. We use a pre-trained deep convolutional neural network, VGG19, to extract the feature maps of the input style image and content image. We then define a loss function that captures the difference between the output image and the two input images, and use gradient descent to update the output image so as to minimize this loss. Experimental results show the feasibility of the method.
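A minimal sketch of this optimization loop in PyTorch is shown below; the layer choices, loss weights, and iteration count are illustrative assumptions rather than the paper's exact settings.

```python
# Sketch of the described procedure: extract VGG-19 features, define content
# and style losses, and update the output image by gradient descent.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

vgg = vgg19(weights="DEFAULT").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

STYLE_LAYERS = {1, 6, 11, 20, 29}   # ReLU outputs after conv1_1 ... conv5_1
CONTENT_LAYER = 21                  # conv4_2 output

def extract(x):
    style, content = [], None
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            style.append(x)
        if i == CONTENT_LAYER:
            content = x
    return style, content

def gram(f):
    _, c, h, w = f.shape            # assumes batch size 1
    f = f.view(c, h * w)
    return (f @ f.t()) / (c * h * w)

content_img = torch.rand(1, 3, 256, 256)   # placeholders for real images
style_img = torch.rand(1, 3, 256, 256)

with torch.no_grad():
    style_targets = [gram(s) for s in extract(style_img)[0]]
    content_target = extract(content_img)[1]

output = content_img.clone().requires_grad_(True)
opt = torch.optim.Adam([output], lr=0.02)
for _ in range(300):
    opt.zero_grad()
    style_feats, content_feat = extract(output)
    loss = F.mse_loss(content_feat, content_target)
    for s, t in zip(style_feats, style_targets):
        loss = loss + 1e3 * F.mse_loss(gram(s), t)
    loss.backward()
    opt.step()
```

Because the optimization runs per image, this loop is slow, which is exactly the limitation the feed-forward methods cited in the surrounding abstracts address.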
Bi-dimensional empirical mode decomposition (BEMD) decomposes a source image into several bi-dimensional intrinsic mode functions (BIMFs). During decomposition, interpolation is required to construct the upper and lower envelopes, and these interpolations and envelope constructions demand considerable computation time and manual screening. This paper proposes a simple but effective method that preserves the characteristics of the original BEMD: Hermite interpolation replaces the surface interpolation, and a variable neighborhood window replaces the fixed neighborhood window. We call it fast bi-dimensional empirical mode decomposition with a variable neighborhood window, and we apply it to image fusion. Empirical analysis shows that the method overcomes the original BEMD's shortcoming that the BIMF components do not capture the source image's features and details richly enough, while reducing computation time and yielding better fusion quality.
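To illustrate the envelope-interpolation step that dominates the cost, here is a one-dimensional analogue of a single sifting step using a Hermite-type interpolant from SciPy. BEMD performs the analogous surface interpolation over 2-D extrema; this sketch is illustrative, not the paper's algorithm.

```python
# 1-D analogue of one EMD sifting step: interpolate envelopes through the
# local extrema, subtract their mean to obtain a candidate intrinsic mode.
import numpy as np
from scipy.signal import argrelextrema
from scipy.interpolate import PchipInterpolator  # piecewise cubic Hermite

def sift_once(signal):
    x = np.arange(len(signal))
    maxima = argrelextrema(signal, np.greater)[0]
    minima = argrelextrema(signal, np.less)[0]
    upper = PchipInterpolator(x[maxima], signal[maxima])(x)  # upper envelope
    lower = PchipInterpolator(x[minima], signal[minima])(x)  # lower envelope
    mean_env = (upper + lower) / 2.0
    return signal - mean_env        # candidate intrinsic mode function

t = np.linspace(0, 1, 500)
s = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
imf_candidate = sift_once(s)        # isolates the fast oscillation first
```

The per-step cost is dominated by locating extrema and fitting the envelopes, which is why replacing the 2-D surface interpolation with a cheaper Hermite scheme pays off.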
Digital images are extensively used and exchanged over the Internet, which has given rise to the need to establish the authorship of images. Image watermarking provides a solution to prevent false claims of ownership of the media: information about the owner, generally in the form of a logo, text, or image, is imperceptibly hidden inside the subject. The research community has explored many transforms for image watermarking, and many techniques have been developed based on the Singular Value Decomposition (SVD) of images. This paper analyzes SVD to understand its use, capabilities, and limitations for hiding additional information in a cover image for digital image watermarking applications.
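As one concrete example of the SVD-based family the paper analyzes, a classic scheme (in the style of Liu and Tan) embeds the watermark into the singular values of the cover image. A minimal sketch; the embedding strength alpha and image sizes are illustrative:

```python
# Sketch of a classic SVD-domain embedding: the watermark perturbs the
# cover's singular-value matrix; sizes and alpha are illustrative.
import numpy as np

def embed(cover, watermark, alpha=0.1):
    U, S, Vt = np.linalg.svd(cover)
    D = np.diag(S) + alpha * watermark      # perturb singular-value matrix
    Uw, Sw, Vwt = np.linalg.svd(D)
    marked = U @ np.diag(Sw) @ Vt           # watermarked image
    return marked, (Uw, Vwt, S)             # key material kept by the owner

def extract(marked, key, alpha=0.1):
    Uw, Vwt, S = key
    Sw = np.linalg.svd(marked, compute_uv=False)
    D = Uw @ np.diag(Sw) @ Vwt
    return (D - np.diag(S)) / alpha

cover = np.random.rand(64, 64)
wm = np.random.rand(64, 64)                 # watermark, same size as cover
marked, key = embed(cover, wm)
print(np.allclose(extract(marked, key), wm))   # True in the attack-free case
```

The appeal of the SVD domain is that singular values are relatively stable under common image processing, so the watermark survives mild distortion; the limitations the paper discusses stem from the same dependence on stored key material.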
This paper explores the benefits of 3D face modeling for in-the-wild facial expression recognition (FER). Since in-the-wild 3D FER datasets are scarce, we first construct 3D facial data from an available 2D dataset using recent advances in 3D face reconstruction. A 3D facial geometry representation is then extracted with deep learning. In addition, we take advantage of manipulating the 3D face, for example by using 2D projections of the 3D face as additional input for FER. These features are then fused with those of a typical 2D FER network. By doing so, despite using common building blocks, we achieve competitive recognition accuracy on the Real-World Affective Faces (RAF) database and Static Facial Expressions in the Wild (SFEW 2.0) compared with state-of-the-art results. To the best of our knowledge, this is the first time such a deep learning combination of 3D and 2D facial modalities has been presented in the context of in-the-wild FER.
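The fusion idea can be sketched as a two-branch network whose 2D and 3D features are concatenated before classification. The branch designs, input shapes, and the seven expression classes below are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch of late fusion of 2-D appearance features and 3-D geometry
# features for expression classification (branches and sizes are assumptions).
import torch
import torch.nn as nn

class FusionFER(nn.Module):
    def __init__(self, feat_2d=512, feat_3d=256, num_classes=7):
        super().__init__()
        self.branch_2d = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_2d), nn.ReLU())
        self.branch_3d = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_3d), nn.ReLU())
        self.classifier = nn.Linear(feat_2d + feat_3d, num_classes)

    def forward(self, img_2d, geom_3d):
        fused = torch.cat([self.branch_2d(img_2d), self.branch_3d(geom_3d)], dim=1)
        return self.classifier(fused)

model = FusionFER()
logits = model(torch.rand(4, 3, 224, 224),   # 2-D face crops
               torch.rand(4, 3, 112, 112))   # e.g. projected 3-D geometry maps
print(logits.shape)                          # torch.Size([4, 7])
```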