Biblio
In the original grey correlation analysis algorithm, the detected edges are comparatively rough and the thresholds must be determined in advance. Thus, an adaptive edge detection method based on grey correlation analysis is proposed, in which the basic principle of the original algorithm is used to obtain the threshold adaptively from the mean value of the 3×3 pixel neighborhood around the pixel under test and the properties of human vision. Because the proposed algorithm detects a relatively large number of false edges, it is enhanced by processing the eight neighboring pixels around each edge pixel, and the results are merged to obtain the final edge map. The experimental results show that, compared with traditional edge detection algorithms, the algorithm produces a more complete edge map with better continuity.
Video streams acquired from thermal cameras have proven beneficial in a diverse range of fields, including military, healthcare, law enforcement, and security. Despite the hype, thermal imaging is increasingly affected by poor resolution, owing to its expensive optical sensors and its inability to attain optical precision. In recent years, deep learning based super-resolution algorithms have been developed to enhance video frame resolution with high accuracy. This paper presents a comparative analysis of super-resolution (SR) techniques based on deep neural networks (DNN) applied to thermal video datasets. SRCNN, EDSR, Auto-encoder, and SRGAN are discussed and investigated. Further, the results on benchmark thermal datasets, including FLIR, the OSU thermal pedestrian database, and the OSU color thermal database, are evaluated and analyzed. Based on the experimental results, it is concluded that SRGAN delivers superior performance on thermal frames compared with the other techniques and improvements, and that it can provide state-of-the-art performance in real-time operation.
Re-drawing an image in a certain artistic style is considered a complicated task for a computer. In contrast, humans can easily learn to compose and describe the style shared between different images. Previously, many researchers studying deep neural networks found an appropriate representation of artistic style using perceptual loss and style reconstruction loss. Gatys et al. proposed an artificial system based on convolutional neural networks that creates artistic images of high perceptual quality. In terms of running speed, however, it is relatively time-consuming and therefore cannot be applied to video style transfer. Recently, a feed-forward CNN approach has shown the potential for fast style transformation; it is an end-to-end system that does not need hundreds of iterations during transfer. We combine the benefits of both approaches, optimizing the feed-forward network and defining a time loss function to make it possible to perform style transfer on video in real time. In contrast to previous methods, our method runs in real time at higher resolution while producing competitive, visually pleasing, and temporally consistent experimental results.
In this study we propose a novel method for drone surveillance that can simultaneously analyze time-frequency responses in all pixels of a high-frame-rate video. The propellers of flying drones rotate at hundreds of Hz and their principal vibration frequency components are much higher than those of their background objects. To separate the pixels around a drone's propellers from its background, we utilize these time-series features for vibration source localization with pixel-level short-time Fourier transform (STFT). We verify the relationship between the number of taps in the STFT computation and the performance of our algorithm, including the execution time and the localization accuracy, by conducting experiments under various conditions, such as degraded appearance, weather, and defocused blur. The robustness of the proposed algorithm is also verified by localizing a flying multi-copter in real-time in an outdoor scenario.
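The pixel-level STFT idea in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the Hann window, the hop size, and the `min_hz` threshold rule are all assumptions made for illustration. Only the use of a per-pixel short-time Fourier transform with a configurable number of taps comes from the abstract.

```python
import numpy as np

def pixelwise_peak_frequency(frames, fps, taps=64, hop=32):
    """Dominant temporal frequency of every pixel in every STFT window.

    frames: array of shape (T, H, W) holding grayscale intensities.
    Returns an array of shape (n_windows, H, W) in Hz.
    """
    frames = np.asarray(frames, dtype=np.float64)
    n_frames = frames.shape[0]
    window = np.hanning(taps)[:, None, None]       # assumed window function
    freqs = np.fft.rfftfreq(taps, d=1.0 / fps)     # bin centers in Hz
    peaks = []
    for start in range(0, n_frames - taps + 1, hop):
        segment = frames[start:start + taps]
        # Remove the per-pixel mean so static brightness does not dominate.
        segment = segment - segment.mean(axis=0, keepdims=True)
        spectrum = np.abs(np.fft.rfft(segment * window, axis=0))
        peaks.append(freqs[np.argmax(spectrum, axis=0)])
    return np.stack(peaks)

def vibration_mask(frames, fps, min_hz, taps=64, hop=32):
    """Mark pixels whose median peak frequency is at least min_hz (hypothetical rule)."""
    peaks = pixelwise_peak_frequency(frames, fps, taps, hop)
    return np.median(peaks, axis=0) >= min_hz
```

With `taps=64` at 1000 frames per second, the frequency resolution is about 15.6 Hz per bin, which illustrates the trade-off between the number of taps, execution time, and localization accuracy that the abstract mentions.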
This paper introduces DeepCheck, a new approach for validating Deep Neural Networks (DNNs) based on core ideas from program analysis, specifically from symbolic execution. DeepCheck implements techniques for lightweight symbolic analysis of DNNs and applies them in the context of image classification to address two challenging problems: 1) identification of important pixels (for attribution and adversarial generation); and 2) creation of adversarial attacks. Experimental results using the MNIST data-set show that DeepCheck's lightweight symbolic analysis provides a valuable tool for DNN validation.
High-resolution three-dimensional (3D) microwave imaging is a challenging problem because of its inherently unmanageable computational cost. Recently, deep learning techniques, which can fully exploit the prior of meaningful patterns embodied in data, have begun to show intriguing merits in various inverse problems. Motivated by this observation, we present a deep-learning-inspired approach to high-resolution 3D microwave imaging based on the Generative Adversarial Network (GAN), termed GANMI in this work. Simulation and experimental results demonstrate that the proposed GANMI can remarkably outperform conventional methods in terms of both image quality and computational time.
Deep learning has been successfully applied to ordinary image super-resolution (SR). However, since synthetic aperture radar (SAR) images are often disturbed by multiplicative noise known as speckle and are blurrier than ordinary images, there are few deep learning methods for SAR image SR. In this paper, a deep generative adversarial network (DGAN) is proposed to reconstruct pseudo high-resolution (HR) SAR images. First, a generator network is constructed to remove the noise of the low-resolution SAR image and generate an HR SAR image. Second, a discriminator network is used to differentiate between the pseudo super-resolution images and the realistic HR images. The adversarial objective function is introduced to bring the pseudo HR SAR images closer to real SAR images. The experimental results show that our method can maintain the SAR image content with high-level noise suppression. The performance evaluation based on peak signal-to-noise ratio and structural similarity index shows the superiority of the proposed method over conventional CNN baselines.
Super-resolution (SR) of hyperspectral images (HSIs) aims to enhance the spatial/spectral resolution of hyperspectral imagery, and the super-resolved results will benefit many remote sensing applications. A generative adversarial network for HSI super-resolution (HSRGAN) is proposed in this paper. Specifically, HSRGAN constructs spectral and spatial blocks with residual networks in the generator to effectively learn spectral and spatial features from HSIs. Furthermore, a new loss function that combines the pixel-wise loss and the adversarial loss is designed to guide the generator to recover images that approximate the original HSIs and have finer texture details. Quantitative and qualitative results demonstrate that the proposed HSRGAN is superior to state-of-the-art methods such as SRCNN and SRGAN for HSI spatial SR.
There is an inevitable trade-off between spatial and spectral resolution in optical remote sensing images. A number of techniques for fusing multimodal images with different spatial and spectral characteristics have been developed to generate optical images with both high spatial and high spectral resolution. Although some of these techniques take the spectral and spatial blurring processes into account, no method attempts to retrieve an optical image with both high spatial and high spectral resolution, a spatial blurring filter, and a spectral response simultaneously. In this paper, we propose a new framework for spatial resolution enhancement that fuses multiple optical images with different characteristics based on tensor decomposition. An optical image with both high spatial and high spectral resolution, together with a spatial blurring filter and a spectral response, is generated via canonical polyadic (CP) decomposition of a set of tensors. Experimental results showed that relatively reasonable results were obtained by regularization based on nonnegativity and coupling.
Advancements in the semiconductor domain have enabled numerous applications of video surveillance using computer vision and deep learning. In industrial automation, security, ADAS, live traffic analysis, and similar fields, image understanding improves the efficiency of video surveillance. Image understanding requires high-precision input data, which depends on image resolution and camera location. The data of interest can be a thermal image or a live feed coming from various sensors. Composite video (CVBS) is a popular video interface capable of streaming up to HD (1920x1080) quality. Unlike high-speed serial interfaces such as HDMI and MIPI CSI, the analog composite video interface is a single-wire standard that supports longer distances. Image understanding requires edge detection and classification for further processing. The Sobel filter is one of the most widely used edge detection filters and can be embedded into a live stream. This paper proposes a Zynq FPGA based system design for video surveillance with Sobel edge detection, in which the input composite video is decoded (analog CVBS input to YCbCr digital output), processed in hardware, and streamed to an HDMI display while simultaneously being stored in SD memory for later processing. The hardware design is scalable for resolutions from VGA to Full HD at 60 fps and 4K at 24 fps. The system is built on the Xilinx ZC702 platform with a TVP5146 to showcase the functional path.
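For reference, the Sobel operator named in the abstract above can be sketched in software as follows. This is a plain NumPy illustration of the standard 3×3 Sobel kernels, not the FPGA hardware design described in the paper. The function name and the edge-replicating border handling are assumptions.

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def sobel_magnitude(gray):
    """Gradient magnitude of a 2D grayscale image (for example, the Y channel of YCbCr)."""
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    # Replicate border pixels so the output has the same size as the input (assumed policy).
    padded = np.pad(gray, 1, mode="edge")
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    # Accumulate the nine weighted, shifted copies of the image.
    for dy in range(3):
        for dx in range(3):
            patch = padded[dy:dy + h, dx:dx + w]
            gx += SOBEL_X[dy, dx] * patch
            gy += SOBEL_Y[dy, dx] * patch
    return np.hypot(gx, gy)
```

Thresholding the returned magnitude would give a binary edge map. The threshold value is application dependent and is not specified in the abstract.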
Transferring artistic styles onto everyday photographs has become an extremely popular task in both academia and industry. Recently, offline training has replaced online iterative optimization, enabling nearly real-time stylization. When those stylization networks are applied directly to high-resolution images, however, the style of localized regions often appears less similar to the desired artistic style. This is because the transfer process fails to capture small, intricate textures and maintain correct texture scales of the artworks. Here we propose a multimodal convolutional neural network that takes into consideration faithful representations of both color and luminance channels, and performs stylization hierarchically with multiple losses of increasing scales. Compared to state-of-the-art networks, our network can also perform style transfer in nearly real-time by performing much more sophisticated training offline. By properly handling style and texture cues at multiple scales using several modalities, we can transfer not just large-scale, obvious style cues but also subtle, exquisite ones. That is, our scheme can generate results that are visually pleasing and more similar to multiple desired artistic styles with color and texture cues at multiple scales.
In this paper, we propose to impose a multiscale contextual loss for image style transfer based on Convolutional Neural Networks (CNN). In the traditional optimization framework, a new stylized image is synthesized by constraining the high-level CNN features to be similar to those of a content image and the lower-level CNN features to be similar to those of a style image. This, however, appears to lose many details of the content image, producing unpleasant and inconsistent distortions or artifacts. The proposed multiscale contextual loss, named Haar loss, preserves the lost details by matching the features derived from the content image and the synthesized image via the wavelet transform. It enables the synthesized image to better retain the semantic information of the content image. More specifically, the unpleasant distortions can be effectively alleviated while the style is well preserved. In the experiments, we show that incorporating the multiscale contextual loss generates images that are visually more consistent and at the same time well stylized.
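To make the wavelet-matching idea above concrete, the following is a minimal sketch of a multiscale distance computed on Haar sub-bands. It is only an illustration: the abstract does not give the exact form of the Haar loss, so the squared-error matching, the number of levels, and all function names here are assumptions. The inputs are plain 2D arrays (for instance a single feature-map channel) whose sides are divisible by 2 to the power of `levels`.

```python
import numpy as np

def haar_level(x):
    """One level of an orthonormal 2D Haar transform on an even-sized 2D array."""
    a = x[0::2, 0::2]
    b = x[0::2, 1::2]
    c = x[1::2, 0::2]
    d = x[1::2, 1::2]
    approx = (a + b + c + d) / 2.0       # low-pass in both directions
    d_cols = (a - b + c - d) / 2.0       # difference between adjacent columns
    d_rows = (a + b - c - d) / 2.0       # difference between adjacent rows
    d_diag = (a - b - c + d) / 2.0       # diagonal difference
    return approx, (d_cols, d_rows, d_diag)

def multiscale_haar_distance(x, y, levels=2):
    """Sum of mean squared differences between Haar sub-bands of x and y."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    total = 0.0
    for _ in range(levels):
        x, x_details = haar_level(x)
        y, y_details = haar_level(y)
        # Match the detail bands at this scale.
        total += sum(float(np.mean((p - q) ** 2))
                     for p, q in zip(x_details, y_details))
    # Match the coarsest approximation band as well.
    total += float(np.mean((x - y) ** 2))
    return total
```

Because the detail bands isolate fine structure at each scale, a penalty of this kind is sensitive to lost details in a way that a single coarse feature comparison is not.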
This paper investigates several techniques that increase the accuracy of motion boundaries in estimated motion fields of a local dense estimation scheme. In particular, we examine two matching metrics: MSE in the image domain and a recently proposed multiresolution metric that has been shown to produce more accurate motion boundaries. We also examine several different edge-preserving filters. The edge-aware moving average filter, proposed in this paper, takes an input image and the result of an edge detection algorithm, and outputs an image that is smooth except at the detected edges. We find that matching metrics play a more important role than the adoption of edge-preserving filters in estimating accurate and compressible motion fields. Nevertheless, the proposed filter may provide further improvements in the accuracy of the motion boundaries. These findings can be very useful for a number of recently proposed scalable interactive video coding schemes.
Salt-and-pepper noise is very common when images are transmitted through a noisy channel or when the camera sensor module is impaired. For noise removal, methods using various two-stage cascade configurations have been proposed in the literature. These methods can remove low-density impulse noise but are not suited to high-density noise in terms of visible performance. We propose an efficient method for removing both high- and low-density impulse noise. Our approach is based on a novel extension of iterated conditional modes (ICM). It is a cascade configuration of two stages: noise detection and noise removal. The noise detection process is a combination of iterative decision-based approaches, while the noise removal process is based on iterative noisy pixel estimation. Using the improved approach, images with up to 95% corruption have been recovered with good results, while images with 98% corruption have been recovered with quite satisfactory results. To benchmark image quality, we consider various metrics such as PSNR (Peak Signal to Noise Ratio), MSE (Mean Square Error), and SSIM (Structural Similarity Index Measure).
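The noise model and two of the metrics named above can be sketched as follows. This is a generic illustration using the textbook definitions of MSE and PSNR for 8-bit images, not the authors' evaluation code. The function names and the equal split between salt and pepper pixels are assumptions, and SSIM is omitted for brevity.

```python
import numpy as np

def mse(reference, test):
    """Mean squared error between two images of the same shape."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    return float(np.mean((reference - test) ** 2))

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(reference, test)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / err))

def add_salt_and_pepper(image, density, rng):
    """Corrupt a fraction `density` of pixels with 0 or 255, chosen with equal probability."""
    noisy = np.array(image, copy=True)
    corrupted = rng.random(noisy.shape) < density
    salt = rng.random(noisy.shape) < 0.5
    noisy[corrupted & salt] = 255
    noisy[corrupted & ~salt] = 0
    return noisy
```

A denoiser would then be scored by computing `psnr(clean, restored)` for images corrupted at increasing densities, such as the 95% and 98% levels mentioned in the abstract.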
The main emphasis of this paper is to develop an approach that can blindly detect and assess perceptual blur degradation in images. The idea is to statistically model perceptual blur degradation in the frequency domain using the discrete cosine transform (DCT) and the Just Noticeable Blur (JNB) concept. A machine learning system is then trained on the resulting statistical features to detect the perceptual blur effect in the acquired image and ultimately produce a quality score, denoted BBQM for Blind Blur Quality Metric. The efficiency of the proposed BBQM is tested objectively by evaluating its performance against some existing metrics in terms of correlation with subjective scores.
With the recent developments in the field of visual sensor technology, multiple imaging sensors are used in several applications, such as surveillance, medical imaging, and machine vision, in order to improve their capabilities. The goal of any efficient image fusion algorithm is to combine the visual information obtained from a number of disparate imaging sensors into a single fused image without introducing distortion or losing information. The existing fusion algorithms employ either the mean or the choose-max fusion rule to select the best features for fusion. The choose-max rule distorts constant background information, whereas the mean rule blurs the edges. In this paper, two feature-level fusion schemes based on the Non-Subsampled Contourlet Transform (NSCT) are proposed and compared. In the first method, fuzzy logic is applied to determine the weight assigned to each segmented region using the computed salient region feature values. The second method employs the Golden Section Algorithm (GSA) to find the optimal fusion weight of each region based on its Petrovic metric. The regions are merged adaptively using the weights determined. Experiments show that the proposed feature-level fusion methods provide better visual quality, with clear edge information, and better objective quality metrics than individual multi-resolution-based methods such as the Dual Tree Complex Wavelet Transform and NSCT.
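The weight search in the second method can be illustrated with a generic golden section search. The sketch below is the textbook algorithm for maximizing a unimodal function on an interval, not the authors' code. The function name, the interval [0, 1] for the fusion weight, and the toy quality function in the usage lines are assumptions standing in for the Petrovic metric of a region.

```python
import math

def golden_section_maximize(f, lo=0.0, hi=1.0, tol=1e-6):
    """Locate the maximizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0   # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

# Hypothetical stand-in for a region's fusion quality as a function of its weight.
def toy_quality(weight):
    return -(weight - 0.3) ** 2

best_weight = golden_section_maximize(toy_quality)
```

Each iteration reuses one of the two previous function evaluations, so only one new evaluation of the quality metric is needed per step, which matters when that metric is expensive to compute.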