Biblio
Deep learning has undergone tremendous advancements in computer vision studies. The training of deep learning neural networks depends on a considerable amount of ground truth datasets. However, labeling ground truth data is a labor-intensive task, particularly for large-volume video analytics applications such as video surveillance and vehicles detection for autonomous driving. This paper presents a rapid and accurate method for associative searching in big image data obtained from security monitoring systems. We developed a semi-automatic moving object annotation method for improving deep learning models. The proposed method comprises three stages, namely automatic foreground object extraction, object annotation in subsequent video frames, and dataset construction using human-in-the-loop quick selection. Furthermore, the proposed method expedites dataset collection and ground truth annotation processes. In contrast to data augmentation and data generative models, the proposed method produces a large amount of real data, which may facilitate training results and avoid adverse effects engendered by artifactual data. We applied the constructed annotation dataset to train a deep learning you-only-look-once (YOLO) model to perform vehicle detection on street intersection surveillance videos. Experimental results demonstrated that the accurate detection performance was improved from a mean average precision (mAP) of 83.99 to 88.03.
Safety is one of basic human needs so we need a security system that able to prevent crime happens. Commonly, we use surveillance video to watch environment and human behaviour in a location. However, the surveillance video can only used to record images or videos with no additional information. Therefore we need more advanced camera to get another additional information such as human position and movement. This research were able to extract those information from surveillance video footage by using human detection and tracking algorithm. The human detection framework is based on Deep Learning Convolutional Neural Networks which is a very popular branch of artificial intelligence. For tracking algorithms, channel and spatial correlation filter is used to track detected human. This system will generate and export tracked movement on footage as an additional information. This tracked movement can be analysed furthermore for another research on surveillance video problems.
There is an inevitable trade-off between spatial and spectral resolutions in optical remote sensing images. A number of data fusion techniques of multimodal images with different spatial and spectral characteristics have been developed to generate optical images with both spatial and spectral high resolution. Although some of the techniques take the spectral and spatial blurring process into account, there is no method that attempts to retrieve an optical image with both spatial and spectral high resolution, a spectral blurring filter and a spectral response simultaneously. In this paper, we propose a new framework of spatial resolution enhancement by a fusion of multiple optical images with different characteristics based on tensor decomposition. An optical image with both spatial and spectral high resolution, together with a spatial blurring filter and a spectral response, is generated via canonical polyadic (CP) decomposition of a set of tensors. Experimental results featured that relatively reasonable results were obtained by regularization based on nonnegativity and coupling.
Advancements in semiconductor domain gave way to realize numerous applications in Video Surveillance using Computer vision and Deep learning, Video Surveillances in Industrial automation, Security, ADAS, Live traffic analysis etc. through image understanding improves efficiency. Image understanding requires input data with high precision which is dependent on Image resolution and location of camera. The data of interest can be thermal image or live feed coming for various sensors. Composite(CVBS) is a popular video interface capable of streaming upto HD(1920x1080) quality. Unlike high speed serial interfaces like HDMI/MIPI CSI, Analog composite video interface is a single wire standard supporting longer distances. Image understanding requires edge detection and classification for further processing. Sobel filter is one the most used edge detection filter which can be embedded into live stream. This paper proposes Zynq FPGA based system design for video surveillance with Sobel edge detection, where the input Composite video decoded (Analog CVBS input to YCbCr digital output), processed in HW and streamed to HDMI display simultaneously storing in SD memory for later processing. The HW design is scalable for resolutions from VGA to Full HD for 60fps and 4K for 24fps. The system is built on Xilinx ZC702 platform and TVP5146 to showcase the functional path.
Edge detection is one of the most important topics of image processing. In the scenario of cloud computing, performing edge detection may also consider privacy protection. In this paper, we propose an edge detection and image segmentation scheme on an encrypted image with Sobel edge detector. We implement Gaussian filtering and Sobel operator on the image in the encrypted domain with homomorphic property. By implementing an adaptive threshold decision algorithm in the encrypted domain, we obtain a threshold determined by the image distribution. With the technique of garbled circuit, we perform comparison in the encrypted domain and obtain the edge of the image without decrypting the image in advanced. We then propose an image segmentation scheme on the encrypted image based on the detected edges. Our experiments demonstrate the viability and effectiveness of the proposed encrypted image edge detection and segmentation.
This paper investigates practical strategies for distributing payload across images with content-adaptive steganography and for pooling outputs of a single-image detector for steganalysis. Adopting a statistical model for the detector's output, the steganographer minimizes the power of the most powerful detector of an omniscient Warden, while the Warden, informed by the payload spreading strategy, detects with the likelihood ratio test in the form of a matched filter. Experimental results with state-of-the-art content-adaptive additive embedding schemes and rich models are included to show the relevance of the results.
We present a gradient-based attack against SVM-based forensic techniques relying on high-dimensional SPAM features. As opposed to prior work, the attack works directly in the pixel domain even if the relationship between pixel values and SPAM features can not be inverted. The proposed method relies on the estimation of the gradient of the SVM output with respect to pixel values, however it departs from gradient descent methodology due to the necessity of preserving the integer nature of pixels and to reduce the effect of the attack on image quality. A fast algorithm to estimate the gradient is also introduced to reduce the complexity of the attack. We tested the proposed attack against SVM detection of histogram stretching, adaptive histogram equalization and median filtering. In all cases the attack succeeded in inducing a decision error with a very limited distortion, the PSNR between the original and the attacked images ranging from 50 to 70 dBs. The attack is also effective in the case of attacks with Limited Knowledge (LK) when the SVM used by the attacker is trained on a different dataset with respect to that used by the analyst.
Most of the Depth Image Based Rendering (DIBR) techniques produce synthesized images which contain nonuniform geometric distortions affecting edges coherency. This type of distortions are challenging for common image quality metrics. Morphological filters maintain important geometric information such as edges across different resolution levels. In this paper, morphological wavelet peak signal-to-noise ratio measure, MW-PSNR, based on morphological wavelet decomposition is proposed to tackle the evaluation of DIBR synthesized images. It is shown that MW-PSNR achieves much higher correlation with human judgment compared to the state-of-the-art image quality measures in this context.
This paper investigates several techniques that increase the accuracy of motion boundaries in estimated motion fields of a local dense estimation scheme. In particular, we examine two matching metrics, one is MSE in the image domain and the other one is a recently proposed multiresolution metric that has been shown to produce more accurate motion boundaries. We also examine several different edge-preserving filters. The edge-aware moving average filter, proposed in this paper, takes an input image and the result of an edge detection algorithm, and outputs an image that is smooth except at the detected edges. Compared to the adoption of edge-preserving filters, we find that matching metrics play a more important role in estimating accurate and compressible motion fields. Nevertheless, the proposed filter may provide further improvements in the accuracy of the motion boundaries. These findings can be very useful for a number of recently proposed scalable interactive video coding schemes.
Most Depth Image Based Rendering (DIBR) techniques produce synthesized images which contain non-uniform geometric distortions affecting edges coherency. This type of distortions are challenging for common image quality metrics. Morphological filters maintain important geometric information such as edges across different resolution levels. There is inherent congruence between the morphological pyramid decomposition scheme and human visual perception. In this paper, multi-scale measure, morphological pyramid peak signal-to-noise ratio MP-PSNR, based on morphological pyramid decomposition is proposed for the evaluation of DIBR synthesized images. It is shown that MPPSNR achieves much higher correlation with human judgment compared to the state-of-the-art image quality measures in this context.
A variety of methods for images noise reduction has been developed so far. Most of them successfully remove noise but their edge preserving capabilities are weak. Therefore bilateral image filter is helpful to deal with this problem. Nevertheless, their performances depend on spatial and photometric parameters which are chosen by user. Conventionally, the geometric weight is calculated by means of distance of neighboring pixels and the photometric weight is calculated by means of color components of neighboring pixels. The range of weights is between zero and one. In this paper, geometric weights are estimated by fuzzy metrics and photometric weights are estimated by using fuzzy rule based system which does not require any predefined parameter. Experimental results of conventional, fuzzy bilateral filter and proposed approach have been included.
Images acquired and processed in communication and multimedia systems are often noisy. Thus, pre-filtering is a typical stage to remove noise. At this stage, a special attention has to be paid to image visual quality. This paper analyzes denoising efficiency from the viewpoint of visual quality improvement using metrics that take into account human vision system (HVS). Specific features of the paper consist in, first, considering filters based on discrete cosine transform (DCT) and, second, analyzing the filter performance locally. Such an analysis is possible due to the structure and peculiarities of the metric PSNR-HVS-M. It is shown that a more advanced DCT-based filter BM3D outperforms a simpler (and faster) conventional DCT-based filter in locally active regions, i.e., neighborhoods of edges and small-sized objects. This conclusions allows accelerating BM3D filter and can be used in further improvement of the analyzed denoising techniques.
We present a new method for mitigating wall return and a new greedy algorithm for detecting stationary targets after wall clutter has been cancelled. Given limited measurements of a stepped-frequency radar signal consisting of both wall and target return, our objective is to detect and localize the potential targets. Modulated Discrete Prolate Spheroidal Sequences (DPSS's) form an efficient basis for sampled bandpass signals. We mitigate the wall clutter efficiently within the compressive measurements through the use of a bandpass modulated DPSS basis. Then, in each step of an iterative algorithm for detecting the target positions, we use a modulated DPSS basis to cancel nearly all of the target return corresponding to previously selected targets. With this basis, we improve upon the target detection sensitivity of a Fourier-based technique.
This paper proposes a fast and robust procedure for sensing and reconstruction of sparse or compressible magnetic resonance images based on the compressive sampling theory. The algorithm starts with incoherent undersampling of the k-space data of the image using a random matrix. The undersampled data is sparsified using Haar transformation. The Haar transform coefficients of the k-space data are then reconstructed using the orthogonal matching Pursuit algorithm. The reconstructed coefficients are inverse transformed into k-space data and then into the image in spatial domain. Finally, a median filter is used to suppress the recovery noise artifacts. Experimental results show that the proposed procedure greatly reduces the image data acquisition time without significantly reducing the image quality. The results also show that the error in the reconstructed image is reduced by median filtering.
A robust appearance model is usually required in visual tracking, which can handle pose variation, illumination variation, occlusion and many other interferences occurring in video. So far, a number of tracking algorithms make use of image samples in previous frames to update appearance models. There are many limitations of that approach: 1) At the beginning of tracking, there exists no sufficient amount of data for online update because these adaptive models are data-dependent and 2) in many challenging situations, robustly updating the appearance models is difficult, which often results in drift problems. In this paper, we proposed a tracking algorithm based on compressive sensing theory and particle filter framework. Features are extracted by random projection with data-independent basis. Particle filter is employed to make a more accurate estimation of the target location and make much of the updated classifier. The robustness and the effectiveness of our tracker have been demonstrated in several experiments.