Biblio
In this paper, we present an improved approach to transfer style for videos based on semantic segmentation. We segment foreground objects and background, and then apply different styles respectively. A fully convolutional neural network is used to perform semantic segmentation. We increase the reliability of the segmentation, and use the information of segmentation and the relationship between foreground objects and background to improve segmentation iteratively. We also use segmentation to improve optical flow, and apply different motion estimation methods between foreground objects and background. This improves the motion boundaries of optical flow, and solves the problems of incorrect and discontinuous segmentation caused by occlusion and shape deformation.
Neural architectures are the foundation for improving performance of deep neural networks (DNNs). This paper presents deep compositional grammatical architectures which harness the best of two worlds: grammar models and DNNs. The proposed architectures integrate compositionality and reconfigurability of the former and the capability of learning rich features of the latter in a principled way. We utilize AND-OR Grammar (AOG) as network generator in this paper and call the resulting networks AOGNets. An AOGNet consists of a number of stages each of which is composed of a number of AOG building blocks. An AOG building block splits its input feature map into N groups along feature channels and then treat it as a sentence of N words. It then jointly realizes a phrase structure grammar and a dependency grammar in bottom-up parsing the “sentence” for better feature exploration and reuse. It provides a unified framework for the best practices developed in state-of-the-art DNNs. In experiments, AOGNet is tested in the ImageNet-1K classification benchmark and the MS-COCO object detection and segmentation benchmark. In ImageNet-1K, AOGNet obtains better performance than ResNet and most of its variants, ResNeXt and its attention based variants such as SENet, DenseNet and DualPathNet. AOGNet also obtains the best model interpretability score using network dissection. AOGNet further shows better potential in adversarial defense. In MS-COCO, AOGNet obtains better performance than the ResNet and ResNeXt backbones in Mask R-CNN.
In the past few years, visual information collection and transmission is increased significantly for various applications. Smart vehicles, service robotic platforms and surveillance cameras for the smart city applications are collecting a large amount of visual data. The preservation of the privacy of people presented in this data is an important factor in storage, processing, sharing and transmission of visual data across the Internet of Robotic Things (IoRT). In this paper, a novel anonymisation method for information security and privacy preservation in visual data in sharing layer of the Web of Robotic Things (WoRT) is proposed. The proposed framework uses deep neural network based semantic segmentation to preserve the privacy in video data base of the access level of the applications and users. The data is anonymised to the applications with lower level access but the applications with higher legal access level can analyze and annotated the complete data. The experimental results show that the proposed method while giving the required access to the authorities for legal applications of smart city surveillance, is capable of preserving the privacy of the people presented in the data.
In order to study the application of improved image hashing algorithm in image tampering detection, based on compressed sensing and ring segmentation, a new image hashing technique is studied. The image hash algorithm based on compressed sensing and ring segmentation is proposed. First, the algorithm preprocesses the input image. Then, the ring segment is used to extract the set of pixels in each ring region. These aggregate data are separately performed compressed sensing measurements. Finally, the hash value is constructed by calculating the inner product of the measurement vector and the random vector. The results show that the algorithm has good perceived robustness, uniqueness and security. Finally, the ROC curve is used to analyze the classification performance. The comparison of ROC curves shows that the performance of the proposed algorithm is better than FM-CS, GF-LVQ and RT-DCT.
With the rapid development of information technology, video surveillance system has become a key part in the security and protection system of modern cities. Especially in prisons, surveillance cameras could be found almost everywhere. However, with the continuous expansion of the surveillance network, surveillance cameras not only bring convenience, but also produce a massive amount of monitoring data, which poses huge challenges to storage, analytics and retrieval. The smart monitoring system equipped with intelligent video analytics technology can monitor as well as pre-alarm abnormal events or behaviours, which is a hot research direction in the field of surveillance. This paper combines deep learning methods, using the state-of-the-art framework for instance segmentation, called Mask R-CNN, to train the fine-tuning network on our datasets, which can efficiently detect objects in a video image while simultaneously generating a high-quality segmentation mask for each instance. The experiment show that our network is simple to train and easy to generalize to other datasets, and the mask average precision is nearly up to 98.5% on our own datasets.
Edge detection is one of the most important topics of image processing. In the scenario of cloud computing, performing edge detection may also consider privacy protection. In this paper, we propose an edge detection and image segmentation scheme on an encrypted image with Sobel edge detector. We implement Gaussian filtering and Sobel operator on the image in the encrypted domain with homomorphic property. By implementing an adaptive threshold decision algorithm in the encrypted domain, we obtain a threshold determined by the image distribution. With the technique of garbled circuit, we perform comparison in the encrypted domain and obtain the edge of the image without decrypting the image in advanced. We then propose an image segmentation scheme on the encrypted image based on the detected edges. Our experiments demonstrate the viability and effectiveness of the proposed encrypted image edge detection and segmentation.
This paper address the problem of shadow detection and removal in traffic vision analysis. Basically, the presence of the shadow in the traffic sequences is imminent, and therefore leads to errors at segmentation stage and often misclassified as an object region or as a moving object. This paper presents a shadow removal method, based on both color and texture features, aiming to contribute to retrieve efficiently the moving objects whose detection are usually under the influence of cast-shadows. Additionally, in order to get a shadow-free foreground segmentation image, a morphology reconstruction algorithm is used to recover the foreground disturbed by shadow removal. Once shadows are detected, an automatic shadow removal model is proposed based on the information retrieved from the histogram shape. Experimental results on a real traffic sequence is presented to test the proposed approach and to validate the algorithm's performance.
Arabic handwritten documents present specific challenges due to the cursive nature of the writing and the presence of diacritical marks. Moreover, one of the largest labeled database of Arabic handwritten documents, the OpenHart-NIST database includes specific noise, namely guidelines, that has to be addressed. We propose several approaches to process these documents. First a guideline detection approach has been developed, based on K-means, that detects the documents that include guidelines. We then propose a series of preprocessing at text-line level to reduce the noise effects. For text-lines including guidelines, a guideline removal preprocessing is described and existing keystroke restoration approaches are assessed. In addition, we propose a preprocessing that combines noise removal and deskewing by removing line fragments from neighboring text lines, while searching for the principal orientation of the text-line. We provide recognition results, showing the significant improvement brought by the proposed processings.
This paper analyzes score normalization for keystroke dynamics authentication systems. Previous studies have shown that the performance of behavioral biometric recognition systems (e.g. voice and signature) can be largely improved with score normalization and target-dependent techniques. The main objective of this work is twofold: i) to analyze the effects of different thresholding techniques in 4 different keystroke dynamics recognition systems for real operational scenarios; and ii) to improve the performance of keystroke dynamics on the basis of target-dependent score normalization techniques. The experiments included in this work are worked out over the keystroke pattern of 114 users from two different publicly available databases. The experiments show that there is large room for improvements in keystroke dynamic systems. The results suggest that score normalization techniques can be used to improve the performance of keystroke dynamics systems in more than 20%. These results encourage researchers to explore this research line to further improve the performance of these systems in real operational environments.
A process of de-identification used for privacy protection in multimedia content should be applied not only for primary biometric traits (face, voice) but for soft biometric traits as well. This paper deals with a proposal of the automatic hair color de-identification method working with video records. The method involves image hair area segmentation, basic hair color recognition, and modification of hair color for real-looking de-identified images.
This paper describes Smartpig, an algorithm for the iterative mosaicking of images of a planar surface using a unique parameterization which decomposes inter-image projective warps into camera intrinsics, fronto-parallel projections, and inter-image similarities. The constraints resulting from the inter-image alignments within an image set are stored in an undirected graph structure allowing efficient optimization of image projections on the plane. Camera pose is also directly recoverable from the graph, making Smartpig a feasible solution to the problem of simultaneous location and mapping (SLAM). Smartpig is demonstrated on a set of 144 high resolution aerial images and evaluated with a number of metrics against ground control.
The TRECVID report of 2010 [14] evaluated video shot boundary detectors as achieving "excellent performance on [hard] cuts and gradual transitions." Unfortunately, while re-evaluating the state of the art of the shot boundary detection, we found that they need to be improved because the characteristics of consumer-produced videos have changed significantly since the introduction of mobile gadgets, such as smartphones, tablets and outdoor activity purposed cameras, and video editing software has been evolving rapidly. In this paper, we evaluate the best-known approach on a contemporary, publicly accessible corpus, and present a method that achieves better performance, particularly on soft transitions. Our method combines color histograms with key point feature matching to extract comprehensive frame information. Two similarity metrics, one for individual frames and one for sets of frames, are defined based on graph cuts. These metrics are formed into temporal feature vectors on which a SVM is trained to perform the final segmentation. The evaluation on said "modern" corpus of relatively short videos yields a performance of 92% recall (at 89% precision) overall, compared to 69% (91%) of the best-known method.