Bibliography
Deep learning has undergone tremendous advancements in computer vision studies. The training of deep learning neural networks depends on a considerable amount of ground truth data. However, labeling ground truth data is a labor-intensive task, particularly for large-volume video analytics applications such as video surveillance and vehicle detection for autonomous driving. This paper presents a rapid and accurate method for associative searching in big image data obtained from security monitoring systems. We developed a semi-automatic moving object annotation method for improving deep learning models. The proposed method comprises three stages, namely automatic foreground object extraction, object annotation in subsequent video frames, and dataset construction using human-in-the-loop quick selection. Furthermore, the proposed method expedites the dataset collection and ground truth annotation processes. In contrast to data augmentation and data generative models, the proposed method produces a large amount of real data, which may improve training results and avoid the adverse effects engendered by artifactual data. We applied the constructed annotation dataset to train a deep learning you-only-look-once (YOLO) model to perform vehicle detection on street intersection surveillance videos. Experimental results demonstrated that detection performance improved from a mean average precision (mAP) of 83.99 to 88.03.
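As an illustration of the detection stage described above, the following is a minimal sketch of running a trained YOLO model on surveillance frames with OpenCV's DNN module. The config, weight, and video file names are placeholders rather than the paper's artifacts, and the confidence threshold and input size are common defaults, not the paper's settings.

```python
import cv2

# Placeholder file names; the paper's trained vehicle model is not public here.
net = cv2.dnn.readNetFromDarknet("yolo_vehicle.cfg", "yolo_vehicle.weights")
layer_names = net.getUnconnectedOutLayersNames()

cap = cv2.VideoCapture("intersection.mp4")  # placeholder surveillance video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # YOLO expects a square, normalized input blob; 416x416 is a common choice.
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(layer_names)
    for out in outputs:
        for det in out:
            scores = det[5:]                 # per-class confidences
            class_id = scores.argmax()
            if scores[class_id] > 0.5:       # assumed confidence threshold
                cx, cy, w, h = det[0:4]      # box centre/size, relative units
                print(class_id, scores[class_id], cx, cy, w, h)
cap.release()
```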
Safety is one of the basic human needs, so we need a security system that is able to prevent crime from happening. Commonly, surveillance video is used to watch the environment and human behaviour at a location. However, surveillance video can only be used to record images or videos, with no additional information. Therefore, a more advanced camera system is needed to obtain additional information such as human position and movement. This research extracts that information from surveillance video footage by using human detection and tracking algorithms. The human detection framework is based on deep learning convolutional neural networks, a very popular branch of artificial intelligence. For tracking, a channel and spatial correlation filter is used to track each detected human. The system generates and exports the tracked movement in the footage as additional information, which can be analysed further in future research on surveillance video problems.
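A minimal sketch of the detect-then-track pipeline follows, assuming OpenCV's CSRT tracker (a channel and spatial reliability correlation filter, available in opencv-contrib-python). OpenCV's HOG person detector stands in for the paper's CNN detector to keep the sketch self-contained; the footage path is a placeholder.

```python
import cv2

# HOG person detector as a stand-in for the paper's CNN-based detector.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture("surveillance.mp4")  # placeholder footage path
ok, frame = cap.read()
boxes, _ = hog.detectMultiScale(frame)      # (x, y, w, h) boxes per person

trackers = []
for box in boxes:
    t = cv2.TrackerCSRT_create()  # channel & spatial reliability correlation filter
    t.init(frame, tuple(int(v) for v in box))
    trackers.append(t)

trajectories = [[] for _ in trackers]  # the exported movement information
while True:
    ok, frame = cap.read()
    if not ok:
        break
    for i, t in enumerate(trackers):
        ok, (x, y, w, h) = t.update(frame)
        if ok:
            trajectories[i].append((x + w / 2.0, y + h / 2.0))  # centre point
cap.release()
```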
Video retrieval technology faces a series of challenges with the tremendous growth in the number of videos. In order to improve retrieval efficiency and accuracy, a novel deep hash method for hierarchical video retrieval is proposed in this paper. The approach first uses a cluster-based method to extract key frames, which reduces the workload of subsequent steps. On this basis, high-level semantic features are extracted with VGG16, a widely used deep convolutional neural network (deep CNN) model. A hierarchical retrieval strategy is then used to improve retrieval performance, which can be roughly categorized into a coarse search and a fine search. In the coarse search, we modify simHash to learn hash codes for faster speed, and in the fine search, we use the Euclidean distance to achieve higher accuracy. Finally, we compare our approach with two other methods through practical experiments on two videos, and the results demonstrate that our approach achieves better retrieval performance.
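The coarse-to-fine idea can be sketched as follows, assuming a generic random-hyperplane simHash rather than the paper's modified variant; Keras's VGG16 with average pooling yields 512-dimensional key-frame features.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

model = VGG16(weights="imagenet", include_top=False, pooling="avg")  # 512-d output

def features(frames):
    """frames: batch of key frames, shape (n, 224, 224, 3)."""
    return model.predict(preprocess_input(frames.astype("float32")))

rng = np.random.default_rng(0)
planes = rng.standard_normal((64, 512))  # 64 random hyperplanes -> 64-bit codes

def simhash(feats):
    return feats @ planes.T > 0  # boolean hash code per key frame

def search(query_feat, db_feats, db_codes, k=20):
    qcode = simhash(query_feat[None])[0]
    ham = (db_codes != qcode).sum(axis=1)     # coarse search: Hamming distance
    cand = np.argsort(ham)[:k]                # keep the top-k candidates
    dist = np.linalg.norm(db_feats[cand] - query_feat, axis=1)  # fine: Euclidean
    return cand[np.argsort(dist)]
```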
Video steganography is an extension of image steganography in which a file of any type is hidden inside a digital video. Video content is dynamic in nature, which makes the detection of hidden data more difficult than in other steganographic techniques. The main motive for using video steganography is that videos can store a large amount of data. This paper focuses on security, using a combination of hybrid neural networks and a hash function to determine the best bits in the cover video in which to embed the secret data. For the embedding process, the cover video and the data to be hidden are uploaded; the hash algorithm and neural networks are then applied to form the stego video. For the extraction process, the reverse process is applied and the secret data is obtained. All experiments were done using MATLAB R2016a.
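The hash-driven selection of embedding positions can be illustrated as below. This is a minimal LSB sketch in which pixel positions are derived from a SHA-256 key hash; the paper's neural-network bit selection is deliberately omitted, so this shows only the hash-driven position idea.

```python
import hashlib
import numpy as np

def embed(frame, payload_bits, key):
    """Embed bits in LSBs at pseudo-random positions derived from a key hash.
    The neural-network bit selection of the paper is omitted here."""
    flat = frame.reshape(-1).copy()
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    idx = np.random.default_rng(seed).permutation(flat.size)[:len(payload_bits)]
    flat[idx] = (flat[idx] & 0xFE) | np.asarray(payload_bits, dtype=flat.dtype)
    return flat.reshape(frame.shape)

def extract(stego_frame, n_bits, key):
    """Reverse process: regenerate the positions from the key, read the LSBs."""
    flat = stego_frame.reshape(-1)
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    idx = np.random.default_rng(seed).permutation(flat.size)[:n_bits]
    return flat[idx] & 1
```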
Steganography is defined as the art of hiding secret data in a non-secret digital carrier called a cover medium. Transmitting sensitive data without protection against intruders who may intercept it is dangerous; the transmission of sensitive information and secrets must therefore not rely solely on the protection provided by existing communication channels, and further steps must be taken towards data protection. This article proposes an improved approach to video steganography. The improvement is made by searching for exact matches between the secret text and the RGB channels of the video frames, together with random key-dependent data, achieving the steganography performance criteria of invisibility, payload/capacity, and robustness.
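The exact-matching idea can be sketched as follows: nothing in the cover video is modified; instead, the coordinates of pixels whose channel values already equal the secret bytes are recorded as key-dependent data. The random choices below merely stand in for the paper's key-dependent selection.

```python
import numpy as np

def match_positions(frames, secret, rng):
    """For each secret byte, record a (frame, row, col, channel) whose value
    already equals the byte. The cover video is left untouched, so the hiding
    is perfectly invisible; the coordinate list is the key-dependent data."""
    positions = []
    for byte in secret:
        f = rng.integers(len(frames))           # key-dependent frame choice
        hits = np.argwhere(frames[f] == byte)   # all (row, col, channel) matches
        if hits.size == 0:
            raise ValueError("no exact match for byte %d" % byte)
        positions.append((f, *hits[rng.integers(len(hits))]))
    return positions

def recover(frames, positions):
    """Read the secret bytes back from the recorded coordinates."""
    return bytes(int(frames[f][r, c, ch]) for f, r, c, ch in positions)
```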
The natural redundancy in video data, due to the spatio-temporal correlation of neighbouring pixels, requires a highly complex encryption process to cipher the data successfully. Conventional encryption methods are based on lengthy keys and a high number of rounds, which is inefficient for low-powered, small, battery-operated devices. Motivated by the success of lightweight encryption methods specially designed for the IoT environment, an efficient method for video encryption is proposed herein. The proposed technique is based on a recently proposed encryption algorithm named Secure IoT (SIT), which utilizes the P and Q functions of the KHAZAD cipher to achieve strong encryption at low computational cost. Extensive simulations are performed to evaluate the efficacy of the proposed method, and the results are compared with the Secure Force (SF-64) cipher. Under all conditions, the proposed method achieved significantly improved results.
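SIT's round structure is not reproduced here; as a rough illustration of frame-wise lightweight block encryption, the sketch below substitutes XTEA, a well-known 64-bit block cipher, for SIT (padding and a chaining mode are omitted for brevity).

```python
import numpy as np

def xtea_encrypt_block(v0, v1, key, rounds=32):
    """XTEA: a 64-bit block (two 32-bit words), 128-bit key (four 32-bit words).
    Used here only as a stand-in for the SIT cipher described in the paper."""
    delta, s, mask = 0x9E3779B9, 0, 0xFFFFFFFF
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + key[s & 3]))) & mask
        s = (s + delta) & mask
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + key[(s >> 11) & 3]))) & mask
    return v0, v1

def encrypt_frame(frame, key):
    """Encrypt a video frame's bytes 64 bits at a time (padding omitted, so a
    trailing odd word is left unencrypted in this sketch)."""
    words = np.frombuffer(frame.tobytes(), dtype=">u4").copy()
    for i in range(0, len(words) - 1, 2):
        words[i], words[i + 1] = xtea_encrypt_block(int(words[i]),
                                                    int(words[i + 1]), key)
    return words
```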
With the rapid development of information technology, video surveillance systems have become a key part of the security and protection systems of modern cities. In prisons especially, surveillance cameras can be found almost everywhere. However, with the continuous expansion of surveillance networks, surveillance cameras not only bring convenience but also produce a massive amount of monitoring data, which poses huge challenges to storage, analytics, and retrieval. Smart monitoring systems equipped with intelligent video analytics can monitor as well as pre-alarm abnormal events or behaviours, which is a hot research direction in the field of surveillance. This paper applies deep learning methods, using the state-of-the-art instance segmentation framework Mask R-CNN, to fine-tune a network on our datasets; the network can efficiently detect objects in a video image while simultaneously generating a high-quality segmentation mask for each instance. The experiments show that our network is simple to train and easy to generalize to other datasets, and the mask average precision reaches nearly 98.5% on our own datasets.
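Fine-tuning Mask R-CNN on a custom dataset typically follows the standard torchvision recipe sketched below; the class count is a placeholder, and the paper's exact training configuration is not reproduced.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

# Start from a COCO-pretrained Mask R-CNN and replace its heads for our classes.
num_classes = 2  # background + one surveillance class; placeholder count
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box classification head.
in_feat = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, num_classes)

# Replace the mask prediction head.
in_feat_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feat_mask, 256, num_classes)

# Standard fine-tuning setup (dataset and training loop omitted).
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
```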
Training a feed-forward network for fast neural style transfer of images has proven successful, but the naive extension of processing videos frame by frame is prone to producing flickering results. We propose the first end-to-end network for online video style transfer, which generates temporally coherent stylized video sequences in near real time. The two key ideas are an efficient network that incorporates short-term coherence, and the propagation of short-term coherence to long-term coherence, which ensures consistency over a longer period of time. Our network can incorporate different image stylization networks and clearly outperforms the per-frame baseline both qualitatively and quantitatively. Moreover, it can achieve visually comparable coherence to optimization-based video style transfer, but is three orders of magnitude faster.
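Short-term coherence between consecutive stylized frames is commonly expressed as a flow-warped consistency loss; the formulation below is a generic one, not necessarily this paper's exact objective:

```latex
\mathcal{L}_{\mathrm{temp}} \;=\; \sum_{x} c_t(x)\,
  \bigl\| O_t(x) - \mathcal{W}_{t-1 \to t}\!\bigl(O_{t-1}\bigr)(x) \bigr\|_2^2
```

where $O_t$ is the stylized output at time $t$, $\mathcal{W}_{t-1 \to t}$ warps the previous output using the optical flow between the input frames, and $c_t(x) \in \{0, 1\}$ masks occluded or unreliable pixels.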
Recent progress in style transfer on images has focused on improving the quality of stylized images and the speed of methods. However, real-time methods are highly unstable, resulting in visible flickering when applied to videos. In this work we characterize the instability of these methods by examining the solution set of the style transfer objective. We show that the trace of the Gram matrix representing style is inversely related to the stability of the method. We then present a recurrent convolutional network for real-time video style transfer which incorporates a temporal consistency loss and overcomes the instability of prior methods. Our networks can be applied at any resolution, do not require optical flow at test time, and produce high-quality, temporally consistent stylized videos in real time.
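For reference, the Gram matrix of style transfer is built from the vectorized feature maps $F^{\ell}$ at a network layer $\ell$ (the standard definition, stated here for context):

```latex
G^{\ell}_{ij} \;=\; \sum_{k} F^{\ell}_{ik}\, F^{\ell}_{jk},
\qquad
\operatorname{tr}\bigl(G^{\ell}\bigr) \;=\; \sum_{i} G^{\ell}_{ii}
```

The abstract's observation is that stability falls as $\operatorname{tr}(G^{\ell})$ grows: style images with larger Gram traces admit per-frame solutions that vary more between similar inputs, producing the visible flicker.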
Recent research endeavors have shown the potential of using feed-forward convolutional neural networks to accomplish fast style transfer for images. In this work, we take one step further and explore the possibility of exploiting a feed-forward network to perform style transfer for videos while simultaneously maintaining temporal consistency among the stylized video frames. Our feed-forward network is trained by enforcing the outputs of consecutive frames to be both well stylized and temporally consistent. More specifically, a hybrid loss is proposed to capitalize on the content information of the input frames, the style information of a given style image, and the temporal information of consecutive frames. To calculate the temporal loss during the training stage, a novel two-frame synergic training mechanism is proposed. Compared with directly applying an existing image style transfer method to videos, our proposed method employs the trained network to yield temporally consistent stylized videos which are much more visually pleasant. In contrast to the prior video style transfer method, which relies on time-consuming optimization on the fly, our method runs in real time while generating competitive visual results.
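A schematic form of such a hybrid loss (the exact terms and weights are defined in the paper) is

```latex
\mathcal{L}_{\mathrm{hybrid}} \;=\; \alpha\,\mathcal{L}_{\mathrm{content}}
\;+\; \beta\,\mathcal{L}_{\mathrm{style}}
\;+\; \gamma\,\mathcal{L}_{\mathrm{temporal}}
```

evaluated on a pair of consecutive frames per training step (the two-frame synergic mechanism), with $\alpha$, $\beta$, $\gamma$ as weighting hyperparameters.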
A high-definition (HD) wide-dynamic-range video surveillance system is designed and implemented based on a Field-Programmable Gate Array (FPGA). The system is composed of three subsystems, namely the video capture, wide-dynamic processing, and video display subsystems. The images in the video are captured directly by a camera configured to use long exposure in odd frames and short exposure in even frames. The video data stream is buffered in DDR2 SDRAM to obtain two adjacent frames. Image fusion is then completed by fusing the long-exposure image with the short-exposure image pixel by pixel. The video display subsystem can display the image through an HDMI interface. The system is designed on the platform of a Lattice ECP3-70EA FPGA, and the camera uses the Panasonic MN34229 sensor. The experimental results show that this system can expand the dynamic range of HD video at 30 frames per second and a resolution of 1920x1080 pixels through real-time wide-dynamic-range (WDR) processing, and has high practical value.
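The pixel-by-pixel fusion performed by the processing subsystem can be modelled in software as below. The blending weights and the assumed 4:1 exposure ratio are illustrative assumptions; the actual FPGA datapath coefficients are not given here.

```python
import numpy as np

def fuse_wdr(long_exp, short_exp, gain=4.0, threshold=200):
    """Pixel-by-pixel fusion of a long-/short-exposure frame pair, as a simple
    software model of the FPGA datapath. gain compensates the exposure ratio
    between the two frames (assumed 4:1 here)."""
    long_f = long_exp.astype(np.float32)
    short_f = short_exp.astype(np.float32) * gain  # brightness-match short frame
    # Weight favours the short exposure wherever the long exposure saturates.
    w = np.clip((long_f - threshold) / (255.0 - threshold), 0.0, 1.0)
    return np.clip((1.0 - w) * long_f + w * short_f, 0, 255).astype(np.uint8)
```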
The world is witnessing a remarkable increase in the usage of video surveillance systems. Besides fulfilling an imperative security and safety purpose, they also contribute to operations monitoring, hazard detection, and facility management in industry and smart factory settings. Most existing surveillance techniques use hand-crafted features analyzed with standard machine learning pipelines for action recognition and event detection. A key shortcoming of such techniques is the inability to learn from unlabeled video streams; the entire video stream is unlabeled when the requirement is to detect irregular, unforeseen and abnormal behaviors, i.e., anomalies. Recent developments in intelligent high-level video analysis have been successful in identifying individual elements in a video frame. However, the detection of anomalies in an entire video feed requires incremental and unsupervised machine learning. This paper presents a novel approach that incorporates high-level video analysis outcomes with incremental knowledge acquisition and self-learning for autonomous video surveillance. The proposed approach is capable of detecting changes that occur over time and separating irregularities from re-occurrences, without the prerequisite of a labeled dataset. We demonstrate the proposed approach using a benchmark video dataset and the results confirm its validity and usability for autonomous video surveillance.
We present an object tracking framework which fuses multiple unstable video-based methods and supports automatic tracker initialization and termination. To evaluate our system, we collected a large dataset of hand-annotated 5-minute traffic surveillance videos, which we are releasing to the community. To the best of our knowledge, this is the first publicly available dataset of such long videos, providing a diverse range of real-world object variation, scale change, interaction, different resolutions, and illumination conditions. In our comprehensive evaluation using this dataset, we show that our automatic object tracking system often outperforms state-of-the-art trackers, even when these are provided with proper manual initialization. We also demonstrate tracking throughput improvements of 5× or more over competing methods.
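The fusion of several unstable trackers can be illustrated with a simple median-of-boxes scheme; the fusion and termination rules below are assumptions made for illustration, not the paper's actual rules.

```python
import numpy as np

def fuse_boxes(boxes):
    """Median fusion of (x, y, w, h) boxes reported by several unstable
    trackers for the same object; robust to a minority of bad members."""
    return np.median(np.asarray(boxes, dtype=float), axis=0)

def still_tracking(boxes, fused, max_dev=30.0):
    """Assumed termination criterion: end the track when member trackers
    drift apart, i.e. when most deviate from the fused box centre."""
    dev = np.linalg.norm(np.asarray(boxes, dtype=float)[:, :2] - fused[:2], axis=1)
    return (dev < max_dev).mean() > 0.5
```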
This work introduces concepts and algorithms, along with a case study validating them, to enhance event detection, pattern recognition, and anomaly identification in real-life video surveillance. The motivation for the work lies in the observation that human behavioral patterns in general continuously evolve and adapt with time, rather than being static. First, limitations in existing work with respect to this phenomenon are identified. Accordingly, the notion and algorithms of Dynamic Clustering are introduced in order to overcome these drawbacks. Correspondingly, we propose the concept of maintaining two separate sets of data in parallel, namely the Normal Plane and the Anomaly Plane, to successfully achieve the task of learning continuously. The practicability of the proposed algorithms in a real-life scenario is demonstrated through a case study. From the analysis presented in this work, it is evident that a more comprehensive analysis, closely following human perception, can be accomplished by incorporating the proposed notions and algorithms in video surveillance event analysis.
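A minimal sketch of the two-plane idea follows: events near an existing normal cluster update the Normal Plane, distant events enter the Anomaly Plane, and re-occurring anomalies are promoted to normal over time. The thresholds and update rules here are assumptions, not the paper's algorithms.

```python
import numpy as np

class DynamicClusters:
    """Toy model of dynamic clustering with a Normal Plane and an Anomaly Plane."""

    def __init__(self, radius=2.0, promote_after=5):
        self.normal = []    # (centroid, count) pairs -- the Normal Plane
        self.anomaly = []   # (centroid, count) pairs -- the Anomaly Plane
        self.radius = radius
        self.promote_after = promote_after

    def observe(self, x):
        x = np.asarray(x, dtype=float)
        for i, (c, n) in enumerate(self.normal):
            if np.linalg.norm(x - c) < self.radius:
                # Normal behaviour: let the cluster drift as patterns evolve.
                self.normal[i] = ((c * n + x) / (n + 1), n + 1)
                return "normal"
        for i, (c, n) in enumerate(self.anomaly):
            if np.linalg.norm(x - c) < self.radius:
                if n + 1 >= self.promote_after:
                    # Re-occurrence: promote the cluster to the Normal Plane.
                    self.normal.append((c, n + 1))
                    del self.anomaly[i]
                else:
                    self.anomaly[i] = ((c * n + x) / (n + 1), n + 1)
                return "anomaly"
        self.anomaly.append((x, 1))  # previously unseen behaviour
        return "anomaly"
```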