Bibliography
Deep learning is the branch of artificial intelligence concerned with imitating the learning approach that human beings use to acquire certain kinds of knowledge. Video analysis, one application of deep learning, has been among the most fundamental problems of computer vision and multimedia content analysis for at least 20 years. The task is very challenging because video carries a large amount of highly variable information. Human supervision is still required in all surveillance systems, but new advances in computer vision, an important trend in video surveillance, are leading to dramatic efficiency gains. We propose CCTV-based theft detection along with tracking of thieves. We use image processing to detect theft and the motion of thieves in CCTV footage, without the use of sensors. The system concentrates on object detection. Security personnel can be notified about a suspicious individual committing burglary through real-time analysis of human movement in CCTV footage, giving them a chance to avert the crime.
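The paper does not publish code, but the sensor-free motion analysis it describes can be sketched with standard OpenCV background subtraction; the input file name, the contour-area cutoff, and the alerting stub below are illustrative assumptions, not values from the paper.

```python
import cv2

cap = cv2.VideoCapture("cctv_footage.mp4")        # hypothetical input file
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                # foreground (moving) pixels
    mask = cv2.medianBlur(mask, 5)                # suppress speckle noise
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 1500:             # ignore small motions
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
            # a real system would run person detection / alerting here
    cv2.imshow("motion", frame)
    if cv2.waitKey(1) == 27:                      # Esc to quit
        break
cap.release()
```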
Deep learning has undergone tremendous advancements in computer vision research. Training deep neural networks depends on a considerable amount of ground-truth data. However, labeling ground-truth data is a labor-intensive task, particularly for large-volume video analytics applications such as video surveillance and vehicle detection for autonomous driving. This paper presents a rapid and accurate method for associative search in big image data obtained from security monitoring systems. We developed a semi-automatic moving-object annotation method for improving deep learning models. The proposed method comprises three stages: automatic foreground object extraction, object annotation in subsequent video frames, and dataset construction using human-in-the-loop quick selection. The proposed method expedites dataset collection and ground-truth annotation. In contrast to data augmentation and generative models, it produces a large amount of real data, which may improve training results and avoid the adverse effects of artifactual data. We applied the constructed annotation dataset to train a you-only-look-once (YOLO) deep learning model for vehicle detection on street intersection surveillance videos. Experimental results demonstrated that detection performance improved from a mean average precision (mAP) of 83.99 to 88.03.
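The first stage of such a pipeline (automatic foreground object extraction) might be sketched roughly as follows; the background-subtraction approach, the `min_area` cutoff, and the YOLO-style normalized output format are assumptions for illustration, not the authors' exact method.

```python
import cv2

def propose_yolo_labels(video_path, min_area=800):
    """Emit candidate (frame_idx, cx, cy, w, h) boxes, normalized to [0, 1],
    for later human-in-the-loop accept/reject review."""
    cap = cv2.VideoCapture(video_path)
    bg = cv2.createBackgroundSubtractorMOG2()
    proposals, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        fh, fw = frame.shape[:2]
        mask = bg.apply(frame)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            if cv2.contourArea(c) < min_area:
                continue                          # skip noise blobs
            x, y, w, h = cv2.boundingRect(c)
            proposals.append((idx, (x + w / 2) / fw, (y + h / 2) / fh,
                              w / fw, h / fh))
        idx += 1
    cap.release()
    return proposals
```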
Surveillance cameras, a form of cyber-physical system, are deployed extensively to provide visual monitoring of activities of interest or anomalies. However, these cameras are at risk of physical attacks against their attributes or configuration, such as tampering with recording coverage, camera position, or recording settings such as focus and zoom. Such adversarial alteration of physical configuration could also be invoked through cyber attacks that exploit the camera's software vulnerabilities to administratively change its physical configuration settings. When such cyber-physical attacks occur, they compromise the integrity of the targeted cameras, which in turn renders the cameras ineffective at fulfilling their intended security functions. There is a significant body of research on detecting cyber attacks against cyber-physical devices, but mechanisms for detecting integrity attacks on physical configuration remain understudied. This research proposes the novel use of deep learning algorithms to detect such physical attacks originating from cyber or physical space. In particular, we propose deep learning-based video frame interpolation for this detection task, which performs better than other anomaly detectors in spatiotemporal environments.
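As an illustration of the interpolation-based idea, a minimal scoring loop might look like the following, where `interp_model` stands in for any deep frame-interpolation network and the k-sigma flagging rule is an assumption, not the paper's detector.

```python
import numpy as np

def tamper_scores(frames, interp_model):
    """frames: list of HxWx3 float arrays in [0, 1].
    Score each interior frame by how badly interpolation predicts it."""
    scores = []
    for t in range(1, len(frames) - 1):
        predicted = interp_model(frames[t - 1], frames[t + 1])
        scores.append(float(np.mean((predicted - frames[t]) ** 2)))
    return scores

def flag_anomalies(scores, k=3.0):
    """Flag frames whose prediction error spikes, e.g. after tampering."""
    mu, sigma = np.mean(scores), np.std(scores)
    return [t + 1 for t, s in enumerate(scores) if s > mu + k * sigma]
```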
Safety is one of the basic human needs, so we need security systems able to prevent crime from happening. Commonly, surveillance video is used to watch the environment and human behaviour at a location. However, surveillance video alone only records images or footage, with no additional information. More advanced cameras are therefore needed to extract further information such as human position and movement. This research extracts that information from surveillance footage using human detection and tracking algorithms. The human detection framework is based on deep convolutional neural networks, a very popular branch of artificial intelligence. For tracking, a channel and spatial correlation filter is used to follow each detected human. The system generates and exports the tracked movement in the footage as additional information, which can be analysed further in future research on surveillance video problems.
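A rough sketch of this detect-then-track pipeline, substituting OpenCV's stock HOG person detector for the paper's CNN detector and using the CSRT (channel and spatial reliability) tracker; the file name is hypothetical and opencv-contrib-python is assumed.

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture("surveillance.mp4")        # hypothetical input
ok, frame = cap.read()
boxes, _ = hog.detectMultiScale(frame)            # initial person detections
trackers = []
for (x, y, w, h) in boxes:
    t = cv2.TrackerCSRT_create()                  # cv2.legacy.* on some builds
    t.init(frame, (int(x), int(y), int(w), int(h)))
    trackers.append(t)

trajectories = [[] for _ in trackers]             # exported movement data
while True:
    ok, frame = cap.read()
    if not ok:
        break
    for i, t in enumerate(trackers):
        found, (x, y, w, h) = t.update(frame)
        if found:
            trajectories[i].append((x + w / 2, y + h / 2))  # centre point
cap.release()
```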
In the past few years, visual information collection and transmission has increased significantly across various applications. Smart vehicles, service robotic platforms, and surveillance cameras for smart city applications collect large amounts of visual data. Preserving the privacy of people appearing in this data is an important factor in the storage, processing, sharing, and transmission of visual data across the Internet of Robotic Things (IoRT). In this paper, a novel anonymisation method for information security and privacy preservation of visual data in the sharing layer of the Web of Robotic Things (WoRT) is proposed. The proposed framework uses deep neural network-based semantic segmentation to preserve privacy in video data based on the access level of the application or user. The data is anonymised for applications with lower-level access, while applications with a higher legal access level can analyze and annotate the complete data. The experimental results show that the proposed method, while giving authorities the required access for legal smart city surveillance applications, is capable of preserving the privacy of the people appearing in the data.
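A minimal sketch of access-level-dependent anonymisation along these lines, assuming a person mask produced by any semantic segmentation network; the blur kernel and required access level are illustrative.

```python
import cv2
import numpy as np

def anonymise(frame, person_mask, access_level, required_level=2):
    """person_mask: HxW uint8, nonzero where a person is segmented.
    High-access consumers get the original frame; others get persons blurred."""
    if access_level >= required_level:
        return frame                              # full legal access
    blurred = cv2.GaussianBlur(frame, (51, 51), 0)
    mask3 = np.repeat((person_mask > 0)[:, :, None], 3, axis=2)
    return np.where(mask3, blurred, frame)        # blur only person pixels
```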
The usage of small drones/UAVs has increased significantly in recent years. Consequently, there is a rising potential for small drones to be misused for illegal activities such as terrorism and drug smuggling, posing high security risks. Tracking and surveillance of drones are therefore essential to prevent security breaches. The similarity in appearance between small drones and birds against complex backgrounds makes it challenging to detect drones in surveillance videos. This paper addresses the challenge of detecting small drones in surveillance videos using popular and advanced deep learning-based object detection methods. CNN-based architectures such as ResNet-101 and Inception with Faster R-CNN, as well as the Single Shot Detector (SSD) model, were used for the experiments. Because of the sparse data available, pre-trained models were fine-tuned using transfer learning. The best results were obtained using Faster R-CNN with a ResNet-101 backbone. The paper presents an experimental analysis of the different CNN architectures, along with a visual analysis of the test dataset.
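A hedged sketch of such a transfer-learning setup with torchvision's Faster R-CNN; torchvision ships a ResNet-50-FPN variant, used here as a stand-in for the paper's ResNet-101 backbone, and two classes (background and "drone") are assumed.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Start from COCO-pretrained weights, as transfer learning suggests.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
num_classes = 2                                   # background + drone
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
# From here, fine-tune on the sparse drone dataset with a standard
# torchvision detection training loop.
```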
Today, as surveillance systems are widely used for indoor and outdoor monitoring, there is growing interest in real-time object detection, with many different applications for real-time detection and analysis. Two-dimensional video is used in multimedia content-based indexing, information acquisition, visual surveillance, distributed cross-camera surveillance systems, human tracking, traffic monitoring, and similar applications. Following a moving target is of great importance to the development of national security systems in military applications. This research proposes a more efficient solution than the existing methods: we present YOLO, a new approach to object detection for military applications.
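For reference, a minimal YOLO inference loop might look as follows, assuming the `ultralytics` package and a generic pretrained checkpoint rather than the paper's own model or data.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                 # illustrative pretrained weights
results = model("surveillance_frame.jpg")  # hypothetical input image
for r in results:
    for box in r.boxes:
        cls_id = int(box.cls[0])
        conf = float(box.conf[0])
        print(model.names[cls_id], conf, box.xyxy[0].tolist())
```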
Advancements in the semiconductor domain have enabled numerous video surveillance applications using computer vision and deep learning; video surveillance in industrial automation, security, ADAS, live traffic analysis, and similar domains improves efficiency through image understanding. Image understanding requires high-precision input data, which depends on image resolution and camera placement. The data of interest can be thermal images or live feeds from various sensors. Composite (CVBS) is a popular video interface capable of streaming up to HD (1920x1080) quality. Unlike high-speed serial interfaces such as HDMI or MIPI CSI, the analog composite video interface is a single-wire standard supporting longer distances. Image understanding requires edge detection and classification for further processing. The Sobel filter is one of the most widely used edge detection filters and can be embedded into a live stream. This paper proposes a Zynq FPGA-based system design for video surveillance with Sobel edge detection, where the composite video input is decoded (analog CVBS input to YCbCr digital output), processed in hardware, and streamed to an HDMI display while simultaneously being stored in SD memory for later processing. The hardware design is scalable for resolutions from VGA to Full HD at 60 fps and 4K at 24 fps. The system is built on a Xilinx ZC702 platform with a TVP5146 decoder to showcase the functional path.
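A software reference for the Sobel stage that this pipeline performs in hardware, sketched with OpenCV on the luma (Y) channel to match the YCbCr data path; the file names are hypothetical.

```python
import cv2
import numpy as np

frame = cv2.imread("frame.png")                       # hypothetical input
ycbcr = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
y = ycbcr[:, :, 0]                                    # luma channel only
gx = cv2.Sobel(y, cv2.CV_32F, 1, 0, ksize=3)          # horizontal gradient
gy = cv2.Sobel(y, cv2.CV_32F, 0, 1, ksize=3)          # vertical gradient
edges = cv2.convertScaleAbs(np.sqrt(gx ** 2 + gy ** 2))  # gradient magnitude
cv2.imwrite("edges.png", edges)
```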
With the rapid development of information technology, video surveillance systems have become a key part of the security and protection infrastructure of modern cities. In prisons especially, surveillance cameras can be found almost everywhere. However, with the continuous expansion of the surveillance network, these cameras bring not only convenience but also a massive amount of monitoring data, which poses huge challenges for storage, analytics, and retrieval. Smart monitoring systems equipped with intelligent video analytics can monitor and pre-alarm abnormal events or behaviours, which is a hot research direction in the field of surveillance. This paper applies deep learning methods, using the state-of-the-art instance segmentation framework Mask R-CNN, to train a fine-tuned network on our datasets, which can efficiently detect objects in a video image while simultaneously generating a high-quality segmentation mask for each instance. The experiments show that our network is simple to train, generalizes easily to other datasets, and achieves a mask average precision of nearly 98.5% on our own datasets.
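A hedged inference sketch with torchvision's off-the-shelf Mask R-CNN as a stand-in for the fine-tuned network described above; the score threshold and file name are illustrative assumptions.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("camera_frame.jpg").convert("RGB")  # hypothetical input
with torch.no_grad():
    out = model([to_tensor(image)])[0]
keep = out["scores"] > 0.7          # confidence cutoff (assumed)
masks = out["masks"][keep]          # N x 1 x H x W soft masks per instance
labels = out["labels"][keep]        # class index per detected instance
```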
Devices sharing a network compete for bandwidth and can each transmit only a limited amount of data. This is, for example, the case for a network of cameras that must record and transmit video streams to a monitor node for video surveillance. Adaptive cameras can reduce the quality of their video, thereby increasing frame compression, to limit network congestion. In this paper, we exploit our experience with computing capacity allocation to design and implement a network bandwidth allocation strategy based on game theory that accommodates multiple adaptive streams with convergence guarantees. We conduct experiments with our implementation and discuss the results, together with conclusions and future challenges.
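The paper's exact mechanism is not reproduced here, but the flavor of a game-theoretic allocation can be illustrated with a toy best-response iteration under an assumed logarithmic quality utility; the price-adjustment rule and weights are illustrative.

```python
def allocate(capacity, weights, iters=60):
    """Toy Nash-style bandwidth split: camera i maximizes
    w_i*log(b_i) - price*b_i, so its best response is b_i = w_i / price."""
    price = 1.0
    demands = [0.0] * len(weights)
    for _ in range(iters):
        demands = [w / price for w in weights]        # each camera's best response
        price *= (sum(demands) / capacity) ** 0.5     # damped congestion pricing
    return demands

# At convergence the link is fully used and bandwidth is split in
# proportion to each stream's quality weight:
print(allocate(100.0, [1.0, 2.0, 1.0]))               # ~[25, 50, 25]
```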
Person re-identification is an important task in video surveillance, focusing on finding the same person across different cameras. However, most existing methods for video-based person re-identification still have limitations (e.g., the lack of an effective deep learning framework, limited model robustness, and identical treatment of all video frames) that prevent better recognition performance. In this paper, we propose a novel self-paced learning algorithm for video-based person re-identification that gradually learns from simple to complex samples to obtain a mature and stable model. Self-paced learning is employed to enhance video-based person re-identification based on a deep neural network, so that the deep network and self-paced learning are unified in one framework. Then, on top of the trained self-paced model, we employ deep reinforcement learning to discard misleading and confounding frames and find the most representative frames in video pairs. With the advantage of deep reinforcement learning, our method can learn strategies for selecting the optimal frame groups. Experiments show that the proposed framework outperforms existing methods on the iLIDS-VID, PRID-2011, and MARS datasets.
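A sketch of the generic self-paced selection rule in PyTorch: in each epoch, only samples whose loss falls below a threshold lambda contribute to the update, and lambda grows so harder samples are admitted gradually. The per-sample loss (reduction='none') and the growth factor are assumptions for illustration.

```python
import torch

def self_paced_epoch(model, loader, optimizer, loss_fn, lam):
    """One epoch of easy-to-hard training; loss_fn must use reduction='none'."""
    model.train()
    for x, y in loader:
        losses = loss_fn(model(x), y)             # per-sample losses
        v = (losses.detach() < lam).float()       # self-paced binary weights
        loss = (v * losses).sum() / v.sum().clamp(min=1.0)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Schedule (illustrative): start easy, then admit more complex samples.
# for epoch in range(num_epochs):
#     self_paced_epoch(model, loader, optimizer, loss_fn, lam)
#     lam *= 1.1
```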
Abnormal event detection in video surveillance is a valuable but challenging problem. Most methods adopt a supervised setting that requires collecting videos with only normal events for training. Very few attempts have been made in the unsupervised setting, which detects abnormality without prior knowledge of normal events. Existing unsupervised methods detect drastic local changes as abnormality, overlooking the global spatio-temporal context. This paper proposes a novel unsupervised approach that not only avoids manually specifying normality for training, as supervised methods do, but also takes the whole spatio-temporal context into consideration. Our approach consists of two stages. First, a normality estimation stage trains an autoencoder and estimates the normal events globally from the entire set of unlabeled videos via a self-adaptive reconstruction loss thresholding scheme. Second, a normality modeling stage feeds the estimated normal events from the previous stage into a one-class support vector machine to build a refined normality model, which can further exclude abnormal events and enhance detection performance. Experiments on various benchmark datasets reveal that our method not only outperforms existing unsupervised methods by a large margin (up to 14.2% AUC gain), but also yields comparable or even superior performance to state-of-the-art supervised methods.
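A compact sketch of such a two-stage pipeline, where `encode`/`decode` stand in for a trained autoencoder and the mean-error cutoff is an illustrative stand-in for the self-adaptive thresholding scheme.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def two_stage_normality(encode, decode, clips):
    """clips: N x D array of video clip features; encode/decode: trained nets."""
    latents = encode(clips)                             # N x d latent codes
    errors = np.mean((decode(latents) - clips) ** 2, axis=1)
    threshold = errors.mean()                           # adaptive cutoff (sketch)
    normal = latents[errors < threshold]                # estimated normal events
    # Stage 2: refine the normality model with a one-class SVM.
    ocsvm = OneClassSVM(kernel="rbf", nu=0.1).fit(normal)
    return ocsvm    # ocsvm.decision_function(latents) scores abnormality
```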
Crowd surveillance will play a fundamental role in the coming generation of video surveillance systems, in particular for improving public safety and security. However, traditional camera networks are mostly unable to survey the entire monitoring area closely, owing to limitations in coverage, resolution, and analytics performance. A smart camera network, on the other hand, can reconfigure the sensing infrastructure by incorporating active devices such as pan-tilt-zoom (PTZ) cameras and UAV-based cameras, enabling coverage and target resolution to be adapted over time. This paper proposes a novel decentralized approach to dynamic network reconfiguration in which cameras locally control their PTZ parameters and position to optimally cover the entire scene. For crowded scenes, cameras must manage a trade-off between global coverage and target resolution to perform crowd analysis effectively. We evaluate our approach in a simulated environment surveyed with fixed, PTZ, and UAV-based cameras.
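A toy expression of the local coverage-versus-resolution trade-off each camera optimizes (an assumption for illustration, not the paper's formulation), with a greedy choice over candidate PTZ settings.

```python
def choose_setting(candidate_settings, targets, alpha=0.5):
    """candidate_settings: list of (covers, resolution) callables per setting;
    alpha balances how many targets are covered against their resolution."""
    def score(setting):
        covers, resolution = setting
        covered = [t for t in targets if covers(t)]
        cov = len(covered) / max(1, len(targets))          # coverage fraction
        res = sum(resolution(t) for t in covered) / max(1, len(covered))
        return alpha * cov + (1 - alpha) * res
    # Each camera greedily picks its best local configuration.
    return max(candidate_settings, key=score)
```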
Loitering is a suspicious behavior that often leads to criminal actions such as pickpocketing and illegal entry. Tracking methods can identify suspicious behavior from trajectories, but they require continuous appearance and are difficult to scale to multi-camera systems. Using the duration of feature appearances works across multiple cameras, but does not consider major aspects of loitering behavior, such as the repeated appearance and trajectory of candidates. We introduce an entropy model that maps the locations of a person's features onto a heatmap, which can serve as an abstraction of trajectory tracking across multiple surveillance cameras. We evaluate our method on several datasets and compare it with other loitering detection methods. The results show that our approach performs comparably to the state of the art while also surfacing additional interesting candidates.
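A hedged sketch of a heatmap-plus-entropy loitering cue along these lines; the grid size, frame size, and the use of Shannon entropy over cell occupancy are illustrative assumptions.

```python
import numpy as np

def loitering_score(positions, grid=(16, 16), frame_size=(1920, 1080)):
    """positions: (x, y) detections of one identity across cameras/frames.
    A long presence concentrated in few cells (low spatial entropy)
    suggests loitering."""
    if not positions:
        return 0, 0.0
    heat = np.zeros(grid)
    for x, y in positions:
        i = min(grid[0] - 1, int(x / frame_size[0] * grid[0]))
        j = min(grid[1] - 1, int(y / frame_size[1] * grid[1]))
        heat[i, j] += 1                            # accumulate onto heatmap
    p = heat.flatten() / heat.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log(p))               # low => spatially concentrated
    return len(positions), entropy                 # long stay + low entropy => loiter
```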