Biblio
Video surveillance, closed-circuit TV and IP-camera systems became virtually omnipresent and indispensable for many organizations, businesses, and users. Their main purpose is to provide physical security, increase safety, and prevent crime. They also became increasingly complex, comprising many communication means, embedded hardware and non-trivial firmware. However, most research to date focused mainly on the privacy aspects of such systems, and did not fully address their issues related to cyber-security in general, and visual layer (i.e., imagery semantics) attacks in particular. In this paper, we conduct a systematic review of existing and novel threats in video surveillance, closed-circuit TV and IP-camera systems based on publicly available data. The insights can then be used to better understand and identify the security and the privacy risks associated with the development, deployment and use of these systems. We study existing and novel threats, along with their existing or possible countermeasures, and summarize this knowledge into a comprehensive table that can be used in a practical way as a security checklist when assessing cyber-security level of existing or new CCTV designs and deployments. We also provide a set of recommendations and mitigations that can help improve the security and privacy levels provided by the hardware, the firmware, the network communications and the operation of video surveillance systems. We hope the findings in this paper will provide a valuable knowledge of the threat landscape that such systems are exposed to, as well as promote further research and widen the scope of this field beyond its current boundaries.
With recent advances in consumer electronics and the increasingly urgent need for public security, camera networks have evolved from their early role of providing simple and static monitoring to current complex systems capable of obtaining extensive video information for intelligent processing, such as target localization, identification, and tracking. In all cases, it is of vital importance that the optimal camera configuration (i.e., optimal location, orientation, etc.) is determined before cameras are deployed as a suboptimal placement solution will adversely affect intelligent video surveillance and video analytic algorithms. The optimal configuration may also provide substantial savings on the total number of cameras required to achieve the same level of utility. In this article, we examine most, if not all, of the recent approaches (post 2000) addressing camera placement in a structured manner. We believe that our work can serve as a first point of entry for readers wishing to start researching into this area or engineers who need to design a camera system in practice. To this end, we attempt to provide a complete study of relevant formulation strategies and brief introductions to most commonly used optimization techniques by researchers in this field. We hope our work to be inspirational to spark new ideas in the field.
This paper evaluates a new video surveillance platform presented in a previous study, through an abandoned object detection task. The proposed platform has a function of automated detection and alerting, which is still a big challenge for a machine algorithm due to its recall-precision tradeoff problem. To achieve both high recall and high precision simultaneously, a hybrid approach using crowdsourcing after image analysis is proposed. This approach, however, is still not clear about what extent it can improve detection accuracy and raise quicker alerts. In this paper, the experiment is conducted for abandoned object detection, as one of the most common surveillance tasks. The results show that detection accuracy was improved from 50% (without crowdsourcing) to stable 95-100% (with crowdsourcing) by majority vote of 7 crowdworkers for each task. In contrast, alert time issue still remains open to further discussion since at least 7+ minutes are required to get the best performance.
In the area of the Internet of Things, cloud-based camera surveillance systems are ubiquitously available for industrial and private environments. However, the sensitive nature of the surveillance use case imposes high requirements on privacy/confidentiality, authenticity, and availability of such systems. In this work, we investigate how currently available mass-market camera systems comply with these requirements. Considering two attacker models, we test the cameras for weaknesses and analyze for their implications. We reverse-engineered the security implementation and discovered several vulnerabilities in every tested system. These weaknesses impair the users' privacy and, as a consequence, may also damage the camera system manufacturer's reputation. We demonstrate how an attacker can exploit these vulnerabilities to blackmail users and companies by denial-of-service attacks, injecting forged video streams, and by eavesdropping private video data - even without physical access to the device. Our analysis shows that current systems lack in practice the necessary care when implementing security for IoT devices.
With the outgrowth of video editing tools, video information trustworthiness becomes a hypersensitive field. Today many devices have the capability of capturing digital videos such as CCTV, digital cameras and mobile phones and these videos may transmitted over the Internet or any other non secure channel. As digital video can be used to as supporting evidence, it has to be protected against manipulation or tampering. As most video authentication techniques are based on watermarking and digital signatures, these techniques are effectively used in copyright purposes but difficult to implement in other cases such as video surveillance or in videos captured by consumer's cameras. In this paper we propose an intelligent technique for video authentication which uses the video local information which makes it useful for real world applications. The proposed algorithm relies on the video's statistical local information which was applied on a dataset of videos captured by a range of consumer video cameras. The results show that the proposed algorithm has potential to be a reliable intelligent technique in digital video authentication without the need to use for SVM classifier which makes it faster and less computationally expensive in comparing with other intelligent techniques.
Detecting stationary crowd groups and analyzing their behaviors have important applications in crowd video surveillance, but have rarely been studied. The contributions of this paper are in two aspects. First, a stationary crowd detection algorithm is proposed to estimate the stationary time of foreground pixels. It employs spatial-temporal filtering and motion filtering in order to be robust to noise caused by occlusions and crowd clutters. Second, in order to characterize the emergence and dispersal processes of stationary crowds and their behaviors during the stationary periods, three attributes are proposed for quantitative analysis. These attributes are recognized with a set of proposed crowd descriptors which extract visual features from the results of stationary crowd detection. The effectiveness of the proposed algorithms is shown through experiments on a benchmark dataset.
Multiple-object tracking is an important task in automated video surveillance. In this paper, we present a multiple-human-tracking approach that takes the single-frame human detection results as input and associates them to form trajectories while improving the original detection results by making use of reliable temporal information in a closed-loop manner. It works by first forming tracklets, from which reliable temporal information is extracted, and then refining the detection responses inside the tracklets, which also improves the accuracy of tracklets' quantities. After this, local conservative tracklet association is performed and reliable temporal information is propagated across tracklets so that more detection responses can be refined. The global tracklet association is done last to resolve association ambiguities. Experimental results show that the proposed approach improves both the association and detection results. Comparison with several state-of-the-art approaches demonstrates the effectiveness of the proposed approach.
Abnormal crowd behavior detection is an important research issue in video processing and computer vision. In this paper we introduce a novel method to detect abnormal crowd behaviors in video surveillance based on interest points. A complex network-based algorithm is used to detect interest points and extract the global texture features in scenarios. The performance of the proposed method is evaluated on publicly available datasets. We present a detailed analysis of the characteristics of the crowd behavior in different density crowd scenes. The analysis of crowd behavior features and simulation results are also demonstrated to illustrate the effectiveness of our proposed method.
This paper presents the application of fusion meth- ods to a visual surveillance scenario. The range of relevant features for re-identifying vehicles is discussed, along with the methods for fusing probabilistic estimates derived from these estimates. In particular, two statistical parametric fusion methods are considered: Bayesian Networks and the Dempster Shafer approach. The main contribution of this paper is the development of a metric to allow direct comparison of the benefits of the two methods. This is achieved by generalising the Kelly betting strategy to accommodate a variable total stake for each sample, subject to a fixed expected (mean) stake. This metric provides a method to quantify the extra information provided by the Dempster-Shafer method, in comparison to a Bayesian Fusion approach.
Many surveillance cameras are using everywhere, the videos or images captured by these cameras are still dumped but they are not processed. Many methods are proposed for tracking and detecting the objects in the videos but we need the meaningful content called semantic content from these videos. Detecting Human activity recognition is quite complex. The proposed method called Semantic Content Extraction (SCE) from videos is used to identify the objects and the events present in the video. This model provides useful methodology for intruder detecting systems which provides the behavior and the activities performed by the intruder. Construction of ontology enhances the spatial and temporal relations between the objects or features extracted. Thus proposed system provides a best way for detecting the intruders, thieves and malpractices happening around us.
This paper propose a fast human detection algorithm of video surveillance in emergencies. Firstly through the background subtraction based on the single Guassian model and frame subtraction, we get the target mask which is optimized by Gaussian filter and dilation. Then the interest points of head is obtained from figures with target mask and edge detection. Finally according to detecting these pionts we can track the head and count the number of people with the frequence of moving target at the same place. Simulation results show that the algorithm can detect the moving object quickly and accurately.
This paper presents a human model-based feature extraction method for a video surveillance retrieval system. The proposed method extracts, from a normalized scene, object features such as height, speed, and representative color using a simple human model based on multiple-ellipse. Experimental results show that the proposed system can effectively track moving routes of people such as a missing child, an absconder, and a suspect after events.
Recognizing activities in wide aerial/overhead imagery remains a challenging problem due in part to low-resolution video and cluttered scenes with a large number of moving objects. In the context of this research, we deal with two un-synchronized data sources collected in real-world operating scenarios: full-motion videos (FMV) and analyst call-outs (ACO) in the form of chat messages (voice-to-text) made by a human watching the streamed FMV from an aerial platform. We present a multi-source multi-modal activity/event recognition system for surveillance applications, consisting of: (1) detecting and tracking multiple dynamic targets from a moving platform, (2) representing FMV target tracks and chat messages as graphs of attributes, (3) associating FMV tracks and chat messages using a probabilistic graph-based matching approach, and (4) detecting spatial-temporal activity boundaries. We also present an activity pattern learning framework which uses the multi-source associated data as training to index a large archive of FMV videos. Finally, we describe a multi-intelligence user interface for querying an index of activities of interest (AOIs) by movement type and geo-location, and for playing-back a summary of associated text (ACO) and activity video segments of targets-of-interest (TOIs) (in both pixel and geo-coordinates). Such tools help the end-user to quickly search, browse, and prepare mission reports from multi-source data.
An abnormal behavior detection algorithm for surveillance is required to correctly identify the targets as being in a normal or chaotic movement. A model is developed here for this purpose. The uniqueness of this algorithm is the use of foreground detection with Gaussian mixture (FGMM) model before passing the video frames to optical flow model using Lucas-Kanade approach. Information of horizontal and vertical displacements and directions associated with each pixel for object of interest is extracted. These features are then fed to feed forward neural network for classification and simulation. The study is being conducted on the real time videos and some synthesized videos. Accuracy of method has been calculated by using the performance parameters for Neural Networks. In comparison of plain optical flow with this model, improved results have been obtained without noise. Classes are correctly identified with an overall performance equal to 3.4e-02 with & error percentage of 2.5.
We propose a method for analysis of surveillance video by using low rank and sparse decomposition (LRSD) with low latency combined with compressive sensing to segment the background and extract moving objects in a surveillance video. Video is acquired by compressive measurements, and the measurements are used to analyze the video by a low rank and sparse decomposition of a matrix. The low rank component represents the background, and the sparse component, which is obtained in a tight wavelet frame domain, is used to identify moving objects in the surveillance video. An important feature of the proposed low latency method is that the decomposition can be performed with a small number of video frames, which reduces latency in the reconstruction and makes it possible for real time processing of surveillance video. The low latency method is both justified theoretically and validated experimentally.
To reduce human efforts in browsing long surveillance videos, synopsis videos are proposed. Traditional synopsis video generation applying optimization on video tubes is very time consuming and infeasible for real-time online generation. This dilemma significantly reduces the feasibility of synopsis video generation in practical situations. To solve this problem, the synopsis video generation problem is formulated as a maximum a posteriori probability (MAP) estimation problem in this paper, where the positions and appearing frames of video objects are chronologically rearranged in real time without the need to know their complete trajectories. Moreover, a synopsis table is employed with MAP estimation to decide the temporal locations of the incoming foreground objects in the synopsis video without needing an optimization procedure. As a result, the computational complexity of the proposed video synopsis generation method can be significantly reduced. Furthermore, as it does not require prescreening the entire video, this approach can be applied on online streaming videos.
The exponential growth of surveillance videos presents an unprecedented challenge for high-efficiency surveillance video coding technology. Compared with the existing coding standards that were basically developed for generic videos, surveillance video coding should be designed to make the best use of the special characteristics of surveillance videos (e.g., relative static background). To do so, this paper first conducts two analyses on how to improve the background and foreground prediction efficiencies in surveillance video coding. Following the analysis results, we propose a background-modeling-based adaptive prediction (BMAP) method. In this method, all blocks to be encoded are firstly classified into three categories. Then, according to the category of each block, two novel inter predictions are selectively utilized, namely, the background reference prediction (BRP) that uses the background modeled from the original input frames as the long-term reference and the background difference prediction (BDP) that predicts the current data in the background difference domain. For background blocks, the BRP can effectively improve the prediction efficiency using the higher quality background as the reference; whereas for foreground-background-hybrid blocks, the BDP can provide a better reference after subtracting its background pixels. Experimental results show that the BMAP can achieve at least twice the compression ratio on surveillance videos as AVC (MPEG-4 Advanced Video Coding) high profile, yet with a slightly additional encoding complexity. Moreover, for the foreground coding performance, which is crucial to the subjective quality of moving objects in surveillance videos, BMAP also obtains remarkable gains over several state-of-the-art methods.
H.264/advanced video coding surveillance video encoders use the Skip mode specified by the standard to reduce bandwidth. They also use multiple frames as reference for motion-compensated prediction. In this paper, we propose two techniques to reduce the bandwidth and computational cost of static camera surveillance video encoders without affecting detection and recognition performance. A spatial sampler is proposed to sample pixels that are segmented using a Gaussian mixture model. Modified weight updates are derived for the parameters of the mixture model to reduce floating point computations. A storage pattern of the parameters in memory is also modified to improve cache performance. Skip selection is performed using the segmentation results of the sampled pixels. The second contribution is a low computational cost algorithm to choose the reference frames. The proposed reference frame selection algorithm reduces the cost of coding uncovered background regions. We also study the number of reference frames required to achieve good coding efficiency. Distortion over foreground pixels is measured to quantify the performance of the proposed techniques. Experimental results show bit rate savings of up to 94.5% over methods proposed in literature on video surveillance data sets. The proposed techniques also provide up to 74.5% reduction in compression complexity without increasing the distortion over the foreground regions in the video sequence.
Surveillance video streams monitoring is an important task that the surveillance operators usually carry out. The distribution of video surveillance facilities over multiple premises and the mobility of surveillance users requires that they are able to view surveillance video seamlessly from their mobile devices. In order to satisfy this requirement, we propose a cloud-based IPTV (Internet Protocol Television) solution that leverages the power of cloud infrastructure and the benefits of IPTV technology to seamlessly deliver surveillance video content on different client devices anytime and anywhere. The proposed mechanism also supports user-controlled frame rate adjustment of video streams and sharing of these streams with other users. In this paper, we describe the overall approach of this idea, address and identify key technical challenges for its practical implementation. In addition, initial experimental results were presented to justify the viability of the proposed cloud-based IPTV surveillance framework over the traditional IPTV surveillance approach.
Unmanned Aerial Systems (UAS) have raised a great concern on privacy recently. A practical method to protect privacy is needed for adopting UAS in civilian airspace. This paper examines the privacy policies, filtering strategies, existing techniques, then proposes a novel method based on the encrypted video stream and the cloud-based privacy servers. In this scheme, all video surveillance images are initially encrypted, then delivered to a privacy server. The privacy server decrypts the video using the shared key with the camera, and filters the image according to the privacy policy specified for the surveyed region. The sanitized video is delivered to the surveillance operator or anyone on the Internet who is authorized. In a larger system composed of multiple cameras and multiple privacy servers, the keys can be distributed using Kerberos protocol. With this method the privacy policy can be changed on demand in real-time and there is no need for a costly on-board processing unit. By utilizing the cloud-based servers, advanced image processing algorithms and new filtering algorithms can be applied immediately without upgrading the camera software. This method is cost-efficient and promotes video sharing among multiple subscribers, thus it can spur wide adoption.
The video surveillance widely installed in public areas poses a significant threat to the privacy. This paper proposes a new privacy preserving method via the Generalized Random-Grid based Visual Cryptography Scheme (GRG-based VCS). We first separate the foreground from the background for each video frame. These foreground pixels contain the most important information that needs to be protected. Every foreground area is encrypted into two shares based on GRG-based VCS. One share is taken as the foreground, and the other one is embedded into another frame with random selection. The content of foreground can only be recovered when these two shares are got together. The performance evaluation on several surveillance scenarios demonstrates that our proposed method can effectively protect sensitive privacy information in surveillance videos.