Biblio
Deep learning techniques have demonstrated the ability to perform a variety of object recognition tasks using visible imager data; however, deep learning has not been implemented as a means to autonomously detect and assess targets of interest in a physical security system. We demonstrate the use of transfer learning on a convolutional neural network (CNN) to significantly reduce training time while keeping detection accuracy high for targets relevant to physical security. Unlike many detection algorithms employed by video analytics within physical security systems, this method does not rely on temporal data to construct a background scene; targets of interest can halt motion indefinitely and still be detected by the implemented CNN. A key advantage of using deep learning is the ability of a network to improve over time. Periodic retraining can lead to better detection and higher confidence rates. We investigate training data size versus CNN test accuracy using physical security video data. Due to the large number of visible imagers, the significant volume of data collected daily, and the currently deployed human-in-the-loop ground truth data, physical security systems present a unique environment that is well suited for analysis via CNNs. This could lead to the creation of an algorithmic element that reduces human burden and decreases human-analyzed nuisance alarms.
Identity masking methods have been developed in recent years for use in multiple applications aimed at protecting privacy. There is only limited work, however, targeted at evaluating the effectiveness of these methods, with only a handful of studies testing identity masking effectiveness for human perceivers. Here, we employed human participants to evaluate identity masking algorithms on video data of drivers, which contains subtle movements of the face and head. We evaluated the effectiveness of the “personalized supervised bilinear regression method for Facial Action Transfer (FAT)” de-identification algorithm. We also evaluated an edge-detection filter, as an alternate “fill-in” method when face tracking failed due to abrupt or fast head motions. Our primary goal was to develop methods for human-based evaluation of the effectiveness of identity masking. To this end, we designed and conducted two experiments to address the effectiveness of masking in preventing recognition and in preserving action perception. (1) How effective is an identity masking algorithm? We conducted a face recognition experiment and employed Signal Detection Theory (SDT) to measure human accuracy and decision bias. The accuracy results show that both masks (FAT mask and edge-detection) are effective, but that neither completely eliminated recognition. However, the decision bias data suggest that both masks altered the participants' response strategy and made them less likely to affirm identity. (2) How effectively does the algorithm preserve actions? We conducted two experiments on facial behavior annotation. Results showed that masking had a negative effect on annotation accuracy for the majority of actions, with differences across action types. Notably, the FAT mask preserved actions better than the edge-detection mask. To our knowledge, this is the first study to evaluate a de-identification method aimed at preserving facial actions employing human evaluators in a laboratory setting.
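The abstract above reports accuracy and decision bias measured with Signal Detection Theory. As a minimal sketch of how those two quantities are commonly computed from response counts, the Python below derives sensitivity (d') and criterion (c). The function name, the example counts, and the log-linear correction are illustrative assumptions; the abstract does not state which correction, if any, the authors used.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) from raw old/new recognition counts.

    Hypothetical helper for illustration, not the authors' analysis code.
    """
    # Log-linear correction (an assumption): add 0.5 to each cell so that
    # rates of exactly 0 or 1 do not produce infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)            # sensitivity (accuracy)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # decision bias
    return d_prime, criterion


# Made-up counts: a participant who rarely says "same person".
d_prime, criterion = sdt_measures(20, 30, 5, 45)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

A positive criterion indicates a conservative response strategy, which is the direction of the bias shift the abstract describes (participants less likely to affirm identity).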
In recent years, more and more multimedia data are generated and transmitted in various fields. Consequently, many encryption methods for multimedia content have been put forward to satisfy various applications. However, there are still some open issues, and each encryption method has its advantages and drawbacks. Our main goal is to provide a solution for multimedia encryption that satisfies the constraints of the target application and the performance metrics of the encryption algorithm. The Advanced Encryption Standard (AES) is the most popular algorithm used in symmetric key cryptography. Furthermore, chaotic encryption is a new research direction of cryptography which is characterized by high initial-value sensitivity and good randomness. In this paper we propose a hybrid video cryptosystem which combines two encryption techniques. The proposed cryptosystem realizes video encryption through chaos and AES in CTR mode. Experimental results and security analysis demonstrate that this cryptosystem is a highly efficient and robust system for video encryption.
This paper presents a framework for a privacy-preserving video delivery system to fulfill users' privacy demands. The proposed framework leverages the inference channels in sensitive behavior prediction and object tracking in a video surveillance system for sequence privacy protection. For such a goal, we need to capture different pieces of evidence which are used to infer the identity. The temporal, spatial and context features are extracted from the surveillance video as the observations to perceive the privacy demands and their correlations. Taking advantage of quantifying various evidence and utility, we let users subscribe to videos with a viewer-dependent pattern. We implement a prototype system for off-line and on-line requirements in two typical monitoring scenarios to conduct extensive experiments. The evaluation results show that our system can efficiently satisfy users' privacy demands while saving over 25% more video information compared to traditional video privacy protection schemes.
This paper addresses the problem of shadow detection and removal in traffic vision analysis. The presence of shadows in traffic sequences is unavoidable; it leads to errors at the segmentation stage, where shadows are often misclassified as an object region or as a moving object. This paper presents a shadow removal method, based on both color and texture features, that aims to efficiently retrieve the moving objects whose detection is usually affected by cast shadows. Additionally, in order to get a shadow-free foreground segmentation image, a morphology reconstruction algorithm is used to recover the foreground disturbed by shadow removal. Once shadows are detected, an automatic shadow removal model is proposed based on the information retrieved from the histogram shape. Experimental results on a real traffic sequence are presented to test the proposed approach and to validate the algorithm's performance.
The prevalence of wireless networks and the convenience of mobile cameras enable many new video applications other than security and entertainment. From behavioral diagnosis to wellness monitoring, cameras are increasingly used for observations in various educational and medical settings. Videos collected for such applications are considered protected health information under privacy laws in many countries. At the same time, there is an increasing need to share such video data across a wide spectrum of stakeholders including professionals, therapists and families facing similar challenges. Visual privacy protection techniques, such as blurring or object removal, can be used to mitigate privacy concerns, but they also obliterate important visual cues of affect and social behaviors that are crucial for the target applications. In this paper, we propose a method of manipulating facial expression and body shape to conceal the identity of individuals while preserving the underlying affect states. The experimental results demonstrate the effectiveness of our method.
In this paper we present a framework for Quality of Information (QoI)-aware networking. QoI quantifies how useful a piece of information is for a given query or application. Herein, we present a general QoI model, as well as a specific example instantiation that carries throughout the rest of the paper. In this model, we focus on the tradeoffs between precision and accuracy. As a motivating example, we look at traffic video analysis. We present simple algorithms for deriving various traffic metrics from video, such as vehicle count and average speed. We implement these algorithms on both a desktop workstation and a less-capable mobile device. We then show how QoI-awareness enables end devices to make intelligent decisions about how to process queries and form responses, such that huge bandwidth savings are realized.
The TRECVID report of 2010 [14] evaluated video shot boundary detectors as achieving "excellent performance on [hard] cuts and gradual transitions." Unfortunately, while re-evaluating the state of the art in shot boundary detection, we found that these detectors need to be improved, because the characteristics of consumer-produced videos have changed significantly since the introduction of mobile gadgets, such as smartphones, tablets and outdoor-activity cameras, and because video editing software has been evolving rapidly. In this paper, we evaluate the best-known approach on a contemporary, publicly accessible corpus, and present a method that achieves better performance, particularly on soft transitions. Our method combines color histograms with key point feature matching to extract comprehensive frame information. Two similarity metrics, one for individual frames and one for sets of frames, are defined based on graph cuts. These metrics are formed into temporal feature vectors on which an SVM is trained to perform the final segmentation. The evaluation on said "modern" corpus of relatively short videos yields a performance of 92% recall (at 89% precision) overall, compared to 69% (91%) of the best-known method.
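To make the histogram component of the abstract above concrete, here is a minimal sketch of a classic hard-cut detector based on histogram intersection between consecutive frames. It is a deliberately simplified stand-in, not the paper's method: it uses grayscale rather than color histograms and omits key point matching, the graph-cut similarity metrics, and the SVM. All function names, the bin count, and the threshold are assumptions chosen for illustration.

```python
def gray_histogram(frame, bins=8):
    """Normalized intensity histogram of a non-empty frame.

    `frame` is a list of rows of integer pixel values in 0..255.
    """
    counts = [0] * bins
    total = 0
    for row in frame:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    return [count / total for count in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def detect_hard_cuts(frames, threshold=0.5):
    """Return indices i where frame i differs sharply from frame i - 1."""
    hists = [gray_histogram(frame) for frame in frames]
    return [
        i
        for i in range(1, len(hists))
        if histogram_intersection(hists[i - 1], hists[i]) < threshold
    ]


# Toy sequence: three dark frames followed by two bright frames.
dark = [[10, 20], [30, 15]]
bright = [[200, 220], [240, 210]]
print(detect_hard_cuts([dark, dark, dark, bright, bright]))
```

A fixed threshold on pairwise frame similarity handles abrupt cuts but tends to miss soft transitions, which is the case the abstract says its combined features and learned classifier target.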
Video-based vehicle detection and tracking technology has been playing an important role in the ITS (Intelligent Transportation Systems) field in recent years. Occlusion among vehicles is one of the most difficult problems in vehicle tracking. In order to handle occlusion, this paper proposes an effective solution that applies a Markov Random Field (MRF) to the traffic images. The contour of each vehicle is first detected using background subtraction; a number of blocks carrying the vehicle's texture and motion information are then filled inside each vehicle. We extract several kinds of information from each block for the subsequent tracking. For each occlusive block, two groups of clique functions in the MRF model are defined, which represent spatial correlation and motion coherence, respectively. By calculating each occlusive block's total energy function, we finally solve the attribution problem of occlusive blocks. The experimental results show that our method can handle occlusion problems effectively and track each vehicle continuously.
Detecting stationary crowd groups and analyzing their behaviors have important applications in crowd video surveillance, but have rarely been studied. The contributions of this paper are twofold. First, a stationary crowd detection algorithm is proposed to estimate the stationary time of foreground pixels. It employs spatial-temporal filtering and motion filtering in order to be robust to noise caused by occlusions and crowd clutter. Second, in order to characterize the emergence and dispersal processes of stationary crowds and their behaviors during the stationary periods, three attributes are proposed for quantitative analysis. These attributes are recognized with a set of proposed crowd descriptors which extract visual features from the results of stationary crowd detection. The effectiveness of the proposed algorithms is shown through experiments on a benchmark dataset.
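The idea of a per-pixel stationary time in the abstract above can be illustrated with a naive counter that tracks how many consecutive frames each pixel has been labeled foreground. This sketch assumes a binary foreground mask is already available for every frame and leaves out the spatial-temporal and motion filtering the abstract describes, so it is not robust to occlusion or clutter. The function names and the frame threshold are hypothetical.

```python
def update_stationary_time(stationary_time, foreground_mask):
    """Return a new per-pixel count of consecutive foreground frames.

    A pixel's counter grows by one while it stays foreground and resets
    to zero as soon as it is labeled background.
    """
    return [
        [count + 1 if is_fg else 0 for count, is_fg in zip(count_row, fg_row)]
        for count_row, fg_row in zip(stationary_time, foreground_mask)
    ]


def stationary_pixels(stationary_time, min_frames):
    """List (row, col) positions that stayed foreground for min_frames or more."""
    return [
        (r, c)
        for r, row in enumerate(stationary_time)
        for c, count in enumerate(row)
        if count >= min_frames
    ]


# Toy 1x2 image over three frames: the left pixel is always foreground,
# the right pixel is foreground only in the first and third frames.
masks = [[[True, True]], [[True, False]], [[True, True]]]
times = [[0, 0]]
for mask in masks:
    times = update_stationary_time(times, mask)
print(times, stationary_pixels(times, min_frames=3))
```

Because a single missed detection resets the counter, a practical system needs some form of temporal smoothing, which is the role the filtering steps play in the abstract.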
Abnormal crowd behavior detection is an important research issue in video processing and computer vision. In this paper we introduce a novel method to detect abnormal crowd behaviors in video surveillance based on interest points. A complex network-based algorithm is used to detect interest points and extract the global texture features in scenarios. The performance of the proposed method is evaluated on publicly available datasets. We present a detailed analysis of the characteristics of crowd behavior in crowd scenes of different densities. The analysis of crowd behavior features and the simulation results are also presented to illustrate the effectiveness of our proposed method.
To reduce human effort in browsing long surveillance videos, synopsis videos are proposed. Traditional synopsis video generation, which applies optimization on video tubes, is very time consuming and infeasible for real-time online generation. This dilemma significantly reduces the feasibility of synopsis video generation in practical situations. To solve this problem, the synopsis video generation problem is formulated as a maximum a posteriori probability (MAP) estimation problem in this paper, where the positions and appearing frames of video objects are chronologically rearranged in real time without the need to know their complete trajectories. Moreover, a synopsis table is employed with MAP estimation to decide the temporal locations of the incoming foreground objects in the synopsis video without needing an optimization procedure. As a result, the computational complexity of the proposed video synopsis generation method can be significantly reduced. Furthermore, as it does not require prescreening the entire video, this approach can be applied to online streaming videos.