Biblio
In recent months, AI-synthesized face swapping videos referred to as deepfake have become an emerging problem. False video is becoming more and more difficult to distinguish, which brings a series of challenges to social security. Some scholars are devoted to studying how to improve the detection accuracy of deepfake video. At the same time, in order to conduct better research, some datasets for deepfake detection are made. Companies such as Google and Facebook have also spent huge sums of money to produce datasets for deepfake video detection, as well as holding deepfake detection competitions. The continuous advancement of video tampering technology and the improvement of video quality have also brought great challenges to deepfake detection. Some scholars have achieved certain results on existing datasets, while the results on some high-quality datasets are not as good as expected. In this paper, we propose new method with clustering-based embedding regularization for deepfake detection. We use open source algorithms to generate videos which can simulate distinctive artifacts in the deepfake videos. To improve the local smoothness of the representation space, we integrate a clustering-based embedding regularization term into the classification objective, so that the obtained model learns to resist adversarial examples. We evaluate our method on three latest deepfake datasets. Experimental results demonstrate the effectiveness of our method.
Human action recognition in video is one of the most widely applied topics in the field of image and video processing, with many applications in surveillance (security, sports, etc.), activity detection, video-content-based monitoring, man-machine interaction, and health/disability care. Action recognition is a complex process that faces several challenges such as occlusion, camera movement, viewpoint move, background clutter, and brightness variation. In this study, we propose a novel human action recognition method using convolutional neural networks (CNN) and deep bidirectional LSTM (DB-LSTM) networks, using only raw video frames. First, deep features are extracted from video frames using a pre-trained CNN architecture called ResNet152. The sequential information of the frames is then learned using the DB-LSTM network, where multiple layers are stacked together in both forward and backward passes of DB-LSTM, to increase depth. The evaluation results of the proposed method using PyTorch, compared to the state-of-the-art methods, show a considerable increase in the efficiency of action recognition on the UCF 101 dataset, reaching 95% recognition accuracy. The choice of the CNN architecture, proper tuning of input parameters, and techniques such as data augmentation contribute to the accuracy boost in this study.
Video surveillance plays an important role in our times. It is a great help in reducing the crime rate, and it can also help to monitor the status of facilities. The performance of the video surveillance system is limited by human factors such as fatigue, time efficiency, and human resources. It would be beneficial for all if fully automatic video surveillance systems are employed to do the job. The automation of the video surveillance system is still not satisfying regarding many problems such as the accuracy of the detector, bandwidth consumption, storage usage, etc. This scientific paper mainly focuses on a video surveillance system using Convolutional Neural Networks (CNN), IoT and cloud. The system contains multi nods, each node consists of a microprocessor(Raspberry Pi) and a camera, the nodes communicate with each other using client and server architecture. The nodes can detect humans using a pretraining MobileNetv2-SSDLite model and Common Objects in Context(COCO) dataset, the captured video will stream to the main node(only one node will communicate with cloud) in order to stream the video to the cloud. Also, the main node will send an SMS notification to the security team to inform the detection of humans. The security team can check the videos captured using a mobile application or web application. Operating the Object detection model of Deep learning will be required a large amount of the computational power, for instance, the Raspberry Pi with a limited in performance for that reason we used the MobileNetv2-SSDLite model.
Object recognition with the help of outdoor video surveillance cameras is an important task in the context of ensuring the security at enterprises, public places and even private premises. There have long existed systems that allow detecting moving objects in the image sequence from a video surveillance system. Such a system is partially considered in this research. It detects moving objects using a background model, which has certain problems. Due to this some objects are missed or detected falsely. We propose to combine the moving objects detection results with the classification, using a deep neural network. This will allow determining whether a detected object belongs to a certain class, sorting out false detections, discarding the unnecessary ones (sometimes individual classes are unwanted), to divide detected people into the employees in the uniform and all others, etc. The authors perform a network training in the Keras developer-friendly environment that provides for quick building, changing and training of network architectures. The performance of the Keras integration into a video analysis system, using direct Python script execution techniques, is between 6 and 52 ms, while the precision is between 59.1% and 97.2% for different architectures. The integration, made by freezing a selected network architecture with weights, is selected after testing. After that, frozen architecture can be imported into video analysis using the TensorFlow interface for C++. The performance of such type of integration is between 3 and 49 ms. The precision is between 63.4% and 97.8% for different architectures.
The incidence of abnormal road traffic events, especially abnormal traffic congestion, is becoming more and more prominent in daily traffic management in China. It has become the main research work of urban traffic management to detect and identify traffic congestion incidents in time. Efficient and accurate detection of traffic congestion incidents can provide a good strategy for traffic management. At present, the detection and recognition of traffic congestion events mainly rely on the integration of road traffic flow data and the passing data collected by electronic police or devices of checkpoint, and then estimating and forecasting road conditions through the method of big data analysis; Such methods often have some disadvantages such as low time-effect, low precision and small prediction range. Therefore, with the help of the current large and medium cities in the public security, traffic police have built video surveillance equipment, through computer vision technology to analyze the traffic flow from video monitoring, in this paper, the motion state and the changing trend of vehicle flow are obtained by using the technology of vehicle detection from video and multi-target tracking based on deep learning, so as to realize the perception and recognition of traffic congestion. The method achieves the recognition accuracy of less than 60 seconds in real-time, more than 80% in detection rate of congestion event and more than 82.5% in accuracy of detection. At the same time, it breaks through the restriction of traditional big data prediction, such as traffic flow data, truck pass data and GPS floating car data, and enlarges the scene and scope of detection.
As the assets of people are growing, security and surveillance have become a matter of great concern today. When a criminal activity takes place, the role of the witness plays a major role in nabbing the criminal. The witness usually states the gender of the criminal, the pattern of the criminal's dress, facial features of the criminal, etc. Based on the identification marks provided by the witness, the criminal is searched for in the surveillance cameras. Surveillance cameras are ubiquitous and finding criminals from a huge volume of surveillance video frames is a tedious process. In order to automate the search process, proposed a novel smart methodology using deep learning. This method takes gender, shirt pattern, and spectacle status as input to find out the object as person from the video log. The performance of this method achieves an accuracy of 87% in identifying the person in the video frame.
Video streams acquired from thermal cameras are proven to be beneficial in diverse number of fields including military, healthcare, law enforcement, and security. Despite the hype, thermal imaging is increasingly affected by poor resolution, where it has expensive optical sensors and inability to attain optical precision. In recent years, deep learning based super-resolution algorithms are developed to enhance the video frame resolution at high accuracy. This paper presents a comparative analysis of super resolution (SR) techniques based on deep neural networks (DNN) that are applied on thermal video dataset. SRCNN, EDSR, Auto-encoder, and SRGAN are also discussed and investigated. Further the results on benchmark thermal datasets including FLIR, OSU thermal pedestrian database and OSU color thermal database are evaluated and analyzed. Based on the experimental results, it is concluded that, SRGAN has delivered a superior performance on thermal frames when compared to other techniques and improvements, which has the ability to provide state-of-the art performance in real time operations.
This paper proposes a method for detecting anomalies in video data. A Variational Autoencoder (VAE) is used for reducing the dimensionality of video frames, generating latent space information that is comparable to low-dimensional sensory data (e.g., positioning, steering angle), making feasible the development of a consistent multi-modal architecture for autonomous vehicles. An Adapted Markov Jump Particle Filter defined by discrete and continuous inference levels is employed to predict the following frames and detecting anomalies in new video sequences. Our method is evaluated on different video scenarios where a semi-autonomous vehicle performs a set of tasks in a closed environment.
Re-drawing the image as a certain artistic style is considered to be a complicated task for computer machine. On the contrary, human can easily master the method to compose and describe the style between different images. In the past, many researchers studying on the deep neural networks had found an appropriate representation of the artistic style using perceptual loss and style reconstruction loss. In the previous works, Gatys et al. proposed an artificial system based on convolutional neural networks that creates artistic images of high perceptual quality. Whereas in terms of running speed, it was relatively time-consuming, thus it cannot apply to video style transfer. Recently, a feed-forward CNN approach has shown the potential of fast style transformation, which is an end-to-end system without hundreds of iteration while transferring. We combined the benefits of both approaches, optimized the feed-forward network and defined time loss function to make it possible to implement the style transfer on video in real time. In contrast to the past method, our method runs in real time with higher resolution while creating competitive visually pleasing and temporally consistent experimental results.
In this paper, we present an improved approach to transfer style for videos based on semantic segmentation. We segment foreground objects and background, and then apply different styles respectively. A fully convolutional neural network is used to perform semantic segmentation. We increase the reliability of the segmentation, and use the information of segmentation and the relationship between foreground objects and background to improve segmentation iteratively. We also use segmentation to improve optical flow, and apply different motion estimation methods between foreground objects and background. This improves the motion boundaries of optical flow, and solves the problems of incorrect and discontinuous segmentation caused by occlusion and shape deformation.
We formulate a tracker which performs incessant decision making in order to track objects where the objects may undergo different challenges such as partial occlusions, moving camera, cluttered background etc. In the process, the agent must make a decision on whether to keep track of the object when it is occluded or has moved out of the frame temporarily based on its prediction from the previous location or to reinitialize the tracker based on the belief that the target has been lost. Instead of the heuristic methods we depend on reward and penalty based training that helps the agent reach an optimal solution via this partially observable Markov decision making (POMDP). Furthermore, we employ deeply learned compositional model to estimate human pose in order to better handle occlusion without needing human inputs. By learning compositionality of human bodies via deep neural network the agent can make better decision on presence of human in a frame or lack thereof under occlusion. We adapt skeleton based part representation and do away with the large spatial state requirement. This especially helps in cases where orientation of the target in focus is unorthodox. Finally we demonstrate that the deep reinforcement learning based training coupled with pose estimation capabilities allows us to train and tag multiple large video datasets much quicker than previous works.
Witnessing the increasingly pervasive deployment of security video surveillance systems(VSS), more and more individuals have become concerned with the issues of privacy violations. While the majority of the public have a favorable view of surveillance in terms of crime deterrence, individuals do not accept the invasive monitoring of their private life. To date, however, there is not a lightweight and secure privacy-preserving solution for video surveillance systems. The recent success of blockchain (BC) technologies and their applications in the Internet of Things (IoT) shed a light on this challenging issue. In this paper, we propose a Lightweight, Blockchain-based Privacy protection (Lib-Pri) scheme for surveillance cameras at the edge. It enables the VSS to perform surveillance without compromising the privacy of people captured in the videos. The Lib-Pri system transforms the deployed VSS into a system that functions as a federated blockchain network capable of carrying out integrity checking, blurring keys management, feature sharing, and video access sanctioning. The policy-based enforcement of privacy measures is carried out at the edge devices for real-time video analytics without cluttering the network.
With the growing number of streaming services, internet providers are increasingly needing to be able to identify the types of data and content providers that are being used on their networks. Traditional methods, such as IP and port scanning, are not always available for clients using VPNs or with providers using varying IP addresses. As such, in this paper we explore a potential method using neural networks and Markov Decision Process in order to augment deep packet inspection techniques in identifying the source and class of video streaming services.
Deep learning is the segment of artificial intelligence which is involved with imitating the learning approach that human beings utilize to get some different types of knowledge. Analyzing videos, a part of deep learning is one of the most basic problems of computer vision and multi-media content analysis for at least 20 years. The job is very challenging as the video contains a lot of information with large differences and difficulties. Human supervision is still required in all surveillance systems. New advancement in computer vision which are observed as an important trend in video surveillance leads to dramatic efficiency gains. We propose a CCTV based theft detection along with tracking of thieves. We use image processing to detect theft and motion of thieves in CCTV footage, without the use of sensors. This system concentrates on object detection. The security personnel can be notified about the suspicious individual committing burglary using Real-time analysis of the movement of any human from CCTV footage and thus gives a chance to avert the same.