Biblio
In this paper, we quantify elements representing video features and propose deep-learning-based bitrate prediction for compressed video encoding. In particular, to overcome the drawback that the bitrate of a video compressed with a Constant Rate Factor (CRF) cannot be known in advance, we use deep learning. We identify video feature elements that are related to the bitrate of the compressed video and confirm, through various deep learning techniques, that this relationship can be learned.
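The abstract does not specify a network architecture, so the sketch below is only an illustration: a small fully connected regressor in PyTorch that maps a vector of quantified video features (e.g. resolution, frame rate, complexity measures) plus the chosen CRF value to a predicted bitrate. All layer sizes, the feature count, and the dummy data are hypothetical.

```python
import torch
import torch.nn as nn

class BitratePredictor(nn.Module):
    """Toy regressor: quantified video features + CRF -> predicted bitrate (kbps)."""
    def __init__(self, num_features: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features + 1, 64),  # +1 for the CRF value
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1),                 # scalar bitrate estimate
        )

    def forward(self, features: torch.Tensor, crf: torch.Tensor) -> torch.Tensor:
        x = torch.cat([features, crf.unsqueeze(-1)], dim=-1)
        return self.net(x).squeeze(-1)

# Hypothetical usage: 4 clips, 8 quantified features each, all encoded at CRF 23.
model = BitratePredictor(num_features=8)
feats = torch.rand(4, 8)
crf = torch.full((4,), 23.0)
pred_kbps = model(feats, crf)
loss = nn.functional.mse_loss(pred_kbps, torch.rand(4) * 5000)  # dummy targets
loss.backward()
```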
In recent years, to continuously promote the construction of safe cities, security monitoring equipment has been deployed widely across the country. Using computer vision technology to perform effective, intelligent analysis of violence in video surveillance is therefore important for maintaining social stability and protecting people's lives and property. Video surveillance systems are widely used because they are intuitive and convenient, but existing systems offer relatively limited functionality, generally only viewing, querying, and playing back recorded video. In addition, researchers have paid comparatively little attention to violence as a complex abnormal behavior, and related work often ignores how violent behavior differs across scenes. At present, there are two main problems in video abnormal-behavior detection: abnormal-behavior video data are scarce, and the definition of abnormal behavior cannot be clearly distinguished across different scenes. The main existing approach is to first model normal behavior events and then label videos that do not conform to the normal model as abnormal; among such methods, deep-learning-based spatio-temporal feature representation learning shows good promise. Faced with massive volumes of surveillance video, deep learning is needed to recognize violent behavior, so that the machine learns to identify human actions and raise alarms rather than relying on manual monitoring of camera feeds. The recognition network is trained mainly on video datasets.
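The abstract's actual architecture is not given here; purely as a minimal illustration of deep-learning-based spatio-temporal feature learning for violence recognition, the sketch below assumes a tiny 3D-CNN binary classifier in PyTorch that scores a short clip as violent or normal. All layer shapes and the clip size are placeholders.

```python
import torch
import torch.nn as nn

class ViolenceClassifier3D(nn.Module):
    """Tiny 3D CNN: a clip of shape (B, 3, T, H, W) -> violence probability."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),                     # halve T, H, W
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),             # global spatio-temporal pooling
        )
        self.classifier = nn.Linear(32, 1)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        x = self.features(clip).flatten(1)
        return torch.sigmoid(self.classifier(x)).squeeze(-1)

# Hypothetical usage: two 16-frame 112x112 RGB clips.
model = ViolenceClassifier3D()
clips = torch.rand(2, 3, 16, 112, 112)
p_violent = model(clips)   # values in (0, 1); threshold to trigger an alert
```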
Deep learning approaches to public safety and tracking have sparked a great deal of interest in recent years. Current public safety methods exist for counting and detecting persons, but many events, such as abnormal occurrences in public spaces, are seldom detected and reported through an automated alarm. Our proposed method detects anomalies (deviations from normal events) in video surveillance footage using deep learning and raises an alarm if an anomaly is found. The proposed model is trained to detect anomalies and then applied to the surveillance recordings used to monitor public safety. The video is assessed frame by frame to detect anomalies, and if a match is found, an alarm is raised.
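The abstract describes scoring surveillance footage frame by frame and raising an alarm on a match; a common realization, assumed here rather than taken from the paper, scores each frame by the reconstruction error of a convolutional autoencoder trained only on normal frames. The sketch below uses OpenCV for frame reading and PyTorch for the model; the threshold value and frame size are illustrative.

```python
import cv2
import torch
import torch.nn as nn

class FrameAutoencoder(nn.Module):
    """Convolutional autoencoder trained on normal frames only."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def scan_video(path: str, model: FrameAutoencoder, threshold: float = 0.02):
    """Score every frame; frames that reconstruct poorly are flagged as anomalies."""
    cap = cv2.VideoCapture(path)
    frame_idx = 0
    model.eval()
    with torch.no_grad():
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            frame = cv2.resize(frame, (128, 128))
            x = torch.from_numpy(frame).float().permute(2, 0, 1).unsqueeze(0) / 255.0
            error = torch.mean((model(x) - x) ** 2).item()   # reconstruction error
            if error > threshold:
                print(f"ALARM: possible anomaly at frame {frame_idx} (error={error:.4f})")
            frame_idx += 1
    cap.release()
```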
Video summarization aims to improve the efficiency of large-scale video browsing by producing concise summaries. It is popular in many scenarios such as video surveillance, video review, and data annotation. Traditional video summarization techniques focus on filtering in the image-feature or image-semantics dimension. However, such techniques can lose a large amount of potentially useful information, especially for videos with rich textual semantics such as interviews and teaching videos, because only information relevant to the image dimension is retained. To address this problem, this paper treats video summarization as a continuous multi-dimensional decision-making process. Specifically, the summarization model predicts a probability for each frame and its corresponding text, and we design reward functions for each of them. Finally, comprehensive summaries in two dimensions, i.e. images and semantics, are generated. This approach is not only unsupervised, requiring neither labels nor user interaction, but also decouples the semantic and image summarization models to provide more usable interfaces for subsequent engineering use.
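The paper's exact reward design is not reproduced here. Purely as an assumption-laden sketch of the image-dimension half of such a pipeline, the code below predicts a keep-probability per frame feature and combines a diversity reward (dissimilarity among selected frames) with a representativeness reward (how well selected frames cover the whole video), in the spirit of unsupervised reward-driven summarization; all dimensions and the reward forms are hypothetical.

```python
import torch
import torch.nn as nn

class FrameScorer(nn.Module):
    """Predicts a keep-probability for each frame's feature vector."""
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.gru = nn.GRU(feat_dim, 128, batch_first=True, bidirectional=True)
        self.head = nn.Linear(256, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:   # (1, T, feat_dim)
        h, _ = self.gru(feats)
        return torch.sigmoid(self.head(h)).squeeze(-1)        # (1, T)

def diversity_reward(feats, picks):
    """Mean pairwise dissimilarity among selected frames (higher = more diverse)."""
    sel = nn.functional.normalize(feats[0, picks], dim=-1)    # (K, D)
    sim = sel @ sel.t()
    k = sel.shape[0]
    return 1.0 - (sim.sum() - k) / (k * (k - 1) + 1e-8)

def representativeness_reward(feats, picks):
    """How close every frame is to its nearest selected frame (higher = better coverage)."""
    dists = torch.cdist(feats[0], feats[0, picks])            # (T, K)
    return torch.exp(-dists.min(dim=1).values.mean())

# Hypothetical usage with 200 frames of 512-d features.
feats = torch.rand(1, 200, 512)
probs = FrameScorer()(feats)
picks = torch.nonzero(torch.bernoulli(probs)[0]).squeeze(-1)  # sampled summary frames
reward = diversity_reward(feats, picks) + representativeness_reward(feats, picks)
```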
With the rapid development of artificial intelligence, video target tracking is widely used in intelligent video surveillance, intelligent transportation, intelligent human-computer interaction, and intelligent medical diagnosis. Deep learning has achieved remarkable results in computer vision: it has broken through many problems that are difficult to solve with traditional algorithms, improved the computer's level of understanding of images and videos, and promoted progress in related computer vision technologies. This paper combines a deep learning algorithm with a target tracking algorithm and carries out experiments on basketball motion detection video, in the hope that the experimental results will be helpful for target tracking in basketball motion detection video.
To exploit high temporal correlations among video frames of the same scene, the current frame is predicted from already-encoded reference frames using block-based motion estimation and compensation. While this approach efficiently captures the translational motion of moving objects, it handles other types of affine motion and object occlusion/de-occlusion poorly. Recently, deep learning has been used to model the high-level structure of human pose in specific actions from short videos and then generate virtual future frames by predicting the pose with a generative adversarial network (GAN). Modelling the high-level structure of human pose can therefore exploit semantic correlation by predicting human actions and determining their trajectories. Video surveillance applications stand to benefit, as stored "big" surveillance data can be compressed by estimating human pose trajectories and generating future frames through semantic correlation. This paper explores a new way of video coding that models human pose from already-encoded frames and uses the generated frame at the current time as an additional forward-referencing frame. The proposed approach is expected to overcome the limitations of traditional backward-referencing frames by predicting blocks containing moving objects with lower residuals. Our experimental results show that the proposed approach achieves on average up to 2.83 dB PSNR gain and 25.93% bitrate savings for high-motion video sequences compared to standard video coding.
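The pose-prediction GAN itself is not reproduced here; the sketch below only illustrates, under stated assumptions, the reference-selection step the abstract implies: for each block of the current frame, keep the residual against whichever of the backward-referencing prediction or the GAN-generated forward-referencing prediction predicts it better, using the sum of absolute differences as the block cost.

```python
import numpy as np

def best_reference_residuals(current, backward_pred, forward_pred, block=16):
    """For each block, keep the residual against the better of the two references.

    current, backward_pred, forward_pred: HxW luma arrays (float).
    Returns the residual frame and a per-block map of which reference was chosen.
    """
    h, w = current.shape
    residual = np.empty_like(current)
    choice = np.zeros((h // block, w // block), dtype=np.uint8)  # 0=backward, 1=forward
    for by in range(0, h, block):
        for bx in range(0, w, block):
            cur = current[by:by + block, bx:bx + block]
            r_back = cur - backward_pred[by:by + block, bx:bx + block]
            r_fwd = cur - forward_pred[by:by + block, bx:bx + block]
            use_fwd = np.abs(r_fwd).sum() < np.abs(r_back).sum()  # smaller SAD wins
            residual[by:by + block, bx:bx + block] = r_fwd if use_fwd else r_back
            choice[by // block, bx // block] = int(use_fwd)
    return residual, choice

# Hypothetical usage on random 64x64 frames.
cur = np.random.rand(64, 64)
res, choice = best_reference_residuals(cur, np.random.rand(64, 64), np.random.rand(64, 64))
```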
Advanced video compression is required due to the rise of online video content. A strong compression method can help convey video data effectively over constrained bandwidth. During the COVID-19 pandemic in particular, increased internet usage for video conferencing, online gaming, and education led to decreased video quality from Netflix, YouTube, and other streaming services in Europe and other regions. Standard video compression algorithms represent video as a succession of reference frames followed by residual frames, and such approaches have limited flexibility. The introduction and recent advances of deep learning have the potential to overcome these problems. This study provides a deep-learning-based video compression model that meets or exceeds the current H.264 standard.
With the rapid development of multimedia and short video, there is growing concern for video copyright protection. Prior work has proposed adding copyright or fingerprint information to a video to trace its source when it is stolen and thereby protect its copyright. This paper proposes a video watermarking method based on a deep neural network and curriculum learning for watermarking sliced videos. The first frame of each segmented video is perturbed by an encoder network; the perturbation is invisible yet can be recognized by the decoder network. Our model is trained and tested on an online educational video dataset consisting of 2000 different video clips. Experimental results show that our method can successfully discriminate most watermarked from non-watermarked videos with low visual disturbance, even under a relatively high video compression rate (H.264 compression with CRF 32).
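The paper's encoder/decoder design and curriculum schedule are not specified here. As a hypothetical minimal sketch of the basic structure, the code below has an encoder network add a small, visually imperceptible perturbation carrying a bit vector to a frame, and a decoder network attempt to recover the bits from the (possibly compressed) frame; the bit count, perturbation scale, and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class WatermarkEncoder(nn.Module):
    """Embeds an n-bit message into a frame as a small residual perturbation."""
    def __init__(self, n_bits: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + n_bits, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, frame, bits):                      # frame: (B,3,H,W), bits: (B,n)
        b, n = bits.shape
        bit_planes = bits.view(b, n, 1, 1).expand(-1, -1, *frame.shape[2:])
        perturbation = self.net(torch.cat([frame, bit_planes], dim=1))
        return frame + 0.01 * perturbation               # keep the change visually small

class WatermarkDecoder(nn.Module):
    """Recovers the embedded bits from a (possibly compressed) frame."""
    def __init__(self, n_bits: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_bits),
        )

    def forward(self, frame):
        return torch.sigmoid(self.net(frame))            # per-bit probabilities

# Hypothetical round trip on a random 128x128 frame.
enc, dec = WatermarkEncoder(), WatermarkDecoder()
frame = torch.rand(1, 3, 128, 128)
bits = torch.randint(0, 2, (1, 32)).float()
marked = enc(frame, bits)
recovered = dec(marked)                                  # train with BCE against `bits`
```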
In this work, we propose a novel deep learning approach for ultra-low-bitrate video compression for video conferencing applications. To address the shortcomings of current video compression paradigms when the available bandwidth is extremely limited, we adopt a model-based approach that employs deep neural networks to encode motion information as keypoint displacements and reconstruct the video signal at the decoder side. The overall system is trained end-to-end, minimizing a reconstruction error on the encoder output. Objective and subjective quality evaluation experiments demonstrate that the proposed approach provides an average bitrate reduction of more than 60% for the same visual quality compared to HEVC.
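The specific networks are not reproduced here; the sketch below only illustrates, under stated assumptions, the data flow the abstract describes: a keypoint detector summarizes each frame as a handful of 2-D keypoints, only the keypoint displacements relative to a previously transmitted source frame are sent, and a generator reconstructs the frame from the source image plus the received keypoints. Both modules are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class KeypointDetector(nn.Module):
    """Summarizes a frame (B,3,H,W) as K 2-D keypoints in [-1, 1]."""
    def __init__(self, k: int = 10):
        super().__init__()
        self.k = k
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 2 * k), nn.Tanh(),
        )

    def forward(self, frame):
        return self.backbone(frame).view(-1, self.k, 2)

class FrameGenerator(nn.Module):
    """Placeholder generator: reconstructs the current frame from the source frame
    and the current keypoints (a real system would warp features instead)."""
    def __init__(self, k: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 2 * k, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, source_frame, keypoints):
        b = source_frame.shape[0]
        kp_map = keypoints.view(b, -1, 1, 1).expand(-1, -1, *source_frame.shape[2:])
        return self.net(torch.cat([source_frame, kp_map], dim=1))

# Hypothetical usage: the source frame is transmitted once; afterwards only
# keypoint displacements (a few dozen floats per frame) need to be sent.
det, gen = KeypointDetector(), FrameGenerator()
src = torch.rand(1, 3, 64, 64)
src_kp = det(src)
cur = torch.rand(1, 3, 64, 64)
displacement = det(cur) - src_kp            # this is what the encoder transmits
reconstructed = gen(src, src_kp + displacement)
```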
The requirements of much larger file sizes, different storage formats, and immersive viewing conditions pose significant challenges to the goal of compressing VR content. At the same time, the great potential of deep learning to advance the video compression problem has driven a significant research effort. Because of the high bandwidth requirements of VR, there has also been significant interest in space-variant, foveated compression protocols. We have integrated these techniques to create an end-to-end deep learning video compression framework. A feature of our new compression model is that it dispenses with the need for expensive search-based motion prediction by using displaced frame differences. We also implement foveation in our learning-based approach by introducing a Foveation Generator Unit (FGU) that generates foveation masks which direct the allocation of bits, significantly increasing compression efficiency while retaining an impression of little to no additional visual loss given an appropriate viewing geometry. Our experimental results reveal that our new compression model, which we call the Foveated MOtionless VIdeo Codec (Foveated MOVI-Codec), is able to efficiently compress videos without computing motion, while outperforming foveated versions of both H.264 and H.265 on the widely used UVG dataset and on the HEVC Standard Class B Test Sequences.
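The FGU architecture itself is not given here; the sketch below only illustrates, as an assumption, how a foveation mask might direct bit allocation: a Gaussian falloff around the gaze point produces a per-pixel importance map, so proportionally more of a fixed bit budget goes to the fovea and less to the periphery. The falloff width and budget are illustrative.

```python
import numpy as np

def foveation_mask(height, width, gaze_xy, sigma=0.15):
    """Gaussian importance mask peaking at the gaze point (values in (0, 1])."""
    ys, xs = np.mgrid[0:height, 0:width]
    gx, gy = gaze_xy[0] * width, gaze_xy[1] * height
    d2 = ((xs - gx) / width) ** 2 + ((ys - gy) / height) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def allocate_bits(importance, total_bits):
    """Distribute a fixed bit budget per pixel proportionally to the foveation mask."""
    return total_bits * importance / importance.sum()

# Hypothetical usage: gaze at the center of a 1080p frame.
mask = foveation_mask(1080, 1920, gaze_xy=(0.5, 0.5))
bits_per_pixel = allocate_bits(mask, total_bits=2_000_000)
```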