Biblio

Filters: Keyword is deep video
2023-06-23
Choi, Hankaram, Bae, Yongchul.  2022.  Prediction of encoding bitrate for each CRF value using video features and deep learning. 2022 Joint 12th International Conference on Soft Computing and Intelligent Systems and 23rd International Symposium on Advanced Intelligent Systems (SCIS&ISIS). :1–2.

In this paper, we quantify elements that represent video features and propose predicting the bitrate of compressed video using deep learning. In particular, we use deep learning to overcome the disadvantage that the bitrate of a video compressed with a Constant Rate Factor (CRF) cannot be predicted. We identify video features that are related to the bitrate when the video is compressed, and we confirm that this relationship can be found through various deep learning techniques.

Wang, Xuezhong.  2022.  Research on Video Surveillance Violence Detection Technology Based on Deep Convolution Network. 2022 International Conference on Information System, Computing and Educational Technology (ICISCET). :347–350.

In recent years, security monitoring equipment has been deployed widely across the country to promote the construction of safe cities. Using computer vision to analyze violence in surveillance video effectively and intelligently is very important for maintaining social stability and protecting people's lives and property. Video surveillance systems are widely used because they are intuitive and convenient. However, existing video monitoring systems have relatively limited functionality, generally offering only video viewing, query, and playback. In addition, researchers have paid less attention to the complex abnormal behavior of violence, and related research often ignores the differences between violent behaviors in different scenes. At present, there are two main problems in video abnormal behavior event detection: video data of abnormal behavior is scarce, and the definition of abnormal behavior in different scenes cannot be clearly distinguished. The main existing methods first model normal behavior events and then define videos that do not conform to the normal model as abnormal; among these, methods that learn spatio-temporal video feature representations with deep learning show good prospects. Given the massive volume of surveillance video, it is necessary to use deep learning to identify violent behaviors, so that the machine learns to recognize human actions and raises alarms for violent behavior instead of relying on manual monitoring of camera images. The network is trained mainly on video datasets.

Nithesh, K, Tabassum, Nikhath, Geetha, D. D., Kumari, R D Anitha.  2022.  Anomaly Detection in Surveillance Videos Using Deep Learning. 2022 International Conference on Knowledge Engineering and Communication Systems (ICKES). :1–6.

Deep learning is one of the approaches to public safety and tracking that has attracted the most interest in recent years. Current public safety methods exist for counting and detecting persons, but many issues, such as aberrant events occurring in public spaces, are seldom detected and reported so that an automated alarm can be raised. Our proposed method detects anomalies (deviations from normal events) in video surveillance footage using deep learning and raises an alarm if an anomaly is found. The proposed model is trained to detect anomalies and is then applied to the surveillance video recordings used to monitor public safety. The video is assessed frame by frame to detect anomalies, and if there is a match, an alarm is raised.

Sun, Haoran, Zhu, Xiaolong, Zhou, Conghua.  2022.  Deep Reinforcement Learning for Video Summarization with Semantic Reward. 2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C). :754–755.

Video summarization aims to improve the efficiency of large-scale video browsing by producing concise summaries. It has been popular in many scenarios such as video surveillance, video review and data annotation. Traditional video summarization techniques focus on filtering in the image feature dimension or the image semantics dimension. However, such techniques can lose a large amount of potentially useful information, especially for videos with rich text semantics such as interviews and teaching videos, because only the information relevant to the image dimension is retained. To solve this problem, this paper treats video summarization as a continuous multi-dimensional decision-making process. Specifically, the summarization model predicts a probability for each frame and its corresponding text, and we then design reward methods for each of them. Finally, comprehensive summaries in two dimensions, i.e. images and semantics, are generated. This approach is unsupervised and does not rely on labels or user interaction, and it also decouples the semantic and image summarization models to provide more usable interfaces for subsequent engineering use.

ISSN: 2693-9371

Xia, Tieniu.  2022.  Embedded Basketball Motion Detection Video Target Tracking Algorithm Based on Deep Learning. 2022 International Conference on Artificial Intelligence and Autonomous Robot Systems (AIARS). :143–146.

With the rapid development of artificial intelligence, video target tracking is widely used in the fields of intelligent video surveillance, intelligent transportation, intelligent human-computer interaction and intelligent medical diagnosis. Deep learning has achieved remarkable results in the field of computer vision. The development of deep learning not only breaks through many problems that are difficult to be solved by traditional algorithms, improves the computer's cognitive level of images and videos, but also promotes the progress of related technologies in the field of computer vision. This paper combines the deep learning algorithm and target tracking algorithm to carry out relevant experiments on basketball motion detection video, hoping that the experimental results can be helpful to basketball motion detection video target tracking.

Rajin, S M Ataul Karim, Murshed, Manzur, Paul, Manoranjan, Teng, Shyh Wei, Ma, Jiangang.  2022.  Human pose based video compression via forward-referencing using deep learning. 2022 IEEE International Conference on Visual Communications and Image Processing (VCIP). :1–5.

To exploit high temporal correlations in video frames of the same scene, the current frame is predicted from the already-encoded reference frames using block-based motion estimation and compensation techniques. While this approach can efficiently exploit the translation motion of the moving objects, it is susceptible to other types of affine motion and object occlusion/deocclusion. Recently, deep learning has been used to model the high-level structure of human pose in specific actions from short videos and then generate virtual frames in future time by predicting the pose using a generative adversarial network (GAN). Therefore, modelling the high-level structure of human pose is able to exploit semantic correlation by predicting human actions and determining its trajectory. Video surveillance applications will benefit as stored “big” surveillance data can be compressed by estimating human pose trajectories and generating future frames through semantic correlation. This paper explores a new way of video coding by modelling human pose from the already-encoded frames and using the generated frame at the current time as an additional forward-referencing frame. It is expected that the proposed approach can overcome the limitations of the traditional backward-referencing frames by predicting the blocks containing the moving objects with lower residuals. Our experimental results show that the proposed approach can achieve on average up to 2.83 dB PSNR gain and 25.93% bitrate savings for high motion video sequences compared to standard video coding.
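
As a reading aid for the PSNR gain quoted in this abstract, the following is a minimal sketch of the conventional PSNR definition, 10·log10(MAX² / MSE), for 8-bit frames. It is not the authors' evaluation code; the function name `psnr` and the flat-list frame representation are assumptions made for illustration.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two frames.

    Frames are given as flat sequences of pixel values of equal length
    (an illustrative simplification; real codecs work on 2-D planes).
    """
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A "gain" in dB is then simply the difference between the PSNR of two reconstructions of the same reference frame at a comparable bitrate.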

ISSN: 2642-9357

P, Dayananda, Subramanian, Siddharth, Suresh, Vijayalakshmi, Shivalli, Rishab, Sinha, Shrinkhla.  2022.  Video Compression using Deep Neural Networks. 2022 Fourth International Conference on Cognitive Computing and Information Processing (CCIP). :1–5.

Advanced video compression is required due to the rise of online video content. A strong compression method can help convey video data effectively over a constrained bandwidth. We observed how more internet usage for video conferences, online gaming, and education led to decreased video quality from Netflix, YouTube, and other streaming services in Europe and other regions, particularly during the COVID-19 epidemic. They are represented in standard video compression algorithms as a succession of reference frames after residual frames, and these approaches are limited in their application. Deep learning's introduction and current advancements have the potential to overcome such problems. This study provides a deep learning-based video compression model that meets or exceeds current H.264 standards.

Ke, Zehui, Huang, Hailiang, Liang, Yingwei, Ding, Yi, Cheng, Xin, Wu, Qingyao.  2022.  Robust Video watermarking based on deep neural network and curriculum learning. 2022 IEEE International Conference on e-Business Engineering (ICEBE). :80–85.

With the rapid development of multimedia and short video, there is a growing concern for video copyright protection. Some work has been proposed to add copyright or fingerprint information to the video to trace its source when it is stolen and to protect video copyright. This paper proposes a video watermarking method based on a deep neural network and curriculum learning for watermarking of sliced videos. The first frame of the segmented video is perturbed by an encoder network; the perturbation is invisible but can be distinguished by the decoder network. Our model is trained and tested on an online educational video dataset consisting of 2000 different video clips. Experimental results show that our method can successfully discriminate most watermarked and non-watermarked videos with low visual disturbance, which can be achieved even under a relatively high video compression rate (H.264 video compression with CRF 32).

Konuko, Goluck, Valenzise, Giuseppe, Lathuilière, Stéphane.  2022.  Ultra-Low Bitrate Video Conferencing Using Deep Image Animation. 2022 IEEE International Conference on Image Processing (ICIP). :3515–3520.

In this work we propose a novel deep learning approach for ultra-low bitrate video compression for video conferencing applications. To address the shortcomings of current video compression paradigms when the available bandwidth is extremely limited, we adopt a model-based approach that employs deep neural networks to encode motion information as keypoint displacement and reconstruct the video signal at the decoder side. The overall system is trained in an end-to-end fashion minimizing a reconstruction error on the encoder output. Objective and subjective quality evaluation experiments demonstrate that the proposed approach provides an average bitrate reduction for the same visual quality of more than 60% compared to HEVC.

ISSN: 2381-8549

Chen, Meixu, Webb, Richard, Bovik, Alan C..  2022.  Foveated MOVI-Codec: Foveation-based Deep Video Compression without Motion. 2022 IEEE 14th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP). :1–5.

The requirements of much larger file sizes, different storage formats, and immersive viewing conditions pose significant challenges to the goals of compressing VR content. At the same time, the great potential of deep learning to advance progress on the video compression problem has driven a significant research effort. Because of the high bandwidth requirements of VR, there has also been significant interest in the use of space-variant, foveated compression protocols. We have integrated these techniques to create an end-to-end deep learning video compression framework. A feature of our new compression model is that it dispenses with the need for expensive search-based motion prediction computations by using displaced frame differences. We also implement foveation in our learning-based approach by introducing a Foveation Generator Unit (FGU) that generates foveation masks which direct the allocation of bits, significantly increasing compression efficiency while making it possible to retain an impression of little to no additional visual loss given an appropriate viewing geometry. Our experimental results reveal that our new compression model, which we call the Foveated MOtionless VIdeo Codec (Foveated MOVI-Codec), is able to efficiently compress videos without computing motion, while outperforming foveated versions of both H.264 and H.265 on the widely used UVG dataset and on the HEVC Standard Class B Test Sequences.

2022-04-25
Wang, Chenxu, Yao, Yanxin, Yao, Han.  2021.  Video anomaly detection method based on future frame prediction and attention mechanism. 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC). :0405–0407.
With the development of deep learning technology, a large number of new technologies for video anomaly detection have emerged. This paper proposes a video anomaly detection algorithm based on future frame prediction using a Generative Adversarial Network (GAN) and an attention mechanism. For the generation model, a U-Net model is modified by adding an attention module. For the discrimination model, a Markov GAN discrimination model with a self-attention mechanism is proposed, which can affect the generator and improve the generation quality of the future video frame. Experiments show that the new video anomaly detection algorithm improves the detection performance, and that the attention module plays an important role in the overall detection performance. It is found that the more attention modules are applied and the deeper the level at which they are applied, the better the detection effect, which also verifies the rationality of the model structure used in this project.
Wu, Fubao, Gao, Lixin, Zhou, Tian, Wang, Xi.  2021.  MOTrack: Real-time Configuration Adaptation for Video Analytics through Movement Tracking. 2021 IEEE Global Communications Conference (GLOBECOM). :01–06.
Video analytics has many applications in traffic control, security monitoring, action/event analysis, etc. With the adoption of deep neural networks, the accuracy of video analytics in video streams has been greatly improved. However, deep neural networks for performing video analytics are compute-intensive. In order to reduce processing time, many systems switch to the lower frame rate or resolution. State-of-the-art switching approaches adjust configurations by profiling video clips on a large configuration space. Multiple configurations are tested periodically and the cheapest one with a desired accuracy is adopted. In this paper, we propose a method that adapts the configuration by analyzing past video analytics results instead of profiling candidate configurations. Our method adopts a lower/higher resolution or frame rate when objects move slow/fast. We train a model that automatically selects the best configuration. We evaluate our method with two real-world video analytics applications: traffic tracking and pose estimation. Compared to the periodic profiling method, our method achieves 3%-12% higher accuracy with the same resource cost and 8-17x faster with comparable accuracy.
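
The idea described in this abstract, choosing a cheaper configuration when tracked objects move slowly and a richer one when they move fast, can be illustrated with a small sketch. Note the substitution: the paper trains a model to select the configuration, whereas this sketch uses a hand-written threshold rule. The configuration list, the function names `mean_speed` and `choose_config`, and the thresholds are all hypothetical.

```python
import math

# Hypothetical configurations, cheapest first: (width, height, frames per second).
CONFIGS = [(640, 360, 5), (1280, 720, 15), (1920, 1080, 30)]


def mean_speed(prev_centers, curr_centers, dt):
    """Average speed (pixels/second) of objects present in both results.

    prev_centers and curr_centers map an object ID to its (x, y) box
    centre taken from past analytics results; dt is the elapsed seconds.
    """
    shared = prev_centers.keys() & curr_centers.keys()
    if not shared or dt <= 0:
        return 0.0
    total = sum(math.dist(prev_centers[k], curr_centers[k]) for k in shared)
    return total / len(shared) / dt


def choose_config(speed, slow=20.0, fast=100.0):
    """Threshold rule (an assumption, not the paper's trained selector)."""
    if speed < slow:
        return CONFIGS[0]
    if speed < fast:
        return CONFIGS[1]
    return CONFIGS[2]
```

Because the speed estimate is derived from results the pipeline has already produced, no candidate configurations need to be profiled, which is the cost the abstract says the method avoids.
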
Nguyen, Huy Hoang, Ta, Thi Nhung, Nguyen, Ngoc Cuong, Bui, Van Truong, Pham, Hung Manh, Nguyen, Duc Minh.  2021.  YOLO Based Real-Time Human Detection for Smart Video Surveillance at the Edge. 2020 IEEE Eighth International Conference on Communications and Electronics (ICCE). :439–444.
Recently, smart video surveillance at the edge has become a trend in developing security applications since edge computing enables more image processing tasks to be implemented on the decentralised network nodes of the surveillance system. As a result, many security applications such as behaviour recognition and prediction, employee safety, perimeter intrusion detection and vandalism deterrence can minimise their latency or even process in real-time when the camera network system is extended to a larger degree. Technically, human detection is a key step in the implementation of these applications. With the advantage of high detection rates, deep learning methods have been widely employed on edge devices in order to detect human objects. However, due to their high computation costs, it is challenging to apply these methods on resource-limited edge devices for real-time applications. Inspired by You Only Look Once (YOLO), residual learning and Spatial Pyramid Pooling (SPP), a novel form of real-time human detection is presented in this paper. Our approach focuses on designing a network structure so that the developed model can achieve a good trade-off between accuracy and processing time. Experimental results show that our trained model can process 2 FPS on Raspberry PI 3B and detect humans with accuracies of 95.05 % and 96.81 % when tested respectively on INRIA and PENN FUDAN datasets. On the human COCO test dataset, our trained model outperforms the Tiny-YOLO versions. Additionally, compared to the SSD-based L-CNN method, our algorithm achieves better accuracy.
El Rai, Marwa, Al-Saad, Mina, Darweesh, Muna, Al Mansoori, Saeed, Al Ahmad, Hussain, Mansoor, Wathiq.  2021.  Moving Objects Segmentation in Infrared Scene Videos. 2021 4th International Conference on Signal Processing and Information Security (ICSPIS). :17–20.
Nowadays, developing an intelligent system for segmenting moving objects from the background is an essential task for video surveillance applications. Recently, a deep learning segmentation algorithm called FgSegNET_S, composed of an encoder CNN, a Feature Pooling Module and a decoder CNN, has been proposed. It can be trained using few training examples. FgSegNET_S relies only on spatial information, whereas temporal information is fundamental for distinguishing whether an object is moving or not. In this paper, an improved version known as T_FgSegNET_S is proposed by using the subtracted images from the initial background as input. The proposed approach is trained and evaluated using two publicly available infrared datasets: remote scene infrared videos captured by medium-wave infrared (MWIR) sensors and the Grayscale Thermal Foreground Detection (GTFD) dataset. The performance of the network is evaluated using precision, recall, and F-measure metrics. The experiments show improved results, especially when compared to other state-of-the-art methods.
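
For reference, the precision, recall, and F-measure metrics named in this abstract can be computed from a predicted foreground mask and a ground-truth mask as sketched below. This uses the standard definitions and is not taken from the paper's code; the function name `segmentation_scores` and the flat 0/1 mask representation are assumptions.

```python
def segmentation_scores(predicted, truth):
    """Return (precision, recall, f_measure) for binary foreground masks.

    Both masks are flat sequences of 0/1 values of equal length,
    where 1 marks a pixel labelled as moving foreground.
    """
    if len(predicted) != len(truth):
        raise ValueError("masks must be the same size")
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)      # true positives
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)  # false positives
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```
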
Sunil, Ajeet, Sheth, Manav Hiren, E, Shreyas, Mohana.  2021.  Usual and Unusual Human Activity Recognition in Video using Deep Learning and Artificial Intelligence for Security Applications. 2021 Fourth International Conference on Electrical, Computer and Communication Technologies (ICECCT). :1–6.
The main objective of Human Activity Recognition (HAR) is to detect various activities in video frames. Video surveillance is an important application for various security reasons; therefore, it is essential to classify activities as usual and unusual. This paper implements a deep learning model that can classify and localize the detected activities with a bounding box using a Single Shot Detector (SSD) algorithm, which is explicitly trained to detect usual and unusual activities for security surveillance applications. Further, this model can be deployed in public places to improve the safety and security of individuals. The SSD model is designed and trained using a transfer learning approach. Performance evaluation metrics are visualised using the TensorBoard tool. This paper further discusses the challenges in real-time implementation.
Khasanova, Aliia, Makhmutova, Alisa, Anikin, Igor.  2021.  Image Denoising for Video Surveillance Cameras Based on Deep Learning Techniques. 2021 International Conference on Industrial Engineering, Applications and Manufacturing (ICIEAM). :713–718.
Nowadays, video surveillance cameras are widely used in many smart city applications for ensuring road safety. We can use video data from them to solve such tasks as traffic management, driving control, environmental monitoring, etc. Most of these applications are based on object recognition and tracking algorithms. However, the video image quality does not always meet the requirements of such algorithms due to the influence of different external factors. A variety of adverse weather conditions produce noise in the images, which often makes it difficult to detect objects correctly. Lately, deep learning methods have shown good results in image processing, including denoising tasks. This work is devoted to the study of using these methods for image quality enhancement in difficult weather conditions such as snow, rain, and fog. Different deep learning techniques were evaluated in terms of their impact on the quality of object detection/recognition. Finally, a system for automatic image denoising was developed.
Jaiswal, Gaurav.  2021.  Hybrid Recurrent Deep Learning Model for DeepFake Video Detection. 2021 IEEE 8th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON). :1–5.
Nowadays, deepfake videos are a concern for social ethics, privacy and security. Deepfake videos are synthetically generated videos produced by modifying facial and audio features to impose one person’s facial data and audio onto other videos. These videos can be used for defamation and fraud. To counter these types of manipulations and threats, detection of deepfake videos is needed. This paper proposes multilayer hybrid recurrent deep learning models for deepfake video detection. The proposed models exploit noise-based temporal facial convolutional features and the temporal learning of hybrid recurrent deep learning models. Experimental results demonstrate the performance of these models over stacked recurrent deep learning models.
Ren, Jing, Xia, Feng, Liu, Yemeng, Lee, Ivan.  2021.  Deep Video Anomaly Detection: Opportunities and Challenges. 2021 International Conference on Data Mining Workshops (ICDMW). :959–966.
Anomaly detection is a popular and vital task in various research contexts and has been studied for several decades. To ensure the safety of people’s lives and assets, video surveillance has been widely deployed in various public spaces, such as crossroads, elevators, hospitals, and banks, and even in private homes. Deep learning has shown its capacity in a number of domains, ranging from acoustics and images to natural language processing. However, it is non-trivial to devise intelligent video anomaly detection systems because anomalies differ significantly from each other in different application scenarios. There would be numerous advantages if such intelligent systems could be realised in our daily lives, such as saving human resources to a large degree, reducing the financial burden on the government, and identifying anomalous behaviours in a timely and accurate manner. Recently, many studies on extending deep learning models for solving anomaly detection problems have emerged, resulting in beneficial advances in deep video anomaly detection techniques. In this paper, we present a comprehensive review of deep learning-based methods to detect video anomalies from a new perspective. Specifically, we summarise the opportunities and challenges of deep learning models on video anomaly detection tasks. We put forth several potential future research directions for intelligent video anomaly detection systems in various application domains. Moreover, we summarise the characteristics and technical problems of current deep learning methods for video anomaly detection.
Pawar, Karishma, Attar, Vahida.  2021.  Application of Deep Learning for Crowd Anomaly Detection from Surveillance Videos. 2021 11th International Conference on Cloud Computing, Data Science Engineering (Confluence). :506–511.
Due to the immense need to implement security measures and control ongoing activities, intelligent video analytics is regarded as one of the outstanding and challenging research domains in Computer Vision. Assigning a video operator to manually monitor surveillance videos 24×7 to identify interesting and anomalous events such as robberies, wrong U-turns, violence, and accidents is cumbersome and error-prone. Therefore, to address the issue of continuously monitoring surveillance videos and detecting anomalies in them, a deep learning approach based on a pipelined sequence of a convolutional autoencoder and a sequence-to-sequence long short-term memory autoencoder has been proposed. Specifically, an unsupervised learning approach encompassing the one-class classification paradigm has been proposed for detecting anomalies in videos. The effectiveness of the proposed model is demonstrated on a benchmark anomaly detection dataset, and significant results in terms of equal error rate, area under the curve, and detection time have been achieved.
2021-01-11
Shin, H. C., Chang, J., Na, K..  2020.  Anomaly Detection Algorithm Based on Global Object Map for Video Surveillance System. 2020 20th International Conference on Control, Automation and Systems (ICCAS). :793–795.

Recently, smart video security systems have become active. Existing video security systems mainly detect local abnormalities at a single camera. In this case, it is difficult to capture both the characteristics of each local region and the situation across the entire watched area. In this paper, we develop an object map for the entire surveillance area using a combination of surveillance cameras, and an algorithm that detects anomalies by learning normal situations. The surveillance camera in each area detects and tracks people and cars, creates a local object map, and transmits it to the server. The surveillance server combines the local maps to generate a global map of the entire area. Probability maps are automatically calculated from the global maps, and normal/abnormal decisions are made using data trained on normal situations. There are three reporting statuses: normal, caution, and warning. For the caution report, normal detection reaches 99.99% and abnormal detection 86.6%.
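
The pipeline in this abstract (local object maps merged into a global map, a probability map learned from normal situations, and a three-level report) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' algorithm: the grid representation, the per-cell occupancy frequency used as the "probability map", the rarest-cell decision rule, the thresholds, and all function names are hypothetical.

```python
def merge_local_maps(local_maps, offsets, height, width):
    """Place each camera's local object-count grid into one global grid.

    offsets[i] is the (row, col) of camera i's top-left cell in the global grid.
    """
    global_map = [[0] * width for _ in range(height)]
    for grid, (row0, col0) in zip(local_maps, offsets):
        for r, row in enumerate(grid):
            for c, count in enumerate(row):
                global_map[row0 + r][col0 + c] += count
    return global_map


def occupancy_probability(normal_maps):
    """Per-cell fraction of normal-time global maps in which the cell is occupied."""
    n = len(normal_maps)
    height, width = len(normal_maps[0]), len(normal_maps[0][0])
    return [
        [sum(1 for m in normal_maps if m[r][c] > 0) / n for c in range(width)]
        for r in range(height)
    ]


def classify(global_map, prob_map, caution=0.05, warning=0.01):
    """Report 'normal', 'caution', or 'warning' from the rarest occupied cell."""
    occupied = [
        prob_map[r][c]
        for r, row in enumerate(global_map)
        for c, count in enumerate(row)
        if count > 0
    ]
    if not occupied:
        return "normal"
    rarest = min(occupied)
    if rarest < warning:
        return "warning"
    if rarest < caution:
        return "caution"
    return "normal"
```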

Mihanpour, A., Rashti, M. J., Alavi, S. E..  2020.  Human Action Recognition in Video Using DB-LSTM and ResNet. 2020 6th International Conference on Web Research (ICWR). :133–138.

Human action recognition in video is one of the most widely applied topics in the field of image and video processing, with many applications in surveillance (security, sports, etc.), activity detection, video-content-based monitoring, man-machine interaction, and health/disability care. Action recognition is a complex process that faces several challenges such as occlusion, camera movement, viewpoint move, background clutter, and brightness variation. In this study, we propose a novel human action recognition method using convolutional neural networks (CNN) and deep bidirectional LSTM (DB-LSTM) networks, using only raw video frames. First, deep features are extracted from video frames using a pre-trained CNN architecture called ResNet152. The sequential information of the frames is then learned using the DB-LSTM network, where multiple layers are stacked together in both forward and backward passes of DB-LSTM, to increase depth. The evaluation results of the proposed method using PyTorch, compared to the state-of-the-art methods, show a considerable increase in the efficiency of action recognition on the UCF 101 dataset, reaching 95% recognition accuracy. The choice of the CNN architecture, proper tuning of input parameters, and techniques such as data augmentation contribute to the accuracy boost in this study.

Khadka, A., Argyriou, V., Remagnino, P..  2020.  Accurate Deep Net Crowd Counting for Smart IoT Video acquisition devices. 2020 16th International Conference on Distributed Computing in Sensor Systems (DCOSS). :260–264.

A novel deep neural network is proposed, for accurate and robust crowd counting. Crowd counting is a complex task, as it strongly depends on the deployed camera characteristics and, above all, the scene perspective. Crowd counting is essential in security applications where Internet of Things (IoT) cameras are deployed to help with crowd management tasks. The complexity of a scene varies greatly, and a medium to large scale security system based on IoT cameras must cater for changes in perspective and how people appear from different vantage points. To address this, our deep architecture extracts multi-scale features with a pyramid contextual module to provide long-range contextual information and enlarge the receptive field. Experiments were run on three major crowd counting datasets, to test our proposed method. Results demonstrate our method supersedes the performance of state-of-the-art methods.

YE, X., JI, B., Chen, X., QIAN, D., Zhao, Z..  2020.  Probability Boltzmann Machine Network for Face Detection on Video. 2020 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). :138–147.

By the multi-layer nonlinear mapping and the semantic feature extraction of the deep learning, a deep learning network is proposed for video face detection to overcome the challenge of detecting faces rapidly and accurately in video with changeable background. Particularly, a pre-training procedure is used to initialize the network parameters to avoid falling into the local optimum, and the greedy layer-wise learning is introduced in the pre-training to avoid the training error transfer in layers. Key to the network is that the probability of neurons models the status of human brain neurons which is a continuous distribution from the most active to the least active and the hidden layer’s neuron number decreases layer-by-layer to reduce the redundant information of the input data. Moreover, the skin color detection is used to accelerate the detection speed by generating candidate regions. Experimental results show that, besides the faster detection speed and robustness against face rotation, the proposed method possesses lower false detection rate and lower missing detection rate than traditional algorithms.

Khudhair, A. B., Ghani, R. F..  2020.  IoT Based Smart Video Surveillance System Using Convolutional Neural Network. 2020 6th International Engineering Conference “Sustainable Technology and Development” (IEC). :163–168.

Video surveillance plays an important role in our times. It is a great help in reducing the crime rate, and it can also help to monitor the status of facilities. The performance of a video surveillance system is limited by human factors such as fatigue, time efficiency, and human resources. It would be beneficial for all if fully automatic video surveillance systems were employed to do the job. The automation of video surveillance systems is still unsatisfactory with respect to many problems such as the accuracy of the detector, bandwidth consumption, storage usage, etc. This paper mainly focuses on a video surveillance system using Convolutional Neural Networks (CNN), IoT and the cloud. The system contains multiple nodes; each node consists of a microprocessor (Raspberry Pi) and a camera, and the nodes communicate with each other using a client-server architecture. The nodes detect humans using a pretrained MobileNetv2-SSDLite model and the Common Objects in Context (COCO) dataset. The captured video is streamed to the main node, which is the only node that communicates with the cloud, so that it can be streamed to the cloud. The main node also sends an SMS notification to the security team to report the detection of humans. The security team can check the captured videos using a mobile application or web application. Running a deep learning object detection model requires a large amount of computational power, and the Raspberry Pi has limited performance; for that reason we used the MobileNetv2-SSDLite model.

Fomin, I., Burin, V., Bakhshiev, A..  2020.  Research on Neural Networks Integration for Object Classification in Video Analysis Systems. 2020 International Conference on Industrial Engineering, Applications and Manufacturing (ICIEAM). :1–5.

Object recognition with the help of outdoor video surveillance cameras is an important task in the context of ensuring the security at enterprises, public places and even private premises. There have long existed systems that allow detecting moving objects in the image sequence from a video surveillance system. Such a system is partially considered in this research. It detects moving objects using a background model, which has certain problems. Due to this some objects are missed or detected falsely. We propose to combine the moving objects detection results with the classification, using a deep neural network. This will allow determining whether a detected object belongs to a certain class, sorting out false detections, discarding the unnecessary ones (sometimes individual classes are unwanted), to divide detected people into the employees in the uniform and all others, etc. The authors perform a network training in the Keras developer-friendly environment that provides for quick building, changing and training of network architectures. The performance of the Keras integration into a video analysis system, using direct Python script execution techniques, is between 6 and 52 ms, while the precision is between 59.1% and 97.2% for different architectures. The integration, made by freezing a selected network architecture with weights, is selected after testing. After that, frozen architecture can be imported into video analysis using the TensorFlow interface for C++. The performance of such type of integration is between 3 and 49 ms. The precision is between 63.4% and 97.8% for different architectures.