Human Action Recognition in Video Using DB-LSTM and ResNet
| Field | Value |
| --- | --- |
| Title | Human Action Recognition in Video Using DB-LSTM and ResNet |
| Publication Type | Conference Paper |
| Year of Publication | 2020 |
| Authors | Mihanpour, A., Rashti, M. J., Alavi, S. E. |
| Conference Name | 2020 6th International Conference on Web Research (ICWR) |
| Date Published | April 2020 |
| Publisher | IEEE |
| ISBN Number | 978-1-7281-1051-6 |
| Keywords | action recognition, CNN architecture, convolutional neural nets, convolutional neural network, convolutional neural networks, DB-LSTM, DB-LSTM network, deep bidirectional LSTM networks, deep neural networks, deep video, feature extraction, human action recognition method, image motion analysis, image processing, image sequences, learning (artificial intelligence), man-machine interaction, Metrics, object detection, pubcrawl, PyTorch, recurrent neural nets, resilience, Resiliency, ResNet152, Scalability, video frames, video processing, video signal processing, video-content-based monitoring |
| Abstract | Human action recognition in video is one of the most widely applied topics in the field of image and video processing, with many applications in surveillance (security, sports, etc.), activity detection, video-content-based monitoring, man-machine interaction, and health/disability care. Action recognition is a complex process that faces several challenges, such as occlusion, camera movement, viewpoint change, background clutter, and brightness variation. In this study, we propose a novel human action recognition method using convolutional neural networks (CNN) and deep bidirectional LSTM (DB-LSTM) networks, using only raw video frames. First, deep features are extracted from video frames using a pre-trained CNN architecture called ResNet152. The sequential information of the frames is then learned using the DB-LSTM network, where multiple layers are stacked together in both the forward and backward passes of the DB-LSTM to increase depth. Evaluation of the proposed method, implemented in PyTorch, shows a considerable improvement over state-of-the-art methods on the UCF101 dataset, reaching 95% recognition accuracy. The choice of CNN architecture, proper tuning of input parameters, and techniques such as data augmentation contribute to the accuracy gain in this study. |
| URL | https://ieeexplore.ieee.org/document/9122304 |
| DOI | 10.1109/ICWR49608.2020.9122304 |
| Citation Key | mihanpour_human_2020 |
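
The abstract describes a two-stage pipeline: per-frame features from a pre-trained ResNet152, followed by a stacked (deep) bidirectional LSTM over the frame sequence. The sketch below is a minimal PyTorch illustration of that pipeline, not the authors' implementation; the hidden size, number of LSTM layers, the frozen backbone, and the last-time-step classifier head are all illustrative assumptions.

```python
# Minimal sketch of the abstract's pipeline: pre-trained ResNet152 as a
# frame-level feature extractor, then a deep bidirectional LSTM (DB-LSTM)
# over the frame sequence. Layer sizes and the classifier head are
# illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn
from torchvision import models


class ResNetDBLSTM(nn.Module):
    def __init__(self, num_classes=101, hidden_size=512, num_layers=2):
        super().__init__()
        # Pre-trained ResNet152 with its final FC layer removed leaves
        # 2048-d globally pooled features per frame; frozen here.
        resnet = models.resnet152(weights=models.ResNet152_Weights.DEFAULT)
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        for p in self.backbone.parameters():
            p.requires_grad = False
        # "Deep bidirectional" LSTM: multiple stacked layers, both directions.
        self.lstm = nn.LSTM(
            input_size=2048,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            bidirectional=True,
        )
        self.classifier = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, frames):
        # frames: (batch, time, 3, 224, 224) raw video frames
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1))  # (b*t, 2048, 1, 1)
        feats = feats.flatten(1).view(b, t, -1)      # (b, t, 2048)
        out, _ = self.lstm(feats)                    # (b, t, 2*hidden_size)
        return self.classifier(out[:, -1])           # logits at last step


# Usage sketch: a clip of 16 frames for a batch of 2 videos.
model = ResNetDBLSTM().eval()
clip = torch.randn(2, 16, 3, 224, 224)
with torch.no_grad():
    logits = model(clip)  # (2, 101) scores over UCF101 action classes
```

Reading the classification from the final time step is only one common choice; the abstract does not specify how the DB-LSTM outputs are aggregated, so temporal average pooling over all steps would be an equally plausible variant.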