Human Action Recognition in Video Using DB-LSTM and ResNet
| Field | Value |
| --- | --- |
| Title | Human Action Recognition in Video Using DB-LSTM and ResNet |
| Publication Type | Conference Paper |
| Year of Publication | 2020 |
| Authors | Mihanpour, A., Rashti, M. J., Alavi, S. E. |
| Conference Name | 2020 6th International Conference on Web Research (ICWR) |
| Date Published | April 2020 |
| Publisher | IEEE |
| ISBN Number | 978-1-7281-1051-6 |
| Keywords | action recognition, CNN architecture, convolutional neural nets, convolutional neural network, convolutional neural networks, DB-LSTM, DB-LSTM network, deep bidirectional LSTM networks, deep neural networks, deep video, feature extraction, human action recognition method, image motion analysis, image processing, image sequences, learning (artificial intelligence), man-machine interaction, Metrics, object detection, pubcrawl, PyTorch, recurrent neural nets, resilience, Resiliency, ResNet152, Scalability, video frames, video processing, video signal processing, video-content-based monitoring |
| Abstract | Human action recognition in video is one of the most widely applied topics in the field of image and video processing, with many applications in surveillance (security, sports, etc.), activity detection, video-content-based monitoring, man-machine interaction, and health/disability care. Action recognition is a complex process that faces several challenges, such as occlusion, camera movement, viewpoint change, background clutter, and brightness variation. In this study, we propose a novel human action recognition method using convolutional neural networks (CNN) and deep bidirectional LSTM (DB-LSTM) networks, using only raw video frames. First, deep features are extracted from video frames using a pre-trained CNN architecture called ResNet152. The sequential information of the frames is then learned using the DB-LSTM network, where multiple layers are stacked together in both the forward and backward passes of the DB-LSTM to increase depth. Evaluation of the proposed method, implemented in PyTorch, shows a considerable improvement over state-of-the-art methods on the UCF101 dataset, reaching 95% recognition accuracy. The choice of CNN architecture, proper tuning of input parameters, and techniques such as data augmentation contribute to the accuracy gain in this study. |
| URL | https://ieeexplore.ieee.org/document/9122304 |
| DOI | 10.1109/ICWR49608.2020.9122304 |
| Citation Key | mihanpour_human_2020 |
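
The abstract describes a two-stage pipeline: per-frame features from a pre-trained ResNet152, followed by a stacked (deep) bidirectional LSTM over the frame sequence. The sketch below is a minimal PyTorch illustration of that pipeline, not the authors' implementation; the hidden size, number of LSTM layers, the frozen backbone, and the last-time-step classifier head are all illustrative assumptions.

```python
# Minimal sketch of the abstract's pipeline: pre-trained ResNet152 as a
# frame-level feature extractor, then a deep bidirectional LSTM (DB-LSTM)
# over the frame sequence. Layer sizes and the classifier head are
# illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn
from torchvision import models


class ResNetDBLSTM(nn.Module):
    def __init__(self, num_classes=101, hidden_size=512, num_layers=2):
        super().__init__()
        # Pre-trained ResNet152 with its final FC layer removed leaves
        # 2048-d globally pooled features per frame; frozen here.
        resnet = models.resnet152(weights=models.ResNet152_Weights.DEFAULT)
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        for p in self.backbone.parameters():
            p.requires_grad = False
        # "Deep bidirectional" LSTM: multiple stacked layers, both directions.
        self.lstm = nn.LSTM(
            input_size=2048,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            bidirectional=True,
        )
        self.classifier = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, frames):
        # frames: (batch, time, 3, 224, 224) raw video frames
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1))  # (b*t, 2048, 1, 1)
        feats = feats.flatten(1).view(b, t, -1)      # (b, t, 2048)
        out, _ = self.lstm(feats)                    # (b, t, 2*hidden_size)
        return self.classifier(out[:, -1])           # logits at last step


# Usage sketch: a clip of 16 frames for a batch of 2 videos.
model = ResNetDBLSTM().eval()
clip = torch.randn(2, 16, 3, 224, 224)
with torch.no_grad():
    logits = model(clip)  # (2, 101) scores over UCF101 action classes
```

Reading the classification from the final time step is only one common choice; the abstract does not specify how the DB-LSTM outputs are aggregated, so temporal average pooling over all steps would be an equally plausible variant.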