Simple, Efficient and Effective Encodings of Local Deep Features for Video Action Recognition

Title: Simple, Efficient and Effective Encodings of Local Deep Features for Video Action Recognition
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Duta, Ionut C., Ionescu, Bogdan, Aizawa, Kiyoharu, Sebe, Nicu
Conference Name: Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4701-3
Keywords: action recognition, deep feature encoding, deep video, Metrics, pubcrawl, resilience, Resiliency, Scalability, video classification
Abstract

In an action recognition system, a decisive component is the feature encoding step, which builds the final representation that serves as input to a classifier. One shortcoming of existing encoding approaches is that they were designed around hand-crafted features and are not highly competitive at encoding the deep features that many practical scenarios now require. In this work, we propose two encoding methods designed specifically for local deep features; they take advantage of the nature of deep networks by focusing on capturing the highest feature responses of the convolutional maps. The proposed encodings aggregate the features extracted by a convolutional neural network over the entire video into a single representation. In terms of accuracy, our encodings outperform the current most widely used and powerful encoding approaches by a large margin, while being extremely efficient in computational cost. Evaluated on action recognition tasks, our pipeline obtains state-of-the-art results on three challenging datasets: HMDB51, UCF50 and UCF101.
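The abstract does not spell out the two proposed encodings. As a rough illustration of the general idea of keeping the highest feature response when aggregating local deep features over a whole video, the sketch below shows one generic scheme. It is an assumption made for illustration, not the authors' method: every local feature (for example, one spatial position of a convolutional map in one frame, with features from all frames pooled together) is assigned to its nearest codebook centroid, and within each cluster only the element-wise maximum is kept. The codebook `centroids` (e.g. learned with k-means) and the function names `nearest_centroid` and `max_pooled_encoding` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def nearest_centroid(x: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid with the smallest squared Euclidean distance to x."""
    best_k, best_d = 0, float("inf")
    for k, c in enumerate(centroids):
        d = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        if d < best_d:
            best_k, best_d = k, d
    return best_k


def max_pooled_encoding(
    features: Sequence[Sequence[float]], centroids: Sequence[Sequence[float]]
) -> Vector:
    """Hypothetical locally max-pooled encoding (illustrative assumption, not the paper's).

    `features` holds all local deep features of one video (any number of frames).
    Each feature is hard-assigned to its nearest centroid, and each cluster keeps
    the element-wise maximum of the features assigned to it. Clusters that receive
    no feature stay at zero. The result concatenates K blocks of D values.
    """
    num_clusters = len(centroids)
    dim = len(centroids[0])
    pooled = [[0.0] * dim for _ in range(num_clusters)]
    seen = [False] * num_clusters
    for x in features:
        k = nearest_centroid(x, centroids)
        if not seen[k]:
            # First feature in this cluster: start from it, so negative values are handled too.
            pooled[k] = [float(v) for v in x]
            seen[k] = True
        else:
            # Keep the highest response per dimension within the cluster.
            pooled[k] = [max(p, float(v)) for p, v in zip(pooled[k], x)]
    return [v for block in pooled for v in block]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [3.0, 1.0], [0.0, 5.0], [0.5, 4.0]]
    codebook = [[2.0, 0.0], [0.0, 4.0]]
    print(max_pooled_encoding(feats, codebook))  # [3.0, 1.0, 0.5, 5.0]
```

Max pooling per cluster keeps the encoding size fixed at K x D regardless of video length, and it only needs one nearest-centroid search and one comparison per feature, which is one way an encoding can stay cheap to compute. A real pipeline would typically add a normalization step before the classifier; that step is omitted here.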

URL: https://dl.acm.org/doi/10.1145/3078971.3078988
DOI: 10.1145/3078971.3078988
Citation Key: duta_simple_2017