Multi-source Multi-modal Activity Recognition in Aerial Video Surveillance

Title: Multi-source Multi-modal Activity Recognition in Aerial Video Surveillance
Publication Type: Conference Paper
Year of Publication: 2014
Authors: Hammoud, R.I., Sahin, C.S., Blasch, E.P., Rhodes, B.J.
Conference Name: Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on
Date Published: June
Keywords: ACO, activities of interest, activity pattern learning framework, activity video segments, aerial imagery, aerial video surveillance, analyst call-outs, associated text, FMV streaming, FMV target track representation, FMV videos, full-motion video, geolocation, image matching, image motion analysis, image representation, index query, indexing, learning (artificial intelligence), multi-intelligence user interface, multiple dynamic target detection, multiple dynamic target tracking, multisource associated data, multisource multimodal activity recognition, multisource multimodal event recognition, object detection, overhead imagery, pattern recognition, probabilistic graph-based matching approach, query processing, radar tracking, semantics, spatial-temporal activity boundary detection, streaming media, target tracking, targets-of-interest, unsynchronized data sources, user interfaces, vehicles, video streaming, video surveillance, voice-to-text chat messages
Abstract

Recognizing activities in wide aerial/overhead imagery remains a challenging problem, due in part to low-resolution video and cluttered scenes with a large number of moving objects. In the context of this research, we deal with two unsynchronized data sources collected in real-world operating scenarios: full-motion videos (FMV) and analyst call-outs (ACO) in the form of chat messages (voice-to-text) made by a human watching the streamed FMV from an aerial platform. We present a multi-source multi-modal activity/event recognition system for surveillance applications, consisting of: (1) detecting and tracking multiple dynamic targets from a moving platform, (2) representing FMV target tracks and chat messages as graphs of attributes, (3) associating FMV tracks and chat messages using a probabilistic graph-based matching approach, and (4) detecting spatial-temporal activity boundaries. We also present an activity pattern learning framework which uses the multi-source associated data as training data to index a large archive of FMV videos. Finally, we describe a multi-intelligence user interface for querying an index of activities of interest (AOIs) by movement type and geo-location, and for playing back a summary of associated text (ACO) and activity video segments of targets-of-interest (TOIs), in both pixel and geo-coordinates. Such tools help the end-user to quickly search, browse, and prepare mission reports from multi-source data.
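
To make step (3) concrete, the sketch below illustrates one plausible form of probabilistic matching between an FMV track and a chat call-out, reduced here to single attribute nodes rather than full attribute graphs. This is not the authors' implementation: the attribute set (target type, geo-position, time), the Gaussian and exponential kernels, the parameter values, and the greedy one-to-one assignment are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical attribute records; the paper's system uses richer graphs of attributes.
@dataclass
class TrackNode:
    target_type: str            # e.g. "vehicle", "person"
    position: tuple             # (lat, lon) of the track at the call-out time
    timestamp: float            # seconds

@dataclass
class ChatNode:
    target_type: str            # parsed from the voice-to-text ACO message
    position: tuple             # geo-location mentioned in the chat, if any
    timestamp: float            # message time, seconds

def geo_distance_m(p, q):
    """Rough equirectangular distance in meters (adequate at small scales)."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    x = (lon2 - lon1) * math.cos(0.5 * (lat1 + lat2))
    y = lat2 - lat1
    return 6371000.0 * math.hypot(x, y)

def match_probability(track, chat, sigma_m=50.0, tau_s=30.0):
    """Combine per-attribute likelihoods into one association score.
    Kernels and parameters are illustrative assumptions, not the paper's."""
    p_type = 1.0 if track.target_type == chat.target_type else 0.1
    p_geo = math.exp(-0.5 * (geo_distance_m(track.position, chat.position) / sigma_m) ** 2)
    p_time = math.exp(-abs(track.timestamp - chat.timestamp) / tau_s)
    return p_type * p_geo * p_time

def associate(tracks, chats, threshold=0.2):
    """Greedy one-to-one assignment of chat call-outs to FMV tracks."""
    pairs = sorted(((match_probability(t, c), i, j)
                    for i, t in enumerate(tracks)
                    for j, c in enumerate(chats)), reverse=True)
    used_t, used_c, out = set(), set(), []
    for p, i, j in pairs:
        if p >= threshold and i not in used_t and j not in used_c:
            used_t.add(i)
            used_c.add(j)
            out.append((i, j, p))
    return out

if __name__ == "__main__":
    tracks = [TrackNode("vehicle", (33.6000, -112.1000), 100.0)]
    chats = [ChatNode("vehicle", (33.6001, -112.1001), 112.0)]
    print(associate(tracks, chats))   # one (track, chat, score) triple
```

A production system would likely replace the greedy assignment with a globally optimal bipartite matcher (e.g., the Hungarian algorithm) and score whole attribute graphs rather than independent attributes, but the structure of the computation is the same: per-attribute likelihoods fused into a match probability, then a one-to-one association under a threshold.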

URL: https://ieeexplore.ieee.org/document/6909989/
DOI: 10.1109/CVPRW.2014.44
Citation Key: 6909989