Multi-source Multi-modal Activity Recognition in Aerial Video Surveillance
Title | Multi-source Multi-modal Activity Recognition in Aerial Video Surveillance |
Publication Type | Conference Paper |
Year of Publication | 2014 |
Authors | Hammoud, R.I., Sahin, C.S., Blasch, E.P., Rhodes, B.J. |
Conference Name | 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) |
Date Published | June |
Keywords | ACO, activities of interest, activity pattern learning framework, activity video segments, aerial imagery, aerial video surveillance, analyst call-outs, associated text, FMV streaming, FMV target track representation, FMV videos, full-motion video, geolocation, image matching, image motion analysis, image representation, index query, indexing, learning (artificial intelligence), multi-intelligence user interface, multiple dynamic target detection, multiple dynamic target tracking, multisource associated data, multisource multimodal activity recognition, multisource multimodal event recognition, object detection, overhead imagery, Pattern recognition, probabilistic graph-based matching approach, query processing, Radar tracking, Semantics, spatial-temporal activity boundary detection, Streaming media, target tracking, targets-of-interest, unsynchronized data sources, user interfaces, Vehicles, video streaming, video surveillance, voice-to-text chat messages |
Abstract | Recognizing activities in wide aerial/overhead imagery remains a challenging problem due in part to low-resolution video and cluttered scenes with a large number of moving objects. In the context of this research, we deal with two unsynchronized data sources collected in real-world operating scenarios: full-motion videos (FMV) and analyst call-outs (ACO) in the form of chat messages (voice-to-text) made by a human watching the streamed FMV from an aerial platform. We present a multi-source multi-modal activity/event recognition system for surveillance applications, consisting of: (1) detecting and tracking multiple dynamic targets from a moving platform, (2) representing FMV target tracks and chat messages as graphs of attributes, (3) associating FMV tracks and chat messages using a probabilistic graph-based matching approach, and (4) detecting spatial-temporal activity boundaries. We also present an activity pattern learning framework which uses the multi-source associated data as training to index a large archive of FMV videos. Finally, we describe a multi-intelligence user interface for querying an index of activities of interest (AOIs) by movement type and geolocation, and for playing back a summary of associated text (ACO) and activity video segments of targets-of-interest (TOIs), in both pixel and geo-coordinates. Such tools help the end-user to quickly search, browse, and prepare mission reports from multi-source data. |
URL | https://ieeexplore.ieee.org/document/6909989/ |
DOI | 10.1109/CVPRW.2014.44 |
Citation Key | 6909989 |
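The central fusion step described in the abstract is (3): associating FMV target tracks with analyst call-outs using a probabilistic graph-based matching approach. The sketch below illustrates one plausible shape for that association, assuming simple attribute nodes and a greedy, likelihood-ranked assignment; all names (`TrackNode`, `CalloutNode`, `pair_likelihood`, `associate`), attribute fields, Gaussian kernels, and scale constants are illustrative assumptions, since this record does not detail the authors' actual formulation.

```python
import math
from dataclasses import dataclass

# Hypothetical attribute nodes: the paper represents FMV target tracks and
# voice-to-text chat messages (ACOs) as "graphs of attributes"; the exact
# schema is not given in this record, so these fields are assumptions.
@dataclass
class TrackNode:
    track_id: str
    t_start: float              # seconds into the FMV stream
    t_end: float
    geo: tuple                  # (lat, lon) of the track centroid
    movement: str               # e.g. "turning", "stopping", "u-turn"

@dataclass
class CalloutNode:
    msg_id: str
    t: float                    # timestamp of the chat message
    geo: tuple                  # geolocation parsed from the message text
    movement: str               # movement keyword extracted from the text

def pair_likelihood(track, callout, time_scale=30.0, dist_scale=100.0):
    """Illustrative match score for one (track, callout) pair.

    Multiplies three attribute likelihoods treated as independent: temporal
    proximity, spatial proximity, and movement-type agreement. The Gaussian
    kernels and scale constants are assumptions, not the paper's model.
    """
    # Temporal term: no penalty if the callout falls inside the track's span.
    dt = max(track.t_start - callout.t, callout.t - track.t_end, 0.0)
    p_time = math.exp(-(dt / time_scale) ** 2)

    # Spatial term: rough planar distance in metres (fine at small extents).
    dx = (track.geo[0] - callout.geo[0]) * 111_000
    dy = (track.geo[1] - callout.geo[1]) * 111_000 * math.cos(math.radians(track.geo[0]))
    p_geo = math.exp(-(math.hypot(dx, dy) / dist_scale) ** 2)

    # Semantic term: crude agreement score on movement type.
    p_move = 1.0 if track.movement == callout.movement else 0.1

    return p_time * p_geo * p_move

def associate(tracks, callouts, threshold=0.05):
    """Greedy likelihood-ranked assignment of callouts to tracks.

    A stand-in for the paper's probabilistic graph matching; a real system
    might solve a global assignment over the two attribute graphs instead.
    """
    scored = sorted(((pair_likelihood(t, c), t, c)
                     for t in tracks for c in callouts), key=lambda x: -x[0])
    used_t, used_c, matches = set(), set(), []
    for p, t, c in scored:
        if p < threshold or t.track_id in used_t or c.msg_id in used_c:
            continue
        matches.append((t.track_id, c.msg_id, round(p, 3)))
        used_t.add(t.track_id)
        used_c.add(c.msg_id)
    return matches

if __name__ == "__main__":
    tracks = [TrackNode("T07", 102.0, 165.0, (33.3152, 44.3661), "u-turn")]
    callouts = [CalloutNode("ACO-12", 118.5, (33.3154, 44.3660), "u-turn")]
    print(associate(tracks, callouts))   # -> [('T07', 'ACO-12', 0.944)]
```

A global matcher (e.g., a Hungarian-algorithm assignment over the full score matrix) could replace the greedy pass if a strict one-to-one matching across the two graphs were required.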