Scalable Hadoop-Based Pooled Time Series of Big Video Data from the Deep Web

Title: Scalable Hadoop-Based Pooled Time Series of Big Video Data from the Deep Web
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Mattmann, Chris A.; Sharan, Madhav
Conference Name: Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval
Date Published: June 2017
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4701-3
Keywords: dark web, darpa, Hadoop, Human Behavior, human factors, memex, pooled time series, pubcrawl, video
Abstract

We contribute a scalable, open-source implementation of the Pooled Time Series (PoT) algorithm from CVPR 2015. The algorithm is evaluated on approximately 6800 human trafficking (HT) videos collected from the deep and dark web, and on an open dataset: the Human Motion Database (HMDB). We describe PoT, our motivation for using it on larger data, and the issues we encountered. Our new solution reimagines PoT as an Apache Hadoop-based algorithm. We demonstrate that our new Hadoop-based algorithm successfully identifies similar videos in the HT and HMDB datasets, and we evaluate the algorithm qualitatively and quantitatively.

URL: http://doi.acm.org/10.1145/3078971.3079019
DOI: 10.1145/3078971.3079019
Citation Key: mattmann_scalable_2017