Biblio
This paper describes the applications of deep learning-based image recognition in the DARPA Memex program and its repository of 1.4 million weapons-related images collected from the deep web. We develop a fast, efficient, and easily deployable framework that integrates Google's TensorFlow with Apache Tika to automatically perform image forensics on the Memex data. We evaluate the framework and its integration both qualitatively and quantitatively. Our work suggests that automated, large-scale, and reliable image classification and forensics can be widely deployed in bulk analysis to answer domain-specific questions.
We contribute a scalable, open source implementation of the Pooled Time Series (PoT) algorithm from CVPR 2015. The algorithm is evaluated on approximately 6800 human trafficking (HT) videos collected from the deep and dark web, and on an open dataset: the Human Motion Database (HMDB). We describe PoT, our motivation for applying it to larger data, and the issues we encountered. Our new solution reimagines PoT as an Apache Hadoop-based algorithm. We demonstrate that this Hadoop-based algorithm successfully identifies similar videos in the HT and HMDB datasets, and we evaluate it both qualitatively and quantitatively.