TWC: Small: Assessing Online Information Exposure Using Web Footprints

Project Details

Lead PI

Performance Period

Jan 15, 2013 - Dec 31, 2016

Institution(s)

Georgetown University

Award Number


Outcomes Report URL


This research project studies a new research area, exposure detection, at the intersection of data mining, security, and natural language processing. Exposure detection refers to discovering the components or attributes of a user's public profile that reduce the user's privacy. To help the public understand the privacy risks of sharing certain information on the web, the project develops efficient algorithms for modeling how an adversary learns information from incomplete and schemaless public data sources. It introduces theoretically sound and efficient techniques for identifying accurate web footprints, including new methods for data matching using a novel probabilistic join operator on multi-granular data, automated approaches for generating inference rules, and new solutions for identifying missing information and unifying mismatched vocabulary using lightweight natural language processing and text mining. The research activities also investigate methods for quantifying and adjusting exposure and risk, facilitating a better understanding of individuals' vulnerability on the web. These techniques not only advance the state of the art in re-identification, probabilistic reasoning and inference logic, and natural language understanding, but also serve as a foundation for exposure detection.