TWC: Medium: Understanding and Illuminating Non-Public Data Flows

Project Details

Performance Period

Sep 01, 2015 - Aug 31, 2018

Institution(s)

International Computer Science Institute

Award Number


Our lives are surrounded by a constant web of data, picked up by a global network of unseen programs that gather, combine, and merge every scrap of information they can acquire. These programs, and the companies behind them, operate out of public view, collecting and exchanging data for profit with little public awareness. The ecosystem is complex: the original collectors of data are likely unaware of its eventual uses, and the users of data may be unaware of its original source. This project seeks to illuminate this ecosystem through a series of experiments that measure and perturb unseen data pools by selectively adding or retrieving information. The project also focuses on creating traps and triggers: artificial data that future data providers might employ to discover new collection and use of data. Finally, researching the phenomenon alone is insufficient; a critical final component is education and outreach, empowering the public with an understanding of these otherwise unseen programs. The philosophy of this project is simple: if these data pools affect our lives, we must know what they are and what they do.

The technical focus of this project is perturbing the data systems and soundly measuring the results. Some data brokers provide user access, allowing direct validation of inferences. The project also involves creating "personas", artificial identities designed to leave traces in data pools. If a data broker purchases and acts on this data, that creates a causal link between data source and data consumer, allowing attribution of data flows within the data ecosystem. Other portions of the project involve purchasing data directly from brokers, evaluating the potential harm that such brokers may cause, and deliberately seeding multimedia content that includes various levels of identifiable information in order to detect when data brokers begin scraping these multimedia sources.
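
The persona-based attribution described above can be illustrated with a minimal sketch. It assumes each persona is seeded into exactly one data source and carries a contact address unique to that source, so that any later appearance of the address points back to where it was planted. The function names, the key, the source labels, and the example.org addresses are all hypothetical and chosen only for illustration; this is not the project's actual tooling.

```python
import hashlib
import hmac

# Hypothetical secret; keeps persona tags unguessable by outside parties.
SECRET_KEY = b"hypothetical-project-secret"


def make_persona(source: str) -> dict:
    """Derive a persona whose email tag is unique to one seeding source."""
    tag = hmac.new(SECRET_KEY, source.encode(), hashlib.sha256).hexdigest()[:12]
    return {"source": source, "email": f"persona-{tag}@example.org"}


def build_registry(sources: list[str]) -> dict[str, str]:
    """Map each persona email back to the source it was seeded into."""
    return {make_persona(s)["email"]: s for s in sources}


def attribute(registry: dict[str, str], observed_emails: list[str]) -> dict[str, str]:
    """Return {email: source} for observed addresses that match a seeded persona."""
    return {e: registry[e] for e in observed_emails if e in registry}


# Usage: seed two hypothetical sources, then check addresses seen at a broker.
registry = build_registry(["loyalty-card-signup", "sweepstakes-form"])
seeded = make_persona("sweepstakes-form")["email"]
hits = attribute(registry, [seeded, "unrelated@example.org"])
```

Here `hits` contains only the seeded address, mapped to the source where it was planted. A match of this kind suggests a flow from that source to the observer, under the assumption that the address was never exposed anywhere else.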