Biblio

Filters: Keyword is big data pipeline
2018-05-09
Dering, M. L., Tucker, C. S. 2017. Generative Adversarial Networks for Increasing the Veracity of Big Data. 2017 IEEE International Conference on Big Data (Big Data). :2595–2602.

This work describes how automated data generation integrates into a big data pipeline. A lack of veracity in big data can produce models that are inaccurate or biased by trends in the training data. This can lead to issues that are difficult to overcome as a pipeline matures. This work describes the use of a Generative Adversarial Network to generate sketch data, such as those that might be used in a human verification task. The generated sketches are checked for recognizability using a crowd-sourcing methodology: they were correctly recognized 43.8% of the time, in contrast to human-drawn sketches, which were recognized with 87.7% accuracy. This method is scalable and can be used to generate realistic data in many domains and to bootstrap a dataset used for training a model prior to deployment.