Title: Generative Adversarial Networks for Increasing the Veracity of Big Data
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Dering, M. L., Tucker, C. S.
Conference Name: 2017 IEEE International Conference on Big Data (Big Data)
Keywords: automated data generation, Big Data, big data pipeline, compositionality, crowd-sourcing methodology, Data models, Deep Learning, GANs, generative adversarial networks, Generative Models, Generators, human drawn sketches, human verification task, learning (artificial intelligence), Metrics, Neurons, Pipelines, pubcrawl, resilience, Resiliency, Scalability, scalable verification, sketch data, Sketches, Training
Abstract

This work describes how automated data generation can be integrated into a big data pipeline. A lack of veracity in big data can produce models that are inaccurate or biased by trends in the training data, and these problems become difficult to overcome as a pipeline matures. The work uses a Generative Adversarial Network (GAN) to generate sketch data, such as the sketches that might be used in a human verification task. The generated sketches are checked for recognizability using a crowd-sourcing methodology, which finds that they were correctly recognized 43.8% of the time, compared with 87.7% for human-drawn sketches. The method is scalable and can be used to generate realistic data in many domains and to bootstrap a training dataset for a model prior to deployment.
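The abstract does not state the network architecture or training loss. For orientation only, the sketch below shows the standard GAN losses: the discriminator minimizes binary cross-entropy on real versus generated samples, and the generator uses the common non-saturating loss -E[log D(G(z))]. It is an assumption that the authors' sketch generator trains with this objective; the function names, the `EPS` clamp, and the toy probabilities are hypothetical and not taken from the paper.

```python
import math
from typing import Sequence

# Hypothetical small constant that keeps log() finite when a probability is 0 or 1.
EPS = 1e-12


def _mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Binary cross-entropy loss for the discriminator D.

    d_real: D's probabilities that real (e.g. human-drawn) sketches are real.
    d_fake: D's probabilities that generated sketches are real.
    Minimizing this maximizes E[log D(x)] + E[log(1 - D(G(z)))].
    """
    real_term = _mean([math.log(max(p, EPS)) for p in d_real])
    fake_term = _mean([math.log(max(1.0 - p, EPS)) for p in d_fake])
    return -(real_term + fake_term)


def generator_loss(d_fake: Sequence[float]) -> float:
    """Non-saturating generator loss: -E[log D(G(z))].

    The loss falls as the discriminator assigns higher "real" probability
    to generated samples.
    """
    return -_mean([math.log(max(p, EPS)) for p in d_fake])


if __name__ == "__main__":
    # Toy discriminator outputs for illustration; not results from the paper.
    print(discriminator_loss([0.9, 0.8], [0.2, 0.1]))
    print(generator_loss([0.2, 0.1]))
```

When the discriminator cannot tell real from generated sketches (every output 0.5), its loss is 2 ln 2 and the generator's loss is ln 2; a discriminator that separates the two sets well drives its own loss toward zero while the generator's loss grows.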

URL: https://ieeexplore.ieee.org/document/8258219
DOI: 10.1109/BigData.2017.8258219
Citation Key: dering_generative_2017