Generative Adversarial Networks for Increasing the Veracity of Big Data

Submitted by grigby1 on Wed, 05/09/2018 - 1:47pm

Title	Generative Adversarial Networks for Increasing the Veracity of Big Data
Publication Type	Conference Paper
Year of Publication	2017
Authors	Dering, M. L., Tucker, C. S.
Conference Name	2017 IEEE International Conference on Big Data (Big Data)
Keywords	automated data generation, Big Data, big data pipeline, compositionality, crowd-sourcing methodology, Data models, Deep Learning, Gallium nitride, GANs, generative adversarial networks, Generative Models, Generators, human drawn sketches, human verification task, learning (artificial intelligence), Metrics, Neurons, Pipelines, pubcrawl, resilience, Resiliency, Scalability, scalable verification, sketch data, Sketches, Training
Abstract	This work describes how automated data generation integrates into a big data pipeline. A lack of veracity in big data can produce models that are inaccurate or biased by trends in the training data. As a pipeline matures, this can lead to issues that are difficult to overcome. This work describes the use of a Generative Adversarial Network to generate sketch data, such as sketches that might be used in a human verification task. The generated sketches were checked for recognizability using a crowd-sourcing methodology: they were correctly recognized 43.8% of the time, compared with 87.7% for human-drawn sketches. This method is scalable and can be used to generate realistic data in many domains and to bootstrap a dataset used for training a model prior to deployment.
URL	https://ieeexplore.ieee.org/document/8258219
DOI	10.1109/BigData.2017.8258219
Citation Key	dering_generative_2017
