Generative Adversarial Networks for Increasing the Veracity of Big Data
Title | Generative Adversarial Networks for Increasing the Veracity of Big Data |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Dering, M. L., Tucker, C. S. |
Conference Name | 2017 IEEE International Conference on Big Data (Big Data) |
Keywords | automated data generation, Big Data, big data pipeline, compositionality, crowd-sourcing methodology, Data models, Deep Learning, Gallium nitride, GANs, generative adversarial networks, Generative Models, Generators, human drawn sketches, human verification task, learning (artificial intelligence), Metrics, Neurons, Pipelines, pubcrawl, resilience, Resiliency, Scalability, scalable verification, sketch data, Sketches, Training |
Abstract | This work describes how automated data generation integrates into a big data pipeline. A lack of veracity in big data can lead to models that are inaccurate or biased by trends in the training data, which can cause issues that are difficult to overcome as a pipeline matures. This work describes the use of a Generative Adversarial Network to generate sketch data, such as sketches that might be used in a human verification task. The generated sketches were evaluated for recognizability using a crowd-sourcing methodology: they were correctly recognized 43.8% of the time, in contrast to human drawn sketches, which were correctly recognized 87.7% of the time. This method is scalable and can be used to generate realistic data in many domains and to bootstrap a dataset used for training a model prior to deployment. |
URL | https://ieeexplore.ieee.org/document/8258219 |
DOI | 10.1109/BigData.2017.8258219 |
Citation Key | dering_generative_2017 |