Title | Using synthetic data to train neural networks is model-based reasoning |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Le, T. A., Baydin, A. G., Zinkov, R., Wood, F. |
Conference Name | 2017 International Joint Conference on Neural Networks (IJCNN) |
Date Published | May
Keywords | approximate inference, Bayesian model-based reasoning, CAPTCHA, Captcha-breaking architecture, composability, computational modeling, data models, Facebook, generators, human factors, learning (artificial intelligence), model-based reasoning, neural networks, neural network generalization, neural network parameter optimization, neural network training, proposal distribution generator learning, social networking (online), synthetic training data, synthetic-data generative model, task-specific posterior uncertainty, training data, Web sites, Wikipedia
Abstract | We draw a formal connection between using synthetic training data to optimize neural network parameters and approximate, Bayesian, model-based reasoning. In particular, training a neural network using synthetic data can be viewed as learning a proposal distribution generator for approximate inference in the synthetic-data generative model. We demonstrate this connection in a recognition task where we develop a novel Captcha-breaking architecture and train it using synthetic data, demonstrating both state-of-the-art performance and a way of computing task-specific posterior uncertainty. Using a neural network trained this way, we also demonstrate successful breaking of real-world Captchas currently used by Facebook and Wikipedia. Reasoning from these empirical results and drawing connections with Bayesian modeling, we discuss the robustness of synthetic data results and suggest important considerations for ensuring good neural network generalization when training with synthetic data. |
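Notes | The connection stated in the abstract can be illustrated as a short training loop: draw latent/observation pairs (z, x) from the synthetic-data generative model p(z)p(x|z), then train a network q(z|x) by minimizing E[-log q(z|x)] over those pairs, so the network becomes a proposal distribution for inference in that model. The following is a minimal PyTorch sketch of this objective only; sample_latents, render, and ProposalNet are hypothetical placeholder names, not the authors' Captcha-breaking architecture.

    import torch
    import torch.nn as nn

    # Sketch of "synthetic-data training as proposal learning":
    # sample z ~ p(z), render x ~ p(x | z), fit q(z | x).
    # All names here are illustrative placeholders.

    def sample_latents(batch_size):
        # Prior p(z): e.g. a single Captcha character as one of 36 classes.
        return torch.randint(0, 36, (batch_size,))

    def render(z):
        # Likelihood p(x | z): a synthetic renderer producing images from
        # latents. Stand-in only: random images, so this sketch shows the
        # shape of the objective rather than a learnable task.
        return torch.randn(z.shape[0], 1, 28, 28)

    class ProposalNet(nn.Module):
        # q(z | x): maps a synthetic image to a distribution over latents.
        def __init__(self, num_classes=36):
            super().__init__()
            self.net = nn.Sequential(
                nn.Flatten(),
                nn.Linear(28 * 28, 128), nn.ReLU(),
                nn.Linear(128, num_classes),
            )

        def forward(self, x):
            return self.net(x)  # logits over latent classes

    model = ProposalNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()  # -log q(z | x) for categorical z

    for step in range(1000):
        z = sample_latents(64)       # z ~ p(z)
        x = render(z)                # x ~ p(x | z)
        loss = loss_fn(model(x), z)  # E_{p(z, x)}[-log q(z | x)]
        opt.zero_grad()
        loss.backward()
        opt.step()

The per-class logits of such a network also give the task-specific posterior uncertainty mentioned in the abstract, via a softmax over q(z|x).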
DOI | 10.1109/IJCNN.2017.7966298 |
Citation Key | le_using_2017 |