Mining Input Grammars from Dynamic Taints

Submitted by grigby1 on Tue, 05/30/2017 - 1:06pm

Title	Mining Input Grammars from Dynamic Taints
Publication Type	Conference Paper
Year of Publication	2016
Authors	Höschele, Matthias, Zeller, Andreas
Conference Name	Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering
Publisher	ACM
Conference Location	New York, NY, USA
ISBN Number	978-1-4503-3845-5
Keywords	composability, context-free grammars, dynamic tainting, fuzzing, Input formats, Metrics, pubcrawl, taint analysis
Abstract	Knowing which part of a program processes which parts of an input can reveal the structure of the input as well as the structure of the program. In a URL textlesspretextgreaterhttp://www.example.com/path/textless/pretextgreater, for instance, the protocol textlesspretextgreaterhttptextless/pretextgreater, the host textlesspretextgreaterwww.example.comtextless/pretextgreater, and the path textlesspretextgreaterpathtextless/pretextgreater would be handled by different functions and stored in different variables. Given a set of sample inputs, we use dynamic tainting to trace the data flow of each input character, and aggregate those input fragments that would be handled by the same function into lexical and syntactical entities. The result is a context-free grammar that reflects valid input structure. In its evaluation, our AUTOGRAM prototype automatically produced readable and structurally accurate grammars for inputs like URLs, spreadsheets or configuration files. The resulting grammars not only allow simple reverse engineering of input formats, but can also directly serve as input for test generators.
URL	http://doi.acm.org/10.1145/2970276.2970321
DOI	10.1145/2970276.2970321
Citation Key	hoschele_mining_2016

Groups:

Science of Security VO