Visible to the public Biblio

Filters: Keyword is context-free grammars  [Clear All Filters]
2020-03-30
Miao, Hui, Deshpande, Amol.  2019.  Understanding Data Science Lifecycle Provenance via Graph Segmentation and Summarization. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :1710–1713.
Increasingly modern data science platforms today have non-intrusive and extensible provenance ingestion mechanisms to collect rich provenance and context information, handle modifications to the same file using distinguishable versions, and use graph data models (e.g., property graphs) and query languages (e.g., Cypher) to represent and manipulate the stored provenance/context information. Due to the schema-later nature of the metadata, multiple versions of the same files, and unfamiliar artifacts introduced by team members, the resulting "provenance graphs" are quite verbose and evolving; further, it is very difficult for the users to compose queries and utilize this valuable information just using standard graph query model. In this paper, we propose two high-level graph query operators to address the verboseness and evolving nature of such provenance graphs. First, we introduce a graph segmentation operator, which queries the retrospective provenance between a set of source vertices and a set of destination vertices via flexible boundary criteria to help users get insight about the derivation relationships among those vertices. We show the semantics of such a query in terms of a context-free grammar, and develop efficient algorithms that run orders of magnitude faster than state-of-the-art. Second, we propose a graph summarization operator that combines similar segments together to query prospective provenance of the underlying project. The operator allows tuning the summary by ignoring vertex details and characterizing local structures, and ensures the provenance meaning using path constraints. We show the optimal summary problem is PSPACE-complete and develop effective approximation algorithms. We implement the operators on top of Neo4j, evaluate our query techniques extensively, and show the effectiveness and efficiency of the proposed methods.
2018-05-09
Barenghi, A., Mainardi, N., Pelosi, G..  2017.  A Security Audit of the OpenPGP Format. 2017 14th International Symposium on Pervasive Systems, Algorithms and Networks 2017 11th International Conference on Frontier of Computer Science and Technology 2017 Third International Symposium of Creative Computing (ISPAN-FCST-ISCC). :336–343.

For over two decades the OpenPGP format has provided the mainstay of email confidentiality and authenticity, and is currently being relied upon to provide authenticated package distributions in open source Unix systems. In this work, we provide the first language theoretical analysis of the OpenPGP format, classifying it as a deterministic context free language and establishing that an automatically generated parser can in principle be defined. However, we show that the number of rules required to describe it with a deterministic context free grammar is prohibitively high, and we identify security vulnerabilities in the OpenPGP format specification. We identify possible attacks aimed at tampering with messages and certificates while retaining their syntactical and semantical validity. We evaluate the effectiveness of these attacks against the two OpenPGP implementations covering the overwhelming majority of uses, i.e., the GNU Privacy Guard (GPG) and Symantec PGP. The results of the evaluation show that both implementations turn out not to be vulnerable due to conser- vative choices in dealing with malicious input data. Finally, we provide guidelines to improve the OpenPGP specification

2018-02-15
Bieschke, T., Hermerschmidt, L., Rumpe, B., Stanchev, P..  2017.  Eliminating Input-Based Attacks by Deriving Automated Encoders and Decoders from Context-Free Grammars. 2017 IEEE Security and Privacy Workshops (SPW). :93–101.

Software systems nowadays communicate via a number of complex languages. This is often the cause of security vulnerabilities like arbitrary code execution, or injections. Whereby injections such as cross-site scripting are widely known from textual languages such as HTML and JSON that constantly gain more popularity. These systems use parsers to read input and unparsers write output, where these security vulnerabilities arise. Therefore correct parsing and unparsing of messages is of the utmost importance when developing secure and reliable systems. Part of the challenge developers face is to correctly encode data during unparsing and decode it during parsing. This paper presents McHammerCoder, an (un)parser and encoding generator supporting textual and binary languages. Those (un)parsers automatically apply the generated encoding, that is derived from the language's grammar. Therefore manually defining and applying encoding is not required to effectively prevent injections when using McHammerCoder. By specifying the communication language within a grammar, McHammerCoder provides developers with correct input and output handling code for their custom language.

2017-05-30
Höschele, Matthias, Zeller, Andreas.  2016.  Mining Input Grammars from Dynamic Taints. Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. :720–725.

Knowing which part of a program processes which parts of an input can reveal the structure of the input as well as the structure of the program. In a URL textlesspretextgreaterhttp://www.example.com/path/textless/pretextgreater, for instance, the protocol textlesspretextgreaterhttptextless/pretextgreater, the host textlesspretextgreaterwww.example.comtextless/pretextgreater, and the path textlesspretextgreaterpathtextless/pretextgreater would be handled by different functions and stored in different variables. Given a set of sample inputs, we use dynamic tainting to trace the data flow of each input character, and aggregate those input fragments that would be handled by the same function into lexical and syntactical entities. The result is a context-free grammar that reflects valid input structure. In its evaluation, our AUTOGRAM prototype automatically produced readable and structurally accurate grammars for inputs like URLs, spreadsheets or configuration files. The resulting grammars not only allow simple reverse engineering of input formats, but can also directly serve as input for test generators.