Biblio
Although sequence-to-sequence attentional neural machine translation (NMT) has achieved great progress recently, it is confronted with two challenges: learning optimal model parameters for long parallel sentences and well exploiting different scopes of contexts. In this paper, partially inspired by the idea of segmenting a long sentence into short clauses, each of which can be easily translated by NMT, we propose a hierarchy-to-sequence attentional NMT model to handle these two challenges. Our encoder takes the segmented clause sequence as input and explores a hierarchical neural network structure to model words, clauses, and sentences at different levels, particularly with two layers of recurrent neural networks modeling semantic compositionality at the word and clause level. Correspondingly, the decoder sequentially translates segmented clauses and simultaneously applies two types of attention models to capture contexts of interclause and intraclause for translation prediction. In this way, we can not only improve parameter learning, but also well explore different scopes of contexts for translation. Experimental results on Chinese-English and English-German translation demonstrate the superiorities of the proposed model over the conventional NMT model.
Neural architectures are the foundation for improving performance of deep neural networks (DNNs). This paper presents deep compositional grammatical architectures which harness the best of two worlds: grammar models and DNNs. The proposed architectures integrate compositionality and reconfigurability of the former and the capability of learning rich features of the latter in a principled way. We utilize AND-OR Grammar (AOG) as network generator in this paper and call the resulting networks AOGNets. An AOGNet consists of a number of stages each of which is composed of a number of AOG building blocks. An AOG building block splits its input feature map into N groups along feature channels and then treat it as a sentence of N words. It then jointly realizes a phrase structure grammar and a dependency grammar in bottom-up parsing the “sentence” for better feature exploration and reuse. It provides a unified framework for the best practices developed in state-of-the-art DNNs. In experiments, AOGNet is tested in the ImageNet-1K classification benchmark and the MS-COCO object detection and segmentation benchmark. In ImageNet-1K, AOGNet obtains better performance than ResNet and most of its variants, ResNeXt and its attention based variants such as SENet, DenseNet and DualPathNet. AOGNet also obtains the best model interpretability score using network dissection. AOGNet further shows better potential in adversarial defense. In MS-COCO, AOGNet obtains better performance than the ResNet and ResNeXt backbones in Mask R-CNN.
Software systems nowadays communicate via a number of complex languages. This is often the cause of security vulnerabilities like arbitrary code execution, or injections. Whereby injections such as cross-site scripting are widely known from textual languages such as HTML and JSON that constantly gain more popularity. These systems use parsers to read input and unparsers write output, where these security vulnerabilities arise. Therefore correct parsing and unparsing of messages is of the utmost importance when developing secure and reliable systems. Part of the challenge developers face is to correctly encode data during unparsing and decode it during parsing. This paper presents McHammerCoder, an (un)parser and encoding generator supporting textual and binary languages. Those (un)parsers automatically apply the generated encoding, that is derived from the language's grammar. Therefore manually defining and applying encoding is not required to effectively prevent injections when using McHammerCoder. By specifying the communication language within a grammar, McHammerCoder provides developers with correct input and output handling code for their custom language.
The best practice to prevent Cross Site Scripting (XSS) attacks is to apply encoders to sanitize untrusted data. To balance security and functionality, encoders should be applied to match the web page context, such as HTML body, JavaScript, and style sheets. A common programming error is the use of a wrong encoder to sanitize untrusted data, leaving the application vulnerable. We present a security unit testing approach to detect XSS vulnerabilities caused by improper encoding of untrusted data. Unit tests for the XSS vulnerability are automatically constructed out of each web page and then evaluated by a unit test execution framework. A grammar-based attack generator is used to automatically generate test inputs. We evaluate our approach on a large open source medical records application, demonstrating that we can detect many 0-day XSS vulnerabilities with very low false positives, and that the grammar-based attack generator has better test coverage than industry best practices.