Bibliography
Although sequence-to-sequence attentional neural machine translation (NMT) has achieved great progress recently, it still faces two challenges: learning optimal model parameters for long parallel sentences and effectively exploiting contexts of different scopes. In this paper, partially inspired by the idea of segmenting a long sentence into short clauses, each of which can be easily translated by NMT, we propose a hierarchy-to-sequence attentional NMT model to address these two challenges. Our encoder takes the segmented clause sequence as input and explores a hierarchical neural network structure to model words, clauses, and sentences at different levels, particularly with two layers of recurrent neural networks modeling semantic compositionality at the word and clause levels. Correspondingly, the decoder sequentially translates the segmented clauses and simultaneously applies two types of attention models to capture inter-clause and intra-clause contexts for translation prediction. In this way, we not only improve parameter learning but also better exploit contexts of different scopes for translation. Experimental results on Chinese-English and English-German translation demonstrate the superiority of the proposed model over the conventional NMT model.
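As an illustration of the hierarchical encoding idea described above, the following is a minimal PyTorch sketch (not the authors' implementation; module names and dimensions are assumptions): a word-level GRU encodes each clause and a clause-level GRU composes the clause representations, yielding the two sets of annotations that intra-clause and inter-clause attention models would attend over.

```python
# Minimal sketch of a two-level (word/clause) recurrent encoder.
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.word_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.clause_rnn = nn.GRU(hid_dim, hid_dim, batch_first=True)

    def forward(self, clauses):
        # clauses: list of LongTensors, one (1, clause_len) tensor per clause
        clause_states, word_annotations = [], []
        for clause in clauses:
            emb = self.embed(clause)              # (1, len, emb_dim)
            words, last = self.word_rnn(emb)      # words: (1, len, hid)
            word_annotations.append(words)        # for intra-clause attention
            clause_states.append(last.squeeze(0)) # (1, hid)
        clause_seq = torch.stack(clause_states, dim=1)   # (1, n_clauses, hid)
        clause_annotations, _ = self.clause_rnn(clause_seq)  # inter-clause
        return word_annotations, clause_annotations

# Toy usage: a "sentence" segmented into two clauses of token ids.
enc = HierarchicalEncoder(vocab_size=100)
clauses = [torch.randint(0, 100, (1, 5)), torch.randint(0, 100, (1, 3))]
words, clause_ctx = enc(clauses)
print(clause_ctx.shape)  # torch.Size([1, 2, 128])
```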
In this paper, we consider the problem of decentralized verification for large-scale cascade interconnections of linear subsystems, such that dissipativity properties of the overall system are guaranteed with minimal knowledge of the dynamics. In order to achieve compositionality, we distribute the verification process among the individual subsystems, which utilize limited information received locally from their immediate neighbors. Furthermore, to obviate the need for full knowledge of the subsystem parameters, each decentralized verification rule employs a model-free learning structure: a reinforcement learning algorithm that allows for online evaluation of an appropriate storage function that can be used to verify dissipativity of the system up to that point. Finally, we show how the interconnection can be extended by adding learning-enabled subsystems while ensuring dissipativity.
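To make the verification step concrete, here is a minimal model-free sketch (an illustrative simplification, not the paper's learning algorithm): given sampled trajectories of a subsystem, it checks the empirical dissipation inequality V(x_{k+1}) - V(x_k) <= s(u_k, y_k) for a candidate quadratic storage function V(x) = x'Px and a passivity supply rate s(u, y) = u'y; P, the toy system, and the tolerance are all assumptions.

```python
# Data-driven check of a dissipation inequality along sampled trajectories.
import numpy as np

def is_dissipative_on_data(P, xs, us, ys, tol=1e-9):
    """xs: (T+1, n) states, us: (T, m) inputs, ys: (T, m) outputs."""
    V = np.einsum('ti,ij,tj->t', xs, P, xs)   # V(x_k) for k = 0..T
    supply = np.einsum('ti,ti->t', us, ys)    # u_k' y_k (passivity supply)
    return bool(np.all(V[1:] - V[:-1] <= supply + tol))

# Toy data from x+ = 0.5 x + u, y = x + u, which is passive
# with storage function V(x) = 0.5 x^2.
rng = np.random.default_rng(0)
x, xs, us, ys = np.zeros(1), [np.zeros(1)], [], []
for _ in range(200):
    u = rng.normal(size=1)
    y = x + u
    x = 0.5 * x + u
    us.append(u); ys.append(y); xs.append(x.copy())
P = np.array([[0.5]])
print(is_dissipative_on_data(P, np.array(xs), np.array(us), np.array(ys)))
```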
While recent studies have shed light on the expressivity, complexity, and compositionality of convolutional networks, the true inductive bias of the family of functions reachable by gradient descent on natural data remains unknown. By exploiting symmetries in the preactivation space of convolutional layers, we present preliminary empirical evidence of regularities in the preimage of trained rectifier networks, in terms of arrangements of polytopes, and relate these regularities to the nonlinear transformations the network applies to its input.
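As a toy illustration of such polytope arrangements (an assumption-laden sketch, not the paper's experiments): for a single rectifier layer, each sign pattern of the preactivations labels one cell of the hyperplane arrangement the layer induces on input space, and the cells realized by a random layer can be counted empirically on a grid.

```python
# Count activation regions (cells of a hyperplane arrangement) of a
# tiny random rectifier layer on a 2-D input grid.
import numpy as np

rng = np.random.default_rng(1)
W, b = rng.normal(size=(6, 2)), rng.normal(size=6)  # 6 units, 2-D input

# Sample a grid of inputs and record the activation pattern at each point.
g = np.linspace(-3, 3, 200)
X = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)   # (40000, 2)
patterns = (X @ W.T + b > 0)                              # (40000, 6)

# Each distinct pattern is one polytope of the arrangement; 6 generic
# lines in the plane cut out at most 1 + 6 + C(6,2) = 22 cells.
n_cells = len(np.unique(patterns, axis=0))
print(f"{n_cells} activation regions realized on the grid (max 22)")
```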
Neural architectures are the foundation for improving the performance of deep neural networks (DNNs). This paper presents deep compositional grammatical architectures that harness the best of two worlds: grammar models and DNNs. The proposed architectures integrate the compositionality and reconfigurability of the former with the latter's capability of learning rich features in a principled way. We use an AND-OR Grammar (AOG) as the network generator and call the resulting networks AOGNets. An AOGNet consists of a number of stages, each composed of a number of AOG building blocks. An AOG building block splits its input feature map into N groups along the feature channels and treats them as a sentence of N words. It then jointly realizes a phrase-structure grammar and a dependency grammar in parsing the “sentence” bottom-up for better feature exploration and reuse, providing a unified framework for best practices developed in state-of-the-art DNNs. In experiments, AOGNet is tested on the ImageNet-1K classification benchmark and the MS-COCO object detection and segmentation benchmark. On ImageNet-1K, AOGNet obtains better performance than ResNet and most of its variants, ResNeXt and its attention-based variants such as SENet, DenseNet, and DualPathNet. AOGNet also obtains the best model-interpretability score under network dissection and shows better potential in adversarial defense. On MS-COCO, AOGNet obtains better performance than ResNet and ResNeXt backbones in Mask R-CNN.
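The channel-splitting idea can be sketched as follows (a much-simplified toy, not the actual AOG building block; all layer choices are illustrative): the input feature map is chunked into N channel groups (the "words"), terminal nodes transform each group, and an OR-node sums alternative AND-compositions of the parsed "sentence".

```python
# Toy channel-grouped block loosely in the spirit of an AOG block.
import torch
import torch.nn as nn

class ToyAOGBlock(nn.Module):
    def __init__(self, channels, n_groups=4):
        super().__init__()
        assert channels % n_groups == 0
        g = channels // n_groups
        self.n_groups = n_groups
        # Terminal nodes: one small conv per channel group ("word").
        self.terminals = nn.ModuleList(
            nn.Conv2d(g, g, kernel_size=3, padding=1) for _ in range(n_groups)
        )
        # Two alternative AND-node compositions over the full "sentence".
        self.and_a = nn.Conv2d(channels, channels, kernel_size=1)
        self.and_b = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        words = torch.chunk(x, self.n_groups, dim=1)           # the N "words"
        words = [f(w) for f, w in zip(self.terminals, words)]  # parse terminals
        sentence = torch.cat(words, dim=1)
        # OR-node: sum alternative AND-compositions plus a skip path.
        return self.and_a(sentence) + self.and_b(sentence) + x

block = ToyAOGBlock(channels=8)
print(block(torch.randn(1, 8, 16, 16)).shape)  # torch.Size([1, 8, 16, 16])
```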
With the popularity of smart devices and the widespread use of Wi-Fi-based indoor localization, edge computing is becoming the mainstream paradigm for processing massive sensing data to provide indoor localization services. However, the data conveyed to train the localization model unintentionally contain sensitive information about users/devices, and releasing them without any protection may cause serious privacy leakage. To address this issue, we propose a lightweight differential privacy-preserving mechanism for the edge computing environment. We extend ε-differential privacy theory to a mature machine learning localization technique to achieve privacy protection while training the localization model. Experimental results on multiple real-world datasets show that, compared with the original localization technique without privacy preservation, our proposed scheme achieves high indoor localization accuracy while providing a differential privacy guarantee. By regulating the value of ε, the data quality loss of our method can be kept within 8.9%, and the time overhead is almost negligible. Therefore, our scheme can be efficiently applied in edge networks and offers guidance on indoor localization privacy protection in edge computing.
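As a concrete illustration of the ε-differential-privacy ingredient (a minimal sketch, not the paper's full scheme; the RSSI range, clipping, and per-reading sensitivity handling are assumptions): each Wi-Fi signal-strength reading is clipped to a bounded range and perturbed with Laplace noise scaled by sensitivity/ε before being released for model training.

```python
# Laplace mechanism on clipped RSSI fingerprints: smaller epsilon
# means stronger privacy and more noise.
import numpy as np

def laplace_perturb(rssi, epsilon, lo=-100.0, hi=-30.0, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    clipped = np.clip(rssi, lo, hi)       # bound each reading's influence
    scale = (hi - lo) / epsilon           # sensitivity / epsilon
    return clipped + rng.laplace(0.0, scale, size=clipped.shape)

# Toy fingerprint of 4 access points, perturbed at several privacy levels.
fingerprint = np.array([-55.0, -72.0, -63.0, -80.0])
for eps in (0.1, 1.0, 10.0):
    print(eps, laplace_perturb(fingerprint, eps, rng=np.random.default_rng(0)))
```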
This paper investigates the effectiveness of a reinforcement learning (RL) model in clustering as an approach to achieving higher network scalability in distributed cognitive radio networks. Specifically, it analyzes the effects of the RL parameters, namely the learning rate and the discount factor, in a volatile environment consisting of member nodes (secondary users, or SUs) that launch attacks with various probabilities. The clusterhead, which resides in an operating region (environment) characterized by the probability of attacks, counters the malicious SUs by leveraging an RL model. Simulation results show that, in a volatile operating environment, the RL model with learning rate α = 1 provides the highest network scalability when the probability of attacks ranges between 0.3 and 0.7, while the discount factor γ does not play a significant role in learning in an operating environment that is volatile due to attacks.
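The role of the two parameters under study can be seen directly in the tabular Q-learning update Q(s,a) ← (1-α)Q(s,a) + α(r + γ max_a' Q(s',a')). The following minimal sketch (illustrative states, actions, and rewards; not the paper's simulation) shows that α = 1 discards the old estimate entirely, which suits a volatile environment, while γ = 0 removes the influence of future value estimates.

```python
# Tabular Q-learning update showing the roles of alpha and gamma.
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=1.0, gamma=0.0):
    target = r + gamma * np.max(Q[s_next])        # bootstrapped target
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
    return Q

# Toy clusterhead with 2 "states" (attack level) and 3 actions (responses).
Q = np.zeros((2, 3))
Q = q_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=1.0, gamma=0.0)
print(Q)  # with alpha = 1 the entry is replaced by the latest reward
```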