Biblio

List
Filter

Found 431 results

Filters: Keyword is Neural networks [Clear All Filters]

2022-11-18

Alali, Mohammad, Shimim, Farshina Nazrul, Shahooei, Zagros, Bahramipanah, Maryam. 2021. Intelligent Line Congestion Prognosis in Active Distribution System Using Artificial Neural Network. 2021 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT). :1–5.

This paper proposes an intelligent line congestion prognosis scheme based on wide-area measurements, which accurately identifies an impending congestion and the problem causing the congestion. Due to the increasing penetration of renewable energy resources and uncertainty of load/generation patterns in the Active Distribution Networks (ADNs), power line congestion is one of the issues that could happen during peak load conditions or high-power injection by renewable energy resources. Congestion would have devastating effects on both the economical and technical operation of the grid. Hence, it is crucial to accurately predict congestions to alleviate the problem in-time and command proper control actions; such as, power redispatch, incorporating ancillary services and energy storage systems, and load curtailment. We use neural network methods in this work due to their outstanding performance in predicting the nonlinear behavior of the power system. Bayesian Regularization, along with Levenberg-Marquardt algorithm, is used to train the proposed neural networks to predict an impending congestion and its cause. The proposed method is validated using the IEEE 13-bus test system. Utilizing the proposed method, extreme control actions (i.e., protection actions and load curtailment) can be avoided. This method will improve the distribution grid resiliency and ensure the continuous supply of power to the loads.

Li, Pengzhen, Koyuncu, Erdem, Seferoglu, Hulya. 2021. Respipe: Resilient Model-Distributed DNN Training at Edge Networks. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :3660–3664.

The traditional approach to distributed deep neural network (DNN) training is data-distributed learning, which partitions and distributes data to workers. This approach, although has good convergence properties, has high communication cost, which puts a strain especially on edge systems and increases delay. An emerging approach is model-distributed learning, where a training model is distributed across workers. Model-distributed learning is a promising approach to reduce communication and storage costs, which is crucial for edge systems. In this paper, we design ResPipe, a novel resilient model-distributed DNN training mechanism against delayed/failed workers. We analyze the communication cost of ResPipe and demonstrate the trade-off between resiliency and communication cost. We implement ResPipe in a real testbed consisting of Android-based smartphones, and show that it improves the convergence rate and accuracy of training for convolutional neural networks (CNNs).

Goldstein, Brunno F., Ferreira, Victor C., Srinivasan, Sudarshan, Das, Dipankar, Nery, Alexandre S., Kundu, Sandip, França, Felipe M. G.. 2021. A Lightweight Error-Resiliency Mechanism for Deep Neural Networks. 2021 22nd International Symposium on Quality Electronic Design (ISQED). :311–316.

In recent years, Deep Neural Networks (DNNs) have made inroads into a number of applications involving pattern recognition - from facial recognition to self-driving cars. Some of these applications, such as self-driving cars, have real-time requirements, where specialized DNN hardware accelerators help meet those requirements. Since DNN execution time is dominated by convolution, Multiply-and-Accumulate (MAC) units are at the heart of these accelerators. As hardware accelerators push the performance limits with strict power constraints, reliability is often compromised. In particular, power-constrained DNN accelerators are more vulnerable to transient and intermittent hardware faults due to particle hits, manufacturing variations, and fluctuations in power supply voltage and temperature. Methods such as hardware replication have been used to deal with these reliability problems in the past. Unfortunately, the duplication approach is untenable in a power constrained environment. This paper introduces a low-cost error-resiliency scheme that targets MAC units employed in conventional DNN accelerators. We evaluate the reliability improvements from the proposed architecture using a set of 6 CNNs over varying bit error rates (BER) and demonstrate that our proposed solution can achieve more than 99% of fault coverage with a 5-bits arithmetic code, complying with the ASIL-D level of ISO26262 standards with a negligible area and power overhead. Additionally, we evaluate the proposed detection mechanism coupled with a word masking correction scheme, demonstrating no loss of accuracy up to a BER of 10-2.

2022-11-08

HeydariGorji, Ali, Rezaei, Siavash, Torabzadehkashi, Mahdi, Bobarshad, Hossein, Alves, Vladimir, Chou, Pai H.. 2020. HyperTune: Dynamic Hyperparameter Tuning for Efficient Distribution of DNN Training Over Heterogeneous Systems. 2020 IEEE/ACM International Conference On Computer Aided Design (ICCAD). :1–8.

Distributed training is a novel approach to accelerating training of Deep Neural Networks (DNN), but common training libraries fall short of addressing the distributed nature of heterogeneous processors or interruption by other workloads on the shared processing nodes. This paper describes distributed training of DNN on computational storage devices (CSD), which are NAND flash-based, high-capacity data storage with internal processing engines. A CSD-based distributed architecture incorporates the advantages of federated learning in terms of performance scalability, resiliency, and data privacy by eliminating the unnecessary data movement between the storage device and the host processor. The paper also describes Stannis, a DNN training framework that improves on the shortcomings of existing distributed training frameworks by dynamically tuning the training hyperparameters in heterogeneous systems to maintain the maximum overall processing speed in term of processed images per second and energy efficiency. Experimental results on image classification training benchmarks show up to 3.1x improvement in performance and 2.45x reduction in energy consumption when using Stannis plus CSD compare to the generic systems.

Boo, Yoonho, Shin, Sungho, Sung, Wonyong. 2020. Quantized Neural Networks: Characterization and Holistic Optimization. 2020 IEEE Workshop on Signal Processing Systems (SiPS). :1–6.

Quantized deep neural networks (QDNNs) are necessary for low-power, high throughput, and embedded applications. Previous studies mostly focused on developing optimization methods for the quantization of given models. However, quantization sensitivity depends on the model architecture. Also, the characteristics of weight and activation quantization are quite different. This study proposes a holistic approach for the optimization of QDNNs, which contains QDNN training methods as well as quantization-friendly architecture design. Synthesized data is used to visualize the effects of weight and activation quantization. The results indicate that deeper models are more prone to activation quantization, while wider models improve the resiliency to both weight and activation quantization.

Yang, Shaofei, Liu, Longjun, Li, Baoting, Sun, Hongbin, Zheng, Nanning. 2020. Exploiting Variable Precision Computation Array for Scalable Neural Network Accelerators. 2020 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS). :315–319.

In this paper, we present a flexible Variable Precision Computation Array (VPCA) component for different accelerators, which leverages a sparsification scheme for activations and a low bits serial-parallel combination computation unit for improving the efficiency and resiliency of accelerators. The VPCA can dynamically decompose the width of activation/weights (from 32bit to 3bit in different accelerators) into 2-bits serial computation units while the 2bits computing units can be combined in parallel computing for high throughput. We propose an on-the-fly compressing and calculating strategy SLE-CLC (single lane encoding, cross lane calculation), which could further improve performance of 2-bit parallel computing. The experiments results on image classification datasets show VPCA can outperforms DaDianNao, Stripes, Loom-2bit by 4.67×, 2.42×, 1.52× without other overhead on convolution layers.

Wshah, Safwan, Shadid, Reem, Wu, Yuhao, Matar, Mustafa, Xu, Beilei, Wu, Wencheng, Lin, Lei, Elmoudi, Ramadan. 2020. Deep Learning for Model Parameter Calibration in Power Systems. 2020 IEEE International Conference on Power Systems Technology (POWERCON). :1–6.

In power systems, having accurate device models is crucial for grid reliability, availability, and resiliency. Existing model calibration methods based on mathematical approaches often lead to multiple solutions due to the ill-posed nature of the problem, which would require further interventions from the field engineers in order to select the optimal solution. In this paper, we present a novel deep-learning-based approach for model parameter calibration in power systems. Our study focused on the generator model as an example. We studied several deep-learning-based approaches including 1-D Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU), which were trained to estimate model parameters using simulated Phasor Measurement Unit (PMU) data. Quantitative evaluations showed that our proposed methods can achieve high accuracy in estimating the model parameters, i.e., achieved a 0.0079 MSE on the testing dataset. We consider these promising results to be the basis for further exploration and development of advanced tools for model validation and calibration.

Shomron, Gil, Weiser, Uri. 2020. Non-Blocking Simultaneous Multithreading: Embracing the Resiliency of Deep Neural Networks. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). :256–269.

Deep neural networks (DNNs) are known for their inability to utilize underlying hardware resources due to hard-ware susceptibility to sparse activations and weights. Even in finer granularities, many of the non-zero values hold a portion of zero-valued bits that may cause inefficiencies when executed on hard-ware. Inspired by conventional CPU simultaneous multithreading (SMT) that increases computer resource utilization by sharing them across several threads, we propose non-blocking SMT (NB-SMT) designated for DNN accelerators. Like conventional SMT, NB-SMT shares hardware resources among several execution flows. Yet, unlike SMT, NB-SMT is non-blocking, as it handles structural hazards by exploiting the algorithmic resiliency of DNNs. Instead of opportunistically dispatching instructions while they wait in a reservation station for available hardware, NB-SMT temporarily reduces the computation precision to accommodate all threads at once, enabling a non-blocking operation. We demonstrate NB-SMT applicability using SySMT, an NB-SMT-enabled output-stationary systolic array (OS-SA). Compared with a conventional OS-SA, a 2-threaded SySMT consumes 1.4× the area and delivers 2× speedup with 33% energy savings and less than 1% accuracy degradation of state-of-the-art CNNs with ImageNet. A 4-threaded SySMT consumes 2.5× the area and delivers, for example, 3.4× speedup and 39%×energy savings with 1% accuracy degradation of 40%-pruned ResNet-18.

2022-11-02

Shubham, Kumar, Venkatesh, Gopalakrishnan, Sachdev, Reijul, Akshi, Jayagopi, Dinesh Babu, Srinivasaraghavan, G.. 2021. Learning a Deep Reinforcement Learning Policy Over the Latent Space of a Pre-trained GAN for Semantic Age Manipulation. 2021 International Joint Conference on Neural Networks (IJCNN). :1–8.

Learning a disentangled representation of the latent space has become one of the most fundamental problems studied in computer vision. Recently, many Generative Adversarial Networks (GANs) have shown promising results in generating high fidelity images. However, studies to understand the semantic layout of the latent space of pre-trained models are still limited. Several works train conditional GANs to generate faces with required semantic attributes. Unfortunately, in these attempts, the generated output is often not as photo-realistic as the unconditional state-of-the-art models. Besides, they also require large computational resources and specific datasets to generate high fidelity images. In our work, we have formulated a Markov Decision Process (MDP) over the latent space of a pre-trained GAN model to learn a conditional policy for semantic manipulation along specific attributes under defined identity bounds. Further, we have defined a semantic age manipulation scheme using a locally linear approximation over the latent space. Results show that our learned policy samples high fidelity images with required age alterations, while preserving the identity of the person.

Basioti, Kalliopi, Moustakides, George V.. 2021. Generative Adversarial Networks: A Likelihood Ratio Approach. 2021 International Joint Conference on Neural Networks (IJCNN). :1–8.

We are interested in the design of generative networks. The training of these mathematical structures is mostly performed with the help of adversarial (min-max) optimization problems. We propose a simple methodology for constructing such problems assuring, at the same time, consistency of the corresponding solution. We give characteristic examples developed by our method, some of which can be recognized from other applications, and some are introduced here for the first time. We present a new metric, the likelihood ratio, that can be employed online to examine the convergence and stability during the training of different Generative Adversarial Networks (GANs). Finally, we compare various possibilities by applying them to well-known datasets using neural networks of different configurations and sizes.

Zhang, Minghao, He, Lingmin, Wang, Xiuhui. 2021. Image Translation based on Attention Residual GAN. 2021 2nd International Conference on Artificial Intelligence and Computer Engineering (ICAICE). :802–805.

Using Generative Adversarial Networks (GAN) to translate images is a significant field in computer vision. There are partial distortion, artifacts and detail loss in the images generated by current image translation algorithms. In order to solve this problem, this paper adds attention-based residual neural network to the generator of GAN. Attention-based residual neural network can improve the representation ability of the generator by weighting the channels of the feature map. Experiment results on the Facades dataset show that Attention Residual GAN can translate images with excellent quality.

2022-10-13

Singh, Shweta, Singh, M.P., Pandey, Ramprakash. 2020. Phishing Detection from URLs Using Deep Learning Approach. 2020 5th International Conference on Computing, Communication and Security (ICCCS). :1—4.

Today, the Internet covers worldwide. All over the world, people prefer an E-commerce platform to buy or sell their products. Therefore, cybercrime has become the center of attraction for cyber attackers in cyberspace. Phishing is one such technique where the unidentified structure of the Internet has been used by attackers/criminals that intend to deceive users with the use of the illusory website and emails for obtaining their credentials (like account numbers, passwords, and PINs). Consequently, the identification of a phishing or legitimate web page is a challenging issue due to its semantic structure. In this paper, a phishing detection system is implemented using deep learning techniques to prevent such attacks. The system works on URLs by applying a convolutional neural network (CNN) to detect the phishing webpage. In paper [19] the proposed model has achieved 97.98% accuracy whereas our proposed system achieved accuracy of 98.00% which is better than earlier model. This system doesn’t require any feature engineering as the CNN extract features from the URLs automatically through its hidden layers. This is other advantage of the proposed system over earlier reported in [19] as the feature engineering is a very time-consuming task.

2022-10-12

Li, Chunzhi. 2021. A Phishing Detection Method Based on Data Mining. 2021 3rd International Conference on Applied Machine Learning (ICAML). :202—205.

Data mining technology is a very important technology in the current era of data explosion. With the informationization of society and the transparency and openness of information, network security issues have become the focus of concern of people all over the world. This paper wants to compare the accuracy of multiple machine learning methods and two deep learning frameworks when using lexical features to detect and classify malicious URLs. As a result, this paper shows that the Random Forest, which is an ensemble learning method for classification, is superior to 8 other machine learning methods in this paper. Furthermore, the Random Forest is even superior to some popular deep neural network models produced by famous frameworks such as TensorFlow and PyTorch when using lexical features to detect and classify malicious URLs.

2022-10-06

Zhu, Xiaoyan, Zhang, Yu, Zhu, Lei, Hei, Xinhong, Wang, Yichuan, Hu, Feixiong, Yao, Yanni. 2021. Chinese named entity recognition method for the field of network security based on RoBERTa. 2021 International Conference on Networking and Network Applications (NaNA). :420–425.

As the mobile Internet is developing rapidly, people who use cell phones to access the Internet dominate, and the mobile Internet has changed the development environment of online public opinion and made online public opinion events spread more widely. In the online environment, any kind of public issues may become a trigger for the generation of public opinion and thus need to be controlled for network supervision. The method in this paper can identify entities from the event texts obtained from mobile Today's Headlines, People's Daily, etc., and informatize security of public opinion in event instances, thus strengthening network supervision and control in mobile, and providing sufficient support for national security event management. In this paper, we present a SW-BiLSTM-CRF model, as well as a model combining the RoBERTa pre-trained model with the classical neural network BiLSTM model. Our experiments show that this approach provided achieves quite good results on Chinese emergency corpus, with accuracy and F1 values of 87.21% and 78.78%, respectively.

2022-09-30

Alqurashi, Saja, Shirazi, Hossein, Ray, Indrakshi. 2021. On the Performance of Isolation Forest and Multi Layer Perceptron for Anomaly Detection in Industrial Control Systems Networks. 2021 8th International Conference on Internet of Things: Systems, Management and Security (IOTSMS). :1–6.

With an increasing number of adversarial attacks against Industrial Control Systems (ICS) networks, enhancing the security of such systems is invaluable. Although attack prevention strategies are often in place, protecting against all attacks, especially zero-day attacks, is becoming impossible. Intrusion Detection Systems (IDS) are needed to detect such attacks promptly. Machine learning-based detection systems, especially deep learning algorithms, have shown promising results and outperformed other approaches. In this paper, we study the efficacy of a deep learning approach, namely, Multi Layer Perceptron (MLP), in detecting abnormal behaviors in ICS network traffic. We focus on very common reconnaissance attacks in ICS networks. In such attacks, the adversary focuses on gathering information about the targeted network. To evaluate our approach, we compare MLP with isolation Forest (i Forest), a statistical machine learning approach. Our proposed deep learning approach achieves an accuracy of more than 99% while i Forest achieves only 75%. This helps to reinforce the promise of using deep learning techniques for anomaly detection.

2022-09-29

Duman, Atahan, Sogukpinar, Ibrahim. 2021. Deep Learning Based Event Correlation Analysis in Information Systems. 2021 6th International Conference on Computer Science and Engineering (UBMK). :209–214.

Information systems and applications provide indispensable services at every stage of life, enabling us to carry out our activities more effectively and efficiently. Today, information technology systems produce many alarm and event records. These produced records often have a relationship with each other, and when this relationship is captured correctly, many interruptions that will harm institutions can be prevented before they occur. For example, an increase in the disk I/O speed of a server or a problem may cause the business software running on that server to slow down and cause different results in this slowness. Here, an institution’s accurate analysis and management of all event records, and rule-based analysis of the resulting records in certain time periods and depending on certain rules will ensure efficient and effective management of millions of alarms. In addition, it will be possible to prevent possible problems by removing the relationships between events. Events that occur in IT systems are a kind of footprint. It is also vital to keep a record of the events in question, and when necessary, these event records can be analyzed to analyze the efficiency of the systems, harmful interferences, system failure tendency, etc. By understanding the undesirable situations such as taking the necessary precautions, possible losses can be prevented. In this study, the model developed for fault prediction in systems by performing event log analysis in information systems is explained and the experimental results obtained are given.

2022-09-20

Shaomei, Lv, Xiangyan, Zeng, Long, Huang, Lan, Wu, Wei, Jiang. 2021. Passenger Volume Interval Prediction based on MTIGM (1,1) and BP Neural Network. 2021 33rd Chinese Control and Decision Conference (CCDC). :6013—6018.

The ternary interval number contains more comprehensive information than the exact number, and the prediction of the ternary interval number is more conducive to intelligent decision-making. In order to reduce the overfitting problem of the neural network model, a combination prediction method of the BP neural network and the matrix GM (1, 1) model for the ternary interval number sequence is proposed in the paper, and based on the proposed method to predict the passenger volume. The matrix grey model for the ternary interval number sequence (MTIGM (1, 1)) can stably predict the overall development trend of a time series. Considering the integrity of interval numbers, the BP neural network model is established by combining the lower, middle and upper boundary points of the ternary interval numbers. The combined weights of MTIGM (1, 1) and the BP neural network are determined based on the grey relational degree. The combined method is used to predict the total passenger volume and railway passenger volume of China, and the prediction effect is better than MTIGM (1, 1) and BP neural network.

2022-09-09

Yucheng, Zeng, Yongjiayou, Zeng, Yuhan, Zeng, Ruihan, Tao. 2020. Research on the Evaluation of Supply Chain Financial Risk under the Domination of 3PL Based on BP Neural Network. 2020 2nd International Conference on Economic Management and Model Engineering (ICEMME). :886—893.

The rise of supply chain finance has provided effective assistance to SMEs with financing difficulties. This study mainly explores the financial risk evaluation of supply chain under the leadership of 3PL. According to the risk identification, 27 comprehensive rating indicators were established, and then the model under the BP neural network was constructed through empirical data. The actual verification results show that the model performs very well in risk assessment which helps 3PL companies to better evaluate the business risks of supply chain finance, so as to take more effective risk management measures.

Gonçalves, Luís, Vimieiro, Renato. 2021. Approaching authorship attribution as a multi-view supervised learning task. 2021 International Joint Conference on Neural Networks (IJCNN). :1—8.

Authorship attribution is the problem of identifying the author of texts based on the author's writing style. It is usually assumed that the writing style contains traits inaccessible to conscious manipulation and can thus be safely used to identify the author of a text. Several style markers have been proposed in the literature, nevertheless, there is still no consensus on which best represent the choices of authors. Here we assume an agnostic viewpoint on the dispute for the best set of features that represents an author's writing style. We rather investigate how different sources of information may unveil different aspects of an author's style, complementing each other to improve the overall process of authorship attribution. For this we model authorship attribution as a multi-view learning task. We assess the effectiveness of our proposal applying it to a set of well-studied corpora. We compare the performance of our proposal to the state-of-the-art approaches for authorship attribution. We thoroughly analyze how the multi-view approach improves on methods that use a single data source. We confirm that our approach improves both in accuracy and consistency of the methods and discuss how these improvements are beneficial for linguists and domain specialists.

2022-08-26

Xia, Hongbing, Bao, Jinzhou, Guo, Ping. 2021. Asymptotically Stable Fault Tolerant Control for Nonlinear Systems Through Differential Game Theory. 2021 17th International Conference on Computational Intelligence and Security (CIS). :262—266.

This paper investigates an asymptotically stable fault tolerant control (FTC) method for nonlinear continuous-time systems (NCTS) with actuator failures via differential game theory (DGT). Based on DGT, the FTC problem can be regarded as a two-player differential game problem with control player and fault player, which is solved by utilizing adaptive dynamic programming technique. Using a critic-only neural network, the cost function is approximated to obtain the solution of the Hamilton-Jacobi-Isaacs equation (HJIE). Then, the FTC strategy can be obtained based on the saddle point of HJIE, and ensures the satisfactory control performance for NCTS. Furthermore, the closed-loop NCTS can be guaranteed to be asymptotically stable, rather than ultimately uniformly bounded in corresponding existing methods. Finally, a simulation example is provided to verify the safe and reliable fault tolerance performance of the designed control method.

Spyros, Chatzivasileiadis. 2020. From Decision Trees and Neural Networks to MILP: Power System Optimization Considering Dynamic Stability Constraints. 2020 European Control Conference (ECC). :594–594.

This work introduces methods that unlock a series of applications for decision trees and neural networks in power system optimization. Capturing constraints that were impossible to capture before in a scalable way, we use decision trees (or neural networks) to extract an accurate representation of the non-convex feasible region which is characterized by both algebraic and differential equations. Applying an exact transformation, we convert the information encoded in the decision trees and the neural networks to linear decision rules that we incorporate as conditional constraints in an optimization problem (MILP or MISOCP). Our approach introduces a framework to unify security considerations with electricity market operations, capturing not only steady-state but also dynamic stability constraints in power system optimization, and has the potential to eliminate redispatching costs, leading to savings of millions of euros per year.

Rajan, Mohammad Hasnain, Rebello, Keith, Sood, Yajur, Wankhade, Sunil B.. 2021. Graph-Based Transfer Learning for Conversational Agents. 2021 6th International Conference on Communication and Electronics Systems (ICCES). :1335–1341.

Graphs have proved to be a promising data structure to solve complex problems in various domains. Graphs store data in an associative manner which is analogous to the manner in which humans store memories in the brain. Generathe chatbots lack the ability to recall details revealed by the user in long conversations. To solve this problem, we have used graph-based memory to recall-related conversations from the past. Thus, providing context feature derived from query systems to generative systems such as OpenAI GPT. Using graphs to detect important details from the past reduces the total amount of processing done by the neural network. As there is no need to keep on passingthe entire history of the conversation. Instead, we pass only the last few pairs of utterances and the related details from the graph. This paper deploys this system and also demonstrates the ability to deploy such systems in real-world applications. Through the effective usage of knowledge graphs, the system is able to reduce the time complexity from O(n) to O(1) as compared to similar non-graph based implementations of transfer learning- based conversational agents.

Wadekar, Isha. 2021. Artificial Conversational Agent using Robust Adversarial Reinforcement Learning. 2021 International Conference on Computer Communication and Informatics (ICCCI). :1–7.

Reinforcement learning (R.L.) is an effective and practical means for resolving problems where the broker possesses no information or knowledge about the environment. The agent acquires knowledge that is conditioned on two components: trial-and-error and rewards. An R.L. agent determines an effective approach by interacting directly with the setting and acquiring information regarding the circumstances. However, many modern R.L.-based strategies neglect to theorise considering there is an enormous rift within the simulation and the physical world due to which policy-learning tactics displease that stretches from simulation to physical world Even if design learning is achieved in the physical world, the knowledge inadequacy leads to failed generalization policies from suiting to test circumstances. The intention of robust adversarial reinforcement learning(RARL) is where an agent is instructed to perform in the presence of a destabilizing opponent(adversary agent) that connects impedance to the system. The combined trained adversary is reinforced so that the actual agent i.e. the protagonist is equipped rigorously.

2022-08-12

Gepperth, Alexander, Pfülb, Benedikt. 2021. Image Modeling with Deep Convolutional Gaussian Mixture Models. 2021 International Joint Conference on Neural Networks (IJCNN). :1–9.

In this conceptual work, we present Deep Convolutional Gaussian Mixture Models (DCGMMs): a new formulation of deep hierarchical Gaussian Mixture Models (GMMs) that is particularly suitable for describing and generating images. Vanilla (i.e., flat) GMMs require a very large number of components to describe images well, leading to long training times and memory issues. DCGMMs avoid this by a stacked architecture of multiple GMM layers, linked by convolution and pooling operations. This allows to exploit the compositionality of images in a similar way as deep CNNs do. DCGMMs can be trained end-to-end by Stochastic Gradient Descent. This sets them apart from vanilla GMMs which are trained by Expectation-Maximization, requiring a prior k-means initialization which is infeasible in a layered structure. For generating sharp images with DCGMMs, we introduce a new gradient-based technique for sampling through non-invertible operations like convolution and pooling. Based on the MNIST and FashionMNIST datasets, we validate the DCGMMs model by demonstrating its superiority over flat GMMs for clustering, sampling and outlier detection.

Chen, Wenhu, Gan, Zhe, Li, Linjie, Cheng, Yu, Wang, William, Liu, Jingjing. 2021. Meta Module Network for Compositional Visual Reasoning. 2021 IEEE Winter Conference on Applications of Computer Vision (WACV). :655–664.

Neural Module Network (NMN) exhibits strong interpretability and compositionality thanks to its handcrafted neural modules with explicit multi-hop reasoning capability. However, most NMNs suffer from two critical draw-backs: 1) scalability: customized module for specific function renders it impractical when scaling up to a larger set of functions in complex tasks; 2) generalizability: rigid pre-defined module inventory makes it difficult to generalize to unseen functions in new tasks/domains. To design a more powerful NMN architecture for practical use, we propose Meta Module Network (MMN) centered on a novel meta module, which can take in function recipes and morph into diverse instance modules dynamically. The instance modules are then woven into an execution graph for complex visual reasoning, inheriting the strong explainability and compositionality of NMN. With such a flexible instantiation mechanism, the parameters of instance modules are inherited from the central meta module, retaining the same model complexity as the function set grows, which promises better scalability. Meanwhile, as functions are encoded into the embedding space, unseen functions can be readily represented based on its structural similarity with previously observed ones, which ensures better generalizability. Experiments on GQA and CLEVR datasets validate the superiority of MMN over state-of-the-art NMN designs. Synthetic experiments on held-out unseen functions from GQA dataset also demonstrate the strong generalizability of MMN. Our code and model are released in Github1.