Biblio

List
Filter

Found 105 results

Filters: Keyword is convolutional neural nets [Clear All Filters]

2021-04-08

Zhang, J., Liao, Y., Zhu, X., Wang, H., Ding, J.. 2020. A Deep Learning Approach in the Discrete Cosine Transform Domain to Median Filtering Forensics. IEEE Signal Processing Letters. 27:276—280.

This letter presents a novel median filtering forensics approach, based on a convolutional neural network (CNN) with an adaptive filtering layer (AFL), which is built in the discrete cosine transform (DCT) domain. Using the proposed AFL, the CNN can determine the main frequency range closely related with the operational traces. Then, to automatically learn the multi-scale manipulation features, a multi-scale convolutional block is developed, exploring a new multi-scale feature fusion strategy based on the maxout function. The resultant features are further processed by a convolutional stream with pooling and batch normalization operations, and finally fed into the classification layer with the Softmax function. Experimental results show that our proposed approach is able to accurately detect the median filtering manipulation and outperforms the state-of-the-art schemes, especially in the scenarios of low image resolution and serious compression loss.

Mayer, O., Stamm, M. C.. 2020. Forensic Similarity for Digital Images. IEEE Transactions on Information Forensics and Security. 15:1331—1346.

In this paper, we introduce a new digital image forensics approach called forensic similarity, which determines whether two image patches contain the same forensic trace or different forensic traces. One benefit of this approach is that prior knowledge, e.g., training samples, of a forensic trace is not required to make a forensic similarity decision on it in the future. To do this, we propose a two-part deep-learning system composed of a convolutional neural network-based feature extractor and a three-layer neural network, called the similarity network. This system maps the pairs of image patches to a score indicating whether they contain the same or different forensic traces. We evaluated the system accuracy of determining whether two image patches were captured by the same or different camera model and manipulated by the same or a different editing operation and the same or a different manipulation parameter, given a particular editing operation. Experiments demonstrate applicability to a variety of forensic traces and importantly show efficacy on “unknown” forensic traces that were not used to train the system. Experiments also show that the proposed system significantly improves upon prior art, reducing error rates by more than half. Furthermore, we demonstrated the utility of the forensic similarity approach in two practical applications: forgery detection and localization, and database consistency verification.

Rhee, K. H.. 2020. Composition of Visual Feature Vector Pattern for Deep Learning in Image Forensics. IEEE Access. 8:188970—188980.

In image forensics, to determine whether the image is impurely transformed, it extracts and examines the features included in the suspicious image. In general, the features extracted for the detection of forgery images are based on numerical values, so it is somewhat unreasonable to use in the CNN structure for image classification. In this paper, the extraction method of a feature vector is using a least-squares solution. Treat a suspicious image like a matrix and its solution to be coefficients as the feature vector. Get two solutions from two images of the original and its median filter residual (MFR). Subsequently, the two features were formed into a visualized pattern and then fed into CNN deep learning to classify the various transformed images. A new structure of the CNN net layer was also designed by hybrid with the inception module and the residual block to classify visualized feature vector patterns. The performance of the proposed image forensics detection (IFD) scheme was measured with the seven transformed types of image: average filtered (window size: 3 × 3), gaussian filtered (window size: 3 × 3), JPEG compressed (quality factor: 90, 70), median filtered (window size: 3 × 3, 5 × 5), and unaltered. The visualized patterns are fed into the image input layer of the designed CNN hybrid model. Throughout the experiment, the accuracy of median filtering detection was 98% over. Also, the area under the curve (AUC) by sensitivity (TP: true positive rate) and 1-specificity (FP: false positive rate) results of the proposed IFD scheme approached to `1' on the designed CNN hybrid model. Experimental results show high efficiency and performance to classify the various transformed images. Therefore, the grade evaluation of the proposed scheme is “Excellent (A)”.

2021-03-30

Pyatnisky, I. A., Sokolov, A. N.. 2020. Assessment of the Applicability of Autoencoders in the Problem of Detecting Anomalies in the Work of Industrial Control Systems.. 2020 Global Smart Industry Conference (GloSIC). :234—239.

Deep learning methods are increasingly becoming solutions to complex problems, including the search for anomalies. While fully-connected and convolutional neural networks have already found their application in classification problems, their applicability to the problem of detecting anomalies is limited. In this regard, it is proposed to use autoencoders, previously used only in problems of reducing the dimension and removing noise, as a method for detecting anomalies in the industrial control system. A new method based on autoencoders is proposed for detecting anomalies in the operation of industrial control systems (ICS). Several neural networks based on auto-encoders with different architectures were trained, and the effectiveness of each of them in the problem of detecting anomalies in the work of process control systems was evaluated. Auto-encoders can detect the most complex and non-linear dependencies in the data, and as a result, can show the best quality for detecting anomalies. In some cases, auto-encoders require fewer machine resources.

2021-03-29

Begaj, S., Topal, A. O., Ali, M.. 2020. Emotion Recognition Based on Facial Expressions Using Convolutional Neural Network (CNN). 2020 International Conference on Computing, Networking, Telecommunications Engineering Sciences Applications (CoNTESA). :58—63.

Over the last few years, there has been an increasing number of studies about facial emotion recognition because of the importance and the impact that it has in the interaction of humans with computers. With the growing number of challenging datasets, the application of deep learning techniques have all become necessary. In this paper, we study the challenges of Emotion Recognition Datasets and we also try different parameters and architectures of the Conventional Neural Networks (CNNs) in order to detect the seven emotions in human faces, such as: anger, fear, disgust, contempt, happiness, sadness and surprise. We have chosen iCV MEFED (Multi-Emotion Facial Expression Dataset) as the main dataset for our study, which is relatively new, interesting and very challenging.

Singh, S., Nasoz, F.. 2020. Facial Expression Recognition with Convolutional Neural Networks. 2020 10th Annual Computing and Communication Workshop and Conference (CCWC). :0324—0328.

Emotions are a powerful tool in communication and one way that humans show their emotions is through their facial expressions. One of the challenging and powerful tasks in social communications is facial expression recognition, as in non-verbal communication, facial expressions are key. In the field of Artificial Intelligence, Facial Expression Recognition (FER) is an active research area, with several recent studies using Convolutional Neural Networks (CNNs). In this paper, we demonstrate the classification of FER based on static images, using CNNs, without requiring any pre-processing or feature extraction tasks. The paper also illustrates techniques to improve future accuracy in this area by using pre-processing, which includes face detection and illumination correction. Feature extraction is used to extract the most prominent parts of the face, including the jaw, mouth, eyes, nose, and eyebrows. Furthermore, we also discuss the literature review and present our CNN architecture, and the challenges of using max-pooling and dropout, which eventually aided in better performance. We obtained a test accuracy of 61.7% on FER2013 in a seven-classes classification task compared to 75.2% in state-of-the-art classification.

Pranav, E., Kamal, S., Chandran, C. Satheesh, Supriya, M. H.. 2020. Facial Emotion Recognition Using Deep Convolutional Neural Network. 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS). :317—320.

The rapid growth of artificial intelligence has contributed a lot to the technology world. As the traditional algorithms failed to meet the human needs in real time, Machine learning and deep learning algorithms have gained great success in different applications such as classification systems, recommendation systems, pattern recognition etc. Emotion plays a vital role in determining the thoughts, behaviour and feeling of a human. An emotion recognition system can be built by utilizing the benefits of deep learning and different applications such as feedback analysis, face unlocking etc. can be implemented with good accuracy. The main focus of this work is to create a Deep Convolutional Neural Network (DCNN) model that classifies 5 different human facial emotions. The model is trained, tested and validated using the manually collected image dataset.

John, A., MC, A., Ajayan, A. S., Sanoop, S., Kumar, V. R.. 2020. Real-Time Facial Emotion Recognition System With Improved Preprocessing and Feature Extraction. 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT). :1328—1333.

Human emotion recognition plays a vital role in interpersonal communication and human-machine interaction domain. Emotions are expressed through speech, hand gestures and by the movements of other body parts and through facial expression. Facial emotions are one of the most important factors in human communication that help us to understand, what the other person is trying to communicate. People understand only one-third of the message verbally, and two-third of it is through non-verbal means. There are many face emotion recognition (FER) systems present right now, but in real-life scenarios, they do not perform efficiently. Though there are many which claim to be a near-perfect system and to achieve the results in favourable and optimal conditions. The wide variety of expressions shown by people and the diversity in facial features of different people will not aid in the process of coming up with a system that is definite in nature. Hence developing a reliable system without any flaws showed by the existing systems is a challenging task. This paper aims to build an enhanced system that can analyse the exact facial expression of a user at that particular time and generate the corresponding emotion. Datasets like JAFFE and FER2013 were used for performance analysis. Pre-processing methods like facial landmark and HOG were incorporated into a convolutional neural network (CNN), and this has achieved good accuracy when compared with the already existing models.

Ozdemir, M. A., Elagoz, B., Soy, A. Alaybeyoglu, Akan, A.. 2020. Deep Learning Based Facial Emotion Recognition System. 2020 Medical Technologies Congress (TIPTEKNO). :1—4.

In this study, it was aimed to recognize the emotional state from facial images using the deep learning method. In the study, which was approved by the ethics committee, a custom data set was created using videos taken from 20 male and 20 female participants while simulating 7 different facial expressions (happy, sad, surprised, angry, disgusted, scared, and neutral). Firstly, obtained videos were divided into image frames, and then face images were segmented using the Haar library from image frames. The size of the custom data set obtained after the image preprocessing is more than 25 thousand images. The proposed convolutional neural network (CNN) architecture which is mimics of LeNet architecture has been trained with this custom dataset. According to the proposed CNN architecture experiment results, the training loss was found as 0.0115, the training accuracy was found as 99.62%, the validation loss was 0.0109, and the validation accuracy was 99.71%.

Jia, C., Li, C. L., Ying, Z.. 2020. Facial expression recognition based on the ensemble learning of CNNs. 2020 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC). :1—5.

As a part of body language, facial expression is a psychological state that reflects the current emotional state of the person. Recognition of facial expressions can help to understand others and enhance communication with others. We propose a facial expression recognition method based on convolutional neural network ensemble learning in this paper. Our model is composed of three sub-networks, and uses the SVM classifier to Integrate the output of the three networks to get the final result. The recognition accuracy of the model's expression on the FER2013 dataset reached 71.27%. The results show that the method has high test accuracy and short prediction time, and can realize real-time, high-performance facial recognition.

Xu, X., Ruan, Z., Yang, L.. 2020. Facial Expression Recognition Based on Graph Neural Network. 2020 IEEE 5th International Conference on Image, Vision and Computing (ICIVC). :211—214.

Facial expressions are one of the most powerful, natural and immediate means for human being to present their emotions and intensions. In this paper, we present a novel method for fully automatic facial expression recognition. The facial landmarks are detected for characterizing facial expressions. A graph convolutional neural network is proposed for feature extraction and facial expression recognition classification. The experiments were performed on the three facial expression databases. The result shows that the proposed FER method can achieve good recognition accuracy up to 95.85% using the proposed method.

Al-Janabi, S. I. Ali, Al-Janabi, S. T. Faraj, Al-Khateeb, B.. 2020. Image Classification using Convolution Neural Network Based Hash Encoding and Particle Swarm Optimization. 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI). :1–5.

Image Retrieval (IR) has become one of the main problems facing computer society recently. To increase computing similarities between images, hashing approaches have become the focus of many programmers. Indeed, in the past few years, Deep Learning (DL) has been considered as a backbone for image analysis using Convolutional Neural Networks (CNNs). This paper aims to design and implement a high-performance image classifier that can be used in several applications such as intelligent vehicles, face recognition, marketing, and many others. This work considers experimentation to find the sequential model's best configuration for classifying images. The best performance has been obtained from two layers' architecture; the first layer consists of 128 nodes, and the second layer is composed of 32 nodes, where the accuracy reached up to 0.9012. The proposed classifier has been achieved using CNN and the data extracted from the CIFAR-10 dataset by the inception model, which are called the Transfer Values (TRVs). Indeed, the Particle Swarm Optimization (PSO) algorithm is used to reduce the TRVs. In this respect, the work focus is to reduce the TRVs to obtain high-performance image classifier models. Indeed, the PSO algorithm has been enhanced by using the crossover technique from genetic algorithms. This led to a reduction of the complexity of models in terms of the number of parameters used and the execution time.

Moti, Z., Hashemi, S., Jahromi, A. N.. 2020. A Deep Learning-based Malware Hunting Technique to Handle Imbalanced Data. 2020 17th International ISC Conference on Information Security and Cryptology (ISCISC). :48–53.

Nowadays, with the increasing use of computers and the Internet, more people are exposed to cyber-security dangers. According to antivirus companies, malware is one of the most common threats of using the Internet. Therefore, providing a practical solution is critical. Current methods use machine learning approaches to classify malware samples automatically. Despite the success of these approaches, the accuracy and efficiency of these techniques are still inadequate, especially for multiple class classification problems and imbalanced training data sets. To mitigate this problem, we use deep learning-based algorithms for classification and generation of new malware samples. Our model is based on the opcode sequences, which are given to the model without any pre-processing. Besides, we use a novel generative adversarial network to generate new opcode sequences for oversampling minority classes. Also, we propose the model that is a combination of Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) to classify malware samples. CNN is used to consider short-term dependency between features; while, LSTM is used to consider longer-term dependence. The experiment results show our method could classify malware to their corresponding family effectively. Our model achieves 98.99% validation accuracy.

2021-03-18

Bi, X., Liu, X.. 2020. Chinese Character Captcha Sequential Selection System Based on Convolutional Neural Network. 2020 International Conference on Computer Vision, Image and Deep Learning (CVIDL). :554—559.

To ensure security, Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is widely used in people's online lives. This paper presents a Chinese character captcha sequential selection system based on convolutional neural network (CNN). Captchas composed of English and digits can already be identified with extremely high accuracy, but Chinese character captcha recognition is still challenging. The task we need to complete is to identify Chinese characters with different colors and different fonts that are not on a straight line with rotation and affine transformation on pictures with complex backgrounds, and then perform word order restoration on the identified Chinese characters. We divide the task into several sub-processes: Chinese character detection based on Faster R-CNN, Chinese character recognition and word order recovery based on N-Gram. In the Chinese character recognition sub-process, we have made outstanding contributions. We constructed a single Chinese character data set and built a 10-layer convolutional neural network. Eventually we achieved an accuracy of 98.43%, and completed the task perfectly.

2021-03-09

Lee, T., Chang, L., Syu, C.. 2020. Deep Learning Enabled Intrusion Detection and Prevention System over SDN Networks. 2020 IEEE International Conference on Communications Workshops (ICC Workshops). :1—6.

The Software Defined Network (SDN) provides higher programmable functionality for network configuration and management dynamically. Moreover, SDN introduces a centralized management approach by dividing the network into control and data planes. In this paper, we introduce a deep learning enabled intrusion detection and prevention system (DL-IDPS) to prevent secure shell (SSH) brute-force attacks and distributed denial-of-service (DDoS) attacks in SDN. The packet length in SDN switch has been collected as a sequence for deep learning models to identify anomalous and malicious packets. Four deep learning models, including Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) and Stacked Auto-encoder (SAE), are implemented and compared for the proposed DL-IDPS. The experimental results show that the proposed MLP based DL-IDPS has the highest accuracy which can achieve nearly 99% and 100% accuracy to prevent SSH Brute-force and DDoS attacks, respectively.

Yerima, S. Y., Alzaylaee, M. K.. 2020. Mobile Botnet Detection: A Deep Learning Approach Using Convolutional Neural Networks. 2020 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA). :1—8.

Android, being the most widespread mobile operating systems is increasingly becoming a target for malware. Malicious apps designed to turn mobile devices into bots that may form part of a larger botnet have become quite common, thus posing a serious threat. This calls for more effective methods to detect botnets on the Android platform. Hence, in this paper, we present a deep learning approach for Android botnet detection based on Convolutional Neural Networks (CNN). Our proposed botnet detection system is implemented as a CNN-based model that is trained on 342 static app features to distinguish between botnet apps and normal apps. The trained botnet detection model was evaluated on a set of 6,802 real applications containing 1,929 botnets from the publicly available ISCX botnet dataset. The results show that our CNN-based approach had the highest overall prediction accuracy compared to other popular machine learning classifiers. Furthermore, the performance results observed from our model were better than those reported in previous studies on machine learning based Android botnet detection.

Cui, W., Li, X., Huang, J., Wang, W., Wang, S., Chen, J.. 2020. Substitute Model Generation for Black-Box Adversarial Attack Based on Knowledge Distillation. 2020 IEEE International Conference on Image Processing (ICIP). :648–652.

Although deep convolutional neural network (CNN) performs well in many computer vision tasks, its classification mechanism is very vulnerable when it is exposed to the perturbation of adversarial attacks. In this paper, we proposed a new algorithm to generate the substitute model of black-box CNN models by using knowledge distillation. The proposed algorithm distills multiple CNN teacher models to a compact student model as the substitution of other black-box CNN models to be attacked. The black-box adversarial samples can be consequently generated on this substitute model by using various white-box attacking methods. According to our experiments on ResNet18 and DenseNet121, our algorithm boosts the attacking success rate (ASR) by 20% by training the substitute model based on knowledge distillation.

2021-03-01

Tan, R., Khan, N., Guan, L.. 2020. Locality Guided Neural Networks for Explainable Artificial Intelligence. 2020 International Joint Conference on Neural Networks (IJCNN). :1–8.

In current deep network architectures, deeper layers in networks tend to contain hundreds of independent neurons which makes it hard for humans to understand how they interact with each other. By organizing the neurons by correlation, humans can observe how clusters of neighbouring neurons interact with each other. In this paper, we propose a novel algorithm for back propagation, called Locality Guided Neural Network (LGNN) for training networks that preserves locality between neighbouring neurons within each layer of a deep network. Heavily motivated by Self-Organizing Map (SOM), the goal is to enforce a local topology on each layer of a deep network such that neighbouring neurons are highly correlated with each other. This method contributes to the domain of Explainable Artificial Intelligence (XAI), which aims to alleviate the black-box nature of current AI methods and make them understandable by humans. Our method aims to achieve XAI in deep learning without changing the structure of current models nor requiring any post processing. This paper focuses on Convolutional Neural Networks (CNNs), but can theoretically be applied to any type of deep learning architecture. In our experiments, we train various VGG and Wide ResNet (WRN) networks for image classification on CIFAR100. In depth analyses presenting both qualitative and quantitative results demonstrate that our method is capable of enforcing a topology on each layer while achieving a small increase in classification accuracy.

Taylor, E., Shekhar, S., Taylor, G. W.. 2020. Response Time Analysis for Explainability of Visual Processing in CNNs. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). :1555–1558.

Explainable artificial intelligence (XAI) methods rely on access to model architecture and parameters that is not always feasible for most users, practitioners, and regulators. Inspired by cognitive psychology, we present a case for response times (RTs) as a technique for XAI. RTs are observable without access to the model. Moreover, dynamic inference models performing conditional computation generate variable RTs for visual learning tasks depending on hierarchical representations. We show that MSDNet, a conditional computation model with early-exit architecture, exhibits slower RT for images with more complex features in the ObjectNet test set, as well as the human phenomenon of scene grammar, where object recognition depends on intrascene object-object relationships. These results cast light on MSDNet's feature space without opening the black box and illustrate the promise of RT methods for XAI.

2021-02-23

Al-Emadi, S., Al-Mohannadi, A., Al-Senaid, F.. 2020. Using Deep Learning Techniques for Network Intrusion Detection. 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT). :171—176.

In recent years, there has been a significant increase in network intrusion attacks which raises a great concern from the privacy and security aspects. Due to the advancement of the technology, cyber-security attacks are becoming very complex such that the current detection systems are not sufficient enough to address this issue. Therefore, an implementation of an intelligent and effective network intrusion detection system would be crucial to solve this problem. In this paper, we use deep learning techniques, namely, Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) to design an intelligent detection system which is able to detect different network intrusions. Additionally, we evaluate the performance of the proposed solution using different evaluation matrices and we present a comparison between the results of our proposed solution to find the best model for the network intrusion detection system.

Liu, J., Xiao, K., Luo, L., Li, Y., Chen, L.. 2020. An intrusion detection system integrating network-level intrusion detection and host-level intrusion detection. 2020 IEEE 20th International Conference on Software Quality, Reliability and Security (QRS). :122—129.

With the rapid development of Internet, the issue of cyber security has increasingly gained more attention. An intrusion Detection System (IDS) is an effective technique to defend cyber-attacks and reduce security losses. However, the challenge of IDS lies in the diversity of cyber-attackers and the frequently-changing data requiring a flexible and efficient solution. To address this problem, machine learning approaches are being applied in the IDS field. In this paper, we propose an efficient scalable neural-network-based hybrid IDS framework with the combination of Host-level IDS (HIDS) and Network-level IDS (NIDS). We applied the autoencoders (AE) to NIDS and designed HIDS using word embedding and convolutional neural network. To evaluate the IDS, many experiments are performed on the public datasets NSL-KDD and ADFA. It can detect many attacks and reduce the security risk with high efficiency and excellent scalability.

2021-02-16

Wang, L., Liu, Y.. 2020. A DDoS Attack Detection Method Based on Information Entropy and Deep Learning in SDN. 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). 1:1084—1088.

Software Defined Networking (SDN) decouples the control plane and the data plane and solves the difficulty of new services deployment. However, the threat of a single point of failure is also introduced at the same time. The attacker can launch DDoS attacks towards the controller through switches. In this paper, a DDoS attack detection method based on information entropy and deep learning is proposed. Firstly, suspicious traffic can be inspected through information entropy detection by the controller. Then, fine-grained packet-based detection is executed by the convolutional neural network (CNN) model to distinguish between normal traffic and attack traffic. Finally, the controller performs the defense strategy to intercept the attack. The experiments indicate that the accuracy of this method reaches 98.98%, which has the potential to detect DDoS attack traffic effectively in the SDN environment.

2021-02-08

Zhang, J.. 2020. DeepMal: A CNN-LSTM Model for Malware Detection Based on Dynamic Semantic Behaviours. 2020 International Conference on Computer Information and Big Data Applications (CIBDA). :313–316.

Malware refers to any software accessing or being installed in a system without the authorisation of administrators. Various malware has been widely used for cyber-criminals to accomplish their evil intentions and goals. To combat the increasing amount and reduce the threat of malicious programs, a novel deep learning framework, which uses NLP techniques for reference, combines CNN and LSTM neurones to capture the locally spatial correlations and learn from sequential longterm dependency is proposed. Hence, high-level abstractions and representations are automatically extracted for the malware classification task. The classification accuracy improves from 0.81 (best one by Random Forest) to approximately 1.0.

2021-02-01

Wang, H., Li, Y., Wang, Y., Hu, H., Yang, M.-H.. 2020. Collaborative Distillation for Ultra-Resolution Universal Style Transfer. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). :1857–1866.

Universal style transfer methods typically leverage rich representations from deep Convolutional Neural Network (CNN) models (e.g., VGG-19) pre-trained on large collections of images. Despite the effectiveness, its application is heavily constrained by the large model size to handle ultra-resolution images given limited memory. In this work, we present a new knowledge distillation method (named Collaborative Distillation) for encoder-decoder based neural style transfer to reduce the convolutional filters. The main idea is underpinned by a finding that the encoder-decoder pairs construct an exclusive collaborative relationship, which is regarded as a new kind of knowledge for style transfer models. Moreover, to overcome the feature size mismatch when applying collaborative distillation, a linear embedding loss is introduced to drive the student network to learn a linear embedding of the teacher's features. Extensive experiments show the effectiveness of our method when applied to different universal style transfer approaches (WCT and AdaIN), even if the model size is reduced by 15.5 times. Especially, on WCT with the compressed models, we achieve ultra-resolution (over 40 megapixels) universal style transfer on a 12GB GPU for the first time. Further experiments on optimization-based stylization scheme show the generality of our algorithm on different stylization paradigms. Our code and trained models are available at https://github.com/mingsun-tse/collaborative-distillation.

Jiang, H., Du, M., Whiteside, D., Moursy, O., Yang, Y.. 2020. An Approach to Embedding a Style Transfer Model into a Mobile APP. 2020 International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE). :307–316.

The prevalence of photo processing apps suggests the demands of picture editing. As an implementation of the convolutional neural network, style transfer has been deep investigated and there are supported materials to realize it on PC platform. However, few approaches are mentioned to deploy a style transfer model on the mobile and meet the requirements of mobile users. The traditional style transfer model takes hours to proceed, therefore, based on a Perceptual Losses algorithm [1], we created a feedforward neural network for each style and the proceeding time was reduced to a few seconds. The training data were generated from a pre-trained convolutional neural network model, VGG-19. The algorithm took thousandth time and generated similar output as the original. Furthermore, we optimized the model and deployed the model with TensorFlow Mobile library. We froze the model and adopted a bitmap to scale the inputs to 720×720 and reverted back to the original resolution. The reverting process may create some blur but it can be regarded as a feature of art. The generated images have reliable quality and the waiting time is independent of the content and pattern of input images. The main factor that influences the proceeding time is the input resolution. The average waiting time of our model on the mobile phone, HUAWEI P20 Pro, is less than 2 seconds for 720p images and around 2.8 seconds for 1080p images, which are ten times slower than that on the PC GPU, Tesla T40. The performance difference depends on the architecture of the model.