Biblio
Filters: Keyword is Training
A Fog-Augmented Machine Learning based SMS Spam Detection and Classification System. 2020 Fifth International Conference on Fog and Mobile Edge Computing (FMEC). :325–330.
2020. Smart cities and societies are driving unprecedented technological and socioeconomic growth in everyday life, albeit making us increasingly vulnerable to infinitely and incomprehensibly diverse threats. Short Message Service (SMS) spam is one such threat that can affect mobile security by propagating malware on mobile devices. A security breach could also cause a mobile device to send spam messages. Many works have focused on classifying incoming SMS messages. This paper proposes a tool to detect spam from outgoing SMS messages, although the work can be applied to both incoming and outgoing SMS messages. Specifically, we develop a system that comprises multiple machine learning (ML) based classifiers built by us using three classification methods – Naïve Bayes (NB), Support Vector Machine (SVM), and Naïve Bayes Multinomial (NBM) – and five preprocessing and feature extraction methods. The system is built to allow its execution in cloud, fog or edge layers, and is evaluated using 15 datasets built from 4 widely used public SMS datasets. The system detects spam SMSs and gives recommendations on the spam filters and classifiers to be used based on user preferences including classification accuracy, True Negatives (TN), and computational resource requirements.
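To make the multinomial Naïve Bayes idea in the abstract above concrete, here is a minimal sketch in plain Python. It is not the authors' system: the whitespace tokenizer, the add-one smoothing, and the four toy messages are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple preprocessor)."""
    return text.lower().split()


def train_multinomial_nb(messages, labels):
    """Count documents and words per class; return the fitted model as a dict."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return {"class_counts": class_counts, "word_counts": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the class with the highest log-posterior under add-one smoothing."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokenize(text):
            if token not in model["vocab"]:
                continue  # ignore words never seen in training
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, invented for illustration only.
TRAIN_MESSAGES = [
    "win a free prize now",
    "free cash claim now",
    "are we meeting for lunch today",
    "see you at lunch",
]
TRAIN_LABELS = ["spam", "spam", "ham", "ham"]
MODEL = train_multinomial_nb(TRAIN_MESSAGES, TRAIN_LABELS)
```

Calling `predict(MODEL, "claim your free prize")` scores the message under each class and returns the more likely label. A real deployment would need the preprocessing and feature extraction steps the abstract mentions, which are not specified there.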
Forensic Similarity for Digital Images. IEEE Transactions on Information Forensics and Security. 15:1331–1346.
2020. In this paper, we introduce a new digital image forensics approach called forensic similarity, which determines whether two image patches contain the same forensic trace or different forensic traces. One benefit of this approach is that prior knowledge, e.g., training samples, of a forensic trace is not required to make a forensic similarity decision on it in the future. To do this, we propose a two-part deep-learning system composed of a convolutional neural network-based feature extractor and a three-layer neural network, called the similarity network. This system maps the pairs of image patches to a score indicating whether they contain the same or different forensic traces. We evaluated the system accuracy of determining whether two image patches were captured by the same or different camera model and manipulated by the same or a different editing operation and the same or a different manipulation parameter, given a particular editing operation. Experiments demonstrate applicability to a variety of forensic traces and importantly show efficacy on “unknown” forensic traces that were not used to train the system. Experiments also show that the proposed system significantly improves upon prior art, reducing error rates by more than half. Furthermore, we demonstrated the utility of the forensic similarity approach in two practical applications: forgery detection and localization, and database consistency verification.
A Fully-integrated Gesture and Gait Processing SoC for Rehabilitation with ADC-less Mixed-signal Feature Extraction and Deep Neural Network for Classification and Online Training. 2020 IEEE Custom Integrated Circuits Conference (CICC). :1–4.
2020. An ultra-low-power gesture and gait classification SoC is presented for rehabilitation applications, featuring (1) mixed-signal feature extraction and an integrated low-noise amplifier, eliminating the expensive ADC and digital feature extraction, (2) an integrated distributed deep neural network (DNN) ASIC supporting a scalable multi-chip neural network for sensor fusion with distortion resiliency for low-cost front-end modules, and (3) on-chip learning in the DNN engine, allowing in-situ training of user-specific operations. A 12-channel 65nm CMOS test chip was fabricated with 1μW power per channel, less than 3ms computation latency, on-chip training for a user-specific DNN model, and multi-chip networking capability.
Hardware Trojan Detection Based on SRC. 2020 35th Youth Academic Annual Conference of Chinese Association of Automation (YAC). :472–475.
2020. The security of integrated circuits (ICs) plays a significant role in the military, economy, communication and other industries. Due to the globalization of the IC supply chain from design to manufacturing, an IC chip is vulnerable to the implantation of a malicious circuit, known as a hardware Trojan (HT). When the HT is activated, it can modify the functionality and reduce the reliability of the IC, and even leak confidential information about the system, seriously threatening national security. HT detection theory and methods are a hotspot in integrated circuit security. However, most methods focus on simulated data. Moreover, measurement data from real circuits are greatly affected by measurement noise and process disturbances, and few methods remain effective when the Trojan circuit is small. In this paper, the detection problem is cast as signal representation among multiple linear regression models, and a sparse representation-based classifier (SRC) is applied to Trojan detection for the first time. We assume that the training samples from a single class lie on a subspace, and that the test samples can be represented by that class. The proposed SRC-based HT detection method shows high accuracy and efficiency on real integrated circuits.
Hash Retrieval Method for Recaptured Images Based on Convolutional Neural Network. 2020 2nd World Symposium on Artificial Intelligence (WSAI). :79–83.
2020. For the purpose of outdoor advertising market research, ad images are recaptured and uploaded every day for statistics. However, the quality of the recaptured advertising images is often affected by conditions such as angle, distance, and light during shooting, which consequently reduces either the speed or the accuracy of the retrieval algorithm. In this paper, we propose a hash retrieval method based on convolutional neural networks for recaptured images. The basic idea is to add a hash layer to the convolutional neural network and then extract the binary hash code output by the hash layer to perform image retrieval in low-dimensional Hamming space. Experimental results show that the retrieval performance is improved compared with the current commonly used hash retrieval methods.
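The retrieval step described above (binarize the hash-layer output, then search by Hamming distance) can be sketched as follows. This is an illustrative sketch only: the 0.5 threshold, the 8-bit code length, and the database entries are assumptions, and the hash-layer activations would in practice come from the trained network.

```python
def binarize(activations, threshold=0.5):
    """Turn real-valued hash-layer outputs into a binary code packed into an int."""
    code = 0
    for value in activations:
        code = (code << 1) | (1 if value >= threshold else 0)
    return code


def hamming_distance(code_a, code_b):
    """Number of bit positions where the two codes differ."""
    return bin(code_a ^ code_b).count("1")


def retrieve(query_code, database, top_k=1):
    """Return the top_k (name, distance) pairs closest to the query in Hamming space."""
    ranked = sorted(
        ((name, hamming_distance(query_code, code)) for name, code in database.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:top_k]


# Hypothetical hash-layer outputs for three stored advertising images.
DATABASE = {
    "ad_a": binarize([0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.9, 0.1]),
    "ad_b": binarize([0.1, 0.9, 0.2, 0.8, 0.1, 0.9, 0.2, 0.8]),
    "ad_c": binarize([0.9, 0.9, 0.1, 0.1, 0.9, 0.9, 0.1, 0.1]),
}
```

Because the codes are short bit strings, comparing a query against the database reduces to XOR and bit counting, which is the reason hashing is attractive for large image collections.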
A Hierarchical Fine-Tuning Based Approach for Multi-Label Text Classification. 2020 IEEE 5th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA). :51–54.
2020. Hierarchical Text classification has recently become increasingly challenging with the growing number of classification labels. In this paper, we propose a hierarchical fine-tuning based approach for hierarchical text classification. We use the ordered neurons LSTM (ONLSTM) model by combining the embedding of text and parent category for hierarchical text classification with a large number of categories, which makes full use of the connection between the upper-level and lower-level labels. Extensive experiments show that our model outperforms the state-of-the-art hierarchical model at a lower computation cost.
High Accuracy Phishing Detection Based on Convolutional Neural Networks. 2020 3rd International Conference on Computer Applications & Information Security (ICCAIS). :1–6.
2020. The persistent growth in phishing and the rising volume of phishing websites have led to individuals and organizations worldwide becoming increasingly exposed to various cyber-attacks. Consequently, more effective phishing detection is required for improved cyber defence. Hence, in this paper we present a deep learning-based approach to enable high accuracy detection of phishing sites. The proposed approach utilizes convolutional neural networks (CNN) for high accuracy classification to distinguish genuine sites from phishing sites. We evaluate the models using a dataset obtained from 6,157 genuine and 4,898 phishing websites. Based on the results of extensive experiments, our CNN-based models proved to be highly effective in detecting unknown phishing sites. Furthermore, the CNN-based approach performed better than traditional machine learning classifiers evaluated on the same dataset, reaching a 98.2% phishing detection rate with an F1-score of 0.976. The method presented in this paper compares favourably to the state of the art in deep learning based phishing website detection.
HIGhER: Improving instruction following with Hindsight Generation for Experience Replay. 2020 IEEE Symposium Series on Computational Intelligence (SSCI). :225–232.
2020. Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality. While these characterizations may foster instructing, conditioning or structuring interactive agent behavior, it remains an open problem to correctly relate language understanding and reinforcement learning in even simple instruction following scenarios. This joint learning problem is alleviated through expert demonstrations, auxiliary losses, or neural inductive biases. In this paper, we propose an orthogonal approach called Hindsight Generation for Experience Replay (HIGhER) that extends the Hindsight Experience Replay approach to the language-conditioned policy setting. Whenever the agent does not fulfill its instruction, HIGhER learns to output a new directive that matches the agent trajectory, and it relabels the episode with a positive reward. To do so, HIGhER learns to map a state into an instruction by using past successful trajectories, which removes the need to have external expert interventions to relabel episodes as in vanilla HER. We show the efficiency of our approach in the BabyAI environment, and demonstrate how it complements other instruction following methods.
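The relabeling loop described in this abstract can be sketched roughly as below. This is a simplified illustration, not the HIGhER implementation: the learned state-to-instruction model is replaced by a hypothetical lookup table (`LookupInstructionGenerator`) that memorizes the final states of successful episodes, and the episode format and reward values are assumptions.

```python
def label_transitions(instruction, transitions, final_reward):
    """Attach an instruction and a sparse reward (final step only) to each transition."""
    last = len(transitions) - 1
    return [
        (instruction, state, action, final_reward if i == last else 0.0, next_state)
        for i, (state, action, next_state) in enumerate(transitions)
    ]


class LookupInstructionGenerator:
    """Toy stand-in for a learned state-to-instruction model: memorizes successes."""

    def __init__(self):
        self.table = {}

    def fit(self, final_state, instruction):
        self.table[final_state] = instruction

    def generate(self, final_state):
        return self.table.get(final_state)  # None if the state was never seen


def process_episode(episode, generator, replay_buffer):
    """Store an episode; on failure, also store a hindsight-relabeled copy."""
    transitions = episode["transitions"]
    final_state = transitions[-1][2]
    if episode["success"]:
        # Successful episodes are the training data for the generator.
        generator.fit(final_state, episode["instruction"])
        replay_buffer.extend(label_transitions(episode["instruction"], transitions, 1.0))
        return
    # Failed episode: keep it with zero reward under the original instruction ...
    replay_buffer.extend(label_transitions(episode["instruction"], transitions, 0.0))
    # ... and, if possible, relabel it with an instruction the trajectory did fulfil.
    hindsight = generator.generate(final_state)
    if hindsight is not None:
        replay_buffer.extend(label_transitions(hindsight, transitions, 1.0))
```

The point of the design is that no external expert is needed to name what the agent actually achieved: the generator is trained only on the agent's own successful trajectories.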
A Hybrid Feature Extraction Network for Intrusion Detection Based on Global Attention Mechanism. 2020 International Conference on Computer Information and Big Data Applications (CIBDA). :481–485.
2020. The widespread application of 5G will make intrusion detection of large-scale network traffic a necessity. However, traditional intrusion detection, which relies on manually extracted features, cannot meet the requirements, and the existing AI methods are also relatively inefficient. Therefore, when performing intrusion detection tasks, they suffer from high false alarm rates and low recognition performance. To address this challenge, this paper proposes a novel hybrid network, RULA-IDS, which performs intrusion detection using large amounts of statistical data from the network monitoring system. RULA-IDS consists of the fully connected layer, the feature extraction layer, the global attention mechanism layer and the SVM classification layer. In the feature extraction layer, the residual U-Net and LSTM are used to extract the spatial and temporal features of the network traffic attributes. It is worth noting that we modified the structure of U-Net to suit the intrusion detection task. The global attention mechanism layer is then used to selectively retain important information from a large number of features and focus on it. Finally, the SVM is used as a classifier to output results. The experimental results show that our method outperforms existing state-of-the-art intrusion detection methods: the training and testing accuracies improve to 97.01% and 98.19%, respectively, and the method shows stronger robustness during training and testing.
Identification of Computer Displays Through Their Electromagnetic Emissions Using Support Vector Machines. 2020 International Conference on INnovations in Intelligent SysTems and Applications (INISTA). :1–5.
2020. As a TEMPEST information security problem, electromagnetic emissions from computer displays can be captured and reconstructed using signal processing techniques. It is necessary to identify the display type in order to intercept the image of the display. Determining the display type is significant not only for attackers but also for defenders seeking to prevent compromising emanations from displays. This study addresses the identification of the display type from the electromagnetic emissions of computer displays using Support Vector Machines (SVM). After measuring the emissions using a receiver measurement system, the signals were processed, training and test data sets were formed, and the classification performance for the displays was examined with the SVM. Moreover, solutions for better classification under real conditions have been proposed. Thus, one of the important steps of display image capture can be accomplished by automatically identifying the display type. The performance of the proposed method was evaluated in terms of the confusion matrix and the accuracy, precision, recall, and F1-score performance measures.
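For readers unfamiliar with the evaluation measures named above, the following sketch shows how accuracy, precision, recall, and F1-score are derived from a confusion matrix. The class names and labels in the usage note are hypothetical; the abstract does not list the display types or report the matrix itself.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def per_class_metrics(matrix, class_index):
    """Return (precision, recall, f1) for one class, treating it as the positive class."""
    tp = matrix[class_index][class_index]
    fp = sum(row[class_index] for row in matrix) - tp  # predicted as class, but wrong
    fn = sum(matrix[class_index]) - tp                 # true class, but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0
```

For example, with two hypothetical classes `["display_a", "display_b"]`, true labels `a, a, a, b, b` and predictions `a, a, b, b, a`, the matrix is `[[2, 1], [1, 1]]` and the accuracy is 3/5.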
Identifying Vulnerable IoT Applications Using Deep Learning. 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER). :582–586.
2020. This paper presents an approach for the identification of vulnerable IoT applications using deep learning algorithms. The approach focuses on a category of vulnerabilities that leads to sensitive information leakage, which can be identified using taint flow analysis. First, we analyze the source code of IoT apps in order to recover tokens along with their frequencies and tainted flows. Second, we develop Token2Vec, which transforms the source code tokens into vectors. We have also developed Flow2Vec, which transforms the identified tainted flows into vectors. Third, we use the recovered vectors to train a deep learning algorithm to build a model for the identification of tainted apps. We have evaluated the approach on two datasets, and the experiments show that combining tainted-flow features with the base benchmark, which uses token frequencies only, improves the accuracy of the prediction models from 77.78% to 92.59% for Corpus1 and from 61.11% to 87.03% for Corpus2.
Image Classification using Convolution Neural Network Based Hash Encoding and Particle Swarm Optimization. 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI). :1–5.
2020. Image Retrieval (IR) has recently become one of the main problems facing the computing community. To improve the computation of similarities between images, hashing approaches have become the focus of many programmers. In the past few years, Deep Learning (DL) has been considered a backbone for image analysis using Convolutional Neural Networks (CNNs). This paper aims to design and implement a high-performance image classifier that can be used in several applications such as intelligent vehicles, face recognition, marketing, and many others. This work uses experimentation to find the sequential model's best configuration for classifying images. The best performance was obtained from a two-layer architecture in which the first layer consists of 128 nodes and the second layer of 32 nodes, with accuracy reaching 0.9012. The proposed classifier was built using a CNN and the data extracted from the CIFAR-10 dataset by the inception model, which are called the Transfer Values (TRVs). The Particle Swarm Optimization (PSO) algorithm is used to reduce the TRVs. In this respect, the focus of the work is to reduce the TRVs to obtain high-performance image classifier models. The PSO algorithm was enhanced by using the crossover technique from genetic algorithms. This led to a reduction in the complexity of the models in terms of the number of parameters used and the execution time.
Improving DGA-Based Malicious Domain Classifiers for Malware Defense with Adversarial Machine Learning. 2020 IEEE 4th Conference on Information Communication Technology (CICT). :1–6.
2020. Domain Generation Algorithms (DGAs) are used by adversaries to establish Command and Control (C&C) server communications during cyber attacks. Blacklists of known/identified C&C domains are used as one of the defense mechanisms. However, static blacklists generated by signature-based approaches can neither keep up nor detect never-seen-before malicious domain names. To address this weakness, we applied a DGA-based malicious domain classifier using the Long Short-Term Memory (LSTM) method with a novel feature engineering technique. Our model's performance shows a greater accuracy compared to a previously reported model. Additionally, we propose a new adversarial machine learning-based method to generate never-before-seen malware-related domain families. We augment the training dataset with new samples to make the training of the models more effective in detecting never-before-seen malicious domain names. To protect blacklists of malicious domain names against adversarial access and modifications, we devise secure data containers to store and transfer blacklists.
Intelligent SDN Traffic Classification Using Deep Learning: Deep-SDN. 2020 2nd International Conference on Computer Communication and the Internet (ICCCI). :184–189.
2020. Accurate traffic classification is fundamentally important for various network activities such as fine-grained network management and resource utilisation. Port-based approaches, deep packet inspection and machine learning are widely used techniques to classify and analyze network traffic flows. However, over the past several years, the growth of Internet traffic has been explosive due to the greatly increased number of Internet users. Therefore, both port-based and deep packet inspection approaches have become inefficient due to the exponential growth of Internet applications, which incurs high computational cost. The emerging paradigm of software-defined networking has reshaped the network architecture by detaching the control plane from the data plane, resulting in a centralised network controller that maintains a global view over the whole network in its domain. In this paper, we propose a new deep learning model for software-defined networks, called Deep-SDN, that can accurately identify a wide range of traffic applications in a short time. The performance of the proposed model was compared against the state of the art, and better results were reported in terms of accuracy, precision, recall, and f-measure. It was found that an overall accuracy of 96% can be achieved with the proposed model. Based on the obtained results, some further directions are suggested towards achieving further advances in this research area.
LSTM-based Frequency Hopping Sequence Prediction. 2020 International Conference on Wireless Communications and Signal Processing (WCSP). :472–477.
2020. The continuous change of communication frequency brings difficulties to the reconnaissance and prediction of non-cooperative communication. The core of this communication process is the frequency-hopping (FH) sequence with pseudo-random characteristics, which controls carrier frequency hopping. However, the FH sequence is always generated by a certain model and is a kind of time sequence with certain regularity. The Long Short-Term Memory (LSTM) neural network in deep learning has been proven to have a strong ability to solve time series problems. Therefore, in this paper, we establish an LSTM model to implement FH sequence prediction. The simulation results show that the LSTM-based scheme can effectively predict frequency point by point based on historical HF frequency data. Further, we achieve frequency interval prediction based on frequency point prediction.
Machine Learning Based Recommendation System. 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence). :660–664.
2020. A recommender system helps people in decision making by asking for their preferences about various items and recommending other items that they have not yet rated and that match their taste. A traditional recommendation system aims at generating a set of recommendations based on inter-user similarity that will satisfy the target user. Positive preferences as well as negative preferences of the users are taken into account so as to find strongly related users. Weighted entropy is used as a similarity measure to determine users with similar taste. The target user is asked to fill in the ratings so as to identify the closely related users from the knowledge base, and top N recommendations are produced accordingly. Results show a considerable improvement in accuracy after using weighted entropy and opposite preferences as a similarity measure.
Malware Detection for Industrial Internet Based on GAN. 2020 IEEE International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA). 1:475–481.
2020. This paper focuses on the detection of malware in the industrial Internet. The basic flow of malware detection consists of feature extraction and sample identification. The API graph can effectively represent the behavior information of malware. However, due to the high algorithmic complexity of solving the subgraph isomorphism problem, the efficiency of analysis based on graph structure features is low. Because the API graphs of different malicious codes differ in scale, the API graph needs to be normalized. Considering the difficulties of sample collection and manual labeling, it is necessary to expand the number of malware samples in the industrial Internet. This paper proposes a method that combines PageRank with TF-IDF to process the API graph. In addition, this paper proposes a method to construct adversarial malware samples based on GAN.
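The abstract does not say how PageRank and TF-IDF are combined, so the sketch below covers only the PageRank half: a plain power iteration over a small directed API call graph. The graph, the damping factor of 0.85, and the iteration count are assumptions made for illustration and are not taken from the paper.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict mapping node -> list of successor nodes."""
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in graph.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical API call graph: an edge means "this call is followed by that call".
API_GRAPH = {
    "CreateFile": ["WriteFile"],
    "RegOpenKey": ["WriteFile"],
    "WriteFile": ["CloseHandle"],
    "CloseHandle": [],
}
API_RANKS = pagerank(API_GRAPH)
```

The scores sum to one regardless of graph size, which is one reason PageRank-style weights are convenient when graphs of very different scales have to be compared.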
A Malware Similarity Analysis Method Based on Network Control Structure Graph. 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS). :295–300.
2020. Recently, graph-based malware similarity analysis has been widely used in the field of malware detection. However, the wide application of code obfuscation, polymorphism, and deformation changes the structure of malicious code, which brings great challenges to malware similarity analysis. To solve these problems, in this paper, we present a new approach to malware similarity analysis based on the network control structure graph (NCSG). This method analyzes the behavior of malware through application program interface (API) association and constructs the NCSG. The graph reflects the command-and-control (C&C) logic of the malware and can therefore resist the interference of code obfuscation technology. The structural features extracted from the NCSG are used as the basis of similarity analysis for training the detection model. Finally, we tested on a dataset constructed from samples of five known malware families, and the experimental results showed that the accuracy of this method for malware variation analysis reached 92.75%. In conclusion, malware similarity analysis based on NCSG has strong application value for identifying malware of the same family.
A New Black Box Attack Generating Adversarial Examples Based on Reinforcement Learning. 2020 Information Communication Technologies Conference (ICTC). :141–146.
2020. Machine learning can be misled by adversarial examples, which are formed by making small changes to the original data. Nowadays, there are many methods for producing adversarial examples. However, they cannot simultaneously apply to non-differentiable models, reduce the amount of calculation, and shorten the sample generation time. In this paper, we propose a new black-box attack that generates adversarial examples based on reinforcement learning. By using a deep Q-learning network, we can train the substitute model and generate adversarial examples at the same time. Experimental results show that this method needs only 7.7 ms to produce an adversarial example, which addresses the problems of low efficiency, heavy computation, and inapplicability to non-differentiable models.
Noise Reduction Framework for Distantly Supervised Relation Extraction with Human in the Loop. 2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC). :1–4.
2020. Distant supervision is a widely used data labeling method for relation extraction. While aligning a knowledge base with the corpus, distant supervision produces a mass of wrong labels, which are defined as noise. The pattern-based denoising model has achieved great progress in selecting trustable sentences (instances). However, writing relation-specific patterns relies heavily on expert knowledge and is labor-intensive. To solve these problems, we propose a noise reduction framework, NOIR, that iteratively selects trustable sentences with a little help from a human. Under the guidance of experts, the iterative process can avoid semantic drift. In addition, NOIR can help experts discover relation-specific tokens that are hard to think of. Experimental results on three real-world datasets show the effectiveness of the proposed method compared with state-of-the-art methods.
OC-FakeDect: Classifying Deepfakes Using One-class Variational Autoencoder. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). :2794–2803.
2020. An image forgery method called Deepfakes can cause security and privacy issues by changing the identity of a person in a photo through the replacement of his/her face with a computer-generated image or another person's face. Therefore, a new challenge of detecting Deepfakes arises to protect individuals from potential misuses. Many researchers have proposed various binary-classification based detection approaches to detect deepfakes. However, binary-classification based methods generally require a large amount of both real and fake face images for training, and it is challenging to collect sufficient fake image data in advance. Besides, when new deepfakes generation methods are introduced, little deepfakes data will be available, and the detection performance may be mediocre. To overcome these data scarcity limitations, we formulate deepfakes detection as a one-class anomaly detection problem. We propose OC-FakeDect, which uses a one-class Variational Autoencoder (VAE) to train only on real face images and detects non-real images such as deepfakes by treating them as anomalies. Our preliminary result shows that our one-class-based approach can be promising when detecting Deepfakes, achieving a 97.5% accuracy on the NeuralTextures data of the well-known FaceForensics++ benchmark dataset without using any fake images for the training process.
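The one-class formulation above can be illustrated with a small thresholding sketch. It assumes, without confirmation from the abstract, that the anomaly score is a reconstruction error (here a root-mean-square error) and that the decision threshold is set at the mean plus `k` standard deviations of scores measured on real images only; the VAE itself is not implemented, and all numbers in the usage note are invented.

```python
import math
import statistics


def reconstruction_error(original, reconstructed):
    """Root-mean-square error between an input and its reconstruction."""
    squared = [(a - b) ** 2 for a, b in zip(original, reconstructed)]
    return math.sqrt(sum(squared) / len(squared))


def fit_threshold(real_scores, k=3.0):
    """Threshold = mean + k * standard deviation of scores seen on real images only."""
    return statistics.mean(real_scores) + k * statistics.pstdev(real_scores)


def is_anomaly(score, threshold):
    """Flag an image as non-real when its score exceeds the real-only threshold."""
    return score > threshold
```

For instance, with real-image scores `[0.10, 0.12, 0.11, 0.09, 0.13]` the threshold is about 0.152, so a score of 0.40 would be flagged while 0.12 would not. No fake images are needed to set the threshold, which mirrors the data-scarcity motivation stated in the abstract.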
Polymorphic Adversarial DDoS attack on IDS using GAN. 2020 International Symposium on Networks, Computers and Communications (ISNCC). :1–6.
2020. Intrusion Detection Systems (IDS) are important tools in preventing malicious traffic from penetrating into networks and systems. Recently, Intrusion Detection Systems have been rapidly enhancing their detection capabilities using machine learning algorithms. However, these algorithms are vulnerable to new unknown types of attacks that can evade machine learning IDS. In particular, they may be vulnerable to attacks based on Generative Adversarial Networks (GAN). GANs have been widely used in domains such as image processing and natural language processing to generate adversarial data of different types such as graphics, videos, texts, etc. We propose a model using GAN to generate adversarial DDoS attacks that can change the attack profile and remain undetected. Our simulation results indicate that by continuously changing the attack profile, defensive systems that use incremental learning will still be vulnerable to new attacks.
A Practical Black-Box Attack Against Autonomous Speech Recognition Model. GLOBECOM 2020 - 2020 IEEE Global Communications Conference. :1–6.
2020. With the wide application of machine learning (ML) technology, automatic speech recognition (ASR) has made great progress in recent years. Despite its great potential, there are various evasion attacks on ML-based ASR, which could affect the security of applications built upon ASR. Up to now, most studies have focused on white-box attacks on ASR, and almost no attention has been paid to black-box attacks in the audio domain, where attackers can only query the target model to get output labels rather than probability vectors. In this paper, we propose an evasion attack against ASR in the above-mentioned situation, which is more feasible in realistic scenarios. Specifically, we first train a substitute model using data augmentation, which ensures that we have enough training samples while querying the target model only a small number of times. Then, based on the substitute model, we apply the Differential Evolution (DE) algorithm to craft adversarial examples and implement a black-box attack against ASR models from the Speech Commands dataset. Extensive experiments are conducted, and the results illustrate that our approach achieves untargeted attacks with over 70% success rate while still maintaining the authenticity of the original data well.
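Differential Evolution, the optimizer named in this abstract, can be sketched in a few lines. The version below is the common DE/rand/1/bin variant applied to a toy objective (`sphere`); in the paper's setting the objective would instead score a perturbed audio sample against the substitute model, which is not reproduced here. The population size, mutation factor, crossover rate, and seed are assumptions.

```python
import random


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def differential_evolution(objective, bounds, pop_size=20, generations=200,
                           mutation=0.8, crossover=0.9, seed=0):
    """Minimize `objective` over box `bounds` with the DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dims = len(bounds)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(ind) for ind in population]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dims)  # at least one dimension comes from the mutant
            trial = []
            for d in range(dims):
                if d == forced or rng.random() < crossover:
                    value = population[a][d] + mutation * (population[b][d] - population[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip back into the search box
                else:
                    value = population[i][d]
                trial.append(value)
            trial_fitness = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_fitness <= fitness[i]:
                population[i], fitness[i] = trial, trial_fitness
    best = min(range(pop_size), key=fitness.__getitem__)
    return population[best], fitness[best]
```

DE needs only objective values, not gradients, which is why it suits black-box settings where the attacker cannot differentiate through the target model.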
Spectrum Occupancy Prediction Exploiting Time and Frequency Correlations Through 2D-LSTM. 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring). :1–5.
2020. The identification of spectrum opportunities is a pivotal requirement for efficient spectrum utilization in cognitive radio systems. Spectrum prediction offers a convenient means for revealing such opportunities based on the previously obtained occupancies. As spectrum occupancy states are correlated over time, spectrum prediction is often cast as a predictable time-series process using classical or deep learning-based models. However, this variety of methods exploits time-domain correlation and overlooks the existing correlation over frequency. In this paper, unlike previous works, we investigate a more realistic scenario by exploiting correlation over time and frequency through a 2D-long short-term memory (LSTM) model. Extensive experimental results show a performance improvement over conventional spectrum prediction methods in terms of accuracy and computational complexity. These observations are validated on real-world spectrum measurements in the 832–862 MHz frequency range, where most of the telecom operators in Turkey have private uplink bands.
Is Spiking Secure? A Comparative Study on the Security Vulnerabilities of Spiking and Deep Neural Networks. 2020 International Joint Conference on Neural Networks (IJCNN). :1–8.
2020. Spiking Neural Networks (SNNs) claim to present many advantages in terms of biological plausibility and energy efficiency compared to standard Deep Neural Networks (DNNs). Recent works have shown that DNNs are vulnerable to adversarial attacks, i.e., small perturbations added to the input data can lead to targeted or random misclassifications. In this paper, we aim at investigating the key research question: "Are SNNs secure?" Towards this, we perform a comparative study of the security vulnerabilities in SNNs and DNNs w.r.t. the adversarial noise. Afterwards, we propose a novel black-box attack methodology, i.e., without the knowledge of the internal structure of the SNN, which employs a greedy heuristic to automatically generate imperceptible and robust adversarial examples (i.e., attack images) for the given SNN. We perform an in-depth evaluation for a Spiking Deep Belief Network (SDBN) and a DNN having the same number of layers and neurons (to obtain a fair comparison), in order to study the efficiency of our methodology and to understand the differences between SNNs and DNNs w.r.t. the adversarial examples. Our work opens new avenues of research towards the robustness of the SNNs, considering their similarities to the human brain's functionality.