Biblio

Filters: Keyword is supervised learning
2021-12-20
Janapriya, N., Anuradha, K., Srilakshmi, V.  2021.  Adversarial Deep Learning Models With Multiple Adversaries. 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA). :522–525.
Adversarial machine learning algorithms deal with the generation of adversarial examples: bogus data crafted to fool a machine learning model. As the word implies, an "adversary" is an opponent. To strengthen machine learning models, this work discusses their weaknesses and how misclassification arises during the learning cycle. Existing methods, such as creating adversarial models and devising robust ML algorithms, frequently ignore semantics and the general framework that contains the ML component. This research work develops an adversarial learning algorithm that considers a coordinated representation of all the characteristics in general and Convolutional Neural Networks (CNN) in particular. The algorithm aims to express minimal changes to the data distribution over positive and negative class labels, such that the resulting data is misclassified by the CNN. The final results recommend a game-theoretic and evolutionary computation approach, which offers a strong advantage in securing deep learning models against the exploitation of their weaknesses, reproduced here as attack scenarios against multiple adversaries.
2021-11-29
Yin, Yifei, Zulkernine, Farhana, Dahan, Samuel.  2020.  Determining Worker Type from Legal Text Data Using Machine Learning. 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :444–450.
This project addresses a classic employment law question in Canada and elsewhere using a machine learning approach: how do we know whether a worker is an employee or an independent contractor? This is a central issue for self-represented litigants insofar as these two legal categories entail very different rights and employment protections. In this interdisciplinary research study, we collaborated with the Conflict Analytics Lab to develop machine learning models aimed at determining whether a worker is an employee or an independent contractor. We present a number of supervised learning models, including a neural network model, that we implemented using data labeled by law researchers, and we compare the accuracy of the models. Our neural network model achieved an accuracy rate of 91.5%. A critical discussion follows to identify the key features in the data that influence the accuracy of our models and to provide insights about the case outcomes.
2021-09-21
Yan, Fan, Liu, Jia, Gu, Liang, Chen, Zelong.  2020.  A Semi-Supervised Learning Scheme to Detect Unknown DGA Domain Names Based on Graph Analysis. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :1578–1583.
Many malware families use domain generation algorithms (DGAs) to randomly generate large numbers of domain names. This is an effective way to bypass conventional domain-name blacklists, because we cannot predict which of the randomly generated domain names will be selected for command and control (C&C) communications. An effective approach for detecting known DGA families is to reverse engineer the malware to find the generation algorithms it uses. As reverse engineering cannot handle variants of DGA families, some studies leverage supervised learning to find new variants. However, supervised learning offers low explainability and cannot find previously unseen DGA families. In this paper, we propose a graph-based semi-supervised learning scheme to track the evolution of known DGA families and find previously unseen DGA families. With a domain relation graph, we can clearly see how new variants relate to known DGA domain names, which yields better explainability. We deployed the proposed scheme in real network scenarios and show that it can not only comprehensively and precisely find known DGA families, but also find new DGA families that have not been seen before.
Jin, Xiang, Xing, Xiaofei, Elahi, Haroon, Wang, Guojun, Jiang, Hai.  2020.  A Malware Detection Approach Using Malware Images and Autoencoders. 2020 IEEE 17th International Conference on Mobile Ad Hoc and Sensor Systems (MASS). :1–6.
Most machine learning-based malware detection systems use various supervised learning methods to classify different instances of software as benign or malicious. This approach provides no information regarding the behavioral characteristics of malware. It also requires a large amount of training data, is prone to labeling difficulties, and can lose accuracy because of redundant training data. Therefore, we propose a malware detection method based on deep learning, which uses malware images and a set of autoencoders to detect malware. The method designs an autoencoder to learn the functional characteristics of malware and then observes the autoencoder's reconstruction error to classify and detect malware and benign software. The proposed approach achieves 93% accuracy and comparatively better F1-score values while detecting malware, and it needs little training data compared with traditional malware detection systems.
Dalal, Kushal Rashmikant.  2020.  Analysing the Role of Supervised and Unsupervised Machine Learning in IoT. 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC). :75–79.
To harness the value of data generated from IoT, new mechanisms are crucially needed. Machine learning (ML) is among the most suitable paradigms of computation for embedding strong intelligence within IoT devices. Various ML techniques are widely used to improve network security in IoT. These techniques include reinforcement learning, semi-supervised learning, supervised learning, and unsupervised learning. This report aims to critically analyse the role played by supervised and unsupervised ML in the enhancement of IoT security.
2021-09-07
Atasever, Süreyya, Özçelik, İlker, Sağıroğlu, Şeref.  2020.  An Overview of Machine Learning Based Approaches in DDoS Detection. 2020 28th Signal Processing and Communications Applications Conference (SIU). :1–4.
Many detection approaches have been proposed to address the growing threat of Distributed Denial of Service (DDoS) attacks on the Internet. Attack detection is the initial step in most mitigation systems. This study examines the methods used to detect DDoS attacks, with a focus on learning-based approaches. These approaches are compared based on their efficiency, operating load and scalability. Finally, the comparison is discussed in detail.
2021-06-24
Saletta, Martina, Ferretti, Claudio.  2020.  A Neural Embedding for Source Code: Security Analysis and CWE Lists. 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :523–530.
In this paper, we design a technique for mapping the source code into a vector space and we show its application in the recognition of security weaknesses. By applying ideas commonly used in Natural Language Processing, we train a model for producing an embedding of programs starting from their Abstract Syntax Trees. We then show how such embedding is able to infer clusters roughly separating different classes of software weaknesses. Even if the training of the embedding is unsupervised and made on a generic Java dataset, we show that the model can be used for supervised learning of specific classes of vulnerabilities, helping to capture some features distinguishing them in code. Finally, we discuss how our model performs over the different types of vulnerabilities categorized by the CWE initiative.
2021-05-18
Zeng, Jingxiang, Nie, Xiaofan, Chen, Liwei, Li, Jinfeng, Du, Gewangzi, Shi, Gang.  2020.  An Efficient Vulnerability Extrapolation Using Similarity of Graph Kernel of PDGs. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :1664–1671.
Discovering potential vulnerabilities in software plays a crucial role in ensuring the security of computer systems. This paper proposes a method that can assist security auditors with the analysis of source code. When security auditors identify new vulnerabilities, our method can be used to produce a list of recommendations of code that may contain the same vulnerabilities. Our method relies on graph representation to automatically extract the mode of the PDG (program dependence graph, a structure composed of control dependence and data dependence). Besides, it can be applied to the vulnerability extrapolation scenario, thus reducing the amount of code to audit. We worked on an open-source vulnerability test set called Juliet. According to the evaluation results, the clustering effect produced is satisfactory, so the feature vectors extracted by the Graph2Vec model are applied to labeling, and supervised learning indicators are adopted to assess the model's ability to extract features. On a total of 12,000 small data sets, the training score of the model can reach up to 99.2%, and the test score can reach a maximum of 85.2%. Finally, the recommendation effect of our work is verified as satisfactory.
2021-05-13
Wenhui, Sun, Kejin, Wang, Aichun, Zhu.  2020.  The Development of Artificial Intelligence Technology And Its Application in Communication Security. 2020 International Conference on Computer Engineering and Application (ICCEA). :752–756.
Artificial intelligence has been widely used in industries such as smart manufacturing, medical care and home furnishings. Among them, the value of the application in communication security is very important. This paper makes a further exploration of the artificial intelligence technology and its application, and gives a detailed analysis of its development, standardization and the application.
2021-05-05
Hallaji, Ehsan, Razavi-Far, Roozbeh, Saif, Mehrdad.  2020.  Detection of Malicious SCADA Communications via Multi-Subspace Feature Selection. 2020 International Joint Conference on Neural Networks (IJCNN). :1–8.
Security maintenance of Supervisory Control and Data Acquisition (SCADA) systems has been a point of interest in recent years. Numerous research works have been dedicated to the design of intrusion detection systems for securing SCADA communications. Nevertheless, these data-driven techniques are usually dependent on the quality of the monitored data. In this work, we propose a novel feature selection approach, called MSFS, to tackle the poor data quality caused by feature redundancy. In contrast to most feature selection techniques, the proposed method models each class in a different subspace, where it is optimally discriminated. This is accomplished by resorting to ensemble learning, which enables the use of multiple feature sets in the same feature space. The proposed method is then used to perform intrusion detection in smaller subspaces, which improves efficiency and accuracy. Moreover, a comparative study is performed on a number of advanced feature selection algorithms. Furthermore, a dataset obtained from the SCADA system of a gas pipeline is employed to enable a realistic simulation. The results indicate that the proposed approach extensively improves the detection performance in terms of classification accuracy and standard deviation.
2021-03-04
Wang, Y., Wang, Z., Xie, Z., Zhao, N., Chen, J., Zhang, W., Sui, K., Pei, D.  2020.  Practical and White-Box Anomaly Detection through Unsupervised and Active Learning. 2020 29th International Conference on Computer Communications and Networks (ICCCN). :1–9.

To ensure quality of service and user experience, large Internet companies often monitor various Key Performance Indicators (KPIs) of their systems so that they can detect anomalies and identify failures in real time. However, due to the large number of different KPIs and the lack of high-quality labels, existing KPI anomaly detection approaches either perform well only on certain types of KPIs or consume excessive resources. Therefore, to realize generic and practical KPI anomaly detection in the real world, we propose a KPI anomaly detection framework named iRRCF-Active, which contains an unsupervised and white-box anomaly detector based on Robust Random Cut Forest (RRCF), and an active learning component. Specifically, we propose a novel improved RRCF (iRRCF) algorithm to overcome the drawbacks of applying the original RRCF to KPI anomaly detection. Besides, we also incorporate the idea of active learning so that our model benefits from high-quality labels given by experienced operators. We conduct extensive experiments on a large-scale public dataset and a private dataset collected from a large commercial bank. The experimental results demonstrate that iRRCF-Active performs better than existing traditional statistical methods, unsupervised learning methods and supervised learning methods. Besides, each component in iRRCF-Active has also been demonstrated to be effective and indispensable.

2021-02-23
Liao, D., Huang, S., Tan, Y., Bai, G.  2020.  Network Intrusion Detection Method Based on GAN Model. 2020 International Conference on Computer Communication and Network Security (CCNS). :153–156.

Existing network intrusion detection methods have few labeled samples available during training, and their detection accuracy is not high. To solve this problem, this paper designs a network intrusion detection method based on the GAN model, using the adversarial idea contained in the GAN. The model enhances the original training set by continuously generating samples, thereby expanding the labeled sample set. To realize multi-class classification of samples, this paper transforms the previous binary classification model of the generative adversarial network into a supervised learning multi-classification model. The training loss function is redefined, and the corresponding training method and parameter settings are obtained. Under the same experimental conditions, several performance indicators are used to compare the detection ability of the proposed method, the original classification model and other models. The experimental results show that the proposed method is more stable and robust, achieves a more accurate detection rate, has good generalization ability, and can effectively realize network intrusion detection.

2021-02-22
Haile, J., Havens, S.  2020.  Identifying Ubiquitious Third-Party Libraries in Compiled Executables Using Annotated and Translated Disassembled Code with Supervised Machine Learning. 2020 IEEE Security and Privacy Workshops (SPW). :157–162.
The size and complexity of the software ecosystem are a major challenge for vendors, asset owners and cybersecurity professionals who need to understand the security posture of these systems. Annotated and Translated Disassembled Code is a graph-based datastore designed to organize firmware and software analysis data across builds, packages and systems, providing a highly scalable platform that enables automated binary software analysis tasks, including corpora construction and storage for machine learning. This paper describes an approach for the identification of ubiquitous third-party libraries in firmware and software using Annotated and Translated Disassembled Code and supervised machine learning. Annotated and Translated Disassembled Code provides matched libraries, function names and addresses of previously unidentified code in software as it is being automatically analyzed. This data can be ingested by other software analysis tools to improve accuracy and save time. Defenders can add the identified libraries to their vulnerability searches and add effective detection and mitigation to their operating environment.
2020-12-11
Fan, M., Luo, X., Liu, J., Wang, M., Nong, C., Zheng, Q., Liu, T.  2019.  Graph Embedding Based Familial Analysis of Android Malware using Unsupervised Learning. 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). :771–782.

The rapid growth of Android malware has posed severe security threats to smartphone users. Building on the familial trait of Android malware observed in previous work, familial analysis is a promising way to help analysts focus on the commonalities of malware samples within the same families, thus reducing the analytical workload and accelerating malware analysis. The majority of existing approaches rely on supervised learning and face three main challenges, i.e., low accuracy, low efficiency, and the lack of labeled datasets. To address these challenges, we first construct a fine-grained behavior model by abstracting the program semantics into a set of subgraphs. Then, we propose SRA, a novel feature that depicts the similarity relationships between the Structural Roles of sensitive API call nodes in subgraphs. An SRA is obtained with graph embedding techniques and represented as a vector, so we can effectively reduce the high complexity of graph matching. After that, instead of training a classifier with labeled samples, we construct a malware link network based on SRAs and apply community detection algorithms to it to cluster the unlabeled samples into groups. We implement these ideas in a system called GefDroid that performs Graph embedding based familial analysis of AnDroid malware using unsupervised learning. Moreover, we conduct extensive experiments to evaluate GefDroid on three datasets with ground truth. The results show that GefDroid can achieve high agreement (0.707-0.883 in terms of NMI) between the clustering results and the ground truth. Furthermore, GefDroid requires only linear run-time overhead and takes around 8.6s to analyze a sample on average, which is considerably faster than the previous work.

2020-11-02
Pan, C., Huang, J., Gong, J., Yuan, X.  2019.  Few-Shot Transfer Learning for Text Classification With Lightweight Word Embedding Based Models. IEEE Access. 7:53296–53304.
Many deep learning architectures have been employed to model the semantic compositionality of text sequences. They require a huge amount of supervised data for parameter training, which makes them infeasible in situations where numerous annotated samples are not available or do not exist at all. Unlike data-hungry deep models, lightweight word embedding-based models can represent text sequences in a plug-and-play way thanks to their parameter-free property. In this paper, a modified hierarchical pooling strategy over pre-trained word embeddings is proposed for text classification in a few-shot transfer learning setting. The model leverages and transfers knowledge obtained from some source domains to recognize and classify unseen text sequences with just a handful of support examples in the target problem domain. Extensive experiments on five datasets, including both English and Chinese text, demonstrate that simple word embedding-based models (SWEMs) with parameter-free pooling operations are able to abstract and represent the semantics of text. The proposed modified hierarchical pooling method exhibits significant classification performance in few-shot transfer learning tasks compared with alternative methods.
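As a rough illustration of parameter-free hierarchical pooling over word embeddings, the sketch below averages the embeddings inside each sliding window and then takes an element-wise maximum over the window averages. It shows only this basic form, not the modified strategy the paper proposes. The embedding table, the window size and the function names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical 3-dimensional embeddings; a real model would load pre-trained vectors.
EMBEDDINGS: Dict[str, List[float]] = {
    "not": [-1.0, 0.0, 0.5],
    "good": [1.0, 0.5, 0.0],
    "at": [0.0, 0.0, 0.0],
    "all": [0.0, -0.5, 0.5],
}


def average(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def hierarchical_pool(tokens: List[str], window: int = 2) -> List[float]:
    """Average-pool each sliding window, then max-pool across the windows."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        raise ValueError("no token has a known embedding")
    if len(vectors) <= window:
        return average(vectors)
    local = [average(vectors[i:i + window])
             for i in range(len(vectors) - window + 1)]
    dim = len(local[0])
    return [max(v[i] for v in local) for i in range(dim)]


sentence_vector = hierarchical_pool(["not", "good", "at", "all"])
```

Because the windows preserve local word order, swapping two neighbouring words generally changes the pooled vector, which plain averaging over the whole sentence would not do. No parameters are trained at any point.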
Sharma, Sachin, Ghanshala, Kamal Kumar, Mohan, Seshadri.  2018.  A Security System Using Deep Learning Approach for Internet of Vehicles (IoV). 2018 9th IEEE Annual Ubiquitous Computing, Electronics Mobile Communication Conference (UEMCON). :1–5.

The Internet of Vehicles (IoV) will connect not only mobile devices with vehicles, but also vehicles with each other and with smart offices, buildings, homes, theaters, shopping malls, and cities. The IoV facilitates optimal and reliable communication services for connected vehicles in smart cities. The backbone of connected vehicle communication is the deployment of critical V2X infrastructure. Spectrum utilization depends on end-user demand and on the development of infrastructure that includes efficient automation techniques together with the Internet of Things (IoT). The infrastructure enables us to build smart environments for spectrum utilization, which we refer to as Smart Spectrum Utilization (SSU). This paper presents an integrated system consisting of SSU with IoV. However, securing the IoV and protecting it from cyber attacks present considerable challenges. This paper introduces an IoV security system that uses a deep learning approach to develop secure applications and reliable services. Deep learning, composed of unsupervised learning and supervised learning, could optimize the IoV security system. The deep learning methodology is applied to monitor security threats. Results from simulations show that the monitoring accuracy of the proposed security system is superior to that of the traditional system.

2020-09-04
Khan, Aasher, Rehman, Suriya, Khan, Muhammad U.S, Ali, Mazhar.  2019.  Synonym-based Attack to Confuse Machine Learning Classifiers Using Black-box Setting. 2019 4th International Conference on Emerging Trends in Engineering, Sciences and Technology (ICEEST). :1–7.
Twitter, being the most popular content sharing platform, is giving rise to automated accounts called "bots". The majority of the users on Twitter are bots. Various machine learning (ML) algorithms have been designed to detect bots while avoiding the vulnerability constraints of ML-based models. This paper exploits vulnerabilities of ML algorithms through a black-box attack. An adversarial text sequence causes deep learning (DL) classifiers for bot detection to misclassify. The literature shows that ML models are vulnerable to attacks. The aim of this paper is to compromise the accuracy of ML-based bot detection algorithms by replacing original words in tweets with their synonyms. Our results show a 7.2% decrease in accuracy for bot tweets, which are consequently classified as legitimate tweets.
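The synonym-replacement idea can be sketched with a toy example. Everything below is hypothetical: the synonym table, the trigger words and the keyword-counting detector are stand-ins invented for illustration, not the classifiers or data used in the paper. The attacker in this sketch only queries the detector's label, which is what makes the setting black-box.

```python
from typing import Dict, Set

# Hypothetical synonym table; a real attack would draw on a thesaurus.
SYNONYMS: Dict[str, str] = {"free": "complimentary", "win": "earn", "click": "tap"}

# Hypothetical trigger words used by the toy detector below.
TRIGGERS: Set[str] = {"free", "win", "click"}


def is_bot(tweet: str) -> bool:
    """Toy stand-in for a black-box bot detector: flags a tweet that
    contains at least two trigger words."""
    words = tweet.lower().split()
    return sum(word in TRIGGERS for word in words) >= 2


def synonym_attack(tweet: str) -> str:
    """Replace words with synonyms, left to right, querying only the
    detector's label, until the tweet is no longer flagged."""
    words = tweet.split()
    for i, word in enumerate(words):
        if not is_bot(" ".join(words)):
            break  # the detector is already fooled; stop editing
        replacement = SYNONYMS.get(word.lower())
        if replacement is not None:
            words[i] = replacement
    return " ".join(words)


original = "click here to win a free phone"
adversarial = synonym_attack(original)
```

Stopping as soon as the label flips keeps the number of substitutions small, so the adversarial tweet stays close in meaning to the original.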
2020-07-16
Ayub, Md. Ahsan, Smith, Steven, Siraj, Ambareen.  2019.  A Protocol Independent Approach in Network Covert Channel Detection. 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC). :165–170.

Network covert channels are used in various cyberattacks, including disclosure of sensitive information and enabling stealth tunnels for botnet commands. With time and technology, covert channels are becoming more prevalent, complex, and difficult to detect. The current methods for detection are protocol and pattern specific. This requires the investment of significant time and resources into application of various techniques to catch the different types of covert channels. This paper reviews several patterns of network storage covert channels, describes generation of network traffic dataset with covert channels, and proposes a generic, protocol-independent approach for the detection of network storage covert channels using a supervised machine learning technique. The implementation of the proposed generic detection model can lead to a reduction of necessary techniques to prevent covert channel communication in network traffic. The datasets we have generated for experimentation represent storage covert channels in the IP, TCP, and DNS protocols and are available upon request for future research in this area.

2020-07-13
Agrawal, Shriyansh, Sanagavarapu, Lalit Mohan, Reddy, YR.  2019.  FACT - Fine grained Assessment of web page CredibiliTy. TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON). :1088–1097.
With more than a trillion web pages, there is a plethora of content available for consumption. Search engine queries invariably lead to overwhelming information, parts of it relevant and others irrelevant. Often the information provided can be conflicting, ambiguous, and inconsistent, contributing to the loss of credibility of the content. In the past, researchers have proposed approaches for credibility assessment and enumerated factors influencing the credibility of web pages. In this work, we detail the WEBCred framework for automated genre-aware credibility assessment of web pages. We developed a tool based on the proposed framework to extract web page feature instances and identify the genre a web page belongs to while assessing its Genre Credibility Score (GCS). We validated our approach on an 'Information Security' dataset of 8,550 URLs with 171 features across 7 genres. The supervised learning algorithm, Gradient Boosted Decision Tree, classified genres with 88.75% testing accuracy over 10-fold cross-validation, an improvement over the current benchmark. We also examined our approach on 'Health' domain web pages and obtained comparable results. The calculated GCS correlated 69% with the crowdsourced Web Of Trust (WOT) score and 13% with the algorithm-based Alexa ranking across 5 Information Security groups. This difference in correlation indicates that our GCS approach aligns with the human way (WOT) rather than the algorithmic way (Alexa) of web assessment in both experiments.
2020-07-03
Yan, Haonan, Li, Hui, Xiao, Mingchi, Dai, Rui, Zheng, Xianchun, Zhao, Xingwen, Li, Fenghua.  2019.  PGSM-DPI: Precisely Guided Signature Matching of Deep Packet Inspection for Traffic Analysis. 2019 IEEE Global Communications Conference (GLOBECOM). :1–6.

In the field of network traffic analysis, Deep Packet Inspection (DPI) technology is widely used at present. However, the increase in network traffic has put tremendous processing pressure on DPI. Consequently, detection speed has become the bottleneck of the entire application. To speed up DPI traffic detection, much research has been devoted to improving signature matching algorithms, which are the most influential factor in DPI performance. In this paper, we present a novel method from a different angle, called Precisely Guided Signature Matching (PGSM). Instead of matching packets against signatures directly, we use supervised learning to automatically derive the rules of a specific protocol in PGSM. By testing a packet against the rules, we can decide when and with which signatures the target packet should be matched. Thus, the PGSM method reduces the large number of aimless, useless matches. After proposing PGSM, we build a framework called PGSM-DPI to verify the effectiveness of the guidance rules. The PGSM-DPI framework consists of the PGSM method and an open source DPI library. The framework runs on a distributed platform with better throughput and computational performance. Finally, the experimental results demonstrate that our PGSM-DPI can reduce the original DPI time by 59.23% and increase throughput by 21.31%. Besides, all source code and experimental results can be accessed on our GitHub.

2020-06-12
Min, Congwen, Li, Yi, Fang, Li, Chen, Ping.  2019.  Conditional Generative Adversarial Network on Semi-supervised Learning Task. 2019 IEEE 5th International Conference on Computer and Communications (ICCC). :1448–1452.

Semi-supervised learning has recently gained increasing attention because it can combine abundant unlabeled data with carefully labeled data to train deep neural networks. However, common semi-supervised methods rely heavily on the quality of pseudo labels. In this paper, we propose a new semi-supervised learning method based on the Generative Adversarial Network (GAN), which uses the discriminator to learn the features of both labeled and unlabeled data instead of generating pseudo labels that cannot all be correct. Our approach, semi-supervised conditional GAN (SCGAN), builds upon the conditional GAN model, extending it to semi-supervised learning by changing the discriminator's output to a classification output and a real-or-fake output. We evaluate our approach against a basic semi-supervised model on the MNIST dataset. The results show that our approach achieves a classification accuracy of 84.15%, outperforming the basic semi-supervised model at 72.94%, when labeled data make up 1/600 of all data.

2020-05-18
Nambiar, Sindhya K, Leons, Antony, Jose, Soniya, Arunsree.  2019.  Natural Language Processing Based Part of Speech Tagger using Hidden Markov Model. 2019 Third International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). :782–785.
In various natural language processing applications, part-of-speech (POS) tagging is performed as a preprocessing step. Various techniques have been explored to make POS tagging accurate, but not much work has been done for Indian languages. This paper describes methods for building a part-of-speech tagger using a hidden Markov model. A supervised learning approach is implemented in which already tagged sentences in Malayalam are used to build the hidden Markov model.
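A minimal sketch of supervised HMM tagging follows: transition and emission counts are collected from tagged sentences, and the Viterbi algorithm decodes the most likely tag sequence. The tiny English corpus, the tag names, the add-one smoothing and all function names are assumptions made for illustration; the paper itself works with tagged Malayalam sentences and does not specify these details here.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus; the paper uses tagged Malayalam sentences.
TAGGED = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
]


def train(sentences):
    """Count tag transitions and word emissions from tagged sentences."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    vocab = set()
    for sentence in sentences:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log-probability (smoothing choice is an assumption)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))


def viterbi(words, model):
    """Return the most likely tag sequence for the given words."""
    trans, emit, tags, vocab = model
    if not words:
        return []
    n_words = len(vocab) + 1  # one extra outcome for unknown words
    # best[t] = (log-probability, path) of the best path ending in tag t
    best = {
        t: (log_prob(trans["<s>"], t, len(tags))
            + log_prob(emit[t], words[0], n_words), [t])
        for t in tags
    }
    for word in words[1:]:
        new_best = {}
        for t in tags:
            score, path = max(
                (best[p][0] + log_prob(trans[p], t, len(tags)), best[p][1])
                for p in tags
            )
            new_best[t] = (score + log_prob(emit[t], word, n_words), path + [t])
        best = new_best
    return max(best.values())[1]


model = train(TAGGED)
tags_for_sentence = viterbi(["the", "dog", "sleeps"], model)
```

Working in log space avoids numeric underflow on long sentences, and the smoothing lets the tagger assign a tag to words that never appeared in the training sentences.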
2020-05-15
Jeyasudha, J., Usha, G.  2018.  Detection of Spammers in the Reconnaissance Phase by machine learning techniques. 2018 3rd International Conference on Inventive Computation Technologies (ICICT). :216–220.

The reconnaissance phase is where attackers identify their targets and work out how to collect information from professional social networks, which can be used to select and exploit targeted employees in order to penetrate an organization. Here, a framework is proposed for the early detection of attackers in the reconnaissance phase, highlighting the common characteristic behavior among attackers in professional social networks, and for creating artificial honeypot profiles within the organizational social network that can be used to detect a potential incoming threat. By analyzing a dataset of social network profiles in combination with machine learning techniques, a DspamRPfast model is proposed for building a classifier system that uses XGBoost to predict the probability of a profile being fake or malicious and to filter such profiles out, with faster classification and a greater accuracy of 84.8%.

2020-02-10
Dan, Kenya, Kitagawa, Naoya, Sakuraba, Shuji, Yamai, Nariyoshi.  2019.  Spam Domain Detection Method Using Active DNS Data and E-Mail Reception Log. 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC). 1:896–899.

E-mail is a widespread and essential communication technology in modern times. Since e-mail suffers from spam and spoofed messages, countermeasures are required. Although SPF, DKIM and DMARC have been proposed for sender domain authentication, these mechanisms cannot detect non-spoofing spam mails. To overcome this issue, this paper proposes a method to detect spam domains by supervised learning with features extracted from the e-mail reception log and active DNS data, such as the result of Sender Authentication, the sender IP address, the number of each DNS record, and so on. In our experiment, the method detects spam domains with 88.09% accuracy and 97.11% precision. We confirmed that, by combining active DNS data with the e-mail reception log, our method detects spam domains with an accuracy 19.40% higher than that of the previous study.

2020-01-28
Bernardi, Mario Luca, Cimitile, Marta, Martinelli, Fabio, Mercaldo, Francesco.  2019.  Keystroke Analysis for User Identification Using Deep Neural Networks. 2019 International Joint Conference on Neural Networks (IJCNN). :1–8.

Current authentication systems based on passwords and PIN codes are not enough to guard against attacks from malicious users. For this reason, in recent years, several studies have been proposed that aim to identify users based on their typing dynamics. In this paper, we propose a deep neural network architecture that aims to discriminate between different users using a set of keystroke features. The idea behind the proposed method is to identify users silently and continuously while they type on a monitored system. To perform such user identification effectively, we propose a feature model able to capture the typing style that is specific to each user. The proposed approach is evaluated on a large dataset derived by integrating two real-world datasets from existing studies. The merged dataset contains a total of 1530 different users, each writing a set of different typing samples. Several deep neural networks, with an increasing number of hidden layers and two different sets of features, are tested with the aim of finding the best configuration. The final best classifier scores a precision of 0.997, a recall of 0.99 and an accuracy of 99% using an MLP deep neural network with 9 hidden layers. Finally, the performance obtained with the deep learning approach is also compared with the performance of a traditional decision-tree machine learning algorithm, attesting to the effectiveness of deep learning-based classifiers in the domain of keystroke analysis.
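For readers less familiar with the metrics reported above, the sketch below shows how precision, recall and accuracy are derived from confusion counts. It uses the standard binary definitions with made-up counts; the paper's own evaluation spans many users, and how it aggregates per-user scores is not stated here.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that were found."""
    return tp / (tp + fn)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical confusion counts, chosen only to illustrate the arithmetic.
tp, fp, fn, tn = 8, 2, 2, 88
scores = (precision(tp, fp), recall(tp, fn), accuracy(tp, tn, fp, fn))
```

With these counts, precision and recall are both 8/10 while accuracy is 96/100, which shows how a large number of true negatives can lift accuracy well above the other two metrics.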