Biblio

Found 100 results

Filters: Keyword is classification
2020-08-24
Sarma, Subramonian Krishna.  2019.  Optimized Activation Function on Deep Belief Network for Attack Detection in IoT. 2019 Third International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). :702–708.
This paper presents a novel attack detection system to address the risk issues in IoT. The presented attack detection system links the interconnection of DevOps as it creates the correlation between development and IT operations. Further, the presented attack detection model ensures the operational security of different applications. In view of this, the implemented system incorporates two main stages, named Proposed Feature Extraction and Classification. The data from every application is processed in the initial feature extraction stage, which concatenates the statistical and higher-order statistical features. After that, the extracted features are supplied to the classification process, which determines the presence of attacks. For classification, this paper deploys an optimized Deep Belief Network (DBN), in which the activation function is tuned optimally. The optimal tuning is done by a meta-heuristic algorithm called the Lion Algorithm (LA). Finally, the performance of the proposed work is compared with other conventional methods and shown to outperform them.
2019-11-26
Patil, Srushti, Dhage, Sudhir.  2019.  A Methodical Overview on Phishing Detection along with an Organized Way to Construct an Anti-Phishing Framework. 2019 5th International Conference on Advanced Computing Communication Systems (ICACCS). :588–593.

Phishing is a security attack to acquire personal information like passwords, credit card details or other account details of a user by means of websites or emails. Phishing websites look similar to the legitimate ones, which makes it difficult for a layman to differentiate between them. As per the reports of the Anti-Phishing Working Group (APWG) published in December 2018, phishing against banking services and payment processors was high. Almost all the phishing URLs use HTTPS and use redirects to avoid getting detected. This paper presents a focused literature survey of methods available to detect phishing websites. A comparative study of the in-use anti-phishing tools was carried out and their limitations were acknowledged. We analyzed the URL-based features used in the past to improve their definitions as per the current scenario, which is our major contribution. Also, a stepwise procedure for designing an anti-phishing model is discussed to construct an efficient framework, which adds to our contribution. Observations made out of this study are stated along with recommendations on existing systems.

2020-02-10
Shyry, S. Prayla, Charan K, Venkat Sai, Kumar, V. Sudheer.  2019.  Spam Mail Detection and Prevention at Server Side. 2019 Innovations in Power and Advanced Computing Technologies (i-PACT). 1:1–6.

Spam has been a genuine and irritating issue for quite a long time. Although many solutions have been put forward, considerable work remains in filtering spam messages more efficiently. Nowadays a significant issue in spam filtering, as well as in text classification in natural language processing, is the enormous size of the vector space caused by the many feature terms, which usually leads to heavy computation and slow classification. Extracting semantic meanings from the content of texts and using these as feature terms to build the vector space, rather than using words as feature terms in conventional ways, could reduce the dimension of vectors effectively and improve the classification at the same time. Although there are many kinds of methods to block spam messages, most program developers only aim to block spam messages from being delivered to their clients. In this paper, we present an effective approach to prevent spam messages from being transferred. In this work, a collaborative filtering approach with semantics-based text classification technology is proposed, and the related feature terms are selected from the semantic meanings of the text content.

2020-01-13
Verma, Abhishek, Ranga, Virender.  2019.  ELNIDS: Ensemble Learning based Network Intrusion Detection System for RPL based Internet of Things. 2019 4th International Conference on Internet of Things: Smart Innovation and Usages (IoT-SIU). :1–6.
The Internet of Things is realized by a large number of heterogeneous smart devices which sense, collect and share data with each other over the internet in order to control the physical world. Due to the open nature, global connectivity and resource-constrained nature of smart devices and wireless networks, the Internet of Things is susceptible to various routing attacks. In this paper, we propose an architecture for an Ensemble Learning based Network Intrusion Detection System, named ELNIDS, for detecting routing attacks against the IPv6 Routing Protocol for Low-Power and Lossy Networks. We implement four different ensemble-based machine learning classifiers: Boosted Trees, Bagged Trees, Subspace Discriminant and RUSBoosted Trees. To evaluate the proposed intrusion detection model we have used the RPL-NIDDS17 dataset, which contains packet traces of Sinkhole, Blackhole, Sybil, Clone ID, Selective Forwarding, Hello Flooding and Local Repair attacks. Simulation results show the effectiveness of the proposed architecture. We observe that the ensemble of Boosted Trees achieves the highest accuracy of 94.5%, while the Subspace Discriminant method achieves the lowest accuracy of 77.8% among classifier validation methods. Similarly, an ensemble of RUSBoosted Trees achieves the highest area under ROC value of 0.98, while the lowest area under ROC value of 0.87 is achieved by an ensemble of Subspace Discriminant among all classifier validation methods. All the implemented classifiers show acceptable performance results.
2020-05-22
Horzyk, Adrian, Starzyk, Janusz A..  2019.  Associative Data Model in Search for Nearest Neighbors and Similar Patterns. 2019 IEEE Symposium Series on Computational Intelligence (SSCI). :933–940.
This paper introduces a biologically inspired associative data model and structure for finding nearest neighbors and similar patterns. The method can be used as an alternative to the classical approaches to accelerate the search for such patterns using various priorities for attributes according to the Sebestyen measure. The presented structure, together with the algorithms developed in this paper, can be useful in various computational intelligence tasks such as pattern matching, recognition, clustering, classification, and multi-criterion search. This approach is particularly useful for the on-line operation of associative neural network graphs, which dynamically develop their structure during learning on training data. The results of experiments show that the associative approach can substantially accelerate the nearest neighbor search and that associative structures can also be used as a model for KNN tasks. Finally, this paper presents how the associative structures can be used to self-organize data and represent knowledge about them in the associative way, which yields new search approaches described in this paper.
2020-05-08
Chaudhary, Anshika, Mittal, Himangi, Arora, Anuja.  2019.  Anomaly Detection using Graph Neural Networks. 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon). :346–350.

Conventional methods for anomaly detection include techniques based on clustering, proximity or classification. With rapidly growing social networks, outliers or anomalies find ingenious ways to obscure themselves in the network, making the conventional techniques inefficient. In this paper, we utilize the ability of Deep Learning over topological characteristics of a social network to detect anomalies in an email network and a Twitter network. We present a model, Graph Neural Network, which is applied to social connection graphs to detect anomalies. Combinations of various social network statistical measures are taken into account to study the graph structure and functioning of the anomalous nodes by employing deep neural networks on them. The hidden layer of the neural network plays an important role in finding the impact of statistical measure combinations in anomaly detection.

2020-06-22
Adesuyi, Tosin A., Kim, Byeong Man.  2019.  Preserving Privacy in Convolutional Neural Network: An ∊-tuple Differential Privacy Approach. 2019 IEEE 2nd International Conference on Knowledge Innovation and Invention (ICKII). :570–573.
Recent breakthroughs in neural networks have led to the birth of the Convolutional Neural Network (CNN), which has been found to be very efficient, especially in the areas of image recognition and classification. This success is traceable to the availability of large datasets and the CNN's capability to learn salient and complex data features, which subsequently produce a reusable output model (Fθ). The Fθ is often made available (e.g. on cloud as-a-service) for others (clients) to train their data or do transfer learning. However, an adversary can perpetrate a model inversion attack on the model Fθ to recover training data, hence compromising the sensitivity of the model buildup data. This is possible because a CNN, as a variant of deep neural network, does memorize most of its training data during learning. Consequently, this has posed a privacy concern, especially when medical or financial data are used as model buildup data. Existing research that proffers privacy-preserving approaches, however, suffers from significant accuracy degradation, and this has left privacy-preserving models on the theoretical desk. In this paper, we propose an ϵ-tuple differential privacy approach that is based on neuron impact factor estimation to preserve the privacy of a CNN model without significant accuracy degradation. We experiment with our approach on two large datasets and the results show no significant accuracy degradation.
2020-08-24
Liang, Dai, Pan, Peisheng.  2019.  Research on Intrusion Detection Based on Improved DBN-ELM. 2019 International Conference on Communications, Information System and Computer Engineering (CISCE). :495–499.
To leverage the feature extraction of DBN and the fast classification and good generalization of ELM, an improved DBN-ELM method is proposed for intrusion detection. The improved model uses a deep belief network (DBN) to learn features from the NSL-KDD dataset and feeds them to the extreme learning machine (ELM) for classification. A classifier is connected at each intermediate level of the DBN-ELM. By majority voting on the outputs of these classifiers and the ELM, the final output is calculated by integration. Experiments show that the improved model increases the classification confidence and accuracy of the classifier. The model has been benchmarked on the NSL-KDD dataset, and the accuracy of the model has been improved to 97.82%, while the false alarm rate has been reduced to 1.81%. The proposed improved model has also been compared with DBN, ELM, and DBN-ELM, and achieves competitive accuracy.
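
The majority-voting step mentioned in this abstract can be illustrated with a minimal sketch. The label values, the three hypothetical classifier outputs, and the tie-breaking rule (first label seen wins) are assumptions made for illustration; the abstract does not state how the paper resolves ties or how many intermediate classifiers are used.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label in `predictions` for one sample.

    Assumption: ties are broken in favour of the label that appears
    first in the list.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(per_classifier_outputs):
    """Combine per-classifier label lists (one label per sample each)."""
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_outputs)]


# Hypothetical outputs: two intermediate-level classifiers plus the final ELM.
level1 = ["attack", "normal", "normal"]
level2 = ["attack", "attack", "normal"]
elm = ["normal", "attack", "normal"]

final = ensemble_predict([level1, level2, elm])
print(final)
```

With an odd number of voters and two classes, as in this toy example, ties cannot occur, so the tie-breaking assumption only matters for other configurations.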
2020-08-28
Perry, Lior, Shapira, Bracha, Puzis, Rami.  2019.  NO-DOUBT: Attack Attribution Based On Threat Intelligence Reports. 2019 IEEE International Conference on Intelligence and Security Informatics (ISI). :80–85.

The task of attack attribution, i.e., identifying the entity responsible for an attack, is complicated and usually requires the involvement of an experienced security expert. Prior attempts to automate attack attribution apply various machine learning techniques on features extracted from the malware's code and behavior in order to identify other similar malware whose authors are known. However, the same malware can be reused by multiple actors, and the actor who performed an attack using a malware might differ from the malware's author. Moreover, information collected during an incident may contain many clues about the identity of the attacker in addition to the malware used. In this paper, we propose a method of attack attribution based on textual analysis of threat intelligence reports, using state of the art algorithms and models from the fields of machine learning and natural language processing (NLP). We have developed a new text representation algorithm which captures the context of the words and requires minimal feature engineering. Our approach relies on vector space representation of incident reports derived from a small collection of labeled reports and a large corpus of general security literature. Both datasets have been made available to the research community. Experimental results show that the proposed representation can attribute attacks more accurately than the baselines' representations. In addition, we show how the proposed approach can be used to identify novel previously unseen threat actors and identify similarities between known threat actors.

2020-01-02
Marín, Gonzalo, Casas, Pedro, Capdehourat, Germán.  2019.  Deep in the Dark - Deep Learning-Based Malware Traffic Detection Without Expert Knowledge. 2019 IEEE Security and Privacy Workshops (SPW). :36–42.

With the ever-growing occurrence of networking attacks, robust network security systems are essential to prevent and mitigate their harmful effects. In recent years, machine learning-based systems have gained popularity for network security applications, usually considering the application of shallow models, where a set of expert handcrafted features is needed to pre-process the data before training. The main problem with this approach is that handcrafted features can fail to perform well given different kinds of scenarios and problems. Deep Learning models can solve this kind of issue using their ability to learn feature representations from raw or basic, non-processed input data. In this paper we explore the power of deep learning models on the specific problem of detection and classification of malware network traffic, using different representations for the input data. As a major advantage compared to the state of the art, we consider raw measurements coming directly from the stream of monitored bytes as the input to the proposed models, and evaluate different raw-traffic feature representations, including packet and flow-level ones. Our results suggest that deep learning models can better capture the underlying statistics of malicious traffic compared to classical, shallow-like models, even while operating in the dark, i.e., without any sort of expert handcrafted inputs.

2020-12-11
Slawinski, M., Wortman, A..  2019.  Applications of Graph Integration to Function Comparison and Malware Classification. 2019 4th International Conference on System Reliability and Safety (ICSRS). :16–24.

We classify .NET files as either benign or malicious by examining directed graphs derived from the set of functions comprising the given file. Each graph is viewed probabilistically as a Markov chain where each node represents a code block of the corresponding function, and by computing the PageRank vector (Perron vector with transport), a probability measure can be defined over the nodes of the given graph. Each graph is vectorized by computing Lebesgue antiderivatives of hand-engineered functions defined on the vertex set of the given graph against the PageRank measure. Files are subsequently vectorized by aggregating the set of vectors corresponding to the set of graphs resulting from decompiling the given file. The result is a fast, intuitive, and easy-to-compute glass-box vectorization scheme, which can be leveraged for training a standalone classifier or to augment an existing feature space. We refer to this vectorization technique as PageRank Measure Integration Vectorization (PMIV). We demonstrate the efficacy of PMIV by training a vanilla random forest on 2.5 million samples of decompiled .NET, evenly split between benign and malicious, from our in-house corpus and compare this model to a baseline model which leverages a text-only feature space. The median time needed for decompilation and scoring was 24ms. Code available at https://github.com/gtownrocks/grafuple.
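
As a rough illustration of the PageRank-measure idea in this abstract, the following minimal sketch computes a PageRank vector for a small directed graph by power iteration and then integrates a node function against it, which for a finite graph is a weighted sum. The toy graph `cfg`, the damping factor 0.85, the uniform handling of dangling nodes, and the choice of out-degree as the node function are all assumptions for illustration; the sketch does not reproduce the paper's Lebesgue antiderivative construction or its hand-engineered functions.

```python
def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank.

    adj maps each node to a list of successor nodes; every successor
    must also be a key of adj. Returns a dict node -> probability.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            succ = adj[v]
            if succ:
                share = damping * rank[v] / len(succ)
                for w in succ:
                    new[w] += share
            else:
                # Dangling node (assumption): spread its mass uniformly.
                share = damping * rank[v] / n
                for w in nodes:
                    new[w] += share
        rank = new
    return rank


def integrate(f, measure):
    """Integral of f over the nodes with respect to a discrete measure."""
    return sum(f(v) * p for v, p in measure.items())


# Hypothetical control-flow graph of one function (nodes are code blocks).
cfg = {"entry": ["loop", "exit"], "loop": ["loop", "exit"], "exit": []}

pi = pagerank(cfg)
mean_out_degree = integrate(lambda v: len(cfg[v]), pi)
print(pi, mean_out_degree)
```

Repeating the integration for several node functions would give one fixed-length vector per graph, which is the general shape of feature the abstract describes.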

2020-03-30
Hu, Zhengbing, Vasiliu, Yevhen, Smirnov, Oleksii, Sydorenko, Viktoriia, Polishchuk, Yuliia.  2019.  Abstract Model of Eavesdropper and Overview on Attacks in Quantum Cryptography Systems. 2019 10th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS). 1:399–405.
In today's world, it is almost impossible to find a sphere of human life in which information technologies are not used. On the one hand, this simplifies human life: virtually everyone carries a mini-computer in their pocket, which allows many operations that used to take a lot of time to be performed in minutes. In addition, IT has simplified and rapidly developed areas such as medicine, banking, document circulation, the military, and many other infrastructures of the state. Nevertheless, even today, privacy remains a major problem in many information transactions. One of the most important directions for ensuring information confidentiality in open communication networks has been, and remains, its protection by cryptographic methods. Although it is known that traditional cryptography methods give reasons to doubt their reliability, quantum cryptography has proven itself as a more reliable information security technology. As it is quite a new direction, there is no sufficiently complete classification of attacks on quantum cryptography methods; in view of this, a new extended classification of attacks on quantum protocols and quantum cryptosystems is proposed in this work. The classification takes into account the newest attacks (which use device loopholes) on quantum key distribution equipment. These attacks have been named "quantum hacking". Such a classification may be useful for choosing a commercially available quantum key distribution system. An abstract model of an eavesdropper in quantum systems was also created; it makes it possible to determine a set of measures of various kinds that need to be further implemented to provide reliable security with the help of specific quantum systems.
2020-05-11
Cui, Zhicheng, Zhang, Muhan, Chen, Yixin.  2018.  Deep Embedding Logistic Regression. 2018 IEEE International Conference on Big Knowledge (ICBK). :176–183.
Logistic regression (LR) is used in many areas due to its simplicity and interpretability. At the same time, those two properties limit its classification accuracy. Deep neural networks (DNNs), instead, achieve state-of-the-art performance in many domains. However, the nonlinearity and complexity of DNNs make them less interpretable. To balance interpretability and classification performance, we propose a novel nonlinear model, Deep Embedding Logistic Regression (DELR), which augments LR with a nonlinear dimension-wise feature embedding. In DELR, each feature embedding is learned through a deep and narrow neural network and LR is attached to decide feature importance. A compact and yet powerful model, DELR offers great interpretability: it can tell the importance of each input feature, yield meaningful embedding of categorical features, and extract actionable changes, making it attractive for tasks such as market analysis and clinical prediction.
2019-03-15
Keshishzadeh, Sarineh, Fallah, Ali, Rashidi, Saeid.  2018.  Electroencephalogram Based Biometrics: A Fractional Fourier Transform Approach. Proceedings of the 2018 2nd International Conference on Biometric Engineering and Applications. :1–5.
The non-stationary nature of the human Electroencephalogram (EEG) has caused problems in EEG based biometrics. Stationary signals can be analyzed simply with the Discrete Fourier Transform (DFT), but it is not possible to analyze non-stationary signals with the DFT, as it cannot show the occurrence time of different frequency components. The Fractional Fourier Transform (FrFT), as a generalization of the Fourier Transform (FT), has the ability to exhibit the variable frequency nature of non-stationary signals. In this paper, the Discrete Fractional Fourier Transform (DFrFT) with different fractional orders is proposed as a novel feature extraction technique for EEG based human verification with different numbers of channels. At its best performance, the proposed method achieved a 0.22% Equal Error Rate (EER) with three EEG channels of 104 subjects.
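
The Equal Error Rate reported in this abstract is the operating point where the false acceptance rate equals the false rejection rate. The sketch below approximates it from two lists of match scores by scanning candidate thresholds. The scores are made-up toy values, and the convention that a higher score means "more likely genuine" is an assumption; nothing here reproduces the paper's features or data.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER from verification scores.

    Assumption: higher score means more likely genuine, and a sample is
    accepted when score >= threshold. Returns (eer, threshold), where the
    threshold is the candidate with the smallest |FAR - FRR| gap.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)      # genuine rejected
        far = sum(s >= t for s in impostor) / len(impostor)   # impostors accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Toy scores (hypothetical): one genuine and one impostor score overlap.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]

eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(eer, threshold)
```

On real data the two error curves rarely cross exactly at an observed score, which is why the sketch reports the average of FAR and FRR at the closest threshold.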
2019-03-04
Elbez, Ghada, Keller, Hubert B., Hagenmeyer, Veit.  2018.  A New Classification of Attacks Against the Cyber-Physical Security of Smart Grids. Proceedings of the 13th International Conference on Availability, Reliability and Security. :63:1–63:6.
Modern critical infrastructures such as Smart Grids (SGs) rely heavily on Information and Communication Technology (ICT) systems to monitor and control operations and states within large-scale facilities. The potential offered by SGs includes an effective integration of renewables, a demand-response action and a dynamic pricing system. The increasing use of ICT for the communication infrastructure of modern power systems offers advantages but can give rise to cyber attacks that compromise the security of the SG. To deal efficiently with the security concerns of SGs, a survey of the different attacks that consider the physical as well as the cyber characteristics of modern power grids is required. In the present paper, first the specific differences between SGs with respect to both Information Technology (IT) systems and conventional energy grids are discussed. Thereafter, the specific security requirements of SGs are presented in order to raise awareness of the new security challenges. Finally, a new classification of cyber attacks, based on the architecture of the SG, is proposed and details for each category are provided. The new classification is distinguished by its focus on the cyber-physical security of the SG in particular, which gives a comprehensive overview of the different threats. Thus, this new classification forms the necessary knowledge-basis for the design of respective countermeasures.
2019-02-18
Zhu, Mengeheng, Shi, Hong.  2018.  A Novel Support Vector Machine Algorithm for Missing Data. Proceedings of the 2nd International Conference on Innovation in Artificial Intelligence. :48–53.
The missing data problem often occurs in data analysis. The most common way to solve this problem is imputation. But imputation methods are only suitable for dealing with a low proportion of missing data, under the assumption that the missing data satisfies MCAR (Missing Completely at Random) or MAR (Missing at Random). In this paper, considering the reasons for missing data, we propose a novel support vector machine method using a new kernel function to solve the problem with a relatively large proportion of missing data. This method makes full use of observed data to reduce the error caused by filling a large number of missing values. We validate our method on 4 data sets from the UCI Repository of Machine Learning. Accuracy, F-score, Kappa statistics and recall are used to evaluate the performance. Experimental results show that our method achieves significant improvement in terms of classification results compared with common imputation methods, even when the proportion of missing data is high.
2019-06-24
Copty, Fady, Danos, Matan, Edelstein, Orit, Eisner, Cindy, Murik, Dov, Zeltser, Benjamin.  2018.  Accurate Malware Detection by Extreme Abstraction. Proceedings of the 34th Annual Computer Security Applications Conference. :101–111.

Modern malware applies a rich arsenal of evasion techniques to render dynamic analysis ineffective. In turn, dynamic analysis tools take great pains to hide themselves from malware; typically this entails trying to be as faithful as possible to the behavior of a real run. We present a novel approach to malware analysis that turns this idea on its head, using an extreme abstraction of the operating system that intentionally strays from real behavior. The key insight is that the presence of malicious behavior is sufficient evidence of malicious intent, even if the path taken is not one that could occur during a real run of the sample. By exploring multiple paths in a system that only approximates the behavior of a real system, we can discover behavior that would often be hard to elicit otherwise. We aggregate features from multiple paths and use a funnel-like configuration of machine learning classifiers to achieve high accuracy without incurring too much of a performance penalty. We describe our system, TAMALES (The Abstract Malware Analysis LEarning System), in detail and present machine learning results using a 330K sample set showing an FPR (False Positive Rate) of 0.10% with a TPR (True Positive Rate) of 99.11%, demonstrating that extreme abstraction can be extraordinarily effective in providing data that allows a classifier to accurately detect malware.
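
For readers unfamiliar with the two rates quoted in this abstract, the following minimal sketch shows how FPR and TPR are computed from binary labels and predictions. The label encoding (1 = malicious, 0 = benign) and the ten toy samples are assumptions for illustration and have no connection to the paper's 330K sample set.

```python
def confusion_rates(y_true, y_pred):
    """Return (fpr, tpr) for binary labels where 1 = malicious, 0 = benign.

    Assumes both classes occur in y_true; otherwise a division by zero
    would be raised.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fpr = fp / (fp + tn)  # share of benign samples flagged as malicious
    tpr = tp / (tp + fn)  # share of malicious samples that were detected
    return fpr, tpr


# Toy ground truth and predictions (hypothetical).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

fpr, tpr = confusion_rates(y_true, y_pred)
print(fpr, tpr)
```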

2019-07-01
Kebande, V. R., Kigwana, I., Venter, H. S., Karie, N. M., Wario, R. D..  2018.  CVSS Metric-Based Analysis, Classification and Assessment of Computer Network Threats and Vulnerabilities. 2018 International Conference on Advances in Big Data, Computing and Data Communication Systems (icABCD). :1–10.

This paper provides a Common Vulnerability Scoring System (CVSS) metric-based technique for classifying and analysing the prevailing Computer Network Security Vulnerabilities and Threats (CNSVT). The problem addressed in this paper is that, at the time of writing, there existed no effective approaches for analysing and classifying CNSVT for purposes of assessments based on CVSS metrics. The authors of this paper have achieved this by generating a CVSS metric-based dynamic Vulnerability Analysis Classification Countermeasure (VACC) criterion that is able to rank vulnerabilities. The CVSS metric-based VACC has allowed the computation of the Vulnerability Similarity Measure (VSM) using the Hamming and Euclidean distance metric functions. Nevertheless, the CVSS metric-based VACC also enabled the random measuring of the VSM for a selected number of vulnerabilities based on the [Ma-Ma], [Ma-Mi], [Mi-Ci], [Ma-Ci] ranking score. This technique is aimed at allowing security experts to conduct proper vulnerability detection and assessments across computer-based networks based on the perceived occurrence, by checking the probability that given threats will occur or not. The authors have also proposed high-level countermeasures for the vulnerabilities that have been listed. The authors have evaluated the CVSS metric-based VACC and the results are promising. Based on this technique, it is worth noting that these propositions can help in the development of stronger computer and network security tools.
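
The two distance functions named in this abstract can be sketched as follows. This is only an illustration under stated assumptions: it applies the Hamming distance to CVSS v3-style vector strings (counting metrics whose values differ) and the Euclidean distance to tuples of numeric scores. The example vectors and score tuples are hypothetical, and the abstract does not say how the paper encodes vulnerabilities before measuring similarity.

```python
import math


def parse_vector(vector):
    """Split a CVSS-style vector string such as 'AV:N/AC:L' into a dict."""
    return dict(part.split(":") for part in vector.split("/"))


def hamming(a, b):
    """Number of metrics whose values differ between two vector strings."""
    pa, pb = parse_vector(a), parse_vector(b)
    return sum(pa[k] != pb.get(k) for k in pa)


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Hypothetical vulnerabilities described by CVSS v3-style base vectors.
v1 = "AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"
v2 = "AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:H"

metric_differences = hamming(v1, v2)           # AV, PR and I differ
score_distance = euclidean((9.8, 9.1), (7.0, 6.5))  # hypothetical score tuples
print(metric_differences, score_distance)
```

The Hamming distance suits the categorical metric values, while the Euclidean distance only makes sense once vulnerabilities are expressed as numeric scores.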

2019-12-16
Malviya, Vikas, Rai, Sawan, Gupta, Atul.  2018.  Development of a Plugin Based Extensible Feature Extraction Framework. Proceedings of the 33rd Annual ACM Symposium on Applied Computing. :1840–1847.

An important ingredient of a successful recipe for solving machine learning problems is the availability of a suitable dataset. However, such a dataset may have to be extracted from large unstructured and semi-structured data such as programming code, scripts, and text. In this work, we propose a plug-in based, extensible feature extraction framework, which we have prototyped as a tool. The proposed framework is demonstrated by extracting features from two different sources of semi-structured and unstructured data. The semi-structured data comprised web page and script based data, whereas the other data was taken from email data for spam filtering. The usefulness of the tool was also assessed on the aspect of ease of programming.

2019-06-24
Naeem, H., Guo, B., Naeem, M. R..  2018.  A light-weight malware static visual analysis for IoT infrastructure. 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD). :240–244.

Recently, a huge trend toward the internet of things (IoT) and an exponential increase in automated tools have been helping malware producers to target IoT devices. Traditional security solutions against malware are infeasible due to the low computing power available for large-scale data in the IoT environment. The number of malware samples and their variants is increasing due to continuous malware attacks. Consequently, improving the performance of malware analysis is a critical requirement for stopping the rapid expansion of malicious attacks in the IoT environment. To solve this problem, the paper proposes a novel framework for classifying malware in the IoT environment. To achieve fine-grained malware classification in the suggested framework, the malware image classification system (MICS) is designed to represent the malware image globally and locally. MICS first converts the suspicious program into a gray-scale image and then captures hybrid local and global malware features to perform malware family classification. Preliminary experimental outcomes of MICS are quite promising, with 97.4% classification accuracy on 9342 Windows suspicious programs of 25 families. The experimental results indicate that the proposed framework is quite capable of processing large-scale IoT malware.
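
The first MICS step described in this abstract, turning a program into a gray-scale image, can be sketched in a few lines: each byte becomes one pixel intensity in the range 0 to 255. The fixed row width and the zero-padding of the last row are assumptions for illustration, since the abstract does not state how the image dimensions are chosen.

```python
def bytes_to_grayscale(data, width):
    """Arrange raw bytes into rows of `width` pixels (values 0..255).

    Assumption: the final row is zero-padded when len(data) is not a
    multiple of width.
    """
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))
        rows.append(row)
    return rows


# Hypothetical 5-byte "program" laid out two pixels per row.
image = bytes_to_grayscale(b"\x00\xff\x10\x20\x30", width=2)
print(image)
```

The resulting list of rows can be handed to any image library or feature extractor that accepts a 2D array of 8-bit intensities.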

2019-11-04
Altay, Osman, Ulas, Mustafa.  2018.  Location Determination by Processing Signal Strength of Wi-Fi Routers in the Indoor Environment with Linear Discriminant Classifier. 2018 6th International Symposium on Digital Forensic and Security (ISDFS). :1–4.

Location determination in indoor areas, as well as in open areas, is important for many applications. However, location determination in indoor areas is much more difficult than in open areas. The Global Positioning System (GPS) signals used for position detection are not effective indoors. Wi-Fi signals are widely used for localization in indoor areas. Indoors, localization can be used for many different purposes, such as intelligent home systems, locating people, and locating products in a depot. In this study, a classification method was used to determine location among 4 different areas, using Wi-Fi signal values obtained from different routers. Linear discriminant analysis (LDA) was used for classification. In a test using 10-fold cross-validation, an accuracy of 97.2% was obtained.
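
The cross-validation protocol mentioned in this abstract can be sketched as an index splitter: the samples are divided into k folds, and each fold serves once as the test set while the remaining folds form the training set. The contiguous, unshuffled folds and the sample count of 25 are assumptions for illustration; in practice the data would normally be shuffled first, and the classifier itself is omitted here.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    Fold sizes differ by at most one; no shuffling is performed.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical dataset of 25 samples split into 10 folds.
folds = list(k_fold_indices(25, 10))
for train_idx, test_idx in folds:
    # A classifier would be fitted on train_idx and scored on test_idx here.
    print(len(train_idx), len(test_idx))
```

The reported accuracy in such a protocol is usually the average of the per-fold test accuracies.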

2019-06-10
Farooq, H. M., Otaibi, N. M..  2018.  Optimal Machine Learning Algorithms for Cyber Threat Detection. 2018 UKSim-AMSS 20th International Conference on Computer Modelling and Simulation (UKSim). :32–37.

With the exponential hike in cyber threats, organizations are now striving for better data mining techniques in order to analyze security logs received from their IT infrastructures to ensure effective and automated cyber threat detection. Machine Learning (ML) based analytics for security machine data is the next emerging trend in cyber security, aimed at mining security data to uncover advanced targeted cyber threat actors and minimizing the operational overheads of maintaining static correlation rules. However, the selection of an optimal machine learning algorithm for security log analytics still remains an impeding factor against the success of data science in cyber security due to the risk of a large number of false-positive detections, especially in the case of large-scale or global Security Operations Center (SOC) environments. This fact brings a dire need for an efficient machine learning based cyber threat detection model, capable of minimizing the false detection rates. In this paper, we propose optimal machine learning algorithms with their implementation framework based on analytical and empirical evaluations of gathered results, while using various prediction, classification and forecasting algorithms.

2019-11-26
Hassanpour, Reza, Dogdu, Erdogan, Choupani, Roya, Goker, Onur, Nazli, Nazli.  2018.  Phishing E-Mail Detection by Using Deep Learning Algorithms. Proceedings of the ACMSE 2018 Conference. :45:1-45:1.

Phishing e-mails are considered spam e-mails that aim to collect sensitive personal information about users over the network. Since the main purpose of this behavior is mostly to harm users financially, it is vital to detect these phishing or spam e-mails immediately to prevent unauthorized access to users' vital information. Detecting phishing e-mails requires a quick and robust classification method. Considering the billions of e-mails on the Internet, this classification process must be completed in a limited time. In this work, we present some early results on the classification of spam e-mail using deep learning and machine learning methods. We utilize word2vec to represent e-mails instead of the popular keyword or other rule-based methods. The vector representations are then fed into a neural network to create a learning model. We tested our method on an open dataset and found accuracy levels above 96% with the deep learning classification methods in comparison to the standard machine learning algorithms.

2019-02-25
Ali, S. S., Maqsood, J..  2018.  .Net library for SMS spam detection using machine learning: A cross platform solution. 2018 15th International Bhurban Conference on Applied Sciences and Technology (IBCAST). :470–476.

Short Message Service (SMS) is nowadays the most used means of communication in the electronic world. While much research exists on email spam detection, there has been little insight into spam sent over SMS. This might be because the frequency of spam in these short messages is much lower than in emails. This paper presents different ways of analyzing SMS spam and a new pre-processing method for obtaining the actual dataset of spam messages. This dataset was then used with different algorithms to find the best performing one in terms of both accuracy and recall. The Random Forest algorithm was then implemented in a real-world application library written in C# for cross-platform .Net development. This library can use a prebuilt model to classify a new dataset as spam or ham.
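
The two selection criteria named in this abstract, accuracy and recall, can be computed as in the sketch below. The labels are hypothetical and the function name is an assumption for illustration. The example shows why recall matters when spam is rare: a classifier that never flags spam still scores high accuracy.

```python
def accuracy_and_recall(y_true, y_pred, positive="spam"):
    """Return (accuracy, recall) where recall is measured on the positive class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    accuracy = correct / len(y_true)
    # Recall is undefined without positive examples; report 0.0 in that case.
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


# Hypothetical labels (assumption): 2 spam messages among 10.
y_true = ["ham"] * 8 + ["spam"] * 2
always_ham = ["ham"] * 10  # a degenerate classifier that never predicts spam

acc, rec = accuracy_and_recall(y_true, always_ham)
print(acc, rec)  # 0.8 0.0
```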

2019-03-04
Aborisade, O., Anwar, M..  2018.  Classification for Authorship of Tweets by Comparing Logistic Regression and Naive Bayes Classifiers. 2018 IEEE International Conference on Information Reuse and Integration (IRI). :269–276.

At a time when all it takes to open a Twitter account is a mobile phone, authenticating information encountered on social media becomes very complex, especially when we lack measures to verify digital identities in the first place. Because the platform supports anonymity, fake news generated by dubious sources has been observed to travel much faster and farther than real news. Hence, we need valid measures to identify authors of misinformation to avert these consequences. Researchers have proposed different authorship attribution techniques to approach this kind of problem. However, because tweets are limited to 280 characters, finding a suitable authorship attribution technique is a challenge. This research aims to classify authors of tweets by comparing machine learning methods such as logistic regression and naive Bayes. The steps of this application are fetching tweets, pre-processing, feature extraction, and developing a machine learning model for classification. This paper illustrates the text classification process for authorship using machine learning techniques. In total, 46,895 tweets were used as both training and testing data, and unique features specific to Twitter were extracted. The pre-processing phase included several steps: removal of short texts, removal of stop-words and punctuation, and tokenizing and stemming of texts. This approach transforms the pre-processed data into a set of feature vectors in Python. Logistic regression and naive Bayes algorithms were applied to the set of feature vectors for the training and testing of the classifier. The logistic regression based classifier gave the highest accuracy, 91.1%, compared with 89.8% for the naive Bayes classifier.
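
The pipeline outlined in this abstract can be sketched roughly as below. This is an illustrative assumption, not the authors' implementation: it covers only the naive Bayes side of the comparison, omits stemming, uses a tiny made-up stop-word list, and the tweets and author names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption, far smaller than a real one).
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "it", "my"}


def preprocess(text, min_tokens=2):
    """Lowercase, drop punctuation and stop-words; return None for short texts."""
    tokens = re.findall(r"[a-z0-9#@']+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens if len(tokens) >= min_tokens else None


class NaiveBayes:
    """Multinomial naive Bayes over token counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter(labels)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical tweets and authors (assumption).
raw = [
    ("Loving the sunny weather in the park today!", "author_a"),
    ("Sunny morning run in the park", "author_a"),
    ("New blog post about Python testing", "author_b"),
    ("Testing my Python code again", "author_b"),
    ("ok", "author_a"),  # removed by the short-text filter
]
pairs = [(preprocess(text), author) for text, author in raw]
pairs = [(tokens, author) for tokens, author in pairs if tokens is not None]
model = NaiveBayes().fit([t for t, _ in pairs], [a for _, a in pairs])

print(model.predict(preprocess("park run in sunny weather")))
print(model.predict(preprocess("python testing tips")))
```

A logistic regression classifier would consume the same token counts as a feature vector, which is what makes the side-by-side comparison in the paper possible.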