Visible to the public Biblio

Found 1057 results

Filters: Keyword is machine learning  [Clear All Filters]
2018-01-23
Lu, Marisa, Bose, Gautam, Lee, Austin, Scupelli, Peter.  2017.  Knock Knock to Unlock: A Human-centered Novel Authentication Method for Secure System Fluidity. Proceedings of the Eleventh International Conference on Tangible, Embedded, and Embodied Interaction. :729–732.

When a person gets to a door and wants to get in, what do they do? They knock. In our system, the user's specific knock pattern authenticates their identity, and opens the door for them. The system empowers people's intuitive actions and responses to affect the world around them in a new way. We leverage IOT, and physical computing to make more technology feel like less. From there, the system of a knock based entrance creates affordances in social interaction for shared spaces wherein ownership fluidity and accessibility needs to be balanced with security

Kilgallon, S., Rosa, L. De La, Cavazos, J..  2017.  Improving the effectiveness and efficiency of dynamic malware analysis with machine learning. 2017 Resilience Week (RWS). :30–36.

As the malware threat landscape is constantly evolving and over one million new malware strains are being generated every day [1], early automatic detection of threats constitutes a top priority of cybersecurity research, and amplifies the need for more advanced detection and classification methods that are effective and efficient. In this paper, we present the application of machine learning algorithms to predict the length of time malware should be executed in a sandbox to reveal its malicious intent. We also introduce a novel hybrid approach to malware classification based on static binary analysis and dynamic analysis of malware. Static analysis extracts information from a binary file without executing it, and dynamic analysis captures the behavior of malware in a sandbox environment. Our experimental results show that by turning the aforementioned problems into machine learning problems, it is possible to get an accuracy of up to 90% on the prediction of the malware analysis run time and up to 92% on the classification of malware families.

Nagano, Yuta, Uda, Ryuya.  2017.  Static Analysis with Paragraph Vector for Malware Detection. Proceedings of the 11th International Conference on Ubiquitous Information Management and Communication. :80:1–80:7.

Malware damages computers and the threat is a serious problem. Malware can be detected by pattern matching method or dynamic heuristic method. However, it is difficult to detect all new malware subspecies perfectly by existing methods. In this paper, we propose a new method which automatically detects new malware subspecies by static analysis of execution files and machine learning. The method can distinguish malware from benignware and it can also classify malware subspecies into malware families. We combine static analysis of execution files with machine learning classifier and natural language processing by machine learning. Information of DLL Import, assembly code and hexdump are acquired by static analysis of execution files of malware and benignware to create feature vectors. Paragraph vectors of information by static analysis of execution files are created by machine learning of PV-DBOW model for natural language processing. Support vector machine and classifier of k-nearest neighbor algorithm are used in our method, and the classifier learns paragraph vectors of information by static analysis. Unknown execution files are classified into malware or benignware by pre-learned SVM. Moreover, malware subspecies are also classified into malware families by pre-learned k-nearest. We evaluate the accuracy of the classification by experiments. We think that new malware subspecies can be effectively detected by our method without existing methods for malware analysis such as generic method and dynamic heuristic method.

Taubmann, Benjamin, Kolosnjaji, Bojan.  2017.  Architecture for Resource-Aware VMI-based Cloud Malware Analysis. Proceedings of the 4th Workshop on Security in Highly Connected IT Systems. :43–48.
Virtual machine introspection (VMI) is a technology with many possible applications, such as malware analysis and intrusion detection. However, this technique is resource intensive, as inspecting program behavior includes recording of a high number of events caused by the analyzed binary and related processes. In this paper we present an architecture that leverages cloud resources for virtual machine-based malware analysis in order to train a classifier for detecting cloud-specific malware. This architecture is designed while having in mind the resource consumption when applying the VMI-based technology in production systems, in particular the overhead of tracing a large set of system calls. In order to minimize the data acquisition overhead, we use a data-driven approach from the area of resource-aware machine learning. This approach enables us to optimize the trade-off between malware detection performance and the overhead of our VMI-based tracing system.
Saeed, S., Mahendran, N., Zulehner, A., Wille, R., Karri, R..  2017.  Identifying Reversible Circuit Synthesis Approaches to Enable IP Piracy Attacks. 2017 IEEE International Conference on Computer Design (ICCD). :537–540.

Reversible circuits are vulnerable to intellectual property and integrated circuit piracy. To show these vulnerabilities, a detailed understanding on how to identify the function embedded in a reversible circuit is crucial. To obtain the embedded function, one needs to know the synthesis approach used to generate the reversible circuit in the first place. We present a machine learning based scheme to identify the synthesis approach using telltale signs in the design.

2018-01-16
Hesamifard, Ehsan, Takabi, Hassan, Ghasemi, Mehdi, Jones, Catherine.  2017.  Privacy-preserving Machine Learning in Cloud. Proceedings of the 2017 on Cloud Computing Security Workshop. :39–43.

Machine learning algorithms based on deep neural networks (NN) have achieved remarkable results and are being extensively used in different domains. On the other hand, with increasing growth of cloud services, several Machine Learning as a Service (MLaaS) are offered where training and deploying machine learning models are performed on cloud providers' infrastructure. However, machine learning algorithms require access to raw data which is often privacy sensitive and can create potential security and privacy risks. To address this issue, we develop new techniques to provide solutions for applying deep neural network algorithms to the encrypted data. In this paper, we show that it is feasible and practical to train neural networks using encrypted data and to make encrypted predictions, and also return the predictions in an encrypted form. We demonstrate applicability of the proposed techniques and evaluate its performance. The empirical results show that it provides accurate privacy-preserving training and classification.

Rukavitsyn, A., Borisenko, K., Shorov, A..  2017.  Self-learning method for DDoS detection model in cloud computing. 2017 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus). :544–547.

Cloud Computing has many significant benefits like the provision of computing resources and virtual networks on demand. However, there is the problem to assure the security of these networks against Distributed Denial-of-Service (DDoS) attack. Over the past few decades, the development of protection method based on data mining has attracted many researchers because of its effectiveness and practical significance. Most commonly these detection methods use prelearned models or models based on rules. Because of this the proposed DDoS detection methods often failure in dynamically changing cloud virtual networks. In this paper, we purposed self-learning method allows to adapt a detection model to network changes. This is minimized the false detection and reduce the possibility to mark legitimate users as malicious and vice versa. The developed method consists of two steps: collecting data about the network traffic by Netflow protocol and relearning the detection model with the new data. During the data collection we separate the traffic on legitimate and malicious. The separated traffic is labeled and sent to the relearning pool. The detection model is relearned by a data from the pool of current traffic. The experiment results show that proposed method could increase efficiency of DDoS detection systems is using data mining.

He, Z., Zhang, T., Lee, R. B..  2017.  Machine Learning Based DDoS Attack Detection from Source Side in Cloud. 2017 IEEE 4th International Conference on Cyber Security and Cloud Computing (CSCloud). :114–120.

Denial of service (DOS) attacks are a serious threat to network security. These attacks are often sourced from virtual machines in the cloud, rather than from the attacker's own machine, to achieve anonymity and higher network bandwidth. Past research focused on analyzing traffic on the destination (victim's) side with predefined thresholds. These approaches have significant disadvantages. They are only passive defenses after the attack, they cannot use the outbound statistical features of attacks, and it is hard to trace back to the attacker with these approaches. In this paper, we propose a DOS attack detection system on the source side in the cloud, based on machine learning techniques. This system leverages statistical information from both the cloud server's hypervisor and the virtual machines, to prevent network packages from being sent out to the outside network. We evaluate nine machine learning algorithms and carefully compare their performance. Our experimental results show that more than 99.7% of four kinds of DOS attacks are successfully detected. Our approach does not degrade performance and can be easily extended to broader DOS attacks.

2018-01-10
Buber, E., Dırı, B., Sahingoz, O. K..  2017.  Detecting phishing attacks from URL by using NLP techniques. 2017 International Conference on Computer Science and Engineering (UBMK). :337–342.

Nowadays, cyber attacks affect many institutions and individuals, and they result in a serious financial loss for them. Phishing Attack is one of the most common types of cyber attacks which is aimed at exploiting people's weaknesses to obtain confidential information about them. This type of cyber attack threats almost all internet users and institutions. To reduce the financial loss caused by this type of attacks, there is a need for awareness of the users as well as applications with the ability to detect them. In the last quarter of 2016, Turkey appears to be second behind China with an impact rate of approximately 43% in the Phishing Attack Analysis report between 45 countries. In this study, firstly, the characteristics of this type of attack are explained, and then a machine learning based system is proposed to detect them. In the proposed system, some features were extracted by using Natural Language Processing (NLP) techniques. The system was implemented by examining URLs used in Phishing Attacks before opening them with using some extracted features. Many tests have been applied to the created system, and it is seen that the best algorithm among the tested ones is the Random Forest algorithm with a success rate of 89.9%.

Bhattacharjee, S. Das, Talukder, A., Al-Shaer, E., Doshi, P..  2017.  Prioritized active learning for malicious URL detection using weighted text-based features. 2017 IEEE International Conference on Intelligence and Security Informatics (ISI). :107–112.

Data analytics is being increasingly used in cyber-security problems, and found to be useful in cases where data volumes and heterogeneity make it cumbersome for manual assessment by security experts. In practical cyber-security scenarios involving data-driven analytics, obtaining data with annotations (i.e. ground-truth labels) is a challenging and known limiting factor for many supervised security analytics task. Significant portions of the large datasets typically remain unlabelled, as the task of annotation is extensively manual and requires a huge amount of expert intervention. In this paper, we propose an effective active learning approach that can efficiently address this limitation in a practical cyber-security problem of Phishing categorization, whereby we use a human-machine collaborative approach to design a semi-supervised solution. An initial classifier is learnt on a small amount of the annotated data which in an iterative manner, is then gradually updated by shortlisting only relevant samples from the large pool of unlabelled data that are most likely to influence the classifier performance fast. Prioritized Active Learning shows a significant promise to achieve faster convergence in terms of the classification performance in a batch learning framework, and thus requiring even lesser effort for human annotation. An useful feature weight update technique combined with active learning shows promising classification performance for categorizing Phishing/malicious URLs without requiring a large amount of annotated training samples to be available during training. In experiments with several collections of PhishMonger's Targeted Brand dataset, the proposed method shows significant improvement over the baseline by as much as 12%.

2017-12-28
Stuckman, J., Walden, J., Scandariato, R..  2017.  The Effect of Dimensionality Reduction on Software Vulnerability Prediction Models. IEEE Transactions on Reliability. 66:17–37.

Statistical prediction models can be an effective technique to identify vulnerable components in large software projects. Two aspects of vulnerability prediction models have a profound impact on their performance: 1) the features (i.e., the characteristics of the software) that are used as predictors and 2) the way those features are used in the setup of the statistical learning machinery. In a previous work, we compared models based on two different types of features: software metrics and term frequencies (text mining features). In this paper, we broaden the set of models we compare by investigating an array of techniques for the manipulation of said features. These techniques fall under the umbrella of dimensionality reduction and have the potential to improve the ability of a prediction model to localize vulnerabilities. We explore the role of dimensionality reduction through a series of cross-validation and cross-project prediction experiments. Our results show that in the case of software metrics, a dimensionality reduction technique based on confirmatory factor analysis provided an advantage when performing cross-project prediction, yielding the best F-measure for the predictions in five out of six cases. In the case of text mining, feature selection can make the prediction computationally faster, but no dimensionality reduction technique provided any other notable advantage.

Gangadhar, S., Sterbenz, J. P. G..  2017.  Machine learning aided traffic tolerance to improve resilience for software defined networks. 2017 9th International Workshop on Resilient Networks Design and Modeling (RNDM). :1–7.

Software Defined Networks (SDNs) have gained prominence recently due to their flexible management and superior configuration functionality of the underlying network. SDNs, with OpenFlow as their primary implementation, allow for the use of a centralised controller to drive the decision making for all the supported devices in the network and manage traffic through routing table changes for incoming flows. In conventional networks, machine learning has been shown to detect malicious intrusion, and classify attacks such as DoS, user to root, and probe attacks. In this work, we extend the use of machine learning to improve traffic tolerance for SDNs. To achieve this, we extend the functionality of the controller to include a resilience framework, ReSDN, that incorporates machine learning to be able to distinguish DoS attacks, focussing on a neptune attack for our experiments. Our model is trained using the MIT KDD 1999 dataset. The system is developed as a module on top of the POX controller platform and evaluated using the Mininet simulator.

Vu, Q. H., Ruta, D., Cen, L..  2017.  An ensemble model with hierarchical decomposition and aggregation for highly scalable and robust classification. 2017 Federated Conference on Computer Science and Information Systems (FedCSIS). :149–152.

This paper introduces an ensemble model that solves the binary classification problem by incorporating the basic Logistic Regression with the two recent advanced paradigms: extreme gradient boosted decision trees (xgboost) and deep learning. To obtain the best result when integrating sub-models, we introduce a solution to split and select sets of features for the sub-model training. In addition to the ensemble model, we propose a flexible robust and highly scalable new scheme for building a composite classifier that tries to simultaneously implement multiple layers of model decomposition and outputs aggregation to maximally reduce both bias and variance (spread) components of classification errors. We demonstrate the power of our ensemble model to solve the problem of predicting the outcome of Hearthstone, a turn-based computer game, based on game state information. Excellent predictive performance of our model has been acknowledged by the second place scored in the final ranking among 188 competing teams.

2017-12-20
Heartfield, R., Loukas, G., Gan, D..  2017.  An eye for deception: A case study in utilizing the human-as-a-security-sensor paradigm to detect zero-day semantic social engineering attacks. 2017 IEEE 15th International Conference on Software Engineering Research, Management and Applications (SERA). :371–378.

In a number of information security scenarios, human beings can be better than technical security measures at detecting threats. This is particularly the case when a threat is based on deception of the user rather than exploitation of a specific technical flaw, as is the case of spear-phishing, application spoofing, multimedia masquerading and other semantic social engineering attacks. Here, we put the concept of the human-as-a-security-sensor to the test with a first case study on a small number of participants subjected to different attacks in a controlled laboratory environment and provided with a mechanism to report these attacks if they spot them. A key challenge is to estimate the reliability of each report, which we address with a machine learning approach. For comparison, we evaluate the ability of known technical security countermeasures in detecting the same threats. This initial proof of concept study shows that the concept is viable.

Abdelhamid, N., Thabtah, F., Abdel-jaber, H..  2017.  Phishing detection: A recent intelligent machine learning comparison based on models content and features. 2017 IEEE International Conference on Intelligence and Security Informatics (ISI). :72–77.

In the last decade, numerous fake websites have been developed on the World Wide Web to mimic trusted websites, with the aim of stealing financial assets from users and organizations. This form of online attack is called phishing, and it has cost the online community and the various stakeholders hundreds of million Dollars. Therefore, effective counter measures that can accurately detect phishing are needed. Machine learning (ML) is a popular tool for data analysis and recently has shown promising results in combating phishing when contrasted with classic anti-phishing approaches, including awareness workshops, visualization and legal solutions. This article investigates ML techniques applicability to detect phishing attacks and describes their pros and cons. In particular, different types of ML techniques have been investigated to reveal the suitable options that can serve as anti-phishing tools. More importantly, we experimentally compare large numbers of ML techniques on real phishing datasets and with respect to different metrics. The purpose of the comparison is to reveal the advantages and disadvantages of ML predictive models and to show their actual performance when it comes to phishing attacks. The experimental results show that Covering approach models are more appropriate as anti-phishing solutions, especially for novice users, because of their simple yet effective knowledge bases in addition to their good phishing detection rate.

Shirazi, H., Haefner, K., Ray, I..  2017.  Fresh-Phish: A Framework for Auto-Detection of Phishing Websites. 2017 IEEE International Conference on Information Reuse and Integration (IRI). :137–143.

Summary form only given. Strong light-matter coupling has been recently successfully explored in the GHz and THz [1] range with on-chip platforms. New and intriguing quantum optical phenomena have been predicted in the ultrastrong coupling regime [2], when the coupling strength Ω becomes comparable to the unperturbed frequency of the system ω. We recently proposed a new experimental platform where we couple the inter-Landau level transition of an high-mobility 2DEG to the highly subwavelength photonic mode of an LC meta-atom [3] showing very large Ω/ωc = 0.87. Our system benefits from the collective enhancement of the light-matter coupling which comes from the scaling of the coupling Ω ∝ √n, were n is the number of optically active electrons. In our previous experiments [3] and in literature [4] this number varies from 104-103 electrons per meta-atom. We now engineer a new cavity, resonant at 290 GHz, with an extremely reduced effective mode surface Seff = 4 × 10-14 m2 (FE simulations, CST), yielding large field enhancements above 1500 and allowing to enter the few (\textbackslashtextless;100) electron regime. It consist of a complementary metasurface with two very sharp metallic tips separated by a 60 nm gap (Fig.1(a, b)) on top of a single triangular quantum well. THz-TDS transmission experiments as a function of the applied magnetic field reveal strong anticrossing of the cavity mode with linear cyclotron dispersion. Measurements for arrays of only 12 cavities are reported in Fig.1(c). On the top horizontal axis we report the number of electrons occupying the topmost Landau level as a function of the magnetic field. At the anticrossing field of B=0.73 T we measure approximately 60 electrons ultra strongly coupled (Ω/ω- \textbackslashtextbar\textbackslashtextbar

Azakami, T., Shibata, C., Uda, R..  2017.  Challenge to Impede Deep Learning against CAPTCHA with Ergonomic Design. 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC). 1:637–642.

Once we had tried to propose an unbreakable CAPTCHA and we reached a result that limitation of time is effect to prevent computers from recognizing characters accurately while computers can finally recognize all text-based CAPTCHA in unlimited time. One of the existing usual ways to prevent computers from recognizing characters is distortion, and adding noise is also effective for the prevention. However, these kinds of prevention also make recognition of characters by human beings difficult. As a solution of the problems, an effective text-based CAPTCHA algorithm with amodal completion was proposed by our team. Our CAPTCHA causes computers a large amount of calculation costs while amodal completion helps human beings to recognize characters momentarily. Our CAPTCHA has evolved with aftereffects and combinations of complementary colors. We evaluated our CAPTCHA with deep learning which is attracting the most attention since deep learning is faster and more accurate than existing methods for recognition with computers. In this paper, we add jagged lines to edges of characters since edges are one of the most important parts for recognition in deep learning. In this paper, we also evaluate that how much the jagged lines decrease recognition of human beings and how much they prevent computers from the recognition. We confirm the effects of our method to deep learning.

Lee, W. H., Lee, R. B..  2017.  Implicit Smartphone User Authentication with Sensors and Contextual Machine Learning. 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :297–308.

Authentication of smartphone users is important because a lot of sensitive data is stored in the smartphone and the smartphone is also used to access various cloud data and services. However, smartphones are easily stolen or co-opted by an attacker. Beyond the initial login, it is highly desirable to re-authenticate end-users who are continuing to access security-critical services and data. Hence, this paper proposes a novel authentication system for implicit, continuous authentication of the smartphone user based on behavioral characteristics, by leveraging the sensors already ubiquitously built into smartphones. We propose novel context-based authentication models to differentiate the legitimate smartphone owner versus other users. We systematically show how to achieve high authentication accuracy with different design alternatives in sensor and feature selection, machine learning techniques, context detection and multiple devices. Our system can achieve excellent authentication performance with 98.1% accuracy with negligible system overhead and less than 2.4% battery consumption.

2017-12-12
De La Peña Montero, Fabian, Hariri, Salim.  2017.  Autonomic and Integrated Management for Proactive Cyber Security (AIM-PSC). Companion Proceedings of the10th International Conference on Utility and Cloud Computing. :107–112.

The complexity, multiplicity, and impact of cyber-attacks have been increasing at an alarming rate despite the significant research and development investment in cyber security products and tools. The current techniques to detect and protect cyber infrastructures from these smart and sophisticated attacks are mainly characterized as being ad hoc, manual intensive, and too slow. We present in this paper AIM-PSC that is developed jointly by researchers at AVIRTEK and The University of Arizona Center for Cloud and Autonomic Computing that is inspired by biological systems, which can efficiently handle complexity, dynamism and uncertainty. In AIM-PSC system, an online monitoring and multi-level analysis are used to analyze the anomalous behaviors of networks, software systems and applications. By combining the results of different types of analysis using a statistical decision fusion approach we can accurately detect any types of cyber-attacks with high detection and low false alarm rates and proactively respond with corrective actions to mitigate their impacts and stop their propagation.

Nazir, S., Patel, S., Patel, D..  2017.  Autonomic computing meets SCADA security. 2017 IEEE 16th International Conference on Cognitive Informatics Cognitive Computing (ICCI*CC). :498–502.

National assets such as transportation networks, large manufacturing, business and health facilities, power generation, and distribution networks are critical infrastructures. The cyber threats to these infrastructures have increasingly become more sophisticated, extensive and numerous. Cyber security conventional measures have proved useful in the past but increasing sophistication of attacks dictates the need for newer measures. The autonomic computing paradigm mimics the autonomic nervous system and is promising to meet the latest challenges in the cyber threat landscape. This paper provides a brief review of autonomic computing applications for SCADA systems and proposes architecture for cyber security.

Lee, S. Y., Chung, T. M..  2017.  A study on the fast system recovery: Selecting the number of surrogate nodes for fast recovery in industrial IoT environment. 2017 International Conference on Information and Communications (ICIC). :205–207.

This paper is based on the previous research that selects the proper surrogate nodes for fast recovery mechanism in industrial IoT (Internet of Things) Environment which uses a variety of sensors to collect the data and exchange the collected data in real-time for creating added value. We are going to suggest the way that how to decide the number of surrogate node automatically in different deployed industrial IoT Environment so that minimize the system recovery time when the central server likes IoT gateway is in failure. We are going to use the network simulator to measure the recovery time depending on the number of the selected surrogate nodes according to the sub-devices which are connected to the IoT gateway.

2017-12-04
Alejandre, F. V., Cortés, N. C., Anaya, E. A..  2017.  Feature selection to detect botnets using machine learning algorithms. 2017 International Conference on Electronics, Communications and Computers (CONIELECOMP). :1–7.

In this paper, a novel method to do feature selection to detect botnets at their phase of Command and Control (C&C) is presented. A major problem is that researchers have proposed features based on their expertise, but there is no a method to evaluate these features since some of these features could get a lower detection rate than other. To this aim, we find the feature set based on connections of botnets at their phase of C&C, that maximizes the detection rate of these botnets. A Genetic Algorithm (GA) was used to select the set of features that gives the highest detection rate. We used the machine learning algorithm C4.5, this algorithm did the classification between connections belonging or not to a botnet. The datasets used in this paper were extracted from the repositories ISOT and ISCX. Some tests were done to get the best parameters in a GA and the algorithm C4.5. We also performed experiments in order to obtain the best set of features for each botnet analyzed (specific), and for each type of botnet (general) too. The results are shown at the end of the paper, in which a considerable reduction of features and a higher detection rate than the related work presented were obtained.

Hongyo, K., Kimura, T., Kudo, T., Inoue, Y., Hirata, K..  2017.  Modeling of countermeasure against self-evolving botnets. 2017 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-TW). :227–228.

Machine learning has been widely used and achieved considerable results in various research areas. On the other hand, machine learning becomes a big threat when malicious attackers make use it for the wrong purpose. As such a threat, self-evolving botnets have been considered in the past. The self-evolving botnets autonomously predict vulnerabilities by implementing machine learning with computing resources of zombie computers. Furthermore, they evolve based on the vulnerability, and thus have high infectivity. In this paper, we consider several models of Markov chains to counter the spreading of the self-evolving botnets. Through simulation experiments, this paper shows the behaviors of these models.

Costa, V. G. T. da, Barbon, S., Miani, R. S., Rodrigues, J. J. P. C., Zarpelão, B. B..  2017.  Detecting mobile botnets through machine learning and system calls analysis. 2017 IEEE International Conference on Communications (ICC). :1–6.

Botnets have been a serious threat to the Internet security. With the constant sophistication and the resilience of them, a new trend has emerged, shifting botnets from the traditional desktop to the mobile environment. As in the desktop domain, detecting mobile botnets is essential to minimize the threat that they impose. Along the diverse set of strategies applied to detect these botnets, the ones that show the best and most generalized results involve discovering patterns in their anomalous behavior. In the mobile botnet field, one way to detect these patterns is by analyzing the operation parameters of this kind of applications. In this paper, we present an anomaly-based and host-based approach to detect mobile botnets. The proposed approach uses machine learning algorithms to identify anomalous behaviors in statistical features extracted from system calls. Using a self-generated dataset containing 13 families of mobile botnets and legitimate applications, we were able to test the performance of our approach in a close-to-reality scenario. The proposed approach achieved great results, including low false positive rates and high true detection rates.

2017-11-27
Meng, Q., Shameng, Wen, Chao, Feng, Chaojing, Tang.  2016.  Predicting buffer overflow using semi-supervised learning. 2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). :1959–1963.

As everyone knows vulnerability detection is a very difficult and time consuming work, so taking advantage of the unlabeled data sufficiently is needed and helpful. According the above reality, in this paper a method is proposed to predict buffer overflow based on semi-supervised learning. We first employ Antlr to extract AST from C/C++ source files, then according to the 22 buffer overflow attributes taxonomies, a 22-dimension vector is extracted from every function in AST, at last, the vector is leveraged to train a classifier to predict buffer overflow vulnerabilities. The experiment and evaluation indicate our method is correct and efficient.