Biblio
Filters: Keyword is machine learning [Clear All Filters]
Application of Machine Learning for Online Dynamic Security Assessment in Presence of System Variability and Additive Instrumentation Errors. 2019 North American Power Symposium (NAPS). :1—6.
.
2019. Large-scale blackouts that have occurred in the past few decades have necessitated the need to do extensive research in the field of grid security assessment. With the aid of synchrophasor technology, which uses phasor measurement unit (PMU) data, dynamic security assessment (DSA) can be performed online. However, existing applications of DSA are challenged by variability in system conditions and unaccounted for measurement errors. To overcome these challenges, this research develops a DSA scheme to provide security prediction in real-time for load profiles of different seasons in presence of realistic errors in the PMU measurements. The major contributions of this paper are: (1) develop a DSA scheme based on PMU data, (2) consider seasonal load profiles, (3) account for varying penetrations of renewable generation, and (4) compare the accuracy of different machine learning (ML) algorithms for DSA with and without erroneous measurements. The performance of this approach is tested on the IEEE-118 bus system. Comparative analysis of the accuracies of the ML algorithms under different operating scenarios highlights the importance of considering realistic errors and variability in system conditions while creating a DSA scheme.
Assertion Detection in Clinical Natural Language Processing: A Knowledge-Poor Machine Learning Approach. 2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT). :37–40.
.
2019. Natural language processing (NLP) have been recently used to extract clinical information from free text in Electronic Health Record (EHR). In clinical NLP one challenge is that the meaning of clinical entities is heavily affected by assertion modifiers such as negation, uncertain, hypothetical, experiencer and so on. Incorrect assertion assignment could cause inaccurate diagnosis of patients' condition or negatively influence following study like disease modeling. Thus, clinical NLP systems which can detect assertion status of given target medical findings (e.g. disease, symptom) in clinical context are highly demanded. Here in this work, we propose a deep-learning system based on word embedding, RNN and attention mechanism (more specifically: Attention-based Bidirectional Long Short-Term Memory networks) for assertion detection in clinical notes. Unlike previous state-of-art methods which require knowledge input or feature engineering, our system is a knowledge poor machine learning system and can be easily extended or transferred to other domains. The evaluation of our system on public benchmarking corpora demonstrates that a knowledge poor deep-learning system can also achieve high performance for detecting negation and assertions comparing to state-of-the-art systems.
Audio Based Drone Detection and Identification using Deep Learning. 2019 15th International Wireless Communications Mobile Computing Conference (IWCMC). :459–464.
.
2019. In recent years, unmanned aerial vehicles (UAVs) have become increasingly accessible to the public due to their high availability with affordable prices while being equipped with better technology. However, this raises a great concern from both the cyber and physical security perspectives since UAVs can be utilized for malicious activities in order to exploit vulnerabilities by spying on private properties, critical areas or to carry dangerous objects such as explosives which makes them a great threat to the society. Drone identification is considered the first step in a multi-procedural process in securing physical infrastructure against this threat. In this paper, we present drone detection and identification methods using deep learning techniques such as Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and Convolutional Recurrent Neural Network (CRNN). These algorithms will be utilized to exploit the unique acoustic fingerprints of the flying drones in order to detect and identify them. We propose a comparison between the performance of different neural networks based on our dataset which features audio recorded samples of drone activities. The major contribution of our work is to validate the usage of these methodologies of drone detection and identification in real life scenarios and to provide a robust comparison of the performance between different deep neural network algorithms for this application. In addition, we are releasing the dataset of drone audio clips for the research community for further analysis.
Autonomic Resource Management for Power, Performance, and Security in Cloud Environment. 2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA). :1–4.
.
2019. High performance computing is widely used for large-scale simulations, designs and analysis of critical problems especially through the use of cloud computing systems nowadays because cloud computing provides ubiquitous, on-demand computing capabilities with large variety of hardware configurations including GPUs and FPGAs that are highly used for high performance computing. However, it is well known that inefficient management of such systems results in excessive power consumption affecting the budget, cooling challenges, as well as reducing reliability due to the overheating and hotspots. Furthermore, considering the latest trends in the attack scenarios and crypto-currency based intrusions, security has become a major problem for high performance computing. Therefore, to address both challenges, in this paper we present an autonomic management methodology for both security and power/performance. Our proposed approach first builds knowledge of the environment in terms of power consumption and the security tools' deployment. Next, it provisions virtual resources so that the power consumption can be reduced while maintaining the required performance and deploy the security tools based on the system behavior. Using this approach, we can utilize a wide range of secure resources efficiently in HPC system, cloud computing systems, servers, embedded systems, etc.
Big Data Security Frameworks Meet the Intelligent Transportation Systems Trust Challenges. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :807–813.
.
2019. Many technological cases exploiting data science have been realized in recent years; machine learning, Internet of Things, and stream data processing are examples of this trend. Other advanced applications have focused on capturing the value from streaming data of different objects of transport and traffic management in an Intelligent Transportation System (ITS). In this context, security control and trust level play a decisive role in the sustainable adoption of this trend. However, conceptual work integrating the security approaches of different disciplines into one coherent reference architecture is limited. The contribution of this paper is a reference architecture for ITS security (called SITS). In addition, a classification of Big Data technologies, products, and services to address the ITS trust challenges is presented. We also proposed a novel multi-tier ITS security framework for validating the usability of SITS with business intelligence development in the enterprise domain.
Classification of XSS Attacks by Machine Learning with Frequency of Appearance and Co-occurrence. 2019 53rd Annual Conference on Information Sciences and Systems (CISS). :1–6.
.
2019. Cross site scripting (XSS) attack is one of the attacks on the web. It brings session hijack with HTTP cookies, information collection with fake HTML input form and phishing with dummy sites. As a countermeasure of XSS attack, machine learning has attracted a lot of attention. There are existing researches in which SVM, Random Forest and SCW are used for the detection of the attack. However, in the researches, there are problems that the size of data set is too small or unbalanced, and that preprocessing method for vectorization of strings causes misclassification. The highest accuracy of the classification was 98% in existing researches. Therefore, in this paper, we improved the preprocessing method for vectorization by using word2vec to find the frequency of appearance and co-occurrence of the words in XSS attack scripts. Moreover, we also used a large data set to decrease the deviation of the data. Furthermore, we evaluated the classification results with two procedures. One is an inappropriate procedure which some researchers tend to select by mistake. The other is an appropriate procedure which can be applied to an attack detection filter in the real environment.
A Compositionality Assembled Model for Learning and Recognizing Emotion from Bodily Expression. 2019 IEEE 4th International Conference on Advanced Robotics and Mechatronics (ICARM). :821–826.
.
2019. When we are express our internal status, such as emotions, the human body expression we use follows the compositionality principle. It is a theory in linguistic which proposes that the single components of the bodily presentation as well as the rules used to combine them are the major parts to finish this process. In this paper, such principle is applied to the process of expressing and recognizing emotional states through body expression, in which certain key features can be learned to represent certain primitives of the internal emotional state in the form of basic variables. This is done by a hierarchical recurrent neural learning framework (RNN) because of its nonlinear dynamic bifurcation, so that variables can be learned to represent different hierarchies. In addition, we applied some adaptive learning techniques in machine learning for the requirement of real-time emotion recognition, in which a stable representation can be maintained compared to previous work. The model is examined by comparing the PB values between the training and recognition phases. This hierarchical model shows the rationality of the compositionality hypothesis by the RNN learning and explains how key features can be used and combined in bodily expression to show the emotional state.
Creation of Adversarial Examples with Keeping High Visual Performance. 2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT). :52—56.
.
2019. The accuracy of the image classification by the convolutional neural network is exceeding the ability of human being and contributes to various fields. However, the improvement of the image recognition technology gives a great blow to security system with an image such as CAPTCHA. In particular, since the character string CAPTCHA has already added distortion and noise in order not to be read by the computer, it becomes a problem that the human readability is lowered. Adversarial examples is a technique to produce an image letting an image classification by the machine learning be wrong intentionally. The best feature of this technique is that when human beings compare the original image with the adversarial examples, they cannot understand the difference on appearance. However, Adversarial examples that is created with conventional FGSM cannot completely misclassify strong nonlinear networks like CNN. Osadchy et al. have researched to apply this adversarial examples to CAPTCHA and attempted to let CNN misclassify them. However, they could not let CNN misclassify character images. In this research, we propose a method to apply FGSM to the character string CAPTCHAs and to let CNN misclassified them.
DDoS Intrusion Detection Through Machine Learning Ensemble. 2019 IEEE 19th International Conference on Software Quality, Reliability and Security Companion (QRS-C). :471–477.
.
2019. Distributed Denial of Service (DDoS) attacks have been the prominent attacks over the last decade. A Network Intrusion Detection System (NIDS) should seamlessly configure to fight against these attackers' new approaches and patterns of DDoS attack. In this paper, we propose a NIDS which can detect existing as well as new types of DDoS attacks. The key feature of our NIDS is that it combines different classifiers using ensemble models, with the idea that each classifier can target specific aspects/types of intrusions, and in doing so provides a more robust defense mechanism against new intrusions. Further, we perform a detailed analysis of DDoS attacks, and based on this domain-knowledge verify the reduced feature set [27, 28] to significantly improve accuracy. We experiment with and analyze NSL-KDD dataset with reduced feature set and our proposed NIDS can detect 99.1% of DDoS attacks successfully. We compare our results with other existing approaches. Our NIDS approach has the learning capability to keep up with new and emerging DDoS attack patterns.
A deep learning approach to trespassing detection using video surveillance data. 2019 IEEE International Conference on Big Data (Big Data). :3535—3544.
.
2019. Railroad trespassing is a dangerous activity with significant security and safety risks. However, regular patrolling of potential trespassing sites is infeasible due to exceedingly high resource demands and personnel costs. This raises the need to design automated trespass detection and early warning prediction techniques leveraging state-of-the-art machine learning. To meet this need, we propose a novel framework for Automated Railroad Trespassing detection System using video surveillance data called ARTS. As the core of our solution, we adopt a CNN-based deep learning architecture capable of video processing. However, these deep learning-based methods, while effective, are known to be computationally expensive and time consuming, especially when applied to a large volume of surveillance data. Leveraging the sparsity of railroad trespassing activity, ARTS corresponds to a dual-stage deep learning architecture composed of an inexpensive pre-filtering stage for activity detection, followed by a high fidelity trespass classification stage employing deep neural network. The resulting dual-stage ARTS architecture represents a flexible solution capable of trading-off accuracy with computational time. We demonstrate the efficacy of our approach on public domain surveillance data achieving 0.87 f1 score while keeping up with the enormous video volume, achieving a practical time and accuracy trade-off.
Deepfake: A Survey on Facial Forgery Technique Using Generative Adversarial Network. 2019 International Conference on Intelligent Computing and Control Systems (ICCS). :852—857.
.
2019. "Deepfake" it is an incipiently emerging face video forgery technique predicated on AI technology which is used for creating the fake video. It takes images and video as source and it coalesces these to make a new video using the generative adversarial network and the output is very convincing. This technique is utilized for generating the unauthentic spurious video and it is capable of making it possible to generate an unauthentic spurious video of authentic people verbally expressing and doing things that they never did by swapping the face of the person in the video. Deepfake can create disputes in countries by influencing their election process by defaming the character of the politician. This technique is now being used for character defamation of celebrities and high-profile politician just by swapping the face with someone else. If it is utilized in unethical ways, this could lead to a serious problem. Someone can use this technique for taking revenge from the person by swapping face in video and then posting it to a social media platform. In this paper, working of Deepfake technique along with how it can swap faces with maximum precision in the video has been presented. Further explained are the different ways through which we can identify if the video is generated by Deepfake and its advantages and drawback have been listed.
Detecting Android Security Vulnerabilities Using Machine Learning and System Calls Analysis. 2019 IEEE 19th International Conference on Software Quality, Reliability and Security Companion (QRS-C). :109–113.
.
2019. Android operating systems have become a prime target for cyber attackers due to security vulnerabilities in the underlying operating system and application design. Recently, anomaly detection techniques are widely studied for security vulnerabilities detection and classification. However, the ability of the attackers to create new variants of existing malware using various masking techniques makes it harder to deploy these techniques effectively. In this research, we present a robust and effective vulnerabilities detection approach based on anomaly detection in a system calls of benign and malicious Android application. The anomaly in our study is type, frequency, and sequence of system calls that represent a vulnerability. Our system monitors the processes of benign and malicious application and detects security vulnerabilities based on the combination of parameters and metrics, i.e., type, frequency and sequence of system calls to classify the process behavior as benign or malign. The detection algorithm detects the anomaly based on the defined scoring function f and threshold ρ. The system refines the detection process by applying machine learning techniques to find a combination of system call metrics and explore the relationship between security bugs and the pattern of system calls detected. The experiment results show the detection rate of the proposed algorithm based on precision, recall, and f-score for different machine learning algorithms.
Detecting Exploit Websites Using Browser-based Predictive Analytics. 2019 17th International Conference on Privacy, Security and Trust (PST). :1—3.
.
2019. The popularity of Web-based computing has given increase to browser-based cyberattacks. These cyberattacks use websites that exploit various web browser vulnerabilities. To help regular users avoid exploit websites and engage in safe online activities, we propose a methodology of building a machine learning-powered predictive analytical model that will measure the risk of attacks and privacy breaches associated with visiting different websites and performing online activities using web browsers. The model will learn risk levels from historical data and metadata scraped from web browsers.
Detecting SQL Injection Attacks Using Grammar Pattern Recognition and Access Behavior Mining. 2019 IEEE International Conference on Energy Internet (ICEI). :493–498.
.
2019. SQL injection attacks are a kind of the greatest security risks on Web applications. Much research has been done to detect SQL injection attacks by rule matching and syntax tree. However, due to the complexity and variety of SQL injection vulnerabilities, these approaches fail to detect unknown and variable SQL injection attacks. In this paper, we propose a model, ATTAR, to detect SQL injection attacks using grammar pattern recognition and access behavior mining. The most important idea of our model is to extract and analyze features of SQL injection attacks in Web access logs. To achieve this goal, we first extract and customize Web access log fields from Web applications. Then we design a grammar pattern recognizer and an access behavior miner to obtain the grammatical and behavioral features of SQL injection attacks, respectively. Finally, based on two feature sets, machine learning algorithms, e.g., Naive Bayesian, SVM, ID3, Random Forest, and K-means, are used to train and detect our model. We evaluated our model on these two feature sets, and the results show that the proposed model can effectively detect SQL injection attacks with lower false negative rate and false positive rate. In addition, comparing the accuracy of our model based on different algorithms, ID3 and Random Forest have a better ability to detect various kinds of SQL injection attacks.
Efficiently Stealing your Machine Learning Models. Proceedings of the 18th ACM Workshop on Privacy in the Electronic Society. :198–210.
.
2019. Machine Learning as a Service (MLaaS) is a growing paradigm in the Machine Learning (ML) landscape. More and more ML models are being uploaded to the cloud and made accessible from all over the world. Creating good ML models, however, can be expensive and the used data is often sensitive. Recently, Secure Multi-Party Computation (SMPC) protocols for MLaaS have been proposed, which protect sensitive user data and ML models at the expense of substantially higher computation and communication than plaintext evaluation. In this paper, we show that for a subset of ML models used in MLaaS, namely Support Vector Machines (SVMs) and Support Vector Regression Machines (SVRs) which have found many applications to classifying multimedia data such as texts and images, it is possible for adversaries to passively extract the private models even if they are protected by SMPC, using known and newly devised model extraction attacks. We show that our attacks are not only theoretically possible but also practically feasible and cheap, which makes them lucrative to financially motivated attackers such as competitors or customers. We perform model extraction attacks on the homomorphic encryption-based protocol for privacy-preserving SVR-based indoor localization by Zhang et al. (International Workshop on Security 2016). We show that it is possible to extract a highly accurate model using only 854 queries with the estimated cost of \$0.09 on the Amazon ML platform, and our attack would take only 7 minutes over the Internet. Also, we perform our model extraction attacks on SVM and SVR models trained on publicly available state-of-the-art ML datasets.
An Ensemble Approach for Suspicious Traffic Detection from High Recall Network Alerts. {2019 IEEE International Conference on Big Data (Big Data. :4299—4308}}@inproceedings{wu_ensemble_2019.
.
2019. Web services from large-scale systems are prevalent all over the world. However, these systems are naturally vulnerable and incline to be intruded by adversaries for illegal benefits. To detect anomalous events, previous works focus on inspecting raw system logs by identifying the outliers in workflows or relying on machine learning methods. Though those works successfully identify the anomalies, their models use large training set and process whole system logs. To reduce the quantity of logs that need to be processed, high recall suspicious network alert systems can be applied to preprocess system logs. Only the logs that trigger alerts are retrieved for further usage. Due to the universally usage of network traffic alerts among Security Operations Center, anomalies detection problems could be transformed to classify truly suspicious network traffic alerts from false alerts.In this work, we propose an ensemble model to distinguish truly suspicious alerts from false alerts. Our model consists of two sub-models with different feature extraction strategies to ensure the diversity and generalization. We use decision tree based boosters and deep neural networks to build ensemble models for classification. Finally, we evaluate our approach on suspicious network alerts dataset provided by 2019 IEEE BigData Cup: Suspicious Network Event Recognition. Under the metric of AUC scores, our model achieves 0.9068 on the whole testing set.
A Feasibility Study on Machine Learning Techniques for APT Detection and Protection in VANETs. 2019 IEEE 12th International Conference on Global Security, Safety and Sustainability (ICGS3). :212—212.
.
2019. It is estimated that by 2030, 1 in 4 vehicles on the road will be driverless with adoption rates increasing this figure substantially over the next few decades.
Federated Learning with Bayesian Differential Privacy. 2019 IEEE International Conference on Big Data (Big Data). :2587–2596.
.
2019. We consider the problem of reinforcing federated learning with formal privacy guarantees. We propose to employ Bayesian differential privacy, a relaxation of differential privacy for similarly distributed data, to provide sharper privacy loss bounds. We adapt the Bayesian privacy accounting method to the federated setting and suggest multiple improvements for more efficient privacy budgeting at different levels. Our experiments show significant advantage over the state-of-the-art differential privacy bounds for federated learning on image classification tasks, including a medical application, bringing the privacy budget below ε = 1 at the client level, and below ε = 0.1 at the instance level. Lower amounts of noise also benefit the model accuracy and reduce the number of communication rounds.
Fidelity: Towards Measuring the Trustworthiness of Neural Network Classification. 2019 IEEE Conference on Dependable and Secure Computing (DSC). :1–8.
.
2019. With the increasing performance of neural networks on many security-critical tasks, the security concerns of machine learning have become increasingly prominent. Recent studies have shown that neural networks are vulnerable to adversarial examples: carefully crafted inputs with negligible perturbations on legitimate samples could mislead a neural network to produce adversary-selected outputs while humans can still correctly classify them. Therefore, we need an additional measurement on the trustworthiness of the results of a machine learning model, especially in adversarial settings. In this paper, we analyse the root cause of adversarial examples, and propose a new property, namely fidelity, of machine learning models to describe the gap between what a model learns and the ground truth learned by humans. One of its benefits is detecting adversarial attacks. We formally define fidelity, and propose a novel approach to quantify it. We evaluate the quantification of fidelity in adversarial settings on two neural networks. The study shows that involving the fidelity enables a neural network system to detect adversarial examples with true positive rate 97.7%, and false positive rate 1.67% on a studied neural network.
Generating Real Time Cyber Situational Awareness Information Through Social Media Data Mining. 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC). 2:502–507.
.
2019. With the rise of the internet many new data sources have emerged that can be used to help us gain insights into the cyber threat landscape and can allow us to better prepare for cyber attacks before they happen. With this in mind, we present an end to end real time cyber situational awareness system which aims to efficiently retrieve security relevant information from the social networking site Twitter.com. This system classifies and aggregates the data retrieved and provides real time cyber situational awareness information based on sentiment analysis and data analytics techniques. This research will assist security analysts to evaluate the level of cyber risk in their organization and proactively take actions to plan and prepare for potential attacks before they happen as well as contribute to the field through a cybersecurity tweet dataset.
Hardware Trojan Detection Combine with Machine Learning: an SVM-based Detection Approach. 2019 IEEE 13th International Conference on Anti-counterfeiting, Security, and Identification (ASID). :202–206.
.
2019. With the application of integrated circuits (ICs) appears in all aspects of life, whether an IC is security and reliable has caused increasing worry which is of significant necessity. An attacker can achieve the malicious purpose by adding or removing some modules, so called hardware Trojans (HTs). In this paper, we use side-channel analysis (SCA) and support vector machine (SVM) classifier to determine whether there is a Trojan in the circuit. We use SAKURA-G circuit board with Xilinx SPARTAN-6 to complete our experiment. Results show that the Trojan detection rate is up to 93% and the classification accuracy is up to 91.8475%.
A Hybrid Density-Based Outlier Detection Model for Privacy in Electronic Patient Record system. 2019 5th International Conference on Information Management (ICIM). :92–96.
.
2019. This research concerns the detection of unauthorised access within hospital networks through the real-time analysis of audit logs. Privacy is a primary concern amongst patients due to the rising adoption of Electronic Patient Record (EPR) systems. There is growing evidence to suggest that patients may withhold information from healthcare providers due to lack of Trust in the security of EPRs. Yet, patient record data must be available to healthcare providers at the point of care. Ensuring privacy and confidentiality of that data is challenging. Roles within healthcare organisations are dynamic and relying on access control is not sufficient. Through proactive monitoring of audit logs, unauthorised accesses can be detected and presented to an analyst for review. Advanced data analytics and visualisation techniques can be used to aid the analysis of big data within EPR audit logs to identify and highlight pertinent data points. Employing a human-in-the-loop model ensures that suspicious activity is appropriately investigated and the data analytics is continuously improving. This paper presents a system that employs a Human-in-the-Loop Machine Learning (HILML) algorithm, in addition to a density-based local outlier detection model. The system is able to detect 145 anomalous behaviours in an unlabelled dataset of 1,007,727 audit logs. This equates to 0.014% of the EPR accesses being labelled as anomalous in a specialist Liverpool (UK) hospital.
Hybrid Model for Web Application Vulnerability Assessment Using Decision Tree and Bayesian Belief Network. 2019 IEEE International Conference on System, Computation, Automation and Networking (ICSCAN). :1–7.
.
2019. In the existing situation, most of the business process are running through web applications. This helps the enterprises to grow their business efficiently which creates a good consumer relationship. But the main problem is that they failed to provide a vulnerable free environment. To overcome this issue in web applications, vulnerability assessment should be made periodically. They are many vulnerability assessment methodologies which occur earlier are not much proactive. So, machine learning is needed to provide a combined solution to determine vulnerability occurrence and percentage of vulnerability occurred in logical web pages. We use Decision Tree and Bayesian Belief Network (BBN) as a collective solution to find either vulnerability occur in web applications and the vulnerability occurred percentage on different logical web pages.
Identification of Darknet Markets’ Bitcoin Addresses by Voting Per-address Classification Results. 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC). :154—158.
.
2019. Bitcoin is a decentralized digital currency whose transactions are recorded in a common ledger, so called blockchain. Due to the anonymity and lack of law enforcement, Bitcoin has been misused in darknet markets which deal with illegal products, such as drugs and weapons. Therefore from the security forensics aspect, it is demanded to establish an approach to identify newly emerged darknet markets' transactions and addresses. In this paper, we thoroughly analyze Bitcoin transactions and addresses related to darknet markets and propose a novel identification method of darknet markets' addresses. To improve the identification performance, we propose a voting based method which decides the labels of multiple addresses controlled by the same user based on the number of the majority label. Through the computer simulation with more than 200K Bitcoin addresses, it was shown that our voting based method outperforms the nonvoting based one in terms of precision, recal, and F1 score. We also found that DNM's addresses pay higher fees than others, which significantly improves the classification.
Insights into Malware Detection via Behavioral Frequency Analysis Using Machine Learning. MILCOM 2019 - 2019 IEEE Military Communications Conference (MILCOM). :1–6.
.
2019. The most common defenses against malware threats involves the use of signatures derived from instances of known malware. However, the constant evolution of the malware threat landscape necessitates defense against unknown malware, making a signature catalog of known threats insufficient to prevent zero-day vulnerabilities from being exploited. Recent research has applied machine learning approaches to identify malware through artifacts of malicious activity as observed through dynamic behavioral analysis. We have seen that these approaches mimic common malware defenses by simply offering a method of detecting known malware. We contribute a new method of identifying software as malicious or benign through analysis of the frequency of Windows API system function calls. We show that this is a powerful technique for malware detection because it generates learning models which understand the difference between malicious and benign software, rather than producing a malware signature classifier. We contribute a method of systematically comparing machine learning models against different datasets to determine their efficacy in accurately distinguishing the difference between malicious and benign software.