Bibliography
Our operational context is a task-oriented dialog system where no single module satisfactorily addresses the full range of conversational queries from humans. Such systems must be equipped with a range of technologies to handle semantic, factual, task-oriented, and open-domain conversations using rule-based, semantic-web, traditional machine learning, and deep learning techniques. This raises two key challenges. First, the modules need to be managed and selected appropriately. Second, the complexity of troubleshooting such systems is high. We address these challenges with a mixed-initiative model that controls conversational logic through hierarchical classification. We also developed an interface that increases interpretability for operators and aggregates module performance.
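A minimal sketch of what hierarchical classification for module routing might look like, assuming a two-stage design (the abstract does not give details): a top-level classifier picks a coarse dialog domain, then a per-domain classifier picks the concrete module. All module and domain names are hypothetical.

```python
# Hedged sketch of hierarchical module routing (not the authors' code).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_utts = ["book a table for two", "who wrote hamlet", "tell me a joke"]
domains    = ["task", "factual", "open_domain"]

# stage 1: coarse domain classifier
domain_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
domain_clf.fit(train_utts, domains)

# stage 2: one classifier per domain mapping utterances to modules (hypothetical names)
module_clf = {
    "task": make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(
        ["book a table for two", "set an alarm at 7"],
        ["restaurant_module", "alarm_module"]),
}

def route(utterance):
    domain = domain_clf.predict([utterance])[0]
    clf = module_clf.get(domain)
    return clf.predict([utterance])[0] if clf else domain

print(route("reserve a table tonight"))
```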
Recent developments in robotics and virtual reality (VR) are making embodied agents familiar, and the social behaviors of embodied conversational agents are essential to creating mindful daily lives with conversational agents. In particular, natural nonverbal behaviors such as gaze and gesture movement are required. We propose a novel method to create an agent with human-like gaze as a listener in multi-party conversation, using a Hidden Markov Model (HMM) to learn the behavior from real conversation examples. The model can generate gaze reactions according to users' gaze and utterances. We implemented an agent with the proposed method and created a VR environment in which to interact with the agent. The proposed agent reproduced several features of the gaze behavior in the example conversations. The results of an impression survey showed that at least one group of participants felt that the proposed agent was similar to a human and better than conventional methods.
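A minimal sketch of the HMM idea, assuming gaze is represented as continuous direction features (the paper's exact feature set and state count are not given here): fit a Gaussian HMM on example listener gaze sequences, then sample a new trajectory for the agent. Requires the hmmlearn package.

```python
# Minimal sketch, not the paper's model: learn a Gaussian HMM over 2-D gaze
# direction features and sample a synthetic gaze trajectory.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
# stand-in for recorded listener gaze (yaw, pitch) sequences
example = np.concatenate([rng.normal(0.0, 0.05, (200, 2)),   # look at speaker
                          rng.normal(0.6, 0.05, (100, 2))])  # look away

model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=50)
model.fit(example, lengths=[len(example)])

generated, states = model.sample(120)   # synthetic gaze trajectory + hidden states
print(generated[:5], states[:5])
```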
We propose a method for transferring an arbitrary style to only a specific object in an image. Style transfer is the process of combining the content of one image and the style of another into a new image. Our results show that the proposed method can realize style transfer restricted to a specific object.
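One simple way to restrict a style to one object, shown as a hedged sketch (the abstract does not specify the mechanism): blend a fully stylized image and the original content per pixel using a segmentation mask of the target object. How the stylized image and mask are produced (e.g., a neural style model and a segmenter) is assumed.

```python
import numpy as np

content  = np.random.rand(256, 256, 3)            # original image (stand-in)
stylized = np.random.rand(256, 256, 3)            # fully stylized version (stand-in)
mask     = np.zeros((256, 256, 1))
mask[64:192, 64:192] = 1.0                        # soft/hard mask of the target object

result = mask * stylized + (1.0 - mask) * content  # style applied only on the object
```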
Machine learning (ML) models, e.g., deep neural networks (DNNs), are vulnerable to adversarial examples: malicious inputs modified to yield erroneous model outputs, while appearing unmodified to human observers. Potential attacks include having malicious content like malware identified as legitimate or controlling vehicle behavior. Yet, all existing adversarial example attacks require knowledge of either the model internals or its training data. We introduce the first practical demonstration of an attacker controlling a remotely hosted DNN with no such knowledge. Indeed, the only capability of our black-box adversary is to observe labels given by the DNN to chosen inputs. Our attack strategy consists of training a local model to substitute for the target DNN, using inputs synthetically generated by an adversary and labeled by the target DNN. We use the local substitute to craft adversarial examples, and find that they are misclassified by the targeted DNN. To perform a real-world and properly-blinded evaluation, we attack a DNN hosted by MetaMind, an online deep learning API. We find that their DNN misclassifies 84.24% of the adversarial examples crafted with our substitute. We demonstrate the general applicability of our strategy to many ML techniques by conducting the same attack against models hosted by Amazon and Google, using logistic regression substitutes. They yield adversarial examples misclassified by Amazon and Google at rates of 96.19% and 88.94%, respectively. We also find that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial example crafting harder.
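A hedged toy version of the substitute-model idea (not the MetaMind setup): label adversary-chosen inputs with the victim's predictions, train a local logistic regression substitute, then take a fast-gradient-sign step against the substitute and hope the perturbation transfers. `victim_predict` is a stand-in for the remote, labels-only API.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
true_w = rng.normal(size=20)
def victim_predict(X):                     # black-box oracle: returns labels only
    return (X @ true_w > 0).astype(int)

X_syn = rng.normal(size=(500, 20))         # adversary-chosen synthetic queries
sub = LogisticRegression().fit(X_syn, victim_predict(X_syn))

x = rng.normal(size=(1, 20))
# gradient sign of the substitute's loss w.r.t. x: push away from the observed label
grad = sub.coef_[0] * (1 if victim_predict(x)[0] == 0 else -1)
x_adv = x + 0.5 * np.sign(grad)            # FGSM step against the substitute
print(victim_predict(x), victim_predict(x_adv))  # the victim's label may flip
```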
At present, mobile terminals are widely used in power systems and can easily become targets of, or springboards for, attacks on the power system. A security assessment of the power mobile terminal system is therefore necessary to enable early warning of potential risks. In this context, this paper builds a security assessment system for power mobile terminals, combining features of security assessment systems for general mobile terminals with the specifics of power application scenarios. Compared with existing methods, this paper introduces machine learning into the Rank Correlation Analysis method, which otherwise relies on expert experience, and uses objective experimental data to optimize the weight parameters of the indicators. Experiments show that the weight self-learning method can be used to evaluate the security of the power mobile terminal system and improves the credibility of the result.
In this paper, we report our work on using machine learning techniques to predict back-bending activity based on field data acquired in a local nursing home. The data are recorded by a privacy-aware compliance tracking system (PACTS). The objective of PACTS is to detect back-bending activities and issue real-time alerts to participants when they bend their backs excessively, which we hope will help participants form good habits of using proper body mechanics when performing lifting/pulling tasks. We show that our algorithms can differentiate nursing staff's baseline and high-level bending activities by using human skeleton data without any expert rules.
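Purely illustrative (the PACTS features are not public): one simple skeleton-derived signal is the torso inclination angle between two joints, which can be thresholded to separate baseline from high-level bending. Joint names and the 45-degree threshold are assumptions.

```python
import numpy as np

def bend_angle(neck_xyz, pelvis_xyz):
    # angle between the torso vector and the vertical axis, in degrees
    torso = np.asarray(neck_xyz) - np.asarray(pelvis_xyz)
    vertical = np.array([0.0, 1.0, 0.0])
    cos = torso @ vertical / np.linalg.norm(torso)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

frame = {"neck": (0.1, 1.4, 0.3), "pelvis": (0.0, 0.9, 0.0)}  # hypothetical joints
angle = bend_angle(frame["neck"], frame["pelvis"])
print(angle, "high bend" if angle > 45 else "baseline")
```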
With the extensive and growing application of cloud computing technology, security is of paramount importance in cloud computing. Several surveys have covered intrusion detection techniques in the cloud computing environment. We summarize literature surveys of various attack taxonomies covering threats in the cloud environment, such as attacks on virtual machines, attacks on the virtual machine monitor, and attacks in the tenant network. We also review the many existing solutions proposed in the literature, such as misuse detection techniques, behavior analysis of network traffic, behavior analysis of programs, and virtual machine introspection (VMI) techniques. In addition, we summarize some innovations in the field of cloud security, such as CloudVMI, data mining techniques, artificial intelligence, and blockchain technology. Finally, our team designed and implemented a prototype system called CloudI (Cloud Introspection), which is characterized by high security, high performance, high expandability, and multiple functions.
We regularly use communication apps like Facebook and WhatsApp on our smartphones, and the exchange of media, particularly images, has grown at an exponential rate; over 3 billion images are shared every day on WhatsApp alone. In such a scenario, the management of images on a mobile device has become highly inefficient, leading to problems like low storage, manual deletion of images, and disorganization. In this paper, we present a solution that tackles these issues by automatically classifying every image on a smartphone into a set of predefined categories, thereby segregating spam images and allowing the user to delete them seamlessly.
Traffic classification, i.e., associating network traffic with the application that generated it, is an important tool for several tasks spanning different fields (security, management, traffic engineering, R&D). This process is challenged by applications that preserve Internet users' privacy by encrypting the communication content, and even more by anonymity tools, which additionally hide the source, the destination, and the nature of the communication. In this paper, leveraging a public dataset released in 2017, we provide (repeatable) classification results with the aim of investigating to what degree the specific anonymity tool (and the traffic it hides) can be identified, when compared to the traffic of the other considered anonymity tools, using machine learning approaches based solely on statistical features. To this end, four classifiers are trained and tested on the dataset: (i) Naïve Bayes, (ii) Bayesian Network, (iii) C4.5, and (iv) Random Forest. Results show that the three considered anonymity networks (Tor, I2P, JonDonym) can be easily distinguished (with an accuracy of 99.99%), and even the specific application generating the traffic can be identified (with an accuracy of 98.00%).
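A sketch of the evaluation setup using scikit-learn stand-ins: GaussianNB for Naïve Bayes, DecisionTreeClassifier as a C4.5-like learner, and RandomForestClassifier; the Bayesian Network classifier has no direct scikit-learn equivalent and is omitted. The features below are synthetic per-flow statistics, not the paper's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(900, 6))              # per-flow statistical features (stand-in)
y = rng.integers(0, 3, size=900)           # Tor / I2P / JonDonym

for clf in (GaussianNB(), DecisionTreeClassifier(), RandomForestClassifier()):
    print(type(clf).__name__, cross_val_score(clf, X, y, cv=5).mean())
```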
To improve the resilience of state estimation against cyber attacks, Compressive Sensing (CS) is applied to the reconstruction of incomplete measurements in cyber-physical systems. First, observability analysis is used to decide when to run the reconstruction and to gauge the damage level of the attack. In particular, dictionary learning is proposed to form an over-complete dictionary via K-Singular Value Decomposition (K-SVD). In addition, owing to the irregularity of the incomplete measurements, a sampling matrix is designed to serve as the measurement matrix. Finally, simulation experiments on a 6-bus power system illustrate that the proposed method reconstructs the incomplete measurements accurately, outperforming the joint dictionary. Even when only 29% of the measurements remain available, the proposed method generalizes across four kinds of recovery algorithms.
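A conceptual sketch of the reconstruction step: the available measurements are picked out by a row-selection (sampling) matrix Phi, a sparse code is recovered over the dictionary D with Orthogonal Matching Pursuit (one of several possible recovery algorithms), and the full measurement vector is re-synthesized as D @ code. D is random here; the paper learns it with K-SVD.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n, k = 60, 120                                   # signal size, dictionary atoms
D = rng.normal(size=(n, k)); D /= np.linalg.norm(D, axis=0)
code_true = np.zeros(k)
code_true[rng.choice(k, 5, replace=False)] = rng.normal(size=5)
x_full = D @ code_true                           # complete measurement vector

keep = rng.choice(n, int(0.29 * n), replace=False)   # 29% available, as in the paper
Phi = np.eye(n)[keep]                            # sampling matrix
y = Phi @ x_full                                 # observed incomplete measurements

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5, fit_intercept=False).fit(Phi @ D, y)
x_hat = D @ omp.coef_                            # reconstructed full vector
print(np.linalg.norm(x_hat - x_full) / np.linalg.norm(x_full))
```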
Cyber-physical systems are found in industrial and production systems, as well as in critical infrastructures. Due to the increasing integration of IP-based technology and standard computing devices, the threat of cyber-attacks on cyber-physical systems has vastly increased. Furthermore, traditional intrusion defense strategies for IT systems are often not applicable in operational environments. In this paper we present an anomaly-based approach for the detection and classification of attacks in cyber-physical systems. To test our approach, we set up a test environment with sensors, actuators, and controllers widely used in industry, thus providing system data as close to reality as possible. First, anomaly detection is used to define a model of normal system behavior by calculating outlier scores from normal system operations. This valid behavior model is then compared with new data in order to detect anomalies. Further, we trained an attack model, based on supervised attacks against the test setup, using a naive Bayes classifier. If an anomaly is detected, the classification process tries to classify it by applying the attack model and calculating prediction confidences for the trained classes. To evaluate the statistical performance of our approach, we tested the model on an unlabeled dataset containing both valid and anomalous data. The results show that this approach is able to detect and classify such attacks with satisfactory accuracy.
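A schematic two-stage pipeline in the spirit of the abstract: an outlier model learned on normal operation flags anomalies, and a naive Bayes model trained on supervised attack traces assigns a class with a prediction confidence. The data and the choice of IsolationForest as the outlier scorer are assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
normal  = rng.normal(0, 1, (500, 4))            # sensor/actuator readings, normal ops
attacks = rng.normal(4, 1, (60, 4))             # supervised attack traces
attack_labels = rng.integers(0, 2, 60)          # two trained attack classes

detector = IsolationForest(random_state=0).fit(normal)   # model of valid behavior
classifier = GaussianNB().fit(attacks, attack_labels)    # attack model

new = rng.normal(4, 1, (1, 4))
if detector.predict(new)[0] == -1:              # -1 means outlier / anomaly
    conf = classifier.predict_proba(new)[0]
    print("anomaly, class", classifier.predict(new)[0], "confidence", conf.max())
```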
The explosive proliferation of Internet of Things (IoT) devices is generating an incomprehensible amount of data. Machine learning plays an imperative role in aggregating this data and extracting valuable information for improving operational and decision-making processes. In particular, emerging machine intelligence platforms that host pre-trained machine learning models are opening up new opportunities for IoT industries. While these platforms enable customers to analyze IoT data and deliver faster, more accurate insights, end users and machine learning service providers (MLSPs) have raised concerns regarding the security and privacy of IoT data, as well as of the pre-trained machine learning models, for applications such as healthcare and smart energy. In this paper, we propose a cloud-assisted, privacy-preserving machine learning classification scheme over encrypted data for IoT devices. Our scheme is based on a three-party model coupled with a two-stage-decryption Paillier-based cryptosystem, which allows a cloud server to interact with MLSPs on behalf of resource-constrained IoT devices in a privacy-preserving manner and shifts the load of computation-intensive classification operations away from them. The detailed security analysis and extensive simulations with different key lengths and numbers of features and classes demonstrate that our scheme can effectively reduce the overhead for IoT devices in machine learning classification applications.
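A simplified single-key sketch of Paillier-based private scoring (the paper uses a two-stage-decryption, three-party variant): the device encrypts its features, the server evaluates a linear score on ciphertexts using the additive homomorphism, and only the key holder can decrypt. Requires the `phe` package (python-paillier).

```python
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

features = [0.7, 1.2, -0.3]                     # IoT device's private input
enc = [public_key.encrypt(v) for v in features]

weights, bias = [0.5, -1.0, 2.0], 0.1           # MLSP's plaintext linear model
# scalar-times-ciphertext and ciphertext addition are homomorphic in Paillier
enc_score = sum(w * c for w, c in zip(weights, enc)) + bias

print(private_key.decrypt(enc_score))           # 0.7*0.5 - 1.2 - 0.6 + 0.1
```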
Stealthy attackers often disable or tamper with system monitors to hide their tracks and evade detection. In this poster, we present a data-driven technique to detect such monitor compromise using evidential reasoning. Leveraging the fact that hiding from multiple, redundant monitors is difficult for an attacker, we combine alerts from different sets of monitors using Dempster-Shafer theory and compare the results to find outliers, thereby identifying potential monitor compromise. We describe our ongoing work in this area.
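A small worked example of the core operation: Dempster's rule of combination for two monitors' mass functions over the hypotheses {compromised, clean}. The mass values are illustrative.

```python
def combine(m1, m2):
    # Dempster's rule: multiply masses of intersecting hypotheses,
    # discard conflicting pairs, and renormalize.
    conflict, out = 0.0, {}
    for a, pa in m1.items():
        for b, pb in m2.items():
            inter = a & b
            if inter:
                out[inter] = out.get(inter, 0.0) + pa * pb
            else:
                conflict += pa * pb
    return {k: v / (1.0 - conflict) for k, v in out.items()}

C, K = frozenset({"compromised"}), frozenset({"clean"})
theta = C | K                                      # total ignorance
monitor1 = {C: 0.6, K: 0.1, theta: 0.3}
monitor2 = {C: 0.5, K: 0.2, theta: 0.3}
print(combine(monitor1, monitor2))                 # fused belief masses
```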
Poison message failure is a mechanism that has been responsible for large-scale failures in both telecommunications and IP networks. A poison message failure can propagate through the network and destabilize it. We apply machine learning and data mining techniques to network fault management. We use the k-nearest neighbor method to identify the poison message failure. We also propose a "probabilistic" k-nearest neighbor method that outputs a probability distribution over candidate poison messages. Through extensive simulations, we show that the k-nearest neighbor method is very effective in identifying the responsible message type.
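A sketch of the two variants on synthetic data: plain k-NN picks the single most likely responsible message type, while the "probabilistic" variant corresponds to the neighbor vote distribution that scikit-learn's predict_proba exposes. Features and types are stand-ins.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))             # per-failure features from the network
y = rng.integers(0, 4, size=300)          # candidate poison message types

knn = KNeighborsClassifier(n_neighbors=7).fit(X, y)
query = rng.normal(size=(1, 5))
print(knn.predict(query))                 # single most likely message type
print(knn.predict_proba(query))           # probability distribution over types
```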
Machine learning is being used in a wide range of application domains to discover patterns in large datasets. Increasingly, the results of machine learning drive critical decisions in applications related to healthcare and biomedicine. Such health-related applications are often sensitive, so any security breach would be catastrophic. Naturally, the integrity of the results computed by machine learning is of great importance. Recent research has shown that some machine-learning algorithms can be compromised by augmenting their training datasets with malicious data, leading to a new class of attacks called poisoning attacks. Hindering a correct diagnosis may have life-threatening consequences, while a false positive classification may cause patient distress; either kind of error may prompt users to distrust the machine-learning algorithm and even abandon the entire system. In this paper, we present a systematic, algorithm-independent approach for mounting poisoning attacks across a wide range of machine-learning algorithms and healthcare datasets. The proposed attack procedure generates input data which, when added to the training set, can either cause the results of machine learning to have targeted errors (e.g., increasing the likelihood of classification into a specific class) or simply introduce arbitrary errors (incorrect classification). These attacks may be applied to both fixed and evolving datasets. They can be applied even when only statistics of the training dataset are available or, in some cases, even without access to the training dataset, although at lower efficacy. We establish the effectiveness of the proposed attacks using a suite of six machine-learning algorithms and five healthcare datasets. Finally, we present countermeasures against the proposed generic attacks, based on tracking and detecting deviations in various accuracy metrics, and benchmark their effectiveness.
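A generic illustration of the attack class (not the paper's procedure): a label-flipped poison set is appended to the training data, and the accuracy of a model trained on the poisoned set drops relative to the clean one.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

clean = LogisticRegression().fit(Xtr, ytr)

rng = np.random.default_rng(0)
idx = rng.choice(len(Xtr), 60, replace=False)       # 60 poison points
Xp = np.vstack([Xtr, Xtr[idx]])
yp = np.concatenate([ytr, 1 - ytr[idx]])            # same inputs, flipped labels

poisoned = LogisticRegression().fit(Xp, yp)
print(clean.score(Xte, yte), poisoned.score(Xte, yte))  # accuracy before/after
```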
With a large number of sensors and control units in networked systems, distributed support vector machines (DSVMs) play a fundamental role in scalable and efficient multi-sensor classification and prediction tasks. However, DSVMs are vulnerable to adversaries who can modify and generate data to deceive the system into misclassification and misprediction. This work aims to design defense strategies for the DSVM learner against a potential adversary. We use a game-theoretic framework to capture the conflicting interests of the DSVM learner and the attacker. The Nash equilibrium of the game allows us to predict the outcome of learning algorithms in adversarial environments and to enhance the resilience of the machine learning through dynamic distributed algorithms. We develop a secure and resilient DSVM algorithm with a rejection method and show its resilience against adversaries with numerical experiments.
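A minimal sketch of the rejection idea in a plain (non-distributed, non-game-theoretic) SVM: samples whose decision value falls inside a margin band are rejected rather than classified, which blunts an attacker who nudges points across the boundary. The band width is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=400, n_features=6, random_state=1)
svm = LinearSVC().fit(X, y)

scores = svm.decision_function(X[:10])
band = 0.3                                # assumed rejection band
decisions = np.where(np.abs(scores) < band, -1, (scores > 0).astype(int))
print(decisions)                          # -1 marks rejected samples
```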
Malware classification is a critical part of cyber-security. Traditional methodologies for malware classification typically use static analysis and dynamic analysis to identify malware. In this paper, a malware classification methodology based on a binary's image representation and extracted local binary pattern (LBP) features is proposed. First, malware images are reorganized into 3-by-3 grids, which are mainly used to extract the LBP feature. Second, LBP is applied to the malware images to extract features, as it is useful for pattern and texture classification. Finally, TensorFlow, a library for machine learning, is applied to classify the malware images with the LBP features. Performance comparisons among different classifiers with different image descriptors, such as GIST (a spatial envelope) and LBP, demonstrate that our proposed approach outperforms the others.
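A sketch of the feature step: a malware binary is viewed as a grayscale image, uniform LBP codes are computed, and their histogram becomes the feature vector for a downstream classifier. The 3-by-3 grid pooling and the TensorFlow classifier from the paper are omitted; the byte source, image width, and the P, R parameters are assumptions. Requires scikit-image.

```python
import numpy as np
from skimage.feature import local_binary_pattern

# stand-in for the bytes of a malware binary
raw = np.random.default_rng(0).integers(0, 256, 65536, dtype=np.uint8)
width = 256
img = raw[: len(raw) // width * width].reshape(-1, width)   # bytes -> grayscale image

P, R = 8, 1                                          # neighbors, radius (assumed)
lbp = local_binary_pattern(img, P, R, method="uniform")
hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
print(hist)                                          # per-image LBP feature vector
```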
Detecting malicious code by exact matching against collected datasets is becoming a large-scale identification problem due to the constant emergence of new malware variants. Being able to promptly and accurately identify new attacks enables security experts to respond effectively. My proposal is to develop an automated framework for the identification of unknown vulnerabilities by leveraging current neural network techniques. This has significant and immediate value for the security field, as current anti-virus software is typically able to recognize the malware type only after infection, and preventive measures are limited. Artificial intelligence plays a major role in automatic malware classification: numerous machine-learning methods, both supervised and unsupervised, have been investigated to classify malware into families based on features acquired through static and dynamic analysis. The value of automated identification is clear, as feature engineering is both a time-consuming and time-sensitive task, with new malware being studied as it is observed in the wild.
Malicious applications have become increasingly numerous. This demands adaptive, learning-based techniques for constructing malware detection engines, instead of traditional manual strategies. Prior work on learning-based malware detection engines primarily focuses on dynamic trace analysis and byte-level n-grams. Our approach in this paper differs in that we use compiler intermediate representations, i.e., the call-graph representation of binaries. Using graph-based program representations for learning provides structure of the program, which can be used to learn more advanced patterns. We use the Shortest Path Graph Kernel (SPGK) to identify similarities between call graphs extracted from binaries. The resulting similarity matrix is fed into a Support Vector Machine (SVM) algorithm to construct highly accurate models that predict whether a binary is malicious or not. However, SPGK is computationally expensive due to the size of the input graphs. Therefore, we evaluate different parallelization methods for CPUs and GPUs to speed up this kernel, allowing us to continuously construct up-to-date models in a timely manner. Our hybrid implementation, which leverages both CPU and GPU, yields the best performance, achieving up to a 14.2x improvement over our already optimized OpenMP version. We compared our generated graph-based models to previous state-of-the-art feature-vector 2-gram and 3-gram models on a dataset consisting of over 22,000 binaries. We show that our classification accuracy using graphs is over 19% higher than either n-gram model, with a false positive rate (FPR) of less than 0.1%. We are also able to consider large call graphs and dataset sizes because of the reduced execution time of our parallelized SPGK implementation.
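A compact, unlabeled, CPU-only version of the pipeline under stated simplifications: a delta shortest-path kernel counts matching shortest-path lengths between two call graphs, the pairwise similarity matrix is built, and an SVM with a precomputed kernel is trained on it. The paper's parallel SPGK and vertex labels are omitted; graphs and labels are synthetic stand-ins.

```python
from collections import Counter
import networkx as nx
import numpy as np
from sklearn.svm import SVC

def sp_kernel(g1, g2):
    # count pairs of shortest paths with matching lengths (delta kernel)
    def lengths(g):
        return Counter(d for _, dists in nx.all_pairs_shortest_path_length(g)
                         for v, d in dists.items() if d > 0)
    c1, c2 = lengths(g1), lengths(g2)
    return sum(c1[d] * c2[d] for d in c1)

graphs = [nx.gnp_random_graph(12, 0.3, seed=s) for s in range(20)]  # stand-in call graphs
labels = [s % 2 for s in range(20)]                                 # benign / malicious

K = np.array([[sp_kernel(a, b) for b in graphs] for a in graphs], dtype=float)
K /= K.max()                                   # crude scaling for the SVM
svm = SVC(kernel="precomputed").fit(K, labels)
print(svm.predict(K[:3]))                      # rows are kernels vs. training graphs
```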
The application of machine learning for the detection of malicious network traffic has been well researched over the past several decades; it is particularly appealing when the traffic is encrypted because traditional pattern-matching approaches cannot be used. Unfortunately, the promise of machine learning has been slow to materialize in the network security domain. In this paper, we highlight two primary reasons why this is the case: inaccurate ground truth and a highly non-stationary data distribution. To demonstrate and understand the effect that these pitfalls have on popular machine learning algorithms, we design and carry out experiments that show how six common algorithms perform when confronted with real network data. With our experimental results, we identify the situations in which certain classes of algorithms underperform on the task of encrypted malware traffic classification. We offer concrete recommendations for practitioners given the real-world constraints outlined. From an algorithmic perspective, we find that the random forest ensemble method outperformed competing methods. More importantly, feature engineering was decisive; we found that iterating on the initial feature set, and including features suggested by domain experts, had a much greater impact on the performance of the classification system. For example, linear regression using the more expressive feature set easily outperformed the random forest method using a standard network traffic representation on all criteria considered. Our analysis is based on millions of TLS encrypted sessions collected over 12 months from a commercial malware sandbox and two geographically distinct, large enterprise networks.
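A schematic version of the comparison described above: the same algorithms are evaluated on a basic flow representation and on an enriched feature set, since the reported gains came mainly from feature engineering. The data and the "expert" columns are synthetic stand-ins, and LogisticRegression stands in for the abstract's linear model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)                      # benign / malware session
basic = rng.normal(size=(1000, 4))                     # e.g., bytes, packets, duration
expert = y[:, None] + rng.normal(scale=0.8, size=(1000, 6))  # domain-suggested features
enriched = np.hstack([basic, expert])

for name, X in [("basic", basic), ("enriched", enriched)]:
    for clf in (LogisticRegression(), RandomForestClassifier()):
        print(name, type(clf).__name__, cross_val_score(clf, X, y, cv=5).mean())
```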
Testing and fixing Web Application Firewalls (WAFs) are two relevant and complementary challenges for security analysts. Automated testing helps to cost-effectively detect vulnerabilities in a WAF by generating effective test cases, i.e., attacks. Once vulnerabilities have been identified, the WAF needs to be fixed by augmenting its rule set to filter attacks without blocking legitimate requests. However, existing research suggests that rule sets are very difficult to understand and too complex to be manually fixed. In this paper, we formalise the problem of fixing vulnerable WAFs as a combinatorial optimisation problem. To solve it, we propose an automated approach that combines machine learning with multi-objective genetic algorithms. Given a set of legitimate requests and bypassing SQL injection attacks, our approach automatically infers regular expressions that, when added to the WAF's rule set, prevent many attacks while letting legitimate requests go through. Our empirical evaluation based on both open-source and proprietary WAFs shows that the generated filter rules are effective at blocking previously identified and successful SQL injection attacks (recall between 54.6% and 98.3%), while triggering in most cases no or few false positives (false positive rate between 0% and 2%).
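A tiny sketch of the fitness evaluation inside such a search: each candidate regex (here hand-written; in the paper, evolved by a multi-objective genetic algorithm) is scored by recall on bypassing attacks and false-positive rate on legitimate requests. Sample strings and patterns are illustrative.

```python
import re

attacks = ["id=1' OR '1'='1", "name=x'; DROP TABLE users;--"]   # bypassing SQLi
legit   = ["id=42", "name=O'Brien"]                             # must pass through

def fitness(pattern):
    rx = re.compile(pattern)
    recall = sum(bool(rx.search(a)) for a in attacks) / len(attacks)
    fpr = sum(bool(rx.search(l)) for l in legit) / len(legit)
    return recall, fpr            # the GA maximizes recall while minimizing FPR

for candidate in [r"'.*(OR|DROP)", r"'"]:
    print(candidate, fitness(candidate))
```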