Visible to the public Biblio

Found 570 results

Filters: Keyword is Data models  [Clear All Filters]
2020-12-14
Yu, L., Chen, L., Dong, J., Li, M., Liu, L., Zhao, B., Zhang, C..  2020.  Detecting Malicious Web Requests Using an Enhanced TextCNN. 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC). :768–777.
This paper proposes an approach that combines a deep learning-based method and a traditional machine learning-based method to efficiently detect malicious requests Web servers received. The first few layers of Convolutional Neural Network for Text Classification (TextCNN) are used to automatically extract powerful semantic features and in the meantime transferable statistical features are defined to boost the detection ability, specifically Web request parameter tampering. The semantic features from TextCNN and transferable statistical features from artificially-designing are grouped together to be fed into Support Vector Machine (SVM), replacing the last layer of TextCNN for classification. To facilitate the understanding of abstract features in form of numerical data in vectors extracted by TextCNN, this paper designs trace-back functions that map max-pooling outputs back to words in Web requests. After investigating the current available datasets for Web attack detection, HTTP Dataset CSIC 2010 is selected to test and verify the proposed approach. Compared with other deep learning models, the experimental results demonstrate that the approach proposed in this paper is competitive with the state-of-the-art.
Deng, M., Wu, X., Feng, P., Zeng, W..  2020.  Sparse Support Vector Machine for Network Behavior Anomaly Detection. 2020 IEEE 8th International Conference on Information, Communication and Networks (ICICN). :199–204.
Network behavior anomaly detection (NBAD) require fast mechanisms for learning from the large scale data. However, the training velocity of general machine learning approach is largely limited by the adopted training weights of all features in the NBAD. In this paper, we notice, however, that the related weights matching of NBAD features is sparse, which is not necessary for holding all weights. Hence, in this paper, we consider an efficient support vector machine (SVM) approach for NBAD by imposing 1 -norm. Essentially, we propose to use sparse SVM (S-SVM), where sparsity in model, i.e. in weights is used to interfere with special feature selection and that can achieve feature selection and classification efficiently.
Chen, X., Cao, C., Mai, J..  2020.  Network Anomaly Detection Based on Deep Support Vector Data Description. 2020 5th IEEE International Conference on Big Data Analytics (ICBDA). :251–255.
Intrusion detection system based on representation learning is the main research direction in the field of anomaly detection. Malicious traffic detection system can distinguish normal and malicious traffic by learning representations between normal and malicious traffic. However, under the context of big data, there are many types of malicious traffic, and the features are also changing constantly. It is still a urgent problem to design a detection model that can effectively learn and summarize the feature of normal traffic and accurately identify the features of new kinds of malicious traffic.in this paper, a malicious traffic detection method based on Deep Support Vector Data Description is proposed, which is called Deep - SVDD. We combine convolutional neural network (CNN) with support vector data description, and train the model with normal traffic. The normal traffic features are mapped to high-dimensional space through neural networks, and a compact hypersphere is trained by unsupervised learning, which includes the normal features of the highdimensional space. Malicious traffic fall outside the hypersphere, thus distinguishing between normal and malicious traffic. Experiments show that the model has a high detection rate and a low false alarm rate, and it can effectively identify new malicious traffic.
2020-12-07
Islam, M. M., Karmakar, G., Kamruzzaman, J., Murshed, M..  2019.  Measuring Trustworthiness of IoT Image Sensor Data Using Other Sensors’ Complementary Multimodal Data. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :775–780.
Trust of image sensor data is becoming increasingly important as the Internet of Things (IoT) applications grow from home appliances to surveillance. Up to our knowledge, there exists only one work in literature that estimates trustworthiness of digital images applied to forensic applications, based on a machine learning technique. The efficacy of this technique is heavily dependent on availability of an appropriate training set and adequate variation of IoT sensor data with noise, interference and environmental condition, but availability of such data cannot be assured always. Therefore, to overcome this limitation, a robust method capable of estimating trustworthy measure with high accuracy is needed. Lowering cost of sensors allow many IoT applications to use multiple types of sensors to observe the same event. In such cases, complementary multimodal data of one sensor can be exploited to measure trust level of another sensor data. In this paper, for the first time, we introduce a completely new approach to estimate the trustworthiness of an image sensor data using another sensor's numerical data. We develop a theoretical model using the Dempster-Shafer theory (DST) framework. The efficacy of the proposed model in estimating trust level of an image sensor data is analyzed by observing a fire event using IoT image and temperature sensor data in a residential setup under different scenarios. The proposed model produces highly accurate trust level in all scenarios with authentic and forged image data.
Qian, Y..  2019.  Research on Trusted Authentication Model and Mechanism of Data Fusion. 2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS). :479–482.
Firstly, this paper analyses the technical foundation of single sign-on solution of unified authentication platform, and analyses the advantages and disadvantages of each solution. Secondly, from the point of view of software engineering, such as function requirement, performance requirement, development mode, architecture scheme, technology development framework and system configuration environment of the unified authentication platform, the unified authentication platform is analyzed and designed, and the database design and system design framework of the system are put forward according to the system requirements. Thirdly, the idea and technology of unified authentication platform based on JA-SIG CAS are discussed, and the design and implementation of each module of unified authentication platform based on JA-SIG CAS are analyzed, which has been applied in ship cluster platform.
2020-11-23
Zhu, L., Dong, H., Shen, M., Gai, K..  2019.  An Incentive Mechanism Using Shapley Value for Blockchain-Based Medical Data Sharing. 2019 IEEE 5th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS). :113–118.
With the development of big data and machine learning techniques, medical data sharing for the use of disease diagnosis has received considerable attention. Blockchain, as an emerging technology, has been widely used to resolve the efficiency and security issues in medical data sharing. However, the existing studies on blockchain-based medical data sharing have rarely concerned about the reasonable incentive mechanism. In this paper, we propose a cooperation model where medical data is shared via blockchain. We derive the topological relationships among the participants consisting of data owners, miners and third parties, and gradually develop the computational process of Shapley value revenue distribution. Specifically, we explore the revenue distribution under different consensuses of blockchain. Finally, we demonstrate the incentive effect and rationality of the proposed solution by analyzing the revenue distribution.
Wu, K., Gao, X., Liu, Y..  2018.  Web server security evaluation method based on multi-source data. 2018 International Conference on Cloud Computing, Big Data and Blockchain (ICCBB). :1–6.
Traditional web security assessments are evaluated using a single data source, and the results of the calculations from different data sources are different. Based on multi-source data, this paper uses Analytic Hierarchy Process to construct an evaluation model, calculates the weight of each level of indicators in the web security evaluation model, analyzes and processes the data, calculates the host security threat assessment values at various levels, and visualizes the evaluation results through ECharts+WebGL technology.
2020-11-20
Han, H., Wang, Q., Chen, C..  2019.  Policy Text Analysis Based on Text Mining and Fuzzy Cognitive Map. 2019 15th International Conference on Computational Intelligence and Security (CIS). :142—146.
With the introduction of computer methods, the amount of material and processing accuracy of policy text analysis have been greatly improved. In this paper, Text mining(TM) and latent semantic analysis(LSA) were used to collect policy documents and extract policy elements from them. Fuzzy association rule mining(FARM) technique and partial association test (PA) were used to discover the causal relationships and impact degrees between elements, and a fuzzy cognitive map (FCM) was developed to deduct the evolution of elements through a soft computing method. This non-interventionist approach avoids the validity defects caused by the subjective bias of researchers and provides policy makers with more objective policy suggestions from a neutral perspective. To illustrate the accuracy of this method, this study experimented by taking the state-owned capital layout adjustment related policies as an example, and proved that this method can effectively analyze policy text.
Alzahrani, A., Johnson, C., Altamimi, S..  2018.  Information security policy compliance: Investigating the role of intrinsic motivation towards policy compliance in the organization. 2018 4th International Conference on Information Management (ICIM). :125—132.
Recent behavioral research in information security has focused on increasing employees' motivation to enhance the security performance in an organization. This empirical study investigated employees' information security policy (ISP) compliance intentions using self-determination theory (SDT). Relevant hypotheses were developed to test the proposed research model. Data obtained via a survey (N=3D407) from a Fortune 600 organization in Saudi Arabia provides empirical support for the model. The results confirmed that autonomy, competence and the concept of relatedness all positively affect employees' intentions to comply. The variable 'perceived value congruence' had a negative effect on ISP compliance intentions, and the perceived legitimacy construct did not affect employees' intentions. In general, the findings of this study suggest that SDT has value in research into employees' ISP compliance intentions.
Sarochar, J., Acharya, I., Riggs, H., Sundararajan, A., Wei, L., Olowu, T., Sarwat, A. I..  2019.  Synthesizing Energy Consumption Data Using a Mixture Density Network Integrated with Long Short Term Memory. 2019 IEEE Green Technologies Conference(GreenTech). :1—4.
Smart cities comprise multiple critical infrastructures, two of which are the power grid and communication networks, backed by centralized data analytics and storage. To effectively model the interdependencies between these infrastructures and enable a greater understanding of how communities respond to and impact them, large amounts of varied, real-world data on residential and commercial consumer energy consumption, load patterns, and associated human behavioral impacts are required. The dissemination of such data to the research communities is, however, largely restricted because of security and privacy concerns. This paper creates an opportunity for the development and dissemination of synthetic energy consumption data which is inherently anonymous but holds similarities to the properties of real data. This paper explores a framework using mixture density network (MDN) model integrated with a multi-layered Long Short-Term Memory (LSTM) network which shows promise in this area of research. The model is trained using an initial sample recorded from residential smart meters in the state of Florida, and is used to generate fully synthetic energy consumption data. The synthesized data will be made publicly available for interested users.
Goyal, Y., Sharma, A..  2019.  A Semantic Machine Learning Approach for Cyber Security Monitoring. 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC). :439—442.
Security refers to precautions designed to shield the availability and integrity of information exchanged among the digital global community. Information safety measure typically protects the virtual facts from unauthorized sources to get a right of entry to, disclosure, manipulation, alteration or destruction on both hardware and software technologies. According to an evaluation through experts operating in the place of information safety, some of the new cyber-attacks are keep on emerging in all the business processes. As a stop result of the analyses done, it's been determined that although the level of risk is not excessive in maximum of the attacks, it's far a severe risk for important data and the severity of those attacks is prolonged. Prior safety structures has been established to monitor various cyber-threats, predominantly using a gadget processed data or alerts for showing each deterministic and stochastic styles. The principal finding for deterministic patterns in cyber- attacks is that they're neither unbiased nor random over the years. Consequently, the quantity of assaults in the past helps to monitor the range of destiny attacks. The deterministic styles can often be leveraged to generate moderately correct monitoring.
Liu, D., Lou, F., Wang, H..  2019.  Modeling and measurement internal threat process based on advanced stochastic model*. 2019 Chinese Automation Congress (CAC). :1077—1081.
Previous research on internal threats was mostly focused on modeling threat behaviors. These studies have paid little attention to risk measurement. This paper analyzed the internal threat scenarios, introduced the operation related protection model into the firewall-password model, constructed a series of sub models. By analyzing the illegal data out process, the analysis model of target network can be rapidly generated based on four protection sub-models. Then the risk value of an assessment point can be computed dynamically according to the Petri net computing characteristics and the effectiveness of overall network protection can be measured. This method improves the granularity of the model and simplifies the complexity of modeling complex networks and can realize dynamic and real-time risk measurement.
2020-11-16
Zhang, C., Xu, C., Xu, J., Tang, Y., Choi, B..  2019.  GEMˆ2-Tree: A Gas-Efficient Structure for Authenticated Range Queries in Blockchain. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :842–853.
Blockchain technology has attracted much attention due to the great success of the cryptocurrencies. Owing to its immutability property and consensus protocol, blockchain offers a new solution for trusted storage and computation services. To scale up the services, prior research has suggested a hybrid storage architecture, where only small meta-data are stored onchain and the raw data are outsourced to off-chain storage. To protect data integrity, a cryptographic proof can be constructed online for queries over the data stored in the system. However, the previous schemes only support simple key-value queries. In this paper, we take the first step toward studying authenticated range queries in the hybrid-storage blockchain. The key challenge lies in how to design an authenticated data structure (ADS) that can be efficiently maintained by the blockchain, in which a unique gas cost model is employed. By analyzing the performance of the existing techniques, we propose a novel ADS, called GEM2-tree, which is not only gas-efficient but also effective in supporting authenticated queries. To further reduce the ADS maintenance cost without sacrificing much the query performance, we also propose an optimized structure, GEM2*-tree, by designing a two-level index structure. Theoretical analysis and empirical evaluation validate the performance of the proposed ADSs.
2020-11-09
Pflanzner, T., Feher, Z., Kertesz, A..  2019.  A Crawling Approach to Facilitate Open IoT Data Archiving and Reuse. 2019 Sixth International Conference on Internet of Things: Systems, Management and Security (IOTSMS). :235–242.
Several cloud providers have started to offer specific data management services by responding to the new trend called the Internet of Things (IoT). In recent years, we have already seen that cloud computing has managed to serve IoT needs for data retrieval, processing and visualization transparent for the user side. IoT-Cloud systems for smart cities and smart regions can be very complex, therefore their design and analysis should be supported by means of simulation. Nevertheless, the models used in simulation environments should be as close as to the real world utilization to provide reliable results. To facilitate such simulations, in earlier works we proposed an IoT trace archiving service called SUMMON that can be used to gather real world datasets, and to reuse them for simulation experiments. In this paper we provide an extension to SUMMON with an automated web crawling service that gathers IoT and sensor data from publicly available websites. We introduce the architecture and operation of this approach, and exemplify it utilization with three use cases. The provided archiving solution can be used by simulators to perform realistic evaluations.
Ankam, D., Bouguila, N..  2018.  Compositional Data Analysis with PLS-DA and Security Applications. 2018 IEEE International Conference on Information Reuse and Integration (IRI). :338–345.
In Compositional data, the relative proportions of the components contain important relevant information. In such case, Euclidian distance fails to capture variation when considered within data science models and approaches such as partial least squares discriminant analysis (PLS-DA). Indeed, the Euclidean distance assumes implicitly that the data is normally distributed which is not the case of compositional vectors. Aitchison transformation has been considered as a standard in compositional data analysis. In this paper, we consider two other transformation methods, Isometric log ratio (ILR) transformation and data-based power (alpha) transformation, before feeding the data to PLS-DA algorithm for classification [1]. In order to investigate the merits of both methods, we apply them in two challenging information system security applications namely spam filtering and intrusion detection.
2020-11-04
Zhang, J., Chen, J., Wu, D., Chen, B., Yu, S..  2019.  Poisoning Attack in Federated Learning using Generative Adversarial Nets. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :374—380.

Federated learning is a novel distributed learning framework, where the deep learning model is trained in a collaborative manner among thousands of participants. The shares between server and participants are only model parameters, which prevent the server from direct access to the private training data. However, we notice that the federated learning architecture is vulnerable to an active attack from insider participants, called poisoning attack, where the attacker can act as a benign participant in federated learning to upload the poisoned update to the server so that he can easily affect the performance of the global model. In this work, we study and evaluate a poisoning attack in federated learning system based on generative adversarial nets (GAN). That is, an attacker first acts as a benign participant and stealthily trains a GAN to mimic prototypical samples of the other participants' training set which does not belong to the attacker. Then these generated samples will be fully controlled by the attacker to generate the poisoning updates, and the global model will be compromised by the attacker with uploading the scaled poisoning updates to the server. In our evaluation, we show that the attacker in our construction can successfully generate samples of other benign participants using GAN and the global model performs more than 80% accuracy on both poisoning tasks and main tasks.

Liang, Y., He, D., Chen, D..  2019.  Poisoning Attack on Load Forecasting. 2019 IEEE Innovative Smart Grid Technologies - Asia (ISGT Asia). :1230—1235.

Short-term load forecasting systems for power grids have demonstrated high accuracy and have been widely employed for commercial use. However, classic load forecasting systems, which are based on statistical methods, are subject to vulnerability from training data poisoning. In this paper, we demonstrate a data poisoning strategy that effectively corrupts the forecasting model even in the presence of outlier detection. To the best of our knowledge, poisoning attack on short-term load forecasting with outlier detection has not been studied in previous works. Our method applies to several forecasting models, including the most widely-adapted and best-performing ones, such as multiple linear regression (MLR) and neural network (NN) models. Starting with the MLR model, we develop a novel closed-form solution to quickly estimate the new MLR model after a round of data poisoning without retraining. We then employ line search and simulated annealing to find the poisoning attack solution. Furthermore, we use the MLR attacking solution to generate a numerical solution for other models, such as NN. The effectiveness of our algorithm has been tested on the Global Energy Forecasting Competition (GEFCom2012) data set with the presence of outlier detection.

2020-11-02
Pan, C., Huang, J., Gong, J., Yuan, X..  2019.  Few-Shot Transfer Learning for Text Classification With Lightweight Word Embedding Based Models. IEEE Access. 7:53296–53304.
Many deep learning architectures have been employed to model the semantic compositionality for text sequences, requiring a huge amount of supervised data for parameters training, making it unfeasible in situations where numerous annotated samples are not available or even do not exist. Different from data-hungry deep models, lightweight word embedding-based models could represent text sequences in a plug-and-play way due to their parameter-free property. In this paper, a modified hierarchical pooling strategy over pre-trained word embeddings is proposed for text classification in a few-shot transfer learning way. The model leverages and transfers knowledge obtained from some source domains to recognize and classify the unseen text sequences with just a handful of support examples in the target problem domain. The extensive experiments on five datasets including both English and Chinese text demonstrate that the simple word embedding-based models (SWEMs) with parameter-free pooling operations are able to abstract and represent the semantic text. The proposed modified hierarchical pooling method exhibits significant classification performance in the few-shot transfer learning tasks compared with other alternative methods.
2020-10-16
Zhang, Rui, Chen, Hongwei.  2019.  Intrusion Detection of Industrial Control System Based on Stacked Auto-Encoder. 2019 Chinese Automation Congress (CAC). :5638—5643.

With the deep integration of industrial control systems and Internet technologies, how to effectively detect whether industrial control systems are threatened by intrusion is a difficult problem in industrial security research. Aiming at the difficulty of high dimensionality and non-linearity of industrial control system network data, the stacked auto-encoder is used to extract the network data features, and the multi-classification support vector machine is used for classification. The research results show that the accuracy of the intrusion detection model reaches 95.8%.

2020-10-14
Khezrimotlagh, Darius, Khazaei, Javad, Asrari, Arash.  2019.  MILP Modeling of Targeted False Load Data Injection Cyberattacks to Overflow Transmission Lines in Smart Grids. 2019 North American Power Symposium (NAPS). :1—7.
Cyber attacks on transmission lines are one of the main challenges in security of smart grids. These targeted attacks, if not detected, might cause cascading problems in power systems. This paper proposes a bi-level mixed integer linear programming (MILP) optimization model for false data injection on targeted buses in a power system to overflow targeted transmission lines. The upper level optimization problem outputs the optimized false data injections on targeted load buses to overflow a targeted transmission line without violating bad data detection constraints. The lower level problem integrates the false data injections into the optimal power flow problem without violating the optimal power flow constraints. A few case studies are designed to validate the proposed attack model on IEEE 118-bus power system.
Wang, Yufeng, Shi, Wanjiao, Jin, Qun, Ma, Jianhua.  2019.  An Accurate False Data Detection in Smart Grid Based on Residual Recurrent Neural Network and Adaptive threshold. 2019 IEEE International Conference on Energy Internet (ICEI). :499—504.
Smart grids are vulnerable to cyber-attacks, which can cause significant damage and huge economic losses. Generally, state estimation (SE) is used to observe the operation of the grid. State estimation of the grid is vulnerable to false data injection attack (FDIA), so diagnosing this type of malicious attack has a major impact on ensuring reliable operation of the power system. In this paper, we present an effective FDIA detection method based on residual recurrent neural network (R2N2) prediction model and adaptive judgment threshold. Specifically, considering the data contains both linear and nonlinear components, the R2N2 model divides the prediction process into two parts: the first part uses the linear model to fit the state data; the second part predicts the nonlinearity of the residuals of the linear prediction model. The adaptive judgment threshold is inferred through fitting the Weibull distribution with the sum of squared errors between the predicted values and observed values. The thorough simulation results demonstrate that our scheme performs better than other prediction based FDIA detection schemes.
2020-10-12
Granatyr, Jones, Gomes, Heitor Murilo, Dias, João Miguel, Paiva, Ana Maria, Nunes, Maria Augusta Silveira Netto, Scalabrin, Edson Emílio, Spak, Fábio.  2019.  Inferring Trust Using Personality Aspects Extracted from Texts. 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). :3840–3846.
Trust mechanisms are considered the logical protection of software systems, preventing malicious people from taking advantage or cheating others. Although these concepts are widely used, most applications in this field do not consider affective aspects to aid in trust computation. Researchers of Psychology, Neurology, Anthropology, and Computer Science argue that affective aspects are essential to human's decision-making processes. So far, there is a lack of understanding about how these aspects impact user's trust, particularly when they are inserted in an evaluation system. In this paper, we propose a trust model that accounts for personality using three personality models: Big Five, Needs, and Values. We tested our approach by extracting personality aspects from texts provided by two online human-fed evaluation systems and correlating them to reputation values. The empirical experiments show statistically significant better results in comparison to non-personality-wise approaches.
2020-10-06
Dattana, Vishal, Gupta, Kishu, Kush, Ashwani.  2019.  A Probability based Model for Big Data Security in Smart City. 2019 4th MEC International Conference on Big Data and Smart City (ICBDSC). :1—6.

Smart technologies at hand have facilitated generation and collection of huge volumes of data, on daily basis. It involves highly sensitive and diverse data like personal, organisational, environment, energy, transport and economic data. Data Analytics provide solution for various issues being faced by smart cities like crisis response, disaster resilience, emergence management, smart traffic management system etc.; it requires distribution of sensitive data among various entities within or outside the smart city,. Sharing of sensitive data creates a need for efficient usage of smart city data to provide smart applications and utility to the end users in a trustworthy and safe mode. This shared sensitive data if get leaked as a consequence can cause damage and severe risk to the city's resources. Fortification of critical data from unofficial disclosure is biggest issue for success of any project. Data Leakage Detection provides a set of tools and technology that can efficiently resolves the concerns related to smart city critical data. The paper, showcase an approach to detect the leakage which is caused intentionally or unintentionally. The model represents allotment of data objects between diverse agents using Bigraph. The objective is to make critical data secure by revealing the guilty agent who caused the data leakage.

2020-09-28
Liu, Kai, Zhou, Yun, Wang, Qingyong, Zhu, Xianqiang.  2019.  Vulnerability Severity Prediction With Deep Neural Network. 2019 5th International Conference on Big Data and Information Analytics (BigDIA). :114–119.
High frequency of network security incidents has also brought a lot of negative effects and even huge economic losses to countries, enterprises and individuals in recent years. Therefore, more and more attention has been paid to the problem of network security. In order to evaluate the newly included vulnerability text information accurately, and to reduce the workload of experts and the false negative rate of the traditional method. Multiple deep learning methods for vulnerability text classification evaluation are proposed in this paper. The standard Cross Site Scripting (XSS) vulnerability text data is processed first, and then classified using three kinds of deep neural networks (CNN, LSTM, TextRCNN) and one kind of traditional machine learning method (XGBoost). The dropout ratio of the optimal CNN network, the epoch of all deep neural networks and training set data were tuned via experiments to improve the fit on our target task. The results show that the deep learning methods evaluate vulnerability risk levels better, compared with traditional machine learning methods, but cost more time. We train our models in various training sets and test with the same testing set. The performance and utility of recurrent convolutional neural networks (TextRCNN) is highest in comparison to all other methods, which classification accuracy rate is 93.95%.
Chen, Yuqi, Poskitt, Christopher M., Sun, Jun.  2018.  Learning from Mutants: Using Code Mutation to Learn and Monitor Invariants of a Cyber-Physical System. 2018 IEEE Symposium on Security and Privacy (SP). :648–660.
Cyber-physical systems (CPS) consist of sensors, actuators, and controllers all communicating over a network; if any subset becomes compromised, an attacker could cause significant damage. With access to data logs and a model of the CPS, the physical effects of an attack could potentially be detected before any damage is done. Manually building a model that is accurate enough in practice, however, is extremely difficult. In this paper, we propose a novel approach for constructing models of CPS automatically, by applying supervised machine learning to data traces obtained after systematically seeding their software components with faults ("mutants"). We demonstrate the efficacy of this approach on the simulator of a real-world water purification plant, presenting a framework that automatically generates mutants, collects data traces, and learns an SVM-based model. Using cross-validation and statistical model checking, we show that the learnt model characterises an invariant physical property of the system. Furthermore, we demonstrate the usefulness of the invariant by subjecting the system to 55 network and code-modification attacks, and showing that it can detect 85% of them from the data logs generated at runtime.