Biblio

Found 273 results

Filters: Keyword is Predictive models
2014
Kholidy, H.A., Erradi, A., Abdelwahed, S., Azab, A..  2014.  A Finite State Hidden Markov Model for Predicting Multistage Attacks in Cloud Systems. Dependable, Autonomic and Secure Computing (DASC), 2014 IEEE 12th International Conference on. :14-19.

Cloud computing has significantly increased security threats because intruders can exploit the large amount of cloud resources for their attacks. However, most current security technologies do not provide early warnings about such attacks. This paper presents a Finite State Hidden Markov prediction model that uses an adaptive risk approach to predict multi-staged cloud attacks. The risk model measures the potential impact of a threat on assets given its occurrence probability. The attack prediction model was integrated with our autonomous cloud intrusion detection framework (ACIDF) to raise early warnings about attacks to the controller so it can take proactive corrective actions before the attacks pose a serious security risk to the system. According to our experiments on the DARPA 2000 dataset, the proposed prediction model successfully fired early warning alerts 39.6 minutes before the launch of the LLDDoS1.0 attack. This gives the auto response controller ample time to take preventive measures.
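
The forward algorithm at the core of such an HMM predictor can be sketched compactly. The sketch below is illustrative only: the attack stages, transition matrix, emission matrix, and warning threshold are assumptions made for demonstration, not parameters from the paper.

```python
# Minimal sketch (not the paper's model): a discrete HMM whose hidden states
# are attack stages; the forward algorithm tracks the belief over stages as
# IDS alerts arrive, and a warning fires when late-stage mass crosses a bound.
# All matrices below are illustrative assumptions.
import numpy as np

states = ["probe", "break-in", "escalate", "ddos"]   # hypothetical stages
A = np.array([[0.7, 0.3, 0.0, 0.0],                  # stage transitions
              [0.0, 0.6, 0.4, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.0, 1.0]])
B = np.array([[0.8, 0.1, 0.1],                       # P(alert class | stage)
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6],
              [0.1, 0.1, 0.8]])
pi = np.array([1.0, 0.0, 0.0, 0.0])                  # start in "probe"

def forward_step(belief, obs):
    """One forward-algorithm update: predict, then weight by the emission."""
    belief = (belief @ A) * B[:, obs]
    return belief / belief.sum()

belief = pi
for obs in [0, 1, 1, 2]:                             # observed alert classes
    belief = forward_step(belief, obs)
    if belief[2] + belief[3] > 0.5:                  # late-stage probability mass
        print("early warning: multistage attack likely", belief.round(2))
```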

Chun-Rong Huang, Chung, P.-C.J., Di-Kai Yang, Hsing-Cheng Chen, Guan-Jie Huang.  2014.  Maximum a Posteriori Probability Estimation for Online Surveillance Video Synopsis. Circuits and Systems for Video Technology, IEEE Transactions on. 24:1417-1429.

To reduce the human effort of browsing long surveillance videos, synopsis videos have been proposed. Traditional synopsis video generation, which applies optimization over video tubes, is very time consuming and infeasible for real-time online generation. This dilemma significantly reduces the feasibility of synopsis video generation in practical situations. To solve this problem, the synopsis video generation problem is formulated in this paper as a maximum a posteriori probability (MAP) estimation problem, in which the positions and appearing frames of video objects are chronologically rearranged in real time without the need to know their complete trajectories. Moreover, a synopsis table is employed with MAP estimation to decide the temporal locations of incoming foreground objects in the synopsis video without an optimization procedure. As a result, the computational complexity of the proposed video synopsis generation method can be significantly reduced. Furthermore, because it does not require prescreening the entire video, the approach can be applied to online streaming videos.
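
A rough sketch of the synopsis-table idea, under simplifying assumptions: a plain occupancy count stands in for the collision term and a linear delay penalty for the chronology term, in place of the paper's full MAP objective.

```python
# Sketch of an online synopsis table (assumptions, not the paper's exact MAP
# formulation): an occupancy table counts objects per synopsis frame, and each
# incoming object tube is placed at the temporal shift that minimizes a simple
# cost trading off collision against chronological delay.
import numpy as np

SYNOPSIS_LEN = 200
table = np.zeros(SYNOPSIS_LEN)            # objects occupying each synopsis frame

def place_tube(tube_len, earliest=0, delay_weight=0.01):
    best_shift, best_cost = None, np.inf
    for s in range(earliest, SYNOPSIS_LEN - tube_len):
        collision = table[s:s + tube_len].sum()   # crowding penalty
        cost = collision + delay_weight * s       # keep chronology loosely
        if cost < best_cost:
            best_shift, best_cost = s, cost
    table[best_shift:best_shift + tube_len] += 1  # commit online, no re-optimization
    return best_shift

for tube_len in [30, 45, 30, 60]:                 # incoming objects, one at a time
    print("placed at frame", place_tube(tube_len))
```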

Jian Sun, Haitao Liao, Upadhyaya, B.R..  2014.  A Robust Functional-Data-Analysis Method for Data Recovery in Multichannel Sensor Systems. Cybernetics, IEEE Transactions on. 44:1420-1431.

Multichannel sensor systems are widely used in condition monitoring for effective failure prevention of critical equipment or processes. However, loss of sensor readings due to malfunctions of sensors and/or communication has long been a hurdle to the reliable operation of such integrated systems. Moreover, asynchronous data sampling and/or limited data transmission are usually seen in multiple sensor channels. To reliably perform fault diagnosis and prognosis in such operating environments, a data recovery method based on functional principal component analysis (FPCA) can be utilized. However, traditional FPCA methods are not robust to outliers, and their capabilities are limited in recovering signals with strongly skewed distributions (i.e., lacking symmetry). This paper provides a robust data-recovery method based on functional data analysis to enhance the reliability of multichannel sensor systems. The method not only considers the possibly skewed distribution of each channel of signal trajectories, but is also capable of recovering missing data for both individual and correlated sensor channels with asynchronous data that may be sparse as well. In particular, grand median functions, rather than classical grand mean functions, are utilized for robust smoothing of sensor signals. Furthermore, the relationship between the functional scores of two correlated signals is modeled using multivariate functional regression to enhance the overall data-recovery capability. An experimental flow-control loop that mimics the operation of the coolant-flow loop in a multimodular integral pressurized water reactor is used to demonstrate the effectiveness and adaptability of the proposed data-recovery method. The computational results illustrate that the proposed method is robust to outliers and more capable than the existing FPCA-based method in terms of accuracy in recovering strongly skewed signals. In addition, turbofan engine data are also analyzed to verify the capability of the proposed method in recovering non-skewed signals.

Ya Zhang, Yi Wei, Jianbiao Ren.  2014.  Multi-touch Attribution in Online Advertising with Survival Theory. Data Mining (ICDM), 2014 IEEE International Conference on. :687-696.

Multi-touch attribution, which distributes credit to all related advertisements based on their corresponding contributions, has recently become an important research topic in digital advertising. Traditionally, rule-based attribution models have been used in practice. The drawback of such rule-based models lies in the fact that the rules are not derived from the data but are based only on simple intuition. With the ever-enhanced capability to track advertisements and users' interactions with them, data-driven multi-touch attribution models, which attempt to infer the contribution from user interaction data, have become an important research direction. We propose a new data-driven attribution model based on survival theory. By adopting a probabilistic framework, one key advantage of the proposed model is that it is able to remove the presentation biases inherent in most other attribution models. In addition to modeling attribution, the proposed model is also able to predict a user's conversion probability. We validate the proposed method with a real-world data set obtained from an operational commercial advertising monitoring company. Experimental results show that the proposed method is quite promising in both conversion prediction and attribution.
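
A minimal hazard-based sketch of the idea, assuming exponentially decaying per-channel hazards; the paper's survival model is richer, and the channel rates and decay constant below are invented for illustration.

```python
# Hedged sketch of hazard-based multi-touch attribution: each ad touch adds an
# exponentially decaying hazard; conversion probability follows
# 1 - exp(-cumulative hazard), and credit is split in proportion to each
# touch's hazard contribution. All rates are assumptions.
import math

CHANNEL_HAZARD = {"search": 0.30, "display": 0.10, "email": 0.20}  # assumed rates
DECAY = 0.1                                                        # per-day decay

def attribute(touches, conversion_day):
    """touches: list of (channel, day). Returns conversion prob and credits."""
    contribs = {}
    for channel, day in touches:
        age = conversion_day - day
        h = CHANNEL_HAZARD[channel] * math.exp(-DECAY * age)
        contribs[channel] = contribs.get(channel, 0.0) + h
    total = sum(contribs.values())
    p_convert = 1.0 - math.exp(-total)
    credits = {c: h / total for c, h in contribs.items()}
    return p_convert, credits

p, credits = attribute([("display", 0), ("email", 3), ("search", 6)], 7)
print(f"P(conversion) = {p:.3f}, credits = {credits}")
```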

Pratanwanich, N., Lio, P..  2014.  Who Wrote This? Textual Modeling with Authorship Attribution in Big Data. Data Mining Workshop (ICDMW), 2014 IEEE International Conference on. :645-652.

By representing large corpora with concise and meaningful elements, topic-based generative models aim to reduce the dimension and understand the content of documents. These techniques originally analyzed the words in documents, but their extensions now accommodate meta-data such as authorship information, which has proved useful for textual modeling. Learning authorship is important for extracting author interests and assigning authors to anonymous texts. The Author-Topic (AT) model, an unsupervised learning technique, successfully exploits authorship information to model both documents and author interests using topic representations. However, the AT model makes the simplifying assumption that each author contributes equally to a multiple-author document. To overcome this limitation, we assume that authors contribute to a document in differing degrees, modeled with a Dirichlet distribution. This automatically transforms the unsupervised AT model into the Supervised Author-Topic (SAT) model, which brings the novelty of authorship prediction on anonymous texts. The SAT model outperforms the AT model at identifying the authors of documents written by either single or multiple authors, with a better Receiver Operating Characteristic (ROC) curve and a significantly higher Area Under the Curve (AUC). The SAT model not only achieves performance competitive with state-of-the-art techniques such as random forests but also retains the characteristics of unsupervised models useful for information discovery, i.e., the word distributions of topics, author interests, and author contributions.

Rieke, R., Repp, J., Zhdanova, M., Eichler, J..  2014.  Monitoring Security Compliance of Critical Processes. Parallel, Distributed and Network-Based Processing (PDP), 2014 22nd Euromicro International Conference on. :552-560.

Enforcing security in process-aware information systems at runtime requires the monitoring of systems' operation using process information. Analysis of this information with respect to security and compliance aspects is growing in complexity with the increase in functionality, connectivity, and dynamics of process evolution. To tackle this complexity, the application of models is becoming standard practice. Considering today's frequent changes to processes, model-based support for security and compliance analysis is not only needed in pre-operational phases but also at runtime. This paper presents an approach to support evaluation of the security status of processes at runtime. The approach is based on operational formal models derived from process specifications and security policies comprising technical, organizational, regulatory and cross-layer aspects. A process behavior model is synchronized by events from the running process and utilizes prediction of expected close-future states to find possible security violations and allow early decisions on countermeasures. The applicability of the approach is exemplified by a misuse case scenario from a hydroelectric power plant.

Kaci, A., Kamwa, I., Dessaint, L.-A., Guillon, S..  2014.  Phase angles as predictors of network dynamic security limits and further implications. PES General Meeting | Conference & Exposition, 2014 IEEE. :1-6.

In the United States, the number of Phasor Measurement Units (PMUs) will increase from 166 networked devices in 2010 to 1043 in 2014. According to the Department of Energy, they are being installed in order to “evaluate and visualize reliability margin (which describes how close the system is to the edge of its stability boundary).” However, there is still much debate in academia and industry about the usefulness of phase angles as unambiguous predictors of dynamic stability. In this paper, using four years of actual data from the Hydro-Québec EMS, it is shown that phase angles enable satisfactory predictions of power transfer and dynamic security margins across a critical interface using random forest models, with both explanation level and R-squared accuracy exceeding 99%. A generalized linear model (GLM) is then implemented to predict phase angles over day-ahead to hour-ahead time frames, using historical phase-angle values and load forecasts. Combining the GLM-based angle forecasts with the random-forest mapping of phase angles to power transfers results in a new data-driven approach to dynamic security monitoring.
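
A toy version of the mapping step might look like the following; the angle data here are synthetic stand-ins for PMU measurements, and the near-linear ground truth is an assumption made only so the example runs end to end.

```python
# Illustrative sketch (synthetic data, not Hydro-Québec measurements) of the
# mapping step: a random forest regressor learns interface power transfer
# from PMU phase-angle values, as the abstract describes.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
angles = rng.uniform(-30, 30, size=(5000, 4))        # degrees at 4 buses (assumed)
# assumed ground truth: transfer roughly linear in key angle differences + noise
transfer = 12 * (angles[:, 0] - angles[:, 3]) + 4 * (angles[:, 1] - angles[:, 2])
transfer += rng.normal(0, 5, size=5000)

X_tr, X_te, y_tr, y_te = train_test_split(angles, transfer, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"R^2 on held-out data: {model.score(X_te, y_te):.3f}")
```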

Jantsch, A., Tammemae, K..  2014.  A framework of awareness for artificial subjects. Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2014 International Conference on. :1-3.

A small battery-driven bio-patch, attached to the human body and monitoring various vital signals such as temperature, humidity, heart activity, and muscle and brain activity, is an example of a highly resource-constrained system that has the demanding task of correctly assessing the state of the monitored subject (healthy, normal, weak, ill, improving, worsening, etc.) and its own capabilities (attached to subject, working sensors, sufficient energy supply, etc.). These and many other systems would benefit from a sense of themselves and their environment to improve the robustness and sensibility of their behavior. Although we can draw inspiration from fields like neuroscience, robotics, AI, and control theory, the tight resource and energy constraints imply that we have to understand accurately which technique leads to a particular feature of awareness, how it contributes to improved behavior, and how it can be implemented cost-efficiently in hardware or software. We review the concepts of environment- and self-models, semantic interpretation, semantic attribution, history, goals and expectations, prediction, and self-inspection, and discuss how they contribute to awareness and self-awareness and to improved robustness and sensibility of behavior.

Balakrishnan, R., Parekh, R..  2014.  Learning to predict subject-line opens for large-scale email marketing. Big Data (Big Data), 2014 IEEE International Conference on. :579-584.

Billions of dollars of services and goods are sold through email marketing. Subject lines have a strong influence on the open rates of e-mails, as consumers often open e-mails based on the subject. Traditionally, e-mail subject lines are composed based on the best judgment of human editors. We propose a method to help the editors by predicting subject-line open rates through learning from past subject lines. The method derives different types of features from subject lines based on keywords, the performance of past subject lines, and syntax. Furthermore, we evaluate the contribution of individual subject-line keywords to overall open rates using an iterative method, namely Attribution Scoring, and use this for improved predictions. A random-forest-based model is trained to combine these features and predict performance. We use a dataset of more than a hundred thousand different subject lines with many billions of impressions to train and test the method. The proposed method shows significant improvement in prediction accuracy over the baselines for both new and already used subject lines.
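
The keyword-contribution idea can be sketched as an iterative backfitting loop. This is an assumed reading of Attribution Scoring, not the paper's exact procedure, and the subject lines and open rates below are made up.

```python
# Hedged sketch of iterative keyword attribution: each keyword gets a score
# such that a subject line's predicted open rate is the global mean plus its
# keywords' scores; scores are refit iteratively against residuals
# (coordinate-descent style). Data and details are assumptions.
from collections import defaultdict

lines = [(["free", "shipping"], 0.22), (["sale"], 0.18),
         (["free", "sale"], 0.25), (["newsletter"], 0.08)]  # (keywords, open rate)

mean = sum(r for _, r in lines) / len(lines)
score = defaultdict(float)
for _ in range(50):                                   # iterate to convergence
    for kw in {k for kws, _ in lines for k in kws}:
        resid, n = 0.0, 0
        for kws, rate in lines:
            if kw in kws:
                others = sum(score[k] for k in kws if k != kw)
                resid += rate - mean - others
                n += 1
        score[kw] = resid / n                         # update one keyword at a time

print({k: round(v, 3) for k, v in score.items()})
print("predicted open rate:", round(mean + score["free"] + score["sale"], 3))
```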

2015
Vizer, L. M., Sears, A..  2015.  Classifying Text-Based Computer Interactions for Health Monitoring. IEEE Pervasive Computing. 14:64–71.

Detecting early trends indicating cognitive decline can allow older adults to better manage their health, but current assessments present barriers precluding the use of such continuous monitoring by consumers. To explore the effects of cognitive status on computer-interaction patterns, the authors collected typed-text samples from older adults with and without pre-mild cognitive impairment (PreMCI) and constructed statistical models from keystroke and linguistic features to differentiate between the two groups. Using both feature sets, they obtained a 77.1 percent correct classification rate with 70.6 percent sensitivity, 83.3 percent specificity, and a 0.808 area under the curve (AUC). These results are in line with current assessments for MCI, a more advanced condition, but are obtained with an unobtrusive method. This research contributes a combination of features for text and keystroke analysis and enhances understanding of how clinicians or older adults themselves might monitor for PreMCI through patterns in typed text. It has implications for embedded systems that can enable healthcare providers and consumers to proactively and continuously monitor changes in cognitive function.

Kilger, M..  2015.  Integrating Human Behavior Into the Development of Future Cyberterrorism Scenarios. 2015 10th International Conference on Availability, Reliability and Security. :693–700.

The development of future cyber terrorism scenarios is a key component in building a more comprehensive understanding of the cyber threats that are likely to emerge in the near- to mid-term future. While developing concepts of likely new, emerging digital technologies is an important part of this process, this article suggests that understanding the psychological and social forces involved in cyber terrorism is also a key component of the analysis, and that the synergy of these two dimensions may produce more accurate and detailed future cyber threat scenarios than either analytical element alone.

Geng, J., Ye, D., Luo, P..  2015.  Forecasting severity of software vulnerability using grey model GM(1,1). 2015 IEEE Advanced Information Technology, Electronic and Automation Control Conference (IAEAC). :344–348.

Vulnerabilities usually represent the risk level of software, so it is of high value to forecast vulnerabilities in order to evaluate the security level of software. Current research mainly focuses on predicting the number of vulnerabilities or the time at which vulnerabilities occur; however, to the best of our knowledge, no other research focuses on predicting the severity of vulnerabilities, which we consider an important aspect reflecting vulnerabilities and software security. To fill this gap, we borrow the grey model GM(1,1) from grey system theory to forecast the severity of vulnerabilities. The experiment is carried out on real data collected from CVE and proves the feasibility of our predicting method.
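
GM(1,1) itself is fully specified by grey system theory, so a compact implementation is straightforward; the severity series below is a hypothetical stand-in for scores mined from CVE.

```python
# A compact GM(1,1) implementation to illustrate the forecasting step the
# abstract describes; the severity series is made up for demonstration.
import numpy as np

def gm11_forecast(x0, steps=1):
    """Grey model GM(1,1): fit on series x0, return forecasts `steps` ahead."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                                  # accumulated series
    z1 = 0.5 * (x1[1:] + x1[:-1])                       # background values
    B = np.column_stack([-z1, np.ones(len(z1))])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]    # grey parameters
    n = len(x0)
    def x1_hat(k):                                      # time response function
        return (x0[0] - b / a) * np.exp(-a * k) + b / a
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]

severity = [6.8, 7.2, 7.0, 7.5, 7.9]                    # hypothetical CVSS-like scores
print([round(v, 2) for v in gm11_forecast(severity, steps=2)])
```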

Kejela, G., Rong, C..  2015.  Cross-Device Consumer Identification. 2015 IEEE International Conference on Data Mining Workshop (ICDMW). :1687-1689.

Nowadays, a typical household owns multiple digital devices that can be connected to the Internet. Advertising companies always want to seamlessly reach the consumers behind the devices instead of the devices themselves. However, the identity of a consumer becomes fragmented as they switch from one device to another. A naive attempt is to use deterministic features such as user name, telephone number, and email address; however, consumers might refrain from giving away their personal information for privacy and security reasons. The challenge in the ICDM 2015 contest was to develop an accurate probabilistic model for predicting cross-device consumer identity without using deterministic user information. In this paper we present an accurate and scalable cross-device solution using an ensemble of Gradient Boosting Decision Trees (GBDT) and Random Forest. Our final solution ranks 9th on both the public and private leaderboards with an F0.5 score of 0.855.
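
The ensemble step can be sketched on synthetic data as follows; the contest's features, tuning, and blending weights are not reproduced, and a plain 50/50 probability average stands in for the final blend.

```python
# Sketch of a GBDT + random-forest ensemble: average class probabilities,
# threshold, and report the F0.5 score used in the contest. Synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20,
                           weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gbdt = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

proba = 0.5 * gbdt.predict_proba(X_te)[:, 1] + 0.5 * rf.predict_proba(X_te)[:, 1]
pred = (proba >= 0.5).astype(int)
print(f"F0.5 = {fbeta_score(y_te, pred, beta=0.5):.3f}")  # precision-weighted
```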

2016
Pang, Y., Xue, X., Namin, A. S..  2016.  Early Identification of Vulnerable Software Components via Ensemble Learning. 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA). :476–481.

Software components that are vulnerable to being exploited need to be identified and patched. Employing prevention techniques designed to detect vulnerable software components in early stages can significantly reduce the expenses associated with the software testing process and thus help build a more reliable and robust software system. Although previous studies have demonstrated the effectiveness of adapting prediction techniques to vulnerability detection, the feasibility of those techniques is limited mainly by insufficient training data sets. This paper proposes a prediction technique targeting early identification of potentially vulnerable software components. In the proposed scheme, potentially vulnerable components are viewed as mislabeled data that may contain true but not yet observed vulnerabilities. The proposed hybrid technique combines the support vector machine algorithm and an ensemble learning strategy to better identify potentially vulnerable components. The proposed vulnerability detection scheme is evaluated on several Java Android applications. The results demonstrate that the proposed hybrid technique can identify potentially vulnerable classes with high precision and relatively acceptable accuracy and recall.

Yoshikawa, M., Nozaki, Y..  2016.  Tamper resistance evaluation of PUF in environmental variations. 2016 IEEE Electrical Design of Advanced Packaging and Systems (EDAPS). :119–121.

The damage caused by counterfeit semiconductors has become a serious problem. Recently, the physical unclonable function (PUF) has attracted attention as a technique to prevent counterfeiting. The present study investigates an arbiter PUF, a typical type of PUF. The vulnerability of PUFs to machine-learning attacks has been revealed. It has also been shown that the output of a PUF can be inverted from its normal output owing to environmental variations, such as changes in power supply voltage and temperature. However, the resistance of a PUF to machine-learning attacks under such environmental variation has seldom been evaluated. The present study evaluated the resistance of an arbiter PUF against machine-learning attacks under environmental variation. Through an evaluation experiment using simulation, the study revealed that the resistance of an arbiter PUF against machine-learning attacks was slightly improved by environmental variation. However, the study also successfully predicted more than 95% of the outputs by increasing the number of learning cycles. Therefore, an arbiter PUF was shown to be vulnerable to machine-learning attacks even under environmental variation.
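
The machine-learning attack evaluated in such studies is typically realized as logistic regression over parity-transformed challenges, since an arbiter PUF's response is linear in that feature space. The sketch below simulates an ideal, noise-free PUF; the environmental effects from the paper's experiments are not modeled.

```python
# Sketch of the classic machine-learning attack on an arbiter PUF: the
# response is sign(w . phi(c)) for parity features phi, so logistic
# regression on phi recovers it from challenge-response pairs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_stages, n_crps = 64, 6000
w = rng.normal(size=n_stages + 1)                     # simulated stage delays

def parity_features(challenges):
    signs = 1 - 2 * challenges                        # bits {0,1} -> {+1,-1}
    # feature i is the product of sign bits from stage i to the last stage
    phi = np.cumprod(signs[:, ::-1], axis=1)[:, ::-1]
    return np.hstack([phi, np.ones((len(challenges), 1))])

C = rng.integers(0, 2, size=(n_crps, n_stages))
R = (parity_features(C) @ w > 0).astype(int)          # simulated responses

clf = LogisticRegression(max_iter=2000).fit(parity_features(C[:5000]), R[:5000])
acc = clf.score(parity_features(C[5000:]), R[5000:])
print(f"prediction accuracy: {acc:.3f}")              # typically well above 0.95
```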

Yang, B., Zhang, T..  2016.  A Scalable Meta-Model for Big Data Security Analyses. 2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS). :55–60.

This paper proposes a highly scalable framework that can detect network anomalies at the per-flow level by constructing a meta-model for a family of machine learning algorithms or statistical data models. The approach is scalable and attainable because the raw data needs to be accessed only once; it is processed, computed, and transformed into a meta-model matrix of much smaller size that can be resident in system RAM. The calculation of the meta-model matrix can be achieved through disposable per-row updating operations: once a flow's information has been processed, it is no longer needed for calculating the meta-model matrix. While the proposed framework covers both Gaussian and non-Gaussian data, the focus of this work is on linear regression models. Specifically, a new concept called meta-model sufficient statistics is proposed to analyze a group of models, for which exact, not approximate, results are derived. In addition, the proposed framework can quickly discover an optimal statistical or computational model from a family of candidate models without rescanning the raw dataset. This suggests that an extremely efficient and effective theory and method are possible for big data security analysis.
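
For the linear regression case, sufficient statistics reduce to the accumulators X'X and X'y, as the following sketch shows; variable names are illustrative and the streamed data is synthetic.

```python
# Sketch of the meta-model idea for linear regression: one pass over the raw
# data accumulates X'X and X'y; afterwards, any sub-model over a subset of
# features can be solved exactly from these statistics without rescanning.
import numpy as np

d = 5
XtX = np.zeros((d, d))
Xty = np.zeros(d)

def update(row, y):
    """Disposable per-row update: the row is not needed again afterwards."""
    global XtX, Xty
    XtX += np.outer(row, row)
    Xty += row * y

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.0, 3.0, 0.5])
for _ in range(10000):                         # stream the raw data exactly once
    x = rng.normal(size=d)
    update(x, x @ true_w + rng.normal(0, 0.1))

def solve_submodel(features):
    """Exact least squares for any candidate feature subset, no rescan."""
    idx = np.ix_(features, features)
    return np.linalg.solve(XtX[idx], Xty[features])

print(solve_submodel([0, 1, 3]))               # compare candidate models cheaply
print(solve_submodel([0, 1, 2, 3, 4]))
```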

2017
Wang, A., Mohaisen, A., Chen, S..  2017.  An Adversary-Centric Behavior Modeling of DDoS Attacks. 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). :1126–1136.

Distributed Denial of Service (DDoS) attacks are some of the most persistent threats on the Internet today. The evolution of DDoS attacks calls for an in-depth analysis of those attacks. A better understanding of the attackers' behavior can provide insights to unveil patterns and strategies utilized by attackers. The prior art on attacker behavior analysis often falls short in two respects: it assumes that adversaries are static, and it makes simplifying assumptions about their behavior that often are not supported by real attack data. In this paper, we take a data-driven approach to designing and validating three DDoS attack models from temporal (e.g., attack magnitudes), spatial (e.g., attacker origin), and spatiotemporal (e.g., attack inter-launching time) perspectives. We design these models based on the analysis of traces consisting of more than 50,000 verified DDoS attacks from industrial mitigation operations. Each model is also validated by testing its effectiveness in accurately predicting future DDoS attacks. Comparisons against simple intuitive models further show that our models more accurately capture the essential features of DDoS attacks.

Benthall, S..  2017.  Assessing Software Supply Chain Risk Using Public Data. 2017 IEEE 28th Annual Software Technology Conference (STC). :1–5.

The software supply chain is a source of cybersecurity risk for many commercial and government organizations. Public data may be used to inform automated tools for detecting software supply chain risk during continuous integration and deployment. We link data from the National Vulnerability Database (NVD) with open version control data for the open source project OpenSSL, a widely used secure networking library that made the news when a significant vulnerability, Heartbleed, was discovered in 2014. We apply the Alhazmi-Malaiya Logistic (AML) model for software vulnerability discovery to this case. This model predicts a sigmoid cumulative vulnerability discovery function over time. Some versions of OpenSSL do not conform to the predictions of the model because they contain a temporary plateau in the cumulative vulnerability discovery plot. This temporary plateau feature is an empirical signature of a security failure mode that may be useful in future studies of software supply chain risk.
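
Fitting the AML sigmoid to a cumulative discovery series can be sketched with a standard curve fit; the counts below are synthetic stand-ins for the NVD/OpenSSL data.

```python
# Sketch of fitting the Alhazmi-Malaiya Logistic (AML) model, whose cumulative
# vulnerability count follows a sigmoid saturating at B. Data is synthetic.
import numpy as np
from scipy.optimize import curve_fit

def aml(t, A, B, C):
    """AML cumulative vulnerability discovery function."""
    return B / (B * C * np.exp(-A * B * t) + 1.0)

months = np.arange(1, 25)
observed = aml(months, A=0.002, B=60, C=0.5) \
    + np.random.default_rng(0).normal(0, 1, 24)   # stand-in for NVD counts

params, _ = curve_fit(aml, months, observed, p0=[0.001, 50, 1.0], maxfev=10000)
A, B, C = params
print(f"fitted A={A:.4f}, saturation B={B:.1f}, C={C:.2f}")
# A temporary plateau in the observed series that the sigmoid cannot follow
# would show up as a poor fit over that interval, the signature the paper notes.
```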

Movahedi, Y., Cukier, M., Andongabo, A., Gashi, I..  2017.  Cluster-Based Vulnerability Assessment Applied to Operating Systems. 2017 13th European Dependable Computing Conference (EDCC). :18–25.

Organizations face the issue of how to best allocate their security resources. They therefore need an accurate method for assessing how many new vulnerabilities will be reported for the operating systems (OSs) they use in a given time period. Our approach consists of clustering vulnerabilities by leveraging the text information within vulnerability records, and then simulating the mean value function of vulnerabilities by relaxing the monotonic intensity function assumption that is prevalent among studies that use software reliability models (SRMs) and the nonhomogeneous Poisson process (NHPP) in modeling. We applied our approach to the vulnerabilities of four OSs: Windows, Mac, iOS, and Linux. For the OSs analyzed, in terms of both curve fitting and prediction capability, our results are more accurate in all cases than those of a power-law model without clustering drawn from a family of SRMs.

Hassan, M., Hamada, M..  2017.  A Computational Model for Improving the Accuracy of Multi-Criteria Recommender Systems. 2017 IEEE 11th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip (MCSoC). :114–119.

Artificial neural networks are complex, biologically inspired algorithms made up of highly distributed, adaptive, and self-organizing structures, which makes them suitable for optimization problems. They consist of a group of interconnected nodes, similar to the great networks of neurons in the human brain. So far, artificial neural networks have not been applied to user modeling in multi-criteria recommender systems. This paper presents a neural-network-based user modeling technique that exploits some of the characteristics of biological neurons to improve the accuracy of multi-criteria recommendations. The study is based on the aggregation-function approach, which computes the overall rating as a function of the criteria ratings. The proposed technique was evaluated using different evaluation metrics, and the empirical results were compared with those of single-rating collaborative filtering and two other similarity-based modeling approaches: the worst-case and the average similarity techniques. The comparative analysis shows that the proposed technique is more efficient than the two similarity-based techniques and the single-rating collaborative filtering technique.
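
A minimal sketch of the aggregation-function approach, using a small multilayer perceptron on synthetic ratings; the paper's architecture and dataset are not reproduced, and the nonlinear ground-truth aggregation is an assumption made so the example runs.

```python
# Sketch: a small neural network learns the overall rating as a function of
# the criteria ratings, which is the aggregation-function approach named in
# the abstract. All data and criteria names are illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
criteria = rng.uniform(1, 5, size=(3000, 4))   # e.g., story, acting, visuals, value
# assumed ground truth: a nonlinear aggregation of the criteria ratings
overall = (0.4 * criteria[:, 0] + 0.3 * criteria[:, 1]
           + 0.2 * np.minimum(criteria[:, 2], criteria[:, 3])
           + 0.1 * criteria[:, 3] + rng.normal(0, 0.2, 3000))

X_tr, X_te, y_tr, y_te = train_test_split(criteria, overall, random_state=0)
agg = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000,
                   random_state=0).fit(X_tr, y_tr)
print(f"held-out R^2 of the learned aggregation: {agg.score(X_te, y_te):.3f}")
```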

Poon, W. N., Bennin, K. E., Huang, J., Phannachitta, P., Keung, J. W..  2017.  Cross-Project Defect Prediction Using a Credibility Theory Based Naive Bayes Classifier. 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS). :434–441.

Several proposed defect prediction models are effective when historical datasets are available, but defect prediction becomes difficult when no historical data exist. Cross-project defect prediction (CPDP), which uses projects from other sources/companies to predict defects in the target projects, has shown promising results in recent studies. However, the performance of most CPDP approaches is still unsatisfactory, mainly due to distribution mismatch between the source and target projects. In this study, a credibility theory based Naïve Bayes (CNB) classifier is proposed to establish a novel reweighting mechanism between the source and target projects so that the source data can simultaneously adapt to the target data distribution and retain its own pattern. Our experimental results show the feasibility of the novel algorithm design and demonstrate significant improvement over other CPDP approaches in terms of the performance metrics considered.

Stuckman, J., Walden, J., Scandariato, R..  2017.  The Effect of Dimensionality Reduction on Software Vulnerability Prediction Models. IEEE Transactions on Reliability. 66:17–37.

Statistical prediction models can be an effective technique to identify vulnerable components in large software projects. Two aspects of vulnerability prediction models have a profound impact on their performance: 1) the features (i.e., the characteristics of the software) that are used as predictors and 2) the way those features are used in the setup of the statistical learning machinery. In a previous work, we compared models based on two different types of features: software metrics and term frequencies (text mining features). In this paper, we broaden the set of models we compare by investigating an array of techniques for the manipulation of said features. These techniques fall under the umbrella of dimensionality reduction and have the potential to improve the ability of a prediction model to localize vulnerabilities. We explore the role of dimensionality reduction through a series of cross-validation and cross-project prediction experiments. Our results show that in the case of software metrics, a dimensionality reduction technique based on confirmatory factor analysis provided an advantage when performing cross-project prediction, yielding the best F-measure for the predictions in five out of six cases. In the case of text mining, feature selection can make the prediction computationally faster, but no dimensionality reduction technique provided any other notable advantage.

Sultana, K. Z., Williams, B. J..  2017.  Evaluating micro patterns and software metrics in vulnerability prediction. 2017 6th International Workshop on Software Mining (SoftwareMining). :40–47.

Software security is an important aspect of ensuring software quality. Early detection of vulnerable code during development is essential for developers to make software testing cost- and time-effective. Traditional software metrics are used for early detection of software vulnerabilities, but they are not directly related to code constructs and do not specify any particular granularity level. The goal of this study is to help developers evaluate software security using class-level traceable patterns called micro patterns to reduce security risks. The concept of micro patterns is similar to that of design patterns, but micro patterns can be automatically recognized and mined from source code. If micro patterns can predict vulnerable classes better than traditional software metrics, they can be used in developing a vulnerability prediction model. This study explores the performance of class-level patterns in vulnerability prediction and compares them with traditional class-level software metrics. We studied security vulnerabilities as reported for one major release of Apache Tomcat, Apache Camel, and three stand-alone Java web applications. We used machine learning techniques to predict vulnerabilities using micro patterns and class-level metrics as features. We found that micro patterns have higher recall in detecting vulnerable classes than the software metrics.

Zeng, Y. G..  2017.  Identifying Email Threats Using Predictive Analysis. 2017 International Conference on Cyber Security And Protection Of Digital Services (Cyber Security). :1–2.

Malicious emails pose substantial threats to businesses. Whether it is a malware attachment or a URL leading to malware, exploitation, or phishing, attackers have been employing emails as an effective way to gain a foothold inside organizations of all kinds. To combat email threats, especially targeted attacks, traditional signature- and rule-based email filtering and advanced sandboxing technology both have their own weaknesses. In this paper, we propose a predictive analysis approach that learns the differences between legitimate and malicious emails through static analysis, creates a machine learning model, and makes detection and prediction on unseen emails effectively and efficiently. Comparing three different machine learning algorithms, our preliminary evaluation reveals that a Random Forests model performs best.

Cai, Y., Huang, H., Cai, H., Qi, Y..  2017.  A K-nearest neighbor locally search regression algorithm for short-term traffic flow forecasting. 2017 9th International Conference on Modelling, Identification and Control (ICMIC). :624–629.

Accurate short-term traffic flow forecasting is of great significance for real-time traffic control, guidance, and management. The k-nearest neighbor (k-NN) model is a classic data-driven method that is relatively effective yet simple to implement for short-term traffic flow forecasting. In the conventional prediction mechanism of the k-NN model, the k nearest neighbors' outputs, weighted by the similarities between the current traffic flow vector and historical traffic flow vectors, are used directly to generate prediction values, so the prediction results are often not ideal. We observe that there are often outliers among the k nearest neighbors' outputs, which may adversely influence the prediction value, and that the local similarities between the current traffic flow and historical traffic flows at the current sampling period should be more relevant to the prediction value. In this paper, we focus on improving the prediction mechanism of the k-NN model and propose a k-nearest neighbor locally search regression algorithm (k-LSR). The k-LSR algorithm uses a local search strategy to select optimal nearest neighbors' outputs and uses those outputs, weighted by local similarities, to forecast short-term traffic flow, thereby improving the prediction mechanism of the k-NN model. The proposed algorithm is tested on actual data and compared with other algorithms in performance, using the root mean squared error (RMSE) as the evaluation indicator. The comparison results show that the k-LSR algorithm is more successful than the k-NN and k-nearest neighbor locally weighted regression (k-LWR) algorithms in forecasting short-term traffic flow, which proves the superiority and practicability of the proposed algorithm.
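
The conventional k-NN mechanism the paper improves on can be sketched as follows; the flow series is synthetic, and the k-LSR local-search refinement itself is not reproduced here.

```python
# Sketch of baseline k-NN traffic flow forecasting: find the k historical
# flow vectors closest to the current one and predict the similarity-weighted
# average of their next-step outputs. k-LSR refines which neighbor outputs
# enter this average; that refinement is omitted. Data is synthetic.
import numpy as np

rng = np.random.default_rng(0)
flow = 300 + 80 * np.sin(np.arange(2000) / 30) + rng.normal(0, 10, 2000)

LAG, K = 6, 10
X = np.array([flow[i:i + LAG] for i in range(len(flow) - LAG)])   # state vectors
y = flow[LAG:]                                                    # next-step flow

def knn_forecast(current):
    d = np.linalg.norm(X - current, axis=1)
    nn = np.argsort(d)[:K]                       # k nearest historical states
    w = 1.0 / (d[nn] + 1e-9)                     # similarity weights
    return np.sum(w * y[nn]) / np.sum(w)

current = flow[-LAG:]                            # most recent observations
print(f"forecast for next interval: {knn_forecast(current):.1f}")
```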