Biblio

Filters: Keyword is principal component analysis
2020-10-16
Tong, Weiming, Liu, Bingbing, Li, Zhongwei, Jin, Xianji.  2019.  Intrusion Detection Method of Industrial Control System Based on RIPCA-OCSVM. 2019 3rd International Conference on Electronic Information Technology and Computer Engineering (EITCE). :1148–1154.

Intrusion detection based on the One-Class Support Vector Machine (OCSVM) cannot detect outliers within industrial data, which causes the decision function to deviate from the training samples. To address this problem, an anomaly intrusion detection algorithm based on Robust Incremental Principal Component Analysis (RIPCA)-OCSVM is proposed in this paper. The method uses the RIPCA algorithm to remove outliers from industrial data sets and to reduce their dimensionality. Combining this with the strength of OCSVM on single-class classification problems, an anomaly detection model is established, and Improved Particle Swarm Optimization (IPSO) is used to optimize the model parameters. Simulation results show that the method can efficiently and accurately identify attacks and abnormal behaviors while meeting the real-time requirements of an industrial control system (ICS).
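
As a rough illustration of the pipeline this abstract describes, here is a minimal sketch using scikit-learn, with ordinary PCA standing in for RIPCA and hand-set hyperparameters standing in for the IPSO-tuned ones; the data are synthetic placeholders, not ICS traffic.

```python
# Minimal sketch of the PCA -> OCSVM pipeline described above.
# Assumptions: ordinary PCA stands in for RIPCA, and nu/gamma are set by
# hand rather than by the paper's Improved Particle Swarm Optimization.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 20))          # stand-in for normal ICS traffic
X_test = np.vstack([rng.normal(size=(50, 20)),
                    rng.normal(loc=4.0, size=(10, 20))])  # last rows "attacks"

scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=5).fit(scaler.transform(X_train))

ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
ocsvm.fit(pca.transform(scaler.transform(X_train)))

pred = ocsvm.predict(pca.transform(scaler.transform(X_test)))  # +1 normal, -1 anomaly
print("flagged anomalies:", np.where(pred == -1)[0])
```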

2020-05-18
Zhou, Wei, Yang, Weidong, Wang, Yan, Zhang, Hong.  2018.  Generalized Reconstruction-Based Contribution for Multiple Faults Diagnosis with Bayesian Decision. 2018 IEEE 7th Data Driven Control and Learning Systems Conference (DDCLS). :813–818.
In fault diagnosis of industrial processes, there is usually more than one faulty variable. When multiple faults occur, the generalized reconstruction-based contribution (RBC) can be helpful where traditional RBC may make mistakes. Because of the correlation between variables, faults usually propagate to other, normal variables, which is called the smearing effect; it is therefore helpful to take previous fault diagnosis results into account. In this paper, a data-driven fault diagnosis method based on generalized RBC and Bayesian decision is presented. The method combines multi-dimensional RBC with Bayesian decision and improves the diagnosis of multiple and minor faults under stronger noise. A numerical simulation example is given to show the effectiveness and superiority of the proposed method.
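
For readers new to reconstruction-based contribution, a worked numpy sketch of the classical single-variable RBC with the SPE index, which this paper generalizes; the formula RBC_j = (xi_j' C x)^2 / (xi_j' C xi_j), with C the residual projection, is the standard one from the RBC literature, and the data are synthetic.

```python
# Sketch of classical reconstruction-based contribution (RBC) with the SPE
# index; the paper generalizes this to multi-dimensional fault directions
# and adds a Bayesian decision layer (not reproduced here).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated training data
X -= X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:2].T                         # retain 2 principal components
C_res = np.eye(5) - P @ P.T          # residual-space projection

x = X[0].copy()
x[3] += 8.0                          # inject a fault on variable 3

rbc = np.array([(e @ C_res @ x) ** 2 / (e @ C_res @ e)
                for e in np.eye(5)]) # RBC_j = (xi_j' C x)^2 / (xi_j' C xi_j)
print("diagnosed faulty variable:", int(rbc.argmax()))  # likely variable 3
```
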
Wu, Lan, Su, Sheyan, Wen, Chenglin.  2018.  Multiple Fault Diagnosis Methods Based on Multilevel Multi-Granularity PCA. 2018 International Conference on Control, Automation and Information Sciences (ICCAIS). :566–570.
Principal Component Analysis (PCA) is a basic method of fault diagnosis based on multivariate statistical analysis. It utilizes the linear correlation between multiple process variables to implement process fault diagnosis and has been widely used. Traditional PCA fault diagnosis ignores the impact of faults with different magnitudes on detection accuracy. Based on a variety of data processing methods, this paper proposes a multi-level and multi-granularity principal component analysis method to make the detection results more accurate.
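
As background, a minimal numpy sketch of the PCA fault-detection baseline this paper builds on, monitoring Hotelling's T² and SPE (Q) statistics; the multilevel, multi-granularity layering itself is not reproduced.

```python
# Baseline PCA fault detection with Hotelling's T^2 and SPE statistics,
# the standard scheme this paper layers multi-granularity processing onto.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))
mu, sd = X.mean(axis=0), X.std(axis=0)
Xs = (X - mu) / sd

cov = np.cov(Xs, rowvar=False)
vals, vecs = np.linalg.eigh(cov)
P, lam = vecs[:, -3:], vals[-3:]          # 3 retained components

def t2_spe(x):
    xs = (x - mu) / sd
    t = P.T @ xs
    t2 = t @ np.diag(1.0 / lam) @ t       # T^2: variation inside the PC subspace
    r = xs - P @ t
    return t2, r @ r                      # SPE/Q: variation outside it

fault = X[0] + np.array([0, 0, 6, 0, 0, 0]) * sd  # fault on variable 2
print("normal T2/SPE:", t2_spe(X[0]))
print("faulty T2/SPE:", t2_spe(fault))
```
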
2019-05-08
Meng, F., Lou, F., Fu, Y., Tian, Z..  2018.  Deep Learning Based Attribute Classification Insider Threat Detection for Data Security. 2018 IEEE Third International Conference on Data Science in Cyberspace (DSC). :576–581.

With the evolution of network threats, identifying threats that originate inside an organization is becoming more and more difficult. To detect malicious insiders, we propose a novel attribute-classification insider threat detection method based on long short-term memory recurrent neural networks (LSTM-RNNs). To achieve a high detection rate, an event aggregator, a feature extractor, several attribute classifiers, and an anomaly calculator are seamlessly integrated into an end-to-end detection framework. Using the CERT insider threat dataset v6.2 and threat detection recall as our performance metric, experimental results validate that the proposed method greatly outperforms threat detection methods based on k-Nearest Neighbor, Isolation Forest, Support Vector Machine, and Principal Component Analysis.
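
A minimal sketch of one attribute classifier in the spirit described, assuming tf.keras: an LSTM over a user's event-ID sequence. Vocabulary size, sequence length, and labels are illustrative placeholders, not the CERT dataset encoding.

```python
# Sketch of a single LSTM-based attribute classifier over per-user event
# sequences; the paper chains several such classifiers with an event
# aggregator and anomaly calculator (not reproduced here). Sizes and labels
# are illustrative placeholders.
import numpy as np
import tensorflow as tf

n_event_types, seq_len = 50, 100
X = np.random.randint(0, n_event_types, size=(256, seq_len))  # event-ID sequences
y = np.random.randint(0, 2, size=(256,))                      # attribute label

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(n_event_types, 16),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Recall()])   # recall, as in the paper
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```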

2019-10-15
Zhang, F., Deng, Z., He, Z., Lin, X., Sun, L..  2018.  Detection of Shilling Attack in Collaborative Filtering Recommender System by PCA and Data Complexity. 2018 International Conference on Machine Learning and Cybernetics (ICMLC). 2:673–678.

Collaborative filtering (CF) recommender systems have been widely used for their strong performance in personalized recommendation, but they are vulnerable to shilling attacks, in which attackers inject shilling attack profiles into the system to bias its recommendations. Designing robust recommender systems and proposing attack detection methods are the main research directions for handling shilling attacks. Among the detection methods, unsupervised PCA is particularly effective in experiments, but it suffers when no information is available about the number of shilling attack profiles. In this paper, a new unsupervised detection method that combines PCA and data complexity is proposed to detect shilling attacks. In the proposed method, PCA is used to select suspected attack profiles, and data complexity is used to pick out the authentic profiles from among them. Compared with traditional PCA, the proposed method performs well without needing to determine the number of shilling attack profiles in advance.
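
A sketch of the PCA stage under the usual PCA-VarSelect heuristic (attack profiles are highly inter-correlated, so they tend to contribute least to the leading components); the data-complexity filtering stage is not reproduced, and the data are synthetic.

```python
# Sketch of the PCA stage: rank users by their aggregate loadings on the
# leading principal components of the user-user covariance and treat the
# smallest as suspects (the PCA-VarSelect heuristic). The paper's second
# stage, filtering by data complexity, is not reproduced.
import numpy as np

rng = np.random.default_rng(3)
normal = rng.integers(1, 6, size=(95, 40)).astype(float)        # genuine users
attack = np.tile(rng.integers(1, 6, size=(1, 40)), (5, 1)) + 0.0  # near-identical shills
R = np.vstack([normal, attack])

Z = (R - R.mean(axis=0)) / (R.std(axis=0) + 1e-9)   # z-score each item
cov_users = np.cov(Z)                                # user-user covariance
vals, vecs = np.linalg.eigh(cov_users)
load = (vecs[:, -3:] ** 2).sum(axis=1)               # loadings on top-3 PCs

suspects = np.argsort(load)[:10]                     # smallest loadings first
print("suspected attack profiles:", sorted(suspects))
```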

2019-10-30
Borgolte, Kevin, Hao, Shuang, Fiebig, Tobias, Vigna, Giovanni.  2018.  Enumerating Active IPv6 Hosts for Large-Scale Security Scans via DNSSEC-Signed Reverse Zones. 2018 IEEE Symposium on Security and Privacy (SP). :770-784.

Security research has made extensive use of exhaustive Internet-wide scans in recent years, as they can provide significant insights into the overall state of security of the Internet, and ZMap made scanning the entire IPv4 address space practical. However, the IPv4 address space is exhausted, and a switch to IPv6, the only accepted long-term solution, is inevitable. In turn, to better understand the security of devices connected to the Internet, including in particular Internet of Things devices, it is imperative to include IPv6 addresses in security evaluations and scans. Unfortunately, it is practically infeasible to iterate through the entire IPv6 address space, as it is 2^96 times larger than the IPv4 address space. Therefore, enumeration of active hosts prior to scanning is necessary. Without it, we will be unable to investigate the overall security of Internet-connected devices in the future. In this paper, we introduce a novel technique to enumerate an active part of the IPv6 address space by walking DNSSEC-signed IPv6 reverse zones. Subsequently, by scanning the enumerated addresses, we uncover significant security problems: the exposure of sensitive data, and incorrectly controlled access to hosts, such as access to routing infrastructure via administrative interfaces, all of which were accessible via IPv6. Furthermore, from our analysis of the differences between accessing dual-stack hosts via IPv6 and IPv4, we hypothesize that the root cause is that machines automatically and by default take on globally routable IPv6 addresses. This is a practice that the affected system administrators appear unaware of, as the respective services are almost always properly protected from unauthorized access via IPv4. Our findings indicate (i) that enumerating active IPv6 hosts is practical without a preferential network position, contrary to common belief, (ii) that the security of active IPv6 hosts is currently still lagging behind the security state of IPv4 hosts, and (iii) that unintended IPv6 connectivity is a major security issue for unaware system administrators.
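
A minimal dnspython sketch of the underlying primitive, walking an NSEC chain in a signed reverse zone; the zone and nameserver below are placeholders, and zones signed with NSEC3 (hashed names) cannot be walked this way.

```python
# Sketch of NSEC-chain walking over a DNSSEC-signed reverse zone, the
# enumeration primitive this paper builds on. The zone and nameserver are
# placeholders; zones signed with NSEC3 resist this simple walk.
import dns.message
import dns.name
import dns.query
import dns.rdatatype

def walk_nsec(zone, nameserver, limit=100):
    names, current = [], dns.name.from_text(zone)
    for _ in range(limit):
        q = dns.message.make_query(current, dns.rdatatype.NSEC, want_dnssec=True)
        resp = dns.query.udp(q, nameserver, timeout=5)
        nsec = [r for rr in resp.answer if rr.rdtype == dns.rdatatype.NSEC
                for r in rr]
        if not nsec:
            break
        nxt = nsec[0].next                 # owner of the next name in the chain
        if nxt == dns.name.from_text(zone):
            break                          # chain wrapped around: done
        names.append(nxt)
        current = nxt
    return names

# Example (hypothetical zone and server):
# print(walk_nsec("8.b.d.0.1.0.0.2.ip6.arpa.", "203.0.113.53"))
```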

2019-09-30
Jiao, Y., Hohlfield, J., Victora, R. H..  2018.  Understanding Transition and Remanence Noise in HAMR. IEEE Transactions on Magnetics. 54:1–5.

Transition noise and remanence noise are the two most important types of media noise in heat-assisted magnetic recording. We examine two methods (spatial splitting and principal components analysis) to distinguish them; both techniques show similar trends with respect to applied field and grain pitch (GP). It was also found that PW50 can be affected by GP and reader design, but is almost independent of write field and bit length (above 50 nm). Interestingly, our simulation shows a linear relationship between jitter and PW50 · NSRrem, which agrees qualitatively with experimental results.

2020-10-12
Sharafaldin, Iman, Ghorbani, Ali A..  2018.  EagleEye: A Novel Visual Anomaly Detection Method. 2018 16th Annual Conference on Privacy, Security and Trust (PST). :1–6.
We propose a novel visualization technique (EagleEye) for intrusion detection, which visualizes a host as a community of system call traces in two-dimensional space. The goal of EagleEye is to visually cluster the system call traces. Although human eyes can easily perceive anomalies in the EagleEye view, we propose two methods, called SAM and CPM, that use the concept of data depth to help administrators distinguish between normal and abnormal behaviors. Our experimental results on the Australian Defence Force Academy Linux Dataset (ADFA-LD), a modern system-call dataset that includes new exploits and attacks on various programs, show EagleEye's efficiency in detecting diverse exploits and attacks.
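
A rough sketch of the visualization idea, assuming scikit-learn and matplotlib: vectorize each system-call trace as n-gram counts and project to two dimensions, with PCA standing in for EagleEye's projection and no SAM/CPM data-depth scoring.

```python
# Sketch of the visualization idea: embed system-call traces as 2-D points
# via n-gram counting plus projection. PCA stands in for EagleEye's
# projection; the SAM/CPM data-depth scoring is not reproduced.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import CountVectorizer

traces = [
    "open read read write close",      # toy normal traces
    "open read write close",
    "open read read read write close",
    "socket connect execve dup2 dup2", # toy exploit-like trace
]
X = CountVectorizer(analyzer="word", ngram_range=(1, 2)).fit_transform(traces)
pts = PCA(n_components=2).fit_transform(X.toarray())

plt.scatter(pts[:, 0], pts[:, 1])
for i, p in enumerate(pts):
    plt.annotate(str(i), p)
plt.title("Traces as 2-D points: outliers separate visually")
plt.show()
```
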
2019-02-13
Feng, Y., Akiyama, H., Lu, L., Sakurai, K..  2018.  Feature Selection for Machine Learning-Based Early Detection of Distributed Cyber Attacks. 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech). :173–180.

It is well known that distributed cyber attacks launched simultaneously from many hosts have caused the most serious problems in recent years, including privacy leakage and denial of service. How to detect those attacks at an early stage has therefore become an important and urgent topic in the cyber security community. For this purpose, recognizing C&C (Command & Control) communication between compromised bots and the C&C server is crucially important, because C&C communication takes place in the preparation phase of distributed attacks. Although signature-based attack detection has long been applied in practice, it is well known that it cannot efficiently deal with new kinds of attacks. In recent years, ML (machine learning)-based detection methods have been studied widely, and in those methods feature selection is obviously very important to detection performance. We once utilized up to 55 features to pick out C&C traffic for early detection of DDoS attacks. In this work, we try to answer the question: are all of those features really necessary? We mainly investigate how detection performance changes as features are removed, starting from those with the lowest importance, and we try to clarify which features deserve attention for early detection of distributed attacks. We use honeypot data collected from 2008 to 2013. SVM (Support Vector Machine) and PCA (Principal Component Analysis) are utilized for feature selection, and SVM and RF (Random Forest) are used to build the classifiers. We find that detection performance generally improves as more features are utilized; however, after the number of features reaches around 40, performance changes little even if more features are used. It is also verified that, in some specific cases, more features do not always mean better detection performance. We also discuss the 10 important features that have the biggest influence on classification.
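
A sketch of the ablation loop described, assuming scikit-learn: repeatedly drop the least-important feature by random-forest importance and re-measure performance. The data are synthetic stand-ins for the honeypot features.

```python
# Sketch of the ablation the paper performs: repeatedly drop the
# least-important feature (by random-forest importance) and re-measure
# detection performance. Data here are synthetic, not the honeypot traces.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=55, n_informative=15,
                           random_state=0)
features = list(range(X.shape[1]))
while len(features) > 5:
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    score = cross_val_score(clf, X[:, features], y, cv=3).mean()
    clf.fit(X[:, features], y)
    drop = features[int(np.argmin(clf.feature_importances_))]
    print(f"{len(features):2d} features: accuracy={score:.3f} (dropping {drop})")
    features.remove(drop)
```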

2020-05-08
Shen, Weiguo, Wang, Wei.  2018.  Node Identification in Wireless Network Based on Convolutional Neural Network. 2018 14th International Conference on Computational Intelligence and Security (CIS). :238–241.
Aiming at the problem of node identification in wireless networks, a node identification method based on deep learning is proposed, which starts from fine-grained features of nodes at the radio-frequency layer. First, to reduce computational complexity, Principal Component Analysis is used to reduce the dimension of the node sample data. Second, a convolutional neural network containing two hidden layers is designed to extract local features from the preprocessed data; stochastic gradient descent is used to optimize the parameters, and a softmax model determines the output label. Finally, the effectiveness of the method is verified by experiments on a practical wireless ad hoc network.
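
A sketch of the described pipeline, assuming tf.keras: PCA-reduced samples fed to a small convolutional network trained by stochastic gradient descent with a softmax output; shapes and class counts are illustrative.

```python
# Sketch of the described pipeline: PCA-reduced RF samples fed to a small
# 1-D CNN trained with SGD and a softmax output. Sizes are illustrative.
import numpy as np
import tensorflow as tf
from sklearn.decomposition import PCA

n_nodes = 4
X = np.random.randn(400, 256)                 # stand-in RF feature vectors
y = np.random.randint(0, n_nodes, size=400)   # node identity labels

Xr = PCA(n_components=64).fit_transform(X)[..., None]  # (400, 64, 1)

model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(16, 5, activation="relu", input_shape=(64, 1)),
    tf.keras.layers.Conv1D(32, 5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(n_nodes, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(0.01),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(Xr, y, epochs=3, batch_size=32, verbose=0)
```
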
2019-03-15
Wang, C., Zhao, S., Wang, X., Luo, M., Yang, M..  2018.  A Neural Network Trojan Detection Method Based on Particle Swarm Optimization. 2018 14th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT). :1-3.

Hardware Trojans (HTs) are malicious modifications of original circuits intended to leak information or cause malfunction. Based on Side Channel Analysis (SCA) technology, a hardware Trojan detection platform is designed for RTL circuits on the basis of HSPICE power consumption simulation. The Principal Component Analysis (PCA) algorithm is used to reduce the dimension of the power consumption data, and an intelligent neural network (NN) algorithm based on Particle Swarm Optimization (PSO) is introduced to recognize HTs. Experimental results show that the detection accuracy of the PSO-NN method is much better than that of the traditional BP-NN method.
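
A bare-bones numpy sketch of the PSO idea: particle swarm optimization of a tiny classifier's weights, with synthetic stand-ins for the PCA-reduced power traces; the authors' network and HSPICE data are not reproduced.

```python
# Minimal particle swarm optimization of a tiny classifier's weights on
# synthetic stand-ins for PCA-reduced power traces. A bare-bones sketch of
# the PSO-NN idea, not the authors' network or HSPICE data.
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (100, 8)), rng.normal(0.8, 1, (100, 8))])
y = np.repeat([0, 1], 100)                       # 1 = Trojan-infected

def loss(w):                                     # logistic model as the "NN"
    p = 1 / (1 + np.exp(-(X @ w[:8] + w[8])))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

n, dim = 30, 9
pos = rng.normal(size=(n, dim)); vel = np.zeros((n, dim))
pbest = pos.copy(); pbest_f = np.array([loss(p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()

for _ in range(200):                             # standard PSO update
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos += vel
    f = np.array([loss(p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()

acc = np.mean((1 / (1 + np.exp(-(X @ gbest[:8] + gbest[8]))) > 0.5) == y)
print(f"training accuracy: {acc:.2f}")
```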

2017-12-28
Stuckman, J., Walden, J., Scandariato, R..  2017.  The Effect of Dimensionality Reduction on Software Vulnerability Prediction Models. IEEE Transactions on Reliability. 66:17–37.

Statistical prediction models can be an effective technique to identify vulnerable components in large software projects. Two aspects of vulnerability prediction models have a profound impact on their performance: 1) the features (i.e., the characteristics of the software) that are used as predictors and 2) the way those features are used in the setup of the statistical learning machinery. In a previous work, we compared models based on two different types of features: software metrics and term frequencies (text mining features). In this paper, we broaden the set of models we compare by investigating an array of techniques for the manipulation of said features. These techniques fall under the umbrella of dimensionality reduction and have the potential to improve the ability of a prediction model to localize vulnerabilities. We explore the role of dimensionality reduction through a series of cross-validation and cross-project prediction experiments. Our results show that in the case of software metrics, a dimensionality reduction technique based on confirmatory factor analysis provided an advantage when performing cross-project prediction, yielding the best F-measure for the predictions in five out of six cases. In the case of text mining, feature selection can make the prediction computationally faster, but no dimensionality reduction technique provided any other notable advantage.
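
A sketch of the kind of comparison the paper runs, assuming scikit-learn: the same classifier with no reduction, feature selection, and PCA, scored by F-measure under cross-validation. Confirmatory factor analysis, their best performer for software metrics, has no direct scikit-learn equivalent and is omitted.

```python
# Sketch of the comparison style: one classifier, three feature treatments
# (none, selection, PCA), scored by F-measure under cross-validation.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=500, n_features=200, n_informative=20,
                           weights=[0.9, 0.1], random_state=0)
candidates = {
    "no reduction": make_pipeline(LogisticRegression(max_iter=2000)),
    "select-50":    make_pipeline(SelectKBest(f_classif, k=50),
                                  LogisticRegression(max_iter=2000)),
    "pca-50":       make_pipeline(PCA(n_components=50),
                                  LogisticRegression(max_iter=2000)),
}
for name, pipe in candidates.items():
    f1 = cross_val_score(pipe, X, y, cv=5, scoring="f1").mean()
    print(f"{name:12s} F-measure: {f1:.3f}")
```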

2018-06-11
Chen, C. W., Chang, S. Y., Hu, Y. C., Chen, Y. W..  2017.  Protecting vehicular networks privacy in the presence of a single adversarial authority. 2017 IEEE Conference on Communications and Network Security (CNS). :1–9.

In vehicular networks, each message is signed by the generating node to ensure accountability for the contents of that message. For privacy reasons, each vehicle uses a collection of certificates, which for accountability reasons are linked at a central authority. One such design is the Security Credential Management System (SCMS) [1], which is the leading credential management system in the US. The SCMS is composed of multiple components, each of which has a different task for key management, which are logically separated. The SCMS is designed to ensure privacy against a single insider compromise, or against outside adversaries. In this paper, we demonstrate that the current SCMS design fails to achieve its design goal, showing that a compromised authority can gain substantial information about certificate linkages. We propose a solution that accommodates threshold-based detection, but uses relabeling and noise to limit the information that can be learned from a single insider adversary. We also analyze our solution using techniques from differential privacy and validate it using traffic-simulator based experiments. Our results show that our proposed solution prevents privacy information leakage against the compromised authority in collusion with outsider attackers.
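
A toy sketch of the noise ingredient, assuming numpy: Laplace-perturbed counts for threshold-based misbehavior detection, in the spirit of the paper's differential-privacy analysis; epsilon, sensitivity, and the threshold are illustrative, not the paper's values.

```python
# Sketch of the noise ingredient analyzed in the paper: perturb the counts
# used for threshold-based misbehavior detection with Laplace noise so a
# compromised authority learns little from any single report. Epsilon,
# sensitivity, and the threshold are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(5)
true_counts = rng.poisson(3, size=20)        # per-vehicle report counts
true_counts[7] = 40                          # one genuinely misbehaving vehicle

epsilon, sensitivity = 1.0, 1.0
noisy = true_counts + rng.laplace(0, sensitivity / epsilon, size=20)

threshold = 20
print("flagged vehicles:", np.where(noisy > threshold)[0])
```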

2018-04-02
Gao, Y., Luo, T., Li, J., Wang, C..  2017.  Research on K Anonymity Algorithm Based on Association Analysis of Data Utility. 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC). :426–432.

More and more medical data are shared, which leads to the disclosure of personal privacy information. The construction of a privacy-preserving publishing model for medical data is therefore of great value: it must not only break the correspondence between the released information and personal identity, but also maintain the data utility after anonymization. However, there is an inherent contradiction between anonymity and data utility. In this paper, a Principal Component Analysis-Grey Relational Analysis (PCA-GRA) K-anonymity algorithm is proposed to improve data utility effectively under the premise of anonymity, in which the association between quasi-identifiers and the sensitive information is used as a criterion to control the generalization hierarchy. Compared with previous anonymity algorithms, results show that the proposed PCA-GRA K-anonymity algorithm achieves significant improvement in data utility in three respects: information loss, feature maintenance, and classification evaluation performance.
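
A numpy sketch of the grey relational grade used to rank quasi-identifier/sensitive-attribute association (Deng's classical formula with rho = 0.5, applied per column rather than the authors' exact construction); the data are synthetic.

```python
# Sketch of the grey relational grade used to rank quasi-identifiers by
# association with the sensitive attribute (Deng's formula, rho = 0.5).
# Columns and data are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(6)
sensitive = rng.random(50)                                   # reference sequence
quasi = np.column_stack([sensitive + rng.normal(0, .1, 50),  # strongly related
                         rng.random(50),                     # unrelated
                         sensitive * 0.5 + rng.normal(0, .3, 50)])

def grey_relational_grade(ref, seq, rho=0.5):
    ref = (ref - ref.min()) / (ref.max() - ref.min())        # normalize to [0, 1]
    seq = (seq - seq.min()) / (seq.max() - seq.min())
    delta = np.abs(ref - seq)
    coeff = (delta.min() + rho * delta.max()) / (delta + rho * delta.max())
    return coeff.mean()

grades = [grey_relational_grade(sensitive, quasi[:, j]) for j in range(3)]
print("relational grades (higher = stronger association):",
      np.round(grades, 3))
```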

2018-01-16
Najafabadi, M. M., Khoshgoftaar, T. M., Calvert, C., Kemp, C..  2017.  User Behavior Anomaly Detection for Application Layer DDoS Attacks. 2017 IEEE International Conference on Information Reuse and Integration (IRI). :154–161.

Distributed Denial of Service (DDoS) attacks are a popular and inexpensive form of cyber attack. Application layer DDoS attacks utilize legitimate application layer requests to overwhelm a web server. These attacks are a major threat to Internet applications and web services: their main goal is to make services unavailable to legitimate users by overwhelming the resources of a web server, and they look valid in connection and protocol characteristics, which makes them difficult to detect. In this paper, we propose a detection method for application layer DDoS attacks based on user behavior anomaly detection. We extract instances of user behaviors requesting resources from HTTP web server logs and apply the Principal Component Analysis (PCA) subspace anomaly detection method to detect anomalous behavior instances. Web server logs from a server hosting a student resource portal were collected as experimental data, and we generated nine different HTTP DDoS attacks through penetration testing. Our performance results on the collected data show that PCA subspace anomaly detection on user behavior data can detect application layer DDoS attacks, even when they try to mimic a normal user's behavior to some degree.
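
A minimal numpy sketch of PCA subspace anomaly detection: score each instance by its squared residual (Q statistic) outside the subspace spanned by normal behavior, and flag values above an empirical threshold. Features and the threshold are illustrative.

```python
# Sketch of PCA subspace anomaly detection on user-behavior vectors: score
# each instance by its squared residual (Q statistic) outside the normal
# subspace and flag large values. Features and threshold are illustrative.
import numpy as np

rng = np.random.default_rng(7)
normal = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))
mu, sd = normal.mean(axis=0), normal.std(axis=0)
Z = (normal - mu) / sd

_, _, Vt = np.linalg.svd(Z, full_matrices=False)
P = Vt[:4].T                                   # normal-behavior subspace

def q_stat(x):
    z = (x - mu) / sd
    r = z - P @ (P.T @ z)                      # residual outside the subspace
    return r @ r

threshold = np.percentile([q_stat(x) for x in normal], 99)
attack = normal[0] + 5 * sd                    # behavior off the normal pattern
print(f"Q={q_stat(attack):.1f}, threshold={threshold:.1f}, "
      f"anomalous={q_stat(attack) > threshold}")
```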

2018-11-14
Teoh, T. T., Zhang, Y., Nguwi, Y. Y., Elovici, Y., Ng, W. L..  2017.  Analyst Intuition Inspired High Velocity Big Data Analysis Using PCA Ranked Fuzzy K-Means Clustering with Multi-Layer Perceptron (MLP) to Obviate Cyber Security Risk. 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). :1790–1793.
The growing prevalence of cyber threats is affecting every network user, and numerous security monitoring systems are employed to protect computer networks and resources from cyber-attacks. There is a pressing need for an efficient security monitoring system to handle the large network datasets generated in this process. In this work, a large network dataset representing malware attacks is used to establish an expert system: the characteristics of attackers' IP addresses are extracted from our integrated datasets to generate statistical data, and a cyber security expert provides the weight of each attribute and forms a scoring system by annotating the log history. We adopt a semi-supervised method to classify cyber security logs into attack, unsure, and no-attack classes: first the data are broken into three clusters using fuzzy k-means (FKM), then a small subset is manually labelled (analyst intuition), and finally a multilayer perceptron (MLP) neural network classifier is trained on the manually labelled data. Our results are encouraging compared with plain anomaly detection in a cyber security log, which generally produces a huge number of false detections. This method of combining Artificial Intelligence (AI) with Analyst Intuition (AI) is also known as AI2. The classification results are encouraging in segregating the types of attacks.
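
A compact sketch of the AI2-style flow, assuming numpy and scikit-learn: fuzzy c-means into three clusters, a few manually labelled points per cluster standing in for analyst intuition, then an MLP trained on those labels. The FCM loop is a bare-bones implementation.

```python
# Sketch of the AI2-style flow: fuzzy c-means into 3 clusters, a few
# "analyst-labelled" points per cluster, then an MLP trained on them.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(c, 0.5, (100, 4)) for c in (0, 2, 4)])

def fuzzy_cmeans(X, c=3, m=2.0, iters=50):
    U = rng.random((len(X), c)); U /= U.sum(1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-9
        U = 1.0 / (d ** (2 / (m - 1)))      # standard FCM membership update
        U /= U.sum(1, keepdims=True)
    return centers, U

centers, U = fuzzy_cmeans(X)
hard = U.argmax(1)
labeled_idx = np.concatenate([np.where(hard == k)[0][:5] for k in range(3)])
analyst_labels = hard[labeled_idx]          # stand-in for manual annotation

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
clf.fit(X[labeled_idx], analyst_labels)
print("MLP agreement with FCM on all data:",
      np.mean(clf.predict(X) == hard).round(3))
```
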
2018-01-10
Wrona, K., Amanowicz, M., Szwaczyk, S., Gierłowski, K..  2017.  SDN testbed for validation of cross-layer data-centric security policies. 2017 International Conference on Military Communications and Information Systems (ICMCIS). :1–6.

Software-defined networks offer a promising framework for the implementation of cross-layer data-centric security policies in military systems. An important aspect of the design process for such advanced security solutions is the thorough experimental assessment and validation of proposed technical concepts prior to their deployment in operational military systems. In this paper, we describe an OpenFlow-based testbed, which was developed with a specific focus on the validation of SDN security mechanisms, including both mechanisms for protecting the software-defined network layer and cross-layer enforcement of higher-level policies, such as data-centric security policies. We also present initial experimentation results obtained using the testbed, which confirm its ability to validate simulation and analytic predictions. Our objective is to provide a sufficiently detailed description of the configuration used in our testbed so that it can be easily replicated and reused by other security researchers in their experiments.

2017-11-27
Parate, M., Tajane, S., Indi, B..  2016.  Assessment of System Vulnerability for Smart Grid Applications. 2016 IEEE International Conference on Engineering and Technology (ICETECH). :1083–1088.

The smart grid is an electrical grid with duplex communication between the utility and the consumer. It comprises digital systems, automation systems, computers, and controls, and it finds applications in a wide variety of systems, some of which have been designed to reduce the risk of power system blackout. Dynamic vulnerability assessment identifies, quantifies, and prioritizes the vulnerabilities in a system. This paper presents a novel approach for classifying data as vulnerable or non-vulnerable by carrying out Dynamic Vulnerability Assessment (DVA) based on data mining techniques, namely Multichannel Singular Spectrum Analysis (MSSA) and Principal Component Analysis (PCA), together with a machine learning tool, the Support Vector Machine classifier (SVM-C). The developed methodology is tested on the IEEE 57-bus system, where the cause of vulnerability is transient instability. The results show that the data mining tools can effectively analyze the patterns of the electric signals, and that SVM-C can use those patterns to classify the system data as vulnerable or non-vulnerable and determine the system vulnerability status.
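
A sketch of the final classification stage, assuming scikit-learn: an SVM over PCA-reduced signal features labelled vulnerable or non-vulnerable; the MSSA extraction and IEEE 57-bus data are not reproduced.

```python
# Sketch of the final stage: an SVM classifier over PCA-reduced features of
# electric signals, labelled vulnerable / non-vulnerable. The MSSA feature
# extraction and IEEE 57-bus data are not reproduced; inputs are synthetic.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=60, n_informative=10,
                           random_state=0)   # y: 1 = vulnerable state
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

svm_c = make_pipeline(StandardScaler(), PCA(n_components=10),
                      SVC(kernel="rbf", C=1.0))
svm_c.fit(X_tr, y_tr)
print("vulnerability classification accuracy:", svm_c.score(X_te, y_te))
```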

2017-03-08
Kalina, J., Schlenker, A., Kutílek, P..  2015.  Highly robust analysis of keystroke dynamics measurements. 2015 IEEE 13th International Symposium on Applied Machine Intelligence and Informatics (SAMI). :133–138.

Standard classification procedures of both data mining and multivariate statistics are sensitive to the presence of outlying values. In this paper, we propose new algorithms for computing regularized versions of linear discriminant analysis for data with small sample sizes in each group. Further, we propose a highly robust version of regularized linear discriminant analysis. The new method, denoted MWCD-L2-LDA, is based on the idea of implicit weights assigned to individual observations, inspired by the minimum weighted covariance determinant estimator. The classification performance of the new method is illustrated through a detailed analysis of our pilot study of authentication methods on computers, using individual typing characteristics captured by means of keystroke dynamics.
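
As a rough stand-in, a scikit-learn sketch of regularized (shrinkage) LDA on small-sample keystroke-timing features; the authors' MWCD-based implicit weighting is not reproduced, and the data are synthetic.

```python
# Sketch of regularized LDA on small-sample keystroke-timing features;
# scikit-learn's shrinkage LDA stands in for the authors' MWCD-L2-LDA
# (their implicit-weighting robustification is not reproduced).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
# 10 users x 8 samples each, 20 hold/flight-time features (synthetic)
X = np.vstack([rng.normal(loc=rng.normal(0, 1, 20), scale=0.3, size=(8, 20))
               for _ in range(10)])
y = np.repeat(np.arange(10), 8)

lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
print("user-identification accuracy:",
      cross_val_score(lda, X, y, cv=4).mean().round(3))
```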

Mishra, A., Kumar, K., Rai, S. N., Mittal, V. K..  2015.  Multi-stage face recognition for biometric access. 2015 Annual IEEE India Conference (INDICON). :1–6.

Protecting the privacy of user-identification data is fundamental to protecting information systems from attacks and vulnerabilities. Providing access to such data only to limited and legitimate users is the key motivation for `Biometrics'. In `Biometric Systems', reliably confirming a user's claim of identity is more important than focusing on `what he/she really possesses' or `what he/she remembers'. In this paper, the use of face images for biometric access is proposed using two multi-stage face recognition algorithms that employ biometric facial features to validate the user's claim. The proposed algorithms use standard algorithms and classifiers such as EigenFaces, PCA, and LDA in stages. Performance evaluation of both algorithms is carried out on two standard datasets, the Extended Yale database and the AT&T database. Results using the proposed multi-stage algorithms are better than those using other standard algorithms. Current limitations and possible applications of the proposed algorithms are also discussed, along with further scope for making them robust to pose, illumination, and noise variations.
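
A sketch of the staged Eigenfaces idea, assuming scikit-learn: PCA followed by LDA, evaluated on the Olivetti faces (which are in fact the AT&T/ORL database used in the paper).

```python
# Sketch of the staged pipeline: PCA ("eigenfaces") followed by LDA,
# evaluated on the Olivetti faces (the AT&T/ORL database).
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

faces = fetch_olivetti_faces()
X_tr, X_te, y_tr, y_te = train_test_split(faces.data, faces.target,
                                          stratify=faces.target, random_state=0)

pipe = make_pipeline(PCA(n_components=100, whiten=True),
                     LinearDiscriminantAnalysis())
pipe.fit(X_tr, y_tr)
print("recognition accuracy:", round(pipe.score(X_te, y_te), 3))
```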

2015-05-06
Jin, Zhongming, Li, Cheng, Lin, Yue, Cai, Deng.  2014.  Density Sensitive Hashing. IEEE Transactions on Cybernetics. 44:1362–1371.

Nearest neighbor search is a fundamental problem in research fields such as machine learning, data mining, and pattern recognition. Recently, hashing-based approaches, for example locality sensitive hashing (LSH), have proved effective for scalable high-dimensional nearest neighbor search. Many hashing algorithms have their theoretical roots in random projection; since these algorithms generate the hash tables (projections) randomly, a large number of hash tables (i.e., long codewords) is required to achieve both high precision and recall. To address this limitation, we propose a novel hashing algorithm called density sensitive hashing (DSH). DSH can be regarded as an extension of LSH: by exploring the geometric structure of the data, DSH avoids purely random projection selection and uses those projective functions that best agree with the distribution of the data. Extensive experimental results on real-world data sets show that the proposed method achieves better performance than state-of-the-art hashing approaches.
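
For contrast with DSH, a numpy sketch of the random-projection LSH baseline it improves on: hash codes are the signs of random projections, and search proceeds by Hamming distance. DSH instead selects projections that agree with the data's geometric structure (not reproduced here).

```python
# Sketch of the random-projection LSH baseline that DSH improves on: hash
# codes are the signs of random projections; DSH would replace W with
# data-adaptive projections.
import numpy as np

rng = np.random.default_rng(10)
X = rng.normal(size=(1000, 64))
n_bits = 16
W = rng.normal(size=(64, n_bits))            # purely random projections

codes = (X @ W > 0).astype(np.uint8)         # 16-bit binary codes

query = X[0]
qcode = (query @ W > 0).astype(np.uint8)
hamming = (codes != qcode).sum(axis=1)       # search by Hamming distance
print("nearest candidates:", np.argsort(hamming)[:5])
```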

Sun, Jian, Liao, Haitao, Upadhyaya, B.R..  2014.  A Robust Functional-Data-Analysis Method for Data Recovery in Multichannel Sensor Systems. IEEE Transactions on Cybernetics. 44:1420–1431.

Multichannel sensor systems are widely used in condition monitoring for effective failure prevention of critical equipment or processes. However, loss of sensor readings due to malfunctions of sensors and/or communication has long been a hurdle to reliable operations of such integrated systems. Moreover, asynchronous data sampling and/or limited data transmission are usually seen in multiple sensor channels. To reliably perform fault diagnosis and prognosis in such operating environments, a data recovery method based on functional principal component analysis (FPCA) can be utilized. However, traditional FPCA methods are not robust to outliers and their capabilities are limited in recovering signals with strongly skewed distributions (i.e., lack of symmetry). This paper provides a robust data-recovery method based on functional data analysis to enhance the reliability of multichannel sensor systems. The method not only considers the possibly skewed distribution of each channel of signal trajectories, but is also capable of recovering missing data for both individual and correlated sensor channels with asynchronous data that may be sparse as well. In particular, grand median functions, rather than classical grand mean functions, are utilized for robust smoothing of sensor signals. Furthermore, the relationship between the functional scores of two correlated signals is modeled using multivariate functional regression to enhance the overall data-recovery capability. An experimental flow-control loop that mimics the operation of the coolant-flow loop in a multimodular integral pressurized water reactor is used to demonstrate the effectiveness and adaptability of the proposed data-recovery method. The computational results illustrate that the proposed method is robust to outliers and more capable than the existing FPCA-based method in terms of the accuracy in recovering strongly skewed signals. In addition, turbofan engine data are analyzed to verify the capability of the proposed method in recovering non-skewed signals.
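
A much-simplified matrix analogue of the recovery idea, assuming numpy: iterative PCA reconstruction over the observed entries to fill in lost multichannel readings; the paper's robust functional (median-based FPCA) machinery is not reproduced.

```python
# Much-simplified analogue of the recovery idea: reconstruct missing
# multichannel readings by iterating rank-limited PCA reconstruction over
# the observed entries. The paper's robust functional (FPCA/median-based)
# machinery is not reproduced; this is the matrix version of the principle.
import numpy as np

rng = np.random.default_rng(11)
t = np.linspace(0, 1, 100)
X = np.vstack([np.sin(2 * np.pi * t + p) for p in rng.random(20)])  # 20 channels
mask = rng.random(X.shape) < 0.2          # 20% of readings lost
X_obs = np.where(mask, np.nan, X)

Xh = np.where(mask, np.nanmean(X_obs, axis=0), X_obs)  # initial fill
for _ in range(30):
    mu = Xh.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xh - mu, full_matrices=False)
    recon = mu + (U[:, :3] * s[:3]) @ Vt[:3]           # rank-3 reconstruction
    Xh[mask] = recon[mask]                             # update only missing cells

print("RMSE on recovered readings:",
      float(np.sqrt(np.mean((Xh[mask] - X[mask]) ** 2))))
```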

2015-04-30
Dickerson, J.P., Kagan, V., Subrahmanian, V.S..  2014.  Using sentiment to detect bots on Twitter: Are humans more opinionated than bots? 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). :620–627.

In many Twitter applications, developers collect only a limited sample of tweets and a local portion of the Twitter network. Given such Twitter applications with limited data, how can we classify Twitter users as either bots or humans? We develop a collection of network-, linguistic-, and application-oriented variables that could be used as possible features, and identify specific features that distinguish well between humans and bots. In particular, by analyzing a large dataset relating to the 2014 Indian election, we show that a number of sentiment-related factors are key to the identification of bots, significantly increasing the Area under the ROC Curve (AUROC). The same method may be used for other applications as well.
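
A sketch of the evaluation style described, assuming scikit-learn: a random forest over stand-in network/linguistic/sentiment features, scored by AUROC; features and labels are synthetic placeholders, not the Indian-election data.

```python
# Sketch of the evaluation style described: a classifier over network-,
# linguistic-, and sentiment-style features, scored by AUROC. Features and
# labels are synthetic placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# 12 features standing in for follower counts, tweet rates, sentiment
# scores, etc.; y = 1 means "bot".
X, y = make_classification(n_samples=1000, n_features=12, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
auroc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"AUROC: {auroc:.3f}")
```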