Biblio

Filters: Keyword is linear regression
2023-09-20
Zhang, Chengzhao, Tang, Huiyue.  2022.  Empirical Research on Multifactor Quantitative Stock Selection Strategy Based on Machine Learning. 2022 3rd International Conference on Pattern Recognition and Machine Learning (PRML). :380–383.
Stock selection strategy design based on machine learning and multi-factor analysis is a research hotspot in the quantitative investment field. In this paper, four machine learning algorithms, namely support vector machine, gradient boosting regression, random forest, and linear regression, are used to predict the rise and fall of stocks by taking stock fundamentals as input variables. The portfolio strategy is constructed on this basis. Finally, the stock selection strategy is further optimized. The empirical results show that the multifactor quantitative stock selection strategy has a good stock selection effect, and that yield performance under the support vector machine algorithm is the best. As the number of factors increases, there is an inverse relationship between the fitting degree and the yield under the various algorithms.
2023-06-16
Ren, Lijuan, Wang, Tao, Seklouli, Aicha Sekhari, Zhang, Haiqing, Bouras, Abdelaziz.  2022.  Missing Values for Classification of Machine Learning in Medical data. 2022 5th International Conference on Artificial Intelligence and Big Data (ICAIBD). :101–106.
Missing values are an unavoidable problem for classification tasks of machine learning in medical data. With the rapid development of the medical system, the volume of large-scale medical data is increasing. Missing values increase the difficulty of mining hidden but useful information in these medical datasets. Deletion and imputation are the most popular methods for dealing with missing values. Existing studies have neglected to compare and discuss the deletion and imputation methods under the row missing rate and the total missing rate. They have also rarely used experimental data sets that are mixed-type and large scale. In this work, mixed-type medical data sets of various sizes are used. The performance of deletion and imputation methods is compared under the MCAR (Missing Completely At Random) mechanism in a baseline task that uses LR (Linear Regression) and SVM (Support Vector Machine) classifiers with the same row and total missing rates. Experimental results show that under the MCAR missing mechanism, the performance of the two types of processing methods is related to the size of the datasets and the missing rates. As the missing rate increases, the performance of both types of methods decreases, but the deletion method degrades faster, and the imputation methods based on machine learning have more stable and better classification performance on average. In addition, small data sets are easily affected by the choice of missing-value processing method.
2022-08-26
Gisin, Vladimir B., Volkova, Elena S.  2021.  Secure Outsourcing of Fuzzy Linear Regression in Cloud Computing. 2021 XXIV International Conference on Soft Computing and Measurements (SCM). :172–174.
There are problems in which the use of linear regression is not sufficiently justified. In these cases, fuzzy linear regression can be used as a modeling tool. The problem of constructing a fuzzy linear regression can usually be reduced to a linear programming problem. One of the features of the resulting linear programming problem is that it uses a relatively large number of constraints in the form of inequalities with a relatively small number of variables. If the user does not have enough computing power, the resulting problem can be transferred to a cloud server. Two approaches are used for the confidential transfer of the problem to the server: the approach based on cryptographic encryption, and the transformational approach. The paper describes a protocol based on the transformational approach that allows for secure outsourcing of fuzzy linear regression.
2022-07-15
Giesser, Patrick, Stechschulte, Gabriel, Costa Vaz, Anna da, Kaufmann, Michael.  2021.  Implementing Efficient and Scalable In-Database Linear Regression in SQL. 2021 IEEE International Conference on Big Data (Big Data). :5125–5132.
Relational database management systems not only support larger-than-memory data processing and very advanced query optimization, but also offer the benefits of data security, privacy, and consistency. When machine learning on large data sets is processed directly on an existing SQL database server, the data does not need to be exported and transferred to a separate big data processing platform. To achieve this, we implement a linear regression algorithm using SQL code generation such that the computation can be performed server-side and directly in the RDBMS. Our method and its implementation, programmed in Python, solve linear regression (LR) using the ordinary least squares (OLS) method directly in the RDBMS using SQL code generation, leaving most of the processing in the database. Only the matrix of the system of equations, whose size is equal to the number of variables squared, is transferred from the SQL server to the Python client to be solved for OLS regression. For evaluation purposes, our LR implementation was tested with artificially generated datasets and compared to an existing Python library (Scikit Learn). We found that our implementation consistently solves OLS regression faster than Scikit Learn for datasets with more than 10,000 input rows and fewer than 64 columns. Moreover, under the same test conditions where the computation is larger than memory, our implementation returned a result quickly, while Scikit returned an out-of-memory error. We conclude that SQL is a promising tool for in-database processing of large-volume, low-dimensional data sets with a particular class of machine learning algorithms, namely those that can be efficiently solved with map-reduce queries such as OLS regression.
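The abstract above describes generating SQL so that the database computes the normal-equation matrix and only that small matrix reaches the Python client. A minimal sketch of that idea follows. It is not the authors' implementation: it assumes SQLite as a stand-in RDBMS, and the function names `ols_in_database` and `solve_linear_system`, the table `t`, and its columns are hypothetical.

```python
import sqlite3


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides together.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_in_database(conn, table, features, target):
    """Generate one aggregate query for X^T X and X^T y, then solve client-side.

    Table and column names are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    """
    terms = ["1"] + list(features)  # "1" acts as the intercept column
    k = len(terms)
    selects = []
    for i in range(k):
        for j in range(i, k):  # upper triangle only; X^T X is symmetric
            selects.append(f"SUM(({terms[i]}) * ({terms[j]}))")
    for i in range(k):
        selects.append(f"SUM(({terms[i]}) * ({target}))")
    row = conn.execute(f"SELECT {', '.join(selects)} FROM {table}").fetchone()

    values = iter(row)
    xtx = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            xtx[i][j] = xtx[j][i] = next(values)
    xty = [next(values) for _ in range(k)]
    return solve_linear_system(xtx, xty)  # [intercept, coef_1, ..., coef_d]


# Hypothetical demo data generated from y = 1 + 2*x1 + 3*x2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x1 REAL, x2 REAL, y REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(0, 0, 1), (1, 0, 3), (0, 1, 4), (1, 1, 6), (2, 3, 14)],
)
coef = ols_in_database(conn, "t", ["x1", "x2"], "y")
print(coef)
```

The query returns a single row of (d+1)(d+2)/2 + (d+1) aggregates for d features, regardless of how many rows the table holds, which is why this style suits large-volume, low-dimensional data.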
Yuan, Rui, Wang, Xinna, Xu, Jiangmin, Meng, Shunmei.  2021.  A Differential-Privacy-based hybrid collaborative recommendation method with factorization and regression. 2021 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :389–396.
Recommender systems have been proven to be effective techniques for providing users with better experiences. However, when a recommender knows the user's preference characteristics or obtains their sensitive information, a series of privacy concerns is raised. A number of solutions have been proposed in the literature to enhance the degree of privacy protection in recommender systems. Although the existing solutions have enhanced the protection, they have also decreased recommendation accuracy. In this paper, we propose a security-aware hybrid recommendation method that combines factorization and regression techniques. Specifically, the differential privacy mechanism is integrated into data pre-processing for data encryption. First, data are perturbed to satisfy differential privacy and transported to the recommender. Then the recommender calculates the aggregated data. However, applying differential privacy raises the utility issue of low recommendation accuracy, and the use of a single model may cause overfitting. To tackle this challenge, we adopt a fusion prediction model that combines linear regression (LR) and matrix factorization (MF) for collaborative recommendation. With the MovieLens dataset, we evaluate the recommendation accuracy and regression of our recommender system and demonstrate that our system performs better than the existing recommender system under privacy requirements.
2021-12-22
Zhang, Yuyi, Xu, Feiran, Zou, Jingying, Petrosian, Ovanes L., Krinkin, Kirill V..  2021.  XAI Evaluation: Evaluating Black-Box Model Explanations for Prediction. 2021 II International Conference on Neural Networks and Neurotechnologies (NeuroNT). :13–16.
The results of evaluating explanations of the black-box model for prediction are presented. The XAI evaluation is realized through the different principles and characteristics between black-box model explanations and XAI labels. In the field of high-dimensional prediction, black-box models such as neural networks and ensemble models can predict complex data sets more accurately than traditional linear regression and white-box models such as the decision tree model. However, their unexplainable nature not only hinders developers from debugging but also causes user mistrust. In the XAI field dedicated to "opening" the black-box model, effective evaluation methods are still being developed. Within the XAI evaluation framework (MDMC) established in this paper, explanation methods for the prediction can be effectively tested, and the identified explanation method with relatively higher quality can improve the accuracy, transparency, and reliability of prediction.
2021-03-22
Penugonda, S., Yong, S., Gao, A., Cai, K., Sen, B., Fan, J.  2020.  Generic Modeling of Differential Striplines Using Machine Learning Based Regression Analysis. 2020 IEEE International Symposium on Electromagnetic Compatibility Signal/Power Integrity (EMCSI). :226–230.
In this paper, a generic model for a differential stripline is created using machine learning (ML) based regression analysis. A recursive approach to creating the inputs is adopted instead of the traditional design of experiments (DoE) approach. This reduces the number of simulations and gives control over the data points required for performing them. The generic model is developed using 48 simulations. It is comparable to the linear regression model, which is obtained using 1152 simulations. Additionally, a tabular W-element model of a differential stripline is used to take into consideration the frequency-dependent dielectric loss. To demonstrate the expandability of this approach, the methodology was applied to two differential pairs of striplines in the frequency range of 10 MHz to 20 GHz.
2021-02-01
Behera, S., Prathuri, J. R.  2020.  Application of Homomorphic Encryption in Machine Learning. 2020 2nd PhD Colloquium on Ethically Driven Innovation and Technology for Society (PhD EDITS). :1–2.
Linear regression is a machine learning algorithm used for prediction. If the input data is in plaintext form, however, there is a high probability that sensitive information will be leaked. To overcome this, we propose a method in which the input data is encrypted using homomorphic encryption. The machine learning algorithm can be applied to this encrypted data for prediction while maintaining the privacy and secrecy of the sensitive data. The output from this model is an encrypted result, which is decrypted using a homomorphic decryption technique to obtain the plaintext. To determine the accuracy of our result, we compare it with the result obtained by applying the linear regression algorithm to the plaintext.
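To illustrate the workflow described above (encrypt the inputs, predict on ciphertexts, decrypt, compare with the plaintext result), here is a toy sketch. The abstract does not name a scheme, so the use of Paillier (an additively homomorphic cryptosystem) is an assumption, as are the integer-valued plaintext weights, the tiny demo primes, and all function names. It needs Python 3.8+ for `pow(x, -1, n)` and is not secure as written.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation from two distinct primes (demo-sized only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    mu = pow(lam, -1, n)  # valid for g = n + 1 when gcd(lam, n) == 1
    return (n, g), (lam, mu)


def encrypt(pub, m, rng):
    n, g = pub
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m % n, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    m = ((u - 1) // n) * mu % n
    return m if m <= n // 2 else m - n  # decode the upper half as negative


def encrypted_linear_predict(pub, enc_features, weights, bias, rng):
    """Compute Enc(bias + sum(w_i * x_i)) without decrypting any x_i.

    Multiplying ciphertexts adds plaintexts; raising a ciphertext to an
    integer power multiplies its plaintext by that integer.
    """
    n, _ = pub
    n2 = n * n
    acc = encrypt(pub, bias, rng)
    for c, w in zip(enc_features, weights):
        acc = (acc * pow(c, w % n, n2)) % n2
    return acc


rng = random.Random(0)
pub, priv = keygen(293, 433)  # far too small for real use
features, weights, bias = [10, 7], [3, -2], 5

enc_features = [encrypt(pub, x, rng) for x in features]  # data owner
enc_pred = encrypted_linear_predict(pub, enc_features, weights, bias, rng)  # server
decrypted = decrypt(pub, priv, enc_pred)  # data owner

plain = bias + sum(w * x for w, x in zip(weights, features))
print(decrypted, plain)
```

Because the plaintext space is the integers modulo n, real-valued features and coefficients would have to be scaled to fixed-point integers first, and results must stay within about n/2 in magnitude to decode correctly.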
2020-07-20
Xu, Tangwei, Lu, Xiaozhen, Xiao, Liang, Tang, Yuliang, Dai, Huaiyu.  2019.  Voltage Based Authentication for Controller Area Networks with Reinforcement Learning. ICC 2019 - 2019 IEEE International Conference on Communications (ICC). :1–5.
Controller area networks (CANs) are vulnerable to spoofing attacks such as frame falsifying attacks, as electronic control units (ECUs) send and receive messages without any authentication and encryption. In this paper, we propose a physical authentication scheme that exploits the voltage features of the ECU signals on the CAN bus and applies reinforcement learning to choose the authentication mode such as the protection level and test threshold. This scheme enables a monitor node to optimize the authentication mode via trial-and-error without knowing the CAN bus signal model and spoofing model. Experimental results show that the proposed authentication scheme can significantly improve the authentication accuracy and response compared with a benchmark scheme.
2020-04-24
Shuvro, Rezoan A., Das, Pankaz, Hayat, Majeed M., Talukder, Mitun.  2019.  Predicting Cascading Failures in Power Grids using Machine Learning Algorithms. 2019 North American Power Symposium (NAPS). :1–6.
Although there has been notable progress in modeling cascading failures in power grids, few works have used machine learning algorithms. In this paper, cascading failures that lead to massive blackouts in power grids are predicted and classified into no, small, and large cascades using machine learning algorithms. Cascading-failure data is generated using a cascading failure simulator framework developed earlier. The data set includes, as features, power grid operating parameters such as loading level, level of load shedding, and the capacity of the failed lines, and topological parameters such as edge betweenness centrality and the average shortest distance, for numerous combinations of two transmission line failures. Several machine learning algorithms are then used to classify cascading failures. Further, linear regression is used to predict the number of failed transmission lines and the amount of load shedding during a cascade based on initial feature values. This data-driven technique can be used to generate cascading-failure data sets for any real-world power grid; power-grid engineers can therefore use this approach to generate cascade data, predict vulnerabilities, and enhance the robustness of the grid.
2020-02-10
Tsai, I-Chun, Zhong, Yi, Liu, Fang-Ru, Feng, Jianhua.  2019.  A Novel Security Assessment Method Based on Linear Regression for Logic Locking. 2019 IEEE International Conference on Electron Devices and Solid-State Circuits (EDSSC). :1–3.
This paper presents a novel logic locking security assessment method based on linear regression, which models the relationship between the distribution probabilities of key-inputs and observable outputs. The algorithm reveals a weakness of the encrypted circuit, since the assessment can revoke the key-inputs within several iterations. The experimental results show that the proposed assessment can be applied to a variety of encrypted combinational benchmark circuits and exceeds 85% correctness after revoking the encrypted key-inputs.
2018-09-28
Jung, Taebo, Jung, Kangsoo, Park, Sehwa, Park, Seog.  2017.  A noise parameter configuration technique to mitigate detour inference attack on differential privacy. 2017 IEEE International Conference on Big Data and Smart Computing (BigComp). :186–192.

Nowadays, data has become more important as the core resource of the information society. However, with the development of data analysis techniques, privacy violations such as leakage of sensitive data and exposure of personal identity are also increasing. Differential privacy is a technique that satisfies the requirement that no additional information should be disclosed except information from the database itself. It is well known for protecting privacy from arbitrary attacks. However, recent research argues that there are several ways to infer sensitive information from data even when differential privacy is applied. One of these inference methods uses the correlation between the data. In this paper, we investigate new privacy threats based on attribute correlation, which are not covered by traditional studies, and propose a privacy-preserving technique that configures the noise parameter of differential privacy to address this new threat. In the experiment, we show the weaknesses of the traditional differential privacy method and validate that the proposed noise parameter configuration method provides sufficient privacy protection while maintaining the accuracy and utility of the data.
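For context on what a differential privacy "noise parameter" controls, the sketch below shows the standard Laplace mechanism for a counting query, where the noise scale is sensitivity divided by epsilon. This is the textbook baseline only, not the configuration technique proposed in the paper; the function names and demo records are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Answer a counting query with epsilon-differential privacy."""
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # adding or removing one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
ages = [23, 35, 41, 52, 67, 70]  # hypothetical records
for epsilon in (0.1, 1.0, 10.0):
    noisy = private_count(ages, lambda a: a >= 50, epsilon, rng)
    print(f"epsilon={epsilon}: noisy count = {noisy:.2f}")
```

A smaller epsilon means a larger noise scale and stronger privacy at the cost of accuracy, which is the trade-off any noise parameter configuration has to balance.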

2018-01-23
Acar, A., Celik, Z. B., Aksu, H., Uluagac, A. S., McDaniel, P.  2017.  Achieving Secure and Differentially Private Computations in Multiparty Settings. 2017 IEEE Symposium on Privacy-Aware Computing (PAC). :49–59.

Sharing and working on sensitive data in distributed settings from healthcare to finance is a major challenge due to security and privacy concerns. Secure multiparty computation (SMC) is a viable panacea for this, allowing distributed parties to perform computations while learning nothing about each other's data except the final result. Although SMC is instrumental in such distributed settings, it does not guarantee that no information about individuals is leaked to adversaries. Differential privacy (DP) can be utilized to address this; however, achieving SMC with DP is not a trivial task either. In this paper, we propose a novel Secure Multiparty Distributed Differentially Private (SM-DDP) protocol to achieve secure and private computations in a multiparty environment. Specifically, with our protocol, we simultaneously achieve SMC and DP in distributed settings, focusing on linear regression on horizontally distributed data. That is, parties do not see each other's data and, further, cannot infer information about individuals from the final constructed statistical model. Any statistical model function that allows independent calculation of local statistics can be computed through our protocol. The protocol implements homomorphic encryption for SMC and the functional mechanism for DP to achieve the desired security and privacy guarantees. In this work, we first introduce the theoretical foundation for the SM-DDP protocol and then evaluate its efficacy and performance on two different datasets. Our results show that one can achieve individual-level privacy through the proposed protocol with distributed DP, which is independently applied by each party in a distributed fashion. Moreover, our results also show that the SM-DDP protocol incurs minimal computational overhead, is scalable, and provides security and privacy guarantees.

2017-12-28
Danesh, W., Rahman, M.  2017.  Linear regression based multi-state logic decomposition approach for efficient hardware implementation. 2017 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH). :153–154.

Multi-state logic presents a promising avenue for more-than-Moore scaling, since efficient implementation of multi-valued logic (MVL) can significantly reduce switching and interconnection requirements and result in significant benefits compared to binary CMOS. So far, traditional approaches lag behind binary CMOS due to: (a) reliance on logic decomposition approaches [4][5][6] that result in many multi-valued minterms [4], complex polynomials [5], and decision diagrams [6], which are difficult to implement, and (b) emulation of multi-valued computation and communication through binary switches and media that require data conversion and large circuits. In this paper, we propose a fundamentally different approach for MVL decomposition, merging concepts from data science and nanoelectronics to tackle these problems. For problem (a), we first perform linear regression on all inputs and outputs of a multi-valued function and find an expression that fits most input and output combinations. For unmatched combinations, we perform successive regressions to find further linear expressions. Next, using our novel visual pattern matching technique, we find conditions based on input and output conditions to select each expression. These expressions, along with the associated selection criteria, ensure that the correct output can be reached for all possible inputs of a specific function. Our selection of a regression model to find linear expressions, coefficients, and conditions allows efficient hardware implementation. We discuss an approach for solving problem (b) and show an example of a quaternary sum circuit. Our estimates show a 65.6% saving in switching components compared with a 4-bit CMOS adder.

2017-11-03
Yang, B., Zhang, T.  2016.  A Scalable Meta-Model for Big Data Security Analyses. 2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS). :55–60.

This paper proposes a highly scalable framework that can be applied to detect network anomalies at the per-flow level by constructing a meta-model for a family of machine learning algorithms or statistical data models. The approach is scalable and attainable because raw data needs to be accessed only once; it is processed, computed, and transformed into a much smaller meta-model matrix that can reside in system RAM. The meta-model matrix can be calculated through disposable updating operations at the per-row level: once the information for a flow has been processed, it is no longer needed for calculating the meta-model matrix. While the proposed framework covers both Gaussian and non-Gaussian data, the focus of this work is on linear regression models. Specifically, a new concept called meta-model sufficient statistics is proposed to analyze a group of models, where exact, rather than approximate, results are derived. In addition, the proposed framework can quickly discover an optimal statistical or computer model from a family of candidate models without rescanning the raw dataset. This suggests that an extremely efficient and effective theory and method are possible for big data security analysis.
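The one-pass idea in the abstract above can be illustrated with ordinary least squares, whose sufficient statistics are sums of pairwise products. The sketch below is an assumption-laden simplification, not the paper's meta-model: each record is folded into a small Gram matrix and then discarded, and the candidate family is limited to single-feature regressions so the fits have closed forms. All names are hypothetical.

```python
def new_gram(num_features):
    """Square matrix indexed by [intercept, feature_0..feature_{d-1}, target]."""
    size = num_features + 2
    return [[0.0] * size for _ in range(size)]


def update_gram(gram, features, target):
    """Fold one record into the running matrix; the record can then be dropped."""
    z = [1.0] + [float(v) for v in features] + [float(target)]
    for i, zi in enumerate(z):
        for j, zj in enumerate(z):
            gram[i][j] += zi * zj


def fit_single_feature(gram, j):
    """Exact OLS fit of target ~ intercept + feature j, from the matrix alone.

    Raises ZeroDivisionError if feature j is constant across all records.
    """
    y = len(gram) - 1
    col = j + 1
    n = gram[0][0]
    sx, sy = gram[0][col], gram[0][y]
    sxx, sxy, syy = gram[col][col], gram[col][y], gram[y][y]
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    rss = syy - intercept * sy - slope * sxy  # residual sum of squares
    return intercept, slope, rss


def best_single_feature(gram):
    """Pick the candidate with the lowest RSS without rescanning raw data."""
    num_features = len(gram) - 2
    return min(range(num_features), key=lambda j: fit_single_feature(gram, j)[2])


# Hypothetical stream of (features, target) records; target = 1 + 2 * feature_0.
stream = [((0, 5), 1), ((1, 3), 3), ((2, 8), 5), ((3, 1), 7)]
gram = new_gram(num_features=2)
for features, target in stream:
    update_gram(gram, features, target)

best = best_single_feature(gram)
print(best, fit_single_feature(gram, best))
```

The matrix has (d+2)² entries for d features no matter how many records are streamed, and every candidate fit is exact because it uses the same sums a full rescan would produce.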

2017-05-18
Hsu, Daniel, Sabato, Sivan.  2016.  Loss Minimization and Parameter Estimation with Heavy Tails. J. Mach. Learn. Res. 17:543–582.

This work studies applications and generalizations of a simple estimation technique that provides exponential concentration under heavy-tailed distributions, assuming only bounded low-order moments. We show that the technique can be used for approximate minimization of smooth and strongly convex losses, and specifically for least squares linear regression. For instance, our d-dimensional estimator requires just O(d log(1/δ)) random samples to obtain a constant factor approximation to the optimal least squares loss with probability 1-δ, without requiring the covariates or noise to be bounded or subgaussian. We provide further applications to sparse linear regression and low-rank covariance matrix estimation with similar allowances on the noise and covariate distributions. The core technique is a generalization of the median-of-means estimator to arbitrary metric spaces.
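The core technique named above generalizes the median-of-means estimator. As a reference point, here is a minimal sketch of the classical scalar version only; the generalization to arbitrary metric spaces and the regression estimators from the paper are not shown, and the function name and block-splitting details are assumptions.

```python
import random
import statistics


def median_of_means(samples, num_blocks, rng=None):
    """Estimate a mean robustly: average within blocks, then take the median.

    A few extreme values can corrupt only a few block means, and the median
    ignores them as long as most blocks are clean.
    """
    data = [float(x) for x in samples]
    if not 1 <= num_blocks <= len(data):
        raise ValueError("num_blocks must be between 1 and the sample count")
    rng = rng or random.Random(0)
    rng.shuffle(data)  # random assignment of samples to blocks
    block_size = len(data) // num_blocks  # leftover samples are dropped
    block_means = [
        statistics.fmean(data[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


# Ninety-nine ordinary observations and one extreme outlier.
observations = [1.0] * 99 + [1e6]
print(statistics.fmean(observations))  # the plain mean is dragged far from 1
print(median_of_means(observations, num_blocks=5))
```

Using more blocks gives a smaller failure probability (more outliers tolerated) at the cost of noisier block means; a number of blocks on the order of log(1/δ) is the usual choice for confidence 1-δ.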