Missing Data Imputation Based on Low-rank Recovery and Semi-supervised Regression for Software Effort Estimation
Title | Missing Data Imputation Based on Low-rank Recovery and Semi-supervised Regression for Software Effort Estimation |
Publication Type | Conference Paper |
Year of Publication | 2016 |
Authors | Jing, Xiao-Yuan; Qi, Fumin; Wu, Fei; Xu, Baowen |
Conference Name | Proceedings of the 38th International Conference on Software Engineering |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-3900-1 |
Keywords | Collaboration, data deletion, drive factor missing case, effort label missing case, Human Behavior, low-rank recovery and semi-supervised regression imputation (LRSRI), missing data problem, pubcrawl, Scalability, software effort estimation |
Abstract | Software effort estimation (SEE) is a crucial step in software development. Missing effort data is common in real-world data collection. Existing SEE methods address the missing data problem with a deletion, ignoring, or imputation strategy, of which imputation was found to be the most helpful for improving estimation performance. Current imputation methods in SEE rely on classical imputation techniques, yet each of these techniques has its own disadvantages and might not be appropriate for effort data. In this paper, we aim to provide an effective solution for the missing effort data problem. Incomplete effort data takes two forms: the drive factor missing case and the effort label missing case. We introduce the low-rank recovery technique to address the drive factor missing case, and we employ the semi-supervised regression technique to perform imputation in the effort label missing case. We then propose a novel effort data imputation approach, named low-rank recovery and semi-supervised regression imputation (LRSRI). Experiments on 7 widely used software effort datasets indicate that: (1) the proposed approach achieves better effort data imputation results than other methods; (2) the data imputed by our approach works well with multiple estimators. |
URL | http://doi.acm.org/10.1145/2884781.2884827 |
DOI | 10.1145/2884781.2884827 |
Citation Key | jing_missing_2016 |
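
The two imputation cases named in the abstract can be illustrated with a minimal sketch. This is not the LRSRI algorithm from the paper: a rank-1 alternating least squares fit stands in for low-rank recovery of missing drive factors, and nearest-neighbour self-training stands in for semi-supervised regression of missing effort labels. The function names (`rank1_impute`, `self_train_knn`), their parameters, and the toy data are all hypothetical.

```python
# Illustrative sketch only; NOT the LRSRI method from the paper.
# Missing values are represented as None. Pure standard-library Python.

def rank1_impute(rows, iters=500):
    """Fill None cells with u[i] * v[j], a rank-1 fit to the observed cells.

    Assumption: a rank-1 model is a crude stand-in for low-rank recovery.
    """
    n, m = len(rows), len(rows[0])
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Update each row factor by least squares over its observed cells.
        for i in range(n):
            obs = [j for j in range(m) if rows[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            if den > 0:
                u[i] = sum(rows[i][j] * v[j] for j in obs) / den
        # Update each column factor the same way.
        for j in range(m):
            obs = [i for i in range(n) if rows[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            if den > 0:
                v[j] = sum(rows[i][j] * u[i] for i in obs) / den
    return [
        [rows[i][j] if rows[i][j] is not None else u[i] * v[j] for j in range(m)]
        for i in range(n)
    ]


def self_train_knn(features, efforts, k=1):
    """Fill None effort labels one at a time with a k-NN average.

    Assumption: self-training with k-NN is a simple stand-in for
    semi-supervised regression. Each newly imputed label joins the
    labelled pool before the next project is handled.
    """
    efforts = list(efforts)
    if all(e is None for e in efforts):
        raise ValueError("at least one effort label must be observed")

    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    while any(e is None for e in efforts):
        labelled = [i for i, e in enumerate(efforts) if e is not None]
        unlabelled = [i for i, e in enumerate(efforts) if e is None]
        # Handle the unlabelled project closest to the labelled pool first.
        target = min(
            unlabelled,
            key=lambda i: min(dist(features[i], features[j]) for j in labelled),
        )
        nearest = sorted(labelled, key=lambda j: dist(features[target], features[j]))[:k]
        efforts[target] = sum(efforts[j] for j in nearest) / len(nearest)
    return efforts


if __name__ == "__main__":
    # Toy drive-factor matrix with two missing cells.
    factors = [[1.0, 2.0, 3.0], [2.0, 4.0, None], [3.0, None, 9.0]]
    print(rank1_impute(factors))
    # Toy projects with one feature each and two missing effort labels.
    print(self_train_knn([[0], [1], [10], [11]], [10, None, 100, None]))
```

In this sketch the drive factors would be completed first and the completed matrix then passed as `features` to the label step; that ordering is an assumption made here for illustration, not a statement of how the paper combines the two techniques.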