Missing Data Imputation Based on Low-rank Recovery and Semi-supervised Regression for Software Effort Estimation

Title: Missing Data Imputation Based on Low-rank Recovery and Semi-supervised Regression for Software Effort Estimation
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Jing, Xiao-Yuan; Qi, Fumin; Wu, Fei; Xu, Baowen
Conference Name: Proceedings of the 38th International Conference on Software Engineering
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-3900-1
Keywords: Collaboration, data deletion, drive factor missing case, effort label missing case, Human Behavior, low-rank recovery and semi-supervised regression imputation (LRSRI), missing data problem, pubcrawl, Scalability, software effort estimation
Abstract

Software effort estimation (SEE) is a crucial step in software development. Missing effort data is common in real-world data collection. To handle missing data, existing SEE methods use a deletion, ignoring, or imputation strategy, and the imputation strategy has been found to be more helpful for improving estimation performance. Current imputation methods in SEE rely on classical imputation techniques, each of which has its own disadvantages and might not be appropriate for effort data. In this paper, we aim to provide an effective solution to the missing effort data problem. Incomplete data covers two cases: the drive factor missing case and the effort label missing case. We introduce the low-rank recovery technique to address the drive factor missing case, and we employ the semi-supervised regression technique to perform imputation in the effort label missing case. We then propose a novel effort data imputation approach, named low-rank recovery and semi-supervised regression imputation (LRSRI). Experiments on 7 widely used software effort datasets indicate that (1) the proposed approach achieves better effort data imputation results than other methods, and (2) the data imputed by our approach works well with multiple estimators.
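
The abstract names two building blocks without giving their formulation, so the following is only a minimal sketch of the general ideas, not the LRSRI method itself. It assumes missing values are encoded as NaN, uses an iterated truncated SVD as a stand-in for low-rank recovery of missing drive factors, and uses a graph-based harmonic solution as a stand-in for semi-supervised regression on missing effort labels. The function names `low_rank_impute` and `harmonic_label_impute`, and the parameters `rank`, `n_iters`, and `sigma`, are hypothetical.

```python
import numpy as np


def low_rank_impute(X, rank=2, n_iters=200):
    """Fill NaN entries of X by repeatedly projecting onto rank-`rank` matrices.

    Assumption: a generic iterated truncated-SVD scheme, not the paper's
    low-rank recovery formulation. Every column needs at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from column means computed over the observed entries only.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Observed entries stay fixed; only the missing ones are overwritten.
        filled[missing] = approx[missing]
    return filled


def harmonic_label_impute(X, y, sigma=1.0):
    """Fill NaN labels in y from a similarity graph over all projects.

    Assumption: a graph-based harmonic solution, chosen for illustration;
    the paper's semi-supervised regression may differ.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    unlabeled = np.isnan(y)
    labeled = ~unlabeled
    # Gaussian similarity between every pair of projects (rows of X).
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    laplacian = np.diag(W.sum(axis=1)) - W
    # Each unlabeled value becomes the weighted average of its neighbours.
    rhs = W[np.ix_(unlabeled, labeled)] @ y[labeled]
    y[unlabeled] = np.linalg.solve(laplacian[np.ix_(unlabeled, unlabeled)], rhs)
    return y
```

Under these assumptions, one would first call `low_rank_impute` on the drive factor matrix and then pass the completed matrix, together with the partially observed effort vector, to `harmonic_label_impute`. The second step uses unlabeled projects to shape the similarity graph, which is what makes it semi-supervised rather than a plain regression on the labeled rows.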

URL: http://doi.acm.org/10.1145/2884781.2884827
DOI: 10.1145/2884781.2884827
Citation Key: jing_missing_2016