Biblio
Software effort estimation (SEE) is a crucial step in software development. Effort data missing usually occurs in real-world data collection. Focusing on the missing data problem, existing SEE methods employ the deletion, ignoring, or imputation strategy to address the problem, where the imputation strategy was found to be more helpful for improving the estimation performance. Current imputation methods in SEE use classical imputation techniques for missing data imputation, yet these imputation techniques have their respective disadvantages and might not be appropriate for effort data. In this paper, we aim to provide an effective solution for the effort data missing problem. Incompletion includes the drive factor missing case and effort label missing case. We introduce the low-rank recovery technique for addressing the drive factor missing case. And we employ the semi-supervised regression technique to perform imputation in the case of effort label missing. We then propose a novel effort data imputation approach, named low-rank recovery and semi-supervised regression imputation (LRSRI). Experiments on 7 widely used software effort datasets indicate that: (1) the proposed approach can obtain better effort data imputation effects than other methods; (2) the imputed data using our approach can apply to multiple estimators well.