A Novel Support Vector Machine Algorithm for Missing Data

Title: A Novel Support Vector Machine Algorithm for Missing Data
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Zhu, Mengeheng; Shi, Hong
Conference Name: Proceedings of the 2nd International Conference on Innovation in Artificial Intelligence
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-6345-7
Keywords: classification, composability, distance calculation, Metrics, missing data, pubcrawl, Resiliency, support vector machine, Support vector machines
Abstract: The missing data problem often arises in data analysis. The most common way to address it is imputation. However, imputation methods are suitable only for a low proportion of missing data, under the assumption that the data are MCAR (Missing Completely at Random) or MAR (Missing at Random). In this paper, taking into account the reasons data are missing, we propose a novel support vector machine method that uses a new kernel function to handle a relatively large proportion of missing data. The method makes full use of the observed data to reduce the error introduced by filling in a large number of missing values. We validate the method on 4 data sets from the UCI Machine Learning Repository, using accuracy, F-score, the Kappa statistic, and recall to evaluate performance. Experimental results show that our method achieves a significant improvement in classification results over common imputation methods, even when the proportion of missing data is high.
URL: http://doi.acm.org/10.1145/3194206.3194214
DOI: 10.1145/3194206.3194214
Citation Key: zhu_novel_2018
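
The abstract describes a kernel that works from the observed values instead of imputing the missing ones, but this record does not give its definition. As a rough illustration of the general idea only, the sketch below computes an RBF kernel from a partial distance taken over the features that both samples have observed. This is a generic partial-distance strategy, not the kernel proposed in the paper. The function names, the NaN encoding of missing values, the rescaling factor, and the choice to return 0 when two samples share no observed feature are all assumptions made for this example.

```python
import math


def partial_sq_distance(x, y):
    """Squared Euclidean distance over features observed in both vectors.

    Missing values are encoded as float('nan') (an assumption of this sketch).
    The sum is rescaled by len(x) / (number of shared features) so that pairs
    with few shared features stay comparable to fully observed pairs.
    Returns None when the two vectors share no observed feature.
    """
    shared = [(a, b) for a, b in zip(x, y)
              if not (math.isnan(a) or math.isnan(b))]
    if not shared:
        return None
    total = sum((a - b) ** 2 for a, b in shared)
    return total * len(x) / len(shared)


def observed_rbf_kernel(x, y, gamma=1.0):
    """RBF kernel value based on the partial distance above."""
    dist = partial_sq_distance(x, y)
    if dist is None:
        # Assumption: with nothing to compare, treat the pair as dissimilar.
        return 0.0
    return math.exp(-gamma * dist)


def gram_matrix(rows, gamma=1.0):
    """Pairwise kernel values for a list of equal-length feature vectors."""
    return [[observed_rbf_kernel(a, b, gamma) for b in rows] for a in rows]
```

A Gram matrix built this way can be handed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. A kernel derived from partial distances may not be positive semidefinite, so the usual SVM guarantees may not hold for it.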