Visible to the public Fast K-means for Large Scale Clustering

TitleFast K-means for Large Scale Clustering
Publication TypeConference Paper
Year of Publication2017
AuthorsHu, Qinghao, Wu, Jiaxiang, Bai, Lu, Zhang, Yifan, Cheng, Jian
Conference NameProceedings of the 2017 ACM on Conference on Information and Knowledge Management
PublisherACM
Conference LocationNew York, NY, USA
ISBN Number978-1-4503-4918-5
Keywordsclustering, hashing, k-means, Measurement, Metrics, nearest neighbor search, pubcrawl
Abstract

K-means algorithm has been widely used in machine learning and data mining due to its simplicity and good performance. However, the standard k-means algorithm would be quite slow for clustering millions of data into thousands of or even tens of thousands of clusters. In this paper, we propose a fast k-means algorithm named multi-stage k-means (MKM) which uses a multi-stage filtering approach. The multi-stage filtering approach greatly accelerates the k-means algorithm via a coarse-to-fine search strategy. To further speed up the algorithm, hashing is introduced to accelerate the assignment step which is the most time-consuming part in k-means. Extensive experiments on several massive datasets show that the proposed algorithm can obtain up to 600X speed-up over the k-means algorithm with comparable accuracy.

URLhttp://doi.acm.org/10.1145/3132847.3133091
DOI10.1145/3132847.3133091
Citation Keyhu_fast_2017