Fast K-means for Large Scale Clustering
Title | Fast K-means for Large Scale Clustering |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Hu, Qinghao, Wu, Jiaxiang, Bai, Lu, Zhang, Yifan, Cheng, Jian |
Conference Name | Proceedings of the 2017 ACM on Conference on Information and Knowledge Management |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-4918-5 |
Keywords | clustering, hashing, k-means, Measurement, Metrics, nearest neighbor search, pubcrawl |
Abstract | K-means algorithm has been widely used in machine learning and data mining due to its simplicity and good performance. However, the standard k-means algorithm would be quite slow for clustering millions of data into thousands of or even tens of thousands of clusters. In this paper, we propose a fast k-means algorithm named multi-stage k-means (MKM) which uses a multi-stage filtering approach. The multi-stage filtering approach greatly accelerates the k-means algorithm via a coarse-to-fine search strategy. To further speed up the algorithm, hashing is introduced to accelerate the assignment step which is the most time-consuming part in k-means. Extensive experiments on several massive datasets show that the proposed algorithm can obtain up to 600X speed-up over the k-means algorithm with comparable accuracy. |
URL | http://doi.acm.org/10.1145/3132847.3133091 |
DOI | 10.1145/3132847.3133091 |
Citation Key | hu_fast_2017 |