Fast K-means for Large Scale Clustering

Submitted by grigby1 on Mon, 06/11/2018 - 3:44pm

Title	Fast K-means for Large Scale Clustering
Publication Type	Conference Paper
Year of Publication	2017
Authors	Hu, Qinghao, Wu, Jiaxiang, Bai, Lu, Zhang, Yifan, Cheng, Jian
Conference Name	Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
Publisher	ACM
Conference Location	New York, NY, USA
ISBN Number	978-1-4503-4918-5
Keywords	clustering, hashing, k-means, Measurement, Metrics, nearest neighbor search, pubcrawl
Abstract	K-means algorithm has been widely used in machine learning and data mining due to its simplicity and good performance. However, the standard k-means algorithm would be quite slow for clustering millions of data into thousands of or even tens of thousands of clusters. In this paper, we propose a fast k-means algorithm named multi-stage k-means (MKM) which uses a multi-stage filtering approach. The multi-stage filtering approach greatly accelerates the k-means algorithm via a coarse-to-fine search strategy. To further speed up the algorithm, hashing is introduced to accelerate the assignment step which is the most time-consuming part in k-means. Extensive experiments on several massive datasets show that the proposed algorithm can obtain up to 600X speed-up over the k-means algorithm with comparable accuracy.
URL	http://doi.acm.org/10.1145/3132847.3133091
DOI	10.1145/3132847.3133091
Citation Key	hu_fast_2017

Groups: