Title | Mean-Removed Product Quantization for Approximate Nearest Neighbor Search |
Publication Type | Conference Paper |
Year of Publication | 2019 |
Authors | Yang, Jiacheng, Chen, Bin, Xia, Shu-Tao |
Conference Name | 2019 International Conference on Data Mining Workshops (ICDMW) |
Keywords | approximate nearest neighbor search, Cartesian product, codewords, data point, high-dimensional space, high-dimensional vector space, information retrieval, low-dimensional subspaces, mean-removed product quantization, mean-removed vector quantization, Measurement, memory usage, Metrics, nearest neighbor search, nearest neighbour methods, neural nets, PQ, product quantization, pubcrawl, quantisation (signal), residual vectors, retrieval speed, scalar quantizer, search problems, subspaces, vector quantisation, Vectors |
Abstract | Product quantization (PQ) and its variations are popular and attractive in approximate nearest neighbor search (ANN) due to their lower memory usage and faster retrieval speed. PQ decomposes the high-dimensional vector space into several low-dimensional subspaces, and quantizes each sub-vector in their subspaces, separately. Thus, PQ can generate a codebook containing an exponential number of codewords or indices by a Cartesian product of the sub-codebooks from different subspaces. However, when there is large variance in the average amplitude of the components of the data points, directly utilizing the PQ on the data points would result in poor performance. In this paper, we propose a new approach, namely, mean-removed product quantization (MRPQ) to address this issue. In fact, the average amplitude of a data point or the mean of a date point can be regarded as statistically independent of the variation of the vector, that is, of the way the components vary about this average. Then we can learn a separate scalar quantizer of the means of the data points and apply the PQ to their residual vectors. As shown in our comprehensive experiments on four large-scale public datasets, our approach can achieve substantial improvements in terms of Recall and MAP over some known methods. Moreover, our approach is general which can be combined with PQ and its variations. |
DOI | 10.1109/ICDMW.2019.00107 |
Citation Key | yang_mean-removed_2019 |