Biblio
Filters: Keyword is low-dimensional subspaces
Stacked K-Means Hashing Quantization for Nearest Neighbor Search. 2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM). :1-4.
2018. Nowadays, with such a huge amount of information available online, one key challenge is how to retrieve target data efficiently. A recent state-of-the-art solution, k-means hashing (KMH), encodes data as binary strings obtained by iterative k-means clustering and binary-code optimization. To deal with high-dimensional data, KMH divides the space into low-dimensional subspaces, places a hypercube in each subspace, and finds its proper location via the aforementioned optimization process. However, the complexity of the optimization increases rapidly as the dimension of the hypercube grows. To address this issue, we propose an improved hashing method, stacked k-means hashing (SKMH). The main idea is to improve the approximation with a coarse-to-fine stack of multi-layer, lower-dimensional cubes. With such lower-dimensional cubes, SKMH can achieve approximation ability similar to that of KMH with higher-dimensional cubes, while requiring less optimization time. Extensive experiments have been conducted on two public databases, demonstrating the performance of our method on common metrics for fast nearest neighbor search.
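To make the coarse-to-fine idea concrete, here is a minimal Python sketch of stacked, per-subspace residual quantization: each subspace gets a stack of small k-means codebooks, and each layer encodes what the previous layer left over. This deliberately omits the binary-code constraint and the affinity-preserving optimization that define KMH/SKMH; all function names and parameter values are illustrative assumptions, and SciPy's kmeans2 stands in for the paper's codebook training.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def stacked_subspace_encode(X, n_subspaces=4, n_layers=2, k=16):
    """Split each vector into subspaces, then quantize every subspace with
    a stack of small k-means codebooks; each layer encodes the residual
    left by the previous layer (the coarse-to-fine idea). Illustrative
    sketch only, not the SKMH algorithm itself."""
    subspaces = np.array_split(X.astype(np.float64), n_subspaces, axis=1)
    codebooks, codes = [], []
    for sub in subspaces:
        residual = sub.copy()
        layer_books, layer_codes = [], []
        for _ in range(n_layers):
            centroids, labels = kmeans2(residual, k, minit='++')
            residual -= centroids[labels]   # pass what is left to the next layer
            layer_books.append(centroids)
            layer_codes.append(labels)
        codebooks.append(layer_books)
        codes.append(layer_codes)
    return codebooks, codes

def stacked_subspace_decode(codebooks, codes):
    """Reconstruct vectors by summing the selected centroid from every
    layer in every subspace, then concatenating the subspace parts."""
    parts = []
    for layer_books, layer_codes in zip(codebooks, codes):
        parts.append(sum(book[code] for book, code in zip(layer_books, layer_codes)))
    return np.hstack(parts)
```

Each added layer shrinks the residual, so reconstruction error falls layer by layer at the cost of one extra small codebook per subspace, which mirrors the abstract's claim of similar approximation ability from stacked lower-dimensional structures at lower optimization cost.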
Mean-Removed Product Quantization for Approximate Nearest Neighbor Search. 2019 International Conference on Data Mining Workshops (ICDMW). :711-718.
2019. Product quantization (PQ) and its variations are popular and attractive in approximate nearest neighbor (ANN) search due to their low memory usage and fast retrieval speed. PQ decomposes the high-dimensional vector space into several low-dimensional subspaces and quantizes each sub-vector in its subspace separately. Thus, PQ can generate a codebook containing an exponential number of codewords or indices via the Cartesian product of the sub-codebooks from different subspaces. However, when there is large variance in the average amplitude of the components of the data points, directly applying PQ to the data points results in poor performance. In this paper, we propose a new approach, namely mean-removed product quantization (MRPQ), to address this issue. In fact, the average amplitude of a data point, i.e., its mean, can be regarded as statistically independent of the variation of the vector, that is, of the way the components vary about this average. We can therefore learn a separate scalar quantizer for the means of the data points and apply PQ to their residual vectors. As shown in our comprehensive experiments on four large-scale public datasets, our approach achieves substantial improvements in terms of Recall and MAP over several known methods. Moreover, our approach is general and can be combined with PQ and its variations.
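The mean-removal step described above is straightforward to sketch. The following Python snippet scalar-quantizes each vector's mean via 1-D k-means and product-quantizes the mean-removed residual; function names and parameter values are illustrative assumptions, with SciPy's kmeans2 standing in for codebook training.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def mrpq_encode(X, n_subspaces=4, k=16, n_mean_levels=8):
    """Mean-removed PQ sketch: scalar-quantize each vector's mean,
    then product-quantize the mean-removed residual."""
    X = X.astype(np.float64)
    means = X.mean(axis=1, keepdims=True)        # per-vector average amplitude
    # A 1-D k-means serves as the learned scalar quantizer of the means.
    mean_book, mean_codes = kmeans2(means, n_mean_levels, minit='++')
    residual = X - means                         # remove the mean component
    codebooks, codes = [], []
    for sub in np.array_split(residual, n_subspaces, axis=1):
        centroids, labels = kmeans2(sub, k, minit='++')
        codebooks.append(centroids)
        codes.append(labels)
    return mean_book, mean_codes, codebooks, codes

def mrpq_decode(mean_book, mean_codes, codebooks, codes):
    """Approximate the original vectors: concatenated PQ reconstruction
    of the residual, plus the quantized mean added back everywhere."""
    residual = np.hstack([book[code] for book, code in zip(codebooks, codes)])
    return residual + mean_book[mean_codes]
```

In practice the means would typically receive far fewer bits than the sub-codebooks, since a single scalar per vector carries much less information than its residual; the code budget split here is purely illustrative.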