Visible to the public LSH Forest: Practical Algorithms Made Theoretical

TitleLSH Forest: Practical Algorithms Made Theoretical
Publication TypeConference Paper
Year of Publication2017
AuthorsAndoni, Alexandr, Razenshteyn, Ilya, Nosatzki, Negev Shekel
Conference NameProceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms
PublisherSociety for Industrial and Applied Mathematics
Conference LocationPhiladelphia, PA, USA
Keywordscompositionality, hash algorithms, pubcrawl, Resiliency
AbstractWe analyze LSH Forest [BCG05]--a popular heuristic for the nearest neighbor search--and show that a careful yet simple modification of it outperforms "vanilla" LSH algorithms. The end result is the first instance of a simple, practical algorithm that provably leverages data-dependent hashing to improve upon data-oblivious LSH. Here is the entire algorithm for the d-dimensional Hamming space. The LSH Forest, for a given dataset, applies a random permutation to all the d coordinates, and builds a trie on the resulting strings. In our modification, we further augment this trie: for each node, we store a constant number of points close to the mean of the corresponding subset of the dataset, which are compared to any query point reaching that node. The overall data structure is simply several such tries sampled independently. While the new algorithm does not quantitatively improve upon the best data-dependent hashing algorithms from [AR15] (which are known to be optimal), it is significantly simpler, being based on a practical heuristic, and is provably better than the best LSH algorithm for the Hamming space [IM98, HIM12].
URLhttp://dl.acm.org/citation.cfm?id=3039686.3039691
Citation Keyandoni_lsh_2017