Density Sensitive Hashing
Title | Density Sensitive Hashing |
Publication Type | Journal Article |
Year of Publication | 2014 |
Authors | Zhongming Jin, Cheng Li, Yue Lin, Deng Cai |
Journal | IEEE Transactions on Cybernetics |
Volume | 44 |
Pagination | 1362-1371 |
Date Published | Aug |
ISSN | 2168-2267 |
Keywords | Binary codes, clustering, Databases, density sensitive hashing, DSH, Entropy, file organisation, geometric structure, high dimensional nearest neighbor search, locality sensitive hashing, LSH, Nearest neighbor searches, principal component analysis, projective function, Quantization (signal), random projection, search problems, Vectors |
Abstract | Nearest neighbor search is a fundamental problem in research fields such as machine learning, data mining, and pattern recognition. Recently, hashing-based approaches such as locality sensitive hashing (LSH) have proved effective for scalable high-dimensional nearest neighbor search. Many hashing algorithms have their theoretical roots in random projection. Because these algorithms generate the hash tables (projections) randomly, they require a large number of hash tables (i.e., long codewords) to achieve both high precision and high recall. To address this limitation, we propose in this paper a novel hashing algorithm called density sensitive hashing (DSH). DSH can be regarded as an extension of LSH. By exploring the geometric structure of the data, DSH avoids purely random projection selection and instead uses the projective functions that best agree with the distribution of the data. Extensive experiments on real-world data sets show that the proposed method outperforms state-of-the-art hashing approaches. |
URL | https://ieeexplore.ieee.org/document/6645383/ |
DOI | 10.1109/TCYB.2013.2283497 |
Citation Key | 6645383 |
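The abstract contrasts random-projection hashing (LSH) with DSH, which selects projections that agree with the data distribution. The pure-Python sketch below illustrates that contrast under stated assumptions. The baseline draws Gaussian random hyperplanes. The "density" variant, loosely guided by this record's keywords (clustering, Entropy, projective function), runs k-means, builds hyperplanes that bisect pairs of nearby cluster centers, and keeps the ones whose bits split the data most evenly. All function names, the cluster count of 2 × `n_bits`, the neighbor count `r = 2`, and the entropy criterion are hypothetical choices for illustration; this is not the authors' published algorithm or code.

```python
import math
import random

# Hypothetical helpers for illustration only; not the DSH authors' implementation.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def encode(x, projections):
    """One bit per projection (w, b): 1 if w . x + b >= 0, else 0."""
    return tuple(1 if dot(w, x) + b >= 0 else 0 for w, b in projections)

def hamming(c1, c2):
    """Number of differing bits between two binary codes."""
    return sum(a != b for a, b in zip(c1, c2))

def random_projections(dim, n_bits, rng):
    """LSH-style baseline: Gaussian random hyperplanes through the origin."""
    return [([rng.gauss(0.0, 1.0) for _ in range(dim)], 0.0) for _ in range(n_bits)]

def kmeans(data, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the k cluster centers."""
    centers = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        for c, g in enumerate(groups):
            if g:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def bit_entropy(data, w, b):
    """Entropy (in bits) of the 0/1 split that hyperplane (w, b) induces on data."""
    p = sum(1 for x in data if dot(w, x) + b >= 0) / len(data)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def density_projections(data, n_bits, rng, r=2):
    """Assumed data-dependent variant: bisect nearby cluster centers, keep the most balanced bits."""
    k = 2 * n_bits  # assumed cluster count; yields at least n_bits distinct center pairs
    centers = kmeans(data, k, rng)
    pairs = set()
    for i in range(k):
        others = sorted((j for j in range(k) if j != i),
                        key=lambda j: sq_dist(centers[i], centers[j]))
        for j in others[:r]:  # the r nearest centers to center i
            pairs.add((min(i, j), max(i, j)))
    candidates = []
    for i, j in sorted(pairs):
        w = [a - b for a, b in zip(centers[i], centers[j])]
        midpoint = [(a + b) / 2 for a, b in zip(centers[i], centers[j])]
        candidates.append((w, -dot(w, midpoint)))  # plane through the midpoint, normal to w
    # Highest-entropy (most balanced) hyperplanes first.
    candidates.sort(key=lambda wb: bit_entropy(data, *wb), reverse=True)
    return candidates[:n_bits]

if __name__ == "__main__":
    rng = random.Random(0)
    # Two synthetic 2-D blobs (made-up data, for illustration only).
    data = [[rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)]
            for cx, cy in [(0.0, 0.0), (8.0, 8.0)] for _ in range(50)]
    lsh = random_projections(2, 4, rng)
    dsh = density_projections(data, 4, rng)
    for name, proj in [("random", lsh), ("density", dsh)]:
        avg = sum(bit_entropy(data, w, b) for w, b in proj) / len(proj)
        print(name, "mean bit entropy:", round(avg, 3))
    print("code of first point:", encode(data[0], dsh))
```

Ranking candidate hyperplanes by bit entropy favors splits that divide the data roughly in half, so each bit carries close to one bit of information. A random hyperplane offers no such guarantee, which is one plausible reading of why the abstract says purely random projections need long codewords to reach both high precision and high recall.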