I/O Efficient Approximate Nearest Neighbour Search based on Learned Functions

Submitted by aekwall on Mon, 02/22/2021 - 1:46pm

Title	I/O Efficient Approximate Nearest Neighbour Search based on Learned Functions
Publication Type	Conference Paper
Year of Publication	2020
Authors	Li, M., Zhang, Y., Sun, Y., Wang, W., Tsang, I. W., Lin, X.
Conference Name	2020 IEEE 36th International Conference on Data Engineering (ICDE)
Keywords	ANNS, approximate nearest neighbour search, data privacy, data-sensitive hashing-based methods, data-sensitive indexing, external memory, external storage scenarios, file organisation, function approximation, high dimensional space, I/O efficiency, indexing, input-output programs, learned functions, learned hashing, learned index structure, learning (artificial intelligence), Measurement, multimedia database, nearest neighbor search, nearest neighbour methods, neural nets, Predictive Metrics, pubcrawl, query processing, search problems
Abstract	Approximate nearest neighbour search (ANNS) in high dimensional space is a fundamental problem in many applications, such as multimedia database, computer vision and information retrieval. Among many solutions, data-sensitive hashing-based methods are effective to this problem, yet few of them are designed for external storage scenarios and hence do not optimized for I/O efficiency during the query processing. In this paper, we introduce a novel data-sensitive indexing and query processing framework for ANNS with an emphasis on optimizing the I/O efficiency, especially, the sequential I/Os. The proposed index consists of several lists of point IDs, ordered by values that are obtained by learned hashing (i.e., mapping) functions on each corresponding data point. The functions are learned from the data and approximately preserve the order in the high-dimensional space. We consider two instantiations of the functions (linear and non-linear), both learned from the data with novel objective functions. We also develop an I/O efficient ANNS framework based on the index. Comprehensive experiments on six benchmark datasets show that our proposed methods with learned index structure perform much better than the state-of-the-art external memory-based ANNS methods in terms of I/O efficiency and accuracy.
DOI	10.1109/ICDE48307.2020.00032
Citation Key	li_io_2020

Groups:

Science of Security VO