Grid-Based Indexing and Search Algorithms for Large-Scale and High-Dimensional Data

Submitted by grigby1 on Mon, 06/11/2018 - 3:43pm

Title	Grid-Based Indexing and Search Algorithms for Large-Scale and High-Dimensional Data
Publication Type	Conference Paper
Year of Publication	2017
Authors	Yang, C., Li, Z., Qu, W., Liu, Z., Qi, H.
Conference Name	2017 14th International Symposium on Pervasive Systems, Algorithms and Networks 2017 11th International Conference on Frontier of Computer Science and Technology 2017 Third International Symposium of Creative Computing (ISPAN-FCST-ISCC)
Keywords	approximate nearest neighbor search, Clustering algorithms, data handling, data index structure, database indexing, feature extraction, graph theory, grid cells, grid computing, grid-based index structure, high-dimensional data, high-dimensional database, high-dimensional feature vectors, index data, indexing, Internet, large-scale data, large-scale database, massive information, Measurement, Metrics, nearest neighbor search, nearest neighbor search algorithm, Nearest neighbor searches, parallel implementation, Partitioning algorithms, pubcrawl, public databases, Quantization (signal), query processing, retrieval, search algorithms, Search methods, search problems, single machine, synthetic databases
Abstract	The rapid development of Internet has resulted in massive information overloading recently. These information is usually represented by high-dimensional feature vectors in many related applications such as recognition, classification and retrieval. These applications usually need efficient indexing and search methods for such large-scale and high-dimensional database, which typically is a challenging task. Some efforts have been made and solved this problem to some extent. However, most of them are implemented in a single machine, which is not suitable to handle large-scale database.In this paper, we present a novel data index structure and nearest neighbor search algorithm implemented on Apache Spark. We impose a grid on the database and index data by non-empty grid cells. This grid-based index structure is simple and easy to be implemented in parallel. Moreover, we propose to build a scalable KNN graph on the grids, which increase the efficiency of this index structure by a low cost in parallel implementation. Finally, experiments are conducted in both public databases and synthetic databases, showing that the proposed methods achieve overall high performance in both efficiency and accuracy.
URL	https://ieeexplore.ieee.org/document/8121750/
DOI	10.1109/ISPAN-FCST-ISCC.2017.85
Citation Key	yang_grid-based_2017

Groups:

Science of Security VO