K-nearest Neighbor Search by Random Projection Forests

Title: K-nearest Neighbor Search by Random Projection Forests
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Yan, Donghui; Wang, Yingjie; Wang, Jin; Wang, Honggang; Li, Zhenpeng
Conference Name: 2018 IEEE International Conference on Big Data (Big Data)
Date Published: December
Keywords: Big Data, clustered computers, Clustering algorithms, computational complexity, data mining, Ensemble, ensemble methods, ensemble random projection trees, ensemble size, exponential decay, Forestry, k-nearest neighbor search, k-nearest neighbors, kNN distances, kNN search, learning (artificial intelligence), machine learning, machine learning algorithms, Measurement, Metrics, multicore computers, nearest neighbor search, nearest neighbour methods, pubcrawl, random forests, random projection forests, rpForests, search problems, tree-based methodology, unsupervised learning, Vegetation
Abstract: K-nearest neighbor (kNN) search has wide applications in many areas, including data mining, machine learning, statistics, and many applied domains. Inspired by the success of ensemble methods and the flexibility of tree-based methodology, we propose random projection forests, rpForests, for kNN search. rpForests finds kNNs by aggregating results from an ensemble of random projection trees, each constructed recursively through a series of carefully chosen random projections. rpForests achieves remarkable accuracy: both the miss rate of kNNs and the discrepancy in kNN distances decay rapidly. rpForests has very low computational complexity. The ensemble nature of rpForests makes it easy to run in parallel on multicore or clustered computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights by showing that the probability of neighboring points being separated by ensemble random projection trees decays exponentially as the ensemble size increases. Our theory can be used to refine the choice of random projections in the growth of trees, and experiments show that the effect is remarkable.
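The idea described in the abstract can be illustrated with a minimal sketch: each tree recursively splits the data by projecting onto a random direction, a query descends to one leaf per tree, and the candidate sets from all trees are pooled and ranked by exact distance. This is an illustrative reconstruction, not the authors' implementation; the class and function names (`RPTree`, `rp_forest_knn`) and parameter defaults are assumptions, and the paper's refined projection-choice strategy is omitted in favor of plain Gaussian directions.

```python
import numpy as np

class RPTree:
    """One random projection tree: recursively split points at the
    median of their projections onto a random Gaussian direction."""
    def __init__(self, points, indices, leaf_size, rng):
        self.leaf = None
        if len(indices) <= leaf_size:
            self.leaf = indices
            return
        # Plain Gaussian direction; the paper refines this choice.
        self.direction = rng.standard_normal(points.shape[1])
        proj = points[indices] @ self.direction
        self.threshold = np.median(proj)
        left, right = indices[proj <= self.threshold], indices[proj > self.threshold]
        if len(left) == 0 or len(right) == 0:  # degenerate split: stop here
            self.leaf = indices
            return
        self.left = RPTree(points, left, leaf_size, rng)
        self.right = RPTree(points, right, leaf_size, rng)

    def candidates(self, query):
        """Route the query to a single leaf and return its point indices."""
        if self.leaf is not None:
            return self.leaf
        if query @ self.direction <= self.threshold:
            return self.left.candidates(query)
        return self.right.candidates(query)

def rp_forest_knn(points, query, k, n_trees=10, leaf_size=20, seed=0):
    """Aggregate leaf candidates over an ensemble of rp-trees, then
    rank the pooled candidates by exact distance to the query."""
    rng = np.random.default_rng(seed)
    all_indices = np.arange(len(points))
    cand = set()
    for _ in range(n_trees):
        tree = RPTree(points, all_indices, leaf_size, rng)
        cand.update(tree.candidates(query).tolist())
    cand = np.fromiter(cand, dtype=int)
    dists = np.linalg.norm(points[cand] - query, axis=1)
    return cand[np.argsort(dists)[:k]]
```

Because each tree is built and queried independently, the loop over trees parallelizes trivially, which is the source of the near-linear speedup on multicore or clustered machines noted in the abstract.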
DOI: 10.1109/BigData.2018.8622307
Citation Key: yan_k-nearest_2018