Varricchio, Valerio, Frazzoli, Emilio.
2018.
Asymptotically Optimal Pruning for Nonholonomic Nearest-Neighbor Search. 2018 IEEE Conference on Decision and Control (CDC). :4459—4466.
Nearest-Neighbor Search (NNS) arises as a key component of sampling-based motion planning algorithms and it is known as their asymptotic computational bottleneck. Algorithms for exact Nearest-Neighbor Search rely on explicit distance comparisons to different extents. However, in motion planning, evaluating distances is generally a computationally demanding task, since the metric is induced by the minimum cost of steering a dynamical system between states. In the presence of driftless nonholonomic constraints, we propose efficient pruning techniques for the k-d tree algorithm that drastically reduce the number of distance evaluations performed during a query. These techniques exploit computationally convenient lower and upper bounds to the geodesic distance of the corresponding sub-Riemannian geometry. Based on asymptotic properties of the reachable sets, we show that the proposed pruning techniques are optimal, modulo a constant factor, and we provide experimental results with the Reeds-Shepp vehicle model.
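The pruning principle, evaluating the expensive steering cost only when a cheap lower bound says a candidate could still win, can be illustrated with a minimal hedged sketch (a flat candidate scan rather than the paper's k-d tree machinery, with `steering_cost` and `lower_bound` as hypothetical callables, where the lower bound must never exceed the true cost):

```python
import math

def nearest_with_lower_bound(query, points, steering_cost, lower_bound):
    """Return the exact nearest point under an expensive metric,
    evaluating it only on candidates whose cheap lower bound
    could still improve on the best cost found so far."""
    # Sort candidates by the cheap lower bound (e.g., a weighted
    # Euclidean distance that provably underestimates the true cost).
    ranked = sorted(points, key=lambda p: lower_bound(query, p))
    best, best_cost = None, math.inf
    for p in ranked:
        if lower_bound(query, p) >= best_cost:
            break  # no remaining candidate can beat the incumbent
        c = steering_cost(query, p)  # expensive exact evaluation
        if c < best_cost:
            best, best_cost = p, c
    return best, best_cost
```

Because candidates are visited in order of increasing lower bound, the scan can stop as soon as the bound reaches the best exact cost found, which is the same guarantee that makes bound-based pruning exact inside a k-d tree.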
Yan, Donghui, Wang, Yingjie, Wang, Jin, Wang, Honggang, Li, Zhenpeng.
2018.
K-nearest Neighbor Search by Random Projection Forests. 2018 IEEE International Conference on Big Data (Big Data). :4775—4781.
K-nearest neighbor (kNN) search has wide applications in many areas, including data mining, machine learning, statistics and many applied domains. Inspired by the success of ensemble methods and the flexibility of tree-based methodology, we propose random projection forests (rpForests) for kNN search. rpForests finds kNNs by aggregating results from an ensemble of random projection trees, each constructed recursively through a series of carefully chosen random projections. rpForests achieves remarkable accuracy in terms of fast decay in the missing rate of kNNs and in the discrepancy of kNN distances, while having very low computational complexity. The ensemble nature of rpForests makes it easy to run in parallel on multicore or clustered computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights by showing that the probability of neighboring points being separated by the ensemble of random projection trees decays exponentially as the ensemble size increases. Our theory can be used to refine the choice of random projections in the growth of the trees, and experiments show that the effect is remarkable.
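A minimal hedged sketch of the ensemble idea described in the abstract (not the authors' implementation; the split rule, leaf size, and candidate aggregation are simplified for illustration):

```python
import numpy as np

def build_rp_tree(X, idx, leaf_size, rng):
    """Recursively split point indices by random projections."""
    if len(idx) <= leaf_size:
        return idx                       # leaf: store point indices
    d = rng.standard_normal(X.shape[1])  # random projection direction
    proj = X[idx] @ d
    median = np.median(proj)
    left, right = idx[proj <= median], idx[proj > median]
    if len(left) == 0 or len(right) == 0:
        return idx
    return (d, median,
            build_rp_tree(X, left, leaf_size, rng),
            build_rp_tree(X, right, leaf_size, rng))

def query_leaf(tree, q):
    """Descend to the leaf containing the query."""
    while isinstance(tree, tuple):
        d, median, left, right = tree
        tree = left if q @ d <= median else right
    return tree

def rp_forest_knn(X, q, k=5, n_trees=10, leaf_size=32, seed=0):
    """Aggregate leaf candidates from an ensemble of random
    projection trees, then rank them by exact distance."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(X))
    candidates = set()
    for _ in range(n_trees):
        tree = build_rp_tree(X, idx, leaf_size, rng)
        candidates.update(query_leaf(tree, q).tolist())
    cand = np.fromiter(candidates, dtype=int)
    dist = np.linalg.norm(X[cand] - q, axis=1)
    return cand[np.argsort(dist)[:k]]
```

Each tree on its own may miss a true neighbor that falls on the wrong side of a split; aggregating the leaves of many independently grown trees is what drives the missing rate down as the ensemble grows.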
Kang, Hyunjoong, Hong, Sanghyun, Lee, Kookjin, Park, Noseong, Kwon, Soonhyun.
2018.
On Integrating Knowledge Graph Embedding into SPARQL Query Processing. 2018 IEEE International Conference on Web Services (ICWS). :371—374.
SPARQL is a standard query language for knowledge graphs (KGs). However, it is hard to find correct answers if KGs are incomplete or contain incorrect facts. Knowledge graph embedding (KGE) enables answering queries on such KGs by inferring unknown knowledge and removing incorrect knowledge. Hence, our long-term goal in this line of research is to propose a new framework that integrates KGE and SPARQL, which opens various research problems to be addressed. In this paper, we solve one of the most critical problems, namely optimizing the performance of nearest neighbor (NN) search. In our evaluations, we demonstrate that the search time of state-of-the-art NN search algorithms is improved by 40% without sacrificing answer accuracy.
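As a hedged illustration of why NN search arises here at all: a missing binding in a triple pattern can be filled by searching the embedding space. The sketch below assumes a TransE-style translation model purely for concreteness; it is not necessarily the KGE method used in the paper.

```python
import numpy as np

def answer_tail_query(head, relation, entity_emb, relation_emb, k=5):
    """Answer a triple pattern (head, relation, ?tail) on an incomplete KG
    by nearest-neighbor search in the embedding space: under a TransE-style
    model, plausible tails lie near head + relation."""
    target = entity_emb[head] + relation_emb[relation]
    dists = np.linalg.norm(entity_emb - target, axis=1)
    return np.argsort(dists)[:k]   # entity ids of the k best candidates
```

Every such inferred binding requires an NN query over all entity embeddings, which is why the cost of NN search dominates query processing in this setting.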
Chen, Yalin, Li, Zhiyang, Shi, Jia, Liu, Zhaobin, Qu, Wenyu.
2018.
Stacked K-Means Hashing Quantization for Nearest Neighbor Search. 2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM). :1—4.
Nowadays, with such a huge amount of information available online, one key challenge is how to retrieve target data efficiently. A recent state-of-the-art solution, k-means hashing (KMH), codes data as binary strings obtained by iterative k-means clustering and binary code optimization. To deal with high-dimensional data, KMH divides the space into low-dimensional subspaces, places a hypercube in each subspace and finds its proper location by the aforementioned optimization process. However, the complexity of the optimization increases rapidly as the dimension of the hypercube increases. To address this issue, we propose an improved hashing method, stacked k-means hashing (SKMH). The main idea is to improve the approximation with coarse-to-fine, multi-layer, lower-dimensional cubes. With these lower-dimensional cubes, SKMH can achieve similar approximation ability in less optimization time than the KMH method using higher-dimensional cubes. Extensive experiments have been conducted on two public databases, demonstrating the performance of our method on common metrics for fast nearest neighbor search.
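A rough, hedged sketch of the coarse-to-fine stacking idea: the code below is a generic two-layer residual quantizer per subspace, not the paper's KMH hypercube optimization, and all parameter values are arbitrary illustrative choices (scikit-learn k-means stands in for the quantizer training).

```python
import numpy as np
from sklearn.cluster import KMeans

def stacked_subspace_quantizers(X, n_subspaces=4, k_coarse=16, k_fine=16, seed=0):
    """Train a coarse quantizer per subspace, then a finer quantizer
    on the residuals, in a coarse-to-fine (stacked) fashion."""
    subs = np.array_split(X, n_subspaces, axis=1)
    layers = []
    for Xs in subs:
        coarse = KMeans(n_clusters=k_coarse, n_init=4, random_state=seed).fit(Xs)
        residual = Xs - coarse.cluster_centers_[coarse.labels_]
        fine = KMeans(n_clusters=k_fine, n_init=4, random_state=seed).fit(residual)
        layers.append((coarse, fine))
    return layers

def encode(x, layers, n_subspaces=4):
    """Encode a vector as (coarse, fine) index pairs per subspace."""
    codes = []
    for xs, (coarse, fine) in zip(np.array_split(x, n_subspaces), layers):
        c = coarse.predict(xs.reshape(1, -1))[0]
        r = xs - coarse.cluster_centers_[c]
        f = fine.predict(r.reshape(1, -1))[0]
        codes.append((c, f))
    return codes
```

Stacking two small codebooks keeps each optimization problem low-dimensional while their composition refines the approximation, which is the trade-off SKMH exploits against a single higher-dimensional cube.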
Ito, Toshitaka, Itotani, Yuri, Wakabayashi, Shin'ichi, Nagayama, Shinobu, Inagi, Masato.
2018.
A Nearest Neighbor Search Engine Using Distance-Based Hashing. 2018 International Conference on Field-Programmable Technology (FPT). :150—157.
This paper proposes an FPGA-based nearest neighbor search engine for high-dimensional data, in which nearest neighbor search is performed based on distance-based hashing. The proposed hardware search engine implements a nearest neighbor search algorithm based on an extension of flexible distance-based hashing (FDH, for short), which finds an exact solution with high probability. The proposed engine is a parallel, pipelined circuit, so search results can be obtained in a short execution time. Experimental results show the effectiveness and efficiency of the proposed engine.
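As a hedged software-only sketch of the general distance-based hashing idea (pivot-thresholded hash bits; this is not the FDH extension nor the FPGA circuit described in the paper):

```python
import numpy as np

def make_dbh_functions(X, n_bits=16, seed=0):
    """Distance-based hashing (general idea): each hash bit thresholds
    the distance between a point and a randomly chosen pivot."""
    rng = np.random.default_rng(seed)
    pivots = X[rng.choice(len(X), size=n_bits, replace=False)]
    # Use the median distance to each pivot as its threshold so that
    # each bit splits the dataset roughly in half.
    thresholds = np.median(
        np.linalg.norm(X[:, None] - pivots[None], axis=2), axis=0)
    return pivots, thresholds

def dbh_code(x, pivots, thresholds):
    """Binary code of a vector under the pivot-based hash functions."""
    return tuple((np.linalg.norm(pivots - x, axis=1) < thresholds).astype(int))
```

Items sharing a code fall into the same bucket; a query probes its own bucket (and, if needed, nearby buckets) before exact verification, which is the step the proposed engine parallelizes and pipelines in hardware.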
Wang, Xi, Yao, Jun, Ji, Hongxia, Zhang, Ze, Li, Chen, Ma, Beizhi.
2018.
A Local Integral Hash Nearest Neighbor Algorithm. 2018 3rd International Conference on Mechanical, Control and Computer Engineering (ICMCCE). :544—548.
Nearest neighbor search algorithms play a very important role in computer vision. When the dataset to be searched is large, fast search algorithms are needed. Current fast retrieval algorithms are mostly tree-based, and the efficiency of tree-based algorithms decreases sharply as the data dimension increases. In this paper, a local integral hash nearest neighbor algorithm is proposed that constructs the tree structure by changing the way tree nodes are accessed, so that the structure can express the characteristics of the data distribution. Experimental tests show that the proposed algorithm achieves more efficient performance on high-dimensional data.
Vijay, Savinu T., Pournami, P. N..
2018.
Feature Based Image Registration using Heuristic Nearest Neighbour Search. 2018 22nd International Computer Science and Engineering Conference (ICSEC). :1—3.
Image registration is the process of aligning images of the same scene taken at different instances, from different viewpoints, or by heterogeneous sensors. This can be achieved either by area-based or by feature-based image matching techniques. Feature-based image registration focuses on detecting relevant features from the input images and attaching descriptors to these features. Matching the visual descriptors of two images is a major task in image registration. This feature matching is currently done using Exhaustive (Brute-Force) Search or Nearest Neighbour Search. The traditional method for nearest neighbour search represents the data as k-d trees. Nearest neighbour search can also be performed using combinatorial optimization algorithms such as Simulated Annealing. This work proposes a method to perform image feature matching by nearest neighbour search based on Threshold Accepting, a faster variant of Simulated Annealing. The experiments performed suggest that the proposed algorithm can produce better results in fewer iterations than many existing algorithms.
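A minimal hedged sketch of Threshold Accepting applied to descriptor matching: unlike Simulated Annealing, a worse candidate is accepted deterministically whenever the cost increase stays below a shrinking threshold. The move generator, schedule, and distance here are simplified placeholders, not the paper's exact procedure.

```python
import numpy as np

def threshold_accepting_nn(query, descriptors, n_iter=200, seed=0):
    """Approximate nearest-neighbour search by Threshold Accepting."""
    rng = np.random.default_rng(seed)
    current = rng.integers(len(descriptors))
    cur_cost = np.linalg.norm(descriptors[current] - query)
    best, best_cost = current, cur_cost
    thresholds = np.linspace(cur_cost, 0.0, n_iter)    # deterministic schedule
    for t in thresholds:
        cand = rng.integers(len(descriptors))          # random candidate move
        cost = np.linalg.norm(descriptors[cand] - query)
        if cost - cur_cost < t:                        # accept below threshold
            current, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = cand, cost
    return best, best_cost
```

Dropping the probabilistic acceptance test of Simulated Annealing removes the random-number draw per step, which is where the speed advantage of Threshold Accepting comes from.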
Song, Fuyuan, Qin, Zheng, Liu, Qin, Liang, Jinwen, Ou, Lu.
2019.
Efficient and Secure k-Nearest Neighbor Search Over Encrypted Data in Public Cloud. ICC 2019 - 2019 IEEE International Conference on Communications (ICC). :1—6.
Cloud computing has become an important and popular infrastructure for data storage and sharing. Typically, data owners outsource their massive data to a public cloud that will provide search services to authorized data users. With privacy concerns, the valuable outsourced data cannot be exposed directly, and should be encrypted before outsourcing to the public cloud. In this paper, we focus on k-Nearest Neighbor (k-NN) search over encrypted data. We propose efficient and secure k-NN search schemes based on matrix similarity to achieve efficient and secure query services in public cloud. In our basic scheme, we construct the traces of two diagonal multiplication matrices to denote the Euclidean distance of two data points, and perform secure k-NN search by comparing traces of the corresponding similarity matrices. In our enhanced scheme, we strengthen the security property by decomposing matrices based on our basic scheme. Security analysis shows that our schemes protect data privacy and query privacy under attacks with different levels of background knowledge. Experimental evaluations show that both schemes are efficient in terms of computational complexity and cost.
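One simple identity behind trace-based distance encodings, shown in the clear as a hedged sketch (the paper's schemes additionally hide the matrices behind secret transformations, which are not reproduced here): for the diagonal matrix D = diag(x - y), the trace of D·D equals the squared Euclidean distance between x and y.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])

D = np.diag(x - y)                       # diagonal difference matrix
trace_form = np.trace(D @ D)             # tr(D^2) = sum_i (x_i - y_i)^2
squared_euclidean = np.sum((x - y) ** 2)

assert np.isclose(trace_form, squared_euclidean)  # both equal 25.0
```

Because distance comparisons reduce to trace comparisons, the cloud can rank encrypted candidates without ever seeing the plaintext coordinates.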
Dubey, Abhimanyu, Maaten, Laurens van der, Yalniz, Zeki, Li, Yixuan, Mahajan, Dhruv.
2019.
Defense Against Adversarial Images Using Web-Scale Nearest-Neighbor Search. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). :8759—8768.
A plethora of recent work has shown that convolutional networks are not robust to adversarial images: images created by perturbing a sample from the data distribution so as to maximize the loss on the perturbed example. In this work, we hypothesize that adversarial perturbations move the image away from the image manifold, in the sense that there exists no physical process that could have produced the adversarial image. This hypothesis suggests that a successful defense mechanism against adversarial images should aim to project the images back onto the image manifold. We study such defense mechanisms, which approximate the projection onto the unknown image manifold by a nearest-neighbor search against a web-scale image database containing tens of billions of images. Empirical evaluations of this defense strategy on ImageNet suggest that it is very effective in attack settings in which the adversary does not have access to the image database. We also propose two novel attack methods to break nearest-neighbor defenses and show the conditions under which they fail. We perform a series of ablation experiments, which suggest that there is a trade-off between robustness and accuracy as we use features from deeper in the network, that a large index size (hundreds of millions of images) is crucial to get good performance, and that careful construction of the database is crucial for robustness against nearest-neighbor attacks.
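A hedged, greatly simplified sketch of the projection-by-retrieval defense: embed the possibly adversarial image, retrieve its nearest neighbors from a clean database, and predict from the neighbors instead of trusting the perturbed input. The `embed`, `feature_db`, and `labels_db` names are hypothetical placeholders, and brute-force search stands in for the paper's web-scale index.

```python
import numpy as np
from collections import Counter

def knn_defense_predict(image, embed, feature_db, labels_db, k=50):
    """Classify by voting over the k nearest database images,
    effectively projecting the input back toward the image manifold."""
    q = embed(image)                                   # deep feature of the input
    dists = np.linalg.norm(feature_db - q, axis=1)     # brute-force for clarity
    nearest = np.argsort(dists)[:k]
    votes = Counter(labels_db[i] for i in nearest)
    return votes.most_common(1)[0][0]
```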
Markchit, Sarawut, Chiu, Chih-Yi.
2019.
Hash Code Indexing in Cross-Modal Retrieval. 2019 International Conference on Content-Based Multimedia Indexing (CBMI). :1—4.
Cross-modal hashing, which searches nearest neighbors across different modalities in the Hamming space, has recently become a popular technique to overcome the storage and computation barriers in multimedia retrieval. Although dozens of cross-modal hashing algorithms have been proposed to yield compact binary code representations, applying exhaustive search to a large-scale dataset is impractical for real-time purposes, and the Hamming distance computation suffers from inaccurate results. In this paper, we propose a novel index scheme over binary hash codes in cross-modal retrieval. The proposed indexing scheme exploits a few binary bits of the hash code as the index code. Based on the index code representation, we construct an inverted index structure to improve retrieval efficiency and train a neural network to improve indexing accuracy. Experiments are performed on two benchmark datasets for retrieval across image and text modalities, where hash codes are generated by three cross-modal hashing methods. Results show the proposed method effectively boosts the performance over the benchmark datasets and hash methods.
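The index-code idea can be sketched with a plain inverted index keyed on a few leading bits of each hash code (a hedged simplification; the paper additionally trains a neural network to decide which buckets to probe, which is omitted here):

```python
from collections import defaultdict

def build_inverted_index(hash_codes, index_bits=8):
    """Bucket database items by a short index code taken from the
    leading bits of their binary hash codes (strings like '010110...')."""
    index = defaultdict(list)
    for item_id, code in enumerate(hash_codes):
        index[code[:index_bits]].append(item_id)
    return index

def candidates_for(query_code, index, index_bits=8):
    """Look up only the bucket matching the query's index code,
    instead of exhaustively scanning all hash codes."""
    return index.get(query_code[:index_bits], [])
```

Only the items in the probed bucket need a full Hamming-distance comparison, which is what turns exhaustive search into an indexed lookup.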
Yang, Jiacheng, Chen, Bin, Xia, Shu-Tao.
2019.
Mean-Removed Product Quantization for Approximate Nearest Neighbor Search. 2019 International Conference on Data Mining Workshops (ICDMW). :711—718.
Product quantization (PQ) and its variations are popular and attractive in approximate nearest neighbor (ANN) search due to their low memory usage and fast retrieval speed. PQ decomposes the high-dimensional vector space into several low-dimensional subspaces and quantizes each sub-vector in its subspace separately. Thus, PQ can generate a codebook containing an exponential number of codewords or indices via a Cartesian product of the sub-codebooks from different subspaces. However, when there is large variance in the average amplitude of the components of the data points, directly applying PQ to the data points results in poor performance. In this paper, we propose a new approach, namely mean-removed product quantization (MRPQ), to address this issue. In fact, the average amplitude of a data point, i.e., the mean of a data point, can be regarded as statistically independent of the variation of the vector, that is, of the way the components vary about this average. We can therefore learn a separate scalar quantizer for the means of the data points and apply PQ to their residual vectors. As shown in our comprehensive experiments on four large-scale public datasets, our approach achieves substantial improvements in terms of Recall and MAP over several known methods. Moreover, our approach is general and can be combined with PQ and its variations.
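A hedged sketch of the mean-removed encoding pipeline described above, i.e., scalar-quantize each vector's mean and product-quantize the residual; the codebook sizes are arbitrary illustrative choices and scikit-learn k-means stands in for whichever quantizer training the paper uses.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_mrpq(X, n_subspaces=8, k=256, n_mean_levels=256, seed=0):
    """Mean-removed product quantization (illustrative sketch):
    scalar-quantize each vector's mean, then product-quantize
    the mean-removed residual vectors."""
    means = X.mean(axis=1, keepdims=True)
    # 1-D k-means acts as the scalar quantizer of the means.
    mean_q = KMeans(n_clusters=n_mean_levels, n_init=4, random_state=seed).fit(means)
    residuals = X - means
    codebooks = [
        KMeans(n_clusters=k, n_init=4, random_state=seed).fit(sub)
        for sub in np.array_split(residuals, n_subspaces, axis=1)
    ]
    return mean_q, codebooks

def encode_mrpq(x, mean_q, codebooks, n_subspaces=8):
    """Encode a vector as (mean code, list of PQ sub-codes)."""
    m = x.mean()
    mean_code = mean_q.predict([[m]])[0]
    residual = x - m
    pq_codes = [
        cb.predict(sub.reshape(1, -1))[0]
        for sub, cb in zip(np.array_split(residual, n_subspaces), codebooks)
    ]
    return mean_code, pq_codes
```

Removing the mean before product quantization keeps amplitude variation out of the sub-codebooks, which is exactly the failure mode of plain PQ that the abstract identifies.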
Horzyk, Adrian, Starzyk, Janusz A..
2019.
Associative Data Model in Search for Nearest Neighbors and Similar Patterns. 2019 IEEE Symposium Series on Computational Intelligence (SSCI). :933—940.
This paper introduces a biologically inspired associative data model and structure for finding nearest neighbors and similar patterns. The method can be used as an alternative to classical approaches to accelerate the search for such patterns, using various priorities for attributes according to the Sebestyen measure. The presented structure, together with the algorithms developed in this paper, can be useful in various computational intelligence tasks like pattern matching, recognition, clustering, classification, and multi-criterion search. This approach is particularly useful for the on-line operation of associative neural network graphs, which dynamically develop their structure while learning from training data. The results of experiments show that the associative approach can substantially accelerate nearest neighbor search and that associative structures can also be used as a model for kNN tasks. Finally, this paper presents how the associative structures can be used to self-organize data and represent knowledge about them in an associative way, which yields the new search approaches described in this paper.
Ahsan, Ramoza, Bashir, Muzammil, Neamtu, Rodica, Rundensteiner, Elke A., Sarkozy, Gabor.
2019.
Nearest Neighbor Subsequence Search in Time Series Data. 2019 IEEE International Conference on Big Data (Big Data). :2057—2066.
Continuous growth in sensor data and other temporal sequence data necessitates efficient retrieval and similarity search support on these big time series datasets. However, finding exact similarity results, especially at the granularity of subsequences, is known to be prohibitively costly for large data sets. In this paper, we thus propose an efficient framework for solving this exact subsequence similarity match problem, called TINN (TIme series Nearest Neighbor search). Exploiting the range interval diversity properties of time series datasets, TINN captures similarity at two levels of abstraction, namely, relationships among subsequences within each long time series and relationships across distinct time series in the data set. These relationships are compactly organized in an augmented relationship graph model, with the former relationships encoded in similarity vectors at TINN nodes and the latter captured by augmented edge types in the TINN graph. The query processing strategy deploys novel pruning techniques on the TINN graph, including node skipping and vertical and horizontal pruning, to significantly reduce the number of time series as well as subsequences to be explored. Comprehensive experiments on synthetic and real-world time series data demonstrate that our TINN model consistently outperforms state-of-the-art approaches while still guaranteeing to retrieve exact matches.
Abdelhadi, Ameer M.S., Bouganis, Christos-Savvas, Constantinides, George A..
2019.
Accelerated Approximate Nearest Neighbors Search Through Hierarchical Product Quantization. 2019 International Conference on Field-Programmable Technology (ICFPT). :90—98.
A fundamental recurring task in many machine learning applications is the search for the nearest neighbor in high-dimensional metric spaces. Towards answering queries in large-scale problems, state-of-the-art methods employ Approximate Nearest Neighbors (ANN) search, a search that returns the nearest neighbor with high probability, as well as techniques that compress the dataset. Product-Quantization (PQ) based ANN search methods have demonstrated state-of-the-art performance in several problems, including classification, regression and information retrieval. The dataset is encoded into a Cartesian product of multiple low-dimensional codebooks, enabling faster search and higher compression. Being intrinsically parallel, PQ-based ANN search approaches are amenable to hardware acceleration. This paper proposes a novel Hierarchical PQ (HPQ) based ANN search method, as well as an FPGA-tailored architecture for its implementation, that outperforms current state-of-the-art systems. HPQ gradually refines the search space, reducing the number of data comparisons and enabling a pipelined search. The mapping of the architecture onto a Stratix 10 FPGA device demonstrates over 250× speedups over current state-of-the-art systems, opening the space for addressing larger datasets and/or improving the query times of current systems.
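A hedged, software-only sketch of the coarse-to-fine refinement idea: two nested k-means quantizers stand in for the paper's PQ codebooks, and nothing about the FPGA pipeline is reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_hierarchy(X, k1=32, k2=8, seed=0):
    """Two-level hierarchy (illustrative): a coarse quantizer over the
    dataset and a finer quantizer inside each coarse cell."""
    coarse = KMeans(n_clusters=k1, n_init=4, random_state=seed).fit(X)
    fine, members = {}, {}
    for c in range(k1):
        idx = np.where(coarse.labels_ == c)[0]
        if len(idx) == 0:
            continue
        members[c] = idx
        fine[c] = KMeans(n_clusters=min(k2, len(idx)), n_init=2,
                         random_state=seed).fit(X[idx])
    return coarse, fine, members

def hierarchical_search(q, X, coarse, fine, members, n_probe=2):
    """Refine the search space level by level so only a small fraction
    of the dataset is compared against the query."""
    d_coarse = np.linalg.norm(coarse.cluster_centers_ - q, axis=1)
    cand = []
    for c in np.argsort(d_coarse)[:n_probe]:           # best coarse cells
        if c not in fine:
            continue
        f = fine[c].predict(q.reshape(1, -1))[0]       # best fine cell inside
        cand.append(members[c][fine[c].labels_ == f])
    cand = np.concatenate(cand)
    d = np.linalg.norm(X[cand] - q, axis=1)            # exact compare on survivors
    return cand[np.argmin(d)]
```

Because each level discards most of the remaining candidates before the next level runs, the levels can also be laid out as independent pipeline stages, which is the property the FPGA architecture exploits.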
Rattaphun, Munlika, Prayoonwong, Amorntip, Chiu, Chih-Yi.
2019.
Indexing in k-Nearest Neighbor Graph by Hash-Based Hill-Climbing. 2019 16th International Conference on Machine Vision Applications (MVA). :1—4.
A main issue in approximate nearest neighbor search is to achieve an excellent tradeoff between search accuracy and computation cost. In this paper, we address this issue by leveraging a k-nearest neighbor graph and hill-climbing to accelerate vector quantization in the query assignment process. A modified hill-climbing algorithm is proposed that traverses the k-nearest neighbor graph to find the closest centroids for a query, rather than calculating the query distances to all centroids. Instead of using random seeds as in the original hill-climbing algorithm, we generate high-quality seeds based on a hashing technique, which boosts query assignment efficiency due to a better start-up in hill-climbing. We evaluate the proposed method on the SIFT1M and GIST1M benchmark datasets, and show that the proposed hashing-based seed generation effectively improves search performance.
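A hedged sketch of hash-seeded hill-climbing over a k-NN graph of centroids: the `knn_graph`, `hyperplanes`, and `bucket_to_centroid` structures are hypothetical placeholders, and the random-hyperplane hashing shown here is not necessarily the paper's specific hashing technique.

```python
import numpy as np

def hill_climb_assign(q, centroids, knn_graph, seed_centroid):
    """Assign the query to a centroid by hill-climbing on the k-NN
    graph of centroids, starting from a hash-generated seed."""
    current = seed_centroid
    cur_dist = np.linalg.norm(centroids[current] - q)
    improved = True
    while improved:
        improved = False
        for nb in knn_graph[current]:              # neighbours in the k-NN graph
            d = np.linalg.norm(centroids[nb] - q)
            if d < cur_dist:
                current, cur_dist, improved = nb, d, True
    return current

def lsh_seed(q, hyperplanes, bucket_to_centroid):
    """Hash the query with random hyperplanes and map the bucket to a
    centroid, giving a better start-up than a random seed."""
    bits = tuple((hyperplanes @ q > 0).astype(int))
    return bucket_to_centroid.get(bits, 0)         # fall back to centroid 0
```

Starting from a seed that is already close to the query shortens the climb and reduces the chance of stopping at a poor local optimum, which is the gain reported over random seeding.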
Ranjan, G S K, Kumar Verma, Amar, Radhika, Sudha.
2019.
K-Nearest Neighbors and Grid Search CV Based Real Time Fault Monitoring System for Industries. 2019 IEEE 5th International Conference for Convergence in Technology (I2CT). :1—5.
Fault detection in a machine at an early stage can prevent severe damage and loss to industry. Fault detection techniques are broadly classified into three categories: signature extraction-based, model-based and knowledge-based approaches. Model-based techniques are efficient for raising an alarm signal if there is any fault in the machine. This paper focuses on one such model-based technique to identify the internal faults of an induction machine. The developed model is deployed at the end to make it feasible to use in real time. K-Nearest Neighbors (KNN) and grid search cross validation (CV) have been used to train and optimize the model to give the best results. The advantage of the proposed algorithm is its prediction accuracy, which has been observed to be 80%. Finally, a user-friendly interface has been built using Flask, a Python web framework.
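The training and tuning step maps directly onto standard scikit-learn components. The sketch below is a self-contained, hedged illustration of KNN with grid search cross-validation; synthetic data and an arbitrary parameter grid stand in for the paper's induction-machine signals and chosen hyperparameters.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for features extracted from machine signals (y = fault class).
X, y = make_classification(n_samples=1000, n_features=12, n_classes=3,
                           n_informative=6, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),          # KNN is distance-based, so scale features
    ("knn", KNeighborsClassifier()),
])
param_grid = {
    "knn__n_neighbors": [3, 5, 7, 9, 11],
    "knn__weights": ["uniform", "distance"],
    "knn__p": [1, 2],                     # Manhattan vs. Euclidean distance
}

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```

Grid search exhaustively cross-validates every parameter combination and keeps the best one, so the deployed model uses the hyperparameters that generalized best on the training data.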
Li, Xiaodong.
2019.
DURS: A Distributed Method for k-Nearest Neighbor Search on Uncertain Graphs. 2019 20th IEEE International Conference on Mobile Data Management (MDM). :377—378.
Large graphs are increasingly prevalent in mobile networks, social networks, traffic networks and biological networks. These graphs are often uncertain, where edges are augmented with probabilities that indicates the chance to exist. Recently k-nearest neighbor search has been studied within the field of uncertain graphs, but the scalability and efficiency issues are not well solved. Moreover, solutions are implemented on a single machine and thus cannot fit large uncertain graphs. In this paper, we develop a framework, called DURS, to distribute k-nearest neighbor search into several machines and re-partition the uncertain graphs to balance the work loads and reduce the communication costs. Evaluation results show that DURS is essential to make the system scalable when answering k-nearest neighbor queries on uncertain graphs.