Biblio

List
Filter

Found 46 results

Filters: Keyword is nearest neighbour methods [Clear All Filters]

2021-04-08

Sarma, M. S., Srinivas, Y., Abhiram, M., Ullala, L., Prasanthi, M. S., Rao, J. R.. 2017. Insider Threat Detection with Face Recognition and KNN User Classification. 2017 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM). :39—44.

Information Security in cloud storage is a key trepidation with regards to Degree of Trust and Cloud Penetration. Cloud user community needs to ascertain performance and security via QoS. Numerous models have been proposed [2] [3] [6][7] to deal with security concerns. Detection and prevention of insider threats are concerns that also need to be tackled. Since the attacker is aware of sensitive information, threats due to cloud insider is a grave concern. In this paper, we have proposed an authentication mechanism, which performs authentication based on verifying facial features of the cloud user, in addition to username and password, thereby acting as two factor authentication. New QoS has been proposed which is capable of monitoring and detection of insider threats using Machine Learning Techniques. KNN Classification Algorithm has been used to classify users into legitimate, possibly legitimate, possibly not legitimate and not legitimate groups to verify image authenticity to conclude, whether there is any possible insider threat. A threat detection model has also been proposed for insider threats, which utilizes Facial recognition and Monitoring models. Security Method put forth in [6] [7] is honed to include threat detection QoS to earn higher degree of trust from cloud user community. As a recommendation, Threat detection module should be harnessed in private cloud deployments like Defense and Pharma applications. Experimentation has been conducted using open source Machine Learning libraries and results have been attached in this paper.

2021-03-22

Kumar, S. A., Kumar, A., Bajaj, V., Singh, G. K.. 2020. An Improved Fuzzy Min–Max Neural Network for Data Classification. IEEE Transactions on Fuzzy Systems. 28:1910–1924.

Hyperbox classifier is an efficient tool for modern pattern classification problems due to its transparency and rigorous use of Euclidian geometry. Fuzzy min-max (FMM) network efficiently implements the hyperbox classifier, and has been modified several times to yield better classification accuracy. However, the obtained accuracy is not up to the mark. Therefore, in this paper, a new improved FMM (IFMM) network is proposed to increase the accuracy rate. In the proposed IFMM network, a modified constraint is employed to check the expandability of a hyperbox. It also uses semiperimeter of the hyperbox along with k-nearest mechanism to select the expandable hyperbox. In the proposed IFMM, the contraction rules of conventional FMM and enhanced FMM (EFMM) are also modified using semiperimeter of a hyperbox in order to balance the size of both overlapped hyperboxes. Experimental results show that the proposed IFMM network outperforms the FMM, k-nearest FMM, and EFMM by yielding more accuracy rate with less number of hyperboxes. The proposed methods are also applied to histopathological images to know the best magnification factor for classification.

2021-03-09

Herrera, A. E. Hinojosa, Walshaw, C., Bailey, C.. 2020. Improving Black Box Classification Model Veracity for Electronics Anomaly Detection. 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA). :1092–1097.

Data driven classification models are useful to assess quality of manufactured electronics. Because decisions are taken based on the models, their veracity is relevant, covering aspects such as accuracy, transparency and clarity. The proposed BB-Stepwise algorithm aims to improve the classification model transparency and accuracy of black box models. K-Nearest Neighbours (KNN) is a black box model which is easy to implement and has achieved good classification performance in different applications. In this paper KNN-Stepwise is illustrated for fault detection of electronics devices. The results achieved shows that the proposed algorithm was able to improve the accuracy, veracity and transparency of KNN models and achieve higher transparency and clarity, and at least similar accuracy than when using Decision Tree models.

2021-02-22

Li, M., Zhang, Y., Sun, Y., Wang, W., Tsang, I. W., Lin, X.. 2020. I/O Efficient Approximate Nearest Neighbour Search based on Learned Functions. 2020 IEEE 36th International Conference on Data Engineering (ICDE). :289–300.

Approximate nearest neighbour search (ANNS) in high dimensional space is a fundamental problem in many applications, such as multimedia database, computer vision and information retrieval. Among many solutions, data-sensitive hashing-based methods are effective to this problem, yet few of them are designed for external storage scenarios and hence do not optimized for I/O efficiency during the query processing. In this paper, we introduce a novel data-sensitive indexing and query processing framework for ANNS with an emphasis on optimizing the I/O efficiency, especially, the sequential I/Os. The proposed index consists of several lists of point IDs, ordered by values that are obtained by learned hashing (i.e., mapping) functions on each corresponding data point. The functions are learned from the data and approximately preserve the order in the high-dimensional space. We consider two instantiations of the functions (linear and non-linear), both learned from the data with novel objective functions. We also develop an I/O efficient ANNS framework based on the index. Comprehensive experiments on six benchmark datasets show that our proposed methods with learned index structure perform much better than the state-of-the-art external memory-based ANNS methods in terms of I/O efficiency and accuracy.

Chen, T., Lin, T., Hong, Y.- P.. 2020. Gait Phase Segmentation Using Weighted Dynamic Time Warping and K-Nearest Neighbors Graph Embedding. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :1180–1184.

Gait phase segmentation is the process of identifying the start and end of different phases within a gait cycle. It is essential to many medical applications, such as disease diagnosis or rehabilitation. This work utilizes inertial measurement units (IMUs) mounted on the individual's foot to gather gait information and develops a gait phase segmentation method based on the collected signals. The proposed method utilizes a weighted dynamic time warping (DTW) algorithm to measure the distance between two different gait signals, and a k-nearest neighbors (kNN) algorithm to obtain the gait phase estimates. To reduce the complexity of the DTW-based kNN search, we propose a neural network-based graph embedding scheme that is able to map the IMU signals associated with each gait cycle into a distance-preserving low-dimensional representation while also producing a prediction on the k nearest neighbors of the test signal. Experiments are conducted on self-collected IMU gait signals to demonstrate the effectiveness of the proposed scheme.

Fang, S., Kennedy, S., Wang, C., Wang, B., Pei, Q., Liu, X.. 2020. Sparser: Secure Nearest Neighbor Search with Space-filling Curves. IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). :370–375.

Nearest neighbor search, a classic way of identifying similar data, can be applied to various areas, including database, machine learning, natural language processing, software engineering, etc. Secure nearest neighbor search aims to find nearest neighbors to a given query point over encrypted data without accessing data in plaintext. It provides privacy protection to datasets when nearest neighbor queries need to be operated by an untrusted party (e.g., a public server). While different solutions have been proposed to support nearest neighbor queries on encrypted data, these existing solutions still encounter critical drawbacks either in efficiency or privacy. In light of the limitations in the current literature, we propose a novel approximate nearest neighbor search solution, referred to as Sparser, by leveraging a combination of space-filling curves, perturbation, and Order-Preserving Encryption. The advantages of Sparser are twofold, strengthening privacy and improving efficiency. Specifically, Sparser pre-processes plaintext data with space-filling curves and perturbation, such that data is sparse, which mitigates leakage abuse attacks and renders stronger privacy. In addition to privacy enhancement, Sparser can efficiently find approximate nearest neighbors over encrypted data with logarithmic time. Through extensive experiments over real-world datasets, we demonstrate that Sparser can achieve strong privacy protection under leakage abuse attacks and minimize search time.

Han, Z., Wang, F., Li, Z.. 2020. Research on Nearest Neighbor Data Association Algorithm Based on Target “Dynamic” Monitoring Model. 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). 1:665–668.

In order to solve the problem that the Nearest Neighbor Data Association (NNDA) algorithm cannot detect the “dynamic” change of the target, this paper proposes the nearest neighbor data association algorithm based on the Targets “Dynamic” Monitoring Model (TDMM). Firstly, the gate searching and updating of targets are completed based on TDMM, then the NNDA algorithm is utilized to achieve the data association of targets to realize track updating. Finally, the NNDA algorithm based on TDMM is realized by simulation. The experimental results show that the algorithm proposed can achieve “dynamic” monitoring in multi-target data association, and have more obvious advantages than Multiple Hypothesis Tracking (MHT) in timeliness and association performance.

Lei, X., Tu, G.-H., Liu, A. X., Xie, T.. 2020. Fast and Secure kNN Query Processing in Cloud Computing. 2020 IEEE Conference on Communications and Network Security (CNS). :1–9.

Advances in sensing and tracking technology lead to the proliferation of location-based services. Location service providers (LSPs) often resort to commercial public clouds to store the tremendous geospatial data and process location-based queries from data users. To protect the privacy of LSP's geospatial data and data user's query location against the untrusted cloud, they are required to be encrypted before sending to the cloud. Nevertheless, it is not easy to design a fast and secure location-based query processing scheme over the encrypted data. In this paper, we propose a Fast and Secure kNN (FSkNN) scheme to support secure k nearest neighbor (k NN) search in cloud computing. We reveal the inherent connection between an Sk NN protocol and a secure range query protocol and further describe how to construct FSkNN based on a secure range query protocol. FSkNN leverages a customized accuracy-assured strategy to ensure the result accuracy and adopts a data structure named random Bloom filter (RBF) to build a secure index for efficiently searching. We formally prove the security of FSkNN under the random oracle model. Our evaluation results show that FSkNN is highly practical.

Kornaropoulos, E. M., Papamanthou, C., Tamassia, R.. 2020. The State of the Uniform: Attacks on Encrypted Databases Beyond the Uniform Query Distribution. 2020 IEEE Symposium on Security and Privacy (SP). :1223–1240.

Recent foundational work on leakage-abuse attacks on encrypted databases has broadened our understanding of what an adversary can accomplish with a standard leakage profile. Nevertheless, all known value reconstruction attacks succeed under strong assumptions that may not hold in the real world. The most prevalent assumption is that queries are issued uniformly at random by the client. We present the first value reconstruction attacks that succeed without any knowledge about the query or data distribution. Our approach uses the search-pattern leakage, which exists in all known structured encryption schemes but has not been fully exploited so far. At the core of our method lies a support size estimator, a technique that utilizes the repetition of search tokens with the same response to estimate distances between encrypted values without any assumptions about the underlying distribution. We develop distribution-agnostic reconstruction attacks for both range queries and k-nearest-neighbor (k-NN) queries based on information extracted from the search-pattern leakage. Our new range attack follows a different algorithmic approach than state-of-the-art attacks, which are fine-tuned to succeed under the uniformly distributed queries. Instead, we reconstruct plaintext values under a variety of skewed query distributions and even outperform the accuracy of previous approaches under the uniform query distribution. Our new k-NN attack succeeds with far fewer samples than previous attacks and scales to much larger values of k. We demonstrate the effectiveness of our attacks by experimentally testing them on a wide range of query distributions and database densities, both unknown to the adversary.

Oliver, J., Ali, M., Hagen, J.. 2020. HAC-T and Fast Search for Similarity in Security. 2020 International Conference on Omni-layer Intelligent Systems (COINS). :1–7.

Similarity digests have gained popularity for many security applications like blacklisting/whitelisting, and finding similar variants of malware. TLSH has been shown to be particularly good at hunting similar malware, and is resistant to evasion as compared to other similarity digests like ssdeep and sdhash. Searching and clustering are fundamental tools which help the security analysts and security operations center (SOC) operators in hunting and analyzing malware. Current approaches which aim to cluster malware are not scalable enough to keep up with the vast amount of malware and goodware available in the wild. In this paper, we present techniques which allow for fast search and clustering of TLSH hash digests which can aid analysts to inspect large amounts of malware/goodware. Our approach builds on fast nearest neighbor search techniques to build a tree-based index which performs fast search based on TLSH hash digests. The tree-based index is used in our threshold based Hierarchical Agglomerative Clustering (HAC-T) algorithm which is able to cluster digests in a scalable manner. Our clustering technique can cluster digests in O (n logn) time on average. We performed an empirical evaluation by comparing our approach with many standard and recent clustering techniques. We demonstrate that our approach is much more scalable and still is able to produce good cluster quality. We measured cluster quality using purity on 10 million samples obtained from VirusTotal. We obtained a high purity score in the range from 0.97 to 0.98 using labels from five major anti-virus vendors (Kaspersky, Microsoft, Symantec, Sophos, and McAfee) which demonstrates the effectiveness of the proposed method.

2021-02-16

Başkaya, D., Samet, R.. 2020. DDoS Attacks Detection by Using Machine Learning Methods on Online Systems. 2020 5th International Conference on Computer Science and Engineering (UBMK). :52—57.

DDoS attacks impose serious threats to many large or small organizations; therefore DDoS attacks have to be detected as soon as possible. In this study, a methodology to detect DDoS attacks is proposed and implemented on online systems. In the scope of the proposed methodology, Multi Layer Perceptron (MLP), Random Forest (RF), K-Nearest Neighbor (KNN), C-Support Vector Machine (SVC) machine learning methods are used with scaling and feature reduction preprocessing methods and then effects of preprocesses on detection accuracy rates of HTTP (Hypertext Transfer Protocol) flood, TCP SYN (Transport Control Protocol Synchronize) flood, UDP (User Datagram Protocol) flood and ICMP (Internet Control Message Protocol) flood DDoS attacks are analyzed. Obtained results showed that DDoS attacks can be detected with high accuracy of 99.2%.

2021-02-01

Zhang, Y., Liu, Y., Chung, C.-L., Wei, Y.-C., Chen, C.-H.. 2020. Machine Learning Method Based on Stream Homomorphic Encryption Computing. 2020 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan). :1–2.

This study proposes a machine learning method based on stream homomorphic encryption computing for improving security and reducing computational time. A case study of mobile positioning based on k nearest neighbors ( kNN) is selected to evaluate the proposed method. The results showed the proposed method can save computational resources than others.

2021-01-25

Arthy, R., Daniel, E., Maran, T. G., Praveen, M.. 2020. A Hybrid Secure Keyword Search Scheme in Encrypted Graph for Social Media Database. 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC). :1000–1004.

Privacy preservation is a challenging task with the huge amount of data that are available in social media. The data those are stored in the distributed environment or in cloud environment need to ensure confidentiality to data. In addition, representing the voluminous data is graph will be convenient to perform keyword search. The proposed work initially reads the data corresponding to social media and converts that into a graph. In order to prevent the data from the active attacks Advanced Encryption Standard algorithm is used to perform graph encryption. Later, search operation is done using two algorithms: kNK keyword search algorithm and top k nearest keyword search algorithm. The first scheme is used to fetch all the data corresponding to the keyword. The second scheme is used to fetch the nearest neighbor. This scheme increases the efficiency of the search process. Here shortest path algorithm is used to find the minimum distance. Now, based on the minimum value the results are produced. The proposed algorithm shows high performance for graph generation and searching and moderate performance for graph encryption.

2021-01-22

Mani, G., Pasumarti, V., Bhargava, B., Vora, F. T., MacDonald, J., King, J., Kobes, J.. 2020. DeCrypto Pro: Deep Learning Based Cryptomining Malware Detection Using Performance Counters. 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS). :109—118.

Autonomy in cybersystems depends on their ability to be self-aware by understanding the intent of services and applications that are running on those systems. In case of mission-critical cybersystems that are deployed in dynamic and unpredictable environments, the newly integrated unknown applications or services can either be benign and essential for the mission or they can be cyberattacks. In some cases, these cyberattacks are evasive Advanced Persistent Threats (APTs) where the attackers remain undetected for reconnaissance in order to ascertain system features for an attack e.g. Trojan Laziok. In other cases, the attackers can use the system only for computing e.g. cryptomining malware. APTs such as cryptomining malware neither disrupt normal system functionalities nor trigger any warning signs because they simply perform bitwise and cryptographic operations as any other benign compression or encoding application. Thus, it is difficult for defense mechanisms such as antivirus applications to detect these attacks. In this paper, we propose an Operating Context profiling system based on deep neural networks-Long Short-Term Memory (LSTM) networks-using Windows Performance Counters data for detecting these evasive cryptomining applications. In addition, we propose Deep Cryptomining Profiler (DeCrypto Pro), a detection system with a novel model selection framework containing a utility function that can select a classification model for behavior profiling from both the light-weight machine learning models (Random Forest and k-Nearest Neighbors) and a deep learning model (LSTM), depending on available computing resources. Given data from performance counters, we show that individual models perform with high accuracy and can be trained with limited training data. We also show that the DeCrypto Profiler framework reduces the use of computational resources and accurately detects cryptomining applications by selecting an appropriate model, given the constraints such as data sample size and system configuration.

2021-01-20

Rashid, A., Siddique, M. J., Ahmed, S. M.. 2020. Machine and Deep Learning Based Comparative Analysis Using Hybrid Approaches for Intrusion Detection System. 2020 3rd International Conference on Advancements in Computational Sciences (ICACS). :1—9.

Intrusion detection is one of the most prominent and challenging problem faced by cybersecurity organizations. Intrusion Detection System (IDS) plays a vital role in identifying network security threats. It protects the network for vulnerable source code, viruses, worms and unauthorized intruders for many intranet/internet applications. Despite many open source APIs and tools for intrusion detection, there are still many network security problems exist. These problems are handled through the proper pre-processing, normalization, feature selection and ranking on benchmark dataset attributes prior to the enforcement of self-learning-based classification algorithms. In this paper, we have performed a comprehensive comparative analysis of the benchmark datasets NSL-KDD and CIDDS-001. For getting optimal results, we have used the hybrid feature selection and ranking methods before applying self-learning (Machine / Deep Learning) classification algorithmic approaches such as SVM, Naïve Bayes, k-NN, Neural Networks, DNN and DAE. We have analyzed the performance of IDS through some prominent performance indicator metrics such as Accuracy, Precision, Recall and F1-Score. The experimental results show that k-NN, SVM, NN and DNN classifiers perform approx. 100% accuracy regarding performance evaluation metrics on the NSL-KDD dataset whereas k-NN and Naïve Bayes classifiers perform approx. 99% accuracy on the CIDDS-001 dataset.

2020-12-14

Habibi, G., Surantha, N.. 2020. XSS Attack Detection With Machine Learning and n-Gram Methods. 2020 International Conference on Information Management and Technology (ICIMTech). :516–520.

Cross-Site Scripting (XSS) is an attack most often carried out by attackers to attack a website by inserting malicious scripts into a website. This attack will take the user to a webpage that has been specifically designed to retrieve user sessions and cookies. Nearly 68% of websites are vulnerable to XSS attacks. In this study, the authors conducted a study by evaluating several machine learning methods, namely Support Vector Machine (SVM), K-Nearest Neighbour (KNN), and Naïve Bayes (NB). The machine learning algorithm is then equipped with the n-gram method to each script feature to improve the detection performance of XSS attacks. The simulation results show that the SVM and n-gram method achieves the highest accuracy with 98%.

2020-10-29

Priyamvada Davuluru, Venkata Salini, Narayanan Narayanan, Barath, Balster, Eric J.. 2019. Convolutional Neural Networks as Classification Tools and Feature Extractors for Distinguishing Malware Programs. 2019 IEEE National Aerospace and Electronics Conference (NAECON). :273—278.

Classifying malware programs is a research area attracting great interest for Anti-Malware industry. In this research, we propose a system that visualizes malware programs as images and distinguishes those using Convolutional Neural Networks (CNNs). We study the performance of several well-established CNN based algorithms such as AlexNet, ResNet and VGG16 using transfer learning approaches. We also propose a computationally efficient CNN-based architecture for classification of malware programs. In addition, we study the performance of these CNNs as feature extractors by using Support Vector Machine (SVM) and K-nearest Neighbors (kNN) for classification purposes. We also propose fusion methods to boost the performance further. We make use of the publicly available database provided by Microsoft Malware Classification Challenge (BIG 2015) for this study. Our overall performance is 99.4% for a set of 2174 test samples comprising 9 different classes thereby setting a new benchmark.

2020-06-12

[Anonymous]. 2018. Discrete Locally-Linear Preserving Hashing. {2018 25th IEEE International Conference on Image Processing (ICIP). :490—494.

Recently, hashing has attracted considerable attention for nearest neighbor search due to its fast query speed and low storage cost. However, existing unsupervised hashing algorithms have two problems in common. Firstly, the widely utilized anchor graph construction algorithm has inherent limitations in local weight estimation. Secondly, the locally linear structure in the original feature space is seldom taken into account for binary encoding. Therefore, in this paper, we propose a novel unsupervised hashing method, dubbed “discrete locally-linear preserving hashing”, which effectively calculates the adjacent matrix while preserving the locally linear structure in the obtained hash space. Specifically, a novel local anchor embedding algorithm is adopted to construct the approximate adjacent matrix. After that, we directly minimize the reconstruction error with the discrete constrain to learn the binary codes. Experimental results on two typical image datasets indicate that the proposed method significantly outperforms the state-of-the-art unsupervised methods.

2020-06-08

Fang, Bo, Hua, Zhongyun, Huang, Hejiao. 2019. Locality-Sensitive Hashing Scheme Based on Heap Sort of Hash Bucket. 2019 14th International Conference on Computer Science Education (ICCSE). :5–10.

Nearest neighbor search (NNS) is one of the current popular research directions, which widely used in machine learning, pattern recognition, image detection and so on. In the low dimension data, based on tree search method can get good results. But when the data dimension goes up, that will produce a curse of dimensional. The proposed Locality-Sensitive Hashing algorithm (LSH) greatly improves the efficiency of nearest neighbor query for high dimensional data. But the algorithm relies on the building a large number of hash table, which makes the space complexity very high. C2LSH based on dynamic collision improves the disadvantage of LSH, but its disadvantage is that it needs to detect the collision times of a large number of data points which Increased query time. Therefore, Based on LSH algorithm, later researchers put forward many improved algorithms, but still not ideal.In this paper, we put forward Locality-Sensitive Hashing Scheme Based on Heap Sort of Hash Bucket (HSLSH) algorithm aiming at the shortcomings of LSH and C2LSH. Its main idea is to take advantage of the efficiency of heapsort in massive data sorting to improve the efficiency of nearest neighbor query. It only needs to rely on a small number of hash functions can not only overcome the shortcoming of LSH need to build a large number of hash table, and avoids defects of C2LSH. Experiments show that our algorithm is more than 20% better than C2LSH in query accuracy and 40% percent lower in query time.

2020-05-22

Sheth, Utsav, Dutta, Sanghamitra, Chaudhari, Malhar, Jeong, Haewon, Yang, Yaoqing, Kohonen, Jukka, Roos, Teemu, Grover, Pulkit. 2018. An Application of Storage-Optimal MatDot Codes for Coded Matrix Multiplication: Fast k-Nearest Neighbors Estimation. 2018 IEEE International Conference on Big Data (Big Data). :1113—1120.

We propose a novel application of coded computing to the problem of the nearest neighbor estimation using MatDot Codes (Fahim et al., Allerton'17) that are known to be optimal for matrix multiplication in terms of recovery threshold under storage constraints. In approximate nearest neighbor algorithms, it is common to construct efficient in-memory indexes to improve query response time. One such strategy is Multiple Random Projection Trees (MRPT), which reduces the set of candidate points over which Euclidean distance calculations are performed. However, this may result in a high memory footprint and possibly paging penalties for large or high-dimensional data. Here we propose two techniques to parallelize MRPT that exploit data and model parallelism respectively by dividing both the data storage and the computation efforts among different nodes in a distributed computing cluster. This is especially critical when a single compute node cannot hold the complete dataset in memory. We also propose a novel coded computation strategy based on MatDot codes for the model-parallel architecture that, in a straggler-prone environment, achieves the storage-optimal recovery threshold, i.e., the number of nodes that are required to serve a query. We experimentally demonstrate that, in the absence of straggling, our distributed approaches require less query time than execution on a single processing node, providing near-linear speedups with respect to the number of worker nodes. Our experiments on real systems with simulated straggling, we also show that in a straggler-prone environment, our strategy achieves a faster query execution than the uncoded strategy.

Despotovski, Filip, Gusev, Marjan, Zdraveski, Vladimir. 2018. Parallel Implementation of K-Nearest-Neighbors for Face Recognition. 2018 26th Telecommunications Forum (℡FOR). :1—4.

Face recognition is a fast-expanding field of research. Countless classification algorithms have found use in face recognition, with more still being developed, searching for better performance and accuracy. For high-dimensional data such as images, the K-Nearest-Neighbours classifier is a tempting choice. However, it is very computationally-intensive, as it has to perform calculations on all items in the stored dataset for each classification it makes. Fortunately, there is a way to speed up the process by performing some of the calculations in parallel. We propose a parallel CUDA implementation of the KNN classifier and then compare it to a serial implementation to demonstrate its performance superiority.

Varricchio, Valerio, Frazzoli, Emilio. 2018. Asymptotically Optimal Pruning for Nonholonomic Nearest-Neighbor Search. 2018 IEEE Conference on Decision and Control (CDC). :4459—4466.

Nearest-Neighbor Search (NNS) arises as a key component of sampling-based motion planning algorithms and it is known as their asymptotic computational bottleneck. Algorithms for exact Nearest-Neighbor Search rely on explicit distance comparisons to different extents. However, in motion planning, evaluating distances is generally a computationally demanding task, since the metric is induced by the minimum cost of steering a dynamical system between states. In the presence of driftless nonholonomic constraints, we propose efficient pruning techniques for the k-d tree algorithm that drastically reduce the number of distance evaluations performed during a query. These techniques exploit computationally convenient lower and upper bounds to the geodesic distance of the corresponding sub-Riemannian geometry. Based on asymptotic properties of the reachable sets, we show that the proposed pruning techniques are optimal, modulo a constant factor, and we provide experimental results with the Reeds-Shepp vehicle model.

Yan, Donghui, Wang, Yingjie, Wang, Jin, Wang, Honggang, Li, Zhenpeng. 2018. K-nearest Neighbor Search by Random Projection Forests. 2018 IEEE International Conference on Big Data (Big Data). :4775—4781.

K-nearest neighbor (kNN) search has wide applications in many areas, including data mining, machine learning, statistics and many applied domains. Inspired by the success of ensemble methods and the flexibility of tree-based methodology, we propose random projection forests, rpForests, for kNN search. rpForests finds kNNs by aggregating results from an ensemble of random projection trees with each constructed recursively through a series of carefully chosen random projections. rpForests achieves a remarkable accuracy in terms of fast decay in the missing rate of kNNs and that of discrepancy in the kNN distances. rpForests has a very low computational complexity. The ensemble nature of rpForests makes it easily run in parallel on multicore or clustered computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights by showing the exponential decay of the probability that neighboring points would be separated by ensemble random projection trees when the ensemble size increases. Our theory can be used to refine the choice of random projections in the growth of trees, and experiments show that the effect is remarkable.

Kang, Hyunjoong, Hong, Sanghyun, Lee, Kookjin, Park, Noseong, Kwon, Soonhyun. 2018. On Integrating Knowledge Graph Embedding into SPARQL Query Processing. 2018 IEEE International Conference on Web Services (ICWS). :371—374.

SPARQL is a standard query language for knowledge graphs (KGs). However, it is hard to find correct answer if KGs are incomplete or incorrect. Knowledge graph embedding (KGE) enables answering queries on such KGs by inferring unknown knowledge and removing incorrect knowledge. Hence, our long-term goal in this line of research is to propose a new framework that integrates KGE and SPARQL, which opens various research problems to be addressed. In this paper, we solve one of the most critical problems, that is, optimizing the performance of nearest neighbor (NN) search. In our evaluations, we demonstrate that the search time of state-of-the-art NN search algorithms is improved by 40% without sacrificing answer accuracy.

Ito, Toshitaka, Itotani, Yuri, Wakabayashi, Shin'ichi, Nagayama, Shinobu, Inagi, Masato. 2018. A Nearest Neighbor Search Engine Using Distance-Based Hashing. 2018 International Conference on Field-Programmable Technology (FPT). :150—157.

This paper proposes an FPGA-based nearest neighbor search engine for high-dimensional data, in which nearest neighbor search is performed based on distance-based hashing. The proposed hardware search engine implements a nearest neighbor search algorithm based on an extension of flexible distance-based hashing (FDH, for short), which finds an exact solution with high probability. The proposed engine is a parallel processing and pipelined circuit so that search results can be obtained in a short execution time. Experimental results show the effectiveness and efficiency of the proposed engine.