Visible to the public Biblio

Filters: Keyword is nearest neighbour methods  [Clear All Filters]
2020-05-22
Wang, Xi, Yao, Jun, Ji, Hongxia, Zhang, Ze, Li, Chen, Ma, Beizhi.  2018.  A Local Integral Hash Nearest Neighbor Algorithm. 2018 3rd International Conference on Mechanical, Control and Computer Engineering (ICMCCE). :544—548.

Nearest neighbor search algorithm plays a very important role in computer image algorithm. When the search data is large, we need to use fast search algorithm. The current fast retrieval algorithms are tree based algorithms. The efficiency of the tree algorithm decreases sharply with the increase of the data dimension. In this paper, a local integral hash nearest neighbor algorithm of the spatial space is proposed to construct the tree structure by changing the way of the node of the access tree. It is able to express data distribution characteristics. After experimental testing, this paper achieves more efficient performance in high dimensional data.

Vijay, Savinu T., Pournami, P. N..  2018.  Feature Based Image Registration using Heuristic Nearest Neighbour Search. 2018 22nd International Computer Science and Engineering Conference (ICSEC). :1—3.
Image registration is the process of aligning images of the same scene taken at different instances, from different viewpoints or by heterogeneous sensors. This can be achieved either by area based or by feature based image matching techniques. Feature based image registration focuses on detecting relevant features from the input images and attaching descriptors to these features. Matching visual descriptions of two images is a major task in image registration. This feature matching is currently done using Exhaustive Search (or Brute-Force) and Nearest Neighbour Search. The traditional method used for nearest neighbour search is by representing the data as k-d trees. This nearest neighbour search can also be performed using combinatorial optimization algorithms such as Simulated Annealing. This work proposes a method to perform image feature matching by nearest neighbour search done based on Threshold Accepting, a faster version of Simulated Annealing.The experiments performed suggest that the proposed algorithm can produce better results within a minimum number of iterations than many existing algorithms.
Song, Fuyuan, Qin, Zheng, Liu, Qin, Liang, Jinwen, Ou, Lu.  2019.  Efficient and Secure k-Nearest Neighbor Search Over Encrypted Data in Public Cloud. ICC 2019 - 2019 IEEE International Conference on Communications (ICC). :1—6.
Cloud computing has become an important and popular infrastructure for data storage and sharing. Typically, data owners outsource their massive data to a public cloud that will provide search services to authorized data users. With privacy concerns, the valuable outsourced data cannot be exposed directly, and should be encrypted before outsourcing to the public cloud. In this paper, we focus on k-Nearest Neighbor (k-NN) search over encrypted data. We propose efficient and secure k-NN search schemes based on matrix similarity to achieve efficient and secure query services in public cloud. In our basic scheme, we construct the traces of two diagonal multiplication matrices to denote the Euclidean distance of two data points, and perform secure k-NN search by comparing traces of corresponding similar matrices. In our enhanced scheme, we strengthen the security property by decomposing matrices based on our basic scheme. Security analysis shows that our schemes protect the data privacy and query privacy under attacking with different levels of background knowledge. Experimental evaluations show that both schemes are efficient in terms of computation complexity as well as computational cost.
Dubey, Abhimanyu, Maaten, Laurens van der, Yalniz, Zeki, Li, Yixuan, Mahajan, Dhruv.  2019.  Defense Against Adversarial Images Using Web-Scale Nearest-Neighbor Search. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). :8759—8768.
A plethora of recent work has shown that convolutional networks are not robust to adversarial images: images that are created by perturbing a sample from the data distribution as to maximize the loss on the perturbed example. In this work, we hypothesize that adversarial perturbations move the image away from the image manifold in the sense that there exists no physical process that could have produced the adversarial image. This hypothesis suggests that a successful defense mechanism against adversarial images should aim to project the images back onto the image manifold. We study such defense mechanisms, which approximate the projection onto the unknown image manifold by a nearest-neighbor search against a web-scale image database containing tens of billions of images. Empirical evaluations of this defense strategy on ImageNet suggest that it very effective in attack settings in which the adversary does not have access to the image database. We also propose two novel attack methods to break nearest-neighbor defense settings and show conditions under which nearest-neighbor defense fails. We perform a series of ablation experiments, which suggest that there is a trade-off between robustness and accuracy between as we use features from deeper in the network, that a large index size (hundreds of millions) is crucial to get good performance, and that careful construction of database is crucial for robustness against nearest-neighbor attacks.
Markchit, Sarawut, Chiu, Chih-Yi.  2019.  Hash Code Indexing in Cross-Modal Retrieval. 2019 International Conference on Content-Based Multimedia Indexing (CBMI). :1—4.

Cross-modal hashing, which searches nearest neighbors across different modalities in the Hamming space, has become a popular technique to overcome the storage and computation barrier in multimedia retrieval recently. Although dozens of cross-modal hashing algorithms are proposed to yield compact binary code representation, applying exhaustive search in a large-scale dataset is impractical for the real-time purpose, and the Hamming distance computation suffers inaccurate results. In this paper, we propose a novel index scheme over binary hash codes in cross-modal retrieval. The proposed indexing scheme exploits a few binary bits of the hash code as the index code. Based on the index code representation, we construct an inverted index structure to accelerate the retrieval efficiency and train a neural network to improve the indexing accuracy. Experiments are performed on two benchmark datasets for retrieval across image and text modalities, where hash codes are generated by three cross-modal hashing methods. Results show the proposed method effectively boosts the performance over the benchmark datasets and hash methods.

Yang, Jiacheng, Chen, Bin, Xia, Shu-Tao.  2019.  Mean-Removed Product Quantization for Approximate Nearest Neighbor Search. 2019 International Conference on Data Mining Workshops (ICDMW). :711—718.
Product quantization (PQ) and its variations are popular and attractive in approximate nearest neighbor search (ANN) due to their lower memory usage and faster retrieval speed. PQ decomposes the high-dimensional vector space into several low-dimensional subspaces, and quantizes each sub-vector in their subspaces, separately. Thus, PQ can generate a codebook containing an exponential number of codewords or indices by a Cartesian product of the sub-codebooks from different subspaces. However, when there is large variance in the average amplitude of the components of the data points, directly utilizing the PQ on the data points would result in poor performance. In this paper, we propose a new approach, namely, mean-removed product quantization (MRPQ) to address this issue. In fact, the average amplitude of a data point or the mean of a date point can be regarded as statistically independent of the variation of the vector, that is, of the way the components vary about this average. Then we can learn a separate scalar quantizer of the means of the data points and apply the PQ to their residual vectors. As shown in our comprehensive experiments on four large-scale public datasets, our approach can achieve substantial improvements in terms of Recall and MAP over some known methods. Moreover, our approach is general which can be combined with PQ and its variations.
Horzyk, Adrian, Starzyk, Janusz A..  2019.  Associative Data Model in Search for Nearest Neighbors and Similar Patterns. 2019 IEEE Symposium Series on Computational Intelligence (SSCI). :933—940.
This paper introduces a biologically inspired associative data model and structure for finding nearest neighbors and similar patterns. The method can be used as an alternative to the classical approaches to accelerate the search for such patterns using various priorities for attributes according to the Sebestyen measure. The presented structure, together with algorithms developed in this paper can be useful in various computational intelligence tasks like pattern matching, recognition, clustering, classification, multi-criterion search etc. This approach is particularly useful for the on-line operation of associative neural network graphs. Graphs that dynamically develop their structure during learning on training data. The results of experiments show that the associative approach can substantially accelerate the nearest neighbor search and that associative structures can also be used as a model for KNN tasks. Finally, this paper presents how the associative structures can be used to self-organize data and represent knowledge about them in the associative way, which yields new search approaches described in this paper.
Ahsan, Ramoza, Bashir, Muzammil, Neamtu, Rodica, Rundensteiner, Elke A., Sarkozy, Gabor.  2019.  Nearest Neighbor Subsequence Search in Time Series Data. 2019 IEEE International Conference on Big Data (Big Data). :2057—2066.
Continuous growth in sensor data and other temporal sequence data necessitates efficient retrieval and similarity search support on these big time series datasets. However, finding exact similarity results, especially at the granularity of subsequences, is known to be prohibitively costly for large data sets. In this paper, we thus propose an efficient framework for solving this exact subsequence similarity match problem, called TINN (TIme series Nearest Neighbor search). Exploiting the range interval diversity properties of time series datasets, TINN captures similarity at two levels of abstraction, namely, relationships among subsequences within each long time series and relationships across distinct time series in the data set. These relationships are compactly organized in an augmented relationship graph model, with the former relationships encoded in similarity vectors at TINN nodes and the later captured by augmented edge types in the TINN Graph. Query processing strategy deploy novel pruning techniques on the TINN Graph, including node skipping, vertical and horizontal pruning, to significantly reduce the number of time series as well as subsequences to be explored. Comprehensive experiments on synthetic and real world time series data demonstrate that our TINN model consistently outperforms state-of-the-art approaches while still guaranteeing to retrieve exact matches.
Abdelhadi, Ameer M.S., Bouganis, Christos-Savvas, Constantinides, George A..  2019.  Accelerated Approximate Nearest Neighbors Search Through Hierarchical Product Quantization. 2019 International Conference on Field-Programmable Technology (ICFPT). :90—98.
A fundamental recurring task in many machine learning applications is the search for the Nearest Neighbor in high dimensional metric spaces. Towards answering queries in large scale problems, state-of-the-art methods employ Approximate Nearest Neighbors (ANN) search, a search that returns the nearest neighbor with high probability, as well as techniques that compress the dataset. Product-Quantization (PQ) based ANN search methods have demonstrated state-of-the-art performance in several problems, including classification, regression and information retrieval. The dataset is encoded into a Cartesian product of multiple low-dimensional codebooks, enabling faster search and higher compression. Being intrinsically parallel, PQ-based ANN search approaches are amendable for hardware acceleration. This paper proposes a novel Hierarchical PQ (HPQ) based ANN search method as well as an FPGA-tailored architecture for its implementation that outperforms current state of the art systems. HPQ gradually refines the search space, reducing the number of data compares and enabling a pipelined search. The mapping of the architecture on a Stratix 10 FPGA device demonstrates over ×250 speedups over current state-of-the-art systems, opening the space for addressing larger datasets and/or improving the query times of current systems.
Rattaphun, Munlika, Prayoonwong, Amorntip, Chiu, Chih- Yi.  2019.  Indexing in k-Nearest Neighbor Graph by Hash-Based Hill-Climbing. 2019 16th International Conference on Machine Vision Applications (MVA). :1—4.
A main issue in approximate nearest neighbor search is to achieve an excellent tradeoff between search accuracy and computation cost. In this paper, we address this issue by leveraging k-nearest neighbor graph and hill-climbing to accelerate vector quantization in the query assignment process. A modified hill-climbing algorithm is proposed to traverse k-nearest neighbor graph to find closest centroids for a query, rather than calculating the query distances to all centroids. Instead of using random seeds in the original hill-climbing algorithm, we generate high-quality seeds based on the hashing technique. It can boost the query assignment efficiency due to a better start-up in hill-climbing. We evaluate the experiment on the benchmarks of SIFT1M and GIST1M datasets, and show the proposed hashing-based seed generation effectively improves the search performance.
Ranjan, G S K, Kumar Verma, Amar, Radhika, Sudha.  2019.  K-Nearest Neighbors and Grid Search CV Based Real Time Fault Monitoring System for Industries. 2019 IEEE 5th International Conference for Convergence in Technology (I2CT). :1—5.
Fault detection in a machine at earlier stage can prevent severe damage and loss to the industries. Fault detection techniques are broadly classified into three categories; signature extraction-based, model-based and knowledge-based approach. Model-based techniques are efficient for raising an alarm signal if there is any fault in the machine. This paper focuses on one such model based-technique to identify the internal faults of induction machine. The model developed is deployed in the end to make it feasible to use in real time. K-Nearest Neighbors (KNN) and grid search cross validation (CV) have been used to train and optimize the model to give the best results. The advantage of proposed algorithm is the accuracy in prediction which has been seen to be 80%. Finally, a user friendly interface has been built using Flask, a python web framework.
Li, Xiaodong.  2019.  DURS: A Distributed Method for k-Nearest Neighbor Search on Uncertain Graphs. 2019 20th IEEE International Conference on Mobile Data Management (MDM). :377—378.
Large graphs are increasingly prevalent in mobile networks, social networks, traffic networks and biological networks. These graphs are often uncertain, where edges are augmented with probabilities that indicates the chance to exist. Recently k-nearest neighbor search has been studied within the field of uncertain graphs, but the scalability and efficiency issues are not well solved. Moreover, solutions are implemented on a single machine and thus cannot fit large uncertain graphs. In this paper, we develop a framework, called DURS, to distribute k-nearest neighbor search into several machines and re-partition the uncertain graphs to balance the work loads and reduce the communication costs. Evaluation results show that DURS is essential to make the system scalable when answering k-nearest neighbor queries on uncertain graphs.
2020-04-10
Newaz, AKM Iqtidar, Sikder, Amit Kumar, Rahman, Mohammad Ashiqur, Uluagac, A. Selcuk.  2019.  HealthGuard: A Machine Learning-Based Security Framework for Smart Healthcare Systems. 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS). :389—396.
The integration of Internet-of-Things and pervasive computing in medical devices have made the modern healthcare system “smart.” Today, the function of the healthcare system is not limited to treat the patients only. With the help of implantable medical devices and wearables, Smart Healthcare System (SHS) can continuously monitor different vital signs of a patient and automatically detect and prevent critical medical conditions. However, these increasing functionalities of SHS raise several security concerns and attackers can exploit the SHS in numerous ways: they can impede normal function of the SHS, inject false data to change vital signs, and tamper a medical device to change the outcome of a medical emergency. In this paper, we propose HealthGuard, a novel machine learning-based security framework to detect malicious activities in a SHS. HealthGuard observes the vital signs of different connected devices of a SHS and correlates the vitals to understand the changes in body functions of the patient to distinguish benign and malicious activities. HealthGuard utilizes four different machine learning-based detection techniques (Artificial Neural Network, Decision Tree, Random Forest, k-Nearest Neighbor) to detect malicious activities in a SHS. We trained HealthGuard with data collected for eight different smart medical devices for twelve benign events including seven normal user activities and five disease-affected events. Furthermore, we evaluated the performance of HealthGuard against three different malicious threats. Our extensive evaluation shows that HealthGuard is an effective security framework for SHS with an accuracy of 91 % and an F1 score of 90 %.
2020-02-26
Rahman, Obaid, Quraishi, Mohammad Ali Gauhar, Lung, Chung-Horng.  2019.  DDoS Attacks Detection and Mitigation in SDN Using Machine Learning. 2019 IEEE World Congress on Services (SERVICES). 2642-939X:184–189.

Software Defined Networking (SDN) is very popular due to the benefits it provides such as scalability, flexibility, monitoring, and ease of innovation. However, it needs to be properly protected from security threats. One major attack that plagues the SDN network is the distributed denial-of-service (DDoS) attack. There are several approaches to prevent the DDoS attack in an SDN network. We have evaluated a few machine learning techniques, i.e., J48, Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors (K-NN), to detect and block the DDoS attack in an SDN network. The evaluation process involved training and selecting the best model for the proposed network and applying it in a mitigation and prevention script to detect and mitigate attacks. The results showed that J48 performs better than the other evaluated algorithms, especially in terms of training and testing time.

2020-02-17
Ullah, Imtiaz, Mahmoud, Qusay H..  2019.  A Two-Level Hybrid Model for Anomalous Activity Detection in IoT Networks. 2019 16th IEEE Annual Consumer Communications Networking Conference (CCNC). :1–6.
In this paper we propose a two-level hybrid anomalous activity detection model for intrusion detection in IoT networks. The level-1 model uses flow-based anomaly detection, which is capable of classifying the network traffic as normal or anomalous. The flow-based features are extracted from the CICIDS2017 and UNSW-15 datasets. If an anomaly activity is detected then the flow is forwarded to the level-2 model to find the category of the anomaly by deeply examining the contents of the packet. The level-2 model uses Recursive Feature Elimination (RFE) to select significant features and Synthetic Minority Over-Sampling Technique (SMOTE) for oversampling and Edited Nearest Neighbors (ENN) for cleaning the CICIDS2017 and UNSW-15 datasets. Our proposed model precision, recall and F score for level-1 were measured 100% for the CICIDS2017 dataset and 99% for the UNSW-15 dataset, while the level-2 model precision, recall, and F score were measured at 100 % for the CICIDS2017 dataset and 97 % for the UNSW-15 dataset. The predictor we introduce in this paper provides a solid framework for the development of malicious activity detection in IoT networks.
2019-12-09
Yang, Chao, Chen, Xinghe, Song, Tingting, Jiang, Bin, Liu, Qin.  2018.  A Hybrid Recommendation Algorithm Based on Heuristic Similarity and Trust Measure. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :1413–1418.
In this paper, we propose a hybrid collaborative filtering recommendation algorithm based on heuristic similarity and trust measure, in order to alleviate the problem of data sparsity, cold start and trust measure. Firstly, a new similarity measure is implemented by weighted fusion of multiple similarity influence factors obtained from the rating matrix, so that the similarity measure becomes more accurate. Then, a user trust relationship computing model is implemented by constructing the user's trust network based on the trust propagation theory. On this basis, a SIMT collaborative filtering algorithm is designed which integrates trust and similarity instead of the similarity in traditional collaborative filtering algorithm. Further, an improved K nearest neighbor recommendation based on clustering algorithm is implemented for generation of a better recommendation list. Finally, a comparative experiment on FilmTrust dataset shows that the proposed algorithm has improved the quality and accuracy of recommendation, thus overcome the problem of data sparsity, cold start and trust measure to a certain extent.
2019-11-12
Ferenc, Rudolf, Heged\H us, Péter, Gyimesi, Péter, Antal, Gábor, Bán, Dénes, Gyimóthy, Tibor.  2019.  Challenging Machine Learning Algorithms in Predicting Vulnerable JavaScript Functions. 2019 IEEE/ACM 7th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE). :8-14.

The rapid rise of cyber-crime activities and the growing number of devices threatened by them place software security issues in the spotlight. As around 90% of all attacks exploit known types of security issues, finding vulnerable components and applying existing mitigation techniques is a viable practical approach for fighting against cyber-crime. In this paper, we investigate how the state-of-the-art machine learning techniques, including a popular deep learning algorithm, perform in predicting functions with possible security vulnerabilities in JavaScript programs. We applied 8 machine learning algorithms to build prediction models using a new dataset constructed for this research from the vulnerability information in public databases of the Node Security Project and the Snyk platform, and code fixing patches from GitHub. We used static source code metrics as predictors and an extensive grid-search algorithm to find the best performing models. We also examined the effect of various re-sampling strategies to handle the imbalanced nature of the dataset. The best performing algorithm was KNN, which created a model for the prediction of vulnerable functions with an F-measure of 0.76 (0.91 precision and 0.66 recall). Moreover, deep learning, tree and forest based classifiers, and SVM were competitive with F-measures over 0.70. Although the F-measures did not vary significantly with the re-sampling strategies, the distribution of precision and recall did change. No re-sampling seemed to produce models preferring high precision, while re-sampling strategies balanced the IR measures.

2019-02-13
Prakash, A., Priyadarshini, R..  2018.  An Intelligent Software defined Network Controller for preventing Distributed Denial of Service Attack. 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT). :585–589.

Software Defined Network (SDN) architecture is a new and novel way of network management mechanism. In SDN, switches do not process the incoming packets like conventional network computing environment. They match for the incoming packets in the forwarding tables and if there is none it will be sent to the controller for processing which is the operating system of the SDN. A Distributed Denial of Service (DDoS) attack is a biggest threat to cyber security in SDN network. The attack will occur at the network layer or the application layer of the compromised systems that are connected to the network. In this paper a machine learning based intelligent method is proposed which can detect the incoming packets as infected or not. The different machine learning algorithms adopted for accomplishing the task are Naive Bayes, K-Nearest neighbor (KNN) and Support vector machine (SVM) to detect the anomalous behavior of the data traffic. These three algorithms are compared according to their performances and KNN is found to be the suitable one over other two. The performance measure is taken here is the detection rate of infected packets.

2018-06-20
Ren, Z., Chen, G..  2017.  EntropyVis: Malware classification. 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). :1–6.

Malware writers often develop malware with automated measures, so the number of malware has increased dramatically. Automated measures tend to repeatedly use significant modules, which form the basis for identifying malware variants and discriminating malware families. Thus, we propose a novel visualization analysis method for researching malware similarity. This method converts malicious Windows Portable Executable (PE) files into local entropy images for observing internal features of malware, and then normalizes local entropy images into entropy pixel images for malware classification. We take advantage of the Jaccard index to measure similarities between entropy pixel images and the k-Nearest Neighbor (kNN) classification algorithm to assign entropy pixel images to different malware families. Preliminary experimental results show that our visualization method can discriminate malware families effectively.

2018-06-11
Cai, Y., Huang, H., Cai, H., Qi, Y..  2017.  A K-nearest neighbor locally search regression algorithm for short-term traffic flow forecasting. 2017 9th International Conference on Modelling, Identification and Control (ICMIC). :624–629.

Accurate short-term traffic flow forecasting is of great significance for real-time traffic control, guidance and management. The k-nearest neighbor (k-NN) model is a classic data-driven method which is relatively effective yet simple to implement for short-term traffic flow forecasting. For conventional prediction mechanism of k-NN model, the k nearest neighbors' outputs weighted by similarities between the current traffic flow vector and historical traffic flow vectors is directly used to generate prediction values, so that the prediction results are always not ideal. It is observed that there are always some outliers in k nearest neighbors' outputs, which may have a bad influences on the prediction value, and the local similarities between current traffic flow and historical traffic flows at the current sampling period should have a greater relevant to the prediction value. In this paper, we focus on improving the prediction mechanism of k-NN model and proposed a k-nearest neighbor locally search regression algorithm (k-LSR). The k-LSR algorithm can use locally search strategy to search for optimal nearest neighbors' outputs and use optimal nearest neighbors' outputs weighted by local similarities to forecast short-term traffic flow so as to improve the prediction mechanism of k-NN model. The proposed algorithm is tested on the actual data and compared with other algorithms in performance. We use the root mean squared error (RMSE) as the evaluation indicator. The comparison results show that the k-LSR algorithm is more successful than the k-NN and k-nearest neighbor locally weighted regression algorithm (k-LWR) in forecasting short-term traffic flow, and which prove the superiority and good practicability of the proposed algorithm.

Ocsa, A., Huillca, J. L., Coronado, R., Quispe, O., Arbieto, C., Lopez, C..  2017.  Approximate nearest neighbors by deep hashing on large-scale search: Comparison of representations and retrieval performance. 2017 IEEE Latin American Conference on Computational Intelligence (LA-CCI). :1–6.

The growing volume of data and its increasing complexity require even more efficient and faster information retrieval techniques. Approximate nearest neighbor search algorithms based on hashing were proposed to query high-dimensional datasets due to its high retrieval speed and low storage cost. Recent studies promote the use of Convolutional Neural Network (CNN) with hashing techniques to improve the search accuracy. However, there are challenges to solve in order to find a practical and efficient solution to index CNN features, such as the need for a heavy training process to achieve accurate query results and the critical dependency on data-parameters. In this work we execute exhaustive experiments in order to compare recent methods that are able to produces a better representation of the data space with a less computational cost for a better accuracy by computing the best data-parameter values for optimal sub-space projection exploring the correlations among CNN feature attributes using fractal theory. We give an overview of these different techniques and present our comparative experiments for data representation and retrieval performance.