Biblio
Nearest neighbor search algorithm plays a very important role in computer image algorithm. When the search data is large, we need to use fast search algorithm. The current fast retrieval algorithms are tree based algorithms. The efficiency of the tree algorithm decreases sharply with the increase of the data dimension. In this paper, a local integral hash nearest neighbor algorithm of the spatial space is proposed to construct the tree structure by changing the way of the node of the access tree. It is able to express data distribution characteristics. After experimental testing, this paper achieves more efficient performance in high dimensional data.
Cross-modal hashing, which searches nearest neighbors across different modalities in the Hamming space, has become a popular technique to overcome the storage and computation barrier in multimedia retrieval recently. Although dozens of cross-modal hashing algorithms are proposed to yield compact binary code representation, applying exhaustive search in a large-scale dataset is impractical for the real-time purpose, and the Hamming distance computation suffers inaccurate results. In this paper, we propose a novel index scheme over binary hash codes in cross-modal retrieval. The proposed indexing scheme exploits a few binary bits of the hash code as the index code. Based on the index code representation, we construct an inverted index structure to accelerate the retrieval efficiency and train a neural network to improve the indexing accuracy. Experiments are performed on two benchmark datasets for retrieval across image and text modalities, where hash codes are generated by three cross-modal hashing methods. Results show the proposed method effectively boosts the performance over the benchmark datasets and hash methods.
Software Defined Networking (SDN) is very popular due to the benefits it provides such as scalability, flexibility, monitoring, and ease of innovation. However, it needs to be properly protected from security threats. One major attack that plagues the SDN network is the distributed denial-of-service (DDoS) attack. There are several approaches to prevent the DDoS attack in an SDN network. We have evaluated a few machine learning techniques, i.e., J48, Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors (K-NN), to detect and block the DDoS attack in an SDN network. The evaluation process involved training and selecting the best model for the proposed network and applying it in a mitigation and prevention script to detect and mitigate attacks. The results showed that J48 performs better than the other evaluated algorithms, especially in terms of training and testing time.
The rapid rise of cyber-crime activities and the growing number of devices threatened by them place software security issues in the spotlight. As around 90% of all attacks exploit known types of security issues, finding vulnerable components and applying existing mitigation techniques is a viable practical approach for fighting against cyber-crime. In this paper, we investigate how the state-of-the-art machine learning techniques, including a popular deep learning algorithm, perform in predicting functions with possible security vulnerabilities in JavaScript programs. We applied 8 machine learning algorithms to build prediction models using a new dataset constructed for this research from the vulnerability information in public databases of the Node Security Project and the Snyk platform, and code fixing patches from GitHub. We used static source code metrics as predictors and an extensive grid-search algorithm to find the best performing models. We also examined the effect of various re-sampling strategies to handle the imbalanced nature of the dataset. The best performing algorithm was KNN, which created a model for the prediction of vulnerable functions with an F-measure of 0.76 (0.91 precision and 0.66 recall). Moreover, deep learning, tree and forest based classifiers, and SVM were competitive with F-measures over 0.70. Although the F-measures did not vary significantly with the re-sampling strategies, the distribution of precision and recall did change. No re-sampling seemed to produce models preferring high precision, while re-sampling strategies balanced the IR measures.
Software Defined Network (SDN) architecture is a new and novel way of network management mechanism. In SDN, switches do not process the incoming packets like conventional network computing environment. They match for the incoming packets in the forwarding tables and if there is none it will be sent to the controller for processing which is the operating system of the SDN. A Distributed Denial of Service (DDoS) attack is a biggest threat to cyber security in SDN network. The attack will occur at the network layer or the application layer of the compromised systems that are connected to the network. In this paper a machine learning based intelligent method is proposed which can detect the incoming packets as infected or not. The different machine learning algorithms adopted for accomplishing the task are Naive Bayes, K-Nearest neighbor (KNN) and Support vector machine (SVM) to detect the anomalous behavior of the data traffic. These three algorithms are compared according to their performances and KNN is found to be the suitable one over other two. The performance measure is taken here is the detection rate of infected packets.
Malware writers often develop malware with automated measures, so the number of malware has increased dramatically. Automated measures tend to repeatedly use significant modules, which form the basis for identifying malware variants and discriminating malware families. Thus, we propose a novel visualization analysis method for researching malware similarity. This method converts malicious Windows Portable Executable (PE) files into local entropy images for observing internal features of malware, and then normalizes local entropy images into entropy pixel images for malware classification. We take advantage of the Jaccard index to measure similarities between entropy pixel images and the k-Nearest Neighbor (kNN) classification algorithm to assign entropy pixel images to different malware families. Preliminary experimental results show that our visualization method can discriminate malware families effectively.
Accurate short-term traffic flow forecasting is of great significance for real-time traffic control, guidance and management. The k-nearest neighbor (k-NN) model is a classic data-driven method which is relatively effective yet simple to implement for short-term traffic flow forecasting. For conventional prediction mechanism of k-NN model, the k nearest neighbors' outputs weighted by similarities between the current traffic flow vector and historical traffic flow vectors is directly used to generate prediction values, so that the prediction results are always not ideal. It is observed that there are always some outliers in k nearest neighbors' outputs, which may have a bad influences on the prediction value, and the local similarities between current traffic flow and historical traffic flows at the current sampling period should have a greater relevant to the prediction value. In this paper, we focus on improving the prediction mechanism of k-NN model and proposed a k-nearest neighbor locally search regression algorithm (k-LSR). The k-LSR algorithm can use locally search strategy to search for optimal nearest neighbors' outputs and use optimal nearest neighbors' outputs weighted by local similarities to forecast short-term traffic flow so as to improve the prediction mechanism of k-NN model. The proposed algorithm is tested on the actual data and compared with other algorithms in performance. We use the root mean squared error (RMSE) as the evaluation indicator. The comparison results show that the k-LSR algorithm is more successful than the k-NN and k-nearest neighbor locally weighted regression algorithm (k-LWR) in forecasting short-term traffic flow, and which prove the superiority and good practicability of the proposed algorithm.
The growing volume of data and its increasing complexity require even more efficient and faster information retrieval techniques. Approximate nearest neighbor search algorithms based on hashing were proposed to query high-dimensional datasets due to its high retrieval speed and low storage cost. Recent studies promote the use of Convolutional Neural Network (CNN) with hashing techniques to improve the search accuracy. However, there are challenges to solve in order to find a practical and efficient solution to index CNN features, such as the need for a heavy training process to achieve accurate query results and the critical dependency on data-parameters. In this work we execute exhaustive experiments in order to compare recent methods that are able to produces a better representation of the data space with a less computational cost for a better accuracy by computing the best data-parameter values for optimal sub-space projection exploring the correlations among CNN feature attributes using fractal theory. We give an overview of these different techniques and present our comparative experiments for data representation and retrieval performance.