Title | Big Data Nearest Neighbor Similar Data Retrieval Algorithm based on Improved Random Forest |
Publication Type | Conference Paper |
Year of Publication | 2021 |
Authors | Yang, Cuicui, Liu, Pinjie |
Conference Name | 2021 International Conference on Big Data Analysis and Computer Science (BDACS) |
Date Published | June |
Keywords | Big Data, Computer science, Data models, Decision trees, Feature extraction, Hamming distance, Improved random forest, Measurement, Metrics, Nearest neighbor search, pubcrawl, Search algorithm, Similar data, Training |
Abstract | In nearest neighbor retrieval of similar data from big data, retrieval accuracy is low because of the way data features are extracted. This paper therefore proposes a big data nearest neighbor similar data retrieval algorithm based on an improved random forest. By improving the random forest model and constructing random decision trees, the characteristics of the current nearest neighbor big data are identified. Hash codes are then generated from the improved random forest. Finally, nearest neighbor similar data retrieval over big data is performed using the Hamming distance. The experimental results show that in the multi-label environment, retrieval accuracy is improved by 9% and 10%, and in the single-label environment, similar data retrieval accuracy is improved by 12% and 28%, respectively. |
DOI | 10.1109/BDACS53596.2021.00046 |
Citation Key | yang_big_2021 |
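The abstract outlines a pipeline (random decision trees, then hash codes, then Hamming distance ranking) without specifying how the improved random forest or its hash codes are constructed. The following Python sketch illustrates that general pipeline under a simpler, assumed scheme: completely random trees whose root-to-leaf path bits are concatenated into one binary code per item, followed by ranking by Hamming distance. It is not the authors' method, and all names and parameters (`build_forest`, `hash_code`, `nearest_neighbors`, `n_trees`, `depth`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# An internal node of a random tree: (feature index, split threshold).
Node = Tuple[int, float]


def build_random_tree(data: Sequence[Sequence[float]], depth: int,
                      rng: random.Random) -> List[Node]:
    """Build a complete random tree stored in heap order (children of i are 2i+1, 2i+2).
    Hypothetical simplification: each split picks a random feature and a threshold
    drawn uniformly from that feature's range over the whole dataset."""
    n_features = len(data[0])
    nodes: List[Node] = []
    for _ in range(2 ** depth - 1):
        f = rng.randrange(n_features)
        lo = min(row[f] for row in data)
        hi = max(row[f] for row in data)
        nodes.append((f, rng.uniform(lo, hi)))
    return nodes


def build_forest(data: Sequence[Sequence[float]], n_trees: int, depth: int,
                 seed: int = 0) -> List[List[Node]]:
    """Build n_trees independent random trees with a reproducible seed."""
    rng = random.Random(seed)
    return [build_random_tree(data, depth, rng) for _ in range(n_trees)]


def tree_bits(tree: List[Node], depth: int, x: Sequence[float]) -> int:
    """Route x from the root to a leaf; each left/right decision contributes one bit."""
    idx, bits = 0, 0
    for _ in range(depth):
        f, t = tree[idx]
        bit = 1 if x[f] > t else 0
        bits = (bits << 1) | bit
        idx = 2 * idx + 1 + bit
    return bits


def hash_code(forest: List[List[Node]], depth: int, x: Sequence[float]) -> int:
    """Concatenate the path bits of every tree into one n_trees * depth bit code."""
    code = 0
    for tree in forest:
        code = (code << depth) | tree_bits(tree, depth, x)
    return code


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes: popcount of their XOR."""
    return bin(a ^ b).count("1")


def nearest_neighbors(query_code: int, db_codes: Sequence[int], k: int) -> List[int]:
    """Return indices of the k database codes closest to the query (ties by index)."""
    order = sorted(range(len(db_codes)),
                   key=lambda i: (hamming(query_code, db_codes[i]), i))
    return order[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(4)] for _ in range(20)]
    forest = build_forest(data, n_trees=8, depth=3, seed=1)
    codes = [hash_code(forest, 3, x) for x in data]
    print(nearest_neighbors(codes[0], codes, k=5))
```

Ranking here is a plain linear scan over all codes, kept simple for illustration; items that fall into the same leaves in most trees end up with small Hamming distances.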