Big Data Nearest Neighbor Similar Data Retrieval Algorithm based on Improved Random Forest

Title: Big Data Nearest Neighbor Similar Data Retrieval Algorithm based on Improved Random Forest
Publication Type: Conference Paper
Year of Publication: 2021
Authors: Yang, Cuicui; Liu, Pinjie
Conference Name: 2021 International Conference on Big Data Analysis and Computer Science (BDACS)
Date Published: June
Keywords: Big Data, Computer science, Data models, Decision trees, feature extraction, Hamming distance, improve random forest, Measurement, Metrics, nearest neighbor search, pubcrawl, search algorithm, similar data, Training
Abstract: In nearest neighbor retrieval of similar data from big data, retrieval accuracy is low because of the way data features are extracted. This paper therefore proposes a big data nearest neighbor similar data retrieval algorithm based on an improved random forest. By improving the random forest model and constructing random decision trees, the characteristics of the current nearest neighbor big data are clarified. Hash codes are then generated from the improved random forest. Finally, combined with the Hamming distance calculation method, nearest neighbor retrieval of similar data in big data is realized. The experimental results show that in the multi-label environment the retrieval accuracy is improved by 9% and 10%. In the single-label environment, the similar data retrieval accuracy of the algorithm is improved by 12% and 28%, respectively.
DOI: 10.1109/BDACS53596.2021.00046
Citation Key: yang_big_2021
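
The pipeline outlined in the abstract (tree-based hash code generation followed by Hamming distance ranking) can be illustrated with a minimal sketch. This is not the authors' improved random forest, whose details this record does not give. As a loudly simplified stand-in, each "tree" is assumed to be a depth-1 random split (a stump) that contributes one bit of the hash code. The names `build_random_stumps`, `hash_code`, `hamming`, and `retrieve` are hypothetical and chosen only for illustration.

```python
import random
from typing import List, Sequence, Tuple

# ASSUMPTION: a "tree" is reduced to a single random split (feature index, threshold).
# The paper's improved random forest is not specified in this record.
Stump = Tuple[int, float]


def build_random_stumps(data: Sequence[Sequence[float]], n_bits: int, seed: int = 0) -> List[Stump]:
    """Draw n_bits random splits; each split later yields one bit of the hash code."""
    rng = random.Random(seed)
    n_features = len(data[0])
    stumps = []
    for _ in range(n_bits):
        feature = rng.randrange(n_features)
        values = [row[feature] for row in data]
        threshold = rng.uniform(min(values), max(values))
        stumps.append((feature, threshold))
    return stumps


def hash_code(x: Sequence[float], stumps: Sequence[Stump]) -> int:
    """Pack one bit per stump into an integer (first stump is the most significant bit)."""
    code = 0
    for feature, threshold in stumps:
        code = (code << 1) | (1 if x[feature] > threshold else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def retrieve(query: Sequence[float], data: Sequence[Sequence[float]],
             stumps: Sequence[Stump], k: int = 1) -> List[int]:
    """Return indices of the k items whose hash codes are closest to the query's code."""
    q = hash_code(query, stumps)
    codes = [hash_code(row, stumps) for row in data]
    order = sorted(range(len(data)), key=lambda i: hamming(q, codes[i]))
    return order[:k]


if __name__ == "__main__":
    rng = random.Random(1)
    points = [[rng.random() for _ in range(4)] for _ in range(20)]
    stumps = build_random_stumps(points, n_bits=16)
    print(retrieve(points[3], points, stumps, k=3))
```

Ranking by Hamming distance over compact binary codes only needs an XOR and a bit count per comparison, which is the usual reason hashing is paired with this distance for large collections.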