Research on Small Sample Text Classification Based on Attribute Extraction and Data Augmentation

Submitted by grigby1 on Thu, 05/19/2022 - 1:23pm

Title	Research on Small Sample Text Classification Based on Attribute Extraction and Data Augmentation
Publication Type	Conference Paper
Year of Publication	2021
Authors	Ni Qing-chao, Yin Cong-jue, Zhao Dong-hua
Conference Name	2021 IEEE 6th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA)
Date Published	April
Keywords	Analytical models, attribute extraction network, BERT, Bit error rate, classification of charges, composability, data augmentation, Data models, Human Behavior, Law, legal intelligence, Metrics, pubcrawl, Scalability, Semantics, text analytics, text categorization, Training
Abstract	With the development of deep learning and natural language processing, and with the continuing disclosure of judicial data such as court documents, legal intelligence has gradually become a research hot spot. Crime classification is an important branch of text classification that can help legal practitioners work more efficiently. In practice, however, the sample data is small and the distribution of crime categories is unbalanced. To address these two problems, BERT is used as the encoder to cope with the small data volume, and an attribute extraction network is added to cope with the unbalanced distribution. The resulting model achieves an accuracy of 90.35% and an F1 value of 67.62 on a small sample data set, which is close to the best model performance achieved with sufficient data. A text augmentation method based on back-translation is also proposed and evaluated with different models. The experiments show that back-translation improves the LSTM model to some extent but does not improve BERT.
DOI	10.1109/ICCCBDA51879.2021.9442500
Citation Key	qing-chao_research_2021
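
The back-translation augmentation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say which translation system or pivot language the paper uses, so the two word-lookup tables below are hypothetical toy stand-ins for a real machine translation model, and the function names `back_translate` and `augment` are assumptions made for illustration. The idea shown is only the general one: translate a sample into a pivot language and back, then keep the result as an extra training example with the same label if it differs from the original.

```python
from typing import Callable, List, Tuple

# Hypothetical toy lookup tables standing in for a real translation model.
EN_TO_PIVOT = {"the": "la", "stole": "voler", "a": "une", "car": "voiture"}
PIVOT_TO_EN = {"la": "the", "voler": "took", "une": "a", "voiture": "vehicle"}


def _lookup(text: str, table: dict) -> str:
    """Translate word by word; words missing from the table pass through unchanged."""
    return " ".join(table.get(word, word) for word in text.split())


def to_pivot(text: str) -> str:
    """Toy source-to-pivot translator (assumption, not a real MT system)."""
    return _lookup(text, EN_TO_PIVOT)


def from_pivot(text: str) -> str:
    """Toy pivot-to-source translator (assumption, not a real MT system)."""
    return _lookup(text, PIVOT_TO_EN)


def back_translate(
    text: str,
    forward: Callable[[str], str],
    backward: Callable[[str], str],
) -> str:
    """Round-trip a sentence through the pivot language to obtain a paraphrase."""
    return backward(forward(text))


def augment(
    samples: List[Tuple[str, str]],
    forward: Callable[[str], str],
    backward: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return the original samples plus label-preserving paraphrases.

    A paraphrase is added only when the round trip actually changes the text,
    so the augmented set contains no exact duplicates of its source sentences.
    """
    augmented = list(samples)
    for text, label in samples:
        paraphrase = back_translate(text, forward, backward)
        if paraphrase != text:
            augmented.append((paraphrase, label))
    return augmented


samples = [
    ("the suspect stole a car", "theft"),
    ("he signed the contract", "fraud"),
]
for text, label in augment(samples, to_pivot, from_pivot):
    print(label, "|", text)
```

In a real pipeline the two translator callables would wrap an actual translation model, which is why they are passed in as parameters rather than hard-coded.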
