Research on Small Sample Text Classification Based on Attribute Extraction and Data Augmentation

Title: Research on Small Sample Text Classification Based on Attribute Extraction and Data Augmentation
Publication Type: Conference Paper
Year of Publication: 2021
Authors: Qing-chao, Ni; Cong-jue, Yin; Dong-hua, Zhao
Conference Name: 2021 IEEE 6th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA)
Date Published: April
Keywords: Analytical models, attribute extraction network, BERT, Bit error rate, classification of charges, composability, data augmentation, Data models, Human Behavior, Law, legal intelligence, Metrics, pubcrawl, Scalability, Semantics, text analytics, text categorization, Training
Abstract: With the development of deep learning and progress in natural language processing, together with the continuing disclosure of judicial data such as judicial documents, legal intelligence has gradually become a research hotspot. Crime classification is an important branch of text classification that can help legal practitioners work more efficiently. In practice, however, the available sample data is small and the distribution of crime categories is imbalanced. To address these two problems, BERT is used as the encoder to cope with the small data volume, and an attribute extraction network is added to cope with the imbalanced distribution. The resulting model achieves an accuracy of 90.35% and an F1 value of 67.62 on a small-sample data set, close to the performance of the best model trained with sufficient data. A text augmentation method based on back-translation is also proposed and evaluated with different models: the LSTM model improves to some extent, but BERT does not.
DOI: 10.1109/ICCCBDA51879.2021.9442500
Citation Key: qing-chao_research_2021
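
The back-translation augmentation mentioned in the abstract can be illustrated with a minimal sketch. The record does not say which translation system or pivot language the authors used, so the translators here are passed in as plain callables, and `back_translate`, `augment`, `toy_to_pivot`, and `toy_from_pivot` are hypothetical names introduced only for illustration. The toy translators are string substitutions standing in for a real machine translation model; this is not the authors' implementation.

```python
from typing import Callable, List, Tuple

# A translator is any function from text to text (assumed interface).
Translator = Callable[[str], str]


def back_translate(text: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate into a pivot language and back to obtain a paraphrase."""
    return from_pivot(to_pivot(text))


def augment(
    samples: List[Tuple[str, str]],
    to_pivot: Translator,
    from_pivot: Translator,
) -> List[Tuple[str, str]]:
    """Return the original (text, label) pairs plus any new paraphrases.

    A paraphrase keeps the label of its source text. Paraphrases that are
    empty or identical to a text already seen are skipped.
    """
    augmented = list(samples)
    seen = {text for text, _ in samples}
    for text, label in samples:
        paraphrase = back_translate(text, to_pivot, from_pivot)
        if paraphrase and paraphrase not in seen:
            seen.add(paraphrase)
            augmented.append((paraphrase, label))
    return augmented


# Toy stand-ins for a real translation model (illustrative only).
def toy_to_pivot(text: str) -> str:
    return text.replace("stole", "VOLER")


def toy_from_pivot(text: str) -> str:
    return text.replace("VOLER", "took")


train = [
    ("The defendant stole a car", "theft"),
    ("The defendant set a fire", "arson"),
]
augmented_train = augment(train, toy_to_pivot, toy_from_pivot)
```

In this toy run only the first sample yields a new paraphrase, so the augmented set gains one extra "theft" example. With a real translation model, the two callables would wrap calls to that model instead.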