Exploring linguistic features for extremist texts detection (on the material of Russian-speaking illegal texts)

Title: Exploring linguistic features for extremist texts detection (on the material of Russian-speaking illegal texts)
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Devyatkin, D., Smirnov, I., Ananyeva, M., Kobozeva, M., Chepovskiy, A., Solovyev, F.
Conference Name: 2017 IEEE International Conference on Intelligence and Security Informatics (ISI)
Date Published: July
Keywords: automatic extremist text detection, Bayes methods, classification methods, classification quality, Dictionaries, differentiating feature, extremist texts, feature extraction, gradient boosting, Human Behavior, learning (artificial intelligence), linear SVM, linguistic features, logistic regression, multinomial naive Bayes, natural language processing, pattern classification, Pragmatics, psycholinguistic features, pubcrawl, Random Forest, regression analysis, Resiliency, Russian language, Russian legislation, Russian-speaking illegal text material, Scalability, semantic features, Semantics, Social network services, Support vector machines, Terrorism, text analysis, text classification, text detection
Abstract

In this paper we present the results of research on automatic extremist text detection. For this purpose, an experimental dataset in the Russian language was created; under Russian legislation, we cannot make it publicly available. We compared various classification methods (multinomial naive Bayes, logistic regression, linear SVM, random forest, and gradient boosting) and evaluated the contribution of differentiating features (lexical, semantic, and psycholinguistic) to classification quality. The experimental results show that psycholinguistic and semantic features are promising for extremist text detection.
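
As an illustration of the simplest method named in the abstract, the following is a minimal sketch of a multinomial naive Bayes classifier over lexical (bag-of-words) features. It is not the authors' implementation: the function names `train_multinomial_nb` and `predict`, the smoothing parameter `alpha`, and the toy two-topic corpus are all assumptions made for this example, and the corpus is unrelated to the paper's dataset, which is not publicly available.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit multinomial naive Bayes on whitespace-tokenised documents.

    alpha is the additive (Laplace) smoothing constant; its value here is
    an assumption for the sketch, not a setting taken from the paper.
    """
    class_doc_counts = Counter(labels)
    token_counts = defaultdict(Counter)  # label -> token -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        token_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"vocab": vocab, "log_prior": {}, "log_likelihood": {}}
    for label, n_docs in class_doc_counts.items():
        # Class prior: share of training documents carrying this label.
        model["log_prior"][label] = math.log(n_docs / len(docs))
        # Smoothed per-token likelihoods over the shared vocabulary.
        denom = sum(token_counts[label].values()) + alpha * len(vocab)
        model["log_likelihood"][label] = {
            tok: math.log((token_counts[label][tok] + alpha) / denom)
            for tok in vocab
        }
    return model


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    # Tokens never seen in training are ignored.
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    scores = {}
    for label, log_prior in model["log_prior"].items():
        log_lik = model["log_likelihood"][label]
        scores[label] = log_prior + sum(log_lik[t] for t in tokens)
    return max(scores, key=scores.get)


# Hypothetical toy corpus with neutral topics, used only to exercise the code.
docs = [
    "the team won the match",
    "the striker scored a goal",
    "boil the pasta in salted water",
    "bake the bread in the oven",
]
labels = ["sports", "sports", "cooking", "cooking"]
model = train_multinomial_nb(docs, labels)
```

Calling `predict(model, "bake the pasta")` scores each class by its log prior plus the summed log likelihoods of the known tokens. The semantic and psycholinguistic features evaluated in the paper would need additional linguistic analysis that this lexical sketch does not attempt.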

URL: https://ieeexplore.ieee.org/document/8004907/
DOI: 10.1109/ISI.2017.8004907
Citation Key: devyatkin_exploring_2017