Exploring linguistic features for extremist texts detection (on the material of Russian-speaking illegal texts)

Title	Exploring linguistic features for extremist texts detection (on the material of Russian-speaking illegal texts)
Publication Type	Conference Paper
Year of Publication	2017
Authors	Devyatkin, D., Smirnov, I., Ananyeva, M., Kobozeva, M., Chepovskiy, A., Solovyev, F.
Conference Name	2017 IEEE International Conference on Intelligence and Security Informatics (ISI)
Date Published	July
Keywords	automatic extremist text detection, Bayes methods, classification methods, classification quality, Dictionaries, differentiating feature, extremist texts, feature extraction, gradient boosting, Human Behavior, learning (artificial intelligence), linear SVM, linguistic features, logistic regression, multinomial naive Bayes, natural language processing, pattern classification, Pragmatics, psycholinguistic features, pubcrawl, Random Forest, regression analysis, Resiliency, Russian language, Russian legislation, Russian-speaking illegal text material, Scalability, semantic features, Semantics, Social network services, Support vector machines, Terrorism, text analysis, text classification, text detection
Abstract	In this paper we present the results of research on automatic extremist text detection. For this purpose, an experimental dataset in the Russian language was created. Under Russian legislation, we cannot make it publicly available. We compared various classification methods (multinomial naive Bayes, logistic regression, linear SVM, random forest, and gradient boosting) and evaluated the contribution of differentiating features (lexical, semantic, and psycholinguistic) to classification quality. The experimental results show that psycholinguistic and semantic features are promising for extremist text detection.
URL	https://ieeexplore.ieee.org/document/8004907/
DOI	10.1109/ISI.2017.8004907
Citation Key	devyatkin_exploring_2017