Exploring linguistic features for extremist texts detection (on the material of Russian-speaking illegal texts)
Title | Exploring linguistic features for extremist texts detection (on the material of Russian-speaking illegal texts) |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Devyatkin, D., Smirnov, I., Ananyeva, M., Kobozeva, M., Chepovskiy, A., Solovyev, F. |
Conference Name | 2017 IEEE International Conference on Intelligence and Security Informatics (ISI) |
Date Published | July |
Keywords | automatic extremist text detection, Bayes methods, classification methods, classification quality, Dictionaries, differentiating feature, extremist texts, feature extraction, gradient boosting, Human Behavior, learning (artificial intelligence), linear SVM, linguistic features, logistic regression, multinomial naive Bayes, natural language processing, pattern classification, Pragmatics, psycholinguistic features, pubcrawl, Random Forest, regression analysis, Resiliency, Russian language, Russian legislation, Russian-speaking illegal text material, Scalability, semantic features, Semantics, Social network services, Support vector machines, Terrorism, text analysis, text classification, text detection |
Abstract | In this paper we present the results of research on automatic extremist text detection. For this purpose, an experimental dataset in the Russian language was created. Under Russian legislation, we cannot make it publicly available. We compared various classification methods (multinomial naive Bayes, logistic regression, linear SVM, random forest, and gradient boosting) and evaluated the contribution of differentiating features (lexical, semantic, and psycholinguistic) to classification quality. The experimental results show that psycholinguistic and semantic features are promising for extremist text detection. |
URL | https://ieeexplore.ieee.org/document/8004907/ |
DOI | 10.1109/ISI.2017.8004907 |
Citation Key | devyatkin_exploring_2017 |