Biblio
Filters: Author is Ghanbari, Shirin [Clear All Filters]
Correction of Spaces in Persian Sentences for Tokenization. 2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI). :670–674.
.
2019. The exponential growth of the Internet and its users and the emergence of Web 2.0 have caused a large volume of textual data to be created. Automatic analysis of such data can be used in making decisions. As online text is created by different producers with different styles of writing, pre-processing is a necessity prior to any processes related to natural language tasks. An essential part of textual preprocessing prior to the recognition of the word vocabulary is normalization, which includes the correction of spaces that particularly in the Persian language this includes both full-spaces between words and half-spaces. Through the review of user comments within social media services, it can be seen that in many cases users do not adhere to grammatical rules of inserting both forms of spaces, which increases the complexity of the identification of words and henceforth, reducing the accuracy of further processing on the text. In this study, current issues in the normalization and tokenization of preprocessing tools within the Persian language and essentially identifying and correcting the separation of words are and the correction of spaces are proposed. The results obtained and compared to leading preprocessing tools highlight the significance of the proposed methodology.
Extractive Persian Summarizer for News Websites. 2019 5th International Conference on Web Research (ICWR). :85–89.
.
2019. Automatic extractive text summarization is the process of condensing textual information while preserving the important concepts. The proposed method after performing pre-processing on input Persian news articles generates a feature vector of salient sentences from a combination of statistical, semantic and heuristic methods and that are scored and concatenated accordingly. The scoring of the salient features is based on the article's title, proper nouns, pronouns, sentence length, keywords, topic words, sentence position, English words, and quotations. Experimental results on measurements including recall, F-measure, ROUGE-N are presented and compared to other Persian summarizers and shown to provide higher performance.