Title | E-Mail Classification Using Natural Language Processing |
Publication Type | Conference Paper |
Year of Publication | 2019 |
Authors | Sel, İlhami, Hanbay, Davut |
Conference Name | 2019 27th Signal Processing and Communications Applications Conference (SIU) |
Date Published | April |
Keywords | academic notices, business correspondence, Classification algorithms, e-mail classification, Electronic mail, Human Behavior, k-means, k-means algorithm, natural language processing, pattern classification, Postal services, pubcrawl, reminders, Resiliency, Scalability, serious communication tool, Skip Gram, spam e-mails, Support vector machines, test phase M3 model, text analysis, text classification, Tokenization, unsolicited e-mail, unsupervised learning, unsupervised training model, Web page memberships, Word2Vec, Word2Vec algorithm |
Abstract | With the rapid growth of technology and electronic communication, e-mail has become a serious communication tool. In many applications, such as business correspondence, reminders, academic notices, and web page memberships, e-mail is the primary means of communication. Even if spam e-mails are ignored, hundreds of e-mails are still received every day. To determine the importance of received e-mails, the subject or content of each one must be checked. In this study, we propose an unsupervised system for classifying received e-mails. The coordinates of each received e-mail are determined with the Word2Vec algorithm, a natural language processing method. The processed data are then grouped by similarity using the k-means algorithm, with an unsupervised training model. In this study, 10517 e-mails were used for training. The system was tested on a test group of 200 e-mails. In the test phase, the M3 model (window size 3, minimum word frequency 10, skip-gram) achieved the highest success rate (91%). The results are evaluated in Section VI. |
DOI | 10.1109/SIU.2019.8806593 |
Citation Key | sel_e-mail_2019 |
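
The pipeline summarized in the abstract (word vectors, one coordinate per e-mail, k-means grouping) can be illustrated with a minimal sketch. This is not the authors' implementation. The 2-D vectors in `TOY_VECTORS` are hypothetical stand-ins for trained Word2Vec embeddings. Averaging word vectors to obtain an e-mail's coordinates is an assumption, since the abstract does not say how the coordinates are derived. The helper names and the choice of two clusters are likewise illustrative.

```python
import random

# Hypothetical 2-D word vectors; a real system would train Word2Vec instead.
TOY_VECTORS = {
    "meeting": (0.9, 0.1), "invoice": (0.8, 0.2), "deadline": (1.0, 0.0),
    "lecture": (0.1, 0.9), "exam": (0.0, 1.0), "seminar": (0.2, 0.8),
}


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real tokenization)."""
    return text.lower().split()


def email_vector(text, vectors):
    """Assumed step: average the vectors of known tokens, one point per e-mail."""
    known = [vectors[t] for t in tokenize(text) if t in vectors]
    dim = len(next(iter(vectors.values())))
    if not known:
        return (0.0,) * dim  # no known words: fall back to the origin
    return tuple(sum(v[i] for v in known) / len(known) for i in range(dim))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def cluster_emails(emails, vectors=TOY_VECTORS, k=2):
    """Map each e-mail to a point, then group the points with k-means."""
    points = [email_vector(e, vectors) for e in emails]
    labels, _ = kmeans(points, k)
    return labels
```

Calling `cluster_emails` on a list of e-mail texts returns one cluster index per e-mail. The indices are arbitrary, so only whether two e-mails share a cluster is meaningful.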