E-Mail Classification Using Natural Language Processing

Title: E-Mail Classification Using Natural Language Processing
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Sel, İlhami; Hanbay, Davut
Conference Name: 2019 27th Signal Processing and Communications Applications Conference (SIU)
Date Published: April
Keywords: academic notices, business correspondence, Classification algorithms, e-mail classification, Electronic mail, Human Behavior, k-means, k-means algorithm, natural language processing, pattern classification, Postal services, pubcrawl, reminders, Resiliency, Scalability, serious communication tool, Skip Gram, spam e-mails, Support vector machines, test phase M3 model, text analysis, text classification, Tokenization, unsolicited e-mail, unsupervised learning, unsupervised training model, Web page memberships, Word2Vec, Word2Vec algorithm
Abstract: Thanks to rapid advances in technology and electronic communications, e-mail has become a serious communication tool. In many applications, such as business correspondence, reminders, academic notices, and web page memberships, e-mail is the primary means of communication. Even if spam is ignored, hundreds of e-mails are still received every day. To determine the importance of received e-mails, the subject or content of each one must be checked. In this study, we propose an unsupervised system to classify received e-mails. Each e-mail's coordinates are determined with Word2Vec, a natural language processing method. The processed data are then grouped by similarity using the k-means algorithm, with an unsupervised training model. A total of 10,517 e-mails were used for training, and the system was tested on a group of 200 e-mails. In the test phase, the M3 model (window size 3, minimum word frequency 10, skip-gram) achieved the highest success rate (91%). The results are evaluated in Section VI.
DOI: 10.1109/SIU.2019.8806593
Citation Key: sel_e-mail_2019
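
The pipeline outlined in the abstract (tokenize each e-mail, map it to coordinates from word vectors, then group the coordinates with k-means) can be illustrated with a minimal sketch. This is not the authors' implementation. The `TOY_VECTORS` table is a hypothetical, hand-made stand-in for a trained Word2Vec model, averaging word vectors to get one point per e-mail is an assumption (the abstract does not say how e-mail coordinates are derived), and the sample e-mails, `k=2`, and all function names are invented for illustration. With a real library such as gensim, the M3 settings would presumably correspond to `window=3`, `min_count=10`, `sg=1`.

```python
import math
import random
import re

# Hypothetical 2-D word vectors standing in for a trained Word2Vec model.
TOY_VECTORS = {
    "meeting": (1.0, 0.0),
    "invoice": (0.9, 0.1),
    "deadline": (0.8, 0.2),
    "lecture": (0.0, 1.0),
    "exam": (0.1, 0.9),
    "seminar": (0.2, 0.8),
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text, vectors):
    """Average the vectors of known tokens (assumed pooling); None if none known."""
    known = [vectors[t] for t in tokenize(text) if t in vectors]
    if not known:
        return None
    return tuple(sum(axis) / len(known) for axis in zip(*known))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means on tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Invented sample e-mails: two business-like, two academic-like.
emails = [
    "Meeting moved, invoice attached",
    "Invoice deadline is Friday",
    "Lecture cancelled, exam postponed",
    "Seminar before the exam",
]
points = [embed(e, TOY_VECTORS) for e in emails]
labels, centroids = kmeans(points, k=2)

for email, label in zip(emails, labels):
    print(label, email)
```

The cluster numbers themselves are arbitrary; only the grouping matters. In this toy setup the two business-like e-mails share one cluster and the two academic-like e-mails share the other.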