Classification Between Machine Translated Text and Original Text By Part Of Speech Tagging Representation

Title: Classification Between Machine Translated Text and Original Text By Part Of Speech Tagging Representation
Publication Type: Conference Paper
Year of Publication: 2020
Authors: Piazza, Nancirose
Conference Name: 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA)
Date Published: Oct
Keywords: Artificial neural networks, composability, Dictionaries, Human Behavior, human factors, Indexes, Metrics, Numerical models, part of speech tagging, pubcrawl, Scalability, tagging, text analytics, Training data, trigram representation, Vocabulary, word embedding, Zipf’s Law
Abstract: Classification between machine-translated text and original text is often performed on text tokenized over the vocabulary of the corpora. With n-grams larger than unigrams, one can create a model that estimates a decision boundary based on the word-frequency probability distribution; however, this approach is exponentially expensive because of high dimensionality and sparsity. Instead, we let samples of the corpora be represented by their part-of-speech tags, which form a significantly smaller vocabulary. With fewer trigram permutations, we can create a model from the trigram frequency probability distribution. In this paper, we explore less conventional approaches to handling documents, dictionaries, and the like.
DOI: 10.1109/DSAA49011.2020.00092
Citation Key: piazza_classification_2020
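
The representation described in the abstract, a probability distribution over part-of-speech trigrams, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `pos_trigram_distribution`, the sample tag sequence, and the choice of Universal POS tags are all assumptions made for illustration, and the input is assumed to be already tagged by some external tagger.

```python
from collections import Counter
from typing import Dict, List, Tuple

Trigram = Tuple[str, str, str]


def pos_trigram_distribution(tags: List[str]) -> Dict[Trigram, float]:
    """Return the relative frequency of each POS-tag trigram in one sample.

    `tags` is the POS-tag sequence of a single text sample (assumed to be
    produced beforehand by any tagger; tagging itself is out of scope here).
    """
    # Slide a window of width 3 over the tag sequence.
    trigrams = list(zip(tags, tags[1:], tags[2:]))
    if not trigrams:
        return {}  # fewer than three tags: no trigram can be formed
    counts = Counter(trigrams)
    total = len(trigrams)
    # Normalise counts so the values sum to 1.
    return {trigram: n / total for trigram, n in counts.items()}


# Hypothetical pre-tagged sample for
# "The cat sat on the mat and the dog slept ."
sample_tags = [
    "DET", "NOUN", "VERB", "ADP", "DET", "NOUN",
    "CCONJ", "DET", "NOUN", "VERB", "PUNCT",
]
dist = pos_trigram_distribution(sample_tags)
```

The feature space stays small because a tag inventory of size T admits at most T^3 distinct trigrams. With a 17-tag inventory such as the Universal Dependencies UPOS set, for example, that bound is 17^3 = 4,913, whereas word-level trigrams grow with the cube of the vocabulary size.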