Classification Between Machine Translated Text and Original Text By Part Of Speech Tagging Representation

Submitted by grigby1 on Mon, 11/29/2021 - 3:32pm

Title	Classification Between Machine Translated Text and Original Text By Part Of Speech Tagging Representation
Publication Type	Conference Paper
Year of Publication	2020
Authors	Piazza, Nancirose
Conference Name	2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA)
Date Published	October
Keywords	Artificial neural networks, composability, Dictionaries, Human Behavior, human factors, Indexes, Metrics, Numerical models, part of speech tagging, pubcrawl, Scalability, tagging, text analytics, Training data, trigram representation, Vocabulary, word embedding, Zipf’s Law
Abstract	Classifiers that distinguish machine-translated text from original text often tokenize on the vocabulary of the corpora. With n-grams longer than unigrams, one can build a model that estimates a decision boundary from the word-frequency probability distribution; however, this approach is exponentially expensive because of high dimensionality and sparsity. Instead, we represent samples of the corpora by their part-of-speech tags, which form a significantly smaller vocabulary. With fewer possible trigrams, we can build a model from the trigram frequency probability distribution. In this paper, we explore less conventional approaches to handling documents, dictionaries, and the like.
DOI	10.1109/DSAA49011.2020.00092
Citation Key	piazza_classification_2020
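
The representation described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `pos_trigram_distribution` and `max_trigram_features`, the hand-tagged sample, and the choice of the 17-tag Universal Dependencies tagset are all assumptions made for illustration. The sketch takes a sequence of part-of-speech tags (already produced by some tagger), counts consecutive tag trigrams, and normalizes the counts into a relative-frequency distribution. It also shows why a small tag vocabulary keeps the trigram feature space manageable.

```python
from collections import Counter


def pos_trigram_distribution(tags):
    """Return the relative frequency of each consecutive POS-tag trigram.

    `tags` is a list of part-of-speech tags for one text sample.
    Samples with fewer than three tags yield an empty distribution.
    """
    trigrams = list(zip(tags, tags[1:], tags[2:]))
    counts = Counter(trigrams)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {trigram: n / total for trigram, n in counts.items()}


def max_trigram_features(vocab_size):
    """Upper bound on distinct trigrams over a vocabulary of the given size."""
    return vocab_size ** 3


# Hypothetical hand-tagged sample (assumed tags, not data from the paper).
sample_tags = ["DET", "NOUN", "VERB", "DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"]
distribution = pos_trigram_distribution(sample_tags)

# Assumption: the 17-tag Universal Dependencies tagset stands in for
# whatever tagset the paper used. A 50,000-word vocabulary is likewise
# an arbitrary figure chosen only for comparison.
tag_feature_bound = max_trigram_features(17)
word_feature_bound = max_trigram_features(50_000)
```

Under these assumptions, the tag-based bound is a few thousand possible trigrams, while the word-based bound is many orders of magnitude larger, which is the sparsity problem the abstract refers to. The resulting distribution could then be fed to any classifier as a fixed-length feature vector.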