Visible to the public Biblio

Filters: Keyword is trigram representation  [Clear All Filters]
2021-11-29
Piazza, Nancirose.  2020.  Classification Between Machine Translated Text and Original Text By Part Of Speech Tagging Representation. 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA). :739–740.
Classification between machine-translated text and original text are often tokenized on vocabulary of the corpi. With N-grams larger than uni-gram, one can create a model that estimates a decision boundary based on word frequency probability distribution; however, this approach is exponentially expensive because of high dimensionality and sparsity. Instead, we let samples of the corpi be represented by part-of-speech tagging which is significantly less vocabulary. With less trigram permutations, we can create a model with its tri-gram frequency probability distribution. In this paper, we explore less conventional ways of approaching techniques for handling documents, dictionaries, and the likes.