Biblio

List
Filter

Found 1 results

Filters: Keyword is trigram representation [Clear All Filters]

2021-11-29

Piazza, Nancirose. 2020. Classification Between Machine Translated Text and Original Text By Part Of Speech Tagging Representation. 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA). :739–740.

Classification between machine-translated text and original text are often tokenized on vocabulary of the corpi. With N-grams larger than uni-gram, one can create a model that estimates a decision boundary based on word frequency probability distribution; however, this approach is exponentially expensive because of high dimensionality and sparsity. Instead, we let samples of the corpi be represented by part-of-speech tagging which is significantly less vocabulary. With less trigram permutations, we can create a model with its tri-gram frequency probability distribution. In this paper, we explore less conventional ways of approaching techniques for handling documents, dictionaries, and the likes.