Title | Authorship Verification via Linear Correlation Methods of n-gram and Syntax Metrics |
Publication Type | Conference Paper |
Year of Publication | 2022 |
Authors | Nelson, Jared Ray; Shekaramiz, Mohammad |
Conference Name | 2022 Intermountain Engineering, Technology and Computing (IETC) |
Keywords | authorship, Correlation, Human Behavior, machine learning, Measurement, Metrics, N-grams, Part of Speech (PoS), Software algorithms, stylometry, syntactical data, Syntactics, Technological innovation, Training, Training data |
Abstract | This research evaluates the accuracy of two methods of authorship prediction, syntactical analysis and n-gram analysis, and explores their potential uses. The proposed algorithm measures n-grams and counts adjectives, adverbs, verbs, nouns, punctuation, and sentence length in the training data, then normalizes each metric. It compares the metrics of the training samples with those of the testing samples and predicts authorship based on the correlation they share for each metric. The strength of the correlation between the testing and training data carries significant weight in the decision-making process. For example, if one metric approaches 100% positive correlation, that metric is assigned the maximum weight in the decision. Conversely, a 100% negative correlation receives the minimum weight. This new method of authorship validation holds promise for future innovation in fraud protection, the study of historical documents, and maintaining integrity within academia. |
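The correlation-weighted decision described in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and several details are assumptions not given in the record: character trigrams stand in for "n-gram", Pearson correlation is used as the correlation measure, a correlation r is mapped linearly to a weight (r + 1) / 2 (so +1 gives the maximum weight 1 and -1 gives the minimum weight 0), per-metric weights are averaged equally, and the author with the highest score is predicted. Adjective, adverb, verb, and noun counts are omitted because they require a part-of-speech tagger (for example NLTK's `pos_tag`). All function names are hypothetical.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical set of punctuation marks to count (an assumption).
PUNCTUATION = ".,;:!?\"'()-"


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping lowercase character n-grams (trigrams assumed)."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def punctuation_counts(text: str) -> Counter:
    """Count each punctuation mark in PUNCTUATION."""
    return Counter(ch for ch in text if ch in PUNCTUATION)


def sentence_length_hist(text: str, max_len: int = 40) -> Counter:
    """Histogram of sentence lengths in words (naive split on . ! ?)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return Counter(min(len(s.split()), max_len) for s in sentences)


def normalize(counts: Counter, keys) -> list[float]:
    """Turn raw counts into relative frequencies over a shared key order."""
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in keys]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 when it is undefined."""
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


# Each metric is a function from text to a Counter (assumed representation).
METRICS = {
    "ngram": char_ngrams,
    "punctuation": punctuation_counts,
    "sentence_length": sentence_length_hist,
}


def metric_weight(r: float) -> float:
    """Assumed linear mapping: r = -1 -> 0 (minimum), r = +1 -> 1 (maximum)."""
    return (r + 1) / 2


def score(train_text: str, test_text: str) -> float:
    """Average the correlation-based weights over all metrics."""
    weights = []
    for extract in METRICS.values():
        a, b = extract(train_text), extract(test_text)
        keys = sorted(set(a) | set(b))
        r = pearson(normalize(a, keys), normalize(b, keys))
        weights.append(metric_weight(r))
    return sum(weights) / len(weights)


def predict_author(training: dict[str, str], test_text: str) -> str:
    """Predict the training author whose metrics correlate best with the test text."""
    return max(training, key=lambda author: score(training[author], test_text))
```

For example, `predict_author({"A": sample_a, "B": sample_b}, unknown_text)` returns whichever key's sample scores highest. Averaging the weights treats every metric as equally informative; a weighted sum would be a natural variation if some metrics proved more discriminative.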
DOI | 10.1109/IETC54973.2022.9796736 |
Citation Key | nelson_authorship_2022 |