Biblio

Filters: Keyword is N-grams
2023-02-03
Nelson, Jared Ray, Shekaramiz, Mohammad.  2022.  Authorship Verification via Linear Correlation Methods of n-gram and Syntax Metrics. 2022 Intermountain Engineering, Technology and Computing (IETC). :1–6.
This research evaluates the accuracy of two methods of authorship prediction, syntactic analysis and n-gram analysis, and explores their potential uses. The proposed algorithm measures n-gram usage, sentence length, and counts of adjectives, adverbs, verbs, nouns, and punctuation in the training data, then normalizes each metric. It compares the metrics of training samples with those of testing samples and predicts authorship based on the correlation they share for each metric. The strength of the correlation between the testing and training data determines how much weight each metric carries in the decision. For example, if the analysis of one metric approaches 100% positive correlation, that metric is assigned the maximum weight in the decision. Conversely, a 100% negative correlation receives the minimum weight. This new method of authorship validation holds promise for future innovation in fraud protection, the study of historical documents, and maintaining integrity within academia.
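The abstract does not say how each correlation is computed or how it is turned into a weight. As a rough illustration only, the sketch below assumes Pearson correlation between normalized character n-gram frequency profiles and a linear mapping from the correlation range [-1, 1] to a weight in [0, 1]. The function names, the choice of trigrams, and the mapping are assumptions for this example, not the authors' implementation.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    count = len(xs)
    mean_x, mean_y = sum(xs) / count, sum(ys) / count
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def ngram_correlation(train_text, test_text, n=3):
    """Correlate the normalized n-gram profiles of two texts over their shared vocabulary."""
    train, test = char_ngrams(train_text, n), char_ngrams(test_text, n)
    vocab = sorted(set(train) | set(test))
    train_total = sum(train.values()) or 1
    test_total = sum(test.values()) or 1
    xs = [train[g] / train_total for g in vocab]  # missing n-grams count as 0
    ys = [test[g] / test_total for g in vocab]
    return pearson(xs, ys)


def correlation_to_weight(r):
    """Assumed mapping: r = +1 gives the maximum weight 1.0, r = -1 the minimum 0.0."""
    return (r + 1.0) / 2.0
```

Under these assumptions, the other metrics named in the abstract (part-of-speech counts, punctuation, sentence length) would each yield their own correlation, and the per-metric weights would then be combined into a single decision.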
2021-01-18
Yadav, M. K., Gugal, D., Matkar, S., Waghmare, S.  2019.  Encrypted Keyword Search in Cloud Computing using Fuzzy Logic. 2019 1st International Conference on Innovations in Information and Communication Technology (ICIICT). :1–4.
Research and development and information management professionals routinely employ simple keyword searches or more complex Boolean queries when using databases such as PubMed and Ovid and search engines like Google to find the information they need. While it satisfies the basic needs of the researcher, basic search is limited, which can adversely affect both precision and recall, decreasing productivity and damaging researchers' ability to discover new insights. Cloud service providers that store users' data may access sensitive information without proper authorization. A basic approach to preserving data confidentiality is to encrypt the data. Data encryption also demands protection of keyword privacy, since keywords usually contain vital information related to the files. Encrypting the keywords protects keyword privacy. Fuzzy keyword search enhances system usability by matching the keywords entered by the user either exactly or to the nearest possible files based on similar semantics. Encrypted keyword search in the cloud using this logic lets the user enter keywords and receive the best possible files in a more secure manner, protecting the user's documents.
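The abstract does not describe the matching scheme itself. Purely to illustrate the idea of fuzzy matching over protected keywords, the sketch below assumes each keyword is split into character bigrams, each bigram is replaced by a keyed HMAC-SHA256 token, and similarity is the Jaccard overlap of the token sets. All names and the 0.4 threshold are hypothetical, and this is not the paper's construction or a secure searchable-encryption scheme: it still leaks set sizes and overlaps to whoever holds the index.

```python
import hashlib
import hmac


def bigrams(word):
    """Character bigrams of a lower-cased word, padded to mark its start and end."""
    padded = f"^{word.lower()}$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def tokenise(word, key):
    """Replace each bigram with a keyed HMAC-SHA256 token so the index holds no plaintext."""
    return {
        hmac.new(key, gram.encode("utf-8"), hashlib.sha256).hexdigest()
        for gram in bigrams(word)
    }


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fuzzy_search(query, index, key, threshold=0.4):
    """Return file names whose keyword tokens overlap the query's, best match first.

    `index` maps a file name to the token set of its keyword (hypothetical layout).
    """
    query_tokens = tokenise(query, key)
    scored = [(jaccard(query_tokens, tokens), name) for name, tokens in index.items()]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]
```

With an index built as `{"a.txt": tokenise("network", key)}`, a misspelled query such as `"netwrok"` still shares enough bigram tokens to match, while an unrelated query shares none.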
2019-01-21
Chernis, Boris, Verma, Rakesh.  2018.  Machine Learning Methods for Software Vulnerability Detection. Proceedings of the Fourth ACM International Workshop on Security and Privacy Analytics. :31–39.
Software vulnerabilities are a primary concern in the IT security industry, as malicious hackers who discover these vulnerabilities can often exploit them for nefarious purposes. However, complex programs, particularly those written in a relatively low-level language like C, are difficult to fully scan for bugs, even when both manual and automated techniques are used. Since analyzing code and making sure it is securely written has proven to be a non-trivial task, both static analysis and dynamic analysis techniques have been heavily investigated, and this work focuses on the former. The contribution of this paper is a demonstration that a large percentage of bugs can be caught by extracting text features from functions in C source code and analyzing them with a machine learning classifier. Relatively simple features (character count, character diversity, entropy, maximum nesting depth, arrow count, "if" count, "if" complexity, "while" count, and "for" count) were extracted from these functions, as were complex features (character n-grams, word n-grams, and suffix trees). Unexpectedly, the simple features outperformed the complex features (74% accuracy versus 69%).
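The abstract names the simple features but not how they were computed. The sketch below shows one plausible reading, assuming the input is the raw text of a single C function, nesting depth is measured by counting braces, and entropy is Shannon entropy over characters. "If" complexity is omitted because the abstract does not define it, and braces or keywords inside strings and comments are not handled. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy, in bits per character, of the character distribution."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in Counter(text).values())


def max_nesting_depth(text):
    """Deepest level of `{ ... }` nesting (ignores braces in strings and comments)."""
    depth = deepest = 0
    for ch in text:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth = max(depth - 1, 0)
    return deepest


def keyword_count(text, keyword):
    """Number of whole-word occurrences of a keyword such as `if`."""
    return len(re.findall(rf"\b{re.escape(keyword)}\b", text))


def simple_features(source):
    """Feature dictionary for one C function, ready to feed to a classifier."""
    return {
        "char_count": len(source),
        "char_diversity": len(set(source)),
        "entropy": shannon_entropy(source),
        "max_nesting_depth": max_nesting_depth(source),
        "arrow_count": source.count("->"),
        "if_count": keyword_count(source, "if"),
        "while_count": keyword_count(source, "while"),
        "for_count": keyword_count(source, "for"),
    }
```

Each function's dictionary can be turned into a fixed-order numeric vector and passed to any standard classifier; the abstract does not say which classifier produced the reported accuracies.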