Visible to the public Biblio

Filters: Keyword is computational linguistics  [Clear All Filters]
2022-03-10
Ahirrao, Mayur, Joshi, Yash, Gandhe, Atharva, Kotgire, Sumeet, Deshmukh, Rohini G..  2021.  Phrase Composing Tool using Natural Language Processing. 2021 International Conference on Intelligent Technologies (CONIT). :1—4.
In this fast-running world, machine communication plays a vital role. To compete with this world, human-machine interaction is a necessary thing. To enhance this, Natural Language Processing technique is used widely. Using this technique, we can reduce the interaction gap between the machine and human. Till now, many such applications are developed which are using this technique.This tool deals with the various methods which are used for development of grammar error correction. These methods include rule-based method, classifier-based method and machine translation-based method. Also, models regarding the Natural Language Processing (NLP) pipeline are trained and implemented in this project accordingly. Additionally, the tool can also perform speech to text operation.
2019-02-22
Vysotska, V., Lytvyn, V., Hrendus, M., Kubinska, S., Brodyak, O..  2018.  Method of Textual Information Authorship Analysis Based on Stylometry. 2018 IEEE 13th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT). 2:9-16.

The paper dwells on the peculiarities of stylometry technologies usage to determine the style of the author publications. Statistical linguistic analysis of the author's text allows taking advantage of text content monitoring based on Porter stemmer and NLP methods to determine the set of stop words. The latter is used in the methods of stylometry to determine the ownership of the analyzed text to a specific author in percentage points. There is proposed a formal approach to the definition of the author's style of the Ukrainian text in the article. The experimental results of the proposed method for determining the ownership of the analyzed text to a particular author upon the availability of the reference text fragment are obtained. The study was conducted on the basis of the Ukrainian scientific texts of a technical area.

2018-03-19
Rocha, A., Scheirer, W. J., Forstall, C. W., Cavalcante, T., Theophilo, A., Shen, B., Carvalho, A. R. B., Stamatatos, E..  2017.  Authorship Attribution for Social Media Forensics. IEEE Transactions on Information Forensics and Security. 12:5–33.

The veil of anonymity provided by smartphones with pre-paid SIM cards, public Wi-Fi hotspots, and distributed networks like Tor has drastically complicated the task of identifying users of social media during forensic investigations. In some cases, the text of a single posted message will be the only clue to an author's identity. How can we accurately predict who that author might be when the message may never exceed 140 characters on a service like Twitter? For the past 50 years, linguists, computer scientists, and scholars of the humanities have been jointly developing automated methods to identify authors based on the style of their writing. All authors possess peculiarities of habit that influence the form and content of their written works. These characteristics can often be quantified and measured using machine learning algorithms. In this paper, we provide a comprehensive review of the methods of authorship attribution that can be applied to the problem of social media forensics. Furthermore, we examine emerging supervised learning-based methods that are effective for small sample sizes, and provide step-by-step explanations for several scalable approaches as instructional case studies for newcomers to the field. We argue that there is a significant need in forensics for new authorship attribution algorithms that can exploit context, can process multi-modal data, and are tolerant to incomplete knowledge of the space of all possible authors at training time.

2018-02-15
Pan, J., Mao, X..  2017.  Detecting DOM-Sourced Cross-Site Scripting in Browser Extensions. 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME). :24–34.

In recent years, with the advances in JavaScript engines and the adoption of HTML5 APIs, web applications begin to show a tendency to shift their functionality from the server side towards the client side, resulting in dense and complex interactions with HTML documents using the Document Object Model (DOM). As a consequence, client-side vulnerabilities become more and more prevalent. In this paper, we focus on DOM-sourced Cross-site Scripting (XSS), which is a kind of severe but not well-studied vulnerability appearing in browser extensions. Comparing with conventional DOM-based XSS, a new attack surface is introduced by DOM-sourced XSS where the DOM could become a vulnerable source as well besides common sources such as URLs and form inputs. To discover such vulnerability, we propose a detecting framework employing hybrid analysis with two phases. The first phase is the lightweight static analysis consisting of a text filter and an abstract syntax tree parser, which produces potential vulnerable candidates. The second phase is the dynamic symbolic execution with an additional component named shadow DOM, generating a document as a proof-of-concept exploit. In our large-scale real-world experiment, 58 previously unknown DOM-sourced XSS vulnerabilities were discovered in user scripts of the popular browser extension Greasemonkey.