Visible to the public Biblio

Found 165 results

Filters: Keyword is natural language processing  [Clear All Filters]
2020-08-28
Traylor, Terry, Straub, Jeremy, Gurmeet, Snell, Nicholas.  2019.  Classifying Fake News Articles Using Natural Language Processing to Identify In-Article Attribution as a Supervised Learning Estimator. 2019 IEEE 13th International Conference on Semantic Computing (ICSC). :445—449.

Intentionally deceptive content presented under the guise of legitimate journalism is a worldwide information accuracy and integrity problem that affects opinion forming, decision making, and voting patterns. Most so-called `fake news' is initially distributed over social media conduits like Facebook and Twitter and later finds its way onto mainstream media platforms such as traditional television and radio news. The fake news stories that are initially seeded over social media platforms share key linguistic characteristics such as making excessive use of unsubstantiated hyperbole and non-attributed quoted content. In this paper, the results of a fake news identification study that documents the performance of a fake news classifier are presented. The Textblob, Natural Language, and SciPy Toolkits were used to develop a novel fake news detector that uses quoted attribution in a Bayesian machine learning system as a key feature to estimate the likelihood that a news article is fake. The resultant process precision is 63.333% effective at assessing the likelihood that an article with quotes is fake. This process is called influence mining and this novel technique is presented as a method that can be used to enable fake news and even propaganda detection. In this paper, the research process, technical analysis, technical linguistics work, and classifier performance and results are presented. The paper concludes with a discussion of how the current system will evolve into an influence mining system.

Khomytska, Iryna, Teslyuk, Vasyl.  2019.  Mathematical Methods Applied for Authorship Attribution on the Phonological Level. 2019 IEEE 14th International Conference on Computer Sciences and Information Technologies (CSIT). 3:7—11.

The proposed combination of statistical methods has proved efficient for authorship attribution. The complex analysis method based on the proposed combination of statistical methods has made it possible to minimize the number of phoneme groups by which the authorial differentiation of texts has been done.

Jafariakinabad, Fereshteh, Hua, Kien A..  2019.  Style-Aware Neural Model with Application in Authorship Attribution. 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). :325—328.

Writing style is a combination of consistent decisions associated with a specific author at different levels of language production, including lexical, syntactic, and structural. In this paper, we introduce a style-aware neural model to encode document information from three stylistic levels and evaluate it in the domain of authorship attribution. First, we propose a simple way to jointly encode syntactic and lexical representations of sentences. Subsequently, we employ an attention-based hierarchical neural network to encode the syntactic and semantic structure of sentences in documents while rewarding the sentences which contribute more to capturing the writing style. Our experimental results, based on four benchmark datasets, reveal the benefits of encoding document information from all three stylistic levels when compared to the baseline methods in the literature.

2020-07-30
Deeba, Farah, Tefera, Getenet, Kun, She, Memon, Hira.  2019.  Protecting the Intellectual Properties of Digital Watermark Using Deep Neural Network. 2019 4th International Conference on Information Systems Engineering (ICISE). :91—95.

Recently in the vast advancement of Artificial Intelligence, Machine learning and Deep Neural Network (DNN) driven us to the robust applications. Such as Image processing, speech recognition, and natural language processing, DNN Algorithms has succeeded in many drawbacks; especially the trained DNN models have made easy to the researchers to produces state-of-art results. However, sharing these trained models are always a challenging task, i.e. security, and protection. We performed extensive experiments to present some analysis of watermark in DNN. We proposed a DNN model for Digital watermarking which investigate the intellectual property of Deep Neural Network, Embedding watermarks, and owner verification. This model can generate the watermarks to deal with possible attacks (fine tuning and train to embed). This approach is tested on the standard dataset. Hence this model is robust to above counter-watermark attacks. Our model accurately and instantly verifies the ownership of all the remotely expanded deep learning models without affecting the model accuracy for standard information data.

2020-07-27
Dangiwa, Bello Ahmed, Kumar, Smitha S.  2018.  A Business Card Reader Application for iOS devices based on Tesseract. 2018 International Conference on Signal Processing and Information Security (ICSPIS). :1–4.
As the accessibility of high-resolution smartphone camera has increased and an improved computational speed, it is now convenient to build Business Card Readers on mobile phones. The project aims to design and develop a Business Card Reader (BCR) Application for iOS devices, using an open-source OCR Engine - Tesseract. The system accuracy was tested and evaluated using a dataset of 55 digital business cards obtained from an online repository. The accuracy result of the system was up to 74% in terms of both text recognition and data detection. A comparative analysis was carried out against a commercial business card reader application and our application performed vastly reasonable.
2020-07-16
Pérez-Soler, Sara, Guerra, Esther, de Lara, Juan.  2019.  Flexible Modelling using Conversational Agents. 2019 ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C). :478—482.

The advances in natural language processing and the wide use of social networks have boosted the proliferation of chatbots. These are software services typically embedded within a social network, and which can be addressed using conversation through natural language. Many chatbots exist with different purposes, e.g., to book all kind of services, to automate software engineering tasks, or for customer support. In previous work, we proposed the use of chatbots for domain-specific modelling within social networks. In this short paper, we report on the needs for flexible modelling required by modelling using conversation. In particular, we propose a process of meta-model relaxation to make modelling more flexible, followed by correction steps to make the model conforming to its meta-model. The paper shows how this process is integrated within our conversational modelling framework, and illustrates the approach with an example.

2020-07-06
Chai, Yadeng, Liu, Yong.  2019.  Natural Spoken Instructions Understanding for Robot with Dependency Parsing. 2019 IEEE 9th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER). :866–871.
This paper presents a method based on syntactic information, which can be used for intent determination and slot filling tasks in a spoken language understanding system including the spoken instructions understanding module for robot. Some studies in recent years attempt to solve the problem of spoken language understanding via syntactic information. This research is a further extension of these approaches which is based on dependency parsing. In this model, the input for neural network are vectors generated by a dependency parsing tree, which we called window vector. This vector contains dependency features that improves performance of the syntactic-based model. The model has been evaluated on the benchmark ATIS task, and the results show that it outperforms many other syntactic-based approaches, especially in terms of slot filling, it has a performance level on par with some state of the art deep learning algorithms in recent years. Also, the model has been evaluated on FBM3, a dataset of the RoCKIn@Home competition. The overall rate of correctly understanding the instructions for robot is quite good but still not acceptable in practical use, which is caused by the small scale of FBM3.
2020-05-22
Platonov, A.V., Poleschuk, E.A., Bessmertny, I. A., Gafurov, N. R..  2018.  Using quantum mechanical framework for language modeling and information retrieval. 2018 IEEE 12th International Conference on Application of Information and Communication Technologies (AICT). :1—4.

This article shows the analogy between natural language texts and quantum-like systems on the example of the Bell test calculating. The applicability of the well-known Bell test for texts in Russian is investigated. The possibility of using this test for the text separation on the topics corresponding to the user query in information retrieval system is shown.

Geetha, R, Rekha, Pasupuleti, Karthika, S.  2018.  Twitter Opinion Mining and Boosting Using Sentiment Analysis. 2018 International Conference on Computer, Communication, and Signal Processing (ICCCSP). :1—4.

Social media has been one of the most efficacious and precise by speakers of public opinion. A strategy which sanctions the utilization and illustration of twitter data to conclude public conviction is discussed in this paper. Sentiments on exclusive entities with diverse strengths and intenseness are stated by public, where these sentiments are strenuously cognate to their personal mood and emotions. To examine the sentiments from natural language texts, addressing various opinions, a lot of methods and lexical resources have been propounded. A path for boosting twitter sentiment classification using various sentiment proportions as meta-level features has been proposed by this article. Analysis of tweets was done on the product iPhone 6.

Khadilkar, Kunal, Kulkarni, Siddhivinayak, Bone, Poojarani.  2018.  Plagiarism Detection Using Semantic Knowledge Graphs. 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA). :1—6.

Every day, huge amounts of unstructured text is getting generated. Most of this data is in the form of essays, research papers, patents, scholastic articles, book chapters etc. Many plagiarism softwares are being developed to be used in order to reduce the stealing and plagiarizing of Intellectual Property (IP). Current plagiarism softwares are mainly using string matching algorithms to detect copying of text from another source. The drawback of some of such plagiarism softwares is their inability to detect plagiarism when the structure of the sentence is changed. Replacement of keywords by their synonyms also fails to be detected by these softwares. This paper proposes a new method to detect such plagiarism using semantic knowledge graphs. The method uses Named Entity Recognition as well as semantic similarity between sentences to detect possible cases of plagiarism. The doubtful cases are visualized using semantic Knowledge Graphs for thorough analysis of authenticity. Rules for active and passive voice have also been considered in the proposed methodology.

Devarakonda, Ranjeet, Giansiracusa, Michael, Kumar, Jitendra.  2018.  Machine Learning and Social Media to Mine and Disseminate Big Scientific Data. 2018 IEEE International Conference on Big Data (Big Data). :5312—5315.

One of the challenges in supplying the communities with wider access to scientific databases is the need for knowledge of database languages like Structured Query Language (SQL). Although the SQL language has been published in many forms, not everybody is able to write SQL queries. Another challenge is that it might not be practical to make the public aware of the structure of databases. There is a need for novice users to query relational databases using their natural language. To solve this problem, many natural language interfaces to structured databases have been developed. The goal is to provide a more intuitive method for generating database queries and delivering responses. Through social media, which makes it possible to interact with a wide section of the population, and with the help of natural language processing, researchers at the Atmospheric Radiation Measurement (ARM) Data Center at Oak Ridge National Laboratory (ORNL) have developed a concept to enable easy search and retrieval of data from several environmental data centers for the scientific community through social media.Using a machine learning framework that maps natural language text to thousands of datasets, instruments, variables, and data streams, the prototype system would allow users to request data through Twitter and receive a link (via tweet) to applicable data results on the project's search catalog tailored to their key words. This automated identification of relevant data from various petascale archives at ORNL could increase convenience, access, and use of the project's data by the broader community. In this paper we discuss how some data-intensive projects at ORNL are using innovative ways to help in data discovery.

Kate, Abhilasha, Kamble, Satish, Bodkhe, Aishwarya, Joshi, Mrunal.  2018.  Conversion of Natural Language Query to SQL Query. 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA). :488—491.

This paper present an approach to automate the conversion of Natural Language Query to SQL Query effectively. Structured Query Language is a powerful tool for managing data held in a relational database management system. To retrieve or manage data user have to enter the correct SQL Query. But the users who don't have any knowledge about SQL are unable to retrieve the required data. To overcome this we proposed a model in Natural Language Processing for converting the Natural Language Query to SQL query. This helps novice user to get required content without knowing any complex details about SQL. This system can also deal with complex queries. This system is designed for Training and Placement cell officers who work on student database but don't have any knowledge about SQL. In this system, user can also enter the query using speech. System will convert speech into the text format. This query will get transformed to SQL query. System will execute the query and gives output to the user.

2020-05-18
Bakhtin, Vadim V., Isaeva, Ekaterina V..  2019.  New TSBuilder: Shifting towards Cognition. 2019 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus). :179–181.
The paper reviews a project on the automation of term system construction. TSBuilder (Term System Builder) was developed in 2014 as a multilayer Rosenblatt's perceptron for supervised machine learning, namely 1-3 word terms identification in natural language texts and their rigid categorization. The program is being modified to reduce the rigidity of categorization which will bring text mining more in line with human thinking.We are expanding the range of parameters (semantical, morphological, and syntactical) for categorization, removing the restriction of the term length of three words, using convolution on a continuous sequence of terms, and present the probabilities of a term falling into different categories. The neural network will not assign a single category to a term but give N answers (where N is the number of predefined classes), each of which O ∈ [0, 1] is the probability of the term to belong to a given class.
Panahandeh, Mahnaz, Ghanbari, Shirin.  2019.  Correction of Spaces in Persian Sentences for Tokenization. 2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI). :670–674.
The exponential growth of the Internet and its users and the emergence of Web 2.0 have caused a large volume of textual data to be created. Automatic analysis of such data can be used in making decisions. As online text is created by different producers with different styles of writing, pre-processing is a necessity prior to any processes related to natural language tasks. An essential part of textual preprocessing prior to the recognition of the word vocabulary is normalization, which includes the correction of spaces that particularly in the Persian language this includes both full-spaces between words and half-spaces. Through the review of user comments within social media services, it can be seen that in many cases users do not adhere to grammatical rules of inserting both forms of spaces, which increases the complexity of the identification of words and henceforth, reducing the accuracy of further processing on the text. In this study, current issues in the normalization and tokenization of preprocessing tools within the Persian language and essentially identifying and correcting the separation of words are and the correction of spaces are proposed. The results obtained and compared to leading preprocessing tools highlight the significance of the proposed methodology.
Zhu, Meng, Yang, Xudong.  2019.  Chinese Texts Classification System. 2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT). :149–152.
In this article, we designed an automatic Chinese text classification system aiming to implement a system for classifying news texts. We propose two improved classification algorithms as two different choices for users to choose and then our system uses the chosen method for the obtaining of the classified result of the input text. There are two improved algorithms, one is k-Bayes using hierarchy conception based on NB method in machine learning field and another one adds attention layer to the convolutional neural network in deep learning field. Through experiments, our results showed that improved classification algorithms had better accuracy than based algorithms and our system is useful for making classifying news texts more reasonably and effectively.
Kermani, Fatemeh Hojati, Ghanbari, Shirin.  2019.  Extractive Persian Summarizer for News Websites. 2019 5th International Conference on Web Research (ICWR). :85–89.
Automatic extractive text summarization is the process of condensing textual information while preserving the important concepts. The proposed method after performing pre-processing on input Persian news articles generates a feature vector of salient sentences from a combination of statistical, semantic and heuristic methods and that are scored and concatenated accordingly. The scoring of the salient features is based on the article's title, proper nouns, pronouns, sentence length, keywords, topic words, sentence position, English words, and quotations. Experimental results on measurements including recall, F-measure, ROUGE-N are presented and compared to other Persian summarizers and shown to provide higher performance.
Lee, Hyun-Young, Kang, Seung-Shik.  2019.  Word Embedding Method of SMS Messages for Spam Message Filtering. 2019 IEEE International Conference on Big Data and Smart Computing (BigComp). :1–4.
SVM has been one of the most popular machine learning method for the binary classification such as sentiment analysis and spam message filtering. We explored a word embedding method for the construction of a feature vector and the deep learning method for the binary classification. CBOW is used as a word embedding technique and feedforward neural network is applied to classify SMS messages into ham or spam. The accuracy of the two classification methods of SVM and neural network are compared for the binary classification. The experimental result shows that the accuracy of deep learning method is better than the conventional machine learning method of SVM-light in the binary classification.
Chen, Long.  2019.  Assertion Detection in Clinical Natural Language Processing: A Knowledge-Poor Machine Learning Approach. 2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT). :37–40.
Natural language processing (NLP) have been recently used to extract clinical information from free text in Electronic Health Record (EHR). In clinical NLP one challenge is that the meaning of clinical entities is heavily affected by assertion modifiers such as negation, uncertain, hypothetical, experiencer and so on. Incorrect assertion assignment could cause inaccurate diagnosis of patients' condition or negatively influence following study like disease modeling. Thus, clinical NLP systems which can detect assertion status of given target medical findings (e.g. disease, symptom) in clinical context are highly demanded. Here in this work, we propose a deep-learning system based on word embedding, RNN and attention mechanism (more specifically: Attention-based Bidirectional Long Short-Term Memory networks) for assertion detection in clinical notes. Unlike previous state-of-art methods which require knowledge input or feature engineering, our system is a knowledge poor machine learning system and can be easily extended or transferred to other domains. The evaluation of our system on public benchmarking corpora demonstrates that a knowledge poor deep-learning system can also achieve high performance for detecting negation and assertions comparing to state-of-the-art systems.
Sel, Slhami, Hanbay, Davut.  2019.  E-Mail Classification Using Natural Language Processing. 2019 27th Signal Processing and Communications Applications Conference (SIU). :1–4.
Thanks to the rapid increase in technology and electronic communications, e-mail has become a serious communication tool. In many applications such as business correspondence, reminders, academic notices, web page memberships, e-mail is used as primary way of communication. If we ignore spam e-mails, there remain hundreds of e-mails received every day. In order to determine the importance of received e-mails, the subject or content of each e-mail must be checked. In this study we proposed an unsupervised system to classify received e-mails. Received e-mails' coordinates are determined by a method of natural language processing called as Word2Vec algorithm. According to the similarities, processed data are grouped by k-means algorithm with an unsupervised training model. In this study, 10517 e-mails were used in training. The success of the system is tested on a test group of 200 e-mails. In the test phase M3 model (window size 3, min. Word frequency 10, Gram skip) consolidated the highest success (91%). Obtained results are evaluated in section VI.
Nambiar, Sindhya K, Leons, Antony, Jose, Soniya, Arunsree.  2019.  Natural Language Processing Based Part of Speech Tagger using Hidden Markov Model. 2019 Third International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). :782–785.
In various natural language processing applications, PART-OF-SPEECH (POS) tagging is performed as a preprocessing step. For making POS tagging accurate, various techniques have been explored. But in Indian languages, not much work has been done. This paper describes the methods to build a Part of speech tagger by using hidden markov model. Supervised learning approach is implemented in which, already tagged sentences in malayalam is used to build hidden markov model.
Sharma, Sarika, Kumar, Deepak.  2019.  Agile Release Planning Using Natural Language Processing Algorithm. 2019 Amity International Conference on Artificial Intelligence (AICAI). :934–938.
Once the requirement is gathered in agile, it is broken down into smaller pre-defined format called user stories. These user stories are then scoped in various sprint releases and delivered accordingly. Release planning in Agile becomes challenging when the number of user stories goes up in hundreds. In such scenarios it is very difficult to manually identify similar user stories and package them together into a release. Hence, this paper suggests application of natural language processing algorithms for identifying similar user stories and then scoping them into a release This paper takes the approach to build a word corpus for every project release identified in the project and then to convert the provided user stories into a vector of string using Java utility for calculating top 3 most occurring words from the given project corpus in a user story. Once all the user stories are represented as vector array then by using RV coefficient NLP algorithm the user stories are clustered into various releases of the software project. Using the proposed approach, the release planning for large and complex software engineering projects can be simplified resulting into efficient planning in less time. The automated commercial tools like JIRA and Rally can be enhanced to include suggested algorithms for managing release planning in Agile.
Thejaswini, S, Indupriya, C.  2019.  Big Data Security Issues and Natural Language Processing. 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI). :1307–1312.
Whenever we talk about big data, the concern is always about the security of the data. In recent days the most heard about technology is the Natural Language Processing. This new and trending technology helps in solving the ever ending security problems which are not completely solved using big data. Starting with the big data security issues, this paper deals with addressing the topics related to cyber security and information security using the Natural Language Processing technology. Including the well-known cyber-attacks such as phishing identification and spam detection, this paper also addresses issues on information assurance and security such as detection of Advanced Persistent Threat (APT) in DNS and vulnerability analysis. The goal of this paper is to provide the overview of how natural language processing can be used to address cyber security issues.
Liu, Xueqing.  2018.  Assisting the Development of Secure Mobile Apps with Natural Language Processing. 2018 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC). :279–280.
With the rapid growth of mobile devices and mobile apps, mobile has surpassed desktop and now has the largest worldwide market share [1]. While such growth brings in more opportunities, it also poses new challenges in security. Among the challenges, user privacy protection has drawn tremendous attention in recent years, especially after the Facebook-Cambridge Analytica data scandal in April 2018 [2].
Kadebu, Prudence, Thada, Vikas, Chiurunge, Panashe.  2018.  Natural Language Processing and Deep Learning Towards Security Requirements Classification. 2018 3rd International Conference on Contemporary Computing and Informatics (IC3I). :135–140.
Security Requirements classification is an important area to the Software Engineering community in order to build software that is secure, robust and able to withstand attacks. This classification facilitates proper analysis of security requirements so that adequate security mechanisms are incorporated in the development process. Machine Learning techniques have been used in Security Requirements classification to aid in the process that lead to ensuring that correct security mechanisms are designed corresponding to the Security Requirements classifications made to eliminate the risk of security being incorporated in the late stages of development. However, these Machine Learning techniques have been found to have problems including, handcrafting of features, overfitting and failure to perform well with high dimensional data. In this paper we explore Natural Language Processing and Deep Learning to determine if this can be applied to Security Requirements classification.
Fahad, S.K. Ahammad, Yahya, Abdulsamad Ebrahim.  2018.  Inflectional Review of Deep Learning on Natural Language Processing. 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE). :1–4.
In the age of knowledge, Natural Language Processing (NLP) express its demand by a huge range of utilization. Previously NLP was dealing with statically data. Contemporary time NLP is doing considerably with the corpus, lexicon database, pattern reorganization. Considering Deep Learning (DL) method recognize artificial Neural Network (NN) to nonlinear process, NLP tools become increasingly accurate and efficient that begin a debacle. Multi-Layer Neural Network obtaining the importance of the NLP for its capability including standard speed and resolute output. Hierarchical designs of data operate recurring processing layers to learn and with this arrangement of DL methods manage several practices. In this paper, this resumed striving to reach a review of the tools and the necessary methodology to present a clear understanding of the association of NLP and DL for truly understand in the training. Efficiency and execution both are improved in NLP by Part of speech tagging (POST), Morphological Analysis, Named Entity Recognition (NER), Semantic Role Labeling (SRL), Syntactic Parsing, and Coreference resolution. Artificial Neural Networks (ANN), Time Delay Neural Networks (TDNN), Recurrent Neural Network (RNN), Convolution Neural Networks (CNN), and Long-Short-Term-Memory (LSTM) dealings among Dense Vector (DV), Windows Approach (WA), and Multitask learning (MTL) as a characteristic of Deep Learning. After statically methods, when DL communicate the influence of NLP, the individual form of the NLP process and DL rule collaboration was started a fundamental connection.