Biblio

Found 168 results

Filters: Keyword is natural language processing  [Clear All Filters]
2022-09-09
Khan, Aazar Imran, Jain, Samyak, Sharma, Purushottam, Deep, Vikas, Mehrotra, Deepti.  2021.  Stylometric Analysis of Writing Patterns Using Artificial Neural Networks. 2021 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). :29—35.
Plagiarism checkers have been widely used to verify the authenticity of dissertation/project submissions. However, when non-verbatim plagiarism or online examinations are considered, this practice is not the best solution. In this work, we propose a better authentication system for online examinations that analyses the submitted text's stylometry for a match of writing pattern of the author by whom the text was submitted. The writing pattern is analyzed over many indicators (i.e., features of one's writing style). This model extracts 27 such features and stores them as the writing pattern of an individual. Stylometric Analysis is a better approach to verify a document's authorship as it doesn't check for plagiarism, but verifies if the document was written by a particular individual and hence completely shuts down the possibility of using text-convertors or translators. This paper also includes a brief comparative analysis of some simpler algorithms for the same problem statement. These algorithms yield results that vary in precision and accuracy and hence plotting a conclusion from the comparison shows that the best bet to tackle this problem is through Artificial Neural Networks.
2022-03-10
Yang, Mengde.  2021.  A Survey on Few-Shot Learning in Natural Language Processing. 2021 International Conference on Artificial Intelligence and Electromechanical Automation (AIEA). :294—297.
The annotated dataset is the foundation for Supervised Natural Language Processing. However, the cost of obtaining dataset is high. In recent years, the Few-Shot Learning has gradually attracted the attention of researchers. From the definition, in this paper, we conclude the difference in Few-Shot Learning between Natural Language Processing and Computer Vision. On that basis, the current Few-Shot Learning on Natural Language Processing is summarized, including Transfer Learning, Meta Learning and Knowledge Distillation. Furthermore, we conclude the solutions to Few-Shot Learning in Natural Language Processing, such as the method based on Distant Supervision, Meta Learning and Knowledge Distillation. Finally, we present the challenges facing Few-Shot Learning in Natural Language Processing.
2022-05-19
Kuilboer, Jean-Pierre, Stull, Tristan.  2021.  Text Analytics and Big Data in the Financial domain. 2021 16th Iberian Conference on Information Systems and Technologies (CISTI). :1–4.
This research attempts to provide some insights on the application of text mining and Natural Language Processing (NLP). The application domain is consumer complaints about financial institutions in the USA. As an advanced analytics discipline embedded within the Big Data paradigm, the practice of text analytics contains elements of emergent knowledge processes. Since our experiment should be able to scale up we make use of a pipeline based on Spark-NLP. The usage scenario is adapting the model to a specific industrial context and using the dataset offered by the "Consumer Financial Protection Bureau" to illustrate the application.
2022-01-31
Zhao, Rui.  2021.  The Vulnerability of the Neural Networks Against Adversarial Examples in Deep Learning Algorithms. 2021 2nd International Conference on Computing and Data Science (CDS). :287–295.
With the further development in the fields of computer vision, network security, natural language processing and so on so forth, deep learning technology gradually exposed certain security risks. The existing deep learning algorithms cannot effectively describe the essential characteristics of data, making the algorithm unable to give the correct result in the face of malicious input. Based on current security threats faced by deep learning, this paper introduces the problem of adversarial examples in deep learning, sorts out the existing attack and defense methods of black box and white box, and classifies them. It briefly describes the application of some adversarial examples in different scenarios in recent years, compares several defense technologies of adversarial examples, and finally summarizes the problems in this research field and prospects its future development. This paper introduces the common white box attack methods in detail, and further compares the similarities and differences between the attack of black and white boxes. Correspondingly, the author also introduces the defense methods, and analyzes the performance of these methods against the black and white box attack.
2022-02-24
Duan, Xuanyu, Ge, Mengmeng, Minh Le, Triet Huynh, Ullah, Faheem, Gao, Shang, Lu, Xuequan, Babar, M. Ali.  2021.  Automated Security Assessment for the Internet of Things. 2021 IEEE 26th Pacific Rim International Symposium on Dependable Computing (PRDC). :47–56.
Internet of Things (IoT) based applications face an increasing number of potential security risks, which need to be systematically assessed and addressed. Expert-based manual assessment of IoT security is a predominant approach, which is usually inefficient. To address this problem, we propose an automated security assessment framework for IoT networks. Our framework first leverages machine learning and natural language processing to analyze vulnerability descriptions for predicting vulnerability metrics. The predicted metrics are then input into a two-layered graphical security model, which consists of an attack graph at the upper layer to present the network connectivity and an attack tree for each node in the network at the bottom layer to depict the vulnerability information. This security model automatically assesses the security of the IoT network by capturing potential attack paths. We evaluate the viability of our approach using a proof-of-concept smart building system model which contains a variety of real-world IoT devices and poten-tial vulnerabilities. Our evaluation of the proposed framework demonstrates its effectiveness in terms of automatically predicting the vulnerability metrics of new vulnerabilities with more than 90% accuracy, on average, and identifying the most vulnerable attack paths within an IoT network. The produced assessment results can serve as a guideline for cybersecurity professionals to take further actions and mitigate risks in a timely manner.
2022-01-10
M, Babu, R, Hemchandhar, D, Harish Y., S, Akash, K, Abhishek Todi.  2021.  Voice Prescription with End-to-End Security Enhancements. 2021 6th International Conference on Communication and Electronics Systems (ICCES). :1–8.

The recent analysis indicates more than 250,000 people in the United States of America (USA) die every year because of medical errors. World Health Organisation (WHO) reports states that 2.6 million deaths occur due to medical and its prescription errors. Many of the errors related to the wrong drug/dosage administration by caregivers to patients due to indecipherable handwritings, drug interactions, confusing drug names, etc. The espousal of Mobile-based speech recognition applications will eliminate the errors. This allows physicians to narrate the prescription instead of writing. The application can be accessed through smartphones and can be used easily by everyone. An application program interface has been created for handling requests. Natural language processing is used to read text, interpret and determine the important words for generating prescriptions. The patient data is stored and used according to the Health Insurance Portability and Accountability Act of 1996 (HIPAA) guidelines. The SMS4-BSK encryption scheme is used to provide the data transmission securely over Wireless LAN.

2022-03-10
Ozan, Şükrü, Taşar, D. Emre.  2021.  Auto-tagging of Short Conversational Sentences using Natural Language Processing Methods. 2021 29th Signal Processing and Communications Applications Conference (SIU). :1—4.
In this study, we aim to find a method to autotag sentences specific to a domain. Our training data comprises short conversational sentences extracted from chat conversations between company's customer representatives and web site visitors. We manually tagged approximately 14 thousand visitor inputs into ten basic categories, which will later be used in a transformer-based language model with attention mechanisms for the ultimate goal of developing a chatbot application that can produce meaningful dialogue.We considered three different stateof- the-art models and reported their auto-tagging capabilities. We achieved the best performance with the bidirectional encoder representation from transformers (BERT) model. Implementation of the models used in these experiments can be cloned from our GitHub repository and tested for similar auto-tagging problems without much effort.
2022-07-15
Wang, Shilei, Wang, Hui, Yu, Hongtao, Zhang, Fuzhi.  2021.  Detecting shilling groups in recommender systems based on hierarchical topic model. 2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA). :832—837.
In a group shilling attack, attackers work collaboratively to inject fake profiles aiming to obtain desired recommendation result. This type of attacks is more harmful to recommender systems than individual shilling attacks. Previous studies pay much attention to detect individual attackers, and little work has been done on the detection of shilling groups. In this work, we introduce a topic modeling method of natural language processing into shilling attack detection and propose a shilling group detection method on the basis of hierarchical topic model. First, we model the given dataset to a series of user rating documents and use the hierarchical topic model to learn the specific topic distributions of each user from these rating documents to describe user rating behaviors. Second, we divide candidate groups based on rating value and rating time which are not involved in the hierarchical topic model. Lastly, we calculate group suspicious degrees in accordance with several indicators calculated through the analysis of user rating distributions, and use the k-means clustering algorithm to distinguish shilling groups. The experimental results on the Netflix and Amazon datasets show that the proposed approach performs better than baseline methods.
2022-03-10
Zhang, Zhongtang, Liu, Shengli, Yang, Qichao, Guo, Shichen.  2021.  Semantic Understanding of Source and Binary Code based on Natural Language Processing. 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). 4:2010—2016.
With the development of open source projects, a large number of open source codes will be reused in binary software, and bugs in source codes will also be introduced into binary codes. In order to detect the reused open source codes in binary codes, it is sometimes necessary to compare and analyze the similarity between source codes and binary codes. One of the main challenge is that the compilation process can generate different binary code representations for the same source code, such as different compiler versions, compilation optimization options and target architectures, which greatly increases the difficulty of semantic similarity detection between source code and binary code. In order to solve the influence of the compilation process on the comparison of semantic similarity of codes, this paper transforms the source code and binary code into LLVM intermediate representation (LLVM IR), which is a universal intermediate representation independent of source code and binary code. We carry out semantic feature extraction and embedding training on LLVM IR based on natural language processing model. Experimental results show that LLVM IR eliminates the influence of compilation on the syntax differences between source code and binary code, and the semantic features of code are well represented and preserved.
2021-12-21
Li, Kemeng, Zheng, Dong, Guo, Rui.  2021.  An Anonymous Editable Blockchain Scheme Based on Certificateless Aggregate Signature. 2021 3rd International Conference on Natural Language Processing (ICNLP). :57–67.
Blockchain technology has gradually replaced traditional centralized data storage methods, and provided people reliable data storage services with its decentralized and non-tamperable features. However, the current blockchain data supervision is insufficient and the data cannot be modified once it is on the blockchain, which will cause the blockchain system to face various problems such as illegal information cannot be deleted and breach of smart contract cannot be fixed in time. To address these issues, we propose an anonymous editable blockchain scheme based on the reconstruction of the blockchain structure of the SpaceMint combining with the certificateless aggregate signature algorithm. Users register with their real identities and use pseudonyms in the system to achieve their anonymity. If the number of users who agree to edit meets the threshold, the data on the blockchain can be modified or deleted, and our scheme has the function of accountability for malicious behavior. The security analysis show that the proposed certificateless aggregate signature algorithm enjoys the unforgeability under the adaptive selected message attack. Moreover, the method of setting the threshold of related users is adopted to guarantee the effectiveness and security of editing blockchain data. At last, we evaluate the performance of our certificateless aggregate signature algorithm and related schemes in theoretical analysis and experimental simulation, which demonstrates our scheme is feasible and efficient in storage, bandwidth and computational cost.
2022-04-25
Khalil, Hady A., Maged, Shady A..  2021.  Deepfakes Creation and Detection Using Deep Learning. 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC). :1–4.
Deep learning has been used in a wide range of applications like computer vision, natural language processing and image detection. The advancement in deep learning algorithms in image detection and manipulation has led to the creation of deepfakes, deepfakes use deep learning algorithms to create fake images that are at times very hard to distinguish from real images. With the rising concern around personal privacy and security, Many methods to detect deepfake images have emerged, in this paper the use of deep learning for creating as well as detecting deepfakes is explored, this paper also propose the use of deep learning image enhancement method to improve the quality of deepfakes created.
2022-04-19
Srinivasan, Sudarshan, Begoli, Edmon, Mahbub, Maria, Knight, Kathryn.  2021.  Nomen Est Omen - The Role of Signatures in Ascribing Email Author Identity with Transformer Neural Networks. 2021 IEEE Security and Privacy Workshops (SPW). :291–297.
Authorship attribution, an NLP problem where anonymous text is matched to its author, has important, cross-disciplinary applications, particularly those concerning cyber-defense. Our research examines the degree of sensitivity that attention-based models have to adversarial perturbations. We ask, what is the minimal amount of change necessary to maximally confuse a transformer model? In our investigation we examine a balanced subset of emails from the Enron email dataset, calculating the performance of our model before and after email signatures have been perturbed. Results show that the model's performance changed significantly in the absence of a signature, indicating the importance of email signatures in email authorship detection. Furthermore, we show that these models rely on signatures for shorter emails much more than for longer emails. We also indicate that additional research is necessary to investigate stylometric features and adversarial training to further improve classification model robustness.
2022-03-10
Qin, Shuangling, Xu, Chaozhi, Zhang, Fang, Jiang, Tao, Ge, Wei, Li, Jihong.  2021.  Research on Application of Chinese Natural Language Processing in Constructing Knowledge Graph of Chronic Diseases. 2021 International Conference on Communications, Information System and Computer Engineering (CISCE). :271—274.
Knowledge Graph can describe the concepts in the objective world and the relationships between these concepts in a structured way, and identify, discover and infer the relationships between things and concepts. It has been developed in the field of medical and health care. In this paper, the method of natural language processing has been used to build chronic disease knowledge graph, such as named entity recognition, relationship extraction. This method is beneficial to forecast analysis of chronic disease, network monitoring, basic education, etc. The research of this paper can greatly help medical experts in the treatment of chronic disease treatment, and assist primary clinicians with making more scientific decision, and can help Patients with chronic diseases to improve medical efficiency. In the end, it also has practical significance for clinical scientific research of chronic disease.
Pölöskei, István.  2021.  Continuous natural language processing pipeline strategy. 2021 IEEE 15th International Symposium on Applied Computational Intelligence and Informatics (SACI). :000221—000224.
Natural language processing (NLP) is a division of artificial intelligence. The constructed model's quality is entirely reliant on the training dataset's quality. A data streaming pipeline is an adhesive application, completing a managed connection from data sources to machine learning methods. The recommended NLP pipeline composition has well-defined procedures. The implemented message broker design is a usual apparatus for delivering events. It makes it achievable to construct a robust training dataset for machine learning use-case and serve the model's input. The reconstructed dataset is a valid input for the machine learning processes. Based on the data pipeline's product, the model recreation and redeployment can be scheduled automatically.
2022-05-19
Anusha, M, Leelavathi, R.  2021.  Analysis on Sentiment Analytics Using Deep Learning Techniques. 2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). :542–547.
Sentiment analytics is the process of applying natural language processing and methods for text-based information to define and extract subjective knowledge of the text. Natural language processing and text classifications can deal with limited corpus data and more attention has been gained by semantic texts and word embedding methods. Deep learning is a powerful method that learns different layers of representations or qualities of information and produces state-of-the-art prediction results. In different applications of sentiment analytics, deep learning methods are used at the sentence, document, and aspect levels. This review paper is based on the main difficulties in the sentiment assessment stage that significantly affect sentiment score, pooling, and polarity detection. The most popular deep learning methods are a Convolution Neural Network and Recurrent Neural Network. Finally, a comparative study is made with a vast literature survey using deep learning models.
2022-02-24
Paudel, Upakar, Dolan, Andy, Majumdar, Suryadipta, Ray, Indrakshi.  2021.  Context-Aware IoT Device Functionality Extraction from Specifications for Ensuring Consumer Security. 2021 IEEE Conference on Communications and Network Security (CNS). :155–163.
Internet of Thing (IoT) devices are being widely used in smart homes and organizations. An IoT device has some intended purposes, but may also have hidden functionalities. Typically, the device is installed in a home or an organization and the network traffic associated with the device is captured and analyzed to infer high-level functionality to the extent possible. However, such analysis is dynamic in nature, and requires the installation of the device and access to network data which is often hard to get for privacy and confidentiality reasons. We propose an alternative static approach which can infer the functionality of a device from vendor materials using Natural Language Processing (NLP) techniques. Information about IoT device functionality can be used in various applications, one of which is ensuring security in a smart home. We demonstrate how security policies associated with device functionality in a smart home can be formally represented using the NIST Next Generation Access Control (NGAC) model and automatically analyzed using Alloy, which is a formal verification tool. This will provide assurance to the consumer that these devices will be compliant to the home or organizational policy even before they have been purchased.
2021-08-05
Alecakir, Huseyin, Kabukcu, Muhammet, Can, Burcu, Sen, Sevil.  2020.  Discovering Inconsistencies between Requested Permissions and Application Metadata by using Deep Learning. 2020 International Conference on Information Security and Cryptology (ISCTURKEY). :56—56.
Android gives us opportunity to extract meaningful information from metadata. From the security point of view, the missing important information in metadata of an application could be a sign of suspicious application, which could be directed for extensive analysis. Especially the usage of dangerous permissions is expected to be explained in app descriptions. The permission-to-description fidelity problem in the literature aims to discover such inconsistencies between the usage of permissions and descriptions. This study proposes a new method based on natural language processing and recurrent neural networks. The effect of user reviews on finding such inconsistencies is also investigated in addition to application descriptions. The experimental results show that high precision is obtained by the proposed solution, and the proposed method could be used for triage of Android applications.
2021-11-29
Hu, Shengze, He, Chunhui, Ge, Bin, Liu, Fang.  2020.  Enhanced Word Embedding Method in Text Classification. 2020 6th International Conference on Big Data and Information Analytics (BigDIA). :18–22.
For the task of natural language processing (NLP), Word embedding technology has a certain impact on the accuracy of deep neural network algorithms. Considering that the current word embedding method cannot realize the coexistence of words and phrases in the same vector space. Therefore, we propose an enhanced word embedding (EWE) method. Before completing the word embedding, this method introduces a unique sentence reorganization technology to rewrite all the sentences in the original training corpus. Then, all the original corpus and the reorganized corpus are merged together as the training corpus of the distributed word embedding model, so as to realize the coexistence problem of words and phrases in the same vector space. We carried out experiment to demonstrate the effectiveness of the EWE algorithm on three classic benchmark datasets. The results show that the EWE method can significantly improve the classification performance of the CNN model.
2021-08-05
Ramasubramanian, Muthukumaran, Muhammad, Hassan, Gurung, Iksha, Maskey, Manil, Ramachandran, Rahul.  2020.  ES2Vec: Earth Science Metadata Keyword Assignment using Domain-Specific Word Embeddings. 2020 SoutheastCon. :1—6.
Earth science metadata keyword assignment is a challenging problem. Dataset curators select appropriate keywords from the Global Change Master Directory (GCMD) set of keywords. The keywords are integral part of search and discovery of these datasets. Hence, selection of keywords are crucial in increasing the discoverability of datasets. Utilizing machine learning techniques, we provide users with automated keyword suggestions as an improved approach to complement manual selection. We trained a machine learning model that leverages the semantic embedding ability of Word2Vec models to process abstracts and suggest relevant keywords. A user interface tool we built to assist data curators in assignment of such keywords is also described.
2021-06-01
Cideron, Geoffrey, Seurin, Mathieu, Strub, Florian, Pietquin, Olivier.  2020.  HIGhER: Improving instruction following with Hindsight Generation for Experience Replay. 2020 IEEE Symposium Series on Computational Intelligence (SSCI). :225–232.
Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality. While these characterizations may foster instructing, conditioning or structuring interactive agent behavior, it remains an open-problem to correctly relate language understanding and reinforcement learning in even simple instruction following scenarios. This joint learning problem is alleviated through expert demonstrations, auxiliary losses, or neural inductive biases. In this paper, we propose an orthogonal approach called Hindsight Generation for Experience Replay (HIGhER) that extends the Hindsight Experience Replay approach to the language-conditioned policy setting. Whenever the agent does not fulfill its instruction, HIGhER learns to output a new directive that matches the agent trajectory, and it relabels the episode with a positive reward. To do so, HIGhER learns to map a state into an instruction by using past successful trajectories, which removes the need to have external expert interventions to relabel episodes as in vanilla HER. We show the efficiency of our approach in the BabyAI environment, and demonstrate how it complements other instruction following methods.
2021-02-22
Eftimie, S., Moinescu, R., Rǎcuciu, C..  2020.  Insider Threat Detection Using Natural Language Processing and Personality Profiles. 2020 13th International Conference on Communications (COMM). :325–330.
This work represents an interdisciplinary effort to proactively identify insider threats, using natural language processing and personality profiles. Profiles were developed for the relevant insider threat types using the five-factor model of personality and were used in a proof-of-concept detection system. The system employs a third-party cloud service that uses natural language processing to analyze personality profiles based on personal content. In the end, an assessment was made over the feasibility of the system using a public dataset.
Bhagat, V., J, B. R..  2020.  Natural Language Processing on Diverse Data Layers Through Microservice Architecture. 2020 IEEE International Conference for Innovation in Technology (INOCON). :1–6.
With the rapid growth in Natural Language Processing (NLP), all types of industries find a need for analyzing a massive amount of data. Sentiment analysis is becoming a more exciting area for the businessmen and researchers in Text mining & NLP. This process includes the calculation of various sentiments with the help of text mining. Supplementary to this, the world is connected through Information Technology and, businesses are moving toward the next step of the development to make their system more intelligent. Microservices have fulfilled the need for development platforms which help the developers to use various development tools (Languages and applications) efficiently. With the consideration of data analysis for business growth, data security becomes a major concern in front of developers. This paper gives a solution to keep the data secured by providing required access to data scientists without disturbing the base system software. This paper has discussed data storage and exchange policies of microservices through common JavaScript Object Notation (JSON) response which performs the sentiment analysis of customer's data fetched from various microservices through secured APIs.
2021-03-29
Chauhan, R., Heydari, S. Shah.  2020.  Polymorphic Adversarial DDoS attack on IDS using GAN. 2020 International Symposium on Networks, Computers and Communications (ISNCC). :1–6.
Intrusion Detection systems are important tools in preventing malicious traffic from penetrating into networks and systems. Recently, Intrusion Detection Systems are rapidly enhancing their detection capabilities using machine learning algorithms. However, these algorithms are vulnerable to new unknown types of attacks that can evade machine learning IDS. In particular, they may be vulnerable to attacks based on Generative Adversarial Networks (GAN). GANs have been widely used in domains such as image processing, natural language processing to generate adversarial data of different types such as graphics, videos, texts, etc. We propose a model using GAN to generate adversarial DDoS attacks that can change the attack profile and can be undetected. Our simulation results indicate that by continuous changing of attack profile, defensive systems that use incremental learning will still be vulnerable to new attacks.
2021-02-22
Si, Y., Zhou, W., Gai, J..  2020.  Research and Implementation of Data Extraction Method Based on NLP. 2020 IEEE 14th International Conference on Anti-counterfeiting, Security, and Identification (ASID). :11–15.
In order to accurately extract the data from unstructured Chinese text, this paper proposes a rule-based method based on natural language processing and regular expression. This method makes use of the language expression rules of the data in the text and other related knowledge to form the feature word lists and rule template to match the text. Experimental results show that the accuracy of the designed algorithm is 94.09%.
Lansley, M., Kapetanakis, S., Polatidis, N..  2020.  SEADer++ v2: Detecting Social Engineering Attacks using Natural Language Processing and Machine Learning. 2020 International Conference on INnovations in Intelligent SysTems and Applications (INISTA). :1–6.
Social engineering attacks are well known attacks in the cyberspace and relatively easy to try and implement because no technical knowledge is required. In various online environments such as business domains where customers talk through a chat service with employees or in social networks potential hackers can try to manipulate other people by employing social attacks against them to gain information that will benefit them in future attacks. Thus, we have used a number of natural language processing steps and a machine learning algorithm to identify potential attacks. The proposed method has been tested on a semi-synthetic dataset and it is shown to be both practical and effective.