Visible to the public Biblio

Found 165 results

Filters: Keyword is natural language processing  [Clear All Filters]
2023-09-20
Rawat, Amarjeet, Maheshwari, Himani, Khanduja, Manisha, Kumar, Rajiv, Memoria, Minakshi, Kumar, Sanjeev.  2022.  Sentiment Analysis of Covid19 Vaccines Tweets Using NLP and Machine Learning Classifiers. 2022 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COM-IT-CON). 1:225—230.
Sentiment Analysis (SA) is an approach for detecting subjective information such as thoughts, outlooks, reactions, and emotional state. The majority of previous SA work treats it as a text-classification problem that requires labelled input to train the model. However, obtaining a tagged dataset is difficult. We will have to do it by hand the majority of the time. Another concern is that the absence of sufficient cross-domain portability creates challenging situation to reuse same-labelled data across applications. As a result, we will have to manually classify data for each domain. This research work applies sentiment analysis to evaluate the entire vaccine twitter dataset. The work involves the lexicon analysis using NLP libraries like neattext, textblob and multi class classification using BERT. This word evaluates and compares the results of the machine learning algorithms.
2023-09-01
Sumoto, Kensuke, Kanakogi, Kenta, Washizaki, Hironori, Tsuda, Naohiko, Yoshioka, Nobukazu, Fukazawa, Yoshiaki, Kanuka, Hideyuki.  2022.  Automatic labeling of the elements of a vulnerability report CVE with NLP. 2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science (IRI). :164—165.
Common Vulnerabilities and Exposures (CVE) databases contain information about vulnerabilities of software products and source code. If individual elements of CVE descriptions can be extracted and structured, then the data can be used to search and analyze CVE descriptions. Herein we propose a method to label each element in CVE descriptions by applying Named Entity Recognition (NER). For NER, we used BERT, a transformer-based natural language processing model. Using NER with machine learning can label information from CVE descriptions even if there are some distortions in the data. An experiment involving manually prepared label information for 1000 CVE descriptions shows that the labeling accuracy of the proposed method is about 0.81 for precision and about 0.89 for recall. In addition, we devise a way to train the data by dividing it into labels. Our proposed method can be used to label each element automatically from CVE descriptions.
2023-07-20
Khokhlov, Igor, Okutan, Ahmet, Bryla, Ryan, Simmons, Steven, Mirakhorli, Mehdi.  2022.  Automated Extraction of Software Names from Vulnerability Reports using LSTM and Expert System. 2022 IEEE 29th Annual Software Technology Conference (STC). :125—134.
Software vulnerabilities are closely monitored by the security community to timely address the security and privacy issues in software systems. Before a vulnerability is published by vulnerability management systems, it needs to be characterized to highlight its unique attributes, including affected software products and versions, to help security professionals prioritize their patches. Associating product names and versions with disclosed vulnerabilities may require a labor-intensive process that may delay their publication and fix, and thereby give attackers more time to exploit them. This work proposes a machine learning method to extract software product names and versions from unstructured CVE descriptions automatically. It uses Word2Vec and Char2Vec models to create context-aware features from CVE descriptions and uses these features to train a Named Entity Recognition (NER) model using bidirectional Long short-term memory (LSTM) networks. Based on the attributes of the product names and versions in previously published CVE descriptions, we created a set of Expert System (ES) rules to refine the predictions of the NER model and improve the performance of the developed method. Experiment results on real-life CVE examples indicate that using the trained NER model and the set of ES rules, software names and versions in unstructured CVE descriptions could be identified with F-Measure values above 0.95.
2023-06-29
Rahman, Md. Shahriar, Ashraf, Faisal Bin, Kabir, Md. Rayhan.  2022.  An Efficient Deep Learning Technique for Bangla Fake News Detection. 2022 25th International Conference on Computer and Information Technology (ICCIT). :206–211.

People connect with a plethora of information from many online portals due to the availability and ease of access to the internet and electronic communication devices. However, news portals sometimes abuse press freedom by manipulating facts. Most of the time, people are unable to discriminate between true and false news. It is difficult to avoid the detrimental impact of Bangla fake news from spreading quickly through online channels and influencing people’s judgment. In this work, we investigated many real and false news pieces in Bangla to discover a common pattern for determining if an article is disseminating incorrect information or not. We developed a deep learning model that was trained and validated on our selected dataset. For learning, the dataset contains 48,678 legitimate news and 1,299 fraudulent news. To deal with the imbalanced data, we used random undersampling and then ensemble to achieve the combined output. In terms of Bangla text processing, our proposed model achieved an accuracy of 98.29% and a recall of 99%.

Matheven, Anand, Kumar, Burra Venkata Durga.  2022.  Fake News Detection Using Deep Learning and Natural Language Processing. 2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI). :11–14.

The rise of social media has brought the rise of fake news and this fake news comes with negative consequences. With fake news being such a huge issue, efforts should be made to identify any forms of fake news however it is not so simple. Manually identifying fake news can be extremely subjective as determining the accuracy of the information in a story is complex and difficult to perform, even for experts. On the other hand, an automated solution would require a good understanding of NLP which is also complex and may have difficulties producing an accurate output. Therefore, the main problem focused on this project is the viability of developing a system that can effectively and accurately detect and identify fake news. Finding a solution would be a significant benefit to the media industry, particularly the social media industry as this is where a large proportion of fake news is published and spread. In order to find a solution to this problem, this project proposed the development of a fake news identification system using deep learning and natural language processing. The system was developed using a Word2vec model combined with a Long Short-Term Memory model in order to showcase the compatibility of the two models in a whole system. This system was trained and tested using two different dataset collections that each consisted of one real news dataset and one fake news dataset. Furthermore, three independent variables were chosen which were the number of training cycles, data diversity and vector size to analyze the relationship between these variables and the accuracy levels of the system. It was found that these three variables did have a significant effect on the accuracy of the system. From this, the system was then trained and tested with the optimal variables and was able to achieve the minimum expected accuracy level of 90%. The achieving of this accuracy levels confirms the compatibility of the LSTM and Word2vec model and their capability to be synergized into a single system that is able to identify fake news with a high level of accuracy.

ISSN: 2640-0146

2023-06-09
Sun, Zeyu, Zhang, Chi.  2022.  Research on Relation Extraction of Fusion Entity Enhancement and Shortest Dependency Path based on BERT. 2022 IEEE 10th Joint International Information Technology and Artificial Intelligence Conference (ITAIC). 10:766—770.
Deep learning models rely on single word features and location features of text to achieve good results in text relation extraction tasks. However, previous studies have failed to make full use of semantic information contained in sentence dependency syntax trees, and data sparseness and noise propagation still affect classification models. The BERT(Bidirectional Encoder Representations from Transformers) pretrained language model provides a better representation of natural language processing tasks. And entity enhancement methods have been proved to be effective in relation extraction tasks. Therefore, this paper proposes a combination of the shortest dependency path and entity-enhanced BERT pre-training language model for model construction to reduce the impact of noise terms on the classification model and obtain more semantically expressive feature representation. The algorithm is tested on SemEval-2010 Task 8 English relation extraction dataset, and the F1 value of the final experiment can reach 0. 881.
2023-06-02
Labrador, Víctor, Pastrana, Sergio.  2022.  Examining the trends and operations of modern Dark-Web marketplaces. 2022 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). :163—172.

Currently, the Dark Web is one key platform for the online trading of illegal products and services. Analysing the .onion sites hosting marketplaces is of interest for law enforcement and security researchers. This paper presents a study on 123k listings obtained from 6 different Dark Web markets. While most of current works leverage existing datasets, these are outdated and might not contain new products, e.g., those related to the 2020 COVID pandemic. Thus, we build a custom focused crawler to collect the data. Being able to conduct analyses on current data is of considerable importance as these marketplaces continue to change and grow, both in terms of products offered and users. Also, there are several anti-crawling mechanisms being improved, making this task more difficult and, consequently, reducing the amount of data obtained in recent years on these marketplaces. We conduct a data analysis evaluating multiple characteristics regarding the products, sellers, and markets. These characteristics include, among others, the number of sales, existing categories in the markets, the origin of the products and the sellers. Our study sheds light on the products and services being offered in these markets nowadays. Moreover, we have conducted a case study on one particular productive and dynamic drug market, i.e., Cannazon. Our initial goal was to understand its evolution over time, analyzing the variation of products in stock and their price longitudinally. We realized, though, that during the period of study the market suffered a DDoS attack which damaged its reputation and affected users' trust on it, which was a potential reason which lead to the subsequent closure of the market by its operators. Consequently, our study provides insights regarding the last days of operation of such a productive market, and showcases the effectiveness of a potential intervention approach by means of disrupting the service and fostering mistrust.

2023-05-12
Kostis, Ioannis - Aris, Karamitsios, Konstantinos, Kotrotsios, Konstantinos, Tsolaki, Magda, Tsolaki, Anthoula.  2022.  AI-Enabled Conversational Agents in Service of Mild Cognitive Impairment Patients. 2022 International Conference on Electrical and Information Technology (IEIT). :69–74.
Over the past two decades, several forms of non-intrusive technology have been deployed in cooperation with medical specialists in order to aid patients diagnosed with some form of mental, cognitive or psychological condition. Along with the availability and accessibility to applications offered by mobile devices, as well as the advancements in the field of Artificial Intelligence applications and Natural Language Processing, Conversational Agents have been developed with the objective of aiding medical specialists detecting those conditions in their early stages and monitoring their symptoms and effects on the cognitive state of the patient, as well as supporting the patient in their effort of mitigating those symptoms. Coupled with the recent advances in the the scientific field of machine and deep learning, we aim to explore the grade of applicability of such technologies into cognitive health support Conversational Agents, and their impact on the acceptability of such applications bytheir end users. Therefore, we conduct a systematic literature review, following a transparent and thorough process in order to search and analyze the bibliography of the past five years, focused on the implementation of Conversational Agents, supported by Artificial Intelligence technologies and in service of patients diagnosed with Mild Cognitive Impairment and its variants.
Borg, Markus, Bengtsson, Johan, Österling, Harald, Hagelborn, Alexander, Gagner, Isabella, Tomaszewski, Piotr.  2022.  Quality Assurance of Generative Dialog Models in an Evolving Conversational Agent Used for Swedish Language Practice. 2022 IEEE/ACM 1st International Conference on AI Engineering – Software Engineering for AI (CAIN). :22–32.
Due to the migration megatrend, efficient and effective second-language acquisition is vital. One proposed solution involves AI-enabled conversational agents for person-centered interactive language practice. We present results from ongoing action research targeting quality assurance of proprietary generative dialog models trained for virtual job interviews. The action team elicited a set of 38 requirements for which we designed corresponding automated test cases for 15 of particular interest to the evolving solution. Our results show that six of the test case designs can detect meaningful differences between candidate models. While quality assurance of natural language processing applications is complex, we provide initial steps toward an automated framework for machine learning model selection in the context of an evolving conversational agent. Future work will focus on model selection in an MLOps setting.
2023-04-28
Suryotrisongko, Hatma, Ginardi, Hari, Ciptaningtyas, Henning Titi, Dehqan, Saeed, Musashi, Yasuo.  2022.  Topic Modeling for Cyber Threat Intelligence (CTI). 2022 Seventh International Conference on Informatics and Computing (ICIC). :1–7.
Topic modeling algorithms from the natural language processing (NLP) discipline have been used for various applications. For instance, topic modeling for the product recommendation systems in the e-commerce systems. In this paper, we briefly reviewed topic modeling applications and then described our proposed idea of utilizing topic modeling approaches for cyber threat intelligence (CTI) applications. We improved the previous work by implementing BERTopic and Top2Vec approaches, enabling users to select their preferred pre-trained text/sentence embedding model, and supporting various languages. We implemented our proposed idea as the new topic modeling module for the Open Web Application Security Project (OWASP) Maryam: Open-Source Intelligence (OSINT) framework. We also described our experiment results using a leaked hacker forum dataset (nulled.io) to attract more researchers and open-source communities to participate in the Maryam project of OWASP Foundation.
2023-02-03
Oldal, Laura Gulyás, Kertész, Gábor.  2022.  Evaluation of Deep Learning-based Authorship Attribution Methods on Hungarian Texts. 2022 IEEE 10th Jubilee International Conference on Computational Cybernetics and Cyber-Medical Systems (ICCC). :000161–000166.
The range of text analysis methods in the field of natural language processing (NLP) has become more and more extensive thanks to the increasing computational resources of the 21st century. As a result, many deep learning-based solutions have been proposed for the purpose of authorship attribution, as they offer more flexibility and automated feature extraction compared to traditional statistical methods. A number of solutions have appeared for the attribution of English texts, however, the number of methods designed for Hungarian language is extremely small. Hungarian is a morphologically rich language, sentence formation is flexible and the alphabet is different from other languages. Furthermore, a language specific POS tagger, pretrained word embeddings, dependency parser, etc. are required. As a result, methods designed for other languages cannot be directly applied on Hungarian texts. In this paper, we review deep learning-based authorship attribution methods for English texts and offer techniques for the adaptation of these solutions to Hungarian language. As a part of the paper, we collected a new dataset consisting of Hungarian literary works of 15 authors. In addition, we extensively evaluate the implemented methods on the new dataset.
Ouamour, S., Sayoud, H..  2022.  Computational Identification of Author Style on Electronic Libraries - Case of Lexical Features. 2022 5th International Symposium on Informatics and its Applications (ISIA). :1–4.
In the present work, we intend to present a thorough study developed on a digital library, called HAT corpus, for a purpose of authorship attribution. Thus, a dataset of 300 documents that are written by 100 different authors, was extracted from the web digital library and processed for a task of author style analysis. All the documents are related to the travel topic and written in Arabic. Basically, three important rules in stylometry should be respected: the minimum document size, the same topic for all documents and the same genre too. In this work, we made a particular effort to respect those conditions seriously during the corpus preparation. That is, three lexical features: Fixed-length words, Rare words and Suffixes are used and evaluated by using a centroid based Manhattan distance. The used identification approach shows interesting results with an accuracy of about 0.94.
Praveen, Sivakami, Dcouth, Alysha, Mahesh, A S.  2022.  NoSQL Injection Detection Using Supervised Text Classification. 2022 2nd International Conference on Intelligent Technologies (CONIT). :1–5.
For a long time, SQL injection has been considered one of the most serious security threats. NoSQL databases are becoming increasingly popular as big data and cloud computing technologies progress. NoSQL injection attacks are designed to take advantage of applications that employ NoSQL databases. NoSQL injections can be particularly harmful because they allow unrestricted code execution. In this paper we use supervised learning and natural language processing to construct a model to detect NoSQL injections. Our model is designed to work with MongoDB, CouchDB, CassandraDB, and Couchbase queries. Our model has achieved an F1 score of 0.95 as established by 10-fold cross validation.
2022-12-01
Yu, Jialin, Cristea, Alexandra I., Harit, Anoushka, Sun, Zhongtian, Aduragba, Olanrewaju Tahir, Shi, Lei, Moubayed, Noura Al.  2022.  INTERACTION: A Generative XAI Framework for Natural Language Inference Explanations. 2022 International Joint Conference on Neural Networks (IJCNN). :1—8.
XAI with natural language processing aims to produce human-readable explanations as evidence for AI decision-making, which addresses explainability and transparency. However, from an HCI perspective, the current approaches only focus on delivering a single explanation, which fails to account for the diversity of human thoughts and experiences in language. This paper thus addresses this gap, by proposing a generative XAI framework, INTERACTION (explain aNd predicT thEn queRy with contextuAl CondiTional varIational autO-eNcoder). Our novel framework presents explanation in two steps: (step one) Explanation and Label Prediction; and (step two) Diverse Evidence Generation. We conduct intensive experiments with the Transformer architecture on a benchmark dataset, e-SNLI [1]. Our method achieves competitive or better performance against state-of-the-art baseline models on explanation generation (up to 4.7% gain in BLEU) and prediction (up to 4.4% gain in accuracy) in step one; it can also generate multiple diverse explanations in step two.
2022-09-09
Khan, Aazar Imran, Jain, Samyak, Sharma, Purushottam, Deep, Vikas, Mehrotra, Deepti.  2021.  Stylometric Analysis of Writing Patterns Using Artificial Neural Networks. 2021 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). :29—35.
Plagiarism checkers have been widely used to verify the authenticity of dissertation/project submissions. However, when non-verbatim plagiarism or online examinations are considered, this practice is not the best solution. In this work, we propose a better authentication system for online examinations that analyses the submitted text's stylometry for a match of writing pattern of the author by whom the text was submitted. The writing pattern is analyzed over many indicators (i.e., features of one's writing style). This model extracts 27 such features and stores them as the writing pattern of an individual. Stylometric Analysis is a better approach to verify a document's authorship as it doesn't check for plagiarism, but verifies if the document was written by a particular individual and hence completely shuts down the possibility of using text-convertors or translators. This paper also includes a brief comparative analysis of some simpler algorithms for the same problem statement. These algorithms yield results that vary in precision and accuracy and hence plotting a conclusion from the comparison shows that the best bet to tackle this problem is through Artificial Neural Networks.
Frankel, Sophia F., Ghosh, Krishnendu.  2021.  Machine Learning Approaches for Authorship Attribution using Source Code Stylometry. 2021 IEEE International Conference on Big Data (Big Data). :3298—3304.
Identification of source code authorship is vital for attribution. In this work, a machine learning framework is described to identify source code authorship. The framework integrates the features extracted using natural language processing based approaches and abstract syntax tree of the code. We evaluate the methodology on Google Code Jam dataset. We present the performance measures of the logistic regression and deep learning on the dataset.
2022-08-26
Christopherjames, Jim Elliot, Saravanan, Mahima, Thiyam, Deepa Beeta, S, Prasath Alias Surendhar, Sahib, Mohammed Yashik Basheer, Ganapathi, Manju Varrshaa, Milton, Anisha.  2021.  Natural Language Processing based Human Assistive Health Conversational Agent for Multi-Users. 2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC). :1414–1420.
Background: Most of the people are not medically qualified for studying or understanding the extremity of their diseases or symptoms. This is the place where natural language processing plays a vital role in healthcare. These chatbots collect patients' health data and depending on the data, these chatbot give more relevant data to patients regarding their body conditions and recommending further steps also. Purposes: In the medical field, AI powered healthcare chatbots are beneficial for assisting patients and guiding them in getting the most relevant assistance. Chatbots are more useful for online search that users or patients go through when patients want to know for their health symptoms. Methods: In this study, the health assistant system was developed using Dialogflow application programming interface (API) which is a Google's Natural language processing powered algorithm and the same is deployed on google assistant, telegram, slack, Facebook messenger, and website and mobile app. With this web application, a user can make health requests/queries via text message and might also get relevant health suggestions/recommendations through it. Results: This chatbot acts like an informative and conversational chatbot. This chatbot provides medical knowledge such as disease symptoms and treatments. Storing patients personal and medical information in a database for further analysis of the patients and patients get real time suggestions from doctors. Conclusion: In the healthcare sector AI-powered applications have seen a remarkable spike in recent days. This covid crisis changed the whole healthcare system upside down. So this NLP powered chatbot system reduced office waiting, saving money, time and energy. Patients might be getting medical knowledge and assisting ourselves within their own time and place.
2022-08-12
Maruyama, Yoshihiro.  2021.  Learning, Development, and Emergence of Compositionality in Natural Language Processing. 2021 IEEE International Conference on Development and Learning (ICDL). :1–7.
There are two paradigms in language processing, as characterised by symbolic compositional and statistical distributional modelling, which may be regarded as based upon the principles of compositionality (or symbolic recursion) and of contextuality (or the distributional hypothesis), respectively. Starting with philosophy of language as in Frege and Wittgenstein, we elucidate the nature of language and language processing from interdisciplinary perspectives across different fields of science. At the same time, we shed new light on conceptual issues in language processing on the basis of recent advances in Transformer-based models such as BERT and GPT-3. We link linguistic cognition with mathematical cognition through these discussions, explicating symbol grounding/emergence problems shared by both of them. We also discuss whether animal cognition can develop recursive compositional information processing.
2022-07-15
Wang, Shilei, Wang, Hui, Yu, Hongtao, Zhang, Fuzhi.  2021.  Detecting shilling groups in recommender systems based on hierarchical topic model. 2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA). :832—837.
In a group shilling attack, attackers work collaboratively to inject fake profiles aiming to obtain desired recommendation result. This type of attacks is more harmful to recommender systems than individual shilling attacks. Previous studies pay much attention to detect individual attackers, and little work has been done on the detection of shilling groups. In this work, we introduce a topic modeling method of natural language processing into shilling attack detection and propose a shilling group detection method on the basis of hierarchical topic model. First, we model the given dataset to a series of user rating documents and use the hierarchical topic model to learn the specific topic distributions of each user from these rating documents to describe user rating behaviors. Second, we divide candidate groups based on rating value and rating time which are not involved in the hierarchical topic model. Lastly, we calculate group suspicious degrees in accordance with several indicators calculated through the analysis of user rating distributions, and use the k-means clustering algorithm to distinguish shilling groups. The experimental results on the Netflix and Amazon datasets show that the proposed approach performs better than baseline methods.
2022-06-06
Hung, Benjamin W.K., Muramudalige, Shashika R., Jayasumana, Anura P., Klausen, Jytte, Libretti, Rosanne, Moloney, Evan, Renugopalakrishnan, Priyanka.  2019.  Recognizing Radicalization Indicators in Text Documents Using Human-in-the-Loop Information Extraction and NLP Techniques. 2019 IEEE International Symposium on Technologies for Homeland Security (HST). :1–7.
Among the operational shortfalls that hinder law enforcement from achieving greater success in preventing terrorist attacks is the difficulty in dynamically assessing individualized violent extremism risk at scale given the enormous amount of primarily text-based records in disparate databases. In this work, we undertake the critical task of employing natural language processing (NLP) techniques and supervised machine learning models to classify textual data in analyst and investigator notes and reports for radicalization behavioral indicators. This effort to generate structured knowledge will build towards an operational capability to assist analysts in rapidly mining law enforcement and intelligence databases for cues and risk indicators. In the near-term, this effort also enables more rapid coding of biographical radicalization profiles to augment a research database of violent extremists and their exhibited behavioral indicators.
2022-05-19
Anusha, M, Leelavathi, R.  2021.  Analysis on Sentiment Analytics Using Deep Learning Techniques. 2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). :542–547.
Sentiment analytics is the process of applying natural language processing and methods for text-based information to define and extract subjective knowledge of the text. Natural language processing and text classifications can deal with limited corpus data and more attention has been gained by semantic texts and word embedding methods. Deep learning is a powerful method that learns different layers of representations or qualities of information and produces state-of-the-art prediction results. In different applications of sentiment analytics, deep learning methods are used at the sentence, document, and aspect levels. This review paper is based on the main difficulties in the sentiment assessment stage that significantly affect sentiment score, pooling, and polarity detection. The most popular deep learning methods are a Convolution Neural Network and Recurrent Neural Network. Finally, a comparative study is made with a vast literature survey using deep learning models.
Zhang, Cheng, Yamana, Hayato.  2021.  Improving Text Classification Using Knowledge in Labels. 2021 IEEE 6th International Conference on Big Data Analytics (ICBDA). :193–197.
Various algorithms and models have been proposed to address text classification tasks; however, they rarely consider incorporating the additional knowledge hidden in class labels. We argue that hidden information in class labels leads to better classification accuracy. In this study, instead of encoding the labels into numerical values, we incorporated the knowledge in the labels into the original model without changing the model architecture. We combined the output of an original classification model with the relatedness calculated based on the embeddings of a sequence and a keyword set. A keyword set is a word set to represent knowledge in the labels. Usually, it is generated from the classes while it could also be customized by the users. The experimental results show that our proposed method achieved statistically significant improvements in text classification tasks. The source code and experimental details of this study can be found on Github11https://github.com/HeroadZ/KiL.
Kuilboer, Jean-Pierre, Stull, Tristan.  2021.  Text Analytics and Big Data in the Financial domain. 2021 16th Iberian Conference on Information Systems and Technologies (CISTI). :1–4.
This research attempts to provide some insights on the application of text mining and Natural Language Processing (NLP). The application domain is consumer complaints about financial institutions in the USA. As an advanced analytics discipline embedded within the Big Data paradigm, the practice of text analytics contains elements of emergent knowledge processes. Since our experiment should be able to scale up we make use of a pipeline based on Spark-NLP. The usage scenario is adapting the model to a specific industrial context and using the dataset offered by the "Consumer Financial Protection Bureau" to illustrate the application.
2022-04-25
Khalil, Hady A., Maged, Shady A..  2021.  Deepfakes Creation and Detection Using Deep Learning. 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC). :1–4.
Deep learning has been used in a wide range of applications like computer vision, natural language processing and image detection. The advancement in deep learning algorithms in image detection and manipulation has led to the creation of deepfakes, deepfakes use deep learning algorithms to create fake images that are at times very hard to distinguish from real images. With the rising concern around personal privacy and security, Many methods to detect deepfake images have emerged, in this paper the use of deep learning for creating as well as detecting deepfakes is explored, this paper also propose the use of deep learning image enhancement method to improve the quality of deepfakes created.
Ren, Jing, Xia, Feng, Liu, Yemeng, Lee, Ivan.  2021.  Deep Video Anomaly Detection: Opportunities and Challenges. 2021 International Conference on Data Mining Workshops (ICDMW). :959–966.
Anomaly detection is a popular and vital task in various research contexts, which has been studied for several decades. To ensure the safety of people’s lives and assets, video surveillance has been widely deployed in various public spaces, such as crossroads, elevators, hospitals, banks, and even in private homes. Deep learning has shown its capacity in a number of domains, ranging from acoustics, images, to natural language processing. However, it is non-trivial to devise intelligent video anomaly detection systems cause anomalies significantly differ from each other in different application scenarios. There are numerous advantages if such intelligent systems could be realised in our daily lives, such as saving human resources in a large degree, reducing financial burden on the government, and identifying the anomalous behaviours timely and accurately. Recently, many studies on extending deep learning models for solving anomaly detection problems have emerged, resulting in beneficial advances in deep video anomaly detection techniques. In this paper, we present a comprehensive review of deep learning-based methods to detect the video anomalies from a new perspective. Specifically, we summarise the opportunities and challenges of deep learning models on video anomaly detection tasks, respectively. We put forth several potential future research directions of intelligent video anomaly detection system in various application domains. Moreover, we summarise the characteristics and technical problems in current deep learning methods for video anomaly detection.