Biblio

List
Filter

Found 101 results

Filters: Keyword is text analysis [Clear All Filters]

2023-02-03

Oldal, Laura Gulyás, Kertész, Gábor. 2022. Evaluation of Deep Learning-based Authorship Attribution Methods on Hungarian Texts. 2022 IEEE 10th Jubilee International Conference on Computational Cybernetics and Cyber-Medical Systems (ICCC). :000161–000166.

The range of text analysis methods in the field of natural language processing (NLP) has become more and more extensive thanks to the increasing computational resources of the 21st century. As a result, many deep learning-based solutions have been proposed for the purpose of authorship attribution, as they offer more flexibility and automated feature extraction compared to traditional statistical methods. A number of solutions have appeared for the attribution of English texts, however, the number of methods designed for Hungarian language is extremely small. Hungarian is a morphologically rich language, sentence formation is flexible and the alphabet is different from other languages. Furthermore, a language specific POS tagger, pretrained word embeddings, dependency parser, etc. are required. As a result, methods designed for other languages cannot be directly applied on Hungarian texts. In this paper, we review deep learning-based authorship attribution methods for English texts and offer techniques for the adaptation of these solutions to Hungarian language. As a part of the paper, we collected a new dataset consisting of Hungarian literary works of 15 authors. In addition, we extensively evaluate the implemented methods on the new dataset.

Ouamour, S., Sayoud, H.. 2022. Computational Identification of Author Style on Electronic Libraries - Case of Lexical Features. 2022 5th International Symposium on Informatics and its Applications (ISIA). :1–4.

In the present work, we intend to present a thorough study developed on a digital library, called HAT corpus, for a purpose of authorship attribution. Thus, a dataset of 300 documents that are written by 100 different authors, was extracted from the web digital library and processed for a task of author style analysis. All the documents are related to the travel topic and written in Arabic. Basically, three important rules in stylometry should be respected: the minimum document size, the same topic for all documents and the same genre too. In this work, we made a particular effort to respect those conditions seriously during the corpus preparation. That is, three lexical features: Fixed-length words, Rare words and Suffixes are used and evaluated by using a centroid based Manhattan distance. The used identification approach shows interesting results with an accuracy of about 0.94.

2022-08-26

Zhao, Yue, Shen, Yang, Qi, Yuanbo. 2021. A Security Analysis of Chinese Robot Supply Chain Based on Open-Source Intelligence. 2021 IEEE 1st International Conference on Digital Twins and Parallel Intelligence (DTPI). :219—222.

This paper argues that the security management of the robot supply chain would preferably focus on Sino-US relations and technical bottlenecks based on a comprehensive security analysis through open-source intelligence and data mining of associated discourses. Through the lens of the newsboy model and game theory, this study reconstructs the risk appraisal model of the robot supply chain and rebalances the process of the Sino-US competition game, leading to the prediction of China's strategic movements under the supply risks. Ultimately, this paper offers a threefold suggestion: increasing the overall revenue through cost control and scaled expansion, resilience enhancement and risk prevention, and outreach of a third party's cooperation for confrontation capabilities reinforcement.

2022-05-19

Fareed, Samsad Beagum Sheik. 2021. API Pipeline for Visualising Text Analytics Features of Twitter Texts. 2021 International Conference of Women in Data Science at Taif University (WiDSTaif ). :1–6.

Twitter text analysis is quite useful in analysing emotions, sentiments and feedbacks of consumers on products and services. This helps the service providers and the manufacturers to improve their products and services, address serious issues before they lead to a crisis and improve business acumen. Twitter texts also form a data source for various research studies. They are used in topic analysis, sentiment analysis, content analysis and thematic analysis. In this paper, we present a pipeline for searching, analysing and visualizing the text analytics features of twitter texts using web APIs. It allows to build a simple yet powerful twitter text analytics tool for researchers and other interested users.

Fuentalba, Diego, Durán, Claudia, Guillaume, Charles, Carrasco, Raúl, Gutierrez, Sebastián, Pinto, Oscar. 2021. Text Analytics Architecture in IoT Systems. 2021 Third South American Colloquium on Visible Light Communications (SACVLC). :01–06.

Management control and monitoring of production activities in intelligent environments in subway mines must be aligned with the strategies and objectives of each agent. It is required that in operations, the local structure of each service is fault-tolerant and that large amounts of data are transmitted online to executives to make effective and efficient decisions. The paper proposes an architecture that enables strategic text analysis on the Internet of Things devices through task partitioning with multiple agent systems and evaluates the feasibility of the design by building a prototype that improves communication. The results validate the system's design because Raspberry Pi can execute text mining algorithms and agents in about 3 seconds for 197 texts. This work emphasizes multiple agents for text analytics because the algorithms, along with the agents, use about 70% of a Raspberry Pi CPU.

2021-11-29

Somsakul, Supawit, Prom-on, Santitham. 2020. On the Network and Topological Analyses of Legal Documents Using Text Mining Approach. 2020 1st International Conference on Big Data Analytics and Practices (IBDAP). :1–6.

This paper presents a computational study of Thai legal documents using text mining and network analytic approach. Thai legal systems rely much on the existing judicial rulings. Thus, legal documents contain complex relationships and require careful examination. The objective of this study is to use text mining to model relationships between these legal documents and draw useful insights. A structure of document relationship was found as a result of the study in forms of a network that is related to the meaningful relations of legal documents. This can potentially be developed further into a document retrieval system based on how documents are related in the network.

2021-04-09

Mir, N., Khan, M. A. U.. 2020. Copyright Protection for Online Text Information : Using Watermarking and Cryptography. 2020 3rd International Conference on Computer Applications Information Security (ICCAIS). :1—4.

Information and security are interdependent elements. Information security has evolved to be a matter of global interest and to achieve this; it requires tools, policies and assurance of technologies against any relevant security risks. Internet influx while providing a flexible means of sharing the online information economically has rapidly attracted countless writers. Text being an important constituent of online information sharing, creates a huge demand of intellectual copyright protection of text and web itself. Various visible watermarking techniques have been studied for text documents but few for web-based text. In this paper, web page watermarking and cryptography for online content copyrights protection is proposed utilizing the semantic and syntactic rules using HTML (Hypertext Markup Language) and is tested for English and Arabic languages.

2021-03-18

Banday, M. T., Sheikh, S. A.. 2020. Improving Security Control of Text-Based CAPTCHA Challenges using Honeypot and Timestamping. 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC). :704—708.

The resistance to attacks aimed to break CAPTCHA challenges and the effectiveness, efficiency and satisfaction of human users in solving them called usability are the two major concerns while designing CAPTCHA schemes. User-friendliness, universality, and accessibility are related dimensions of usability, which must also be addressed adequately. With recent advances in segmentation and optical character recognition techniques, complex distortions, degradations and transformations are added to text-based CAPTCHA challenges resulting in their reduced usability. The extent of these deformations can be decreased if some additional security mechanism is incorporated in such challenges. This paper proposes an additional security mechanism that can add an extra layer of protection to any text-based CAPTCHA challenge, making it more challenging for bots and scripts that might be used to attack websites and web applications. It proposes the use of hidden text-boxes for user entry of CAPTCHA string which serves as honeypots for bots and automated scripts. The honeypot technique is used to trick bots and automated scripts into filling up input fields which legitimate human users cannot fill in. The paper reports implementation of honeypot technique and results of tests carried out over three months during which form submissions were logged for analysis. The results demonstrated great effectiveness of honeypots technique to improve security control and usability of text-based CAPTCHA challenges.

2021-03-04

Kalin, J., Ciolino, M., Noever, D., Dozier, G.. 2020. Black Box to White Box: Discover Model Characteristics Based on Strategic Probing. 2020 Third International Conference on Artificial Intelligence for Industries (AI4I). :60—63.

In Machine Learning, White Box Adversarial Attacks rely on knowing underlying knowledge about the model attributes. This works focuses on discovering to distrinct pieces of model information: the underlying architecture and primary training dataset. With the process in this paper, a structured set of input probes and the output of the model become the training data for a deep classifier. Two subdomains in Machine Learning are explored - image based classifiers and text transformers with GPT-2. With image classification, the focus is on exploring commonly deployed architectures and datasets available in popular public libraries. Using a single transformer architecture with multiple levels of parameters, text generation is explored by fine tuning off different datasets. Each dataset explored in image and text are distinguishable from one another. Diversity in text transformer outputs implies further research is needed to successfully classify architecture attribution in text domain.

2021-02-22

Si, Y., Zhou, W., Gai, J.. 2020. Research and Implementation of Data Extraction Method Based on NLP. 2020 IEEE 14th International Conference on Anti-counterfeiting, Security, and Identification (ASID). :11–15.

In order to accurately extract the data from unstructured Chinese text, this paper proposes a rule-based method based on natural language processing and regular expression. This method makes use of the language expression rules of the data in the text and other related knowledge to form the feature word lists and rule template to match the text. Experimental results show that the accuracy of the designed algorithm is 94.09%.

Martinelli, F., Marulli, F., Mercaldo, F., Marrone, S., Santone, A.. 2020. Enhanced Privacy and Data Protection using Natural Language Processing and Artificial Intelligence. 2020 International Joint Conference on Neural Networks (IJCNN). :1–8.

Artificial Intelligence systems have enabled significant benefits for users and society, but whilst the data for their feeding are always increasing, a side to privacy and security leaks is offered. The severe vulnerabilities to the right to privacy obliged governments to enact specific regulations to ensure privacy preservation in any kind of transaction involving sensitive information. In the case of digital and/or physical documents comprising sensitive information, the right to privacy can be preserved by data obfuscation procedures. The capability of recognizing sensitive information for obfuscation is typically entrusted to the experience of human experts, who are over-whelmed by the ever increasing amount of documents to process. Artificial intelligence could proficiently mitigate the effort of the human officers and speed up processes. Anyway, until enough knowledge won't be available in a machine readable format, automatic and effectively working systems can't be developed. In this work we propose a methodology for transferring and leveraging general knowledge across specific-domain tasks. We built, from scratch, specific-domain knowledge data sets, for training artificial intelligence models supporting human experts in privacy preserving tasks. We exploited a mixture of natural language processing techniques applied to unlabeled domain-specific documents corpora for automatically obtain labeled documents, where sensitive information are recognized and tagged. We performed preliminary tests just over 10.000 documents from the healthcare and justice domains. Human experts supported us during the validation. Results we obtained, estimated in terms of precision, recall and F1-score metrics across these two domains, were promising and encouraged us to further investigations.

2021-01-25

Abusukhon, A., AlZu’bi, S.. 2020. New Direction of Cryptography: A Review on Text-to-Image Encryption Algorithms Based on RGB Color Value. 2020 Seventh International Conference on Software Defined Systems (SDS). :235–239.

Data encryption techniques are important for answering the question: How secure is the Internet for sending sensitive data. Keeping data secure while they are sent through the global network is a difficult task. This is because many hackers are fishing these data in order to get some benefits. The researchers have developed various types of encryption algorithms to protect data from attackers. These algorithms are mainly classified into two categories namely symmetric and asymmetric encryption algorithms. This survey sheds light on the recent work carried out on encrypting a text into an image based on the RGB color value and held a comparison between them based on various factors evolved from the literature.

2020-12-14

Yu, L., Chen, L., Dong, J., Li, M., Liu, L., Zhao, B., Zhang, C.. 2020. Detecting Malicious Web Requests Using an Enhanced TextCNN. 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC). :768–777.

This paper proposes an approach that combines a deep learning-based method and a traditional machine learning-based method to efficiently detect malicious requests Web servers received. The first few layers of Convolutional Neural Network for Text Classification (TextCNN) are used to automatically extract powerful semantic features and in the meantime transferable statistical features are defined to boost the detection ability, specifically Web request parameter tampering. The semantic features from TextCNN and transferable statistical features from artificially-designing are grouped together to be fed into Support Vector Machine (SVM), replacing the last layer of TextCNN for classification. To facilitate the understanding of abstract features in form of numerical data in vectors extracted by TextCNN, this paper designs trace-back functions that map max-pooling outputs back to words in Web requests. After investigating the current available datasets for Web attack detection, HTTP Dataset CSIC 2010 is selected to test and verify the proposed approach. Compared with other deep learning models, the experimental results demonstrate that the approach proposed in this paper is competitive with the state-of-the-art.

2020-12-11

Palash, M. H., Das, P. P., Haque, S.. 2019. Sentimental Style Transfer in Text with Multigenerative Variational Auto-Encoder. 2019 International Conference on Bangla Speech and Language Processing (ICBSLP). :1—4.

Style transfer is an emerging trend in the fields of deep learning's applications, especially in images and audio data this is proven very useful and sometimes the results are astonishing. Gradually styles of textual data are also being changed in many novel works. This paper focuses on the transfer of the sentimental vibe of a sentence. Given a positive clause, the negative version of that clause or sentence is generated keeping the context same. The opposite is also done with negative sentences. Previously this was a very tough job because the go-to techniques for such tasks such as Recurrent Neural Networks (RNNs) [1] and Long Short-Term Memories(LSTMs) [2] can't perform well with it. But since newer technologies like Generative Adversarial Network(GAN) and Variational AutoEncoder(VAE) are emerging, this work seem to become more and more possible and effective. In this paper, Multi-Genarative Variational Auto-Encoder is employed to transfer sentiment values. Inspite of working with a small dataset, this model proves to be promising.

Dabas, K., Madaan, N., Arya, V., Mehta, S., Chakraborty, T., Singh, G.. 2019. Fair Transfer of Multiple Style Attributes in Text. 2019 Grace Hopper Celebration India (GHCI). :1—5.

To preserve anonymity and obfuscate their identity on online platforms users may morph their text and portray themselves as a different gender or demographic. Similarly, a chatbot may need to customize its communication style to improve engagement with its audience. This manner of changing the style of written text has gained significant attention in recent years. Yet these past research works largely cater to the transfer of single style attributes. The disadvantage of focusing on a single style alone is that this often results in target text where other existing style attributes behave unpredictably or are unfairly dominated by the new style. To counteract this behavior, it would be nice to have a style transfer mechanism that can transfer or control multiple styles simultaneously and fairly. Through such an approach, one could obtain obfuscated or written text incorporated with a desired degree of multiple soft styles such as female-quality, politeness, or formalness. To the best of our knowledge this work is the first that shows and attempt to solve the issues related to multiple style transfer. We also demonstrate that the transfer of multiple styles cannot be achieved by sequentially performing multiple single-style transfers. This is because each single style-transfer step often reverses or dominates over the style incorporated by a previous transfer step. We then propose a neural network architecture for fairly transferring multiple style attributes in a given text. We test our architecture on the Yelp dataset to demonstrate our superior performance as compared to existing one-style transfer steps performed in a sequence.

Phu, T. N., Hoang, L., Toan, N. N., Tho, N. Dai, Binh, N. N.. 2019. C500-CFG: A Novel Algorithm to Extract Control Flow-based Features for IoT Malware Detection. 2019 19th International Symposium on Communications and Information Technologies (ISCIT). :568—573.

{Static characteristic extraction method Control flow-based features proposed by Ding has the ability to detect malicious code with higher accuracy than traditional Text-based methods. However, this method resolved NP-hard problem in a graph, therefore it is not feasible with the large-size and high-complexity programs. So, we propose the C500-CFG algorithm in Control flow-based features based on the idea of dynamic programming, solving Ding's NP-hard problem in O(N2) time complexity, where N is the number of basic blocks in decom-piled executable codes. Our algorithm is more efficient and more outstanding in detecting malware than Ding's algorithm: fast processing time, allowing processing large files, using less memory and extracting more feature information. Applying our algorithms with IoT data sets gives outstanding results on 2 measures: Accuracy = 99.34%

Slawinski, M., Wortman, A.. 2019. Applications of Graph Integration to Function Comparison and Malware Classification. 2019 4th International Conference on System Reliability and Safety (ICSRS). :16—24.

We classify .NET files as either benign or malicious by examining directed graphs derived from the set of functions comprising the given file. Each graph is viewed probabilistically as a Markov chain where each node represents a code block of the corresponding function, and by computing the PageRank vector (Perron vector with transport), a probability measure can be defined over the nodes of the given graph. Each graph is vectorized by computing Lebesgue antiderivatives of hand-engineered functions defined on the vertex set of the given graph against the PageRank measure. Files are subsequently vectorized by aggregating the set of vectors corresponding to the set of graphs resulting from decompiling the given file. The result is a fast, intuitive, and easy-to-compute glass-box vectorization scheme, which can be leveraged for training a standalone classifier or to augment an existing feature space. We refer to this vectorization technique as PageRank Measure Integration Vectorization (PMIV). We demonstrate the efficacy of PMIV by training a vanilla random forest on 2.5 million samples of decompiled. NET, evenly split between benign and malicious, from our in-house corpus and compare this model to a baseline model which leverages a text-only feature space. The median time needed for decompilation and scoring was 24ms. 11Code available at https://github.com/gtownrocks/grafuple.

2020-11-20

Han, H., Wang, Q., Chen, C.. 2019. Policy Text Analysis Based on Text Mining and Fuzzy Cognitive Map. 2019 15th International Conference on Computational Intelligence and Security (CIS). :142—146.

With the introduction of computer methods, the amount of material and processing accuracy of policy text analysis have been greatly improved. In this paper, Text mining(TM) and latent semantic analysis(LSA) were used to collect policy documents and extract policy elements from them. Fuzzy association rule mining(FARM) technique and partial association test (PA) were used to discover the causal relationships and impact degrees between elements, and a fuzzy cognitive map (FCM) was developed to deduct the evolution of elements through a soft computing method. This non-interventionist approach avoids the validity defects caused by the subjective bias of researchers and provides policy makers with more objective policy suggestions from a neutral perspective. To illustrate the accuracy of this method, this study experimented by taking the state-owned capital layout adjustment related policies as an example, and proved that this method can effectively analyze policy text.

2020-11-04

Flores, P.. 2019. Digital Simulation in the Virtual World: Its Effect in the Knowledge and Attitude of Students Towards Cybersecurity. 2019 Sixth HCT Information Technology Trends (ITT). :1—5.

The search for alternative delivery modes to teaching has been one of the pressing concerns of numerous educational institutions. One key innovation to improve teaching and learning is e-learning which has undergone enormous improvements. From its focus on text-based environment, it has evolved into Virtual Learning Environments (VLEs) which provide more stimulating and immersive experiences among learners and educators. An example of VLEs is the virtual world which is an emerging educational platform among universities worldwide. One very interesting topic that can be taught using the virtual world is cybersecurity. Simulating cybersecurity in the virtual world may give a realistic experience to students which can be hardly achieved by classroom teaching. To date, there are quite a number of studies focused on cybersecurity awareness and cybersecurity behavior. But none has focused looking into the effect of digital simulation in the virtual world, as a new educational platform, in the cybersecurity attitude of the students. It is in this regard that this study has been conducted by designing simulation in the virtual world lessons that teaches the five aspects of cybersecurity namely; malware, phishing, social engineering, password usage and online scam, which are the most common cybersecurity issues. The study sought to examine the effect of this digital simulation design in the cybersecurity knowledge and attitude of the students. The result of the study ascertains that students exposed under simulation in the virtual world have a greater positive change in cybersecurity knowledge and attitude than their counterparts.

2020-11-02

Pan, C., Huang, J., Gong, J., Yuan, X.. 2019. Few-Shot Transfer Learning for Text Classification With Lightweight Word Embedding Based Models. IEEE Access. 7:53296–53304.

Many deep learning architectures have been employed to model the semantic compositionality for text sequences, requiring a huge amount of supervised data for parameters training, making it unfeasible in situations where numerous annotated samples are not available or even do not exist. Different from data-hungry deep models, lightweight word embedding-based models could represent text sequences in a plug-and-play way due to their parameter-free property. In this paper, a modified hierarchical pooling strategy over pre-trained word embeddings is proposed for text classification in a few-shot transfer learning way. The model leverages and transfers knowledge obtained from some source domains to recognize and classify the unseen text sequences with just a handful of support examples in the target problem domain. The extensive experiments on five datasets including both English and Chinese text demonstrate that the simple word embedding-based models (SWEMs) with parameter-free pooling operations are able to abstract and represent the semantic text. The proposed modified hierarchical pooling method exhibits significant classification performance in the few-shot transfer learning tasks compared with other alternative methods.

2020-10-12

Granatyr, Jones, Gomes, Heitor Murilo, Dias, João Miguel, Paiva, Ana Maria, Nunes, Maria Augusta Silveira Netto, Scalabrin, Edson Emílio, Spak, Fábio. 2019. Inferring Trust Using Personality Aspects Extracted from Texts. 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). :3840–3846.

Trust mechanisms are considered the logical protection of software systems, preventing malicious people from taking advantage or cheating others. Although these concepts are widely used, most applications in this field do not consider affective aspects to aid in trust computation. Researchers of Psychology, Neurology, Anthropology, and Computer Science argue that affective aspects are essential to human's decision-making processes. So far, there is a lack of understanding about how these aspects impact user's trust, particularly when they are inserted in an evaluation system. In this paper, we propose a trust model that accounts for personality using three personality models: Big Five, Needs, and Values. We tested our approach by extracting personality aspects from texts provided by two online human-fed evaluation systems and correlating them to reputation values. The empirical experiments show statistically significant better results in comparison to non-personality-wise approaches.

MacMahon, Silvana Togneri, Alfano, Marco, Lenzitti, Biagio, Bosco, Giosuè Lo, McCaffery, Fergal, Taibi, Davide, Helfert, Markus. 2019. Improving Communication in Risk Management of Health Information Technology Systems by means of Medical Text Simplification. 2019 IEEE Symposium on Computers and Communications (ISCC). :1135–1140.

Health Information Technology Systems (HITS) are increasingly used to improve the quality of patient care while reducing costs. These systems have been developed in response to the changing models of care to an ongoing relationship between patient and care team, supported by the use of technology due to the increased instance of chronic disease. However, the use of HITS may increase the risk to patient safety and security. While standards can be used to address and manage these risks, significant communication problems exist between experts working in different departments. These departments operate in silos often leading to communication breakdowns. For example, risk management stakeholders who are not clinicians may struggle to understand, define and manage risks associated with these systems when talking to medical professionals as they do not understand medical terminology or the associated care processes. In order to overcome this communication problem, we propose the use of the “Three Amigos” approach together with the use of the SIMPLE tool that has been developed to assist patients in understanding medical terms. This paper examines how the “Three Amigos” approach and the SIMPLE tool can be used to improve estimation of severity of risk by non-clinical risk management stakeholders and provides a practical example of their use in a ten step risk management process.

2020-10-05

Su, Jinsong, Zeng, Jiali, Xiong, Deyi, Liu, Yang, Wang, Mingxuan, Xie, Jun. 2018. A Hierarchy-to-Sequence Attentional Neural Machine Translation Model. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 26:623—632.

Although sequence-to-sequence attentional neural machine translation (NMT) has achieved great progress recently, it is confronted with two challenges: learning optimal model parameters for long parallel sentences and well exploiting different scopes of contexts. In this paper, partially inspired by the idea of segmenting a long sentence into short clauses, each of which can be easily translated by NMT, we propose a hierarchy-to-sequence attentional NMT model to handle these two challenges. Our encoder takes the segmented clause sequence as input and explores a hierarchical neural network structure to model words, clauses, and sentences at different levels, particularly with two layers of recurrent neural networks modeling semantic compositionality at the word and clause level. Correspondingly, the decoder sequentially translates segmented clauses and simultaneously applies two types of attention models to capture contexts of interclause and intraclause for translation prediction. In this way, we can not only improve parameter learning, but also well explore different scopes of contexts for translation. Experimental results on Chinese-English and English-German translation demonstrate the superiorities of the proposed model over the conventional NMT model.

Liu, Donglei, Niu, Zhendong, Zhang, Chunxia, Zhang, Jiadi. 2019. Multi-Scale Deformable CNN for Answer Selection. IEEE Access. 7:164986—164995.

The answer selection task is one of the most important issues within the automatic question answering system, and it aims to automatically find accurate answers to questions. Traditional methods for this task use manually generated features based on tf-idf and n-gram models to represent texts, and then select the right answers according to the similarity between the representations of questions and the candidate answers. Nowadays, many question answering systems adopt deep neural networks such as convolutional neural network (CNN) to generate the text features automatically, and obtained better performance than traditional methods. CNN can extract consecutive n-gram features with fixed length by sliding fixed-length convolutional kernels over the whole word sequence. However, due to the complex semantic compositionality of the natural language, there are many phrases with variable lengths and be composed of non-consecutive words in natural language, such as these phrases whose constituents are separated by other words within the same sentences. But the traditional CNN is unable to extract the variable length n-gram features and non-consecutive n-gram features. In this paper, we propose a multi-scale deformable convolutional neural network to capture the non-consecutive n-gram features by adding offset to the convolutional kernel, and also propose to stack multiple deformable convolutional layers to mine multi-scale n-gram features by the means of generating longer n-gram in higher layer. Furthermore, we apply the proposed model into the task of answer selection. Experimental results on public dataset demonstrate the effectiveness of our proposed model in answer selection.

2020-08-28

Perry, Lior, Shapira, Bracha, Puzis, Rami. 2019. NO-DOUBT: Attack Attribution Based On Threat Intelligence Reports. 2019 IEEE International Conference on Intelligence and Security Informatics (ISI). :80—85.

The task of attack attribution, i.e., identifying the entity responsible for an attack, is complicated and usually requires the involvement of an experienced security expert. Prior attempts to automate attack attribution apply various machine learning techniques on features extracted from the malware's code and behavior in order to identify other similar malware whose authors are known. However, the same malware can be reused by multiple actors, and the actor who performed an attack using a malware might differ from the malware's author. Moreover, information collected during an incident may contain many clues about the identity of the attacker in addition to the malware used. In this paper, we propose a method of attack attribution based on textual analysis of threat intelligence reports, using state of the art algorithms and models from the fields of machine learning and natural language processing (NLP). We have developed a new text representation algorithm which captures the context of the words and requires minimal feature engineering. Our approach relies on vector space representation of incident reports derived from a small collection of labeled reports and a large corpus of general security literature. Both datasets have been made available to the research community. Experimental results show that the proposed representation can attribute attacks more accurately than the baselines' representations. In addition, we show how the proposed approach can be used to identify novel previously unseen threat actors and identify similarities between known threat actors.