
Filters: Keyword is Uniform resource locators
A
Yadollahi, Mohammad Mehdi, Shoeleh, Farzaneh, Serkani, Elham, Madani, Afsaneh, Gharaee, Hossein.  2019.  An Adaptive Machine Learning Based Approach for Phishing Detection Using Hybrid Features. 2019 5th International Conference on Web Research (ICWR). :281—286.

Nowadays, phishing is one of the most common web threats, owing to the significant growth of the World Wide Web over time. Phishing attackers constantly use new (zero-day) and sophisticated techniques to deceive online customers. Hence, an anti-phishing system must be fast, operate in real time, and leverage an intelligent phishing detection solution. Here, we develop a reliable detection system that can adaptively match the changing environment and phishing websites. Our method is an online, feature-rich machine learning technique for discriminating between phishing and legitimate websites. Since the proposed approach extracts different types of discriminative features from URLs and webpage source code, it is an entirely client-side solution and does not require any third-party service. The experimental results highlight the robustness and competitiveness of our anti-phishing system in distinguishing phishing from legitimate websites.
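As a rough illustration of the kind of lexical URL features such client-side detectors rely on (the feature set, tokens, and example URL below are invented for illustration and are not the authors' actual ones):

```python
# Sketch of client-side lexical feature extraction from a URL, in the spirit
# of hybrid-feature phishing detection. Feature set is illustrative only.
from urllib.parse import urlparse

SUSPICIOUS_TOKENS = ("login", "verify", "secure", "account", "update")

def url_features(url: str) -> dict:
    """Extract simple lexical features often used in phishing detection."""
    parsed = urlparse(url)
    host = parsed.netloc
    return {
        "url_length": len(url),
        "num_dots": host.count("."),          # many subdomains is a phishing hint
        "has_at_symbol": "@" in url,          # '@' can hide the real destination
        "has_ip_host": host.replace(".", "").isdigit(),
        "num_hyphens": host.count("-"),
        "suspicious_words": sum(t in url.lower() for t in SUSPICIOUS_TOKENS),
        "uses_https": parsed.scheme == "https",
    }

feats = url_features("http://secure-login.example.com.attacker.net/verify?acct=1")
```

A classifier would be trained on vectors of such features; the tokens chosen here are placeholders.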

Johnson, R., Kiourtis, N., Stavrou, A., Sritapan, V..  2015.  Analysis of content copyright infringement in mobile application markets. 2015 APWG Symposium on Electronic Crime Research (eCrime). :1–10.

As mobile devices become bigger in terms of display and more reliable in delivering paid entertainment and video content, we also see a rise in mobile applications that attempt to profit by streaming pirated content to unsuspecting end-users. These applications are both paid and free, and in the case of free applications, the source of funding appears to be advertisements displayed while the content is streamed to the device. In this paper, we assess the extent of content copyright infringement for mobile markets that span multiple platforms (iOS, Android, and Windows Mobile) and cover both official and unofficial mobile markets located across the world. Using a set of search keywords that point to titles of paid streaming content, we discovered 8,592 Android, 5,550 iOS, and 3,910 Windows mobile applications that matched our search criteria. Of those applications, hundreds had links to either locally or remotely stored pirated content and were not developed, endorsed, or, in many cases, known to the owners of the copyrighted content. We also revealed the network locations of 856,717 Uniform Resource Locators (URLs) pointing to back-end servers and cyber-lockers used to deliver the pirated content to the mobile applications.

Gururaj, H. L., Soundarya, B. C., Janhavi, V., Lakshmi, H., Prassan Kumar, M. J..  2022.  Analysis of Cyber Security Attacks using Kali Linux. 2022 IEEE International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE). :1—6.
In the present situation, economic, industrial, cultural, social, and governmental activities are carried out online. Today's world is heavily dependent on wireless technology, and protecting this information from cyber-attacks is a hard problem. The purpose of cyber-attacks is to cause damage or steal credentials; in other cases, cyber-attacks may have military or political purposes. The damage includes computer viruses, data breaches, DDoS, and other attack vectors. To this end, various organizations use diverse solutions to prevent harm caused by cyber-attacks. Cyber security must follow real-time data on current IT developments. So far, numerous techniques have been proposed by researchers around the world to prevent cyber-attacks or lessen the damage they cause. The purpose of this study is to survey and comprehensively evaluate the standard advances in cyber security and to analyze the challenges, weaknesses, and strengths of the proposed techniques. Different types of attacks are considered in detail. In addition, evaluation of various cyber-attacks has been performed on the Kali Linux platform. It is expected that the comprehensive review provided in this study will be useful for students, teachers, IT professionals, and cyber security researchers.
Anagandula, K., Zavarsky, P..  2020.  An Analysis of Effectiveness of Black-Box Web Application Scanners in Detection of Stored SQL Injection and Stored XSS Vulnerabilities. 2020 3rd International Conference on Data Intelligence and Security (ICDIS). :40—48.

Black-box web application scanners are used to detect vulnerabilities in a web application without any knowledge of its source code. Recent research has shown their poor performance in detecting stored Cross-Site Scripting (XSS) and stored SQL Injection (SQLI). The detection efficiency of four black-box scanners on two testbeds, WackoPicko and the custom testbed Scanit (obtained from [5]), is analyzed in this paper. The analysis showed that the scanners need to be improved for better detection of multi-step stored XSS and stored SQLI. This study involves the interaction between the selected scanners and the web application to measure their efficiency in inserting proper attack vectors into appropriate fields. The results of this paper indicate that there is not much difference in performance between the open-source and commercial black-box scanners used in this research; however, the choice may depend on the policies and trust requirements of the companies using them. Some possible recommendations are provided to improve the detection rate of stored SQLI and stored XSS vulnerabilities. The study concludes that the state of the art of automated black-box web application scanners in 2020 needs to be improved to detect stored XSS and stored SQLI more effectively.

Wibawa, Dikka Aditya Satria, Setiawan, Hermawan, Girinoto.  2022.  Anti-Phishing Game Framework Based on Extended Design Play Experience (DPE) Framework as an Educational Media. 2022 7th International Workshop on Big Data and Information Security (IWBIS). :107–112.
The main objective of this research is to increase security awareness of phishing attacks in the education sector by teaching users about phishing URLs. The educational media was designed based on references from several previous studies. The anti-phishing game framework was developed as educational media using the extended DPE framework. Participants in this study were vocational and college students in the technology field, with 30 respondents from each group. To assess their level of awareness and understanding of phishing, especially phishing URLs, participants were given a pre-test before playing the game and a post-test after completing it. A paired t-test was used to test the research hypothesis. The results of the data analysis show differences in respondents' ability to identify URL phishing before and after using the anti-phishing game framework as educational media for increasing security awareness of URL phishing attacks. More serious game development can be carried out in the future to increase user awareness, particularly of phishing and other security issues, and can be extended to general users who do not have a background in technology.
Yazhmozhi, V. M., Janet, B., Reddy, Srinivasulu.  2020.  Anti-phishing System using LSTM and CNN. 2020 IEEE International Conference for Innovation in Technology (INOCON). :1—5.
Users prefer to do e-banking and e-shopping nowadays because of the exponential growth of the internet. Because of this paradigm shift, hackers are finding umpteen ways to steal personal information and critical details, such as debit and credit card details, by disguising themselves as reputed websites, just by changing the spelling or making minor modifications to the URL. Identifying whether a URL is benign or malicious is a challenging job, because phishing exploits the weaknesses of the user. While several works have been carried out to detect phishing websites, they use only heuristic methods and list-based techniques and therefore cannot prevent phishing effectively. In this paper, an anti-phishing system is proposed to protect users. It uses an ensemble model combining LSTM and CNN with a massive, balanced dataset containing nearly 200,000 URLs. After analyzing the accuracy of different existing approaches, it was found that the ensemble model combining LSTM and CNN performed better, with an accuracy of 96% and a precision of 97%, which is far better than the existing solutions.
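For context, character-level URL encoding of the kind typically fed to LSTM/CNN models can be sketched as follows (the alphabet and sequence length are our assumptions, not the paper's):

```python
# Illustrative sketch (not the paper's exact pipeline): character-level
# encoding of URLs into fixed-length integer sequences, the usual input
# representation for LSTM/CNN URL classifiers.
import string

ALPHABET = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(ALPHABET)}  # 0 is reserved for padding

def encode_url(url: str, max_len: int = 80) -> list[int]:
    """Map each character to an integer id; truncate and zero-pad to max_len."""
    ids = [CHAR_TO_ID.get(c, 0) for c in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))

seq = encode_url("http://bit.ly/x")
```

The resulting sequences would be fed to an embedding layer ahead of the LSTM/CNN stack.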
Mhana, Samer Attallah, Din, Jamilah Binti, Atan, Rodziah Binti.  2016.  Automatic generation of Content Security Policy to mitigate cross site scripting. 2016 2nd International Conference on Science in Information Technology (ICSITech). :324–328.

Content Security Policy (CSP) is a powerful client-side security layer that helps in mitigating and detecting a wide range of Web attacks, including cross-site scripting (XSS). However, deploying CSP is a fallible process for site administrators and may require significant changes in web application code. In this paper, we propose an approach to help site administrators overcome these limitations in order to realize the full benefits of the CSP mechanism, leading to sites that are more immune to XSS. The algorithm is implemented as a plugin. It does not interfere with the Web application's original code, and the plugin can be “installed” on any other web application with minimal effort. The algorithm can be implemented as part of the Web server layer rather than the business logic layer. It can be extended to support generating CSP for content that is modified by JavaScript after loading. The current approach inspects the static contents of URLs.
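The core idea of deriving a CSP from a page's static markup can be sketched as follows (a hedged, minimal illustration; a real generator like the paper's plugin must handle many more cases, such as inline scripts and dynamically loaded content):

```python
# Minimal sketch: collect external script/stylesheet origins from static
# HTML and emit a matching Content-Security-Policy value.
from html.parser import HTMLParser
from urllib.parse import urlparse

def origin(url):
    """Reduce a URL to scheme://host, or 'self' for relative paths."""
    p = urlparse(url)
    return f"{p.scheme}://{p.netloc}" if p.netloc else "'self'"

class SourceCollector(HTMLParser):
    """Collect external script and stylesheet origins from static markup."""
    def __init__(self):
        super().__init__()
        self.scripts, self.styles = set(), set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and "src" in attrs:
            self.scripts.add(origin(attrs["src"]))
        elif tag == "link" and attrs.get("rel") == "stylesheet" and "href" in attrs:
            self.styles.add(origin(attrs["href"]))

def build_csp(html: str) -> str:
    c = SourceCollector()
    c.feed(html)
    directive = lambda name, srcs: f"{name} " + " ".join(sorted({"'self'"} | srcs))
    return directive("script-src", c.scripts) + "; " + directive("style-src", c.styles)

page = ('<script src="https://cdn.example.com/a.js"></script>'
        '<link rel="stylesheet" href="/site.css">')
policy = build_csp(page)
```

The generated value would be served in a `Content-Security-Policy` response header at the web server layer.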

B
Hammoud, O. R., Tarkhanov, I. A..  2020.  Blockchain-based open infrastructure for URL filtering in an Internet browser. 2020 IEEE 14th International Conference on Application of Information and Communication Technologies (AICT). :1—4.
This research is dedicated to the development of a prototype of an open infrastructure for filtering users’ internet traffic at the browser level. We describe the advantages of a distributed approach in comparison with current centralized solutions. In addition, we suggest a solution to define the optimal size of a URL storage block in the Ethereum network. This solution may be used for the development of infrastructure for DApps on the Ethereum network in the future. The efficiency of the suggested approach is supported by several experiments.
Fargose, Rehan, Gaonkar, Samarth, Jadhav, Paras, Jadiya, Harshit, Lopes, Minal.  2022.  Browser Extension For A Safe Browsing Experience. 2022 International Conference on Computing, Communication, Security and Intelligent Systems (IC3SIS). :1–6.
Due to the rise of the internet, a business model known as online advertising has seen unprecedented success. However, it has also become a prime method through which criminals can scam people. Often, even legitimate websites contain advertisements that are linked to scam websites, since they are not verified by the website’s owners. Scammers have become quite creative with their attacks, using various unorthodox and inconspicuous methods such as iframes, favicons, proxy servers, and domains. Many modern anti-virus products are paid services and hence not a feasible option for most users in developing countries, and people often do not possess devices with enough RAM to run such software efficiently, leaving them without any options. This project aims to create a browser extension that can distinguish between safe and unsafe websites by utilizing machine learning algorithms. The system is lightweight and free, fulfilling the needs of most people looking for a cheap and reliable security solution and allowing people to surf the internet easily and safely. The system scans all intermittent URL clicks as well, not just the main website, thus providing an even greater degree of security.
C
Paschalides, Demetris, Christodoulou, Chrysovalantis, Andreou, Rafael, Pallis, George, Dikaiakos, Marios D., Kornilakis, Alexandros, Markatos, Evangelos.  2019.  Check-It: A plugin for Detecting and Reducing the Spread of Fake News and Misinformation on the Web. 2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI). :298–302.
Over the past few years, we have been witnessing the rise of misinformation on the Internet. People continuously fall victim to fake news and contribute to its propagation, knowingly or inadvertently. Many recent efforts seek to reduce the damage caused by fake news by identifying it automatically with artificial intelligence techniques, using signals from domain flag-lists, online social networks, etc. In this work, we present Check-It, a system that combines a variety of signals into a pipeline for fake news identification. Check-It is developed as a web browser plugin with the objective of efficient and timely fake news detection, while respecting user privacy. In this paper, we present the design, implementation, and performance evaluation of Check-It. Experimental results show that it outperforms state-of-the-art methods on commonly-used datasets.
Korolev, D., Frolov, A., Babalova, I..  2020.  Classification of Websites Based on the Content and Features of Sites in Onion Space. 2020 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus). :1680—1683.
This paper describes a method for classifying onion sites. Based on the results of the research, a model of the most widespread type of site in onion space is built. To create such a model, a specially trained neural network is used. The classification is based on five different features: use of an authentication system, corporate email, readable URL, feedback, and type of onion site. Statistics on the most widespread types of websites on the Dark Net are given.
Akaishi, Sota, Uda, Ryuya.  2019.  Classification of XSS Attacks by Machine Learning with Frequency of Appearance and Co-occurrence. 2019 53rd Annual Conference on Information Sciences and Systems (CISS). :1–6.
A cross-site scripting (XSS) attack is one of the attacks on the web. It brings session hijacking with HTTP cookies, information collection with fake HTML input forms, and phishing with dummy sites. As a countermeasure against XSS attacks, machine learning has attracted a lot of attention. There is existing research in which SVM, Random Forest, and SCW are used for detection of the attack. However, in that research, there are problems: the datasets are too small or unbalanced, and the preprocessing method for vectorization of strings causes misclassification. The highest classification accuracy in existing research was 98%. Therefore, in this paper, we improved the preprocessing method for vectorization by using word2vec to capture the frequency of appearance and co-occurrence of words in XSS attack scripts. Moreover, we also used a large dataset to decrease the deviation of the data. Furthermore, we evaluated the classification results with two procedures. One is an inappropriate procedure that some researchers tend to select by mistake. The other is an appropriate procedure that can be applied to an attack detection filter in a real environment.
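The counting step that underlies such frequency/co-occurrence preprocessing can be sketched as follows (the paper trains word2vec on real attack scripts; the tokenizer, window size, and sample scripts here are illustrative assumptions):

```python
# Minimal sketch: tokenize scripts and count word frequency and within-window
# co-occurrence, the statistics that word2vec-style embeddings build on.
import re
from collections import Counter

def tokenize(script: str) -> list[str]:
    return re.findall(r"[a-z_]+", script.lower())

def cooccurrence(scripts, window=3):
    freq, pairs = Counter(), Counter()
    for s in scripts:
        toks = tokenize(s)
        freq.update(toks)
        for i in range(len(toks)):
            # Count unordered pairs within a small sliding window.
            for j in range(i + 1, min(i + window, len(toks))):
                pairs[frozenset((toks[i], toks[j]))] += 1
    return freq, pairs

freq, pairs = cooccurrence(['<script>alert(document.cookie)</script>',
                            '<img onerror="alert(1)">'])
```

Words that frequently co-occur (e.g. `alert` with event-handler names) become discriminative signals for the classifier.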
Islam, M., Rahaman, S., Meng, N., Hassanshahi, B., Krishnan, P., Yao, D. D..  2020.  Coding Practices and Recommendations of Spring Security for Enterprise Applications. 2020 IEEE Secure Development (SecDev). :49—57.
Spring security is tremendously popular among practitioners for its ease of use to secure enterprise applications. In this paper, we study the application framework misconfiguration vulnerabilities in the light of Spring security, which is relatively understudied in the existing literature. Towards that goal, we identify 6 types of security anti-patterns and 4 insecure vulnerable defaults by conducting a measurement-based approach on 28 Spring applications. Our analysis shows that security risks associated with the identified security anti-patterns and insecure defaults can leave the enterprise application vulnerable to a wide range of high-risk attacks. To prevent these high-risk attacks, we also provide recommendations for practitioners. Consequently, our study has contributed one update to the official Spring security documentation while other security issues identified in this study are being considered for future major releases by Spring security community.
Philomina, Josna, Fahim Fathima, K A, Gayathri, S, Elias, Glory Elizabeth, Menon, Abhinaya A.  2022.  A comparative study of machine learning models for the detection of Phishing Websites. 2022 International Conference on Computing, Communication, Security and Intelligent Systems (IC3SIS). :1–7.
Global cybersecurity threats have grown as a result of the evolving digital transformation, and cybercriminals have more opportunities as a result of digitization. Initially, cyberthreats take the form of phishing in order to gain confidential user credentials. As cyber-attacks get more sophisticated, the cybersecurity industry is faced with the problem of utilising cutting-edge technology and techniques to combat ever-present hostile threats. Hackers use phishing to persuade customers to grant them access to a company’s digital assets and networks. As technology progressed, phishing attempts became more sophisticated, necessitating the development of tools to detect phishing. Machine learning is one of the most powerful weapons in the fight against these threats. The features used for phishing detection, as well as the approaches employed with machine learning, are discussed in this study. In this light, the study’s major goal is to propose a unique, robust ensemble machine learning model architecture that gives the highest prediction accuracy with the lowest error rate, while also recommending a few alternative robust machine learning models. Finally, the Random Forest algorithm attained a maximum accuracy of 96.454 percent, but by implementing a hybrid model including three classifiers (Decision Tree, Random Forest, and Gradient Boosting), the accuracy increases to 98.4 percent.
Ambedkar, M. Dayal, Ambedkar, N. S., Raw, R. S..  2016.  A comprehensive inspection of cross site scripting attack. 2016 International Conference on Computing, Communication and Automation (ICCCA). :497–502.
A Cross-Site Scripting (XSS) attack is a computer security threat that allows the attacker to gain access to sensitive information when JavaScript, VBScript, ActiveX, Flash, or HTML embedded in a malicious XSS link gets executed. In this paper, we discuss the various impacts of XSS and the types of XSS, check whether a site is vulnerable to XSS, discuss various tools for examining XSS vulnerability, and summarize preventive measures against XSS.
D
Shirsat, S. D..  2018.  Demonstrating Different Phishing Attacks Using Fuzzy Logic. 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT). :57-61.

Phishing has increased tremendously over the last few years and has become a serious threat to global security and economy. Existing literature dealing with the problem of phishing is scarce. Phishing is a deception technique that uses a combination of technology and social engineering to acquire sensitive information such as online banking passwords and credit card or bank account details [2]. Phishing can be done through emails and websites to collect confidential information. Phishers design fraudulent websites that look similar to legitimate websites and lure the user into visiting the malicious website. Therefore, users must be aware of malicious websites to protect their sensitive data [1]. But it is very difficult to distinguish between a legitimate and a fake website, especially for non-technical users [4]. Moreover, phishing sites are growing rapidly. The aim of this paper is to demonstrate phishing detection using fuzzy logic and to interpret the results using different defuzzification methods.
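A minimal sketch of fuzzy scoring with centroid-style defuzzification, one family of defuzzification methods such systems compare (the membership functions and output points below are invented for illustration, not taken from the paper):

```python
# Toy fuzzy-logic sketch: triangular membership functions over a crisp
# "suspicion" input, combined by weighted-centroid defuzzification.
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def phishing_risk(score):
    """Map a crisp 0-100 suspicion score to a defuzzified risk value."""
    low = tri(score, -1, 0, 50)
    medium = tri(score, 25, 50, 75)
    high = tri(score, 50, 100, 101)
    # Centroid defuzzification over representative output points 20/50/80.
    num = low * 20 + medium * 50 + high * 80
    den = low + medium + high
    return num / den if den else 0.0

risk = phishing_risk(70)
```

Other defuzzification methods (mean of maxima, bisector) would replace only the last step.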

Chen, Zhongyong, Han, Liegang, Xu, Yongshun, Yu, Zuwei.  2021.  Design and Implementation of A Vulnerability-Tolerant Reverse Proxy Based on Moving Target Defense for E-Government Application. 2021 2nd Information Communication Technologies Conference (ICTC). :270—273.
The digital transformation is injecting energy into economic growth and governance improvement for the Chinese government. Digital governance and e-government services are playing an increasingly important role in public management and social governance. Meanwhile, cyber-attacks and threats have become major challenges for e-government application systems. In this paper, we propose a novel dynamic access entry scheme for web applications, which provides a rapidly-changing, defender-controlled attack surface based on Moving Target Defense (MTD) technology. The scheme turns the static keywords of a Uniform Resource Locator (URL) into dynamic and random ones, which significantly increases the cost of an adversary's attack. We present a prototype of the proposed scheme and evaluate its feasibility and effectiveness. The experimental results demonstrate that the scheme is practical and effective.
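The URL-randomization idea can be illustrated with a time-rotating HMAC token (a hedged sketch; the key handling, epoch scheme, and token length are our assumptions, not the paper's design):

```python
# Sketch of an MTD-style dynamic URL entry: derive a random-looking token
# from a server-side secret and a time epoch, so the entry URL changes each
# epoch and previously harvested URLs stop working.
import hashlib
import hmac

def entry_token(secret: bytes, path: str, epoch: int) -> str:
    """Random-looking URL keyword valid only for one time epoch."""
    msg = f"{path}:{epoch}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()[:16]

def dynamic_url(secret: bytes, path: str, epoch: int) -> str:
    return f"/{entry_token(secret, path, epoch)}/{path.lstrip('/')}"

key = b"server-side-secret"
u1 = dynamic_url(key, "/login", epoch=1000)
u2 = dynamic_url(key, "/login", epoch=1001)
```

A reverse proxy holding the key can regenerate the token for the current epoch and route valid requests to the static backend path.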
Fujii, Shota, Kawaguchi, Nobutaka, Kojima, Shoya, Suzuki, Tomoya, Yamauchi, Toshihiro.  2022.  Design and Implementation of System for URL Signature Construction and Impact Assessment. 2022 12th International Congress on Advanced Applied Informatics (IIAI-AAI). :95–100.
The attacker’s server plays an important role in sending attack orders and receiving stolen information, particularly in the more recent cyberattacks. Under these circumstances, it is important to use network-based signatures to block malicious communications in order to reduce the damage. However, in addition to blocking malicious communications, signatures are also required not to block benign communications during normal business operations. Therefore, the generation of signatures requires a high level of understanding of the business, and highly depends on individual skills. In addition, in actual operation, it is necessary to test whether the generated signatures do not interfere with benign communications, which results in high operational costs. In this paper, we propose SIGMA, a system that automatically generates signatures to block malicious communication without interfering with benign communication and then automatically evaluates the impact of the signatures. SIGMA automatically extracts the common parts of malware communication destinations by clustering them and generates multiple candidate signatures. After that, SIGMA automatically calculates the impact on normal communication based on business logs, etc., and presents the final signature to the analyst, which has the highest blockability of malicious communication and non-blockability of normal communication. Our objectives with this system are to reduce the human factor in generating the signatures, reduce the cost of the impact evaluation, and support the decision of whether to apply the signatures. In the preliminary evaluation, we showed that SIGMA can automatically generate a set of signatures that detect 100% of suspicious URLs with an over-detection rate of just 0.87%, using the results of 14,238 malware analyses and actual business logs. 
This result suggests that the cost for generation of signatures and the evaluation of their impact on business operations can be suppressed, which used to be a time-consuming and human-intensive process.
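SIGMA's central trade-off, generating signatures that match malicious URLs while minimizing over-detection on benign traffic, can be caricatured in a few lines (all URLs and the prefix-based signature rule below are illustrative; the actual system clusters destinations and scores candidates against business logs):

```python
# Simplified sketch of signature generation plus impact assessment:
# derive a common substring from a cluster of malicious URLs, then measure
# how much benign traffic the signature would also block.
from os.path import commonprefix  # works character-wise on strings

def make_signature(malicious_urls):
    """Use the longest common prefix of a cluster as a candidate signature."""
    return commonprefix(malicious_urls)

def over_detection_rate(signature, benign_urls):
    """Fraction of benign URLs the signature would wrongly block."""
    hits = sum(signature in u for u in benign_urls)
    return hits / len(benign_urls)

cluster = ["http://evil.example/gate.php?id=1",
           "http://evil.example/gate.php?id=7"]
sig = make_signature(cluster)
fp = over_detection_rate(sig, ["http://intranet.example/report",
                               "http://evil.example/gate.php?x"])
```

A signature would be presented to the analyst only when its malicious-hit rate is high and its benign-hit rate is acceptably low.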
Shah, Rajeev Kumar, Hasan, Mohammad Kamrul, Islam, Shayla, Khan, Asif, Ghazal, Taher M., Khan, Ahmad Neyaz.  2022.  Detect Phishing Website by Fuzzy Multi-Criteria Decision Making. 2022 1st International Conference on AI in Cybersecurity (ICAIC). :1–8.
Phishing activity is undertaken by hackers to compromise computer networks and financial systems. A compromised computer system or network provides data and/or processing resources to the world of cybercrime. Cybercrime was projected to cost the world $6 trillion by 2021; in this context, phishing is expected to continue being a growing challenge. Statistics on phishing growth over the last decade support this theory, as phishing numbers have enjoyed almost exponential growth over the period. Recent reports on the complexity of phishing show that the fight against phishing URLs as a means of building more resilient cyberspace is an evolving challenge. Compounding the problem is the lack of cyber security expertise to handle the expected rise in incidents. Previous research has proposed different methods, including neural networks, data mining techniques, heuristic-based phishing detection, and machine learning, to detect phishing websites. However, phishers have recently started to use more sophisticated techniques to attack internet users, such as VoIP phishing and spear phishing. For these modern methods, the traditional ways of phishing detection provide low accuracy. Hence, the requirement arises for the application and development of modern tools and techniques to use as a countermeasure against such phishing attacks. Keeping in view the nature of recent phishing attacks, it is imperative to develop a state-of-the-art anti-phishing tool which should be able to predict phishing attacks before the occurrence of actual phishing incidents. We have designed such a tool that works efficiently to detect phishing websites, so that a user can easily understand the risk to his personal and financial data.
Pan, J., Mao, X..  2017.  Detecting DOM-Sourced Cross-Site Scripting in Browser Extensions. 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME). :24–34.

In recent years, with advances in JavaScript engines and the adoption of HTML5 APIs, web applications have begun to shift their functionality from the server side towards the client side, resulting in dense and complex interactions with HTML documents using the Document Object Model (DOM). As a consequence, client-side vulnerabilities have become more and more prevalent. In this paper, we focus on DOM-sourced cross-site scripting (XSS), a severe but not well-studied vulnerability appearing in browser extensions. Compared with conventional DOM-based XSS, a new attack surface is introduced by DOM-sourced XSS, where the DOM can become a vulnerable source in addition to common sources such as URLs and form inputs. To discover such vulnerabilities, we propose a detection framework employing hybrid analysis with two phases. The first phase is lightweight static analysis consisting of a text filter and an abstract syntax tree parser, which produces potentially vulnerable candidates. The second phase is dynamic symbolic execution with an additional component named the shadow DOM, generating a document as a proof-of-concept exploit. In our large-scale real-world experiment, 58 previously unknown DOM-sourced XSS vulnerabilities were discovered in user scripts of the popular browser extension Greasemonkey.

Chen, Quan, Snyder, Peter, Livshits, Ben, Kapravelos, Alexandros.  2021.  Detecting Filter List Evasion with Event-Loop-Turn Granularity JavaScript Signatures. 2021 IEEE Symposium on Security and Privacy (SP). :1715–1729.

Content blocking is an important part of a performant, user-serving, privacy-respecting web. Current content blockers work by building trust labels over URLs. While useful, this approach has many well-understood shortcomings. Attackers may avoid detection by changing URLs or domains, bundling unwanted code with benign code, or inlining code in pages. The common flaw in existing approaches is that they evaluate code based on its delivery mechanism, not its behavior. In this work we address this problem by building a system for generating signatures of the privacy-and-security relevant behavior of executed JavaScript. Our system uses as the unit of analysis each script's behavior during each turn on the JavaScript event loop. Focusing on event loop turns allows us to build highly identifying signatures for JavaScript code that are robust against code obfuscation, code bundling, URL modification, and other common evasions, as well as handle unique aspects of web applications. This work makes the following contributions to the problem of measuring and improving content blocking on the web: First, we design and implement a novel system to build per-event-loop-turn signatures of JavaScript behavior through deep instrumentation of the Blink and V8 runtimes. Second, we apply these signatures to measure how much privacy-and-security harming code is missed by current content blockers, by using EasyList and EasyPrivacy as ground truth and finding scripts that have the same privacy- and security-harming patterns. We build 1,995,444 signatures of privacy-and-security relevant behaviors from 11,212 unique scripts blocked by filter lists, and find 3,589 unique scripts hosting known harmful code, but missed by filter lists, affecting 12.48% of websites measured. Third, we provide a taxonomy of ways scripts avoid detection and quantify the occurrence of each. Finally, we present defenses against these evasions, in the form of filter list additions where possible, and through a proposed, signature-based system in other cases. As part of this work, we share the implementation of our signature-generation system, the data gathered by applying that system to the Alexa 100K, and 586 AdBlock Plus compatible filter list rules to block instances of currently blocked code being moved to new URLs.

Yu, L., Chen, L., Dong, J., Li, M., Liu, L., Zhao, B., Zhang, C..  2020.  Detecting Malicious Web Requests Using an Enhanced TextCNN. 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC). :768–777.
This paper proposes an approach that combines a deep learning-based method and a traditional machine learning-based method to efficiently detect malicious requests received by Web servers. The first few layers of a Convolutional Neural Network for Text Classification (TextCNN) are used to automatically extract powerful semantic features, while transferable statistical features are defined to boost detection ability, specifically for Web request parameter tampering. The semantic features from TextCNN and the hand-designed transferable statistical features are grouped together and fed into a Support Vector Machine (SVM), replacing the last layer of TextCNN for classification. To facilitate understanding of the abstract numerical features extracted by TextCNN, this paper designs trace-back functions that map max-pooling outputs back to words in the Web requests. After investigating the currently available datasets for Web attack detection, HTTP Dataset CSIC 2010 is selected to test and verify the proposed approach. Compared with other deep learning models, the experimental results demonstrate that the approach proposed in this paper is competitive with the state-of-the-art.
Acharya, Jatin, Chuadhary, Anshul, Chhabria, Anish, Jangale, Smita.  2021.  Detecting Malware, Malicious URLs and Virus Using Machine Learning and Signature Matching. 2021 2nd International Conference for Emerging Technology (INCET). :1–5.
Nowadays most of our data is stored on electronic devices, and the risk of a device getting infected by viruses, malware, worms, Trojans, ransomware, or any unwanted invader has increased greatly, mainly because of easy access to the internet. Viruses and malware have evolved over time, so identification of these files has become difficult. A device can be attacked not only by viruses and malware but also through a click on a forged URL. Our proposed solution to this problem uses machine learning techniques and signature matching techniques. The main aim of our solution is to identify malicious programs/URLs and act upon them. The core idea in identifying malware is selecting the key features from Portable Executable file headers; using these features we trained a random forest model. This RF model is used to scan a file and determine whether that file is malicious or not. For identification of viruses, we use the signature matching technique, which matches the MD5 hash of the file against a virus signature database containing the MD5 hashes of identified viruses and their families. To distinguish between benign and illegitimate URLs, a logistic regression model is used. The regression model uses a tokenizer for feature extraction from the URL to be classified. The tokenizer separates all the domains and sub-domains and splits the URLs on every '/'. Then a TfidfVectorizer (Term Frequency - Inverse Document Frequency) is used to convert the text into weighted values, which are used to predict whether the URL is safe to visit. On integration of all three modules, the final application provides full system protection against malicious software.
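The two lookup-style components described above, MD5 signature matching and URL tokenization, can be sketched as follows (the hash database, split characters, and sample inputs are illustrative assumptions, not the paper's data):

```python
# Sketch of MD5 signature matching against a known-virus table, plus the
# kind of URL tokenizer that feeds a URL classifier. VIRUS_DB is fake.
import hashlib
import re

VIRUS_DB = {hashlib.md5(b"malicious payload").hexdigest(): "Trojan.Fake.A"}

def scan_bytes(data: bytes):
    """Return the matched virus family, or None if the hash is unknown."""
    return VIRUS_DB.get(hashlib.md5(data).hexdigest())

def tokenize_url(url: str) -> list[str]:
    """Split a URL into domain/path tokens on ':', '/', '.', and '-'."""
    return [t for t in re.split(r"[:/.\-]", url) if t]

hit = scan_bytes(b"malicious payload")
miss = scan_bytes(b"innocent file")
tokens = tokenize_url("http://safe-site.example.com/downloads/setup.exe")
```

In the full pipeline these tokens would be weighted by a TF-IDF vectorizer before classification.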
Buber, E., Dırı, B., Sahingoz, O. K..  2017.  Detecting phishing attacks from URL by using NLP techniques. 2017 International Conference on Computer Science and Engineering (UBMK). :337–342.

Nowadays, cyber attacks affect many institutions and individuals and result in serious financial losses for them. The phishing attack is one of the most common types of cyber attacks, aimed at exploiting people's weaknesses to obtain confidential information about them. This type of cyber attack threatens almost all internet users and institutions. To reduce the financial losses caused by this type of attack, there is a need for awareness among users as well as applications with the ability to detect such attacks. In the last quarter of 2016, Turkey appears to be second behind China, with an impact rate of approximately 43%, in the Phishing Attack Analysis report covering 45 countries. In this study, firstly, the characteristics of this type of attack are explained, and then a machine learning based system is proposed to detect them. In the proposed system, some features were extracted by using Natural Language Processing (NLP) techniques. The system was implemented by examining the URLs used in phishing attacks before opening them, using the extracted features. Many tests have been applied to the created system, and the best algorithm among those tested is the Random Forest algorithm, with a success rate of 89.9%.

Peng, Tianrui, Harris, Ian, Sawa, Yuki.  2018.  Detecting Phishing Attacks Using Natural Language Processing and Machine Learning. 2018 IEEE 12th International Conference on Semantic Computing (ICSC). :300–301.
Phishing attacks are one of the most common and least defended security threats today. We present an approach which uses natural language processing techniques to analyze text and detect inappropriate statements which are indicative of phishing attacks. Our approach is novel compared to previous work because it focuses on the natural language text contained in the attack, performing semantic analysis of the text to detect malicious intent. To demonstrate the effectiveness of our approach, we have evaluated it using a large benchmark set of phishing emails.