Biblio

Found 871 results

Filters: Keyword is feature extraction
2022-10-12
Kumar, Yogendra, Subba, Basant.  2021.  A lightweight machine learning based security framework for detecting phishing attacks. 2021 International Conference on COMmunication Systems & NETworkS (COMSNETS). :184–188.
A successful phishing attack is a prelude to various other severe attacks such as login credentials theft, unauthorized access to user’s confidential data, malware and ransomware infestation of victim’s machine etc. This paper proposes a real time lightweight machine learning based security framework for detection of phishing attacks through analysis of Uniform Resource Locators (URLs). The proposed framework initially extracts a set of highly discriminating and uncorrelated features from the URL string corpus. These extracted features are then used to transform the URL strings into their corresponding numeric feature vectors, which are eventually used to train various machine learning based classifier models for identification of malicious phishing URLs. Performance analysis of the proposed security framework on two well known datasets, the Kaggle dataset [1] and the UNB dataset [2], shows that it is capable of detecting malicious phishing URLs with high precision, while at the same time maintaining a very low false positive rate. The proposed framework is also shown to outperform other similar security frameworks proposed in the literature. [1] https://www.kaggle.com/antonyj453/ur1dataset [2] https://www.unb.ca/cic/datasets/ur1-2016.htm1
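The step of turning URL strings into numeric feature vectors can be illustrated with a minimal sketch. The abstract does not list the features the authors use, so the seven lexical features below (and the names `url_to_features`, `shannon_entropy` and `FEATURE_NAMES`) are assumptions chosen for illustration only.

```python
import math
from collections import Counter
from urllib.parse import urlparse

# Hypothetical feature set; the paper's actual features are not given in the abstract.
FEATURE_NAMES = [
    "url_length", "dot_count", "digit_count", "has_at_symbol",
    "uses_https", "host_hyphen_count", "host_entropy",
]


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def url_to_features(url: str) -> list[float]:
    """Map one URL string to a fixed-length numeric feature vector."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return [
        float(len(url)),                         # overall length
        float(url.count(".")),                   # number of dots
        float(sum(ch.isdigit() for ch in url)),  # number of digits
        float("@" in url),                       # '@' present anywhere in the URL
        float(parsed.scheme == "https"),         # scheme is https
        float(host.count("-")),                  # hyphens in the host name
        shannon_entropy(host),                   # randomness of the host name
    ]
```

Vectors produced this way could then be passed to any standard classifier; which classifier performs best is an empirical question the sketch does not address.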
Deval, Shalin Kumar, Tripathi, Meenakshi, Bezawada, Bruhadeshwar, Ray, Indrakshi.  2021.  “X-Phish: Days of Future Past”‡: Adaptive & Privacy Preserving Phishing Detection. 2021 IEEE Conference on Communications and Network Security (CNS). :227–235.
Website phishing continues to persist as one of the most important security threats of the modern Internet era. A major concern has been that machine learning based approaches, which have been the cornerstones of deployed phishing detection solutions, have not been able to adapt to the evolving nature of the phishing attacks. To create updated machine learning models, the collection of a sufficient corpus of real-time phishing data has always been a challenging problem as most phishing websites are short-lived. In this work, for the first time, we address these important concerns and describe an adaptive phishing detection solution that is able to adapt to changes in phishing attacks. Our solution has two major contributions. First, our solution allows for multiple organizations to collaborate in a privacy preserving manner and generate a robust machine learning model for phishing detection. Second, our solution is designed to be flexible in order to adapt to the novel phishing features introduced by attackers. Our solution not only allows for incorporating novel features into the existing machine learning model, but also can help, to a certain extent, the “unlearning” of existing features that have become obsolete in current phishing attacks. We evaluated our approach on a large real-world dataset collected over a period of six months. Our results achieve a high true positive rate of 97%, which is on par with existing state-of-the-art centralized solutions. Importantly, our results demonstrate that a machine learning model can incorporate new features while selectively “unlearning” the older obsolete features.
2022-10-06
Zhu, Xiaoyan, Zhang, Yu, Zhu, Lei, Hei, Xinhong, Wang, Yichuan, Hu, Feixiong, Yao, Yanni.  2021.  Chinese named entity recognition method for the field of network security based on RoBERTa. 2021 International Conference on Networking and Network Applications (NaNA). :420–425.
As the mobile Internet is developing rapidly, people who use cell phones to access the Internet dominate, and the mobile Internet has changed the development environment of online public opinion and made online public opinion events spread more widely. In the online environment, any kind of public issue may become a trigger for the generation of public opinion and thus needs to be controlled for network supervision. The method in this paper can identify entities from the event texts obtained from mobile Today's Headlines, People's Daily, etc., and informatize security of public opinion in event instances, thus strengthening network supervision and control on mobile, and providing sufficient support for national security event management. In this paper, we present a SW-BiLSTM-CRF model, as well as a model combining the RoBERTa pre-trained model with the classical neural network BiLSTM model. Our experiments show that this approach achieves quite good results on a Chinese emergency corpus, with accuracy and F1 values of 87.21% and 78.78%, respectively.
2022-10-03
Wang, Youning, Liu, Qi, Wang, Yang.  2021.  An Improved Bi-LSTM Model for Entity Extraction of Intellectual Property Using Complex Graph. 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys). :1920–1925.
The protection of Intellectual Property (IP) has gradually increased in recent years. Traditional intellectual property management services are less efficient at such a scale of data. The maturity of deep learning models has led to the development of knowledge graphs, and researchers have investigated the application of knowledge graphs in different domains, such as medical services and social media. However, few studies of knowledge graphs have been undertaken in the domain of intellectual property. In this paper, we introduce the process of building a domain knowledge graph and start from data preparation to conduct research on named entity recognition.
2022-09-30
Wüstrich, Lars, Schröder, Lukas, Pahl, Marc-Oliver.  2021.  Cyber-Physical Anomaly Detection for ICS. 2021 IFIP/IEEE International Symposium on Integrated Network Management (IM). :950–955.
Industrial Control Systems (ICS) are complex systems made up of many components with different tasks. For a safe and secure operation, each device needs to carry out its tasks correctly. To monitor a system and ensure the correct behavior of systems, anomaly detection is used. Models of expected behavior often rely only on cyber or physical features for anomaly detection. We propose an anomaly detection system that combines both types of features to create a dynamic fingerprint of an ICS. We present how a cyber-physical anomaly detection using sound on the physical layer can be designed, and which challenges need to be overcome for a successful implementation. We perform an initial evaluation for identifying actions of a 3D printer.
Yu, Dongqing, Hou, Xiaowei, Li, Ce, Lv, Qiujian, Wang, Yan, Li, Ning.  2021.  Anomaly Detection in Unstructured Logs Using Attention-based Bi-LSTM Network. 2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC). :403–407.
System logs record valuable information about the runtime status of IT systems. Therefore, system logs are a naturally excellent source of information for anomaly detection. Most of the existing studies on log-based anomaly detection construct a detection model to identify anomalous logs. Generally, the model treats historical logs as natural language sequences and learns the normal patterns from normal log sequences, and detects deviations from normal patterns as anomalies. However, the majority of existing methods focus on sequential and quantitative information and ignore semantic information hidden in log sequence so that they are inefficient in anomaly detection. In this paper, we propose a novel framework for automatically detecting log anomalies by utilizing an attention-based Bi-LSTM model. To demonstrate the effectiveness of our proposed model, we evaluate the performance on a public production log dataset. Extensive experimental results show that the proposed approach outperforms all comparison methods for anomaly detection.
2022-09-20
Dong, Xingbo, Jin, Zhe, Zhao, Leshan, Guo, Zhenhua.  2021.  BioCanCrypto: An LDPC Coded Bio-Cryptosystem on Fingerprint Cancellable Template. 2021 IEEE International Joint Conference on Biometrics (IJCB). :1–8.
Biometrics as a means of personal authentication has demonstrated strong viability in the past decade. However, directly deriving a unique cryptographic key from biometric data is a non-trivial task due to the fact that biometric data is usually noisy and presents large intra-class variations. Moreover, biometric data is permanently associated with the user, which leads to security and privacy issues. Cancellable biometrics and bio-cryptosystem are two main branches to address those issues, yet both approaches fall short in terms of accuracy performance, security, and privacy. In this paper, we propose a Bio-Crypto system on fingerprint Cancellable template (Bio-CanCrypto), which bridges cancellable biometrics and bio-cryptosystem to achieve a middle-ground for alleviating the limitations of both. Specifically, a cancellable transformation is applied on a fixed-length fingerprint feature vector to generate cancellable templates. Next, an LDPC coding mechanism is introduced into a reusable fuzzy extractor scheme and used to extract the stable cryptographic key from the generated cancellable templates. The proposed system can achieve both cancellability and reusability in one scheme. Experiments are conducted on a public fingerprint dataset, i.e., FVC2002. The results demonstrate that the proposed LDPC coded reusable fuzzy extractor is effective and promising.
Bentahar, Atef, Meraoumia, Abdallah, Bendjenna, Hakim, Chitroub, Salim, Zeroual, Abdelhakim.  2021.  Eigen-Fingerprints-Based Remote Authentication Cryptosystem. 2021 International Conference on Recent Advances in Mathematics and Informatics (ICRAMI). :1–6.
Nowadays, biometrics is a widely used technique to authenticate/identify human beings because of its resistance to theft, loss or forgetfulness. However, biometrics is subject to different transmission attacks. Today, the protection of sensitive biometric information is a big challenge, especially in current wireless networks such as the Internet of Things, where the transmitted data is easy to sniff. For that, this paper proposes an Eigen-Fingerprints-based biometric cryptosystem, where the biometric feature vectors are extracted by the Principal Component Analysis technique with an appropriate quantification. The key-binding principle incorporated with bit-wise and byte-wise correcting code is used for encrypting data and sharing the key. Several recognition rates and computation time are used to evaluate the proposed system. The findings show that the proposed cryptosystem achieves high security without decreasing the accuracy.
Sreemol, R, Santosh Kumar, M B, Sreekumar, A.  2021.  Improvement of Security in Multi-Biometric Cryptosystem by Modulus Fuzzy Vault Algorithm. 2021 International Conference on Advances in Computing and Communications (ICACC). :1–7.
Many prevalent techniques for building a Multi-Modal Biometric (MMB) system struggle to offer security and revocability for the templates. This work proposes an MMB system centred on the Modulus Fuzzy Vault (MFV) to resolve these issues. The proposed methodology uses Fingerprint (FP), Palmprint (PP), Ear and Retina images. All images are enhanced using the Boosted Double Plateau Histogram Equalization (BDPHE) technique. Unnecessary parts are removed from the ear images and the blood vessels are segmented from the retina images using the Modified Balanced Iterative Reducing and Clustering using Hierarchy (MBIRCH) technique. Next, features are extracted from the input traits; the essential features are then selected from the extracted features using the Bidirectional Deer Hunting optimization Algorithm (BDHOA). The selected features are merged using the Normalized Feature Level and Score Level (NFLSL) fusion. The fused features are stored securely using the Modulus Fuzzy Vault. Up to the fusion step, the procedure is repeated for the query image template. Next, the de-Fuzzy Vault procedure is executed for the query template, and the key is then recovered by matching the features of the query template and the input biometric template. The recovered key is compared with a threshold that categorizes the user as genuine or impostor. The proposed BDPHE and MFV techniques perform more efficiently than existing techniques.
Wang, Xuelei, Fidge, Colin, Nourbakhsh, Ghavameddin, Foo, Ernest, Jadidi, Zahra, Li, Calvin.  2021.  Feature Selection for Precise Anomaly Detection in Substation Automation Systems. 2021 13th IEEE PES Asia Pacific Power & Energy Engineering Conference (APPEEC). :1–6.
With the rapid advancement of the electrical grid, substation automation systems (SASs) have been developing continuously. However, with the introduction of advanced features, such as remote control, potential cyber security threats in SASs are also increased. Additionally, crucial components in SASs, such as protection relays, usually come from third-party vendors and may not be fully trusted. Untrusted devices may stealthily perform harmful or unauthorised behaviours which could compromise or damage SASs, and therefore, bring adverse impacts to the primary plant. Thus, it is necessary to detect abnormal behaviours from an untrusted device before it brings about catastrophic impacts. Anomaly detection techniques are suitable to detect anomalies in SASs as they only bring minimal side-effects to normal system operations. Many researchers have developed various machine learning algorithms and mathematical models to improve the accuracy of anomaly detection. However, without prudent feature selection, it is difficult to achieve high accuracy when detecting attacks launched from internal trusted networks, especially for stealthy message modification attacks which only modify message payloads slightly and imitate patterns of benign behaviours. Therefore, this paper presents choices of features which improve the accuracy of anomaly detection within SASs, especially for detecting “stealthy” attacks. By including two additional features, Boolean control data from message payloads and physical values from sensors, our method improved the accuracy of anomaly detection by decreasing the false-negative rate from 25% to 5% approximately.
2022-09-16
Almseidin, Mohammad, Al-Sawwa, Jamil, Alkasassbeh, Mouhammd.  2021.  Anomaly-based Intrusion Detection System Using Fuzzy Logic. 2021 International Conference on Information Technology (ICIT). :290–295.
Recently, Distributed Denial of Service (DDOS) attacks have been used in different ways to deny a number of services to end-users. Therefore, there is an urgent need to design an effective detection method against this type of attack. A fuzzy inference system offers the results in a more readable and understandable form. This paper introduces an anomaly-based Intrusion Detection System (IDS) using fuzzy logic. The fuzzy logic inference system is implemented as a detection method for Distributed Denial of Service (DDOS) attacks. The suggested method was applied to an open-source DDOS dataset. Experimental results show that the anomaly-based Intrusion Detection System using fuzzy logic obtained its best result by combining the InfoGain feature selection method with the fuzzy inference system: 91.1% for the true-positive rate and 0.006% for the false-positive rate.
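The InfoGain feature selection mentioned above ranks features by how much knowing a feature reduces the entropy of the class label. The following is a minimal sketch of that calculation for discrete features; the function names and the toy data are illustrative assumptions, and continuous traffic features would first need to be binned.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(labels) minus the entropy of labels conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy, made-up example: one informative feature and one uninformative feature.
labels = ["attack", "attack", "normal", "normal"]
packet_rate = ["high", "high", "low", "low"]   # separates the classes perfectly
protocol = ["tcp", "udp", "tcp", "udp"]        # carries no class information

ranking = sorted(
    {"packet_rate": packet_rate, "protocol": protocol}.items(),
    key=lambda item: information_gain(item[1], labels),
    reverse=True,
)
```

Features at the top of `ranking` would be kept as inputs to the detector; how many to keep is a tuning choice not specified in the abstract.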
2022-09-09
Hong, Xue, Zhifeng, Liao, Yuan, Wang, Ruidi, Xu, Zhuoran, Xu.  2020.  Research on risk severity decision of cluster supply chain based on data flow fuzzy clustering. 2020 Chinese Control And Decision Conference (CCDC). :2810–2815.
Based on the analysis of cluster supply chain risk characteristics, starting from the analysis of technical risk dimensions, information risk dimensions, human risk dimensions, and capital risk dimensions, a cluster supply chain risk severity assessment index system is designed. The fuzzy C-means clustering algorithm based on data flow is used to cluster each supply chain, analyze the risk severity of the supply chain, and evaluate the decision of the supply chain risk severity level based on the cluster weights and cluster center range. Based on the analytic hierarchy process, an early warning decision is made on the risk severity of the entire clustered supply chain, and the clustered supply chain risk severity early warning level is obtained. The results of simulation experiments verify the feasibility of the decision method for cluster supply chain risk severity and provide theoretical support for cluster supply chain risk severity prediction.
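For readers unfamiliar with fuzzy C-means, the sketch below shows the standard batch algorithm on one-dimensional scores: each point receives a degree of membership in every cluster, and cluster centers are membership-weighted means. It is not the data-flow (streaming) variant the paper describes, and the risk scores, initial centers and function names are made up for illustration.

```python
def fcm_memberships(points, centers, m=2.0):
    """Return u[j][i], the membership of point j in cluster i (each row sums to 1)."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if any(d == 0.0 for d in dists):
            # A point lying exactly on a center belongs fully to that center.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([h / sum(hits) for h in hits])
            continue
        memberships.append(
            [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]
        )
    return memberships


def fcm_update_centers(points, memberships, m=2.0):
    """Recompute each center as the mean of the points weighted by u**m."""
    centers = []
    for i in range(len(memberships[0])):
        weights = [u[i] ** m for u in memberships]
        centers.append(sum(w * x for w, x in zip(weights, points)) / sum(weights))
    return centers


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    for _ in range(iterations):
        centers = fcm_update_centers(points, fcm_memberships(points, centers, m), m)
    return centers, fcm_memberships(points, centers, m)


# Hypothetical normalized risk scores for six supply chains.
scores = [0.10, 0.20, 0.15, 0.80, 0.90, 0.85]
final_centers, final_u = fuzzy_c_means(scores, centers=[0.0, 1.0])
```

The fuzzifier `m` controls how soft the assignment is; `m = 2` is a common default rather than a value taken from the paper.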
Saini, Anu, Sri, Manepalli Ratna, Thakur, Mansi.  2021.  Intrinsic Plagiarism Detection System Using Stylometric Features and DBSCAN. 2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS). :13–18.
Plagiarism is the act of using someone else’s words or ideas without giving them due credit and representing it as one’s own work. In today's world, it is very easy to plagiarize others' work due to advancement in technology, especially by the use of the Internet or other offline sources such as books or magazines. Plagiarism can be classified into two broad categories on the basis of detection, namely extrinsic and intrinsic plagiarism. Extrinsic plagiarism detection refers to detecting plagiarism in a document by comparing it against a given reference dataset, whereas intrinsic plagiarism detection refers to detecting plagiarism with the help of variation in writing styles without using any reference corpus. Although there are many approaches which can be adopted to detect extrinsic plagiarism, few are available for intrinsic plagiarism detection. In this paper, a simplified approach is proposed for developing an intrinsic plagiarism detector which is helpful in detecting plagiarism even when no reference corpus is available. The approach deals with development of an intrinsic plagiarism detection system by identifying the writing style of authors in the document using stylometric features and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering. The proposed system has an easy to use interactive interface where the user has to upload a text document to be checked for plagiarism and the result is displayed on the web page itself. In addition, the user can also see the analysis of the document in the form of graphs.
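As a rough illustration of what stylometric features look like, the sketch below computes three simple style measures for a text segment. The abstract does not say which features the system uses, so these three, and the function name `stylometric_features`, are assumptions.

```python
import re


def stylometric_features(text):
    """Return [average word length, average sentence length in words, type-token ratio]."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words or not sentences:
        return [0.0, 0.0, 0.0]
    avg_word_len = sum(len(w) for w in words) / len(words)
    avg_sentence_len = len(words) / len(sentences)
    type_token_ratio = len(set(words)) / len(words)  # vocabulary richness
    return [avg_word_len, avg_sentence_len, type_token_ratio]


# One feature vector per paragraph of a (made-up) document.
paragraphs = [
    "The cat sat. The dog ran.",
    "Notwithstanding considerable methodological heterogeneity, the results converge.",
]
vectors = [stylometric_features(p) for p in paragraphs]
```

Vectors like these could then be clustered with a density-based method such as DBSCAN, so that segments falling outside the dominant cluster are flagged as stylistically inconsistent.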
Frankel, Sophia F., Ghosh, Krishnendu.  2021.  Machine Learning Approaches for Authorship Attribution using Source Code Stylometry. 2021 IEEE International Conference on Big Data (Big Data). :3298–3304.
Identification of source code authorship is vital for attribution. In this work, a machine learning framework is described to identify source code authorship. The framework integrates the features extracted using natural language processing based approaches and the abstract syntax tree of the code. We evaluate the methodology on the Google Code Jam dataset. We present the performance measures of logistic regression and deep learning on the dataset.
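One simple way to derive features from an abstract syntax tree is to count how often each node type occurs. The sketch below does this for Python source using the standard `ast` module; the choice of Python as the analysed language, the node vocabulary and the function names are assumptions, since the abstract does not describe the authors' feature set.

```python
import ast
from collections import Counter

# Hypothetical fixed vocabulary of node types used to build the vector.
NODE_VOCAB = ["FunctionDef", "For", "While", "If", "Call", "ListComp", "Lambda"]


def ast_node_counts(source: str) -> Counter:
    """Count AST node types in a piece of Python source code."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def ast_feature_vector(source: str) -> list[int]:
    """Project the node counts onto the fixed vocabulary."""
    counts = ast_node_counts(source)
    return [counts[name] for name in NODE_VOCAB]


sample = "def f(x):\n    for i in range(x):\n        print(i)\n"
vector = ast_feature_vector(sample)
```

Two authors who solve the same task with a loop versus a list comprehension would produce different vectors, which is the kind of signal a downstream classifier could exploit.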
Guo, Shaoying, Xu, Yanyun, Huang, Weiqing, Liu, Bo.  2021.  Specific Emitter Identification via Variational Mode Decomposition and Histogram of Oriented Gradient. 2021 28th International Conference on Telecommunications (ICT). :1–6.
Specific emitter identification (SEI) is a physical-layer-based approach for enhancing wireless communication network security. A well-designed SEI method can be widely applied in identifying individual wireless communication devices. In this paper, we propose a novel specific emitter identification method based on variational mode decomposition and histogram of oriented gradient (VMD-HOG). The signal is decomposed into specific temporal modes via VMD, and HOG features are obtained from the time-frequency spectrum of the temporal modes. The performance of the proposed method is evaluated both in single hop and relaying scenarios and under three channels with the number of emitters varying. Results show that our proposed method provides great identification performance for both simulated signals and realistic data of Zigbee devices and outperforms the two existing methods in identification accuracy and computational complexity.
2022-08-26
Kang, Dong Mug, Yoon, Sang Hun, Shin, Dae Kyo, Yoon, Young, Kim, Hyeon Min, Jang, Soo Hyun.  2021.  A Study on Attack Pattern Generation and Hybrid MR-IDS for In-Vehicle Network. 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC). :291–294.
The CAN (Controller Area Network) bus, which transmits and receives ECU control information in a vehicle, has a critical risk of external intrusion because there is no standardized security system. Recently, the need for an IDS (Intrusion Detection System) to detect external intrusion of the CAN bus is increasing, and high accuracy and real-time processing for intrusion detection are required. In this paper, we propose Hybrid MR (Machine learning and Ruleset) -IDS based on machine learning and ruleset to improve IDS performance. For high accuracy and detection rate, feature engineering was conducted based on the characteristics of the CAN bus, and the generated features were used in the detection step. By using the advantages of both rule set and machine learning, the proposed Hybrid MR-IDS can cope with various attack patterns that have not been learned previously, as well as with the learned attack patterns. In addition, by collecting CAN data from an actual vehicle in driving and stop state, five attack scenarios including physical effects during all driving cycles are generated. Finally, the Hybrid MR-IDS proposed in this paper shows an average of 99% performance based on F1-score.
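The idea of combining a rule set with a learned model can be sketched as follows: a frame is flagged if it breaks any rule or if the model scores it as suspicious. Everything specific here is an assumption for illustration (the CAN IDs, the interval thresholds in `RULES`, the `CanFrame` type, and the `model_score` callable standing in for a trained classifier); the only external fact used is that a classic CAN data field carries at most 8 bytes.

```python
from dataclasses import dataclass


@dataclass
class CanFrame:
    timestamp: float  # seconds
    can_id: int
    data: bytes


# Hypothetical rule set: known IDs mapped to their minimum inter-arrival time (s).
RULES = {0x100: 0.009, 0x200: 0.019}


def rule_violations(frame, last_seen):
    """Return the list of rule names that this frame violates."""
    reasons = []
    if frame.can_id not in RULES:
        reasons.append("unknown_id")
    else:
        prev = last_seen.get(frame.can_id)
        if prev is not None and frame.timestamp - prev < RULES[frame.can_id]:
            reasons.append("interval_too_short")
    if len(frame.data) > 8:  # classic CAN payloads are at most 8 bytes
        reasons.append("payload_too_long")
    return reasons


def hybrid_detect(frames, model_score, threshold=0.5):
    """Flag a frame if a rule fires or the (stand-in) model score exceeds the threshold."""
    last_seen = {}
    alerts = []
    for frame in frames:
        reasons = rule_violations(frame, last_seen)
        if model_score(frame) > threshold:
            reasons.append("model")
        if reasons:
            alerts.append((frame.timestamp, frame.can_id, reasons))
        last_seen[frame.can_id] = frame.timestamp
    return alerts


frames = [
    CanFrame(0.000, 0x100, b"\x00"),
    CanFrame(0.010, 0x100, b"\x00"),
    CanFrame(0.011, 0x100, b"\x00"),  # arrives too soon after the previous 0x100 frame
    CanFrame(0.012, 0x300, b"\x00"),  # ID not in the rule set
]
alerts = hybrid_detect(frames, model_score=lambda frame: 0.0)
```

In this arrangement the rules catch protocol-level deviations without training data, while the model is left to catch patterns the rules do not encode.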
Ricks, Brian, Tague, Patrick, Thuraisingham, Bhavani.  2021.  DDoS-as-a-Smokescreen: Leveraging Netflow Concurrency and Segmentation for Faster Detection. 2021 Third IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA). :217–224.
In the ever evolving Internet threat landscape, Distributed Denial-of-Service (DDoS) attacks remain a popular means to invoke service disruption. DDoS attacks, however, have evolved to become a tool of deceit, providing a smokescreen or distraction while some other underlying attack takes place, such as data exfiltration. Knowing the intent of a DDoS, and detecting underlying attacks which may be present concurrently with it, is a challenging problem. An entity whose network is under a DDoS attack may not have the support personnel to both actively fight a DDoS and try to mitigate underlying attacks. Therefore, any system that can detect such underlying attacks should do so only with a high degree of confidence. Previous work utilizing flow aggregation techniques with multi-class anomaly detection showed promise in both DDoS detection and detecting underlying attacks ongoing during an active DDoS attack. In this work, we head in the opposite direction, utilizing flow segmentation and concurrent flow feature aggregation, with the primary goal of greatly reduced detection times of both DDoS and underlying attacks. Using the same multi-class anomaly detection approach, we show greatly improved detection times with promising detection performance.
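One possible reading of flow segmentation with concurrent flow feature aggregation is sketched below: time is cut into fixed-length segments, and for each segment the flows active in it are counted. The abstract does not define either term precisely, so this interpretation, the tuple layout of a flow, and the function name `segment_flows` are assumptions.

```python
from collections import defaultdict


def segment_flows(flows, segment_len):
    """Aggregate flows per time segment.

    flows: iterable of (start, end, source) tuples, times in seconds.
    Returns {segment_index: (active_flow_count, distinct_source_count)}.
    """
    segments = defaultdict(lambda: {"flows": 0, "sources": set()})
    for start, end, source in flows:
        first = int(start // segment_len)
        last = int(end // segment_len)
        # A flow contributes to every segment it overlaps.
        for index in range(first, last + 1):
            segments[index]["flows"] += 1
            segments[index]["sources"].add(source)
    return {
        index: (seg["flows"], len(seg["sources"]))
        for index, seg in sorted(segments.items())
    }


# Made-up flows: (start, end, source address label).
flows = [(0.0, 2.5, "a"), (1.0, 1.5, "b"), (1.2, 3.1, "a")]
features = segment_flows(flows, segment_len=1.0)
```

Because each segment can be scored as soon as it closes, a detector does not have to wait for long flows to finish, which is consistent with the stated goal of shorter detection times.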
2022-08-12
Hakim, Mohammad Sadegh Seyyed, Karegar, Hossein Kazemi.  2021.  Detection of False Data Injection Attacks Using Cross Wavelet Transform and Machine Learning. 2021 11th Smart Grid Conference (SGC). :1–5.
Power grids are the most extensive man-made systems that are difficult to control and monitor. With the development of conventional power grids and moving toward smart grids, power systems have undergone vast changes since they use the Internet to transmit information and control commands to different parts of the power system. Due to the use of the Internet as a basic infrastructure for smart grids, attackers can sabotage the communication networks and alter the measurements. Due to the complexity of the smart grids, it is difficult for the network operator to detect such cyber-attacks. The attackers can implement the attack in a manner that conventional Bad Data Detection (BDD) systems cannot detect since it may not violate the physical laws of the power system. This paper uses the cross wavelet transform (XWT) to detect stealth false data injection attacks (FDIAs) against state estimation (SE) systems. XWT can capture the coherency between measurements of adjacent buses and represent it in time and frequency space. Then, we train a machine learning classification algorithm to distinguish attacked measurements from normal measurements by applying a feature extraction technique.
Al Khayer, Aala, Almomani, Iman, Elkawlak, Khaled.  2020.  ASAF: Android Static Analysis Framework. 2020 First International Conference of Smart Systems and Emerging Technologies (SMARTTECH). :197–202.
The Android Operating System has become a major target for malicious attacks. The static analysis approach is widely used to detect malicious applications. Most existing studies on static analysis frameworks are limited to certain features. This paper presents an Android Static Analysis Framework (ASAF) which models the overall static analysis phases and approaches for Android applications. ASAF can be implemented for different purposes including Android malicious apps detection. The proposed framework utilizes a parsing tool, Android Static Parse (ASParse), which is also introduced in this paper. Through the extendibility of the ASParse tool, future research studies can easily extend the parsed features and the parsed files to perform parsing based on their specific requirements and goals. Moreover, a case study is conducted to illustrate the implementation of the proposed ASAF.
Chao, Wang, Qun, Li, XiaoHu, Wang, TianYu, Ren, JiaHan, Dong, GuangXin, Guo, EnJie, Shi.  2020.  An Android Application Vulnerability Mining Method Based On Static and Dynamic Analysis. 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC). :599–603.
Due to the advantages and limitations of the two kinds of vulnerability mining methods of static and dynamic analysis of Android applications, the paper proposes a method of Android application vulnerability mining based on a combination of dynamic and static analysis. Firstly, the static analysis method is used to obtain the basic vulnerability analysis results of the application, and then the input test case of dynamic analysis is constructed on this basis. The fuzzy input test is carried out in the real machine environment, and the application security vulnerability is verified with the taint analysis technology, and finally the application vulnerability report is obtained. Experimental results show that compared with static analysis results, the method can significantly improve the accuracy of vulnerability mining.
Zhang, Yanmiao, Ji, Xiaoyu, Cheng, Yushi, Xu, Wenyuan.  2019.  Vulnerability Detection for Smart Grid Devices via Static Analysis. 2019 Chinese Control Conference (CCC). :8915–8919.
As a modern power transmission network, smart grid connects abundant terminal devices and plays an important role in our daily life. However, along with its growth come security threats. Different from the separated environment previously, an adversary nowadays can destroy the power system by attacking its terminal devices. As a result, it's critical to ensure the security and safety of terminal devices. To achieve this, detecting the pre-existing vulnerabilities in the terminal program and enhancing its security are of great importance and necessity. In this paper, we introduce Cker, a novel vulnerability detection tool for smart grid devices, which generates a program model based on device sources and sets rules to perform model checking. We utilize static analysis to extract necessary information and build corresponding program models. By further checking the model with pre-defined vulnerability patterns, we achieve security detection and error reporting. The evaluation results demonstrate that our method can effectively detect vulnerabilities in smart devices with an acceptable accuracy and false positive rate. In addition, as Cker is realized in pure Python, it can be easily scaled to other platforms.
Aguinaldo, Roberto Daniel, Solano, Geoffrey, Pontiveros, Marc Jermaine, Balolong, Marilen Parungao.  2021.  NAMData: A Web-application for the Network Analysis of Microbiome Data. TENCON 2021 - 2021 IEEE Region 10 Conference (TENCON). :341–346.
Recent projects regarding the exploration of the functions of microbiomes within communities brought about a plethora of new data. That specific field of study is called Metagenomics and one of its more advanced approaches is the application of network analysis. The paper introduces NAMData, which is a web-application tool for the network analysis of microbiome data. The system handles the compositional and sparse nature of microbiome data by applying taxa filtration, normalization, and zero treatment. Furthermore, compositionally aware correlation estimators were used to compute the correlation between taxa, and the system divides the network into positive and negative correlation networks. NAMData aims to capitalize on the unique network features, namely network visualization, centrality scores, and community detection. The system enables researchers to include network analysis in their analysis pipelines even without any knowledge of programming. Biological concepts can be integrated with the network findings gathered from the system to either support existing facts or form new insights.
2022-07-28
Wang, Jingjing, Huang, Minhuan, Nie, Yuanping, Li, Jin.  2021.  Static Analysis of Source Code Vulnerability Using Machine Learning Techniques: A Survey. 2021 4th International Conference on Artificial Intelligence and Big Data (ICAIBD). :76–86.
With the rapid increase of practical problem complexity and code scale, the threat of software security is increasingly serious. Consequently, it is crucial to pay attention to the analysis of software source code vulnerability in the development stage and take efficient measures to detect the vulnerability as soon as possible. Machine learning techniques have made remarkable achievements in various fields. However, the application of machine learning in the domain of vulnerability static analysis is still in its infancy and the characteristics and performance of diverse methods are quite different. In this survey, we focus on a source code-oriented static vulnerability analysis method using machine learning techniques. We review the studies on source code vulnerability analysis based on machine learning in the past decade. We systematically summarize the development trends and different technical characteristics in this field from the perspectives of the intermediate representation of source code and vulnerability prediction model and put forward several feasible research directions in the future according to the limitations of the current approaches.
2022-07-15
Figueiredo, Cainã, Lopes, João Gabriel, Azevedo, Rodrigo, Zaverucha, Gerson, Menasché, Daniel Sadoc, Pfleger de Aguiar, Leandro.  2021.  Software Vulnerabilities, Products and Exploits: A Statistical Relational Learning Approach. 2021 IEEE International Conference on Cyber Security and Resilience (CSR). :41–46.
Data on software vulnerabilities, products and exploits is typically collected from multiple non-structured sources. Valuable information, e.g., on which products are affected by which exploits, is conveyed by matching data from those sources, i.e., through their relations. In this paper, we leverage this simple albeit unexplored observation to introduce a statistical relational learning (SRL) approach for the analysis of vulnerabilities, products and exploits. In particular, we focus on the problem of determining the existence of an exploit for a given product, given information about the relations between products and vulnerabilities, and vulnerabilities and exploits, focusing on Industrial Control Systems (ICS), the National Vulnerability Database and ExploitDB. Using RDN-Boost, we were able to reach an AUC ROC of 0.83 and an AUC PR of 0.69 for the problem at hand. To reach that performance, we indicate that it is instrumental to include textual features, e.g., extracted from the description of vulnerabilities, as well as structured information, e.g., about product categories. In addition, using interpretable relational regression trees we report simple rules that shed insight on factors impacting the weaponization of ICS products.
Yu, Hongtao, Zheng, Haihong, Xu, Yishu, Ma, Ru, Gao, Dingli, Zhang, Fuzhi.  2021.  Detecting group shilling attacks in recommender systems based on maximum dense subtensor mining. 2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA). :644–648.
Existing group shilling attack detection methods mainly depend on human feature engineering to extract group attack behavior features, which requires a high knowledge cost. To address this problem, we propose a group shilling attack detection method based on maximum density subtensor mining. First, the rating time series of each item is divided into time windows and the item tensor groups are generated by establishing the user-rating-time window data models of three-dimensional tensor. Second, the M-Zoom model is applied to mine the maximum dense subtensor of each item, and the subtensor groups with high consistency of behaviors are selected as candidate groups. Finally, a dual-input convolutional neural network model is designed to automatically extract features for the classification of real users and group attack users. The experimental results on the Amazon and Netflix datasets show the effectiveness of the proposed method.
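The first step described above, dividing each item's rating history into time windows and building a user-rating-time window tensor, can be sketched with a sparse dictionary representation. The record layout, window length and function name are assumptions; the later steps (M-Zoom dense subtensor mining and the dual-input CNN) are not shown.

```python
from collections import defaultdict


def build_item_tensors(ratings, window_len):
    """Build one sparse 3-way tensor per item.

    ratings: iterable of (user, item, rating, timestamp) records.
    Returns {item: {(user, rating, window_index): count}}.
    """
    tensors = defaultdict(lambda: defaultdict(int))
    for user, item, rating, timestamp in ratings:
        window = int(timestamp // window_len)  # index of the time window
        tensors[item][(user, rating, window)] += 1
    return {item: dict(cells) for item, cells in tensors.items()}


# Made-up ratings: two users rate item i1 with 5 stars in the same window.
ratings = [
    ("u1", "i1", 5, 10),
    ("u2", "i1", 5, 20),
    ("u3", "i1", 1, 950),
    ("u1", "i2", 4, 15),
]
tensors = build_item_tensors(ratings, window_len=100)
```

A group attack would show up as many distinct users sharing the same rating value and window for one item, which is the kind of dense block a subtensor mining step is meant to find.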