Visible to the public Biblio

Filters: Keyword is TF-IDF  [Clear All Filters]
2022-07-15
Wang, Yan, Allouache, Yacine, Joubert, Christian.  2021.  A Staffing Recommender System based on Domain-Specific Knowledge Graph. 2021 Eighth International Conference on Social Network Analysis, Management and Security (SNAMS). :1—6.
In the economics environment, Job Matching is always a challenge involving the evolution of knowledge and skills. A good matching of skills and jobs can stimulate the growth of economics. Recommender System (RecSys), as one kind of Job Matching, can help the candidates predict the future job relevant to their preferences. However, RecSys still has the problem of cold start and data sparsity. The content-based filtering in RecSys needs the adaptive data for the specific staffing tasks of Bidirectional Encoder Representations from Transformers (BERT). In this paper, we propose a job RecSys based on skills and locations using a domain-specific Knowledge Graph (KG). This system has three parts: a pipeline of Named Entity Recognition (NER) and Relation Extraction (RE) using BERT; a standardization system for pre-processing, semantic enrichment and semantic similarity measurement; a domain-specific Knowledge Graph (KG). Two different relations in the KG are computed by cosine similarity and Term Frequency-Inverse Document Frequency (TF-IDF) respectively. The raw data used in the staffing RecSys include 3000 descriptions of job offers from Indeed, 126 Curriculum Vitae (CV) in English from Kaggle and 106 CV in French from Linx of Capgemini Engineering. The staffing RecSys is integrated under an architecture of Microservices. The autonomy and effectiveness of the staffing RecSys are verified through the experiment using Discounted Cumulative Gain (DCG). Finally, we propose several potential research directions for this research.
2019-04-05
Wen, Senhao, Rao, Yu, Yan, Hanbing.  2018.  Information Protecting Against APT Based on the Study of Cyber Kill Chain with Weighted Bayesian Classification with Correction Factor. Proceedings of the 7th International Conference on Informatics, Environment, Energy and Applications. :231-235.

To avoid being discovered by the defenders of a target, APT attackers are using encrypted communication to hide communication features, using code obfuscation and file-less technology to avoid malicious code being easily reversed and leaking out its internal working mechanism, and using misleading content to conceal their identities. And it is clearly ineffective to detect APT attacks by relying on one single technology. All of these tough situation make information security and privacy protection face increasingly serious threats. In this paper, through a deep study of Cyber Kill Chain behaviors, combining with intelligence analysis technology, we transform APT detecting problem to be a measurable mathematical problem through weighted Bayesian classification with correction factor so as to detect APTs and perceive threats. In the solution, we adopted intelligence acquisition technology from massive data, and TFIDF algorithm for calculate attack behavior's weight. Also we designed a correction factor to improve the Markov Weighted Bayesian Model with multiple behaviors being detected by modifying the value of the probability of APT attack.

2019-02-18
Fukushima, Keishiro, Nakamura, Toru, Ikeda, Daisuke, Kiyomoto, Shinsaku.  2018.  Challenges in Classifying Privacy Policies by Machine Learning with Word-based Features. Proceedings of the 2Nd International Conference on Cryptography, Security and Privacy. :62–66.

In this paper, we discuss challenges when we try to automatically classify privacy policies using machine learning with words as the features. Since it is difficult for general public to understand privacy policies, it is necessary to support them to do that. To this end, the authors believe that machine learning is one of the promising ways because users can grasp the meaning of policies through outputs by a machine learning algorithm. Our final goal is to develop a system which automatically translates privacy policies into privacy labels [1]. Toward this goal, we classify sentences in privacy policies with category labels, using popular machine learning algorithms, such as a naive Bayes classifier.We choose these algorithms because we could use trained classifiers to evaluate keywords appropriate for privacy labels. Therefore, we adopt words as the features of those algorithms. Experimental results show about 85% accuracy. We think that much higher accuracy is necessary to achieve our final goal. By changing learning settings, we identified one reason of low accuracies such that privacy policies include many sentences which are not direct description of information about categories. It seems that such sentences are redundant but maybe they are essential in case of legal documents in order to prevent misinterpreting. Thus, it is important for machine learning algorithms to handle these redundant sentences appropriately.

2017-09-19
Bo, Li, Jinzhen, Wang, Ping, Zhao, Zhongjiang, Yan, Mao, Yang.  2016.  Research of Recognition System of Web Intrusion Detection Based on Storm. Proceedings of the Fifth International Conference on Network, Communication and Computing. :98–102.

Based on Storm, a distributed, reliable, fault-tolerant real-time data stream processing system, we propose a recognition system of web intrusion detection. The system is based on machine learning, feature selection algorithm by TF-IDF(Term Frequency–Inverse Document Frequency) and the optimised cosine similarity algorithm, at low false positive rate and a higher detection rate of attacks and malicious behavior in real-time to protect the security of user data. From comparative analysis of experiments we find that the system for intrusion recognition rate and false positive rate has improved to some extent, it can be better to complete the intrusion detection work.