Biblio
To avoid discovery by a target's defenders, APT attackers use encrypted communication to hide communication features, code obfuscation and fileless techniques to keep malicious code from being easily reverse-engineered and revealing its internal workings, and misleading content to conceal their identities. Relying on a single technology is clearly ineffective for detecting APT attacks. Together, these conditions pose increasingly serious threats to information security and privacy protection. In this paper, through an in-depth study of Cyber Kill Chain behaviors combined with intelligence analysis technology, we transform the APT detection problem into a measurable mathematical problem, using weighted Bayesian classification with a correction factor to detect APTs and perceive threats. In the solution, we adopt technology for acquiring intelligence from massive data and use the TF-IDF algorithm to calculate the weight of each attack behavior. We also design a correction factor that improves the Markov Weighted Bayesian Model when multiple behaviors are detected, by modifying the probability of an APT attack.
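The weighting idea in this abstract can be illustrated with a minimal sketch. It assumes each incident is a list of observed behavior labels, that a behavior's weight is its mean TF-IDF score over the incidents containing it, and that the weight enters a naive Bayes posterior as an exponent on the behavior's likelihood. The function names, the smoothed IDF formula, and the sample labels are all hypothetical. The abstract does not define the correction factor or the Markov component, so both are left out.

```python
import math
from collections import Counter


def tfidf_weights(incidents):
    """Return {behavior: mean TF-IDF weight} for a list of behavior lists.

    Assumption: one incident plays the role of one "document" and each
    behavior label plays the role of one "term".
    """
    n = len(incidents)
    # Document frequency: number of incidents in which a behavior appears.
    df = Counter(b for inc in incidents for b in set(inc))
    totals = Counter()
    for inc in incidents:
        for behavior, count in Counter(inc).items():
            tf = count / len(inc)
            # Smoothed IDF, kept positive so every behavior retains some weight.
            idf = math.log((1 + n) / (1 + df[behavior])) + 1.0
            totals[behavior] += tf * idf
    return {b: totals[b] / df[b] for b in totals}


def weighted_posterior(observed, prior_apt, lik_apt, lik_benign, weights):
    """P(APT | observed), with each likelihood raised to its behavior weight."""
    log_apt = math.log(prior_apt)
    log_benign = math.log(1.0 - prior_apt)
    for behavior in observed:
        w = weights.get(behavior, 1.0)  # unweighted behaviors default to 1
        log_apt += w * math.log(lik_apt[behavior])
        log_benign += w * math.log(lik_benign[behavior])
    # Normalize in log space to avoid underflow with many behaviors.
    m = max(log_apt, log_benign)
    p_apt = math.exp(log_apt - m)
    p_benign = math.exp(log_benign - m)
    return p_apt / (p_apt + p_benign)
```

In this sketch, behaviors that occur in few incidents receive larger weights, so observing one of them moves the posterior further than observing a behavior that appears in every incident.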
In this paper, we discuss the challenges of automatically classifying privacy policies using machine learning with words as features. Because privacy policies are difficult for the general public to understand, readers need support in interpreting them. We believe machine learning is a promising approach, because users can grasp the meaning of a policy from the output of a machine learning algorithm. Our final goal is to develop a system that automatically translates privacy policies into privacy labels [1]. Toward this goal, we classify sentences in privacy policies with category labels, using popular machine learning algorithms such as a naive Bayes classifier. We choose these algorithms because the trained classifiers can be used to evaluate which keywords are appropriate for privacy labels; for that reason, we adopt words as the features. Experimental results show about 85% accuracy, and we think that much higher accuracy is necessary to achieve our final goal. By changing the learning settings, we identified one reason for the low accuracy: privacy policies include many sentences that do not directly describe information about the categories. Such sentences appear redundant, but they may be essential in legal documents to prevent misinterpretation. It is therefore important for machine learning algorithms to handle these redundant sentences appropriately.
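The approach described here, a naive Bayes classifier over word features whose trained parameters are then inspected for category keywords, might look roughly like the sketch below. It is a plain multinomial naive Bayes with Laplace smoothing written for illustration. The class name, the `top_keywords` helper, the category labels, and the four training sentences are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


class NaiveBayes:
    """Multinomial naive Bayes over words, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            self.label_counts[label] += 1
            for word in tokenize(sentence):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def _log_likelihood(self, word, label):
        total = sum(self.word_counts[label].values())
        count = self.word_counts[label][word]
        return math.log((count + self.alpha) / (total + self.alpha * len(self.vocab)))

    def predict(self, sentence):
        n = sum(self.label_counts.values())
        scores = {}
        for label in self.label_counts:
            score = math.log(self.label_counts[label] / n)
            for word in tokenize(sentence):
                if word in self.vocab:  # ignore words never seen in training
                    score += self._log_likelihood(word, label)
            scores[label] = score
        return max(scores, key=scores.get)

    def top_keywords(self, label, k=3):
        """Words whose likelihood under `label` most exceeds every other label."""
        others = [y for y in self.label_counts if y != label]

        def margin(word):
            best_other = max(self._log_likelihood(word, o) for o in others)
            return self._log_likelihood(word, label) - best_other

        return sorted(self.vocab, key=margin, reverse=True)[:k]


# Hypothetical training sentences and category labels.
train_sentences = [
    "We collect your email address and phone number.",
    "We collect location data from your device.",
    "We share your information with advertising partners.",
    "We may share data with third parties.",
]
train_labels = ["collection", "collection", "sharing", "sharing"]
model = NaiveBayes().fit(train_sentences, train_labels)
```

Calling `model.predict(...)` on a new sentence returns its most likely category, and `model.top_keywords("sharing")` lists the words the classifier associates most strongly with that category. Words such as "we" and "your" occur in every category and contribute little, which mirrors the abstract's point about sentences that do not directly describe a category.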
Based on Storm, a distributed, reliable, fault-tolerant real-time data stream processing system, we propose a web intrusion detection system. The system combines machine learning, TF-IDF (Term Frequency–Inverse Document Frequency) feature selection, and an optimised cosine similarity algorithm to detect attacks and malicious behavior in real time with a low false positive rate and a high detection rate, protecting the security of user data. Comparative experiments show that the system improves both the intrusion recognition rate and the false positive rate to some extent, so it performs intrusion detection more effectively.
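The combination of TF-IDF features and cosine similarity in this abstract can be sketched as a nearest-sample matcher. The sketch uses the standard cosine similarity, since the abstract does not describe its optimised variant, and it leaves out the Storm topology entirely. The tokenizer, the 0.5 threshold, and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokens(request):
    """Split a raw request string into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", request.lower())


def fit_idf(corpus):
    """Compute a smoothed IDF value for every token seen in the corpus."""
    n = len(corpus)
    df = Counter(t for doc in corpus for t in set(tokens(doc)))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf_vector(request, idf):
    """Sparse TF-IDF vector as {token: weight}; unknown tokens are dropped."""
    counts = Counter(t for t in tokens(request) if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def cosine(u, v):
    """Standard cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_intrusion(request, attack_samples, idf, threshold=0.5):
    """Flag a request that is close enough to any known attack sample."""
    vec = tfidf_vector(request, idf)
    return any(cosine(vec, tfidf_vector(s, idf)) >= threshold for s in attack_samples)
```

In a streaming deployment, a function like `is_intrusion` would run once per incoming request inside a processing node. The threshold trades detection rate against false positives and would need tuning on labeled traffic.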