Biblio: Keyword is association rules
2022-06-08
Imtiaz, Sayem Mohammad, Sultana, Kazi Zakia, Varde, Aparna S.  2021.  Mining Learner-friendly Security Patterns from Huge Published Histories of Software Applications for an Intelligent Tutoring System in Secure Coding. 2021 IEEE International Conference on Big Data (Big Data). :4869–4876.

Security patterns are proven solutions to recurring problems in software development. The growing importance of secure software development has prompted diverse research efforts on security patterns, mostly focused on classification schemes, evolution, and evaluation of the patterns. Despite a long, mature history of research and popularity among researchers, security patterns have not fully penetrated software development practices. Moreover, software security education has not benefited from these patterns, even though a commonly stated motivation for them is the dissemination of expert knowledge and experience. This is because the patterns lack a simple embodiment to help students learn about vulnerable code and to guide new developers on secure coding. To address this problem, we propose to conduct intelligent data mining in the context of software engineering to discover learner-friendly software security patterns. Our proposed model entails knowledge discovery from large-scale published real-world vulnerability histories in software applications. We harness association rule mining for frequent pattern discovery to mine easily comprehensible and explainable learner-friendly rules, mainly of the type "flaw implies fix" and "attack type implies flaw", so as to enhance training in secure coding, which in turn would augment secure software development. We propose to build a learner-friendly intelligent tutoring system (ITS) based on the newly discovered security patterns and rules. Our proposed model, based on association rule mining in secure software development, and prototype experiments are discussed in this paper along with challenges and ongoing work.
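
As an illustration of the "flaw implies fix" rule type mentioned in this abstract, here is a minimal sketch of how support and confidence could be computed for such rules. The toy records, the thresholds, and the function name `mine_rules` are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy records: each pairs an observed flaw with the fix applied.
records = [
    ("sql_injection", "parameterized_query"),
    ("sql_injection", "parameterized_query"),
    ("sql_injection", "input_validation"),
    ("buffer_overflow", "bounds_check"),
    ("xss", "output_encoding"),
]

def mine_rules(records, min_support=0.2, min_confidence=0.6):
    """Return sorted (flaw, fix, support, confidence) tuples for 'flaw implies fix'."""
    n = len(records)
    pair_counts = Counter(records)                      # count of (flaw, fix) together
    flaw_counts = Counter(flaw for flaw, _ in records)  # count of each flaw alone
    rules = []
    for (flaw, fix), count in pair_counts.items():
        support = count / n                     # fraction of all records with this pair
        confidence = count / flaw_counts[flaw]  # fraction of this flaw's records with this fix
        if support >= min_support and confidence >= min_confidence:
            rules.append((flaw, fix, support, confidence))
    return sorted(rules)

rules = mine_rules(records)
for flaw, fix, support, confidence in rules:
    print(f"{flaw} implies {fix} (support={support:.2f}, confidence={confidence:.2f})")
```

Rules of the type "attack type implies flaw" would follow the same pattern with different record fields.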

2022-03-15
Li, Yang, Bai, Liyun, Zhang, Mingqi, Wang, Siyuan, Wu, Jing, Jiang, Hao.  2021.  Network Protocol Reverse Parsing Based on Bit Stream. 2021 8th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/2021 7th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom). :83–90.
The network security problem brought by cloud computing has become an important issue in information construction. Since anomaly detection and attack detection in a cloud environment need to find vulnerabilities through reverse analysis of data flows, reverse analysis of unknown network protocols is of great significance for security applications in cloud environments. To solve this problem, an improved method for mining association rules from bitstream protocols of unknown type and format is proposed. The method incorporates the location information of the protocol framework to make the frequent extraction process more concise and accurate. In addition, for the frame separation problem of unknown protocols, we design a hierarchical clustering algorithm based on Jaccard distance and a frame field delimitation method based on the proximity of information entropy between bytes. The experimental results show that this technique can correctly resolve the protocol format, supports anomaly detection in cloud computing, and helps ensure the security of cloud services.
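
The two building blocks named in this abstract, Jaccard distance between frames and entropy-based field delimitation, can be sketched roughly as below. This is an illustrative assumption, not the authors' algorithm: frames are treated as sets of byte values for the distance, a field boundary is placed wherever per-offset entropy jumps by more than a hypothetical threshold, and the clustering step itself is omitted.

```python
import math
from collections import Counter

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over the sets of byte values in two frames."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)

def offset_entropy(frames):
    """Shannon entropy (bits) of the byte value at each offset across frames."""
    length = min(len(f) for f in frames)
    total = len(frames)
    entropies = []
    for i in range(length):
        counts = Counter(f[i] for f in frames)
        entropies.append(-sum(c / total * math.log2(c / total) for c in counts.values()))
    return entropies

def field_boundaries(entropies, threshold=0.5):
    """Offsets where entropy differs from the previous offset by more than threshold."""
    return [i for i in range(1, len(entropies))
            if abs(entropies[i] - entropies[i - 1]) > threshold]

# Hypothetical frames: two constant header bytes followed by two varying bytes.
frames = [
    bytes([0x01, 0x02, 0x10, 0x00]),
    bytes([0x01, 0x02, 0x11, 0x01]),
    bytes([0x01, 0x02, 0x12, 0x02]),
    bytes([0x01, 0x02, 0x13, 0x03]),
]
print(offset_entropy(frames))
print(field_boundaries(offset_entropy(frames)))
print(jaccard_distance(frames[0], frames[1]))
```
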
2021-04-08
Yang, Z., Sun, Q., Zhang, Y., Zhu, L., Ji, W.  2020.  Inference of Suspicious Co-Visitation and Co-Rating Behaviors and Abnormality Forensics for Recommender Systems. IEEE Transactions on Information Forensics and Security. 15:2766–2781.
The pervasiveness of personalized collaborative recommender systems has demonstrated their powerful capability in a wide range of e-commerce services such as Amazon, TripAdvisor, and Yelp. However, fundamental vulnerabilities of collaborative recommender systems leave room for malicious users to affect the recommendation results as they desire. The vast majority of existing detection methods assume that certain properties of malicious attacks are given in advance. In reality, improving detection performance is usually constrained by several challenging issues: (a) various types of malicious attacks coexist, (b) representations of malicious attack behaviors are limited, and (c) practical evidence for exploring and spotting anomalies in real-world data is scarce. In this paper, we investigate a unified detection framework in an eye-for-an-eye manner without being bothered by the details of the attacks. First, co-visitation and co-rating graphs are constructed using association rules. Then, attribute representations of nodes are empirically developed from the perspectives of linkage pattern, structure-based property, and inherent association of nodes. Finally, both the attribute information and the connective coherence of the graph are combined in order to infer suspicious nodes. Extensive experiments on both synthetic and real-world data demonstrate the effectiveness of the proposed detection approach compared with competing benchmarks. Additionally, abnormality forensics metrics, including the distribution of rating intention, time aggregation of suspicious ratings, degree distributions before and after removing suspicious nodes, and time series analysis of historical ratings, are provided so as to discover interesting findings such as suspicious nodes (items or ratings) in real-world data.
2021-03-29
Mar, Z., Oo, K. K.  2020.  An Improvement of Apriori Mining Algorithm using Linked List Based Hash Table. 2020 International Conference on Advanced Information Technologies (ICAIT). :165–169.
Today, huge amounts of data are used in organizations around the world. This data needs to be processed so that useful information can be acquired. Consequently, a number of industry enterprises have discovered valuable information from shopper purchase data. In data mining, one of the most important algorithms for finding frequent item sets in a large database is the Apriori algorithm, which discovers knowledge using association rules. The Apriori algorithm wastes time scanning the whole database and searching for frequent item sets, and it is memory-inefficient when a large number of transactions is under consideration. An existing improved Apriori algorithm adds and calculates a third threshold, which may increase the overhead. Therefore, the proposed research uses an improved Apriori algorithm with a linked list and hash table to mine frequent item sets from a large transaction database. In this method, the database is scanned with the improved Apriori algorithm and the frequent 1-item sets are counted using the hash table. Then the next frequent item sets are saved in the linked list and the database is scanned again. The hash table is used to produce the frequent 2-item sets. Therefore, the database is scanned only two times, and less processing time and memory space are required.
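
The two-scan idea described in this abstract can be sketched as follows. This is a simplified illustration that uses plain Python dictionaries as the hash table and does not reproduce the authors' linked-list structure; the function name `frequent_pairs` and the toy transactions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def frequent_pairs(transactions, min_count):
    """Find frequent 2-item sets with two passes over the transactions."""
    # Scan 1: count every single item in a hash table.
    item_counts = defaultdict(int)
    for t in transactions:
        for item in set(t):
            item_counts[item] += 1
    frequent_items = {i for i, c in item_counts.items() if c >= min_count}

    # Scan 2: count only pairs built from frequent items (Apriori pruning).
    pair_counts = defaultdict(int)
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        for pair in combinations(kept, 2):
            pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= min_count}

# Hypothetical market-basket data.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
]
pairs = frequent_pairs(transactions, min_count=2)
print(pairs)
```

Because "jam" is infrequent after the first scan, no pair containing it is ever counted in the second scan, which is where the saving comes from.
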
2020-06-08
Sun, Wenhua, Wang, Xiaojuan, Jin, Lei.  2019.  An Efficient Hash-Tree-Based Algorithm in Mining Sequential Patterns with Topology Constraint. 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). :2782–2789.
Warnings happen a lot in real transmission networks. These warnings can affect people's lives. It is significant to analyze the alarm association rules in the network. Many algorithms can help solve this problem but do not consider the actual physical significance. Therefore, in this study, we mine the association rules in warning weblogs based on a sequential mining algorithm (GSP) with topology structure. We define a topology constraint from network physical connection data. Under the topology constraint, network nodes have a topology relation if they are directly connected or have a common adjacent node. In addition, due to the large amount of data, we implement the hash-tree search method to improve the mining efficiency. The theoretical solution is feasible and the simulation results verify our method. In simulation, the topology constraint improves the accuracy for 86%-96% and greatly decreases the run time at the same time. The hash-tree-based mining results show that the hash tree improves efficiency by 3-30% while the number of patterns remains unchanged. In conclusion, our method can mine association rules efficiently and accurately in warning weblogs.
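
The topology relation stated in this abstract (directly connected, or sharing a common adjacent node) can be expressed as a small predicate. The adjacency-set representation, the function name `topology_related`, and the toy network are assumptions made for illustration.

```python
def topology_related(adjacency, a, b):
    """True if a and b are directly connected or share at least one neighbor."""
    neighbors_a = adjacency.get(a, set())
    neighbors_b = adjacency.get(b, set())
    directly_connected = b in neighbors_a or a in neighbors_b
    common_neighbor = bool(neighbors_a & neighbors_b)
    return directly_connected or common_neighbor

# Hypothetical chain network: A - B - C - D (undirected).
adjacency = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}
print(topology_related(adjacency, "A", "B"))  # directly connected
print(topology_related(adjacency, "A", "C"))  # common neighbor B
print(topology_related(adjacency, "A", "D"))  # neither
```

A sequential miner could use such a predicate to discard candidate alarm patterns whose nodes are not topologically related.
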
2019-01-16
Peake, Georgina, Wang, Jun.  2018.  Explanation Mining: Post Hoc Interpretability of Latent Factor Models for Recommendation Systems. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. :2060–2069.
The widescale use of machine learning algorithms to drive decision-making has highlighted the critical importance of ensuring the interpretability of such models in order to engender trust in their output. The state-of-the-art recommendation systems use black-box latent factor models that provide no explanation of why a recommendation has been made, as they abstract their decision processes to a high-dimensional latent space which is beyond the direct comprehension of humans. We propose a novel approach for extracting explanations from latent factor recommendation systems by training association rules on the output of a matrix factorisation black-box model. By taking advantage of the interpretable structure of association rules, we demonstrate that predictive accuracy of the recommendation model can be maintained whilst yielding explanations with high fidelity to the black-box model on a unique industry dataset. Our approach mitigates the accuracy-interpretability trade-off whilst avoiding the need to sacrifice flexibility or use external data sources. We also contribute to the ill-defined problem of evaluating interpretability.
2017-02-27
Li-xiong, Z., Xiao-lin, X., Jia, L., Lu, Z., Xuan-chen, P., Zhi-yuan, M., Li-hong, Z.  2015.  Malicious URL prediction based on community detection. 2015 International Conference on Cyber Security of Smart Cities, Industrial Control System and Communications (SSIC). :1–7.

Traditional anti-virus technology is primarily based on static analysis and dynamic monitoring. However, both technologies depend heavily on application files, which increases the risk of being attacked and wastes time and network bandwidth. In this study, we propose a new graph-based method through which we can preliminarily detect malicious URLs without application files. First, the relationship between URLs is found through the relationship between people and URLs, and association rules are mined with a confidence for each frequent URL. Second, a network of URLs is built from the association rules. Once the network of URLs is complete, we cluster the data using modularity to detect communities, where each community represents a different type of URL. We suppose that if a URL is associated with one community, then the URL is probably malicious. In our experiments, we successfully captured 82% of malicious samples, a higher capture rate than with traditional methods.

2017-02-23
A. Soliman, L. Bahri, B. Carminati, E. Ferrari, S. Girdzijauskas.  2015.  "DIVa: Decentralized identity validation for social networks". 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). :383-391.

Online Social Networks exploit a lightweight process to identify their users so as to facilitate their fast adoption. However, such convenience comes at the price of making legitimate users subject to different threats created by fake accounts. Therefore, there is a crucial need to empower users with tools that help them assign a level of trust to whomever they interact with. To cope with this issue, in this paper we introduce a novel model, DIVa, that leverages mining techniques to find correlations among user profile attributes. These correlations are discovered not from the user population as a whole, but from individual communities, where the correlations are more pronounced. DIVa exploits a decentralized learning approach and ensures privacy preservation, as each node in the OSN independently processes its local data and is required to know only its direct neighbors. Extensive experiments using real-world OSN datasets show that DIVa is able to extract fine-grained community-aware correlations among profile attributes, with average improvements of up to 50% over the global approach.

2017-02-14
F. Quader, V. Janeja, J. Stauffer.  2015.  "Persistent threat pattern discovery". 2015 IEEE International Conference on Intelligence and Security Informatics (ISI). :179-181.

Advanced Persistent Threat (APT) is a complex (Advanced) cyber-attack (Threat) against specific targets over long periods of time (Persistent) carried out by nation states or terrorist groups with highly sophisticated levels of expertise to establish entries into organizations, which are critical to a country's socio-economic status. The key identifier in such persistent threats is that patterns are long term, could be high priority, and occur consistently over a period of time. This paper focuses on identifying persistent threat patterns in network data, particularly data collected from Intrusion Detection Systems. We utilize Association Rule Mining (ARM) to detect persistent threat patterns on network data. We identify potential persistent threat patterns, which are frequent but at the same time unusual as compared with the other frequent patterns.

2015-05-05
Lomotey, R.K., Deters, R.  2014.  Terms Mining in Document-Based NoSQL: Response to Unstructured Data. Big Data (BigData Congress), 2014 IEEE International Congress on. :661-668.

Unstructured data mining has become topical recently due to the availability of high-dimensional and voluminous digital content (known as "Big Data") across the enterprise spectrum. Relational Database Management Systems (RDBMS) have been employed over the past decades for content storage and management, but the ever-growing heterogeneity in today's data calls for a new storage approach. Thus, the NoSQL database has emerged as the preferred storage facility, since it supports unstructured data storage. This creates the need to explore efficient data mining techniques for such NoSQL systems, since the available tools and frameworks designed for RDBMS are often not directly applicable. In this paper, we focus on topic and term mining, based on clustering, in document-based NoSQL. This is achieved by adapting the architectural design of an analytics-as-a-service framework and proposing the Viterbi algorithm to enhance the accuracy of term classification in the system. The results from the pilot testing of our work show higher accuracy in comparison to some previously proposed techniques such as the parallel search.

Haoliang Lou, Yunlong Ma, Feng Zhang, Min Liu, Weiming Shen.  2014.  Data mining for privacy preserving association rules based on improved MASK algorithm. Computer Supported Cooperative Work in Design (CSCWD), Proceedings of the 2014 IEEE 18th International Conference on. :265-270.

With the arrival of the big data era, information privacy and security issues have become even more crucial. The Mining Associations with Secrecy Konstraints (MASK) algorithm and its improved versions were proposed as data mining approaches for privacy-preserving association rules. The MASK algorithm adopts only a data perturbation strategy, which leads to a low degree of privacy preservation. Moreover, it is difficult to apply the MASK algorithm in practice because of its long execution time. This paper proposes a new algorithm based on data perturbation and query restriction (DPQR) to improve the degree of privacy preservation through multi-parameter perturbation. To improve time efficiency, the calculation of the inverse matrix is simplified by dividing the matrix into blocks; in addition, a further optimization based on set theory reduces the number of database scans. Both theoretical analyses and experimental results show that the proposed DPQR algorithm has better performance.
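
To make the data perturbation idea concrete, here is a minimal sketch for a single item, assuming the bit-flipping distortion usually associated with MASK: each bit is kept with probability p and flipped with probability 1 - p. The true support is then estimated by inverting the expected distortion. It does not implement DPQR, its multi-parameter perturbation, or the block-wise matrix inversion; the function names are hypothetical.

```python
import random

def perturb(bits, p, rng):
    """Keep each 0/1 bit with probability p, flip it with probability 1 - p."""
    return [b if rng.random() < p else 1 - b for b in bits]

def estimate_support(perturbed_bits, p):
    """Estimate the true fraction of 1s from perturbed data.

    Expected observed fraction: c' = p*c + (1 - p)*(1 - c),
    hence c = (c' - (1 - p)) / (2p - 1). Undefined for p = 0.5.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 destroys all information; support cannot be recovered")
    observed = sum(perturbed_bits) / len(perturbed_bits)
    return (observed - (1 - p)) / (2 * p - 1)

# Hypothetical column for one item: 30% of 20,000 transactions contain it.
rng = random.Random(0)
true_bits = [1] * 6000 + [0] * 14000
noisy_bits = perturb(true_bits, p=0.9, rng=rng)
print(estimate_support(noisy_bits, p=0.9))
```

For itemsets of more than one item, the same inversion generalizes to a matrix equation, which is where the cost of computing an inverse matrix arises.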
 
