Visible to the public Biblio

Filters: Keyword is Itemsets  [Clear All Filters]
2022-08-12
Ji, Yi, Ohsawa, Yukio.  2021.  Mining Frequent and Rare Itemsets With Weighted Supports Using Additive Neural Itemset Embedding. 2021 International Joint Conference on Neural Networks (IJCNN). :1–8.
Over the past two decades, itemset mining techniques have become an integral part of pattern mining in large databases. We present a novel system for mining frequent and rare itemsets simultaneously with supports weighted by cardinality in transactional datasets. Based on our neural item embedding with additive compositionality, the original mining problems are approximately reduced to polynomial-time convex optimization, namely a series of vector subset selection problems in Euclidean space. The numbers of transactions and items are no longer exponential factors of the time complexity under such reduction, except only the Euclidean space dimension, which can be assigned arbitrarily for a trade-off between mining speed and result quality. The efficacy of our method reveals that additive compositionality can be represented by linear translation in the itemset vector space, which resembles the linguistic regularities in word embedding by similar neural modeling. Experiments show that our learned embedding can bring pattern itemsets with higher accuracy than sampling-based lossy mining techniques in most cases, and the scalability of our mining approach triumphs over several state-of-the-art distributed mining algorithms.
2021-05-05
Hossain, Md. Turab, Hossain, Md. Shohrab, Narman, Husnu S..  2020.  Detection of Undesired Events on Real-World SCADA Power System through Process Monitoring. 2020 11th IEEE Annual Ubiquitous Computing, Electronics Mobile Communication Conference (UEMCON). :0779—0785.
A Supervisory Control and Data Acquisition (SCADA) system used in controlling or monitoring purpose in industrial process automation system is the process of collecting data from instruments and sensors located at remote sites and transmitting data at a central site. Most of the existing works on SCADA system focused on simulation-based study which cannot always mimic the real world situations. We propose a novel methodology that analyzes SCADA logs on offline basis and helps to detect process-related threats. This threat takes place when an attacker performs malicious actions after gaining user access. We conduct our experiments on a real-life SCADA system of a Power transmission utility. Our proposed methodology will automate the analysis of SCADA logs and systemically identify undesired events. Moreover, it will help to analyse process-related threats caused by user activity. Several test study suggest that our approach is powerful in detecting undesired events that might caused by possible malicious occurrence.
2021-03-29
Mar, Z., Oo, K. K..  2020.  An Improvement of Apriori Mining Algorithm using Linked List Based Hash Table. 2020 International Conference on Advanced Information Technologies (ICAIT). :165–169.
Today, the huge amount of data was using in organizations around the world. This huge amount of data needs to process so that we can acquire useful information. Consequently, a number of industry enterprises discovered great information from shopper purchases found in any respect times. In data mining, the most important algorithms for find frequent item sets from large database is Apriori algorithm and discover the knowledge using the association rule. Apriori algorithm was wasted times for scanning the whole database and searching the frequent item sets and inefficient of memory requirement when large numbers of transactions are in consideration. The improved Apriori algorithm is adding and calculating third threshold may increase the overhead. So, in the aims of proposed research, Improved Apriori algorithm with LinkedList and hash tabled is used to mine frequent item sets from the transaction large amount of database. This method includes database is scanning with Improved Apriori algorithm and frequent 1-item sets counts with using the hash table. Then, in the linked list saved the next frequent item sets and scanning the database. The hash table used to produce the frequent 2-item sets Therefore, the database scans the only two times and necessary less processing time and memory space.
2021-02-16
Wu, J. M.-T., Srivastava, G., Pirouz, M., Lin, J. C.-W..  2020.  A GA-based Data Sanitization for Hiding Sensitive Information with Multi-Thresholds Constraint. 2020 International Conference on Pervasive Artificial Intelligence (ICPAI). :29—34.
In this work, we propose a new concept of multiple support thresholds to sanitize the database for specific sensitive itemsets. The proposed method assigns a stricter threshold to the sensitive itemset for data sanitization. Furthermore, a genetic-algorithm (GA)-based model is involved in the designed algorithm to minimize side effects. In our experimental results, the GA-based PPDM approach is compared with traditional compact GA-based model and results clearly showed that our proposed method can obtain better performance with less computational cost.
2019-12-16
Lin, Jerry Chun-Wei, Zhang, Yuyu, Chen, Chun-Hao, Wu, Jimmy Ming-Tai, Chen, Chien-Ming, Hong, Tzung-Pei.  2018.  A Multiple Objective PSO-Based Approach for Data Sanitization. 2018 Conference on Technologies and Applications of Artificial Intelligence (TAAI). :148–151.
In this paper, a multi-objective particle swarm optimization (MOPSO)-based framework is presented to find the multiple solutions rather than a single one. The presented grid-based algorithm is used to assign the probability of the non-dominated solution for next iteration. Based on the designed algorithm, it is unnecessary to pre-define the weights of the side effects for evaluation but the non-dominated solutions can be discovered as an alternative way for data sanitization. Extensive experiments are carried on two datasets to show that the designed grid-based algorithm achieves good performance than the traditional single-objective evolution algorithms.
Wu, Jimmy Ming-Tai, Chun-Wei Lin, Jerry, Djenouri, Youcef, Fournier-Viger, Philippe, Zhang, Yuyu.  2019.  A Swarm-based Data Sanitization Algorithm in Privacy-Preserving Data Mining. 2019 IEEE Congress on Evolutionary Computation (CEC). :1461–1467.
In recent decades, data protection (PPDM), which not only hides information, but also provides information that is useful to make decisions, has become a critical concern. We present a sanitization algorithm with the consideration of four side effects based on multi-objective PSO and hierarchical clustering methods to find optimized solutions for PPDM. Experiments showed that compared to existing approaches, the designed sanitization algorithm based on the hierarchical clustering method achieves satisfactory performance in terms of hiding failure, missing cost, and artificial cost.
2019-03-06
Leung, C. K., Hoi, C. S. H., Pazdor, A. G. M., Wodi, B. H., Cuzzocrea, A..  2018.  Privacy-Preserving Frequent Pattern Mining from Big Uncertain Data. 2018 IEEE International Conference on Big Data (Big Data). :5101-5110.
As we are living in the era of big data, high volumes of wide varieties of data which may be of different veracity (e.g., precise data, imprecise and uncertain data) are easily generated or collected at a high velocity in many real-life applications. Embedded in these big data is valuable knowledge and useful information, which can be discovered by big data science solutions. As a popular data science task, frequent pattern mining aims to discover implicit, previously unknown and potentially useful information and valuable knowledge in terms of sets of frequently co-occurring merchandise items and/or events. Many of the existing frequent pattern mining algorithms use a transaction-centric mining approach to find frequent patterns from precise data. However, there are situations in which an item-centric mining approach is more appropriate, and there are also situations in which data are imprecise and uncertain. Hence, in this paper, we present an item-centric algorithm for mining frequent patterns from big uncertain data. In recent years, big data have been gaining the attention from the research community as driven by relevant technological innovations (e.g., clouds) and novel paradigms (e.g., social networks). As big data are typically published online to support knowledge management and fruition processes, these big data are usually handled by multiple owners with possible secure multi-part computation issues. Thus, privacy and security of big data has become a fundamental problem in this research context. In this paper, we present, not only an item-centric algorithm for mining frequent patterns from big uncertain data, but also a privacy-preserving algorithm. In other words, we present- in this paper-a privacy-preserving item-centric algorithm for mining frequent patterns from big uncertain data. Results of our analytical and empirical evaluation show the effectiveness of our algorithm in mining frequent patterns from big uncertain data in a privacy-preserving manner.
2017-02-23
G. DAngelo, S. Rampone, F. Palmieri.  2015.  "An Artificial Intelligence-Based Trust Model for Pervasive Computing". 2015 10th International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC). :701-706.

Pervasive Computing is one of the latest and more advanced paradigms currently available in the computers arena. Its ability to provide the distribution of computational services within environments where people live, work or socialize leads to make issues such as privacy, trust and identity more challenging compared to traditional computing environments. In this work we review these general issues and propose a Pervasive Computing architecture based on a simple but effective trust model that is better able to cope with them. The proposed architecture combines some Artificial Intelligence techniques to achieve close resemblance with human-like decision making. Accordingly, Apriori algorithm is first used in order to extract the behavioral patterns adopted from the users during their network interactions. Naïve Bayes classifier is then used for final decision making expressed in term of probability of user trustworthiness. To validate our approach we applied it to some typical ubiquitous computing scenarios. The obtained results demonstrated the usefulness of such approach and the competitiveness against other existing ones.

2017-02-14
K. F. Hong, C. C. Chen, Y. T. Chiu, K. S. Chou.  2015.  "Ctracer: Uncover C amp;amp;C in Advanced Persistent Threats Based on Scalable Framework for Enterprise Log Data". 2015 IEEE International Congress on Big Data. :551-558.

Advanced Persistent Threat (APT), unlike traditional hacking attempts, carries out specific attacks on a specific target to illegally collect information and data from it. These targeted attacks use special-crafted malware and infrequent activity to avoid detection, so that hackers can retain control over target systems unnoticed for long periods of time. In order to detect these stealthy activities, a large-volume of traffic data generated in a period of time has to be analyzed. We proposed a scalable solution, Ctracer to detect stealthy command and control channel in a large-volume of traffic data. APT uses multiple command and control (C&C) channel and change them frequently to avoid detection, but there are common signatures in those C&C sessions. By identifying common network signature, Ctracer is able to group the C&C sessions. Therefore, we can detect an APT and all the C&C session used in an APT attack. The Ctracer is evaluated in a large enterprise for four months, twenty C&C servers, three APT attacks are reported. After investigated by the enterprise's Security Operations Center (SOC), the forensic report shows that there is specific enterprise targeted APT cases and not ever discovered for over 120 days.

2015-05-05
Haoliang Lou, Yunlong Ma, Feng Zhang, Min Liu, Weiming Shen.  2014.  Data mining for privacy preserving association rules based on improved MASK algorithm. Computer Supported Cooperative Work in Design (CSCWD), Proceedings of the 2014 IEEE 18th International Conference on. :265-270.

With the arrival of the big data era, information privacy and security issues become even more crucial. The Mining Associations with Secrecy Konstraints (MASK) algorithm and its improved versions were proposed as data mining approaches for privacy preserving association rules. The MASK algorithm only adopts a data perturbation strategy, which leads to a low privacy-preserving degree. Moreover, it is difficult to apply the MASK algorithm into practices because of its long execution time. This paper proposes a new algorithm based on data perturbation and query restriction (DPQR) to improve the privacy-preserving degree by multi-parameters perturbation. In order to improve the time-efficiency, the calculation to obtain an inverse matrix is simplified by dividing the matrix into blocks; meanwhile, a further optimization is provided to reduce the number of scanning database by set theory. Both theoretical analyses and experiment results prove that the proposed DPQR algorithm has better performance.
 

Haoliang Lou, Yunlong Ma, Feng Zhang, Min Liu, Weiming Shen.  2014.  Data mining for privacy preserving association rules based on improved MASK algorithm. Computer Supported Cooperative Work in Design (CSCWD), Proceedings of the 2014 IEEE 18th International Conference on. :265-270.

With the arrival of the big data era, information privacy and security issues become even more crucial. The Mining Associations with Secrecy Konstraints (MASK) algorithm and its improved versions were proposed as data mining approaches for privacy preserving association rules. The MASK algorithm only adopts a data perturbation strategy, which leads to a low privacy-preserving degree. Moreover, it is difficult to apply the MASK algorithm into practices because of its long execution time. This paper proposes a new algorithm based on data perturbation and query restriction (DPQR) to improve the privacy-preserving degree by multi-parameters perturbation. In order to improve the time-efficiency, the calculation to obtain an inverse matrix is simplified by dividing the matrix into blocks; meanwhile, a further optimization is provided to reduce the number of scanning database by set theory. Both theoretical analyses and experiment results prove that the proposed DPQR algorithm has better performance.