Biblio
With the arrival of the big data era, information privacy and security issues have become even more crucial. The Mining Associations with Secrecy Konstraints (MASK) algorithm and its improved versions were proposed as data mining approaches for privacy-preserving association rule mining. The MASK algorithm adopts only a data perturbation strategy, which leads to a low degree of privacy preservation. Moreover, its long execution time makes the MASK algorithm difficult to apply in practice. This paper proposes a new algorithm based on data perturbation and query restriction (DPQR) that improves the degree of privacy preservation through multi-parameter perturbation. To improve time efficiency, the calculation of the inverse matrix is simplified by dividing the matrix into blocks; in addition, a further optimization based on set theory reduces the number of database scans. Both theoretical analysis and experimental results show that the proposed DPQR algorithm performs better.
Multiple string matching plays a fundamental role in network intrusion detection systems. Automata-based multiple string matching algorithms such as AC, SBDM and SBOM are widely used in practice, but the huge memory usage of the automata prevents them from being applied to large-scale pattern sets. Moreover, the poor cache locality of huge automata degrades matching speed. Here we propose BVM, a space-efficient multiple string matching algorithm that uses a bit-vector and a succinct hash table to replace the automata used in factor-searching-based algorithms. The space complexity of the proposed algorithm is O(rm² + Σ_{p∈P} |p|), which is more space-efficient than the classic automata-based algorithms. Experiments on datasets including Snort, ClamAV, a URL blacklist and synthetic rules show that the proposed algorithm significantly reduces memory usage while still matching at high speed. Notably, BVM uses less than 0.75% of the memory of AC and is capable of matching millions of patterns efficiently.