Biblio
The most of the organizations tend to accumulate the data related to security, which goes up-to terabytes in every month. They collect this data to meet the security requirements. The data is mostly in the shape of logs like Dns logs, Pcap files, and Firewall data etc. The data can be related to any communication network like cloud, telecom, or smart grid network. Generally, these logs are stored in databases or warehouses which becomes ultimately gigantic in size. Such a huge size of data upsurge the importance of security analytics in big data. In surveys, the security experts grumble about the existing tools and recommend for special tools and methods for big data security analysis. In this paper, we are using a big data analysis tool, which is known as apache spark. Although this tool is used for general purpose but we have used this for security analysis. It offers a very good library for machine learning algorithms including the clustering which is the main algorithm used in our work. In this work, we have developed a novel model, which combines rule based and clustering analysis for security analysis of big dataset. The dataset we are using in our experiment is the Kddcup99 which is a widely used dataset for intrusion detection. It is of MBs in size but can be used as a test case for big data security analysis.