Biblio
Community structure detection in social networks has become a big challenge. Various methods in the literature have been presented to solve this challenge. Recently, several methods have also been proposed to solve this challenge based on a mapping-reduction model, in which data and algorithms are divided between different process nodes so that the complexity of time and memory of community detection in large social networks is reduced. In this paper, a mapping-reduction model is first proposed to detect the structure of communities. Then the proposed framework is rewritten according to a new mechanism called distributed cache memory; distributed cache memory can store different values associated with different keys and, if necessary, put them at different computational nodes. Finally, the proposed rewritten framework has been implemented using SPARK tools and its implementation results have been reported on several major social networks. The performed experiments show the effectiveness of the proposed framework by varying the values of various parameters.
Peer-to-Peer botnets have become one of the significant threat against network security due to their distributed properties. The decentralized nature makes their detection challenging. It is important to take measures to detect bots as soon as possible to minimize their harm. In this paper, we propose PeerGrep, a novel system capable of identifying P2P bots. PeerGrep starts from identifying hosts that are likely engaged in P2P communications, and then distinguishes P2P bots from P2P hosts by analyzing their active ratio, packet size and the periodicity of connection to destination IP addresses. The evaluation shows that PeerGrep can identify all P2P bots with quite low FPR even if the malicious P2P application and benign P2P application coexist within the same host or there is only one bot in the monitored network.
This research proposes a system for detecting known and unknown Distributed Denial of Service (DDoS) Attacks. The proposed system applies two different intrusion detection approaches anomaly-based distributed artificial neural networks(ANNs) and signature-based approach. The Amazon public cloud was used for running Spark as the fast cluster engine with varying cores of machines. The experiment results achieved the highest detection accuracy and detection rate comparing to signature based or neural networks-based approach.
Data intensive computing research and technology developments offer the potential of providing significant improvements in several security log management challenges. Approaches to address the complexity, timeliness, expense, diversity, and noise issues have been identified. These improvements are motivated by the increasingly important role of analytics. Machine learning and expert systems that incorporate attack patterns are providing greater detection insights. Finding actionable indicators requires the analysis to combine security event log data with other network data such and access control lists, making the big-data problem even bigger. Automation of threat intelligence is recognized as not complete with limited adoption of standards. With limited progress in anomaly signature detection, movement towards using expert systems has been identified as the path forward. Techniques focus on matching behaviors of attackers to patterns of abnormal activity in the network. The need to stream, parse, and analyze large volumes of small, semi-structured data files can be feasibly addressed through a variety of techniques identified by researchers. This report highlights research in key areas, including protection of the data, performance of the systems and network bandwidth utilization.
Enormous amount of educational data has been accumulated through Massive Open Online Courses (MOOCs), as well as commercial and non-commercial learning platforms. This is in addition to the educational data released by US government since 2012 to facilitate disruption in education by making data freely available. The high volume, variety and velocity of collected data necessitate use of big data tools and storage systems such as distributed databases for storage and Apache Spark for analysis. This tutorial will introduce researchers and faculty to real-world applications involving data mining and predictive analytics in learning sciences. In addition, the tutorial will introduce statistics required to validate and accurately report results. Topics will cover how big data is being used to transform education. Specifically, we will demonstrate how exploratory data analysis, data mining, predictive analytics, machine learning, and visualization techniques are being applied to educational big data to improve learning and scale insights driven from millions of student's records. The tutorial will be held over a half day and will be hands on with pre-posted material. Due to the interdisciplinary nature of work, the tutorial appeals to researchers from a wide range of backgrounds including big data, predictive analytics, learning sciences, educational data mining, and in general, those interested in how big data analytics can transform learning. As a prerequisite, attendees are required to have familiarity with at least one programming language.
Network traffic is a rich source of information for security monitoring. However the increasing volume of data to treat raises issues, rendering holistic analysis of network traffic difficult. In this paper we propose a solution to cope with the tremendous amount of data to analyse for security monitoring perspectives. We introduce an architecture dedicated to security monitoring of local enterprise networks. The application domain of such a system is mainly network intrusion detection and prevention, but can be used as well for forensic analysis. This architecture integrates two systems, one dedicated to scalable distributed data storage and management and the other dedicated to data exploitation. DNS data, NetFlow records, HTTP traffic and honeypot data are mined and correlated in a distributed system that leverages state of the art big data solution. Data correlation schemes are proposed and their performance are evaluated against several well-known big data framework including Hadoop and Spark.