Biblio
The finite-state machine (FSM) is widely used as the control unit in most digital designs. Many intellectual property protection and obfuscation techniques leverage the exponential number of possible states and state transitions of a large FSM to secure a physical design, on the assumption that it is challenging to retrieve the FSM from a downstream design or physical implementation without knowledge of the original design. In this paper, we postulate that this assumption may not hold against big data analytics. We demonstrate this by applying a data mining technique to a sufficiently large amount of data collected from a full scan design in order to identify its FSM state registers. An impact metric is introduced to discriminate FSM state registers from other registers. A decision tree algorithm is constructed from the scan data for regression analysis of the dependency of other registers on a chosen register, from which its impact is deduced. Registers with greater impact are more likely to be FSM state registers. The proposed scheme is applied to several complex designs from OpenCores. The experimental results show that our scheme correctly identifies most FSM state registers with a high hit rate for a large majority of the designs.
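A minimal sketch of the impact-metric idea described above: for each candidate register, measure how well its current value predicts the next-cycle values of the other registers in the scan data, and rank registers by the aggregate score. The use of scikit-learn decision trees, the depth limit, and the scoring rule are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def impact_scores(scan_data):
    """scan_data: (cycles, registers) 0/1 matrix captured from full-scan dumps."""
    current, nxt = scan_data[:-1], scan_data[1:]
    n_regs = scan_data.shape[1]
    scores = np.zeros(n_regs)
    for r in range(n_regs):
        x = current[:, [r]]                  # chosen register as the sole predictor
        impact = 0.0
        for other in range(n_regs):
            if other == r or len(np.unique(nxt[:, other])) < 2:
                continue                     # skip constant registers
            tree = DecisionTreeClassifier(max_depth=3).fit(x, nxt[:, other])
            impact += tree.score(x, nxt[:, other])   # stronger dependency -> higher score
        scores[r] = impact
    return scores                            # top-ranked registers are FSM candidates
```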
Severe class imbalance between the majority and minority classes in large datasets can prejudice Machine Learning classifiers toward the majority class. Our work uniquely consolidates two case studies, each utilizing three learners implemented within an Apache Spark framework, six sampling methods, and five sampling distribution ratios to analyze the effect of severe class imbalance on big data analytics. We use three performance metrics to evaluate this study: Area Under the Receiver Operating Characteristic Curve, Area Under the Precision-Recall Curve, and Geometric Mean. In the first case study, models were trained on one dataset (POST) and tested on another (SlowlorisBig). In the second case study, the training and testing dataset roles were switched. Our comparison of performance metrics shows that Area Under the Precision-Recall Curve and Geometric Mean are sensitive to changes in the sampling distribution ratio, whereas Area Under the Receiver Operating Characteristic Curve is relatively unaffected. In addition, we demonstrate that when comparing sampling methods, borderline-SMOTE2 outperforms the other methods in the first case study, and Random Undersampling is the top performer in the second case study.
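A minimal sketch of one (sampling method, distribution ratio) cell from the study grid: borderline-SMOTE2 applied to the training split at an assumed 1:4 minority-to-majority ratio, scored with AUROC, AUPRC, and Geometric Mean. The synthetic data and the scikit-learn/imbalanced-learn stack stand in for the Spark-based setup used in the case studies.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from imblearn.over_sampling import BorderlineSMOTE
from imblearn.metrics import geometric_mean_score

# Severely imbalanced train/test splits (about 0.5% positives).
X_train, y_train = make_classification(n_samples=20000, weights=[0.995], random_state=0)
X_test, y_test = make_classification(n_samples=5000, weights=[0.995], random_state=1)

# Resample the training split only; the test split keeps its natural imbalance.
sampler = BorderlineSMOTE(kind="borderline-2", sampling_strategy=0.25, random_state=0)
X_bal, y_bal = sampler.fit_resample(X_train, y_train)

model = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
scores = model.predict_proba(X_test)[:, 1]

print("AUROC :", roc_auc_score(y_test, scores))
print("AUPRC :", average_precision_score(y_test, scores))
print("G-Mean:", geometric_mean_score(y_test, model.predict(X_test)))
```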
Privacy preservation has become a predominant concern in big data analytics and cloud computing. Every organization collects personal data from users, actively or passively. Publishing this data for research and other analytics without removing Personally Identifiable Information (PII) leads to privacy breaches. Existing anonymization techniques fail to maintain the balance between data privacy and data utility. To provide a trade-off between user privacy and data utility, a Mondrian-based k-anonymity approach is proposed. To protect the privacy of high-dimensional data, a Deep Neural Network (DNN) based framework is proposed. The experimental results show that the proposed approach mitigates the information loss of the data without compromising privacy.
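A minimal sketch of Mondrian-style k-anonymity over numeric quasi-identifiers: recursively split on the widest-range attribute at its median until a split would violate k, then generalize each partition's quasi-identifiers to their ranges. The column names and toy records are assumptions, and the paper's DNN-based utility framework is out of scope here.

```python
import pandas as pd

def mondrian(df, qi_cols, k):
    partitions, stack = [], [df]
    while stack:
        part = stack.pop()
        # Pick the quasi-identifier with the widest value range in this partition.
        spans = {c: part[c].max() - part[c].min() for c in qi_cols}
        col = max(spans, key=spans.get)
        median = part[col].median()
        left, right = part[part[col] <= median], part[part[col] > median]
        if spans[col] > 0 and len(left) >= k and len(right) >= k:
            stack += [left, right]           # keep splitting while both halves stay >= k
        else:
            partitions.append(part)          # cannot split further without breaking k
    out = []
    for part in partitions:
        gen = part.copy()
        for c in qi_cols:                    # generalize each QI to its range in the group
            gen[c] = f"{part[c].min()}-{part[c].max()}"
        out.append(gen)
    return pd.concat(out)

# Example: 3-anonymity over age and zip code (hypothetical data).
data = pd.DataFrame({"age": [23, 25, 31, 36, 41, 43, 52, 57, 60],
                     "zip": [1001, 1002, 1010, 1030, 1032, 1050, 1100, 1120, 1130],
                     "diagnosis": list("ABABABABA")})
print(mondrian(data, ["age", "zip"], k=3))
```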
Phishing is typically deployed as an attack vector in the initial stages of a hacking endeavour. Due to its low-risk, high-reward nature it has seen widespread adoption, and detecting it has become a challenge in recent times. This paper proposes a novel means of detecting phishing websites using a Generative Adversarial Network. Taking into account the internal structure and external metadata of a website, the proposed approach uses a generator network that generates both legitimate and synthetic phishing features to train a discriminator network. The discriminator then determines whether the features belong to normal or phishing websites, and improves its detection accuracy based on the classification error. The proposed approach is evaluated on two different phishing datasets and achieves a detection accuracy of up to 94%.
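A minimal sketch of the adversarial training loop for phishing-feature detection: a generator produces synthetic phishing-like feature vectors and a discriminator learns to separate them from real website features. The feature dimensionality, network sizes, and placeholder training data are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

FEATURES, NOISE = 30, 16   # e.g. 30 URL/HTML/metadata features per website (assumed)

generator = nn.Sequential(nn.Linear(NOISE, 64), nn.ReLU(), nn.Linear(64, FEATURES))
discriminator = nn.Sequential(nn.Linear(FEATURES, 64), nn.ReLU(),
                              nn.Linear(64, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

real_batch = torch.rand(128, FEATURES)          # placeholder for real website features

for step in range(200):
    # Discriminator: real features -> 1, generated phishing-like features -> 0.
    fake_batch = generator(torch.randn(128, NOISE)).detach()
    d_loss = bce(discriminator(real_batch), torch.ones(128, 1)) + \
             bce(discriminator(fake_batch), torch.zeros(128, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator label generated features as real.
    fake_batch = generator(torch.randn(128, NOISE))
    g_loss = bce(discriminator(fake_batch), torch.ones(128, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```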
In enterprise environments, the number of managed assets and exploitable vulnerabilities is staggering. Hackers' lateral movements between such assets generate a complex big data graph that contains potential hacking paths. In this vision paper, we enumerate risk-reduction security requirements in large-scale environments and then present the Agile Security methodology and technologies for detection, modeling, and continuous prioritization of security requirements, agile style. Agile Security models different types of security requirements in the context of an attack graph that captures business process targets, critical asset identification, configuration items, and the possible impacts of cyber-attacks. By simulating and analyzing virtual adversary attack paths toward cardinal assets, Agile Security examines the impact on business processes and prioritizes surgical requirements. Handling this constantly re-evaluated requirements backlog gradually hardens the system, reduces business risk, and tells the IT service desk or Security Operations Center which remediation action to perform next. Once a requirement is remediated, Agile Security recomputes the residual risk, weighing risk increases from threat intelligence or infrastructure changes against the defender's remediation actions, in order to drive overall attack surface reduction.
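A minimal sketch of the attack-path analysis idea: nodes are assets or configuration items, edges are possible lateral movements with an assumed exploitability score, and remediations are prioritized by how much aggregate path risk toward a critical asset they remove. Asset names and scores are hypothetical, and this is not the Agile Security engine itself.

```python
import networkx as nx

g = nx.DiGraph()
g.add_edge("internet", "web_server", p=0.6)
g.add_edge("web_server", "app_server", p=0.5)
g.add_edge("internet", "vpn_gateway", p=0.3)
g.add_edge("vpn_gateway", "app_server", p=0.4)
g.add_edge("app_server", "customer_db", p=0.7)   # critical business asset

def path_risk(path):
    risk = 1.0
    for u, v in zip(path, path[1:]):
        risk *= g[u][v]["p"]                      # chain of exploit likelihoods
    return risk

# Aggregate, per lateral-movement step, the risk of every attack path it enables.
edge_risk = {}
for path in nx.all_simple_paths(g, "internet", "customer_db"):
    r = path_risk(path)
    for edge in zip(path, path[1:]):
        edge_risk[edge] = edge_risk.get(edge, 0.0) + r

# Remediate the step carrying the most aggregate attack-path risk first.
for edge, risk in sorted(edge_risk.items(), key=lambda kv: -kv[1]):
    print(edge, round(risk, 3))
```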
The technological development of the energy sector has also produced complex data. In this study, the relationship between the smart grid and big data approaches is investigated. After analyzing which areas of the smart grid system use big data technologies, attention is given to big data techniques for detecting smart grid attacks. Big data analytics can produce efficient solutions, and the choice of algorithms and metrics is especially important. For this reason, an application prototype is proposed that uses a big data method to detect attacks on the smart grid. The measured accuracy was 92% for random forests and 87% for decision trees.
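A minimal sketch of the comparison behind those figures: train a random forest and a decision tree on labeled grid measurement records and compare accuracy. The synthetic data stands in for the real smart-grid dataset, so the printed numbers will not match the reported 92% and 87%.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder for attack/normal labeled smart-grid measurements.
X, y = make_classification(n_samples=10000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("random forest", RandomForestClassifier(n_estimators=100)),
                    ("decision tree", DecisionTreeClassifier())]:
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```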
Protecting sensitive business and personal information is a cornerstone requirement when enterprises and organizations move to the cloud. Many aspects of this requirement are already handled at various levels. Data-at-rest can be secured in cloud stores by encrypting it before persisting it to storage, while data-in-flight is transmitted over protected channels such as TLS and HTTPS. Data-in-use, processed in cloud compute nodes, is the most vulnerable link in the end-to-end information flow, since process memory can be accessed by malicious privileged software or system administrators. IBM Research - Haifa takes part in the European H2020 research project RestAssured [2], which aims to deliver end-to-end cloud architectures and methodologies for assuring secure data processing in the cloud. We build a trusted analytic platform based on a combination of hardware and software components, and collaborate with the RestAssured partners to implement cloud analytic use cases ranging from social care services to pay-as-you-drive insurance policies. The platform uses the Intel SGX (Software Guard Extensions) technology [4], available in Skylake and later processors, which makes it possible to create memory regions (enclaves) protected with hardware encryption in the SoC (system on chip). The data resides unencrypted only inside the processor; it is encrypted in the SoC before being written to main memory and decrypted in the SoC upon being fetched from main memory. Our team has designed and developed a framework for trust management in SGX enclaves [3] that performs verification (remote attestation) of the enclave hardware and software components, and assists in the trusted delivery of secrets (such as data encryption keys) to the enclaves. Apache Spark SQL [1] is the analytic engine of the RestAssured platform. We use the Opaque [6] open source technology [5] from the Berkeley RISELab, which integrates Spark SQL with Intel SGX hardware and offers data protection by running SQL transformations inside trusted enclaves. We have augmented Opaque with several key mechanisms for secure data processing in SGX enclaves, integrating it with our trust management framework to enable remote attestation and data encryption key delivery to Opaque enclaves. We have also developed a component that serves as a gateway between RestAssured use case applications and Opaque clusters. The gateway exposes a REST endpoint that accepts SQL queries from applications, sends each query for governance verification and modification by a rule engine, and executes the modified query in Opaque. The results are serialized into a JSON object and sent back to the application over a secure REST channel.
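A minimal sketch of how an application might call the described gateway: submit a SQL query over a secure REST channel and read back a JSON result. The endpoint URL, payload fields, token handling, and response shape are hypothetical, since the abstract does not specify the gateway's interface.

```python
import requests

GATEWAY = "https://gateway.example.org/query"     # hypothetical endpoint

response = requests.post(
    GATEWAY,
    json={"sql": "SELECT diagnosis, COUNT(*) FROM care_records GROUP BY diagnosis"},
    headers={"Authorization": "Bearer <application-token>"},   # placeholder credential
    timeout=30,
    verify=True,                                   # TLS certificate verification
)
response.raise_for_status()
for row in response.json()["rows"]:                # assumed response shape
    print(row)
```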
Many enterprises are transitioning towards data-driven business processes. There are numerous situations where multiple parties would like to share data towards a common goal if it were possible to simultaneously protect the privacy and security of the individuals and organizations described in the data. Existing solutions for multi-party analytics that follow the so-called Data Lake paradigm have parties transfer their raw data to a trusted third party (i.e., a mediator), which then performs the desired analysis on the global data and shares the results with the parties. However, such a solution does not fit many applications such as healthcare, finance, and the Internet-of-Things, where privacy is a strong concern. Motivated by the increasing demands for data privacy, we study the problem of privacy-preserving multi-party data analytics, where the goal is to enable analytics on multi-party data without compromising the data privacy of each individual party. In this paper, we first propose a secure sum protocol with strong security guarantees. The proposed secure sum protocol is resistant to collusion attacks even with N-2 parties colluding, where N denotes the total number of collaborating parties. We then use this protocol to propose two secure gradient descent algorithms, one for horizontally partitioned data and the other for vertically partitioned data. The proposed framework is generic and applies to a wide class of machine learning problems. We demonstrate our solution for two popular use cases, regression and classification, and evaluate the performance of the proposed solution in terms of the obtained model accuracy, latency, and communication cost. In addition, we perform a scalability analysis to evaluate the performance of the proposed solution as the data size and the number of parties increase.
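A minimal sketch of a secure sum built from additive secret sharing: each party splits its private value into N random shares modulo a prime, distributes one share to each party, and only the aggregate of all partial sums reveals the total. This is a standard construction used to illustrate the idea, not necessarily the exact protocol or collusion-resistance mechanism proposed in the paper.

```python
import secrets

PRIME = 2**61 - 1          # field modulus, large enough for the aggregate

def make_shares(value, n_parties):
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)   # shares sum to value mod PRIME
    return shares

private_values = [42, 17, 99, 5]                   # one secret per party (hypothetical)
n = len(private_values)

# Party i sends its j-th share to party j; each party only ever sees random-looking shares.
shares = [make_shares(v, n) for v in private_values]
partial_sums = [sum(shares[i][j] for i in range(n)) % PRIME for j in range(n)]

print("secure sum:", sum(partial_sums) % PRIME)    # equals 163
print("plain sum :", sum(private_values))
```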
Advances in nanotechnology, large-scale computing and communications infrastructure, coupled with recent progress in big data analytics, have enabled linking several billion devices to the Internet. These devices provide unprecedented automation, cognitive capabilities, and situational awareness. This new ecosystem, termed the Internet-of-Things (IoT), also provides many entry points into the network through the gadgets that connect to the Internet, making security of IoT systems a complex problem. In this position paper, we argue that in order to build a safer IoT system, we need a radically new approach to security. We propose a new security framework that draws ideas from software-defined networks (SDN) and data analytics techniques; this framework provides dynamic policy enforcement on every layer of the protocol stack and can adapt quickly to the diverse set of industry use cases that IoT deployments cater to. Our proposal does not make any assumptions about the capabilities of the devices: it can work with already deployed as well as new types of devices, while also conforming to a service-centric architecture. Even though our focus is on industrial IoT systems, the ideas presented here are applicable to IoT used in a wide array of applications. The goal of this position paper is to initiate a dialogue among standardization bodies and security experts to help raise awareness about network-centric approaches to IoT security.
The ability to discover patterns of interest in criminal networks can support and ease the investigation tasks of security and law enforcement agencies. By considering criminal networks as a special case of social networks, we can properly reuse most of the state-of-the-art techniques to discover patterns of interest, i.e., hidden and potential links. Nevertheless, in time-sensitive scenarios, like those involving criminal actions, the ability to discover patterns in (near) real time can be of primary importance. In this paper, we investigate the identification of patterns for link detection and prediction on an evolving criminal network. To extract valuable information as soon as data is generated, we exploit a stream processing approach. To this end, we also propose three new social network similarity metrics, specifically tailored for criminal link detection and prediction. We then develop a flexible data stream processing application relying on the Apache Flink framework; this solution allows us to deploy and evaluate the newly proposed metrics as well as those existing in the literature. The experimental results show that the new metrics we propose reach up to 83% accuracy in detection and 82% accuracy in prediction, making them competitive with state-of-the-art metrics.
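A minimal sketch of similarity-based link prediction on a criminal-network snapshot: score non-adjacent node pairs with a neighborhood metric and rank the highest-scoring pairs as candidate hidden or future links. The abstract does not detail the three proposed metrics, so the classic Jaccard coefficient stands in here, and the toy graph is illustrative.

```python
import networkx as nx

g = nx.Graph()
g.add_edges_from([("a", "b"), ("a", "c"), ("b", "c"),
                  ("c", "d"), ("d", "e"), ("b", "e")])

# jaccard_coefficient scores every non-edge by |N(u) & N(v)| / |N(u) | N(v)|.
candidates = sorted(nx.jaccard_coefficient(g), key=lambda t: -t[2])
for u, v, score in candidates[:3]:
    print(f"predicted link {u}-{v}: {score:.2f}")
```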
Technological advancement has made Internet connectivity a necessity everywhere, and the power industry is no exception to this trend toward making everything smarter. The smart grid is the advanced version of the traditional grid, making the system more efficient and self-healing. A synchrophasor is a device used in smart grids to measure electric waveforms, voltages, and currents. The phasor measurement unit produces an immense volume of current and voltage data that is used to monitor and control the performance of the grid. These data are huge in size and vulnerable to attacks. Intrusion detection is a common technique for finding intrusions in a system. In this paper, a big data framework is designed using various machine learning techniques, and intrusions are detected based on classifications applied to the synchrophasor dataset. In this approach, various machine learning techniques, including deep neural networks, support vector machines, random forests, decision trees, and naive Bayes, are applied to the synchrophasor dataset, and the results are compared using the metrics of accuracy, recall, false rate, specificity, and prediction time. Feature selection and dimensionality reduction algorithms are used to reduce the prediction time of the proposed approach. This paper uses Apache Spark as the platform, which is suitable for implementing an intrusion detection system in smart grids using big data analytics.
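A minimal sketch of one classifier from the compared set, expressed as a Spark ML pipeline: assemble measurement features, train a random forest, and report accuracy. The CSV path and column names are assumptions; the other learners (SVM, decision tree, naive Bayes, DNN) plug into the same pipeline in a similar way.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.appName("synchrophasor-ids").getOrCreate()
df = spark.read.csv("synchrophasor.csv", header=True, inferSchema=True)  # assumed path

# Assemble all numeric measurement columns into a single feature vector.
feature_cols = [c for c in df.columns if c != "label"]
data = VectorAssembler(inputCols=feature_cols, outputCol="features").transform(df)
train, test = data.randomSplit([0.7, 0.3], seed=42)

model = RandomForestClassifier(labelCol="label", featuresCol="features").fit(train)
predictions = model.transform(test)

evaluator = MulticlassClassificationEvaluator(labelCol="label", metricName="accuracy")
print("accuracy:", evaluator.evaluate(predictions))
```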
Enterprises usually provide strong controls to prevent cyberattacks and inadvertent leakage of data to external entities. However, where employees and data scientists have legitimate access to analyze and derive insights from the data, there are insufficient controls, and employees are usually permitted access to all information about the enterprise's customers, including sensitive and private information. Though it is important to be able to identify useful patterns of one's customers for better customization and service, customers' privacy must not be sacrificed to do so. We propose an alternative: a framework that allows privacy-preserving data analytics over big data. In this paper, we present an efficient and scalable framework for Apache Spark, a cluster computing framework, that provides strong privacy guarantees for users even in the presence of an informed adversary, while still providing high utility for analysts. The framework, titled Shade, includes two mechanisms: SparkLAP, which provides Laplacian perturbation based on a user's query, and SparkSAM, which uses the contents of the database itself to calculate the perturbation. We show that the performance of Shade is substantially better than that of earlier differential privacy systems without loss of accuracy, particularly when run on datasets small enough to fit in memory, and find that SparkSAM can even exceed the performance of an identical non-private Spark query.
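A minimal sketch of the Laplacian-perturbation idea behind a mechanism like SparkLAP: answer a counting query and add Laplace noise calibrated to the query's sensitivity and a privacy budget epsilon. This illustrates the standard Laplace mechanism, not Shade's internal implementation, and the data and epsilon value are hypothetical.

```python
import numpy as np

def private_count(values, predicate, epsilon, sensitivity=1.0):
    true_count = sum(1 for v in values if predicate(v))
    # Laplace noise with scale = sensitivity / epsilon gives epsilon-differential privacy
    # for a counting query (sensitivity 1).
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

ages = np.random.randint(18, 90, size=100000)      # placeholder customer attribute
# "How many customers are over 65?" answered with epsilon = 0.5.
print(private_count(ages, lambda a: a > 65, epsilon=0.5))
```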
Cloud computing enables the outsourcing of big data analytics, where a third-party server is responsible for data management and processing. In this paper, we consider the outsourcing model in which a third-party server provides record matching as a service. In particular, given a target record, the service provider returns all records from the outsourced dataset that match the target according to specific distance metrics. Identifying matching records in databases plays an important role in information integration and entity resolution. A major security concern of this outsourcing paradigm is whether the service provider returns the correct record matching results. To solve the problem, we design EARRING, an Efficient Authentication of outsouRced Record matchING framework. EARRING requires the service provider to construct a verification object (VO) of the record matching results. From the VO, the client is able to catch any incorrect result at low computational cost. Experimental results on real-world datasets demonstrate the efficiency of EARRING.
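A minimal sketch of the record-matching service semantics: given a target record and a threshold, return every outsourced record within that threshold under a chosen distance metric (edit distance here). EARRING's verification object, which lets the client check the correctness of these results, is a separate cryptographic construction not reproduced in this sketch; the records and threshold are illustrative.

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def match(target, records, threshold):
    # What the service provider computes: all records within the distance threshold.
    return [r for r in records if edit_distance(target, r) <= threshold]

outsourced = ["john smith", "jon smith", "jane smyth", "john smithe", "mary jones"]
print(match("john smith", outsourced, threshold=2))
```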