Bibliography
Mining information from social media has lately attracted more attention because of the explosion in the growth of Big Data. In security, Big Data deals with vast assortments of digital information that are analyzed and visualized to draw insights for the prediction and prevention of cyber attacks. Big Data Analytics (BDA) is the term coined by practitioners to describe the art of managing, processing and gathering large amounts of data for future evaluation. Data is being created at an alarming rate; the rapid growth of the Internet, the Internet of Things (IoT) and other innovative technologies are the main drivers behind this continued expansion. The data created is a reflection of the environment it is produced in, so data extracted from a system can be used to understand that system's internal workings. This has become a significant element of cyber security, where the objective is to protect assets. Moreover, the growing value of information has made big data itself a high-value target. This work surveys recent research in cyber security related to big data and highlights both how big data is secured and how big data can be used as a tool for cyber security. In addition, a Big Data based centralized log analysis system is implemented to identify network traffic generated by attackers through DDoS, SQL Injection and Brute Force attacks. The log files are automatically transmitted to the centralized cloud server, where the big data analysis process is initiated.
The main idea of the proposed work is a Partitional Clustering Algorithm (PCA) on the Hadoop Distributed File System that secures big data using a Perturbation Technique. There are numerous clustering methods available for categorizing the information in big data; PCA discovers clusters based on an initial partition of the data. In this approach, it is possible to develop a privacy-preserving representation of the data that is perturbed yet still allows the necessary computation and communication. The performance was analyzed on a Health Care database using parameters such as precision, accuracy, and F-score. The results demonstrate that this method reduces the complexity of preserving privacy and achieves better accuracy than existing techniques.
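As a rough illustration of the idea (not the authors' implementation), the sketch below perturbs numeric records with calibrated noise before running a partitional (k-means style) clustering, so the clustering operates on data that no longer exposes the original values. The feature ranges, noise scale, and number of clusters are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative sketch only: additive-noise perturbation of numeric health-care
# attributes followed by partitional (k-means) clustering. Feature ranges,
# noise scale and k are assumptions, not taken from the paper.
rng = np.random.default_rng(42)

# Toy stand-in for numeric health-care attributes (age, BMI, blood pressure).
records = rng.normal(loc=[50, 27, 120], scale=[15, 4, 12], size=(500, 3))

# Perturbation technique: add calibrated Gaussian noise so raw values are hidden,
# while the overall cluster structure is largely preserved.
noise_scale = 0.1 * records.std(axis=0)
perturbed = records + rng.normal(0.0, noise_scale, size=records.shape)

# Partitional clustering on the perturbed data (initial partition refined iteratively).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(perturbed)
print(np.bincount(labels))
```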
Cloud computing is an Internet-based technology that has emerged rapidly in the last few years due to the popular and in-demand services required by various institutions, organizations, and individuals. Structured, unstructured, and semi-structured data are being transferred to cloud servers at a record pace. Institutions, businesses, and organizations are shifting ever-increasing workloads to cloud servers; given the high cost, space and maintenance issues that come with big data, cloud computing has become a natural choice for data storage. In a cloud environment, it is obvious that data is not yet completely secure from inside and outside attacks and intrusions, because cloud servers are under the control of a third party. The security of data therefore becomes an important concern when sensitive data is stored in a cloud environment. In this paper, we give an overview of the characteristics and state of the art of big data; discuss the top threats to data security and privacy, open issues, current challenges and their impact on business from a future research perspective; and review and analyze previous and recent frameworks and architectures for data security that continue to be developed against these threats, to improve how data is kept and stored in the cloud environment.
Aiming at the poor stability and low accuracy of current methods for processing communication data, this paper studies the informatization of nonlinear frequency hopping communication data under a big data security evaluation framework. A frequency hopping mediation module is added to the frequency hopping communication security evaluation framework so that communication interference information is processed discretely, and the data parameters of the nonlinear frequency hopping communication data are corrected and converted using a fast clustering analysis algorithm, completing the informatization processing of the nonlinear frequency hopping communication data under the big data security evaluation framework. Finally, experiments show that this approach effectively improves accuracy and stability.
Smart technologies have facilitated the generation and collection of huge volumes of data on a daily basis, including highly sensitive and diverse data such as personal, organisational, environmental, energy, transport and economic data. Data analytics provides solutions for various issues faced by smart cities, such as crisis response, disaster resilience, emergency management and smart traffic management; this requires the distribution of sensitive data among various entities within or outside the smart city. Sharing sensitive data creates a need to use smart city data efficiently so that smart applications and utilities can be provided to end users in a trustworthy and safe manner. If this shared sensitive data is leaked, it can cause damage and severe risk to the city's resources, and protecting critical data from unauthorized disclosure is a major issue for the success of any project. Data Leakage Detection provides a set of tools and techniques that can efficiently resolve concerns related to critical smart city data. This paper showcases an approach to detect leakage caused intentionally or unintentionally; the model represents the allotment of data objects between diverse agents using a Bigraph. The objective is to keep critical data secure by revealing the guilty agent who caused the data leakage.
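As a deliberately simplified illustration of the leakage-detection goal (the paper itself models the allotment with a Bigraph, which is not reproduced here), the sketch below allots data objects to agents as plain sets and scores each agent by how much of a leaked set overlaps its allocation. Agent names, object identifiers and the scoring rule are assumptions.

```python
# Simplified guilt-scoring sketch: allocations are plain sets rather than a
# Bigraph, and the score is just the agent's share of the leaked objects.
allocations = {
    "agentA": {"o1", "o2", "o3", "o7"},
    "agentB": {"o2", "o4", "o5"},
    "agentC": {"o5", "o6", "o7", "o8"},
}

leaked = {"o2", "o3", "o7"}   # objects later found outside the smart city systems

def guilt_scores(allocations, leaked):
    """Return each agent's share of the leaked objects it had access to."""
    return {agent: len(objs & leaked) / len(leaked)
            for agent, objs in allocations.items()}

scores = guilt_scores(allocations, leaked)
likely_guilty = max(scores, key=scores.get)
print(scores, "-> most likely guilty agent:", likely_guilty)
```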
The huge volume, variety, and velocity of big data have empowered Machine Learning (ML) techniques and Artificial Intelligence (AI) systems. However, a vast portion of the data used to train AI systems is sensitive information, so any vulnerability has a potentially disastrous impact on privacy and security. Nevertheless, the increasing demand for high-quality AI from governments and companies requires the utilization of big data in these systems. Several studies have highlighted the threats big data poses on different platforms and the countermeasures that reduce the risks caused by attacks. In this paper, we provide an overview of the existing threats to privacy and security inflicted by big data as a primary driving force within the AI/ML workflow. We define an adversarial model to investigate the attacks, and we analyze and summarize the corresponding defense strategies and countermeasures. Furthermore, given the impact of AI systems on the market and on the vast majority of business sectors, we also survey the Standards Developing Organizations (SDOs) that are actively involved in providing guidelines to protect the privacy and ensure the security of big data and AI systems. Our far-reaching goal is to bridge the research and standardization frameworks to increase the consistency and efficiency of AI system development, guaranteeing customer satisfaction while delivering a high degree of trustworthiness.
Cloud service providers offer a low-cost and convenient solution to host unstructured data. However, cloud services act as third-party solutions and do not give users control of the data. This has raised security and privacy concerns for many organizations (users) with sensitive data who wish to utilize cloud-based solutions. User-side encryption can potentially address these concerns by establishing user-centric cloud services and granting data control to the user. Nonetheless, user-side encryption limits the ability to process (e.g., search) encrypted data on the cloud. Accordingly, in this research, we provide a framework that enables processing (in particular, searching) of encrypted multi-organizational (i.e., multi-source) big data without revealing the data to the cloud provider. Our framework leverages the locality of edge computing to offer user-centric search in real time. In particular, the edge system intelligently predicts the user's search pattern and prunes the multi-source big data search space to reduce search time. The pruning system is based on efficient sampling from the clustered big dataset on the cloud: for each cluster, it dynamically samples an appropriate number of terms based on the user's search tendency, so that the cluster is optimally represented. We developed a prototype of the user-centric search system and evaluated it against multiple datasets. Experimental results demonstrate a 27% improvement in pruning quality and search accuracy.
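The sketch below illustrates the pruning idea at a toy scale: each cloud-side cluster is represented by a sample of its terms, the sample size is biased by the user's recent search tendency, and only the best-matching clusters are kept for the actual search. The scoring rule, sample sizes, and cluster contents are assumptions, not the framework's actual mechanism.

```python
import random
from collections import Counter

def sample_cluster(terms, user_history, base=20):
    """Sample more terms from clusters that look relevant to recent queries."""
    tendency = sum(1 for t in terms if t in user_history)
    k = min(len(terms), base + 5 * tendency)
    return random.sample(terms, k)

def prune(clusters, query, user_history, keep=2):
    """Rank clusters by how often the query term appears in their samples."""
    scored = []
    for name, terms in clusters.items():
        sample = sample_cluster(terms, user_history)
        scored.append((Counter(sample)[query], name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:keep]]

# Toy stand-ins for the (index terms of) clustered multi-source data on the cloud.
clusters = {
    "c1": ["invoice", "payroll", "tax"] * 30,
    "c2": ["genome", "trial", "dosage"] * 30,
    "c3": ["invoice", "contract", "audit"] * 30,
}
print(prune(clusters, "invoice", user_history={"invoice", "audit"}))
```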
With changing times and technology, the data around us is increasing on a daily basis, and so is the daunting task of managing it. The present solutions to this problem, i.e. our current databases, are not long-term solutions. These data volumes need to be stored and retrieved safely. This paper presents an overview of security issues for big data. Big Data encompasses the configuration, distribution and analysis of data in ways that overcome the drawbacks of traditional data processing technology, and it manages, stores and acquires data in a speedy and cost-effective manner with the help of dedicated tools, technologies and frameworks.
Tele-radiology is a technology that enables communication between radiologists, patients and healthcare units situated at distant places. It involves the exchange of medically centric data, which may be stored as Electronic Health Records (EHRs) containing X-rays, CT scans and MRI reports. Hundreds of scans across multiple radiology centers lead to medical big data (MBD), and a healthcare cloud can be used to handle it. Since a lack of security for EHRs can cause havoc in medical IT, the healthcare cloud must be secure and must ensure secure sharing and storage of EHRs. This paper proposes the application of a decoy technique to provide security for EHRs. EHRs face the risk of internal attacks and external intrusion; this work addresses and handles internal attacks. It also includes a study of honeypots and intrusion detection techniques. The system identifies a possible intrusion, alerts the administrator, and logs the details of the intrusion.
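A minimal sketch of the decoy idea follows: a few fake EHR identifiers are planted next to real ones, and any access to a decoy record triggers an alert to the administrator and a log entry. The record identifiers and the alerting mechanism are illustrative assumptions, not the paper's implementation.

```python
import logging
from datetime import datetime, timezone

# Decoy-based detection of internal attacks: decoy EHR IDs look real but have no
# patient behind them, so any access to one is treated as a probable intrusion.
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("healthcare-cloud")

REAL_EHRS = {"EHR-1001", "EHR-1002", "EHR-1003"}
DECOY_EHRS = {"EHR-9901", "EHR-9902"}

def access_ehr(user, record_id):
    if record_id in DECOY_EHRS:
        # Nobody has a legitimate reason to open a decoy: alert and log it.
        log.warning("ALERT %s: user %s opened decoy record %s",
                    datetime.now(timezone.utc).isoformat(), user, record_id)
        return None
    if record_id in REAL_EHRS:
        log.info("user %s accessed %s", user, record_id)
        return f"<contents of {record_id}>"
    log.info("user %s requested unknown record %s", user, record_id)
    return None

access_ehr("dr_smith", "EHR-1002")   # normal access
access_ehr("insider", "EHR-9902")    # triggers alert and log entry
```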
In the era of mass agriculture, keeping up with the increasing demand for food production requires advanced monitoring systems to handle challenges such as perishable products, food waste, unpredictable supply variations, and stringent food safety and sustainability requirements. The evolution of the Internet of Things has provided means for collecting, processing, and communicating data associated with agricultural processes. This has opened several opportunities to sustain and improve productivity and reduce waste at every step in the food supply chain. On the other hand, it has also introduced new challenges, such as the security of the data, the recording and representation of data, providing real-time control, the reliability of the system, and dealing with big data. This paper proposes an architecture for securing big data in the agricultural supply chain management system, which can help reduce food waste, increase the reliability of the supply chain, and enhance the performance of the food supply chain system.
The wide use of smart devices has caused a huge amount of information to arise, and this information requires new methods to handle it; it is from this perspective that the big data concept arose. Most of the attention given to big data concerns handling the data, without concentrating on its security. Encryption is the best way to keep data safe from malicious users, but ordinary encryption methods are not suitable for big data. Selective encryption is an encryption method that encrypts only the important part of the message; however, there is uncertainty in evaluating which part of the message is important, and a problem arises when an important part is left unencrypted. This is the motivation of the paper. We propose a security framework that secures both the important and unimportant portions of the message to overcome this uncertainty, with each portion encrypted using a different technique for better performance without losing security. The framework encrypts the important parts of the message with a strong algorithm and the less important parts with a medium-strength algorithm, where the importance of a word is defined according to how frequently its root appears. The framework is deployed on Amazon EC2 (Elastic Compute Cloud). A comparison between the proposed framework, the full encryption method and the Toss-A-Coin method is performed in terms of encryption time and throughput. The results show that the proposed method gives better encryption time and throughput than full encryption.
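A hedged sketch of the selective-encryption idea follows: words whose roots appear frequently are treated as important and encrypted with an authenticated cipher (Fernet from the third-party cryptography package), while the rest use a deliberately lightweight XOR placeholder. The importance threshold, the stemming heuristic, and both cipher choices are assumptions, not the paper's exact algorithms.

```python
from collections import Counter
from cryptography.fernet import Fernet   # assumes the 'cryptography' package is installed

strong = Fernet(Fernet.generate_key())   # "strong" cipher for important words
LIGHT_KEY = b"k3y"

def light_encrypt(word: bytes) -> bytes:
    """Placeholder 'medium' cipher: repeating-key XOR (NOT secure, illustration only)."""
    return bytes(b ^ LIGHT_KEY[i % len(LIGHT_KEY)] for i, b in enumerate(word))

def root(word: str) -> str:
    """Crude stand-in for a stemmer: lowercase and strip a few common suffixes."""
    w = word.lower()
    for suffix in ("ing", "ed", "s"):
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: -len(suffix)]
    return w

def selective_encrypt(message: str, threshold: int = 2):
    counts = Counter(root(w) for w in message.split())
    out = []
    for w in message.split():
        if counts[root(w)] >= threshold:            # frequent root -> "important" word
            out.append(("strong", strong.encrypt(w.encode())))
        else:                                       # less important word
            out.append(("light", light_encrypt(w.encode())))
    return out

msg = "patients records records billing patient notes"
for scheme, blob in selective_encrypt(msg):
    print(scheme, blob[:16], "...")
```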
Malicious traffic has garnered more attention in recent years, owing to the rapid growth of information technology in today's world. In 2007 alone, an estimated loss of 13 billion dollars was attributed to malware attacks. Malware data in today's context is massive, and understanding such information using primitive methods would be a tedious task. In this publication we apply two well-established machine learning techniques, the multilayer perceptron (MLP) and J48 (an implementation of the C4.5 decision tree algorithm, a successor to ID3), to our selected dataset, Advanced Security Network Metrics & Non-Payload-Based Obfuscations (ASNM-NPBO), to show that the answer to managing cyber security threats lies in the aforementioned methodologies.
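A minimal sketch of this kind of evaluation follows, under clearly stated substitutions: synthetic features stand in for the ASNM-NPBO dataset, and scikit-learn's DecisionTreeClassifier (CART) stands in for Weka's J48, since C4.5 itself is not available in scikit-learn. Hyperparameters are assumptions, not the study's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic, imbalanced data standing in for network-metric features with a
# small fraction of malicious flows.
X, y = make_classification(n_samples=2000, n_features=30, n_informative=12,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
tree = DecisionTreeClassifier(max_depth=8, random_state=0)   # CART, standing in for J48

for name, model in [("MLP", mlp), ("decision tree", tree)]:
    model.fit(X_tr, y_tr)
    print(f"{name}: accuracy = {accuracy_score(y_te, model.predict(X_te)):.3f}")
```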
As an information hub for various trades and professions in the era of big data, the cloud data center bears the responsibility of providing uninterrupted service. To cope with the impact of failures and interruptions on the Quality of Service (QoS) during operation, it is important to guarantee the resilience of the cloud data center. Thus, different resilience actions are conducted over its life cycle, together forming a resilience strategy. In order to measure the effect of a resilience strategy on system resilience, this paper proposes a new approach to model and evaluate resilience strategies for cloud data centers, focusing on the core part of service provision, the IT architecture. A comprehensive resilience metric based on resilience loss is put forward, taking the characteristics of the cloud data center into account, and a mapping model between system resilience and the resilience strategy is built. Then, based on a hierarchical colored generalized stochastic Petri net (HCGSPN) model depicting how the system processes service requests, simulation is conducted to evaluate the resilience strategy through the metric calculation. A case study of a company's cloud data center demonstrates the applicability and correctness of the approach.
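As an illustration of a resilience-loss style metric (a common stand-in, not the paper's HCGSPN-based formulation), the sketch below measures the normalised shortfall between the target QoS level and a synthetic performance curve with a failure-and-recovery dip. The trace, failure window and normalisation are assumptions.

```python
import numpy as np

# Synthetic QoS trace: nominal level 1.0, a failure at t=20 that degrades
# performance to 0.4, followed by linear recovery completed at t=70.
t = np.linspace(0, 100, 1001)                 # time (e.g. minutes)
target = np.ones_like(t)                      # nominal QoS level = 1.0

performance = np.ones_like(t)
failure, recovered = 20, 70
dip = (t >= failure) & (t <= recovered)
performance[dip] = 0.4 + 0.6 * (t[dip] - failure) / (recovered - failure)

# Normalised resilience loss: average shortfall relative to the target level
# (equivalently, the area between the curves divided by the target area).
resilience_loss = np.mean(target - performance) / np.mean(target)
print(f"normalised resilience loss: {resilience_loss:.3f}")   # smaller is better
```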
Machine-to-Machine (M2M) networks, being connected to the Internet at large, inherit all the cyber-vulnerabilities of standard Information Technology (IT) systems. Since perfect cyber-security and robustness is an idealistic construct, it is worthwhile to design intrusion detection schemes that quickly detect and mitigate the harmful consequences of cyber-attacks. Volumetric anomaly detection has been popularized by its low complexity, but it cannot detect low-volume sophisticated attacks and also suffers from a high false-alarm rate. To overcome these limitations, feature-based detection schemes have been studied for IT networks; however, these schemes cannot be easily adapted to M2M systems due to the fundamental architectural and functional differences between M2M and IT systems. In this paper, we propose novel feature-based detection schemes for a general M2M uplink to detect Distributed Denial-of-Service (DDoS) attacks, emergency scenarios and terminal device failures. Detection of DDoS attacks and emergency scenarios involves building a database of legitimate M2M connections during a training phase and then flagging new M2M connections as anomalies during the evaluation phase. To distinguish between DDoS attacks and emergency scenarios, which yield similar signatures for anomaly detection schemes, we propose a modified Canberra distance metric that measures the similarity or difference in the characteristics of inter-arrival time epochs for any two anomalous streams. We detect device failures by inspecting for a decrease in active M2M connections over a reasonably large time interval. Lastly, using Monte Carlo simulations, we show that the proposed anomaly detection schemes have high detection performance and a low false-alarm rate.
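To make the distance computation concrete, the sketch below applies the standard Canberra distance to two made-up inter-arrival-time summaries; the specific modification introduced in the paper is not reproduced here, and the example values are assumptions.

```python
import numpy as np

def canberra(x, y, eps=1e-12):
    """Standard Canberra distance: sum_i |x_i - y_i| / (|x_i| + |y_i|)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sum(np.abs(x - y) / (np.abs(x) + np.abs(y) + eps)))

# Per-epoch mean inter-arrival times (ms) for two anomalous streams: a flat,
# machine-like pattern (DDoS-like) versus a bursty, decaying pattern
# (emergency-like). Values are made up for illustration.
ddos_like      = [1.2, 1.1, 1.3, 1.2, 1.1]
emergency_like = [5.0, 2.5, 1.4, 0.9, 0.7]

print(f"Canberra distance: {canberra(ddos_like, emergency_like):.3f}")
```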
To ensure reliable and predictable service in the electrical grid it is important to gauge the level of trust present within critical components and substations. Although trust throughout a smart grid is temporal and dynamically varies according to measured states, it is possible to accurately formulate communications and service level strategies based on such trust measurements. Utilizing an effective set of machine learning and statistical methods, it is shown that establishment of trust levels between substations using behavioral pattern analysis is possible. It is also shown that the establishment of such trust can facilitate simple secure communications routing between substations.
We present an effective machine learning method for malicious activity detection in enterprise security logs. Our method involves feature engineering, or generating new features by applying operators on features of the raw data. We generate DNF formulas from raw features, extract Boolean functions from them, and leverage Fourier analysis to generate new parity features and rank them based on their highest Fourier coefficients. We demonstrate on real enterprise data sets that the engineered features enhance the performance of a wide range of classifiers and clustering algorithms. As compared to classification of raw data features, the engineered features achieve up to 50.6% improvement in malicious recall, while sacrificing no more than 0.47% in accuracy. We also observe better isolation of malicious clusters, when performing clustering on engineered features. In general, a small number of engineered features achieve higher performance than raw data features according to our metrics of interest. Our feature engineering method also retains interpretability, an important consideration in cyber security applications.
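As a rough sketch of the parity-feature idea (not the authors' full pipeline, which starts from DNF formulas learned on enterprise logs), the example below enumerates small subsets of Boolean raw features, forms the parity (XOR) of each subset, and ranks the parities by the magnitude of their empirical Fourier coefficient against the ±1-encoded label. The synthetic data, subset size and coefficient estimator are assumptions.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5000, 8))                               # Boolean raw features
y = (X[:, 1] ^ X[:, 4] ^ (rng.random(5000) < 0.05)).astype(int)      # noisy parity label

def fourier_coeff(parity_bits, labels):
    """Empirical Fourier coefficient of the parity feature w.r.t. the +/-1 encoded label."""
    return float(np.mean((1 - 2 * labels) * (1 - 2 * parity_bits)))

# Score every parity over pairs of raw features and keep the top-ranked ones.
scores = []
for subset in itertools.combinations(range(X.shape[1]), 2):
    parity = np.bitwise_xor.reduce(X[:, list(subset)], axis=1)
    scores.append((abs(fourier_coeff(parity, y)), subset))

for coeff, subset in sorted(scores, reverse=True)[:3]:
    print(f"parity over features {subset}: |coefficient| = {coeff:.3f}")
```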
Moving Target Defence (MTD) is a recently proposed and emerging proactive approach that provides asynchronous defensive strategies. Unlike traditional security solutions that focus on removing vulnerabilities, MTD makes a system dynamic and unpredictable by continuously changing the attack surface to confuse attackers. MTD can be utilized in cloud computing to address the cloud's security-related problems. Much of the literature proposes MTD methods in various contexts, but approaches to evaluate the effectiveness of the proposed methods are still lacking. In this paper, we propose a combination of Shuffle and Diversity MTD techniques and investigate the effects of deploying them from two perspectives corresponding to two groups of security metrics: (i) system risk, which reflects the cloud provider's perspective, and (ii) attack cost and return on attack, which reflect the attacker's point of view. Moreover, we utilize a scalable Graphical Security Model (GSM) to keep the security analysis tractable. Finally, we show that combining MTD techniques can improve both of the aforementioned groups of security metrics, whereas the individual techniques cannot.
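To make the combination concrete at a toy scale, the sketch below applies a Shuffle move (permuting the service-to-host mapping) and a Diversity move (re-drawing each service's OS variant) to a small deployment, and reports how much of an attacker's prior reconnaissance is invalidated. This "stale knowledge" fraction is an illustrative proxy only, not the paper's GSM-based risk, attack-cost or return-on-attack metrics, and all names are assumptions.

```python
import random

OS_VARIANTS = ["linux", "bsd", "windows"]

def deploy(num_services):
    """Initial placement: service i runs on host i with a random OS variant."""
    return {f"svc{i}": {"host": f"host{i}", "os": random.choice(OS_VARIANTS)}
            for i in range(num_services)}

def shuffle(deployment):
    """Shuffle MTD: permute the service-to-host mapping."""
    hosts = [cfg["host"] for cfg in deployment.values()]
    random.shuffle(hosts)
    for cfg, host in zip(deployment.values(), hosts):
        cfg["host"] = host

def diversify(deployment):
    """Diversity MTD: re-draw each service's OS/software variant."""
    for cfg in deployment.values():
        cfg["os"] = random.choice(OS_VARIANTS)

def stale_fraction(before, after):
    """Fraction of (host, variant) pairs the attacker observed that are now invalid."""
    invalid = sum(1 for s in before
                  if (before[s]["host"], before[s]["os"])
                  != (after[s]["host"], after[s]["os"]))
    return invalid / len(before)

dep = deploy(8)
snapshot = {s: dict(cfg) for s, cfg in dep.items()}   # attacker's reconnaissance
shuffle(dep)
diversify(dep)
print(f"attacker knowledge invalidated: {stale_fraction(snapshot, dep):.0%}")
```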
Image retrieval systems have been an active area of research for more than thirty years, progressively producing improved algorithms that raise performance metrics, operate in different domains, take advantage of different features extracted from the images to be retrieved, and have different desirable invariance properties. With the ever-growing visual databases of images and videos produced by a myriad of devices comes the challenge of selecting effective features and performing fast retrieval over such databases. In this paper, we incorporate Fourier descriptors (FDs) along with a metric-based balanced indexing tree as a viable solution to the DHS (Department of Homeland Security) need for quick identification and retrieval of weapon images. The FDs allow a simple but effective outline-based feature representation of an object, while the M-tree provides a dynamic, fast, and balanced search over such features. Motivated by applications of interest to DHS, we have created a basic guns-and-rifles database that can be used to identify weapons in images and videos extracted from media sources. Our simulations show excellent performance in both representation and retrieval speed.
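The following sketch illustrates outline-based Fourier descriptors (the M-tree indexing is not shown): boundary points are read as complex numbers, transformed with the FFT, and normalised to be invariant to translation, scale, rotation and starting point. A toy elliptical contour stands in for an extracted weapon outline, and the number of retained coefficients is an assumption.

```python
import numpy as np

def fourier_descriptors(contour_xy, n_coeffs=16):
    """Translation-, scale- and rotation-invariant Fourier descriptors of a closed contour."""
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]          # (x, y) -> x + iy
    coeffs = np.fft.fft(z)
    coeffs = coeffs[1:n_coeffs + 1]                       # drop DC term (translation invariance)
    return np.abs(coeffs) / (np.abs(coeffs[0]) + 1e-12)   # magnitudes, scaled by first harmonic

# Toy closed outline standing in for an extracted object boundary.
theta = np.linspace(0, 2 * np.pi, 128, endpoint=False)
ellipse = np.c_[2 * np.cos(theta), np.sin(theta)]
print(np.round(fourier_descriptors(ellipse, n_coeffs=8), 3))
```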
Technological developments in the energy sector, while offering new business insights, also produce complex data. In this study, the relationship between smart grids and big data approaches is investigated. After analyzing which areas of smart grid systems use big data techniques and technologies, the focus turns to the big data technologies used to detect attacks on smart grids. Big data analytics produces efficient solutions, but choosing the right algorithm and metric is critical. For this reason, an application prototype is proposed that uses big data approaches to detect attacks on smart grids. The best-performing algorithms reached accuracies of 92% with Random Forest and 87% with Decision Tree.
Web application technologies are growing rapidly with continuous innovation and improvements. This paper focuses on the popular Spring Boot [1] Java-based framework for building web and enterprise applications and how it provides the flexibility needed for a service-oriented architecture (SOA). One challenge with any Spring-based application is its level of configuration complexity; Spring Boot makes it easy to create and deploy stand-alone, production-grade Spring applications with very little Spring configuration. For example, with the Spring Model-View-Controller (MVC) framework [2], we need to configure the dispatcher servlet, web jars, a view resolver, and component scanning, among other things; to solve this, Spring Boot provides several auto-configuration options to set up the application with the needed dependencies. Another challenge is identifying the framework dependencies and associated library versions required to develop a web application. Spring Boot offers simpler dependency management by bundling a comprehensive but flexible framework and its associated libraries into a single dependency, which provides all the Spring-related technology needed for starter projects such as CRUD web applications. The framework also provides a range of additional features that are common across many projects, such as an embedded server, security, metrics, health checks, and externalized configuration. Web applications are generally packaged as a war file and deployed to a web server, but a Spring Boot application can be packaged as either a war or a jar file, which allows the application to run without having to install or configure an application server. In this paper, we discuss how the Atmospheric Radiation Measurement (ARM) Data Center (ADC) at Oak Ridge National Laboratory is using Spring Boot to create an SOA-based REST [4] service API that bridges the gap between frontend user interfaces and the backend database. Using this REST service API, ARM scientists are now able to submit reports via a user form or a command-line interface, which captures the same data quality and other important information about ARM data.