Bibliography
Many organizations need effective solutions to store and analyze the enormous volumes of data they now handle. Cloud computing offers scalable resources and notable economic benefits, such as reduced operational costs. However, this model raises a broad set of security and privacy issues that must be taken into account. Multi-tenancy, loss of control, and trust are the key concerns in cloud computing environments. This paper surveys current technologies and a comprehensive range of both earlier and state-of-the-art work on cloud security and privacy. The paradigm shift that accompanies the adoption of cloud computing increasingly heightens the security and privacy considerations associated with its different facets, such as multi-tenancy, trust, loss of control, and accountability. Consequently, cloud platforms that handle big data containing sensitive information must employ technical methods and organizational safeguards to avoid data protection failures that could lead to extensive and costly harm.
Proper evaluation of classifier predictive models requires the selection of appropriate metrics to gauge a model's performance. The Area Under the Receiver Operating Characteristic Curve (AUC) has become the de facto standard metric for evaluating classifier performance. However, recent studies have suggested that AUC is not necessarily the best metric for all types of datasets, especially those with a high or severe level of class imbalance. There is a need to assess which specific metrics are most suitable for evaluating performance on highly imbalanced big data. In this work, we evaluate the performance of eight machine learning techniques on a severely imbalanced big dataset from the cyber security domain. We analyze the behavior of six different metrics to determine which provides the best representation of a model's predictive performance, and we evaluate the impact that adjusting the classification threshold has on those metrics. Our results find that the C4.5N decision tree is the optimal learner when evaluating all presented metrics for severely imbalanced Slow HTTP DoS attack data. Based on our results, we propose that using AUC alone as a primary metric for evaluating highly imbalanced big data may be ineffective, and that metrics such as F-measure and Geometric mean can offer substantial insight into the true performance of a given model.
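As a minimal sketch (not from the paper) of how such metrics behave, the snippet below computes AUC, F-measure, and the geometric mean of TPR and TNR for a synthetic, severely imbalanced dataset, and shows the effect of adjusting the classification threshold. The classifier, data, and the 0.5/0.1 thresholds are illustrative assumptions; only the metric definitions are standard.

```python
# Illustrative only: AUC, F-measure and geometric mean on imbalanced data,
# plus the effect of moving the decision threshold.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, f1_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, severely imbalanced data (~1% positives) as a stand-in
# for the Slow HTTP DoS attack data used in the paper.
X, y = make_classification(n_samples=50_000, n_features=20,
                           weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]        # class-membership scores

print("AUC:", roc_auc_score(y_te, scores))    # threshold-independent

for thr in (0.5, 0.1):                        # adjusting the classification threshold
    y_pred = (scores >= thr).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
    tpr = tp / (tp + fn)                      # sensitivity / recall
    tnr = tn / (tn + fp)                      # specificity
    g_mean = np.sqrt(tpr * tnr)               # geometric mean of TPR and TNR
    print(f"thr={thr}: F-measure={f1_score(y_te, y_pred):.3f}  G-mean={g_mean:.3f}")
```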
Organizations are increasingly collecting ever larger amounts of data to build complex data analytics, machine learning, and AI models. The data needed for building such models may be unstructured (e.g., text, image, and video), and may therefore be stored in different data management systems ranging from relational databases to newer NoSQL databases tailored for storing unstructured data. Furthermore, data scientists increasingly use programming languages such as Python and R to process data with many existing libraries, and in some cases the developed code is executed automatically by the NoSQL system on the stored data. These developments indicate the need for a data security and privacy solution that can uniformly protect data stored in many different data management systems and enforce security policies even when sensitive data is processed by a complex program submitted by a data scientist. In this paper, we introduce our vision for building such a solution for protecting big data. Specifically, our proposed system allows organizations to 1) enforce policies that control access to sensitive data, 2) automatically keep the audit logs needed for data governance and regulatory compliance, 3) sanitize and redact sensitive data on the fly based on data sensitivity and AI model needs, 4) detect potentially unauthorized or anomalous access to sensitive data, and 5) automatically create attribute-based access control policies based on data sensitivity and data type.
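A toy sketch of the kind of attribute-based access control, on-the-fly redaction, and audit logging this vision describes might look as follows; the policy structure, attribute names, and redaction behaviour are hypothetical and are not taken from the paper.

```python
# Hypothetical illustration of attribute-based access control (ABAC) with
# on-the-fly redaction and audit logging; not the paper's actual system.
import datetime
import json

POLICIES = [
    # Data scientists may read fields up to "medium" sensitivity; higher fields are redacted.
    {"role": "data_scientist", "action": "read", "max_sensitivity": "medium"},
    {"role": "compliance_officer", "action": "read", "max_sensitivity": "high"},
]
SENSITIVITY_ORDER = {"low": 0, "medium": 1, "high": 2}

def authorize(user, action, field_sensitivity):
    """Return True if some policy grants this role the action at this sensitivity level."""
    for p in POLICIES:
        if (p["role"] == user["role"] and p["action"] == action
                and SENSITIVITY_ORDER[field_sensitivity] <= SENSITIVITY_ORDER[p["max_sensitivity"]]):
            return True
    return False

def read_record(user, record, schema):
    """Return the record with unauthorized fields redacted, and emit an audit log entry."""
    result = {}
    for field, value in record.items():
        sens = schema.get(field, "low")
        result[field] = value if authorize(user, "read", sens) else "<REDACTED>"
    print(json.dumps({"ts": datetime.datetime.utcnow().isoformat(),
                      "user": user["name"], "action": "read",
                      "redacted": [f for f in record if result[f] == "<REDACTED>"]}))
    return result

schema = {"name": "high", "diagnosis": "high", "age": "medium", "zip": "low"}
user = {"name": "alice", "role": "data_scientist"}
print(read_record(user, {"name": "J. Doe", "diagnosis": "flu", "age": 42, "zip": "10001"}, schema))
```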
With the development of the Internet, network attack techniques have undergone tremendous changes. The forms of network attack and defense have also changed: attack features are becoming more diverse, attacks are more widespread, and traditional security protection methods are no longer effective. In recent years, advances in software-defined security, network anomaly detection, and big data technology have helped address these challenges. This paper proposes a data-driven software-defined security architecture whose core features include a data-driven orchestration engine, a scalable network anomaly detection module, and a security data platform. By building an analysis layer in the security data platform, real-time online detection of network data can be realized by integrating the network anomaly detection module with the security data platform under the software-defined security architecture. Data-driven security business orchestration can then be realized to achieve an efficient, real-time, and dynamic response to detected anomalies. Finally, this paper designs a deep learning-based HTTP anomaly detection algorithm module and integrates it with the data-driven software-defined security architecture to demonstrate the workflow of the whole system.
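The abstract does not give the detection algorithm's details. Purely as an assumption-laden sketch, a simple autoencoder over numeric features extracted from HTTP requests could flag anomalies by reconstruction error, roughly as below; the feature choice, model size, and threshold are all illustrative and not the paper's design.

```python
# Illustrative sketch only: an autoencoder-based HTTP anomaly detector.
# The paper's actual deep learning model and features are not specified here.
import numpy as np
import tensorflow as tf

def featurize(request: str) -> np.ndarray:
    """Toy feature vector: length, digit ratio, special-character ratio, path depth."""
    n = max(len(request), 1)
    return np.array([len(request),
                     sum(c.isdigit() for c in request) / n,
                     sum(not c.isalnum() for c in request) / n,
                     request.count("/")], dtype=np.float32)

# Train on benign traffic only (assumed available from the security data platform).
benign = np.stack([featurize(r) for r in ["GET /index.html", "GET /css/site.css",
                                          "POST /login", "GET /img/logo.png"] * 250])
mean, std = benign.mean(0), benign.std(0) + 1e-6
benign_norm = (benign - mean) / std

autoencoder = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2, activation="relu"),   # bottleneck
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(4),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(benign_norm, benign_norm, epochs=20, batch_size=32, verbose=0)

def is_anomalous(request: str, threshold: float = 3.0) -> bool:
    x = ((featurize(request) - mean) / std)[None, :]
    err = float(np.mean((autoencoder.predict(x, verbose=0) - x) ** 2))
    return err > threshold    # in practice, chosen on held-out validation traffic

print(is_anomalous("GET /index.html"))
print(is_anomalous("GET /search?q=' OR 1=1 -- " + "A" * 500))
```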
In the era of mass agriculture, keeping up with the increasing demand for food production requires advanced monitoring systems that can handle several challenges, such as perishable products, food waste, unpredictable supply variations, and stringent food safety and sustainability requirements. The evolution of the Internet of Things has provided means for collecting, processing, and communicating data associated with agricultural processes. This has opened several opportunities to sustain and improve productivity and to reduce waste at every step of the food supply chain. On the other hand, it has also introduced several new challenges, such as data security, the recording and representation of data, providing real-time control, system reliability, and dealing with big data. This paper proposes an architecture for securing big data in an agricultural supply chain management system, which can help reduce food waste, increase the reliability of the supply chain, and enhance the performance of the food supply chain system.
Searchable encryption will become more important as medical services intensify their use of big data and artificial intelligence. To use searchable encryption safely, the resistance of terminals with embedded searchable encryption to illegal attacks (tamper resistance) is extremely important. This study proposes a searchable encryption system embedded in terminals and evaluates its tamper resistance. The study also proposes attack scenarios and quantitatively evaluates the tamper resistance of the proposed system through experiments that follow those scenarios.
Big data collected by IoT devices is utilized in various businesses, and for security and privacy some of this data must be encrypted. IoT devices performing encryption require not only tamper resistance but also low latency and low power. PRINCE is one of the lowest-latency lightweight ciphers. A glitch canceller reduces power consumption, but it also affects tamper resistance. Therefore, this study evaluates the tamper resistance of dedicated hardware with a glitch canceller for PRINCE using statistical power analysis and a t-test. The evaluation experiments were performed on a field-programmable gate array (FPGA), and the results revealed the vulnerability of the dedicated hardware implementation with a glitch canceller.
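The paper's exact procedure is not reproduced here, but a typical statistical power-analysis step of this kind is a TVLA-style Welch's t-test between two sets of power traces (e.g., fixed versus random plaintexts). A minimal sketch under those assumptions, with synthetic traces rather than the paper's FPGA measurements:

```python
# Sketch of a TVLA-style leakage assessment with Welch's t-test on power traces.
# The trace data here is synthetic; real measurements would come from the device.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_traces, n_samples = 5000, 200

# Group A: fixed plaintext, Group B: random plaintexts.
traces_fixed = rng.normal(0.0, 1.0, size=(n_traces, n_samples))
traces_random = rng.normal(0.0, 1.0, size=(n_traces, n_samples))
# Inject a small data-dependent leak at sample 120 to illustrate detection.
traces_fixed[:, 120] += 0.15

# Welch's t-test (unequal variances) at every sample point.
t_stat, _ = stats.ttest_ind(traces_fixed, traces_random, axis=0, equal_var=False)

# |t| > 4.5 is the conventional TVLA threshold for "leakage detected".
leaky_points = np.where(np.abs(t_stat) > 4.5)[0]
print("samples exceeding |t| = 4.5:", leaky_points)
```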
Big data processing systems are becoming increasingly present in cloud workloads. Consequently, they are starting to incorporate more sophisticated mechanisms from traditional database and distributed systems. In this work we focus on caching policies, which for big data raise important new challenges. Not only must they respond to new variants of the trade-off between hit rate, response time, and the space consumed by the cache, but they must do so at possibly higher volume and velocity than web and database workloads. Previous caching policies have not been tested experimentally with big data workloads. We address these challenges in this work. We propose the Read Density family of policies, a principled approach to quantifying the utility of cached objects through a family of utility functions that depend on the frequency of reads of an object. We further design the Approximate Histogram, a technique based on an array of counters that promises runtime- and space-efficient computation of the metric required by the cache policy. Through trace-based simulation we evaluate the caching policies from the Read Density family and compare them with over ten state-of-the-art alternatives, using two workload traces representative of big data processing collected from commercial Spark and MapReduce deployments. While we achieve performance comparable to the state of the art with fewer parameters, meaningful performance improvements for big data workloads remain elusive.
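The exact Read Density utility functions and the Approximate Histogram are defined in the paper; purely to illustrate the general idea, the sketch below evicts the object with the lowest reads-per-byte utility, with read frequencies kept approximately in a fixed array of hashed counters. All names and the utility formula here are illustrative assumptions.

```python
# Illustration only: a frequency-per-byte eviction policy backed by an
# approximate, fixed-size array of counters (not the paper's exact design).
import hashlib

class ApproxCounters:
    """Counts object reads in a fixed array, trading accuracy for bounded space."""
    def __init__(self, n_counters=1024):
        self.counters = [0] * n_counters

    def _slot(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % len(self.counters)

    def increment(self, key: str) -> None:
        self.counters[self._slot(key)] += 1

    def estimate(self, key: str) -> int:
        return self.counters[self._slot(key)]

class ReadFrequencyCache:
    def __init__(self, capacity_bytes: int):
        self.capacity, self.used = capacity_bytes, 0
        self.store = {}                      # key -> (value, size)
        self.freq = ApproxCounters()

    def get(self, key):
        self.freq.increment(key)
        return self.store.get(key, (None, 0))[0]

    def put(self, key, value, size):
        while self.used + size > self.capacity and self.store:
            # Evict the object with the lowest estimated reads per byte.
            victim = min(self.store, key=lambda k: self.freq.estimate(k) / self.store[k][1])
            self.used -= self.store.pop(victim)[1]
        self.store[key] = (value, size)
        self.used += size
```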
Before accessing Internet websites or applications, network users first query the Domain Name System (DNS) for the corresponding IP address, and the user's browser or application then accesses the required resources through that IP address. The DNS server log keeps records of all users' queries. This paper analyzes user network access behavior by analyzing the DNS logs of a campus network and constructing a behavior fingerprint model for each user. Fingerprints of different users, and even of the same user in different periods, can be used to determine whether a user's access is abnormal or safe and whether the user's machine is infected with malicious code. Once abnormal access behavior is detected, the spread of viruses, Trojans, bots, and attacks can be prevented with corresponding techniques, further protecting users' network access security. Finally, an analysis of user behavior fingerprints for campus network access is conducted.
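The fingerprint model itself is not specified in the abstract. One simple way to approximate the idea is a per-user distribution over queried domains, compared across time windows with cosine similarity; the domain names, window length, and the 0.5 threshold below are illustrative assumptions, not the paper's method.

```python
# Illustrative sketch: per-user DNS "fingerprints" as normalized domain-frequency
# vectors, compared between periods to flag unusual access behaviour.
import math
from collections import Counter

def fingerprint(dns_queries):
    """dns_queries: iterable of domain names queried by one user in one period."""
    counts = Counter(dns_queries)
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}

def cosine_similarity(fp_a, fp_b):
    dot = sum(fp_a[d] * fp_b.get(d, 0.0) for d in fp_a)
    norm = (math.sqrt(sum(v * v for v in fp_a.values()))
            * math.sqrt(sum(v * v for v in fp_b.values())))
    return dot / norm if norm else 0.0

baseline = fingerprint(["mail.campus.edu", "lib.campus.edu", "news.example.com"] * 30)
today    = fingerprint(["mail.campus.edu"] * 5 + ["botnet-c2.example.net"] * 80)

# A low similarity to the user's own baseline suggests abnormal (possibly infected) behaviour.
if cosine_similarity(baseline, today) < 0.5:
    print("user access pattern deviates from baseline fingerprint")
```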
In network security risk assessment of the critical information infrastructure of a smart city, describing attack vectors in order to predict possible initial access is a challenging task. In this paper, an attack vector evaluation model based on weakness, path, and action is proposed, and a formal representation and quantitative evaluation method are given. This method can support the assessment of attack vectors based on both known and unknown weaknesses through the combination of dependency conditions. In addition, defense factors are introduced, an attack vector evaluation model integrating defense is proposed, and an application example of the model is given. The research in this paper can serve as a reference for the vulnerability assessment of attack vectors.
In enterprise environments, the number of managed assets and exploitable vulnerabilities is staggering. Hackers' lateral movements between such assets generate a complex big data graph that contains potential hacking paths. In this vision paper, we enumerate risk-reduction security requirements in large-scale environments, then present the Agile Security methodology and technologies for detection, modeling, and constant prioritization of security requirements, agile style. Agile Security models different types of security requirements in the context of an attack graph containing business process targets, critical asset identification, configuration items, and possible impacts of cyber-attacks. By simulating and analyzing virtual adversary attack paths toward cardinal assets, Agile Security examines the impact on business processes and prioritizes surgical requirements. Handling this requirements backlog, which is constantly re-evaluated as an outcome of employing Agile Security, gradually increases system hardening, reduces business risk, and tells the IT service desk or Security Operation Center which remediation action to perform next. Once remediation is performed, Agile Security recomputes the residual risk, weighing risk increases from threat intelligence or infrastructure changes against the defender's remediation actions, in order to drive overall attack surface reduction.
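Agile Security's models are not spelled out in the abstract. The toy sketch below merely illustrates the general pattern of simulating adversary paths over an asset graph and prioritizing the remediation that removes the riskiest reachable path to a critical asset; the asset names, exploit likelihoods, and scoring rule are assumptions for illustration.

```python
# Toy illustration of attack-path simulation over an asset graph; the actual
# Agile Security modeling and prioritization are not reproduced here.
import networkx as nx

g = nx.DiGraph()
# Edges are possible lateral movements, weighted by an assumed exploit likelihood.
g.add_edge("internet", "web-server", p=0.6)
g.add_edge("web-server", "app-server", p=0.4)
g.add_edge("app-server", "db-server", p=0.7)   # db-server holds the critical business data
g.add_edge("internet", "vpn-gateway", p=0.2)
g.add_edge("vpn-gateway", "db-server", p=0.5)

def path_risk(graph, path):
    """Risk of one path = product of exploit likelihoods along its edges."""
    risk = 1.0
    for u, v in zip(path, path[1:]):
        risk *= graph[u][v]["p"]
    return risk

paths = list(nx.all_simple_paths(g, "internet", "db-server"))
for path in sorted(paths, key=lambda p: path_risk(g, p), reverse=True):
    print(f"risk={path_risk(g, path):.2f}  path={' -> '.join(path)}")

# A simple prioritization heuristic: remediate the first hop of the riskiest path.
riskiest = max(paths, key=lambda p: path_risk(g, p))
print("remediate first:", riskiest[0], "->", riskiest[1])
```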
The technological development of the energy sector has also produced complex data. In this study, the relationship between smart grid and big data approaches is investigated. After analyzing which areas of the smart grid system use big data technologies, attention turns to big data technologies for detecting attacks on the smart grid. Big data analytics can produce efficient solutions, and choosing which algorithms and metrics to use is especially important. For this reason, an application prototype that uses a big data method to detect attacks on the smart grid is proposed. Random forests achieved the highest accuracy at 92%, compared with 87% for decision trees.
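The prototype's pipeline is not detailed in the abstract. A minimal comparison of the two reported learners might look like the following sketch, where the dataset, features, and train/test split are stand-ins; the paper presumably runs on a big data platform rather than a single machine.

```python
# Stand-in sketch: comparing random forest and decision tree accuracy for
# attack detection; the smart grid dataset itself is not included here.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic labelled measurements (1 = attack, 0 = normal) as a placeholder.
X, y = make_classification(n_samples=20_000, n_features=30, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("random forest", RandomForestClassifier(n_estimators=100, random_state=0)),
                    ("decision tree", DecisionTreeClassifier(random_state=0))]:
    model.fit(X_tr, y_tr)
    print(f"{name}: accuracy = {accuracy_score(y_te, model.predict(X_te)):.2%}")
```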
Currently, organisations find it difficult to design a Decision Support System (DSS) that can predict various operational risks, such as financial and quality issues, even though operational risks are responsible for significant economic losses and damage to an organisation's reputation in the market. This paper proposes a new DSS for risk assessment, called the Fuzzy Inference DSS (FIDSS) mechanism, which uses fuzzy inference methods based on an organisation's big data collection. It includes the Emerging Association Patterns (EAP) technique, which identifies the important features of each risk event. Then, the Mamdani fuzzy inference technique and several membership functions are evaluated using the firm's data sources. The FIDSS mechanism can enhance an organisation's decision-making processes by quantifying the severity of a risk as low, medium, or high. When it automatically predicts a medium or high level, it assists organisations in taking further actions to reduce this severity.
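The FIDSS rule base and membership functions are specific to the organisation's data. As a hedged illustration of fuzzy inference in the Mamdani spirit (with defuzzification simplified to a weighted average of representative severity values), low/medium/high severity could be derived from a single risk indicator roughly as follows; the feature, membership function parameters, and severity centers are all assumptions.

```python
# Illustration of fuzzy severity assessment with triangular membership functions;
# the actual FIDSS rules and features come from the firm's big data collection.
def tri(x, a, b, c):
    """Triangular membership function peaking at b, zero outside [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def assess_severity(loss_ratio):
    """loss_ratio in [0, 1]: e.g. predicted financial loss relative to budget."""
    # Fuzzification of the input into low/medium/high memberships.
    memberships = {
        "low":    tri(loss_ratio, -0.01, 0.0, 0.4),
        "medium": tri(loss_ratio, 0.2, 0.5, 0.8),
        "high":   tri(loss_ratio, 0.6, 1.0, 1.01),
    }
    # Simplified defuzzification: weighted average of representative severity values.
    centers = {"low": 0.2, "medium": 0.5, "high": 0.85}
    total = sum(memberships.values()) or 1.0
    score = sum(memberships[k] * centers[k] for k in memberships) / total
    label = max(memberships, key=memberships.get)
    return label, score

print(assess_severity(0.15))   # expected: ('low', ...)
print(assess_severity(0.72))   # expected: ('high', ...) -> triggers further action
```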