Biblio
As an information hub for various trades and professions in the era of big data, the cloud data center bears the responsibility of providing uninterrupted service. To cope with the impact of failures and interruptions during operation on the Quality of Service (QoS), it is important to guarantee the resilience of the cloud data center. Thus, different resilience actions are conducted over its life cycle; together they constitute the resilience strategy. To measure the effect of a resilience strategy on system resilience, this paper proposes a new approach to model and evaluate resilience strategies for cloud data centers, focusing on the core part of service provision: the IT architecture. A comprehensive resilience metric based on resilience loss is put forward, taking the characteristics of the cloud data center into account. Furthermore, a mapping model between system resilience and the resilience strategy is built. Then, based on a hierarchical colored generalized stochastic Petri net (HCGSPN) model depicting how the system processes service requests, simulation is conducted to evaluate the resilience strategy through the metric calculation. A case study of a company's cloud data center demonstrates the applicability and correctness of the approach.
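The abstract does not reproduce the metric itself, but a common way to quantify resilience loss is the area between the target and the observed QoS over the disruption window. The sketch below is only an illustration of that generic idea; the sample values and the normalization are assumptions, not the authors' formulation.

```python
import numpy as np

# Hypothetical QoS samples around a disruption (fraction of requests meeting the SLA).
t = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype=float)        # hours
q = np.array([1.0, 1.0, 0.4, 0.5, 0.7, 0.9, 1.0, 1.0, 1.0])   # observed QoS
q_target = 1.0                                                 # nominal QoS level

# Resilience loss: area between target and observed QoS (trapezoidal rule).
loss = np.sum((q_target - 0.5 * (q[1:] + q[:-1])) * np.diff(t))

# Normalized resilience in [0, 1]; 1 means no loss over the observation window.
resilience = 1.0 - loss / (q_target * (t[-1] - t[0]))
print(f"resilience loss = {loss:.2f} QoS-hours, resilience = {resilience:.3f}")
```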
Image retrieval systems have been an active area of research for more than thirty years, progressively producing improved algorithms that raise performance metrics, operate in different domains, take advantage of different features extracted from the images to be retrieved, and offer different desirable invariance properties. With the ever-growing visual databases of images and videos produced by a myriad of devices comes the challenge of selecting effective features and performing fast retrieval over such databases. In this paper, we combine Fourier descriptors (FDs) with a metric-based balanced indexing tree as a viable solution to the Department of Homeland Security (DHS) need for quick identification and retrieval of weapon images. The FDs allow a simple but effective outline-based representation of an object, while the M-tree provides a dynamic, fast, and balanced search over such features. Motivated by applications of interest to DHS, we have created a basic database of guns and rifles that can be used to identify weapons in images and videos extracted from media sources. Our simulations show excellent performance in both representation quality and retrieval speed.
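A minimal sketch of how contour-based Fourier descriptors can be computed with NumPy; the normalization steps and the number of coefficients are generic choices, not necessarily those used by the authors.

```python
import numpy as np

def fourier_descriptors(contour_xy, n_coeffs=16):
    """Compute a compact, similarity-invariant Fourier descriptor of a closed contour.

    contour_xy: (N, 2) array of boundary points sampled along the object outline.
    """
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]   # complex boundary signal
    coeffs = np.fft.fft(z)
    coeffs[0] = 0                                  # drop DC term -> translation invariance
    coeffs = coeffs / np.abs(coeffs[1])            # scale by first harmonic -> scale invariance
    return np.abs(coeffs[1:n_coeffs + 1])          # magnitudes -> rotation/start-point invariance

# Toy example: a circle sampled at 128 boundary points.
theta = np.linspace(0, 2 * np.pi, 128, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
print(fourier_descriptors(circle)[:4])
```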
Technological developments in the energy sector, while offering new business insights, also produce complex data. In this study, the relationship between smart grid and big data approaches has been investigated. After analyzing which areas of smart grid systems use big data techniques and technologies, we focus on the big data technologies used to detect attacks on smart grids. Big data analytics produces efficient solutions, but choosing the right algorithm and metric is critical. For this reason, an application prototype has been proposed that uses big data approaches to detect attacks on smart grids. The highest accuracies were obtained with Random Forest at 92% and Decision Tree at 87%.
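The abstract does not describe the dataset or preprocessing; the sketch below only illustrates how the two reported classifiers might be trained and scored, using a synthetic dataset as a stand-in for labelled smart-grid measurements.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for labelled smart-grid traffic (attack vs. normal).
X, y = make_classification(n_samples=5000, n_features=20, n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [("Random Forest", RandomForestClassifier(n_estimators=100, random_state=0)),
                  ("Decision Tree", DecisionTreeClassifier(random_state=0))]:
    clf.fit(X_tr, y_tr)
    print(name, "accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```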
Web application technologies are growing rapidly with continuous innovation and improvements. This paper focuses on the popular Spring Boot [1] Java-based framework for building web and enterprise applications and how it provides the flexibility for a service-oriented architecture (SOA). One challenge with any Spring-based application is its level of configuration complexity. Spring Boot makes it easy to create and deploy stand-alone, production-grade Spring applications with very little Spring configuration. For example, if we consider the Spring Model-View-Controller (MVC) framework [2], we need to configure the dispatcher servlet, web jars, a view resolver, and component scanning, among other things. To solve this, Spring Boot provides several auto-configuration options to set up the application with any needed dependencies. Another challenge is identifying the framework dependencies and associated library versions required to develop a web application. Spring Boot offers simpler dependency management by bundling a comprehensive but flexible framework and the associated libraries into a single dependency, which provides all the Spring-related technology needed for starter projects such as CRUD web applications. The framework provides a range of additional features that are common across many projects, such as an embedded server, security, metrics, health checks, and externalized configuration. Web applications are generally packaged as a war and deployed to a web server, but a Spring Boot application can be packaged either as a war or a jar file, which allows the application to run without installing or configuring an application server. In this paper, we discuss how the Atmospheric Radiation Measurement (ARM) Data Center (ADC) at Oak Ridge National Laboratory is using Spring Boot to create an SOA-based REST [4] service API that bridges the gap between frontend user interfaces and the backend database. Using this REST service API, ARM scientists are now able to submit reports via a user form or a command line interface, which captures data quality issues or other important information about ARM data.
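The concrete ADC endpoints, payload fields, and authentication are not given in the abstract; the snippet below is only a hypothetical example of how a command line submission to such a REST API could look (the URL and field names are invented for illustration).

```python
import requests

# Hypothetical endpoint and payload shape; the real ADC API paths and fields differ.
API_URL = "https://adc.example.gov/api/reports"

report = {
    "instrument": "sgpmetE13",          # illustrative instrument code
    "qualityFlag": "suspect",           # illustrative data-quality assessment
    "comment": "Sensor drift observed between 2019-06-01 and 2019-06-03.",
}

resp = requests.post(API_URL, json=report, timeout=30)
resp.raise_for_status()
print("Created report:", resp.json())
```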
The Agave Platform first appeared in 2011 as a pilot project for the iPlant Collaborative [11]. In its first two years, Foundation saw over 40% growth per month, supporting 1000+ clients, 600+ applications, and 4 HPC systems at 3 centers across the US. It also gained users outside of plant biology. To better serve the needs of the general open science community, we rewrote Foundation as a scalable, cloud-native application and named it the Agave Platform. In this paper we present the Agave Platform, a Science-as-a-Service (ScaaS) platform for reproducible science. We provide a brief history and technical overview of the project and highlight three case studies that leverage the platform to create synergistic value for their users.
With the rapid development of computer science, the Internet, and information technology, the scale of network applications is constantly expanding and the data volume is increasing day by day. The demand for data processing therefore needs to be met urgently, and cloud computing and big data technology emerged as products of the development of computer networks. However, the accompanying processes of data collection, storage, and use face many security and privacy risks. How to protect the security and privacy of cloud data has become one of the urgent problems to be solved. Aiming at the security and privacy of data in the cloud computing environment, this work ensures data security from two aspects: the storage scheme and the encryption mode of the cloud data.
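The specific storage scheme and encryption mode are not detailed in the abstract; as a generic illustration of client-side encryption before data leaves for cloud storage, here is a minimal sketch using the cryptography package's Fernet recipe (the workflow shown is an assumption, not the paper's scheme).

```python
from cryptography.fernet import Fernet

# Client-side symmetric encryption before the data ever reaches cloud storage.
key = Fernet.generate_key()        # must be kept by the data owner, not the provider
cipher = Fernet(key)

plaintext = b"sensitive customer record"
ciphertext = cipher.encrypt(plaintext)   # this is what gets uploaded to the cloud

# Later, after downloading the object back from the cloud:
assert cipher.decrypt(ciphertext) == plaintext
```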
Numerous authorization models have been proposed for relational databases. On the other hand, several NoSQL databases used in Big Data applications use a new model appropriate to their requirements for structure, speed, and large amounts of data. This model protects each individual cell in key-value databases by labeling it with authorization rights, following a Role-Based Access Control (RBAC) model or similar. We present here a pattern to describe this model as it exists in several Big Data systems.
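A minimal sketch of the cell-level labeling idea: each value carries an authorization label, and a read succeeds only if the caller holds a matching role. The classes, keys, and role names are invented for illustration; production systems such as Apache Accumulo implement this pattern natively.

```python
# Illustrative sketch of cell-level, role-based authorization in a key-value store.

class LabeledCell:
    def __init__(self, value, allowed_roles):
        self.value = value
        self.allowed_roles = set(allowed_roles)   # authorization label attached to this cell

class CellStore:
    def __init__(self):
        self._cells = {}                          # key -> LabeledCell

    def put(self, key, value, allowed_roles):
        self._cells[key] = LabeledCell(value, allowed_roles)

    def get(self, key, user_roles):
        cell = self._cells[key]
        if cell.allowed_roles & set(user_roles):  # any matching role grants read access
            return cell.value
        raise PermissionError(f"no role authorizes reading {key!r}")

store = CellStore()
store.put(("patient:42", "diagnosis"), "hypertension", {"doctor"})
print(store.get(("patient:42", "diagnosis"), {"doctor", "billing"}))
```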
The alarming rate of big data usage in the cloud makes data easily exposed. The cloud, which consists of many servers linked to each other, is used for data storage. Since it is owned by third parties, the security of the cloud needs to be examined, and the risks of storing data in the cloud need to be checked further for their severity. There should be a way to assess these risks. Thus, the objective of this paper is to use a systematic literature review (SLR) to build an extensive background of the literature on risk assessment for big data in cloud computing environments from the perspectives of security, privacy, and trust.
This paper presents a review on how to benefit from software-defined networking (SDN) to enhance smart grid security. For this purpose, the attacks threatening traditional smart grid systems are classified according to availability, integrity, and confidentiality, which are the main cyber-security objectives. The traditional smart grid architecture is redefined with SDN, and a conceptual model for SDN-based smart grid systems is proposed. SDN-based solutions to the mentioned security threats are also classified and evaluated. Our conclusions suggest that SDN helps improve smart grid security by providing real-time monitoring, programmability, wide-area security management, fast recovery from failures, distributed security, and smart decision making based on big data analytics.
Data security has become a very serious part of any organizational information system. More and more threats across the Internet have evolved that are capable of deceiving firewalls as well as antivirus software. In addition, the number of attacks has grown larger and become more difficult for firewalls or antivirus software to process. System security is usually improved by adding an Intrusion Detection System (IDS), which comes in two forms: anomaly-based detection and signature-based detection. In this research, Big Data techniques are used to process the huge amount of data. Anomaly-based detection using the Learning Vector Quantization (LVQ) algorithm is proposed to detect the attacks. Learning Vector Quantization is a neural network technique that learns from the input and then gives the appropriate output for it. Modifications were made to improve test accuracy by varying the test parameters present in LVQ. Varying the learning rate, the number of epochs, and the k-fold cross validation resulted in more efficient output. The output is obtained by calculating information retrieval metrics from the confusion matrix for each attack class. Principal Component Analysis (PCA) is used along with Learning Vector Quantization to improve system performance by reducing the data dimensionality. By using 18 principal components, the dataset was successfully reduced by 47.3%, with a best recognition rate of 96.52% and a time efficiency improvement of up to 43.16%.
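scikit-learn does not ship an LVQ implementation, so the sketch below pairs its PCA with a hand-rolled LVQ1 on synthetic data; the choices (one prototype per class, 18 components, learning-rate decay) only mirror the spirit of the abstract, not the authors' exact setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

def lvq1_fit(X, y, lr=0.1, epochs=20, seed=0):
    """Minimal LVQ1: prototypes are pulled toward same-class samples, pushed away otherwise."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    protos = np.array([X[y == c][rng.integers((y == c).sum())] for c in classes])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            j = np.argmin(np.linalg.norm(protos - X[i], axis=1))   # nearest prototype
            sign = 1.0 if classes[j] == y[i] else -1.0
            protos[j] += sign * lr * (X[i] - protos[j])
        lr *= 0.9                                                   # decay the learning rate
    return protos, classes

def lvq1_predict(X, protos, classes):
    d = np.linalg.norm(X[:, None, :] - protos[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

# Synthetic stand-in for the intrusion dataset, reduced to 18 principal components.
X, y = make_classification(n_samples=2000, n_features=38, n_informative=10, random_state=0)
X = PCA(n_components=18).fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
protos, classes = lvq1_fit(X_tr, y_tr)
print("recognition rate:", (lvq1_predict(X_te, protos, classes) == y_te).mean())
```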
Recently, cloud computing has emerged alongside big data, and the two technologies go hand in hand. Due to the enormous size of big data, it is impossible to store it in local storage; even if we wanted to store it locally, we would have to spend a great deal of money to build a big data center. One way to save money is to store big data with a cloud storage service, which provides users with space and security to store files. However, relying on a single cloud storage provider (CSP) can cause trouble for the customer: a CSP may stop its service at any time, so it is too risky if a data owner hosts a file with only a single CSP. Moreover, the CSP is a third party that the user has to trust without verification. After deploying a file to a CSP, the user does not know who accesses it. Even though a CSP provides security mechanisms to prevent outsider attacks, how does the user ensure that there is no insider attack to steal or corrupt the file? This research proposes a way to minimize the risk and to ensure data privacy and access control. The big data file is split into chunks and distributed to multiple cloud storage providers, so even if there is an insider attack, the attacker obtains only part of the file and cannot reconstruct the whole file. After splitting the file, metadata is generated to keep the chunk information, including chunk locations, access paths, and the data owner's username and password for connecting to each CSP. An asymmetric security concept is applied: the metadata is encrypted and transferred to the user who requests access to the file. File access, monitoring, and metadata transfer are functions of dew computing, an intermediate server between the users and the cloud service.
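A rough sketch of the chunking and metadata-encryption idea, assuming a hybrid scheme in which a fresh symmetric key encrypts the metadata and the requesting user's RSA public key encrypts that key; the chunk size, CSP assignment, and field names are invented for illustration.

```python
import json
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

CHUNK_SIZE = 4 * 1024 * 1024   # 4 MiB per chunk (illustrative)

def split_file(path):
    """Split a file into chunks and record where each chunk would be stored."""
    chunks = []
    with open(path, "rb") as f:
        for i, data in enumerate(iter(lambda: f.read(CHUNK_SIZE), b"")):
            # In the real scheme, each chunk is uploaded to a different CSP here.
            chunks.append({"chunk": f"part{i}", "size": len(data), "csp": f"csp-{i % 3}"})
    return chunks

metadata = json.dumps(split_file(__file__)).encode()

# Hybrid asymmetric concept: a fresh symmetric key encrypts the metadata,
# and the requesting user's RSA public key encrypts that symmetric key.
sym_key = Fernet.generate_key()
encrypted_metadata = Fernet(sym_key).encrypt(metadata)

user_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
encrypted_sym_key = user_key.public_key().encrypt(
    sym_key,
    padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                 algorithm=hashes.SHA256(), label=None),
)
```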
The analysis of security-related event logs is an important step in the investigation of cyber-attacks. It allows tracing malicious activities and lets a security operator find out what has happened. However, since IT landscapes are growing in size and diversity, the amount of events and their highly different representations are becoming a Big Data challenge. Unfortunately, current solutions for the analysis of security-related events, so-called Security Information and Event Management (SIEM) systems, are not able to keep up with the load. In this work, we propose a distributed SIEM platform that makes use of highly efficient distributed normalization and persists event data into an in-memory database. We implement the normalization on common distribution frameworks, i.e., Spark, Storm, Trident, and Heron, and compare their performance with our custom-built distribution solution. Additionally, different tuning options are introduced and their speed advantages are presented. In the end, we show how the writing into an in-memory database can be tuned to achieve optimal persistence speed. Using the proposed approach, we are able to not only fully normalize, but also persist more than 20 billion events per day with relatively small client hardware. Therefore, we are confident that our approach can handle the event load of even very large IT landscapes.
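A toy illustration of the normalization step on Spark (via pyspark); the log format, regular expression, and target schema are assumptions, and the persistence into an in-memory database is only indicated by a comment.

```python
import re
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.appName("siem-normalization").getOrCreate()

# Toy raw events; real input would arrive from log collectors or a message queue.
raw = spark.sparkContext.parallelize([
    "Jun 14 15:16:01 host1 sshd[19939]: Failed password for root from 10.0.0.5 port 4242",
    "Jun 14 15:16:09 host2 sshd[20021]: Accepted password for alice from 10.0.0.9 port 5353",
])

PATTERN = re.compile(r"^(?P<ts>\w+ +\d+ [\d:]+) (?P<host>\S+) (?P<proc>[^:]+): (?P<msg>.*)$")

def normalize(line):
    """Map a raw log line into a common event schema (None if it does not parse)."""
    m = PATTERN.match(line)
    return Row(timestamp=m["ts"], host=m["host"], process=m["proc"], message=m["msg"]) if m else None

events = spark.createDataFrame(raw.map(normalize).filter(lambda r: r is not None))
events.show(truncate=False)
# events.write.format(...)  # from here, the normalized events would be persisted into an in-memory DB
```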
The need for security continues to grow in distributed computing. Today's security solutions require greater scalability and convenience in cloud-computing architectures, in addition to the ability to store and process larger volumes of data to address very sophisticated attacks. This paper explores some of the existing architectures for big data intelligence analytics and proposes an architecture that promises to provide greater security for data-intensive environments. The architecture is designed to leverage the wealth of multi-source information for security intelligence.
The attack graph approach is a common tool for the analysis of network security. However, the analysis of attack graphs can be complicated and difficult depending on the attack graph size. This paper presents an approximate analysis approach for attack graphs based on Q-learning. First, we employ multi-host, multi-stage vulnerability analysis (MulVAL) to generate an attack graph for a given network topology. Then we refine the attack graph and generate a simplified graph called a transition graph. Next, we use a Q-learning model to find possible attack routes that an attacker could use to compromise the security of the network. Finally, we evaluate the approach by applying it to a typical IT network scenario with specific services, network configurations, and vulnerabilities.
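A compact Q-learning sketch over a toy transition graph; the graph, rewards, and hyperparameters are invented and only illustrate the route-finding idea, not the paper's model or MulVAL output.

```python
import random

# Toy transition graph distilled from an attack graph: node 0 is the attacker's
# entry point and node 4 is the critical asset. Edges are exploitable transitions.
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
GOAL, EPISODES, ALPHA, GAMMA, EPSILON = 4, 500, 0.5, 0.9, 0.2

Q = {s: {a: 0.0 for a in nxt} for s, nxt in graph.items()}

for _ in range(EPISODES):
    s = 0
    while s != GOAL:
        # epsilon-greedy choice among outgoing edges
        a = (random.choice(graph[s]) if random.random() < EPSILON
             else max(Q[s], key=Q[s].get))
        r = 100.0 if a == GOAL else -1.0                      # reward for reaching the asset
        best_next = max(Q[a].values()) if Q[a] else 0.0
        Q[s][a] += ALPHA * (r + GAMMA * best_next - Q[s][a])  # Q-learning update
        s = a

# Greedy policy after training = the most attractive attack route.
s, route = 0, [0]
while s != GOAL:
    s = max(Q[s], key=Q[s].get)
    route.append(s)
print("learned attack route:", route)
```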
Feature extraction and feature selection are the first tasks in the pre-processing of input logs for detecting cybersecurity threats and attacks with data mining techniques from the field of Artificial Intelligence. When it comes to the analysis of heterogeneous data derived from different sources, these tasks are time-consuming and difficult to manage efficiently. In this paper, we present an approach for handling feature extraction and feature selection using machine learning algorithms for security analytics of heterogeneous data derived from different network sensors. The approach is implemented in Apache Spark, using its Python API, pyspark.
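As an illustrative sketch of the two steps in pyspark, the snippet below assembles sensor-derived columns into a feature vector and applies chi-squared feature selection; the toy data, column names, and the choice of ChiSqSelector are assumptions, not necessarily the authors' pipeline.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler, ChiSqSelector

spark = SparkSession.builder.appName("feature-selection").getOrCreate()

# Toy records standing in for features already extracted from different network sensors.
df = spark.createDataFrame(
    [(0.0, 12.0, 1.0, 0.0, 0), (1.0, 1500.0, 0.0, 1.0, 1), (0.0, 40.0, 1.0, 0.0, 0)],
    ["duration", "bytes", "tcp", "udp", "label"],
)

# Feature extraction: assemble the raw columns into a single feature vector.
assembler = VectorAssembler(inputCols=["duration", "bytes", "tcp", "udp"], outputCol="features")
assembled = assembler.transform(df)

# Feature selection: keep the k features most associated with the label (chi-squared test).
selector = ChiSqSelector(numTopFeatures=2, featuresCol="features",
                         labelCol="label", outputCol="selected")
selector.fit(assembled).transform(assembled).select("selected", "label").show(truncate=False)
```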