
Found 478 results

Filters: Keyword is Big Data
2019-03-22
Liu, Y., Li, X., Xiao, L.  2018.  Service Oriented Resilience Strategy for Cloud Data Center. 2018 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C). :269-274.

As an information hinge for various trades and professions in the era of big data, the cloud data center bears the responsibility of providing uninterrupted service. To cope with the impact of failures and interruptions on the Quality of Service (QoS) during operation, it is important to guarantee the resilience of the cloud data center. Thus, different resilience actions are conducted over its life cycle; together, they constitute a resilience strategy. In order to measure the effect of a resilience strategy on system resilience, this paper proposes a new approach to model and evaluate the resilience strategy for a cloud data center, focusing on its core service-providing part, the IT architecture. A comprehensive resilience metric based on resilience loss is put forward, considering the characteristics of the cloud data center. Furthermore, a mapping model between system resilience and resilience strategy is built. Then, based on a hierarchical colored generalized stochastic Petri net (HCGSPN) model depicting how the system processes service requests, a simulation is conducted to evaluate the resilience strategy through the metric calculation. With a case study of a company's cloud data center, the applicability and correctness of the approach are demonstrated.
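
The paper's metric builds on resilience loss. As a rough illustration of that family of metrics (not necessarily the authors' exact formula), the sketch below treats resilience loss as the area between a QoS target and the delivered QoS over the evaluation window; the QoS trace is invented.

```python
import numpy as np

# Hypothetical QoS trace: 1.0 = full service; the dip marks a failure episode.
t = np.linspace(0, 10, 11)  # hours
qos = np.array([1.0, 1.0, 0.6, 0.4, 0.7, 0.9, 1.0, 1.0, 0.8, 0.95, 1.0])
target = 1.0

# Resilience loss: area between target and delivered QoS over the window.
loss = np.trapz(target - qos, t)

# Normalized resilience in [0, 1]; 1 means no loss over the window.
resilience = 1.0 - loss / (target * (t[-1] - t[0]))
print(f"resilience loss = {loss:.2f}, resilience = {resilience:.3f}")
```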

Quweider, M., Lei, H., Zhang, L., Khan, F.  2018.  Managing Big Data in Visual Retrieval Systems for DHS Applications: Combining Fourier Descriptors and Metric Space Indexing. 2018 1st International Conference on Data Intelligence and Security (ICDIS). :188-193.

Image retrieval systems have been an active area of research for more than thirty years, progressively producing improved algorithms that raise performance metrics, operate in different domains, exploit different features extracted from the images to be retrieved, and offer different desirable invariance properties. With the ever-growing visual databases of images and videos produced by a myriad of devices comes the challenge of selecting effective features and performing fast retrieval over such databases. In this paper, we incorporate Fourier descriptors (FD) along with a metric-based balanced indexing tree as a viable solution to the DHS (Department of Homeland Security) need for quick identification and retrieval of weapon images. The FDs allow a simple but effective outline feature representation of an object, while the M-tree provides a dynamic, fast, and balanced search over such features. Motivated by applications of interest to DHS, we have created a basic database of gun and rifle images that can be used to identify weapons in images and videos extracted from media sources. Our simulations show excellent performance in both representation quality and retrieval speed.
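
For a closed contour, Fourier descriptors are the DFT coefficients of the complex boundary sequence; dropping the DC term, keeping magnitudes, and dividing by the first coefficient gives translation, rotation, and scale invariance. A minimal sketch of that representation (the paper additionally indexes the signatures with an M-tree, which is not shown):

```python
import numpy as np

def fourier_descriptors(contour_xy, k=16):
    """Invariant outline signature from an (N, 2) array of boundary points."""
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]  # complex boundary sequence
    coeffs = np.fft.fft(z)[1:k + 1]  # drop DC term -> translation invariance
    mags = np.abs(coeffs)            # drop phase -> rotation invariance
    return mags / mags[0]            # normalize -> scale invariance

# Toy comparison: Euclidean distance between signatures is a metric,
# which is exactly what an M-tree needs for indexing.
theta = np.linspace(0, 2 * np.pi, 128, endpoint=False)
circle = np.c_[np.cos(theta), np.sin(theta)]
ellipse = np.c_[2 * np.cos(theta), np.sin(theta)]
print(np.linalg.norm(fourier_descriptors(circle) - fourier_descriptors(ellipse)))
```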

Terzi, D. S., Arslan, B., Sagiroglu, S.  2018.  Smart Grid Security Evaluation with a Big Data Use Case. 2018 IEEE 12th International Conference on Compatibility, Power Electronics and Power Engineering (CPE-POWERENG 2018). :1-6.

Technological developments in the energy sector, while offering new business insights, also produce complex data. In this study, the relationship between smart grid and big data approaches is investigated. After analyzing which areas of smart grid systems employ big data techniques and technologies, we focus on the big data technologies used to detect attacks on smart grids. Big data analytics produces efficient solutions, but the choice of algorithm and metric is critical. For this reason, an application prototype is proposed that uses big data approaches to detect attacks on smart grids. The best-performing algorithms reached accuracies of 92% with Random Forest and 87% with Decision Tree.
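
The reported comparison follows the usual supervised-learning pattern; a scikit-learn sketch of that pattern on synthetic stand-in data (not the paper's smart grid dataset) might look like this:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for labeled smart grid traffic (attack vs. normal).
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for model in (RandomForestClassifier(n_estimators=100, random_state=0),
              DecisionTreeClassifier(random_state=0)):
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{type(model).__name__}: accuracy = {acc:.2f}")
```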

Guntupally, K., Devarakonda, R., Kehoe, K.  2018.  Spring Boot Based REST API to Improve Data Quality Report Generation for Big Scientific Data: ARM Data Center Example. 2018 IEEE International Conference on Big Data (Big Data). :5328-5329.

Web application technologies are evolving rapidly through continuous innovation and improvement. This paper focuses on the popular Spring Boot [1] Java-based framework for building web and enterprise applications, and on how it provides the flexibility for a service-oriented architecture (SOA). One challenge with any Spring-based application is its level of configuration complexity. Spring Boot makes it easy to create and deploy stand-alone, production-grade Spring applications with very little Spring configuration. For example, in the Spring Model-View-Controller (MVC) framework [2], we need to configure the dispatcher servlet, web jars, a view resolver, and component scanning, among other things. To solve this, Spring Boot provides several auto-configuration options to set up the application with any needed dependencies. Another challenge is identifying the framework dependencies and associated library versions required to develop a web application. Spring Boot offers simpler dependency management by bundling a comprehensive but flexible framework and its associated libraries into a single dependency that provides all the Spring-related technology you need, from starter projects to CRUD web applications. The framework also provides a range of additional features that are common across many projects, such as an embedded server, security, metrics, health checks, and externalized configuration. Web applications are generally packaged as a war file and deployed to a web server, but a Spring Boot application can be packaged as either a war or a jar file, which allows the application to run without installing or configuring an application server. In this paper, we discuss how the Atmospheric Radiation Measurement (ARM) Data Center (ADC) at Oak Ridge National Laboratory is using Spring Boot to create an SOA-based REST [4] service API that bridges the gap between frontend user interfaces and a backend database. Using this REST service API, ARM scientists are now able to submit reports via a user form or a command line interface, which captures data quality and other important information about ARM data.
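
From the client side, such a report-submission REST API is consumed as plain HTTP. To keep the code examples in this digest in a single language, here is a hypothetical Python client call; the endpoint, fields, and authentication are illustrative guesses, not ADC's published interface:

```python
import requests

BASE_URL = "https://adc.example.gov/api/v1"  # hypothetical service root

# Illustrative data quality report payload; real field names may differ.
report = {
    "datastream": "sgpmetE13.b1",
    "status": "suspect",
    "description": "Temperature sensor drift observed after 2018-06-01.",
}
resp = requests.post(f"{BASE_URL}/dq-reports", json=report,
                     headers={"Authorization": "Bearer <token>"}, timeout=30)
resp.raise_for_status()
print(resp.json())
```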

Dooley, Rion, Brandt, Steven R., Fonner, John.  2018.  The Agave Platform: An Open, Science-as-a-Service Platform for Digital Science. Proceedings of the Practice and Experience on Advanced Research Computing. :28:1-28:8.

The Agave Platform first appeared in 2011 as a pilot project for the iPlant Collaborative [11]. In its first two years, Foundation saw over 40% growth per month, supporting 1,000+ clients, 600+ applications, and 4 HPC systems at 3 centers across the US. It also gained users outside of plant biology. To better serve the needs of the general open science community, we rewrote Foundation as a scalable, cloud-native application and named it the Agave Platform. In this paper we present the Agave Platform, a Science-as-a-Service (ScaaS) platform for reproducible science. We provide a brief history and technical overview of the project, and highlight three case studies that leverage the platform to create synergistic value for their users.

Maohong, Zhang, Aihua, Yang, Hui, Liu.  2018.  Research on Security and Privacy of Big Data Under Cloud Computing Environment. Proceedings of the 2nd International Conference on Big Data Research. :52-55.

With the rapid development of computer science, the Internet, and information technology, the application scale of networks is expanding constantly and the data volume is increasing day by day. The demand for data processing capability has therefore grown urgent, and cloud computing and big data technology have emerged as products of this development. However, the accompanying processes of data collection, storage, and use face many security and privacy risks, and how to protect the security and privacy of cloud data has become one of the urgent problems to be solved. Addressing the security and privacy of data in the cloud computing environment, this paper ensures data security from two aspects: the storage scheme and the encryption mode of the cloud data.

Moreno, Julio, Fernandez, Eduardo B., Fernandez-Medina, Eduardo, Serrano, Manuel A..  2018.  A Security Pattern for Key-Value NoSQL Database Authorization. Proceedings of the 23rd European Conference on Pattern Languages of Programs. :12:1-12:4.

Numerous authorization models have been proposed for relational databases. On the other hand, several NoSQL databases used in Big Data applications employ a new model appropriate to their requirements for structure, speed, and large amounts of data. This model protects each individual cell in key-value databases by labeling it with authorization rights following a Role-Based Access Control model or similar. We present here a pattern describing this model as it exists in several Big Data systems.
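
A toy Python rendering of the pattern's core idea (names and fields are illustrative): every cell stores both a value and the set of roles allowed to read it, and the check happens on each access.

```python
class LabeledKVStore:
    """Key-value store where each cell carries the roles allowed to read it."""

    def __init__(self):
        self._cells = {}  # key -> (value, allowed_roles)

    def put(self, key, value, allowed_roles):
        self._cells[key] = (value, frozenset(allowed_roles))

    def get(self, key, caller_roles):
        value, allowed = self._cells[key]
        if allowed.isdisjoint(caller_roles):  # RBAC-style per-cell check
            raise PermissionError(f"no role grants access to {key!r}")
        return value

store = LabeledKVStore()
store.put("patient:42:diagnosis", "ICD-10 E11", allowed_roles={"doctor"})
print(store.get("patient:42:diagnosis", caller_roles={"doctor"}))
# store.get("patient:42:diagnosis", caller_roles={"billing"})  # PermissionError
```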

bt Yusof Ali, Hazirah Bee, bt Abdullah, Lili Marziana, Kartiwi, Mira, Nordin, Azlin.  2018.  Risk Assessment for Big Data in Cloud: Security, Privacy and Trust. Proceedings of the 2018 Artificial Intelligence and Cloud Computing Conference. :63-67.

The alarming rate of big data usage in the cloud leaves data easily exposed. The cloud, which consists of many interconnected servers, is used for data storage; being owned by third parties, its security needs to be examined. The risks of storing data in the cloud need to be checked further for their severity level, and there should be a way to assess those risks. Thus, the objective of this paper is to use a systematic literature review (SLR) to build an extensive background of the literature on risk assessment for big data in the cloud computing environment from the perspectives of security, privacy, and trust.

2019-03-18
Demirci, S., Sagiroglu, S.  2018.  Software-Defined Networking for Improving Security in Smart Grid Systems. 2018 7th International Conference on Renewable Energy Research and Applications (ICRERA). :1021–1026.

This paper presents a review of how software-defined networking (SDN) can be used to enhance smart grid security. For this purpose, the attacks threatening traditional smart grid systems are classified according to availability, integrity, and confidentiality, the main cyber-security objectives. The traditional smart grid architecture is redefined with SDN, and a conceptual model for SDN-based smart grid systems is proposed. SDN-based solutions to the mentioned security threats are also classified and evaluated. Our conclusions suggest that SDN helps to improve smart grid security by providing real-time monitoring, programmability, wide-area security management, fast recovery from failures, distributed security, and smart decision making based on big data analytics.

2019-03-15
Salman, Muhammad, Husna, Diyanatul, Apriliani, Stella Gabriella, Pinem, Josua Geovani.  2018.  Anomaly Based Detection Analysis for Intrusion Detection System Using Big Data Technique with Learning Vector Quantization (LVQ) and Principal Component Analysis (PCA). Proceedings of the 2018 International Conference on Artificial Intelligence and Virtual Reality. :20-23.

Data security has become a very serious part of any organizational information system. More and more Internet-borne threats have evolved that are capable of deceiving firewalls as well as antivirus software, and the number of attacks has grown too large and too difficult for firewalls or antivirus software to process. System security is therefore usually improved by adding an Intrusion Detection System (IDS); these divide into anomaly-based detection and signature-based detection. In this research, Big Data techniques are used to process the huge amount of data involved. Anomaly-based detection using the Learning Vector Quantization (LVQ) algorithm is proposed to detect attacks. Learning Vector Quantization is a neural network technique that learns from the input itself and then gives the appropriate output for a given input. Modifications were made to improve test accuracy by varying the test parameters of LVQ: varying the learning rate, the number of epochs, and k-fold cross validation resulted in more efficient output. The output is obtained by calculating information retrieval measures from the confusion matrix of each attack class. The Principal Component Analysis technique is used alongside Learning Vector Quantization to improve system performance by reducing the data dimensionality. Using 18 principal components, the dataset was successfully reduced by 47.3%, with a best recognition rate of 96.52% and a time efficiency improvement of up to 43.16%.
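
scikit-learn ships PCA but no LVQ, so the sketch below pairs sklearn's PCA with a bare-bones LVQ1 training loop on synthetic data; the hyperparameters and dataset are placeholders, not the paper's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

def lvq1_fit(X, y, protos_per_class=2, lr=0.05, epochs=30, seed=0):
    """Bare-bones LVQ1: move the nearest prototype toward same-class samples
    and away from different-class samples."""
    rng = np.random.default_rng(seed)
    W, Wy = [], []
    for c in np.unique(y):
        idx = rng.choice(np.flatnonzero(y == c), protos_per_class, replace=False)
        W.append(X[idx]); Wy.append(np.full(protos_per_class, c))
    W, Wy = np.vstack(W), np.concatenate(Wy)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            j = np.argmin(np.linalg.norm(W - X[i], axis=1))  # nearest prototype
            sign = 1.0 if Wy[j] == y[i] else -1.0
            W[j] += sign * lr * (X[i] - W[j])
        lr *= 0.9  # decay the learning rate each epoch
    return W, Wy

def lvq1_predict(W, Wy, X):
    return Wy[np.argmin(np.linalg.norm(X[:, None] - W[None], axis=2), axis=1)]

X, y = make_classification(n_samples=2000, n_features=40, n_informative=10,
                           random_state=0)
X18 = PCA(n_components=18).fit_transform(X)  # reduce dimensionality first
W, Wy = lvq1_fit(X18, y)
print("recognition rate:", (lvq1_predict(W, Wy, X18) == y).mean())
```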

2019-03-06
Suwansrikham, P., She, K.  2018.  Asymmetric Secure Storage Scheme for Big Data on Multiple Cloud Providers. 2018 IEEE 4th International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC) and IEEE International Conference on Intelligent Data and Security (IDS). :121-125.

Cloud computing is an emerging technology that goes hand in hand with big data. Due to the enormous size of big data, it is impossible to store it all locally; alternatively, even if we wanted to store it locally, we would have to spend a great deal of money to build a big data center. One way to save money is to store big data with a cloud storage service, which provides users with space and security for their files. However, relying on a single cloud storage provider (CSP) may cause trouble for the customer: a CSP may stop its service at any time, so it is too risky for a data owner to host files with only a single CSP. Moreover, the CSP is a third party that the user has to trust without verification. After deploying a file to a CSP, the user does not know who accesses it; even if the CSP provides a security mechanism to prevent outsider attacks, how does the user ensure that there is no insider attack to steal or corrupt the file? This research proposes a way to minimize this risk and to ensure data privacy and access control. The big data file is split into chunks that are distributed to multiple cloud storage providers; even if there is an insider attack, the attacker obtains only part of the file and cannot reconstruct the whole. After splitting the file, metadata is generated to keep the chunk information, including chunk locations, access paths, and the data owner's username and password for connecting to each CSP. An asymmetric security concept is applied: the metadata is encrypted and transferred to the user who requests access to the file. File access, monitoring, and metadata transfer are functions of dew computing, an intermediate server between the users and the cloud service.
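
A minimal sketch of the split-and-seal idea using the `cryptography` package: the file is chunked for distribution across providers, and the metadata is sealed hybrid-style, with an RSA key wrapping a symmetric key, so only the requesting user can read it. Field names are illustrative, and the upload/dew-computing plumbing is omitted.

```python
import json, os
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

def split_file(data, n_chunks):
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n_chunks)]

chunks = split_file(os.urandom(10_000), n_chunks=4)  # stand-in for a big file
# ...each chunk would be uploaded to a different CSP here...

# Metadata records where each chunk lives (illustrative fields only).
metadata = json.dumps({
    "chunks": [{"csp": f"csp-{i}", "path": f"/bucket/chunk-{i}"}
               for i in range(len(chunks))],
}).encode()

# Hybrid seal: a Fernet key encrypts the metadata, and the requesting user's
# RSA public key wraps that Fernet key (the asymmetric part of the scheme).
user_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
fernet_key = Fernet.generate_key()
sealed_metadata = Fernet(fernet_key).encrypt(metadata)
wrapped_key = user_key.public_key().encrypt(fernet_key, oaep)

# Only the holder of the private key recovers the metadata (and the chunk map).
recovered = Fernet(user_key.decrypt(wrapped_key, oaep)).decrypt(sealed_metadata)
assert recovered == metadata
```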

Mito, M., Murata, K., Eguchi, D., Mori, Y., Toyonaga, M.  2018.  A Data Reconstruction Method for The Big-Data Analysis. 2018 9th International Conference on Awareness Science and Technology (iCAST). :319-323.

In recent years, the big-data approach has become important in various business operations and sales judgment tactics. At the same time, numerous privacy problems limit the progress of big-data analysis technologies. To mitigate such problems, several privacy-preserving methods have been proposed, e.g., anonymization, extreme-value record elimination, fully encrypted analysis, and so on. However, fears of privacy cracking remain and prevent the open use of big data by external organizations. We propose a big-data reconstruction method that does not intrinsically use private data. The method uses only the statistical features of the big data, i.e., its attribute histograms and their correlation coefficients. To verify whether valuable information can be extracted using this method, we evaluate the reconstructed data with the Self-Organizing Map (SOM), one of the big-data analysis tools. The results show that the same pieces of information are extracted from our data as from the original big data.
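
One standard way to realize reconstruction from histograms plus correlation coefficients (not necessarily the authors' exact procedure) is a Gaussian copula: draw correlated normals, map them to uniforms, then push each column through the inverse CDF implied by its attribute histogram.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Published statistics only: per-attribute histograms and a correlation matrix.
edges = np.linspace(0, 100, 11)
hist_a = np.array([1, 2, 5, 9, 14, 18, 20, 15, 10, 6], dtype=float)
hist_b = np.array([20, 18, 14, 12, 10, 9, 7, 5, 3, 2], dtype=float)
corr = np.array([[1.0, -0.6], [-0.6, 1.0]])

def inv_hist_cdf(u, hist, edges):
    """Map uniforms through the inverse CDF implied by a histogram."""
    cdf = np.cumsum(hist) / hist.sum()
    idx = np.searchsorted(cdf, u)                # which bin each u falls in
    lo, hi = edges[idx], edges[idx + 1]
    return lo + rng.random(len(u)) * (hi - lo)   # uniform within the bin

z = rng.multivariate_normal(np.zeros(2), corr, size=5000)  # correlated normals
u = norm.cdf(z)                                            # Gaussian copula
recon = np.c_[inv_hist_cdf(u[:, 0], hist_a, edges),
              inv_hist_cdf(u[:, 1], hist_b, edges)]
print("reconstructed correlation:", np.corrcoef(recon.T)[0, 1].round(2))
```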

Leung, C. K., Hoi, C. S. H., Pazdor, A. G. M., Wodi, B. H., Cuzzocrea, A.  2018.  Privacy-Preserving Frequent Pattern Mining from Big Uncertain Data. 2018 IEEE International Conference on Big Data (Big Data). :5101-5110.

As we are living in the era of big data, high volumes of a wide variety of data, which may be of different veracity (e.g., precise data, imprecise and uncertain data), are easily generated or collected at high velocity in many real-life applications. Embedded in these big data is valuable knowledge and useful information, which can be discovered by big data science solutions. As a popular data science task, frequent pattern mining aims to discover implicit, previously unknown and potentially useful information and valuable knowledge in terms of sets of frequently co-occurring merchandise items and/or events. Many of the existing frequent pattern mining algorithms use a transaction-centric mining approach to find frequent patterns from precise data. However, there are situations in which an item-centric mining approach is more appropriate, and there are also situations in which data are imprecise and uncertain. Hence, in this paper, we present an item-centric algorithm for mining frequent patterns from big uncertain data. In recent years, big data have been gaining attention from the research community, driven by relevant technological innovations (e.g., clouds) and novel paradigms (e.g., social networks). As big data are typically published online to support knowledge management and fruition processes, they are usually handled by multiple owners, with possible secure multi-party computation issues. Thus, privacy and security of big data have become a fundamental problem in this research context. In this paper, we present not only an item-centric algorithm for mining frequent patterns from big uncertain data, but also a privacy-preserving algorithm; in other words, we present a privacy-preserving item-centric algorithm for mining frequent patterns from big uncertain data. Results of our analytical and empirical evaluation show the effectiveness of our algorithm in mining frequent patterns from big uncertain data in a privacy-preserving manner.
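
With uncertain data, each item carries an existential probability, and a pattern's expected support is the sum over transactions of the product of its member items' probabilities. A tiny sketch of that computation (the paper's actual contribution is an item-centric, privacy-preserving algorithm at scale):

```python
from itertools import combinations
from math import prod

# Uncertain transactions: item -> existential probability.
transactions = [
    {"bread": 0.9, "milk": 0.8, "eggs": 0.3},
    {"bread": 0.7, "milk": 0.6},
    {"milk": 0.9, "eggs": 0.5},
]

def expected_support(pattern, transactions):
    """Sum over transactions of the product of member-item probabilities."""
    return sum(prod(t[i] for i in pattern) for t in transactions
               if all(i in t for i in pattern))

items = sorted({i for t in transactions for i in t})
min_esup = 0.5
for size in (1, 2):
    for pattern in combinations(items, size):
        esup = expected_support(pattern, transactions)
        if esup >= min_esup:
            print(pattern, round(esup, 2))
```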

Cuzzocrea, A., Damiani, E.  2018.  Pedigree-Ing Your Big Data: Data-Driven Big Data Privacy in Distributed Environments. 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). :675-681.

This paper introduces a general framework for supporting data-driven privacy-preserving big data management in distributed environments, such as emerging Cloud settings. The proposed framework can be viewed as an alternative to classical approaches where the privacy of big data is ensured via security-inspired protocols that check several (protocol) layers in order to achieve the desired privacy. Unfortunately, this injects considerable computational overhead into the overall process, thus introducing relevant challenges to be considered. Our approach instead tries to recognize the "pedigree" of suitable summary data representatives computed on top of the target big data repositories, hence avoiding the computational overhead of protocol checking. We also provide a relevant realization of the above framework, the so-called Data-dRIven aggregate-PROvenance privacy-preserving big Multidimensional data (DRIPROM) framework, which specifically considers multidimensional data as the case of interest.

El Haourani, Lamia, Elkalam, Anas Abou, Ouahman, Abdelah Ait.  2018.  Knowledge Based Access Control: A Model for Security and Privacy in the Big Data. Proceedings of the 3rd International Conference on Smart City Applications. :16:1-16:8.

The most popular characterization of Big Data revolves around the so-called "3V" criteria: Volume, Variety and Velocity. Big Data is based on the massive collection and in-depth analysis of personal data, with a view to profiling, or even marketing and commercialization, thus violating citizens' privacy and the security of their data. In this article we discuss security and privacy solutions in the context of Big Data. We then focus on access control and present our new model, Knowledge-based Access Control (KBAC), which strengthens the access control already deployed in the target company (e.g., based on RBAC roles or ABAC attributes) by adding a semantic access control layer. KBAC offers finer-grained access control, tailored to Big Data, with effective protection against intrusion attempts and unauthorized data inferences.

Guerriero, Michele, Tamburri, Damian Andrew, Di Nitto, Elisabetta.  2018.  Defining, Enforcing and Checking Privacy Policies in Data-Intensive Applications. Proceedings of the 13th International Conference on Software Engineering for Adaptive and Self-Managing Systems. :172-182.

The rise of Big Data is leading to an increasing demand for large-scale data-intensive applications (DIAs), which have to analyse massive amounts of personal data (e.g. customers' locations, cars' speeds, people's heartbeats, etc.), some of which can be sensitive, meaning that their confidentiality has to be protected. In this context, DIA providers are responsible for enforcing privacy policies that account for the privacy preferences of data subjects as well as for general privacy regulations. This is the case, for instance, of data brokers, i.e. companies that continuously collect and analyse data in order to provide useful analytics to their clients. Unfortunately, the enforcement of privacy policies in modern DIAs tends to become cumbersome because (i) the number of policies can easily explode, depending on the number of data subjects, (ii) policy enforcement has to autonomously adapt to the application context, thus requiring some non-trivial runtime reasoning, and (iii) designing and developing modern DIAs is complex per se. For the above reasons, we need specific design and runtime methods enabling so-called privacy-by-design in a Big Data context. In this article we propose an approach for specifying, enforcing and checking privacy policies on DIAs designed according to the Google Dataflow model, and we show that the enforcement approach behaves correctly in the considered cases and introduces a performance overhead that is acceptable given the requirements of a typical DIA.

Colombo, Pietro, Ferrari, Elena.  2018.  Access Control in the Era of Big Data: State of the Art and Research Directions. Proceedings of the 23rd ACM on Symposium on Access Control Models and Technologies. :185-192.

Data security and privacy issues are magnified by the volume, the variety, and the velocity of Big Data and by the lack, up to now, of a standard data model and related data manipulation language. In this paper, we focus on one of the key data security services, that is, access control, by highlighting the differences with traditional data management systems and describing a set of requirements that any access control solution for Big Data platforms should fulfill. We then describe the state of the art and discuss open research issues.

Jaeger, D., Cheng, F., Meinel, C.  2018.  Accelerating Event Processing for Security Analytics on a Distributed In-Memory Platform. 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech). :634-643.

The analysis of security-related event logs is an important step in the investigation of cyber-attacks. It allows tracing malicious activities and lets a security operator find out what has happened. However, since IT landscapes are growing in size and diversity, the amount of events and their highly different representations are becoming a Big Data challenge. Unfortunately, current solutions for the analysis of security-related events, so-called Security Information and Event Management (SIEM) systems, are not able to keep up with the load. In this work, we propose a distributed SIEM platform that makes use of highly efficient distributed normalization and persists event data into an in-memory database. We implement the normalization on common distribution frameworks, i.e., Spark, Storm, Trident, and Heron, and compare their performance with our custom-built distribution solution. Additionally, different tuning options are introduced and their speed advantage is presented. In the end, we show how writing into an in-memory database can be tuned to achieve optimal persistence speed. Using the proposed approach, we are able not only to fully normalize, but also to persist more than 20 billion events per day with relatively modest client hardware. Therefore, we are confident that our approach can handle the load of events in even very large IT landscapes.
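
A minimal PySpark sketch of the distributed-normalization stage: heterogeneous raw log lines are parsed in parallel into one flat event schema before persistence. The regexes and fields are illustrative, and the in-memory persistence layer is omitted.

```python
import re
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("event-normalization").getOrCreate()

# Illustrative patterns for two heterogeneous log formats.
SSH_RE = re.compile(r"Failed password for (\S+) from (\S+)")
FW_RE = re.compile(r"DROP src=(\S+) dst=(\S+)")

def normalize(line):
    if m := SSH_RE.search(line):
        return Row(event="auth_failure", user=m.group(1), src=m.group(2), dst=None)
    if m := FW_RE.search(line):
        return Row(event="fw_drop", user=None, src=m.group(1), dst=m.group(2))
    return Row(event="unparsed", user=None, src=None, dst=None)

raw = spark.sparkContext.parallelize([
    "sshd[1]: Failed password for root from 10.0.0.5",
    "kernel: DROP src=10.0.0.9 dst=10.0.0.1",
])
events = spark.createDataFrame(raw.map(normalize))
events.show()  # a real deployment would write this into the in-memory store
```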

2019-03-04
Herald, N. E., David, M. W.  2018.  A Framework for Making Effective Responses to Cyberattacks. 2018 IEEE International Conference on Big Data (Big Data). :4798–4805.

The process for determining how to respond to a cyberattack involves evaluating many factors, including some with competing risks. Consequently, decision makers in the private sector and policymakers in the U.S. government (USG) need a framework in order to make effective response decisions. The authors' research identified two competing risks: 1) the risk of not responding forcefully enough to deter a suspected attacker, and 2) the risk of responding in a manner that escalates the situation with an attacker. The authors also identified three primary factors that influence these risks: attribution confidence/time, the scale of the attack, and the relationship with the suspected attacker. This paper provides a framework to help decision makers understand how these factors interact to influence the risks associated with potential response options to cyberattacks. The views expressed do not reflect the official policy or position of the National Intelligence University, the Department of Defense, the U.S. Intelligence Community, or the U.S. Government.

2019-02-25
Lekshmi, M. B., Deepthi, V. R.  2018.  Spam Detection Framework for Online Reviews Using Hadoop's Computational Capability. 2018 International CET Conference on Control, Communication, and Computing (IC4). :436–440.

Nowadays, online reviews have become one of the vital elements for customers doing online shopping. Organizations and individuals use this information to buy the right products and make business decisions. This has motivated spammers and unethical business people to create false reviews and promote their products ahead of the competition. Spammers develop sophisticated systems that can create bulk spam reviews on any website within hours. To tackle this problem, studies have been conducted to formulate effective ways to detect spam reviews. Various spam detection methods have been introduced, most of which extract meaningful features from the text or use machine learning techniques; these approaches gave little importance to the type of extracted features and to the processing rate. NetSpam [1] defines a framework which classifies a review dataset based on spam features and maps them to a spam detection procedure that outperforms previous work in predictive accuracy. In this work, a method is proposed that improves the processing rate by applying a distributed approach to the review dataset using MapReduce. Parallel programming with MapReduce is used for processing big data in Hadoop. The solution parallelises the algorithm defined in NetSpam and yields a spam detection procedure with better predictive accuracy and processing rate.
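
A Hadoop-streaming-style sketch of the parallelization shape: a mapper emits one feature value per review (here an invented rating-deviation feature, not NetSpam's feature set) and a reducer aggregates them into a per-reviewer score; a local sort stands in for the shuffle.

```python
from itertools import groupby

def mapper(lines):
    """Emit (reviewer_id, rating_deviation) for each tab-separated review."""
    for line in lines:
        reviewer, product_avg, rating = line.rstrip("\n").split("\t")
        yield reviewer, abs(float(rating) - float(product_avg))

def reducer(pairs):
    """Average the per-review feature into one spam score per reviewer."""
    for reviewer, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        devs = [dev for _, dev in group]
        yield reviewer, sum(devs) / len(devs)

if __name__ == "__main__":
    # Local stand-in for the Hadoop shuffle: map, sort, reduce.
    records = ["u1\t4.2\t1.0", "u1\t3.9\t5.0", "u2\t4.0\t4.0"]
    for reviewer, score in reducer(mapper(records)):
        print(f"{reviewer}\t{score:.2f}")
```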

Brahem, Mariem, Yeh, Laurent, Zeitouni, Karine.  2018.  Efficient Astronomical Query Processing Using Spark. Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. :229–238.

Sky surveys represent a fundamental data source in astronomy. Today, these surveys are moving into a petascale regime produced by modern telescopes. Due to the exponential growth of astronomical data, there is a pressing need to provide efficient astronomical query processing. Our goal is to bridge the gap between existing distributed systems and high-level languages for astronomers. In this paper, we present efficient techniques for query processing of astronomical data using ASTROIDE. Our framework helps astronomers to take advantage of the richness of the astronomical data. The proposed model supports complex astronomical operators expressed using ADQL (Astronomical Data Query Language), an extension of SQL commonly used by astronomers. ASTROIDE proposes spatial indexing and partitioning techniques to better filter the data access. It also implements a query optimizer that injects spatial-aware optimization rules and strategies. Experimental evaluation based on real datasets demonstrates that the present framework is scalable and efficient.

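ADQL's signature operation is the cone search; written directly against Spark SQL functions (without ASTROIDE's indexing, partitioning, or optimizer, which are the paper's contribution), it looks roughly like this:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cone-search").getOrCreate()

# Toy catalog with positions in degrees; real surveys hold billions of rows.
catalog = spark.createDataFrame(
    [(1, 10.68, 41.27), (2, 10.70, 41.26), (3, 150.0, 2.2)],
    ["source_id", "ra", "dec"])

ra0, dec0, radius = 10.69, 41.265, 0.05  # cone centre and radius, in degrees

# Spherical-law-of-cosines angular distance between each source and the centre.
dist = F.degrees(F.acos(
    F.sin(F.radians(F.lit(dec0))) * F.sin(F.radians(F.col("dec"))) +
    F.cos(F.radians(F.lit(dec0))) * F.cos(F.radians(F.col("dec"))) *
    F.cos(F.radians(F.col("ra") - F.lit(ra0)))))

catalog.filter(dist <= radius).show()
```
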
2019-02-14
Dauda, Ahmed, Mclean, Scott, Almehmadi, Abdulaziz, El-Khatib, Khalil.  2018.  Big Data Analytics Architecture for Security Intelligence. Proceedings of the 11th International Conference on Security of Information and Networks. :19:1-19:4.

The need for security continues to grow in distributed computing. Today's security solutions require greater scalability and convenience in cloud-computing architectures, in addition to the ability to store and process larger volumes of data to address very sophisticated attacks. This paper explores some of the existing architectures for big data intelligence analytics and proposes an architecture that promises to provide greater security for data-intensive environments. The architecture is designed to leverage the wealth of multi-source information for security intelligence.

2019-02-13
Yasumura, Y., Imabayashi, H., Yamana, H.  2018.  Attribute-based proxy re-encryption method for revocation in cloud storage: Reduction of communication cost at re-encryption. 2018 IEEE 3rd International Conference on Big Data Analysis (ICBDA). :312–318.

In recent years, many users have uploaded data to the cloud for easy storage and sharing with other users. At the same time, security and privacy concerns for the data are growing. Attribute-based encryption (ABE) enables both data security and access control by defining users with attributes, so that only users who have matching attributes can decrypt the data. For real-world applications of ABE, revocation of users or their attributes is necessary, so that revoked users can no longer decrypt the data. In actual implementations, ABE is used in a hybrid scheme with a symmetric cipher such as the Advanced Encryption Standard (AES), where the data is encrypted with AES and the AES key is encrypted with ABE. The hybrid encryption scheme requires re-encryption of the data upon revocation to ensure that the revoked users can no longer decrypt it. To re-encrypt the data, the data owner (DO) must download the data from the cloud, then decrypt, encrypt, and upload the data back to the cloud, resulting in both huge communication costs and a computational burden on the DO, depending on the size of the data to be re-encrypted. In this paper, we propose an attribute-based proxy re-encryption method in which data can be re-encrypted in the cloud without downloading any data, by adopting both ABE and Syalim's encryption scheme. Our proposed scheme reduces the communication cost between the DO and cloud storage. Experimental results show that the proposed method reduces the communication cost by as much as one quarter compared to that of the trivial solution.

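The enabling idea is that the cloud can re-encrypt without ever seeing plaintext. The paper achieves this by combining ABE with Syalim's scheme; as a loose stand-in for the layering mechanism only (real ABE is omitted), the sketch below shows a proxy adding a fresh symmetric layer on revocation, with the new layer key distributed only to still-authorized users.

```python
from cryptography.fernet import Fernet

# The data owner encrypts once and uploads; the cloud never sees plaintext.
inner_key = Fernet.generate_key()
ciphertext = Fernet(inner_key).encrypt(b"sensor archive (gigabytes in reality)")

# Revocation: the cloud-side proxy wraps the stored ciphertext in a new layer
# without downloading or decrypting it. Only the new layer key is released
# (in the paper, via ABE) to the users who remain authorized.
layer_key = Fernet.generate_key()
ciphertext = Fernet(layer_key).encrypt(ciphertext)

# A non-revoked user holds both keys and peels the layers off.
print(Fernet(inner_key).decrypt(Fernet(layer_key).decrypt(ciphertext)))

# A revoked user lacking layer_key can no longer decrypt, even with inner_key.
```
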
2019-02-08
Yousefi, M., Mtetwa, N., Zhang, Y., Tianfield, H.  2018.  A Reinforcement Learning Approach for Attack Graph Analysis. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications / 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :212-217.

The attack graph approach is a common tool for the analysis of network security. However, the analysis of attack graphs can be complicated and difficult depending on the attack graph's size. This paper presents an approximate analysis approach for attack graphs based on Q-learning. First, we employ multi-host multi-stage vulnerability analysis (MulVAL) to generate an attack graph for a given network topology. Then we refine the attack graph and generate a simplified graph called a transition graph. Next, we use a Q-learning model to find possible attack routes that an attacker could use to compromise the security of the network. Finally, we evaluate the approach by applying it to a typical IT network scenario with specific services, network configurations, and vulnerabilities.
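
A compact tabular Q-learning sketch over a toy transition graph (states, actions, and rewards are invented; MulVAL-generated graphs are far larger):

```python
import numpy as np

# Toy transition graph: state -> {action: next_state}; state 3 is the target.
graph = {0: {"exploit_web": 1, "phish_user": 2},
         1: {"pivot_db": 3},
         2: {"escalate": 3},
         3: {}}
reward = {3: 100.0}  # payoff for reaching the critical asset

rng = np.random.default_rng(0)
Q = {(s, a): 0.0 for s, acts in graph.items() for a in acts}
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):  # training episodes
    s = 0
    while graph[s]:
        actions = list(graph[s])
        if rng.random() < eps:  # epsilon-greedy exploration
            a = actions[rng.integers(len(actions))]
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        s2 = graph[s][a]
        r = reward.get(s2, -1.0)  # small step cost, big goal reward
        best_next = max((Q[(s2, a2)] for a2 in graph[s2]), default=0.0)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The highest-valued action at the entry point suggests the likeliest route.
print("preferred first step:", max(graph[0], key=lambda act: Q[(0, act)]))
```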

Sisiaridis, D., Markowitch, O.  2018.  Reducing Data Complexity in Feature Extraction and Feature Selection for Big Data Security Analytics. 2018 1st International Conference on Data Intelligence and Security (ICDIS). :43-48.

Feature extraction and feature selection are the first tasks in the pre-processing of input logs for detecting cybersecurity threats and attacks with data mining techniques from the field of Artificial Intelligence. When it comes to the analysis of heterogeneous data derived from different sources, these tasks are time-consuming and difficult to manage efficiently. In this paper, we present an approach for handling feature extraction and feature selection utilizing machine learning algorithms for security analytics of heterogeneous data derived from different network sensors. The approach is implemented in Apache Spark, using its Python API, PySpark.
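
A minimal PySpark sketch of the two pre-processing tasks on a toy frame: assemble raw columns into a feature vector (extraction), then keep the columns most associated with the label via a chi-squared selector (selection). The column names and data are invented; the paper's concrete transformations for multi-sensor logs will differ.

```python
from pyspark.ml.feature import ChiSqSelector, VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("feature-selection").getOrCreate()

# Toy sensor-derived records: three candidate features plus an attack label.
df = spark.createDataFrame(
    [(120.0, 1.0, 0.2, 1.0), (3.0, 0.0, 0.1, 0.0),
     (150.0, 1.0, 0.9, 1.0), (5.0, 0.0, 0.3, 0.0)],
    ["conn_rate", "priv_flag", "entropy", "label"])

# Feature extraction: pack the raw columns into a single vector column.
assembled = VectorAssembler(
    inputCols=["conn_rate", "priv_flag", "entropy"],
    outputCol="features").transform(df)

# Feature selection: keep the 2 features most associated with the label.
selector = ChiSqSelector(numTopFeatures=2, featuresCol="features",
                         labelCol="label", outputCol="selected")
selector.fit(assembled).transform(assembled).select("selected", "label").show()
```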