Bibliography
Mining information from social media has lately been attracting more attention due to the explosion in the growth of Big Data. In security, Big Data deals with vast collections of digital information for analyzing, visualizing and drawing insights for the prediction and prevention of cyber attacks. Big Data Analytics (BDA) is the term coined by practitioners to describe the practice of managing, processing and gathering large amounts of data for future analysis. Data is being created at an alarming rate. The rapid growth of the Internet, the Internet of Things (IoT) and other emerging technologies is the main driver behind this continued growth. The data created is a reflection of the environment it is produced in, so the data extracted from systems can be used to understand the internal workings of those systems. This has become a significant factor in cyber security, where the objective is to protect assets. Moreover, the growing value of information has made big data a high-value target. In this work, we review recent research in cyber security related to big data and highlight how big data is protected and how big data can also be used as a tool for cyber security. At the same time, a Big Data based centralized log analysis system is implemented to identify network traffic generated by attackers through DDoS, SQL injection and brute force attacks. The log files are automatically transmitted to the centralized cloud server, and big data analysis is initiated.
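To make the detection step more concrete, the following is a minimal sketch of centralized log analysis for the three attack classes named in the abstract (DDoS, SQL injection, brute force). The log line format, thresholds, and the `analyse_log` helper are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: flag DDoS-like bursts, SQL injection patterns, and brute-force
# login attempts from a centralized log feed. Assumed line format: '<ip> <path> <status>'.
import re
from collections import Counter

SQLI_PATTERN = re.compile(r"('|--|\bunion\b|\bselect\b.*\bfrom\b|\bor\b\s+1=1)", re.I)

def analyse_log(lines, ddos_threshold=1000, bruteforce_threshold=20):
    requests_per_ip = Counter()
    failed_logins_per_ip = Counter()
    alerts = []

    for line in lines:
        try:
            ip, path, status = line.split()
        except ValueError:
            continue  # skip malformed lines
        requests_per_ip[ip] += 1
        if SQLI_PATTERN.search(path):
            alerts.append(("sql_injection", ip, path))
        if "/login" in path and status.startswith("4"):
            failed_logins_per_ip[ip] += 1

    for ip, count in requests_per_ip.items():
        if count >= ddos_threshold:
            alerts.append(("possible_ddos", ip, count))
    for ip, count in failed_logins_per_ip.items():
        if count >= bruteforce_threshold:
            alerts.append(("brute_force", ip, count))
    return alerts
```

In a big data deployment the same per-IP aggregation would typically run as a distributed job over the collected log files rather than in a single process.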
This work examines metrics that can be used to measure the ability of agile software development methods to meet security and privacy requirements of communications applications. Many implementations of communication protocols, including those in vehicular networks, occur within regulated environments where agile development methods are traditionally discouraged. We propose a framework and metrics to measure adherence to security, quality and software effectiveness regulations if developers desire the cost and schedule benefits of agile methods. After providing an overview of specific challenges that a regulated environment imposes on communications software development, we proceed to examine the 12 agile principles and how they relate to a regulatory environment. From this review we identify two metrics to measure performance of three key regulatory attributes of software for communications applications, and then recommend an approach of either tools, agile methods or DevOps that is best positioned to satisfy its regulated environment attributes. By considering the recommendations in this paper, managers of software-dominant communications programs in a regulated environment can gain insight into leveraging the benefits of agile methods.
Machine learning algorithms used to detect attacks are limited by the fact that they cannot incorporate the back-ground knowledge that an analyst has. This limits their suitability in detecting new attacks. Reinforcement learning is different from traditional machine learning algorithms used in the cybersecurity domain. Compared to traditional ML algorithms, reinforcement learning does not need a mapping of the input-output space or a specific user-defined metric to compare data points. This is important for the cybersecurity domain, especially for malware detection and mitigation, as not all problems have a single, known, correct answer. Often, security researchers have to resort to guided trial and error to understand the presence of a malware and mitigate it.In this paper, we incorporate prior knowledge, represented as Cybersecurity Knowledge Graphs (CKGs), to guide the exploration of an RL algorithm to detect malware. CKGs capture semantic relationships between cyber-entities, including that mined from open source. Instead of trying out random guesses and observing the change in the environment, we aim to take the help of verified knowledge about cyber-attack to guide our reinforcement learning algorithm to effectively identify ways to detect the presence of malicious filenames so that they can be deleted to mitigate a cyber-attack. We show that such a guided system outperforms a base RL system in detecting malware.
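A toy sketch of the knowledge-guided exploration idea follows: a prior score derived from a cyber knowledge graph biases which filename a tabular RL agent inspects. The `ckg_prior` lookup, the reward values, and the filenames are assumptions for illustration only, not the paper's CKG or algorithm.

```python
# Hedged sketch: knowledge-graph prior biasing epsilon-greedy action selection.
import random

def ckg_prior(filename):
    # Stand-in for a real knowledge-graph query; returns a suspicion score in [0, 1].
    return 0.9 if filename.endswith(".exe") else 0.1

def guided_episode(filenames, q_values, malicious, epsilon=0.1, alpha=0.5):
    """One episode: pick a file to inspect, observe a reward, update its Q-value."""
    if random.random() < epsilon:
        choice = random.choice(filenames)                    # explore
    else:
        # Exploit Q-values, biased by the knowledge-graph prior.
        choice = max(filenames, key=lambda f: q_values[f] + ckg_prior(f))
    reward = 1.0 if choice in malicious else -0.1            # detection reward
    q_values[choice] += alpha * (reward - q_values[choice])  # tabular update
    return choice, reward

files = ["report.pdf", "dropper.exe", "notes.txt"]
q = {f: 0.0 for f in files}
for _ in range(50):
    guided_episode(files, q, malicious={"dropper.exe"})
print(max(q, key=q.get))  # filename the agent learned to flag
```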
Internet of Things devices and data sources are seeing increased use in various application areas. The proliferation of cheaper sensor hardware has allowed for wider-scale data collection deployments. With increased numbers of deployed sensors and the use of heterogeneous sensor types, there is increased scope for collecting erroneous, inaccurate or inconsistent data. This in turn may lead to inaccurate models built from this data. It is important to evaluate this data as it is collected to determine its validity. This paper presents an analysis of data quality as it is represented in Internet of Things (IoT) systems and some of the limitations of this representation. The paper discusses the use of trust as a heuristic to drive data quality measurements. Trust is a well-established metric that has been used to determine the validity of a piece or source of data in crowd-sourced or other unreliable data collection techniques. The analysis extends to detail an appropriate framework for representing data quality effectively within the big data model and why a trust-backed framework is important, especially in heterogeneously sourced IoT data streams.
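The sketch below illustrates how trust might drive a data-quality heuristic for a single sensor stream: each reading is scored on plausibility and consistency, and the source's trust value drifts toward that score. The quality dimensions, weights, and update rule are assumptions used only to illustrate the concept.

```python
# Hedged sketch: trust-weighted data-quality scoring for a heterogeneous IoT source.
def quality_score(reading, expected_range, last_reading, max_jump):
    """Score one reading on plausibility (range check) and consistency (jump size)."""
    in_range = 1.0 if expected_range[0] <= reading <= expected_range[1] else 0.0
    consistent = 1.0 if abs(reading - last_reading) <= max_jump else 0.0
    return 0.5 * in_range + 0.5 * consistent

def update_trust(trust, score, learning_rate=0.1):
    """Move the source's trust value toward the observed quality score."""
    return (1 - learning_rate) * trust + learning_rate * score

trust = 0.5
for reading, last in [(21.3, 21.1), (85.0, 21.3), (21.6, 21.3)]:
    s = quality_score(reading, expected_range=(-10, 50), last_reading=last, max_jump=5)
    trust = update_trust(trust, s)
print(round(trust, 3))  # trust after an implausible spike at 85.0
```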
A metric and structure of computing 2020 is proposed in the form of Top 12 Technology Trends, which will influence investment in science, education and industry in developing countries. The primary social and technological problem of protecting society and critical facilities through the creation of Global Intelligent Cyber Security is formulated. The axioms of the constructive formation of developing countries on the basis of the adoption of moral relations are formulated. Models, methods and algorithms of cyber-social computing are proposed that are focused on processing big data and searching for keywords and text fragments. New characteristic equations of similarity and difference between processes and phenomena are synthesized for exact information retrieval by keywords in cyber-physical space. A computing model of the development of the Universe is formulated, where the binary interactions of entities and forms are harmonic functions of the phase state. A structure of interactive computing of the creative process is proposed, based on a metric assessment of development status against world achievements.
Information security is the process of protecting data from security breaches and hackers. An intrusion detection system is a software framework that tracks and analyzes data in the network to identify attacks using traditional techniques. These traditional intrusion detection techniques work efficiently on small data, but when the same techniques are applied to big data, analyzing the data properties takes a long time and becomes inefficient, which calls for big data technologies such as Apache Spark, Hadoop and Flink to design a modern Intrusion Detection System (IDS). In this paper, the design of an Apache Spark and classification-algorithm-based IDS is presented, employing Chi-square as a feature selection method for selecting features from network security event data. The performance of Logistic Regression, Decision Tree and SVM (trained with SGD) is evaluated in the Apache Spark based IDS, with AUROC and AUPR used as metrics. The training and testing time of each algorithm is also tabulated, and the NSL-KDD dataset is employed for all experiments.
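As a hedged sketch of such a pipeline, the PySpark snippet below chains Chi-square feature selection with logistic regression and evaluates the result with AUROC and AUPR. The input path and column names are assumptions; the NSL-KDD records are assumed to be pre-encoded into a numeric feature vector column.

```python
# Hedged sketch of a Spark-based IDS pipeline: Chi-square selection -> logistic
# regression -> AUROC / AUPR evaluation. Path and columns are illustrative.
from pyspark.sql import SparkSession
from pyspark.ml.feature import ChiSqSelector
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("spark-ids-sketch").getOrCreate()
data = spark.read.parquet("nsl_kdd_features.parquet")  # columns: features (vector), label
train, test = data.randomSplit([0.8, 0.2], seed=42)

selector = ChiSqSelector(numTopFeatures=20, featuresCol="features",
                         outputCol="selected", labelCol="label")
sel_model = selector.fit(train)
train_sel, test_sel = sel_model.transform(train), sel_model.transform(test)

lr = LogisticRegression(featuresCol="selected", labelCol="label")
predictions = lr.fit(train_sel).transform(test_sel)

for metric in ("areaUnderROC", "areaUnderPR"):
    evaluator = BinaryClassificationEvaluator(labelCol="label", metricName=metric)
    print(metric, evaluator.evaluate(predictions))
```

The Decision Tree and SVM variants mentioned in the abstract would slot into the same pipeline by swapping the classifier stage.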
In the past decades, learning an effective distance metric between pairs of instances has played an important role in classification and retrieval tasks, for example, person identification or malware retrieval in IoT services. The core motivation of recent efforts has been to improve the metric forms, and these have already shown promising results in various applications. However, such models often fail to produce a reliable metric on ambiguous test sets. This happens mainly due to the sampling process of the training set, which is not representative of the distribution of the negative samples, especially the examples that are closer to the boundary between different categories (also called hard negative samples). In this paper, we focus on addressing such problems and propose an adaptive margin deep adversarial metric learning (AMDAML) framework. It exploits numerous common negative samples to generate potential hard (adversarial) negatives and applies them to facilitate robust metric learning. Unlike previous approaches, which typically depend on search or data augmentation to find hard negative samples, the generation of adversarial negative instances avoids the limitations of domain knowledge and the constraint on the number of pairs. Specifically, in order to prevent overfitting or underfitting during the training step, we propose an adaptive margin loss that preserves a flexible margin between the negative (including the adversarial and original) and positive samples. We simultaneously train both the adversarial negative generator and the conventional metric objective in an adversarial manner and learn feature representations that are more precise and robust. The experimental results on practical data sets clearly demonstrate the superiority of AMDAML over representative state-of-the-art metric learning models.
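For intuition only, the snippet below computes an adaptive-margin hinge loss for a single (anchor, positive, negative) triple. The specific way the margin adapts here (scaled by the positive distance) is an assumption chosen to illustrate a flexible margin, not the AMDAML formulation from the paper.

```python
# Hedged sketch of an adaptive-margin metric loss for one triplet.
import numpy as np

def adaptive_margin_loss(anchor, positive, negative, base_margin=0.2, scale=0.5):
    d_pos = np.linalg.norm(anchor - positive)   # distance to a positive sample
    d_neg = np.linalg.norm(anchor - negative)   # distance to a (hard) negative sample
    margin = base_margin + scale * d_pos        # margin widens with intra-class spread
    return max(0.0, d_pos - d_neg + margin)     # hinge on the margin violation

a = np.array([0.1, 0.9]); p = np.array([0.2, 0.8]); n = np.array([0.15, 0.85])
print(adaptive_margin_loss(a, p, n))
```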
Various studies have been performed to explore the feasibility of detecting web-based attacks with machine learning techniques. False-positive and false-negative results have been reported as a major issue that must be addressed to make machine learning-based detection and prevention of web-based attacks reliable and trustworthy. In our research, we tried to identify and address the root cause of the false-positive and false-negative results. In our experiment, we used the CSIC 2010 HTTP dataset, which contains generated traffic targeted at an e-commerce web application. Our experimental results demonstrate that applying the proposed fine-tuned feature set extraction results in improved detection and classification of web-based attacks for all tested machine learning algorithms. The performance of the machine learning algorithms in detecting attacks was evaluated with the Precision, Recall, Accuracy, and F-measure metrics. Among the three tested algorithms, the J48 decision tree algorithm provided the highest True Positive rate, Precision, and Recall.
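The evaluation step can be sketched as follows, using scikit-learn's DecisionTreeClassifier as a stand-in for Weka's J48 (both are C4.5-style trees). The feature matrix and labels here are synthetic placeholders; in the paper they would come from the fine-tuned feature set extracted from the CSIC 2010 HTTP traffic.

```python
# Hedged sketch: train a decision tree and report Precision, Recall, Accuracy, F-measure.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score

def evaluate(X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    return {
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "accuracy": accuracy_score(y_test, y_pred),
        "f_measure": f1_score(y_test, y_pred),
    }

X, y = make_classification(n_samples=500, n_features=10, random_state=0)  # placeholder data
print(evaluate(X, y))
```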
Data value (DV) is a novel concept introduced as one of the features of the Big Data phenomenon. While continuing an investigation of the DV ontology and its relationship with data quality (DQ) at the conceptual level, this paper researches possible applications and uses of DV in the practical design of security and privacy protection systems and tools. We present a novel approach to DV evaluation that maps DQ metrics into a DV value. The developed methods allow DV and DQ to be used in a wide range of application domains. To demonstrate the employment of the DQ and DV concepts in real tasks, we present two real-life scenarios. The first use case demonstrates the use of DV in crowdsensing application design. It shows how DV can be calculated by integrating various metrics characterizing data application functionality, accuracy, and security. The second one incorporates privacy considerations into the DV calculus by exploring the relationship between privacy, DQ, and DV in the defense against website fingerprinting in The Onion Router (TOR) network. These examples demonstrate how our methods of DV and DQ evaluation may be employed in the design of real systems with security and privacy considerations.
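A toy illustration of mapping DQ metrics into a single DV score via weighted aggregation is given below. The metric names and weights are assumptions; the paper's actual mapping may differ.

```python
# Hedged sketch: aggregate normalised data-quality metrics into one data-value score.
def data_value(dq_metrics, weights):
    """dq_metrics and weights are dicts keyed by metric name, values in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[k] * dq_metrics[k] for k in weights) / total

dq = {"accuracy": 0.9, "completeness": 0.7, "timeliness": 0.8, "security": 0.6}
w = {"accuracy": 0.4, "completeness": 0.2, "timeliness": 0.2, "security": 0.2}
print(data_value(dq, w))  # DV score in [0, 1]
```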
Browser extensions have by and large become a normal, accepted and omnipresent feature of modern browsers. However, since their inception, browser extensions have remained under scrutiny for opening vulnerabilities for users. While a large amount of effort has been dedicated to patching such issues as they arise, including the implementation of extension sandboxes and explicit permissions, issues remain within the browser extension ecosystem through user-scripts. User-scripts, or micro-script extensions hosted by a top-level extension, are largely unregulated but inherit the permissions of the top-level application manager, which popularly includes extensions such as Greasemonkey, Tampermonkey, or xStyle. While most user-scripts are docile and serve a specific beneficial functionality, due to their inherently open nature and the unregulated ecosystem, they are easy for malicious parties to exploit. Common attacks through this method involve hijacking of DOM elements to execute malicious JavaScript and/or XSS attacks, although other more advanced attacks can be deployed as well. User-scripts have not received much attention, and this vulnerability has persisted despite attempts to make browser extensions more secure. This ongoing vulnerability remains an unknown threat to many users who employ user-scripts, and circumvents security mechanisms otherwise put in place by browsers. This paper discusses this extension derivative vulnerability as it pertains to current browser security paradigms.
The objective of this paper is to propose a model of a distributed intrusion detection system based on the multi-agent paradigm and the Hadoop Distributed File System (HDFS). Multi-agent systems (MAS) are well suited to intrusion detection systems, as they can address the issue of geographic data security in terms of autonomy, distribution and performance. The proposed system is based on a set of autonomous agents that cooperate and collaborate with each other to effectively detect intrusions and suspicious activities that may impact geographic information systems. Our system allows the detection of known and unknown computer attacks without any human intervention (security experts), unlike traditional intrusion detection systems that rely on knowledge bases as the mechanism for detecting known attacks. The proposed model allows real-time detection of known and unknown attacks within large networks hosting geographic data.
Throughout the life cycle of any technical project, the enterprise needs to assess the risks associated with its development, commissioning, operation and decommissioning. This article defines the task of researching risks in relation to the operation of a data storage subsystem in the cloud infrastructure of a geographically distributed company, and the tools that are required for this. Analysts point out that, compared to 2018, in 2019 there were 3.5 times more cases of confidential information leaks from storage on unprotected servers (freely accessible due to incorrect configuration) in cloud services. The total number of compromised personal data and payment information records increased 5.4 times compared to 2018 and amounted to more than 8.35 billion records. Moreover, the share of leaks of payment information has decreased, while the percentage of leaks of personal data has grown and accounts for almost 90% of all leaks from cloud storage. On average, each unsecured service identified resulted in 33.7 million personal data records being leaked. Leaks are mainly related to misconfiguration of services and stored resources, as well as human factors. These impacts can be minimized by improving the skills of cloud storage administrators and regularly auditing storage. Despite its seeming insecurity, the cloud is a reliable way of storing data; at the same time, leaks still occur. According to Kaspersky Lab, every tenth (11%) data leak from the cloud became possible due to the actions of the provider, while a third of all cyber incidents in the cloud (31% in Russia and 33% worldwide) were due to the gullibility of company employees who were caught by social engineering techniques. Minimizing the risks associated with the storage of personal data is one of the main tasks when operating a company's cloud infrastructure.
With the rapid development of Internet technology, the era of big data is coming. SQL injection is the most common and most dangerous threat to databases. This paper studies the working mode and workflow of the GreenSQL database firewall. Based on an analysis of the characteristics and patterns of SQL injection attack commands, the input model that GreenSQL learns is optimized by constructing patterned input and an optimized whitelist. The research method can improve the learning efficiency of GreenSQL and intercept malicious samples in IPS mode, so as to effectively maintain the security of the backend database.
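The "patterned input plus whitelist" idea can be sketched as follows: queries are normalised into patterns (literals replaced by placeholders), and in IPS mode only patterns seen during the learning phase are allowed. This is an illustration of the concept, not GreenSQL's actual implementation.

```python
# Hedged sketch: whitelist of learned query patterns, checked in IPS mode.
import re

def to_pattern(query):
    query = re.sub(r"'[^']*'", "'?'", query)  # string literals -> placeholder
    query = re.sub(r"\b\d+\b", "?", query)    # numeric literals -> placeholder
    return re.sub(r"\s+", " ", query).strip().lower()

class QueryFirewall:
    def __init__(self):
        self.whitelist = set()

    def learn(self, query):                   # learning mode
        self.whitelist.add(to_pattern(query))

    def allow(self, query):                   # IPS mode
        return to_pattern(query) in self.whitelist

fw = QueryFirewall()
fw.learn("SELECT * FROM users WHERE id = 42")
print(fw.allow("SELECT * FROM users WHERE id = 7"))         # True: matches a learned pattern
print(fw.allow("SELECT * FROM users WHERE id = 7 OR 1=1"))  # False: injected pattern is unseen
```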
Facial recognition is becoming more and more important due to the wide range of applications it has, but it remains challenging when facing large variations in the characteristics of the biometric data used in the process, especially when the information is transported over the internet in an Internet of Things context. Based on the systematic review and rigorous study that supports the extraction of the most relevant information on this topic [1], a software architecture proposal was generated that contains the basic security requirements necessary for the treatment of the data involved in applying facial recognition techniques in an IoT environment. It concludes that the security and privacy considerations of the information registered in IoT devices represent a challenge, and that it is a priority to be able to guarantee that the data circulating on the network are accessible only to the users for whom they were intended.
With the development of Internet technology, attackers gain more and more complex background knowledge, which makes anonymity models susceptible to background attacks. Although the differential privacy model can resist background attacks, it reduces the versatility of the data. This paper proposes a differential privacy information publishing algorithm based on clustering anonymity. The algorithm uses a KD-tree-based clustering anonymization algorithm to cluster the original data sets and obtains anonymized tables through anonymization operations. Finally, the algorithm adds noise to the anonymized table to satisfy the definition of differential privacy. The algorithm is compared with the DCMDP (Density-Based Clustering Mechanism with Differential Privacy) algorithm under different privacy budgets. The experiments show that as the privacy budget increases, the algorithm reduces the information loss of the published data by about 80%.
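A minimal sketch of the final step described in the abstract is given below: Laplace noise is added to counts from an anonymized (clustered) table so that the release satisfies epsilon-differential privacy. The grouping keys, epsilon, and sensitivity are illustrative assumptions.

```python
# Hedged sketch: Laplace mechanism applied to counts of an anonymized table.
import numpy as np

def dp_counts(anonymised_groups, epsilon=1.0, sensitivity=1.0):
    """anonymised_groups maps an anonymized group key to its record count."""
    scale = sensitivity / epsilon  # Laplace mechanism scale b = sensitivity / epsilon
    return {k: max(0.0, v + np.random.laplace(0.0, scale))
            for k, v in anonymised_groups.items()}

groups = {"age 20-29 / zip 130**": 57, "age 30-39 / zip 130**": 42}
print(dp_counts(groups, epsilon=0.5))
```

A larger privacy budget (epsilon) means a smaller noise scale, which is why information loss drops as the budget increases.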
The main idea of the proposed work is to apply a Partitional Clustering Algorithm (PCA) on the Hadoop Distributed File System to secure big data using a perturbation technique. There are numerous clustering methods available that are used to categorize information from big data. PCA discovers clusters based on an initial partition of the data. In this approach, it is possible to develop privacy safeguarding of the data while still allowing the required calculations and communication. Performance was analyzed on a health care database using parameters such as precision, accuracy, and F-score. The results demonstrate that this method decreases the complexity of preserving privacy and achieves better accuracy than existing techniques.
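To illustrate the perturbation idea, the sketch below adds random noise to the records before partitional (k-means) clustering so that the raw values are never exposed to the clustering job. The noise level, number of clusters, and synthetic data are assumptions standing in for the health-care records described in the abstract.

```python
# Hedged sketch: additive-noise perturbation before partitional clustering.
import numpy as np
from sklearn.cluster import KMeans

def perturb(X, noise_scale=0.05, seed=0):
    """Add Gaussian noise scaled to each column's spread before clustering."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(0.0, noise_scale * X.std(axis=0), size=X.shape)

X = np.random.default_rng(1).normal(size=(200, 4))  # stand-in for health-care records
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(perturb(X))
print(np.bincount(labels))  # cluster sizes computed on the perturbed data
```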