Biblio
The finite-state machine (FSM) is widely used as the control unit in most digital designs. Many intellectual property protection and obfuscation techniques leverage the exponential number of possible states and state transitions of a large FSM to secure a physical design, on the assumption that it is challenging to retrieve the FSM from a downstream design or physical implementation without knowledge of the original design. In this paper, we postulate that this assumption may not hold against big data analytics. We demonstrate this by applying a data mining technique to a sufficiently large amount of data collected from a full scan design in order to identify its FSM state registers. An impact metric is introduced to discriminate FSM state registers from other registers. A decision tree algorithm is constructed from the scan data for regression analysis of the dependency of other registers on a chosen register, from which its impact is deduced. Registers with greater impact are more likely to be FSM state registers. The proposed scheme is applied to several complex designs from OpenCores. The experimental results show that our scheme correctly identifies most FSM state registers with a high hit rate for a large majority of the designs.
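A minimal sketch of the impact-metric idea described above: for each candidate register, measure how well its current value predicts the next-cycle values of the other registers in the scan data, and rank registers by the aggregate score. The use of scikit-learn decision trees, the depth limit, and the scoring rule are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def impact_scores(scan_data):
    """scan_data: (cycles, registers) 0/1 matrix captured from full-scan dumps."""
    current, nxt = scan_data[:-1], scan_data[1:]
    n_regs = scan_data.shape[1]
    scores = np.zeros(n_regs)
    for r in range(n_regs):
        x = current[:, [r]]                  # chosen register as the sole predictor
        impact = 0.0
        for other in range(n_regs):
            if other == r or len(np.unique(nxt[:, other])) < 2:
                continue                     # skip constant registers
            tree = DecisionTreeClassifier(max_depth=3).fit(x, nxt[:, other])
            impact += tree.score(x, nxt[:, other])   # stronger dependency -> higher score
        scores[r] = impact
    return scores                            # top-ranked registers are FSM candidates
```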
Severe class imbalance between the majority and minority classes in large datasets can prejudice Machine Learning classifiers toward the majority class. Our work uniquely consolidates two case studies, each utilizing three learners implemented within an Apache Spark framework, six sampling methods, and five sampling distribution ratios to analyze the effect of severe class imbalance on big data analytics. We use three performance metrics to evaluate this study: Area Under the Receiver Operating Characteristic Curve, Area Under the Precision-Recall Curve, and Geometric Mean. In the first case study, models were trained on one dataset (POST) and tested on another (SlowlorisBig). In the second case study, the training and testing dataset roles were switched. Our comparison of performance metrics shows that Area Under the Precision-Recall Curve and Geometric Mean are sensitive to changes in the sampling distribution ratio, whereas Area Under the Receiver Operating Characteristic Curve is relatively unaffected. In addition, we demonstrate that when comparing sampling methods, borderline-SMOTE2 outperforms the other methods in the first case study, and Random Undersampling is the top performer in the second case study.
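A minimal sketch of one (sampling method, distribution ratio) cell from the study grid: borderline-SMOTE2 applied to the training split at an assumed 1:4 minority-to-majority ratio, scored with AUROC, AUPRC, and Geometric Mean. The synthetic data and the scikit-learn/imbalanced-learn stack stand in for the Spark-based setup used in the case studies.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from imblearn.over_sampling import BorderlineSMOTE
from imblearn.metrics import geometric_mean_score

# Severely imbalanced train/test splits (about 0.5% positives).
X_train, y_train = make_classification(n_samples=20000, weights=[0.995], random_state=0)
X_test, y_test = make_classification(n_samples=5000, weights=[0.995], random_state=1)

# Resample the training split only; the test split keeps its natural imbalance.
sampler = BorderlineSMOTE(kind="borderline-2", sampling_strategy=0.25, random_state=0)
X_bal, y_bal = sampler.fit_resample(X_train, y_train)

model = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
scores = model.predict_proba(X_test)[:, 1]

print("AUROC :", roc_auc_score(y_test, scores))
print("AUPRC :", average_precision_score(y_test, scores))
print("G-Mean:", geometric_mean_score(y_test, model.predict(X_test)))
```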
Privacy preservation has become a predominant concern in big data analytics and cloud computing. Every organization collects personal data from users, actively or passively. Publishing this data for research and other analytics without removing Personally Identifiable Information (PII) leads to privacy breaches. Existing anonymization techniques fail to maintain the balance between data privacy and data utility. To provide a trade-off between user privacy and data utility, a Mondrian-based k-anonymity approach is proposed. To protect the privacy of high-dimensional data, a Deep Neural Network (DNN) based framework is proposed. The experimental results show that the proposed approach mitigates the information loss of the data without compromising privacy.
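A minimal sketch of Mondrian-style k-anonymity over numeric quasi-identifiers: recursively split on the widest-range attribute at its median until a split would violate k, then generalize each partition's quasi-identifiers to their ranges. The column names and toy records are assumptions, and the paper's DNN-based utility framework is out of scope here.

```python
import pandas as pd

def mondrian(df, qi_cols, k):
    partitions, stack = [], [df]
    while stack:
        part = stack.pop()
        # Pick the quasi-identifier with the widest value range in this partition.
        spans = {c: part[c].max() - part[c].min() for c in qi_cols}
        col = max(spans, key=spans.get)
        median = part[col].median()
        left, right = part[part[col] <= median], part[part[col] > median]
        if spans[col] > 0 and len(left) >= k and len(right) >= k:
            stack += [left, right]           # keep splitting while both halves stay >= k
        else:
            partitions.append(part)          # cannot split further without breaking k
    out = []
    for part in partitions:
        gen = part.copy()
        for c in qi_cols:                    # generalize each QI to its range in the group
            gen[c] = f"{part[c].min()}-{part[c].max()}"
        out.append(gen)
    return pd.concat(out)

# Example: 3-anonymity over age and zip code (hypothetical data).
data = pd.DataFrame({"age": [23, 25, 31, 36, 41, 43, 52, 57, 60],
                     "zip": [1001, 1002, 1010, 1030, 1032, 1050, 1100, 1120, 1130],
                     "diagnosis": list("ABABABABA")})
print(mondrian(data, ["age", "zip"], k=3))
```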
Phishing is typically deployed as an attack vector in the initial stages of a hacking endeavour. Due to its low-risk, high-reward nature it has seen widespread adoption, and detecting it has become a challenge in recent times. This paper proposes a novel means of detecting phishing websites using a Generative Adversarial Network. Taking into account the internal structure and external metadata of a website, the proposed approach uses a generator network that generates both legitimate and synthetic phishing features to train a discriminator network. The discriminator then determines whether the features belong to normal or phishing websites, and improves its detection accuracy based on the classification error. The proposed approach is evaluated on two different phishing datasets and achieves a detection accuracy of up to 94%.
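A minimal sketch of the adversarial training loop for phishing-feature detection: a generator produces synthetic phishing-like feature vectors and a discriminator learns to separate them from real website features. The feature dimensionality, network sizes, and placeholder training data are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

FEATURES, NOISE = 30, 16   # e.g. 30 URL/HTML/metadata features per website (assumed)

generator = nn.Sequential(nn.Linear(NOISE, 64), nn.ReLU(), nn.Linear(64, FEATURES))
discriminator = nn.Sequential(nn.Linear(FEATURES, 64), nn.ReLU(),
                              nn.Linear(64, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

real_batch = torch.rand(128, FEATURES)          # placeholder for real website features

for step in range(200):
    # Discriminator: real features -> 1, generated phishing-like features -> 0.
    fake_batch = generator(torch.randn(128, NOISE)).detach()
    d_loss = bce(discriminator(real_batch), torch.ones(128, 1)) + \
             bce(discriminator(fake_batch), torch.zeros(128, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator label generated features as real.
    fake_batch = generator(torch.randn(128, NOISE))
    g_loss = bce(discriminator(fake_batch), torch.ones(128, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```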
In enterprise environments, the number of managed assets and exploitable vulnerabilities is staggering. Hackers' lateral movements between such assets generate a complex big data graph that contains potential hacking paths. In this vision paper, we enumerate risk-reduction security requirements in large-scale environments and then present the Agile Security methodology and technologies for detection, modeling, and continuous prioritization of security requirements, agile style. Agile Security models different types of security requirements in the context of an attack graph that captures business process targets, critical asset identification, configuration items, and the possible impacts of cyber-attacks. By simulating and analyzing virtual adversary attack paths toward cardinal assets, Agile Security examines the impact on business processes and prioritizes surgical requirements. Handling this constantly re-evaluated requirements backlog gradually hardens the system, reduces business risk, and tells the IT service desk or Security Operations Center which remediation action to perform next. Once a requirement is remediated, Agile Security recomputes the residual risk, weighing risk increases from threat intelligence or infrastructure changes against the defender's remediation actions, in order to drive overall attack surface reduction.
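A minimal sketch of the attack-path analysis idea: nodes are assets or configuration items, edges are possible lateral movements with an assumed exploitability score, and remediations are prioritized by how much aggregate path risk toward a critical asset they remove. Asset names and scores are hypothetical, and this is not the Agile Security engine itself.

```python
import networkx as nx

g = nx.DiGraph()
g.add_edge("internet", "web_server", p=0.6)
g.add_edge("web_server", "app_server", p=0.5)
g.add_edge("internet", "vpn_gateway", p=0.3)
g.add_edge("vpn_gateway", "app_server", p=0.4)
g.add_edge("app_server", "customer_db", p=0.7)   # critical business asset

def path_risk(path):
    risk = 1.0
    for u, v in zip(path, path[1:]):
        risk *= g[u][v]["p"]                      # chain of exploit likelihoods
    return risk

# Aggregate, per lateral-movement step, the risk of every attack path it enables.
edge_risk = {}
for path in nx.all_simple_paths(g, "internet", "customer_db"):
    r = path_risk(path)
    for edge in zip(path, path[1:]):
        edge_risk[edge] = edge_risk.get(edge, 0.0) + r

# Remediate the step carrying the most aggregate attack-path risk first.
for edge, risk in sorted(edge_risk.items(), key=lambda kv: -kv[1]):
    print(edge, round(risk, 3))
```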
The technological development of the energy sector has also produced complex data. In this study, the relationship between the smart grid and big data approaches is investigated. After analyzing which areas of the smart grid system use big data technologies, attention is given to big data techniques for detecting smart grid attacks. Big data analytics can produce efficient solutions, and the choice of algorithms and metrics is especially important. For this reason, an application prototype is proposed that uses a big data method to detect attacks on the smart grid. The measured accuracy was 92% for random forests and 87% for decision trees.
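A minimal sketch of the comparison behind those figures: train a random forest and a decision tree on labeled grid measurement records and compare accuracy. The synthetic data stands in for the real smart-grid dataset, so the printed numbers will not match the reported 92% and 87%.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder for attack/normal labeled smart-grid measurements.
X, y = make_classification(n_samples=10000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("random forest", RandomForestClassifier(n_estimators=100)),
                    ("decision tree", DecisionTreeClassifier())]:
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```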
Protecting sensitive business and personal information is a cornerstone requirement when enterprises and organizations move to the cloud. Many aspects of this requirement are already handled at various levels. Data-at-rest can be secured in cloud stores by encrypting it before persisting it to storage, while data-in-flight is transmitted over protected channels such as TLS and HTTPS. Data-in-use, processed in cloud compute nodes, is the most vulnerable link in the end-to-end information flow, since process memory can be accessed by malicious privileged software or system administrators. IBM Research - Haifa takes part in the European H2020 research project RestAssured [2], which aims to deliver end-to-end cloud architectures and methodologies for assuring secure data processing in the cloud. We build a trusted analytic platform based on a combination of hardware and software components, and collaborate with the RestAssured partners to implement cloud analytic use cases ranging from social care services to pay-as-you-drive insurance policies. The platform uses the Intel SGX (Software Guard Extensions) technology [4], available in Skylake and later processors, which makes it possible to create memory regions (enclaves) protected with hardware encryption in the SoC (system on chip). The data resides unencrypted only inside the processor; it is encrypted in the SoC before being written to main memory and decrypted in the SoC upon being fetched from main memory. Our team has designed and developed a framework for trust management in SGX enclaves [3] that performs verification (remote attestation) of the enclave hardware and software components, and assists in the trusted delivery of secrets (such as data encryption keys) to the enclaves. Apache Spark SQL [1] is the analytic engine of the RestAssured platform. We use the Opaque [6] open source technology [5] from the Berkeley RISELab, which integrates Spark SQL with Intel SGX hardware and offers data protection by running SQL transformations inside trusted enclaves. We have augmented Opaque with several key mechanisms for secure data processing in SGX enclaves, integrating it with our trust management framework to enable remote attestation and data encryption key delivery to Opaque enclaves. We have also developed a component that serves as a gateway between RestAssured use case applications and Opaque clusters. The gateway exposes a REST endpoint that accepts SQL queries from applications, sends each query for governance verification and modification by a rule engine, and executes the modified query in Opaque. The results are serialized into a JSON object and sent back to the application over a secure REST channel.
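A minimal sketch of how an application might call the described gateway: submit a SQL query over a secure REST channel and read back a JSON result. The endpoint URL, payload fields, token handling, and response shape are hypothetical, since the abstract does not specify the gateway's interface.

```python
import requests

GATEWAY = "https://gateway.example.org/query"     # hypothetical endpoint

response = requests.post(
    GATEWAY,
    json={"sql": "SELECT diagnosis, COUNT(*) FROM care_records GROUP BY diagnosis"},
    headers={"Authorization": "Bearer <application-token>"},   # placeholder credential
    timeout=30,
    verify=True,                                   # TLS certificate verification
)
response.raise_for_status()
for row in response.json()["rows"]:                # assumed response shape
    print(row)
```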
Many enterprises are transitioning towards data-driven business processes. There are numerous situations where multiple parties would like to share data towards a common goal if it were possible to simultaneously protect the privacy and security of the individuals and organizations described in the data. Existing solutions for multi-party analytics that follow the so-called Data Lake paradigm have parties transfer their raw data to a trusted third party (i.e., a mediator), which then performs the desired analysis on the global data and shares the results with the parties. However, such a solution does not fit many applications such as healthcare, finance, and the Internet-of-Things, where privacy is a strong concern. Motivated by the increasing demands for data privacy, we study the problem of privacy-preserving multi-party data analytics, where the goal is to enable analytics on multi-party data without compromising the data privacy of each individual party. In this paper, we first propose a secure sum protocol with strong security guarantees. The proposed secure sum protocol is resistant to collusion attacks even with N-2 parties colluding, where N denotes the total number of collaborating parties. We then use this protocol to propose two secure gradient descent algorithms, one for horizontally partitioned data and the other for vertically partitioned data. The proposed framework is generic and applies to a wide class of machine learning problems. We demonstrate our solution for two popular use cases, regression and classification, and evaluate the performance of the proposed solution in terms of the obtained model accuracy, latency, and communication cost. In addition, we perform a scalability analysis to evaluate the performance of the proposed solution as the data size and the number of parties increase.
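A minimal sketch of a secure sum built from additive secret sharing: each party splits its private value into N random shares modulo a prime, distributes one share to each party, and only the aggregate of all partial sums reveals the total. This is a standard construction used to illustrate the idea, not necessarily the exact protocol or collusion-resistance mechanism proposed in the paper.

```python
import secrets

PRIME = 2**61 - 1          # field modulus, large enough for the aggregate

def make_shares(value, n_parties):
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)   # shares sum to value mod PRIME
    return shares

private_values = [42, 17, 99, 5]                   # one secret per party (hypothetical)
n = len(private_values)

# Party i sends its j-th share to party j; each party only ever sees random-looking shares.
shares = [make_shares(v, n) for v in private_values]
partial_sums = [sum(shares[i][j] for i in range(n)) % PRIME for j in range(n)]

print("secure sum:", sum(partial_sums) % PRIME)    # equals 163
print("plain sum :", sum(private_values))
```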
Advances in nanotechnology, large-scale computing and communications infrastructure, coupled with recent progress in big data analytics, have enabled linking several billion devices to the Internet. These devices provide unprecedented automation, cognitive capabilities, and situational awareness. This new ecosystem, termed the Internet-of-Things (IoT), also provides many entry points into the network through the gadgets that connect to the Internet, making security of IoT systems a complex problem. In this position paper, we argue that in order to build a safer IoT system, we need a radically new approach to security. We propose a new security framework that draws ideas from software-defined networks (SDN) and data analytics techniques; this framework provides dynamic policy enforcement on every layer of the protocol stack and can adapt quickly to the diverse set of industry use cases that IoT deployments cater to. Our proposal does not make any assumptions about the capabilities of the devices: it can work with already deployed as well as new types of devices, while also conforming to a service-centric architecture. Even though our focus is on industrial IoT systems, the ideas presented here are applicable to IoT used in a wide array of applications. The goal of this position paper is to initiate a dialogue among standardization bodies and security experts to help raise awareness about network-centric approaches to IoT security.
The ability to discover patterns of interest in criminal networks can support and ease the investigation tasks of security and law enforcement agencies. By considering criminal networks as a special case of social networks, we can properly reuse most of the state-of-the-art techniques to discover patterns of interest, i.e., hidden and potential links. Nevertheless, in time-sensitive scenarios, like those involving criminal actions, the ability to discover patterns in (near) real time can be of primary importance. In this paper, we investigate the identification of patterns for link detection and prediction on an evolving criminal network. To extract valuable information as soon as data is generated, we exploit a stream processing approach. To this end, we also propose three new social network similarity metrics, specifically tailored for criminal link detection and prediction. We then develop a flexible data stream processing application relying on the Apache Flink framework; this solution allows us to deploy and evaluate the newly proposed metrics as well as those existing in the literature. The experimental results show that the new metrics we propose reach up to 83% accuracy in detection and 82% accuracy in prediction, making them competitive with state-of-the-art metrics.
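A minimal sketch of similarity-based link prediction on a criminal-network snapshot: score non-adjacent node pairs with a neighborhood metric and rank the highest-scoring pairs as candidate hidden or future links. The abstract does not detail the three proposed metrics, so the classic Jaccard coefficient stands in here, and the toy graph is illustrative.

```python
import networkx as nx

g = nx.Graph()
g.add_edges_from([("a", "b"), ("a", "c"), ("b", "c"),
                  ("c", "d"), ("d", "e"), ("b", "e")])

# jaccard_coefficient scores every non-edge by |N(u) & N(v)| / |N(u) | N(v)|.
candidates = sorted(nx.jaccard_coefficient(g), key=lambda t: -t[2])
for u, v, score in candidates[:3]:
    print(f"predicted link {u}-{v}: {score:.2f}")
```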
Technological advancement has made Internet connectivity a necessity everywhere, and the power industry is no exception to this trend toward making everything smarter. The smart grid is the advanced version of the traditional grid, making the system more efficient and self-healing. A synchrophasor is a device used in smart grids to measure electric waveforms, voltages, and currents. The phasor measurement unit produces an immense volume of current and voltage data that is used to monitor and control the performance of the grid. These data are huge in size and vulnerable to attacks. Intrusion detection is a common technique for finding intrusions in a system. In this paper, a big data framework is designed using various machine learning techniques, and intrusions are detected based on classifications applied to the synchrophasor dataset. In this approach, various machine learning techniques, including deep neural networks, support vector machines, random forests, decision trees, and naive Bayes, are applied to the synchrophasor dataset, and the results are compared using the metrics of accuracy, recall, false rate, specificity, and prediction time. Feature selection and dimensionality reduction algorithms are used to reduce the prediction time of the proposed approach. This paper uses Apache Spark as the platform, which is suitable for implementing an intrusion detection system in smart grids using big data analytics.
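A minimal sketch of one classifier from the compared set, expressed as a Spark ML pipeline: assemble measurement features, train a random forest, and report accuracy. The CSV path and column names are assumptions; the other learners (SVM, decision tree, naive Bayes, DNN) plug into the same pipeline in a similar way.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.appName("synchrophasor-ids").getOrCreate()
df = spark.read.csv("synchrophasor.csv", header=True, inferSchema=True)  # assumed path

# Assemble all numeric measurement columns into a single feature vector.
feature_cols = [c for c in df.columns if c != "label"]
data = VectorAssembler(inputCols=feature_cols, outputCol="features").transform(df)
train, test = data.randomSplit([0.7, 0.3], seed=42)

model = RandomForestClassifier(labelCol="label", featuresCol="features").fit(train)
predictions = model.transform(test)

evaluator = MulticlassClassificationEvaluator(labelCol="label", metricName="accuracy")
print("accuracy:", evaluator.evaluate(predictions))
```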
Enterprises usually provide strong controls to prevent cyberattacks and inadvertent leakage of data to external entities. However, where employees and data scientists have legitimate access to analyze and derive insights from the data, there are insufficient controls, and employees are usually permitted access to all information about the enterprise's customers, including sensitive and private information. Though it is important to be able to identify useful patterns of one's customers for better customization and service, customers' privacy must not be sacrificed to do so. We propose an alternative: a framework that allows privacy-preserving data analytics over big data. In this paper, we present an efficient and scalable framework for Apache Spark, a cluster computing framework, that provides strong privacy guarantees for users even in the presence of an informed adversary, while still providing high utility for analysts. The framework, titled Shade, includes two mechanisms: SparkLAP, which provides Laplacian perturbation based on a user's query, and SparkSAM, which uses the contents of the database itself to calculate the perturbation. We show that the performance of Shade is substantially better than that of earlier differential privacy systems without loss of accuracy, particularly when run on datasets small enough to fit in memory, and find that SparkSAM can even exceed the performance of an identical non-private Spark query.
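A minimal sketch of the Laplacian-perturbation idea behind a mechanism like SparkLAP: answer a counting query and add Laplace noise calibrated to the query's sensitivity and a privacy budget epsilon. This illustrates the standard Laplace mechanism, not Shade's internal implementation, and the data and epsilon value are hypothetical.

```python
import numpy as np

def private_count(values, predicate, epsilon, sensitivity=1.0):
    true_count = sum(1 for v in values if predicate(v))
    # Laplace noise with scale = sensitivity / epsilon gives epsilon-differential privacy
    # for a counting query (sensitivity 1).
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

ages = np.random.randint(18, 90, size=100000)      # placeholder customer attribute
# "How many customers are over 65?" answered with epsilon = 0.5.
print(private_count(ages, lambda a: a > 65, epsilon=0.5))
```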
Cloud computing enables the outsourcing of big data analytics, where a third-party server is responsible for data management and processing. In this paper, we consider the outsourcing model in which a third-party server provides record matching as a service. In particular, given a target record, the service provider returns all records from the outsourced dataset that match the target according to specific distance metrics. Identifying matching records in databases plays an important role in information integration and entity resolution. A major security concern of this outsourcing paradigm is whether the service provider returns the correct record matching results. To solve the problem, we design EARRING, an Efficient Authentication of outsouRced Record matchING framework. EARRING requires the service provider to construct a verification object (VO) of the record matching results. From the VO, the client is able to catch any incorrect result at low computational cost. Experimental results on real-world datasets demonstrate the efficiency of EARRING.
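A minimal sketch of the record-matching service semantics: given a target record and a threshold, return every outsourced record within that threshold under a chosen distance metric (edit distance here). EARRING's verification object, which lets the client check the correctness of these results, is a separate cryptographic construction not reproduced in this sketch; the records and threshold are illustrative.

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def match(target, records, threshold):
    # What the service provider computes: all records within the distance threshold.
    return [r for r in records if edit_distance(target, r) <= threshold]

outsourced = ["john smith", "jon smith", "jane smyth", "john smithe", "mary jones"]
print(match("john smith", outsourced, threshold=2))
```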