Biblio
The huge volume, variety, and velocity of big data have empowered Machine Learning (ML) techniques and Artificial Intelligence (AI) systems. However, the vast portion of data used to train AI systems is sensitive information. Hence, any vulnerability has a potentially disastrous impact on privacy aspects and security issues. Nevertheless, the increased demands for high-quality AI from governments and companies require the utilization of big data in the systems. Several studies have highlighted the threats of big data on different platforms and the countermeasures to reduce the risks caused by attacks. In this paper, we provide an overview of the existing threats which violate privacy aspects and security issues inflicted by big data as a primary driving force within the AI/ML workflow. We define an adversarial model to investigate the attacks. Additionally, we analyze and summarize the defense strategies and countermeasures of these attacks. Furthermore, due to the impact of AI systems in the market and the vast majority of business sectors, we also investigate Standards Developing Organizations (SDOs) that are actively involved in providing guidelines to protect the privacy and ensure the security of big data and AI systems. Our far-reaching goal is to bridge the research and standardization frame to increase the consistency and efficiency of AI systems developments guaranteeing customer satisfaction while transferring a high degree of trustworthiness.
Inference of unknown opinions with uncertain, adversarial (e.g., incorrect or conflicting) evidence in large datasets is not a trivial task. Without proper handling, it can easily mislead decision making in data mining tasks. In this work, we propose a highly scalable opinion inference probabilistic model, namely Adversarial Collective Opinion Inference (Adv-COI), which provides a solution to infer unknown opinions with high scalability and robustness under the presence of uncertain, adversarial evidence by enhancing Collective Subjective Logic (CSL) which is developed by combining SL and Probabilistic Soft Logic (PSL). The key idea behind the Adv-COI is to learn a model of robust ways against uncertain, adversarial evidence which is formulated as a min-max problem. We validate the out-performance of the Adv-COI compared to baseline models and its competitive counterparts under possible adversarial attacks on the logic-rule based structured data and white and black box adversarial attacks under both clean and perturbed semi-synthetic and real-world datasets in three real world applications. The results show that the Adv-COI generates the lowest mean absolute error in the expected truth probability while producing the lowest running time among all.
Realizing the importance of the concept of “smart city” and its impact on the quality of life, many infrastructures, such as power plants, began their digital transformation process by leveraging modern computing and advanced communication technologies. Unfortunately, by increasing the number of connections, power plants become more and more vulnerable and also an attractive target for cyber-physical attacks. The analysis of interdependencies among system components reveals interdependent connections, and facilitates the identification of those among them that are in need of special protection. In this paper, we review the recent literature which utilizes graph-based models and network-based models to study these interdependencies. A comprehensive overview, based on the main features of the systems including communication direction, control parameters, research target, scalability, security and safety, is presented. We also assess the computational complexity associated with the approaches presented in the reviewed papers, and we use this metric to assess the scalability of the approaches.
Security challenges present in Machine-to-Machine Communication (M2M-C) and big data paradigm are fundamentally different from conventional network security challenges. In M2M-C paradigms, “Trust” is a vital constituent of security solutions that address security threats and for such solutions,it is important to quantify and evaluate the amount of trust in the information and its source. In this work, we focus on Machine Learning (ML) Based Trust (MLBT) evaluation model for detecting malicious activities in a vehicular Based M2M-C (VBM2M-C) network. In particular, we present an Entropy Based Feature Engineering (EBFE) coupled Extreme Gradient Boosting (XGBoost) model which is optimized with Binary Particle Swarm optimization technique. Based on three performance metrics, i.e., Accuracy Rate (AR), True Positive Rate (TPR), False Positive Rate (FPR), the effectiveness of the proposed method is evaluated in comparison to the state-of-the-art ensemble models, such as XGBoost and Random Forest. The simulation results demonstrates the superiority of the proposed model with approximately 10% improvement in accuracy, TPR and FPR, with reference to the attacker density of 30% compared with the start-of-the-art algorithms.
In this work, we will present a new hybrid cryptography method based on two hard problems: 1- The problem of the discrete logarithm on an elliptic curve defined on a finite local ring. 2- The closest vector problem in lattice and the conjugate problem on square matrices. At first, we will make the exchange of keys to the Diffie-Hellman. The encryption of a message is done with a bad basis of a lattice.
Data can be stored securely in various storage servers. But in the case of a server failure, or data theft from a certain number of servers, the remaining data becomes inadequate for use. Data is stored securely using secret sharing schemes, so that data can be reconstructed even if some of the servers fail. But not much work has been carried out in the direction of updation of this data. This leads to the problem of updation when two or more concurrent requests arrive and thus, it results in inconsistency. Our work proposes a novel method to store data securely with concurrent update requests using Petri Nets, under the assumption that the number of nodes is very large and the requests for updates are very frequent.
Network attack is a significant security issue for modern society. From small mobile devices to large cloud platforms, almost all computing products, used in our daily life, are networked and potentially under the threat of network intrusion. With the fast-growing network users, network intrusions become more and more frequent, volatile and advanced. Being able to capture intrusions in time for such a large scale network is critical and very challenging. To this end, the machine learning (or AI) based network intrusion detection (NID), due to its intelligent capability, has drawn increasing attention in recent years. Compared to the traditional signature-based approaches, the AI-based solutions are more capable of detecting variants of advanced network attacks. However, the high detection rate achieved by the existing designs is usually accompanied by a high rate of false alarms, which may significantly discount the overall effectiveness of the intrusion detection system. In this paper, we consider the existence of spatial and temporal features in the network traffic data and propose a hierarchical CNN+RNN neural network, LuNet. In LuNet, the convolutional neural network (CNN) and the recurrent neural network (RNN) learn input traffic data in sync with a gradually increasing granularity such that both spatial and temporal features of the data can be effectively extracted. Our experiments on two network traffic datasets show that compared to the state-of-the-art network intrusion detection techniques, LuNet not only offers a high level of detection capability but also has a much low rate of false positive-alarm.
Malicious domain names are consistently changing. It is challenging to keep blacklists of malicious domain names up-to-date because of the time lag between its creation and detection. Even if a website is clean itself, it does not necessarily mean that it won't be used as a pivot point to redirect users to malicious destinations. To address this issue, this paper demonstrates how to use linkage analysis and open-source threat intelligence to visualize the relationship of malicious domain names whilst verifying their categories, i.e., drive-by download, unwanted software etc. Featured by a graph-based model that could present the inter-connectivity of malicious domain names in a dynamic fashion, the proposed approach proved to be helpful for revealing the group patterns of different kinds of malicious domain names. When applied to analyze a blacklisted set of URLs in a real enterprise network, it showed better effectiveness than traditional methods and yielded a clearer view of the common patterns in the data.
Vehicular communication systems increase traffic efficiency and safety by allowing vehicles to share safety-related information and location-based services. Pseudonym schemes are the standard solutions providing driver/vehicle anonymity, whilst enforcing vehicle accountability in case of liability issues. State-of-the-art PKI-based pseudonym schemes present scalability issues, notably due to the centralized architecture of certificate-based solutions. The first Direct Anonymous Attestation (DAA)-based pseudonym scheme was introduced at VNC 2017, providing a decentralized approach to the pseudonym generation and update phases. The DAA-based construction leverages the properties of trusted computing, allowing vehicles to autonomously generate their own pseudonyms by using a (resource constrained) Trusted Hardware Module or Component (TC). This proposition however requires the TC to delegate part of the (heavy) pseudonym generation computations to the (more powerful) vehicle's On-Board Unit (OBU), introducing security and privacy issues in case the OBU becomes compromised. In this paper, we introduce a novel pseudonym scheme based on a variant of DAA, namely a pre-DAA-based pseudonym scheme. All secure computations in the pre-DAA pseudonym lifecycle are executed by the secure element, thus creating a secure enclave for pseudonym generation, update, and revocation. We instantiate vehicle-to-everything (V2X) with our pre-DAA solution, thus ensuring user anonymity and user-controlled traceability within the vehicular network. In addition, the pre-DAA-based construction transfers accountability from the vehicle to the user, thus complying with the many-to-many driver/vehicle relation. We demonstrate the efficiency of our solution with a prototype implementation on a standard Javacard (acting as a TC), showing that messages can be anonymously signed and verified in less than 50 ms.
Public key cryptography plays a vital role in many information and communication systems for secure data transaction, authentication, identification, digital signature, and key management purpose. Elliptic curve cryptography (ECC) is a widely used public key cryptographic algorithm. In this paper, we propose a hardware-software codesign implementation of the ECC cipher. The algorithm is modelled in C language. Compute-intensive components are identified for their efficient hardware implementations. In the implementation, residue number system (RNS) with projective coordinates are utilized for performing the required arithmetic operations. To manage the hardware-software codeign in an integrated fashion Xilinx platform studio tool and Virtex-5 xc5vfx70t device based platform is utilized. An application of the implementation is demonstrated for encryption of text and its respective decryption over prime fields. The design is useful for providing an adequate level of security for IoTs.
Secure logging is essential for the integrity and accountability of cyber-physical systems (CPS). To prevent modification of log files the integrity of data must be ensured. In this work, we propose a solution for secure event in cyberphysical systems logging based on the blockchain technology, by encapsulating event data in blocks. The proposed solution considers the real-time application constraints that are inherent in CPS monitoring and control functions by optimizing the heterogeneous resources governing blockchain computations. In doing so, the proposed blockchain mechanism manages to deliver events in hard-to-tamper ledger blocks that can be accessed and utilized by the various functions and components of the system. Performance analysis of the proposed solution is conducted through extensive simulation, demonstrating the effectiveness of the proposed approach in delivering blocks of events on time using the minimum computational resources.
Neural Style Transfer based on convolutional neural networks has produced visually appealing results for image and video data in the recent years where e.g. the content of a photo and the style of a painting are merged to a novel piece of digital art. In practical engineering development, we utilize 3D objects as standard for optimizing digital shapes. Since these objects can be represented as binary 3D voxel representation, we propose to extend the Neural Style Transfer method to 3D geometries in analogy to 2D pixel representations. In a series of experiments, we first evaluate traditional Neural Style Transfer on 2D binary monochromatic images. We show that this method produces reasonable results on binary images lacking color information and even improve them by introducing a standardized Gram matrix based loss function for style. For an application of Neural Style Transfer on 3D voxel primitives, we trained several classifier networks demonstrating the importance of a meaningful convolutional network architecture. The standardization of the Gram matrix again strongly contributes to visually improved, less noisy results. We conclude that Neural Style Transfer extended by a standardization of the Gram matrix is a promising approach for generating novel 3D voxelized objects and expect future improvements with increasing graphics memory availability for finer object resolutions.
Writing style is a combination of consistent decisions associated with a specific author at different levels of language production, including lexical, syntactic, and structural. In this paper, we introduce a style-aware neural model to encode document information from three stylistic levels and evaluate it in the domain of authorship attribution. First, we propose a simple way to jointly encode syntactic and lexical representations of sentences. Subsequently, we employ an attention-based hierarchical neural network to encode the syntactic and semantic structure of sentences in documents while rewarding the sentences which contribute more to capturing the writing style. Our experimental results, based on four benchmark datasets, reveal the benefits of encoding document information from all three stylistic levels when compared to the baseline methods in the literature.
When employing biometric recognition systems, we have to take into account that biometric data are considered sensitive data. This has raised some privacy issues, and therefore secure systems providing template protection are required. Using homomorphic encryption, permanent protection can be ensured, since templates are stored and compared in the encrypted domain. In addition, the unprotected system's accuracy is preserved. To solve the problem of the computational overload linked to the encryption scheme, we present an early decision making strategy for iris-codes. In order to improve the recognition accuracy, the most consistent bits of the iris-code are moved to the beginning of the template. This allows an accurate block-wise comparison, thereby reducing the execution time. Hence, the resulting system grants template protection in a computationally efficient way. More specifically, in the experimental evaluation in identification mode, the block-wise comparison achieves a 92% speed-up on the IITD database with 300 enrolled templates.
In cloud computing environments with many virtual machines, containers, and other systems, an epidemic of malware can be crippling and highly threatening to business processes. In this vision paper, we introduce a hierarchical approach to performing malware detection and analysis using several recent advances in machine learning on graphs, hypergraphs, and natural language. We analyze individual systems and their logs, inspecting and understanding their behavior with attentional sequence models. Given a feature representation of each system's logs using this procedure, we construct an attributed network of the cloud with systems and other components as vertices and propose an analysis of malware with inductive graph and hypergraph learning models. With this foundation, we consider the multicloud case, in which multiple clouds with differing privacy requirements cooperate against the spread of malware, proposing the use of federated learning to perform inference and training while preserving privacy. Finally, we discuss several open problems that remain in defending cloud computing environments against malware related to designing robust ecosystems, identifying cloud-specific optimization problems for response strategy, action spaces for malware containment and eradication, and developing priors and transfer learning tasks for machine learning models in this area.
Many consumers now rely on different forms of voice assistants, both stand-alone devices and those built into smartphones. Currently, these systems react to specific wake-words, such as "Alexa," "Siri," or "Ok Google." However, with advancements in natural language processing, the next generation of voice assistants could instead always listen to the acoustic environment and proactively provide services and recommendations based on conversations without being explicitly invoked. We refer to such devices as "always listening voice assistants" and explore expectations around their potential use. In this paper, we report on a 178-participant survey investigating the potential services people anticipate from such a device and how they feel about sharing their data for these purposes. Our findings reveal that participants can anticipate a wide range of services pertaining to a conversation; however, most of the services are very similar to those that existing voice assistants currently provide with explicit commands. Participants are more likely to consent to share a conversation when they do not find it sensitive, they are comfortable with the service and find it beneficial, and when they already own a stand-alone voice assistant. Based on our findings we discuss the privacy challenges in designing an always-listening voice assistant.
The increasing demand and the use of mobile ad hoc network (MANET) in recent days have attracted the attention of researchers towards pursuing active research work largely related to security attacks in MANET. Gray hole attack is one of the most common security attacks observed in MANET. The paper focuses on gray hole attack analysis in Ad hoc on demand distance vector(AODV) routing protocol based MANET with reliability as a metric. Simulation is performed using ns-2.35 simulation software under varying number of network nodes and varying number of gray hole nodes. Results of simulation indicates that increasing the number of gray hole node in the MANET will decrease the reliability of MANET.
Multi-tenant cloud networks have various security and monitoring service functions (SFs) that constitute a service function chain (SFC) between two endpoints. SF rule ordering overlaps and policy conflicts can cause increased latency, service disruption and security breaches in cloud networks. Software Defined Network (SDN) based Network Function Virtualization (NFV) has emerged as a solution that allows dynamic SFC composition and traffic steering in a cloud network. We propose an SDN enabled Universal Policy Checking (SUPC) framework, to provide 1) Flow Composition and Ordering by translating various SF rules into the OpenFlow format. This ensures elimination of redundant rules and policy compliance in SFC. 2) Flow conflict analysis to identify conflicts in header space and actions between various SF rules. Our results show a significant reduction in SF rules on composition. Additionally, our conflict checking mechanism was able to identify several rule conflicts that pose security, efficiency, and service availability issues in the cloud network.