Biblio
The huge volume, variety, and velocity of big data have empowered Machine Learning (ML) techniques and Artificial Intelligence (AI) systems. However, much of the data used to train AI systems is sensitive, so any vulnerability can have a disastrous impact on privacy and security. At the same time, the growing demand from governments and companies for high-quality AI requires these systems to make use of big data. Several studies have highlighted the threats that big data poses on different platforms and the countermeasures that reduce the risk of attacks. In this paper, we provide an overview of the existing privacy and security threats that arise from big data as a primary driving force within the AI/ML workflow. We define an adversarial model to investigate these attacks, and we analyze and summarize the corresponding defense strategies and countermeasures. Furthermore, given the impact of AI systems on the market and on most business sectors, we also survey the Standards Developing Organizations (SDOs) that are actively providing guidelines to protect privacy and ensure the security of big data and AI systems. Our long-term goal is to bridge research and standardization, making AI system development more consistent and efficient and ensuring customer satisfaction while delivering a high degree of trustworthiness.
We aim to create a society in which various social challenges are resolved by incorporating the innovations of the fourth industrial revolution (e.g., IoT, big data, AI, robots, and the sharing economy) into every industry and into social life. By doing so, the society of the future will be one in which new values and services are created continuously, making people's lives more comfortable and sustainable. This is Society 5.0, a super-smart society. Security and privacy are key issues to be addressed to realize Society 5.0, and privacy-preserving data analytics will play an important role. In this talk, we present our recent work on privacy-preserving data analytics, such as privacy-preserving logistic regression and privacy-preserving deep learning. Finally, we present our ongoing research project under JST CREST “AI”. In this project, we are developing privacy-preserving financial data analytics systems that can detect fraud with high security and accuracy. To validate the systems, we will perform demonstration tests with several financial institutions and solve the problems that must be addressed before they can be deployed in the real world.
Ciphertext storage can effectively address security problems in cloud storage. Among ciphertext access-control schemes, ciphertext-policy attribute-based encryption (CP-ABE) is well suited to cloud storage environments because it achieves one-to-many ciphertext sharing. However, existing CP-ABE schemes handle attribute revocation with coarse granularity, poor timeliness, and low efficiency, which cannot meet the demands of cloud storage. To address these shortcomings of existing attribute-revocable schemes, this paper proposes RCP-ABE, a scheme that supports real-time, fine-grained attribute revocation. The scheme adopts version control technology to revoke attributes instantly. In the key update mechanism, subset cover technology is used to update keys, which reduces the workload of the authority. Experimental analysis shows that RCP-ABE is more efficient than other schemes.
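The abstract names subset cover technology for key updates but does not say which cover method RCP-ABE uses. As an illustration, a minimal sketch of the complete subtree method (Naor, Naor, and Lotspiech), one common subset-cover instantiation, is shown below; treating it as RCP-ABE's method is an assumption, and the function names are hypothetical. Users sit at the leaves of a binary tree, and the authority only needs to publish update material for the subtree roots that cover exactly the non-revoked users.

```python
def users_under(node: int, num_leaves: int) -> list[int]:
    """Users (leaf indices 0..num_leaves-1) in the subtree rooted at `node`."""
    lo = hi = node
    while lo < num_leaves:               # descend to the leaf level
        lo, hi = 2 * lo, 2 * hi + 1
    return list(range(lo - num_leaves, hi - num_leaves + 1))


def complete_subtree_cover(num_leaves: int, revoked: set[int]) -> list[int]:
    """Minimal set of subtree roots whose leaves are exactly the non-revoked users.

    Nodes use heap numbering: root = 1, children of v are 2v and 2v + 1,
    and user u sits at leaf num_leaves + u. num_leaves must be a power of two.
    """
    if num_leaves < 1 or num_leaves & (num_leaves - 1):
        raise ValueError("num_leaves must be a power of two")
    if not revoked:
        return [1]                       # a single update covers the whole tree
    # Steiner tree: every node on a path from the root to a revoked leaf.
    steiner = set()
    for u in revoked:
        v = num_leaves + u
        while v >= 1:
            steiner.add(v)
            v //= 2
    # Cover = children of internal Steiner nodes that hang off the Steiner tree.
    cover = []
    for v in steiner:
        if v < num_leaves:
            cover.extend(c for c in (2 * v, 2 * v + 1) if c not in steiner)
    return sorted(cover)


print(complete_subtree_cover(8, {2}))    # [3, 4, 11]
```

With 8 users and user 2 revoked, node 3 covers users 4 to 7, node 4 covers users 0 and 1, and node 11 covers user 3, so three key updates replace seven individual ones. In general the cover has about r·log2(N/r) nodes for r revoked users out of N, which is where the reduced authority workload comes from.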
Privacy preservation has become a predominant concern in big data analytics and cloud computing. Organizations collect users' personal data, actively or passively, and publishing this data for research and other analytics without removing Personally Identifiable Information (PII) leads to privacy breaches. Existing anonymization techniques fail to maintain a balance between data privacy and data utility. To provide a trade-off between user privacy and data utility, a Mondrian-based k-anonymity approach is proposed, and a Deep Neural Network (DNN) based framework is proposed to protect the privacy of high-dimensional data. The experimental results show that the proposed approach reduces information loss without compromising privacy.
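To make the Mondrian idea concrete, here is a minimal sketch of classic greedy strict Mondrian on numeric quasi-identifiers: recursively cut the widest attribute at its median as long as both halves keep at least k records, then replace each quasi-identifier with its partition's range. This is the textbook algorithm, not necessarily the paper's variant; the DNN component is not shown, and the record fields (`age`, `zip`, `diagnosis`) are hypothetical.

```python
from collections import Counter


def mondrian(records, qi, k):
    """Greedy strict Mondrian: recursively split on the widest quasi-identifier."""
    if len(records) < k:
        raise ValueError("need at least k records")

    def split(part):
        def span(d):
            values = [r[d] for r in part]
            return max(values) - min(values)

        for d in sorted(qi, key=span, reverse=True):    # widest dimension first
            values = sorted(r[d] for r in part)
            cut = values[len(values) // 2]                # median value
            left = [r for r in part if r[d] < cut]
            right = [r for r in part if r[d] >= cut]
            if len(left) >= k and len(right) >= k:        # allowable cut
                return split(left) + split(right)
        return [part]                                     # no allowable cut: stop

    return split(records)


def generalize(partitions, qi):
    """Replace each quasi-identifier with its partition's min-max range."""
    out = []
    for part in partitions:
        ranges = {d: (min(r[d] for r in part), max(r[d] for r in part)) for d in qi}
        for r in part:
            g = dict(r)
            for d, (lo, hi) in ranges.items():
                g[d] = str(lo) if lo == hi else f"{lo}-{hi}"
            out.append(g)
    return out


people = [
    {"age": a, "zip": z, "diagnosis": d}
    for a, z, d in [(25, 10001, "flu"), (27, 10002, "asthma"), (31, 10010, "flu"),
                    (35, 10011, "diabetes"), (40, 20001, "flu"), (45, 20002, "asthma"),
                    (52, 20010, "cancer"), (60, 20011, "flu")]
]
table = generalize(mondrian(people, ["age", "zip"], k=2), ["age", "zip"])
class_sizes = Counter((row["age"], row["zip"]) for row in table)
print(class_sizes)
```

Every equivalence class in `table` has at least k = 2 rows, while the sensitive `diagnosis` column is kept intact. Splitting at the median keeps partitions small, which is how Mondrian limits information loss compared with global generalization.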
Cloud service providers offer a low-cost and convenient solution for hosting unstructured data. However, cloud services act as third-party solutions and do not give users control over their data. This has raised security and privacy concerns that deter many organizations (users) with sensitive data from adopting cloud-based solutions. User-side encryption can potentially address these concerns by establishing user-centric cloud services and granting data control to the user. Nonetheless, user-side encryption limits the ability to process (e.g., search) encrypted data on the cloud. Accordingly, in this research, we provide a framework that enables processing (in particular, searching) of encrypted multi-organizational (i.e., multi-source) big data without revealing the data to the cloud provider. Our framework leverages the locality of edge computing to offer user-centric search in real time. In particular, the edge system intelligently predicts the user's search pattern and prunes the multi-source big data search space to reduce the search time. The pruning system is based on efficient sampling from the clustered big dataset on the cloud. For each cluster, the pruning system dynamically samples an appropriate number of terms based on the user's search tendency, so that the cluster is optimally represented. We developed a prototype of a user-centric search system and evaluated it against multiple datasets. Experimental results demonstrate a 27% improvement in pruning quality and search accuracy.
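The abstract describes per-cluster term sampling driven by the user's search tendency, but not the exact rule. The sketch below is one hypothetical way to realize it: a term budget is split across clusters in proportion to how often the user's past searches hit each cluster (with add-one smoothing), each cluster is represented by its most frequent terms, and a query is routed only to clusters whose sample it matches. All function names, the proportional rule, and the plaintext tokens are assumptions; in the actual framework the terms would be protected (e.g., as keyed tokens) before leaving the user side.

```python
from collections import Counter


def allocate_samples(cluster_terms, interest, budget):
    """Split a term budget across clusters in proportion to the user's interest."""
    # Add-one smoothing so clusters the user never hit still get a sample.
    weights = {c: 1 + interest.get(c, 0) for c in cluster_terms}
    total = sum(weights.values())
    return {
        c: min(len(set(terms)), max(1, budget * weights[c] // total))
        for c, terms in cluster_terms.items()
    }


def build_samples(cluster_terms, interest, budget):
    """Represent each cluster by its most frequent terms (a simple sampling rule)."""
    alloc = allocate_samples(cluster_terms, interest, budget)
    return {
        c: {t for t, _ in Counter(terms).most_common(alloc[c])}
        for c, terms in cluster_terms.items()
    }


def prune(query_terms, samples):
    """Keep only clusters whose sample shares a term with the query."""
    q = set(query_terms)
    hits = [c for c, sample in samples.items() if q & sample]
    return hits or list(samples)         # nothing matched: fall back to a full search


clusters = {
    "c0": ["flu", "flu", "fever", "cough"],
    "c1": ["tax", "tax", "invoice", "audit"],
    "c2": ["loan", "loan", "credit", "fraud"],
}
samples = build_samples(clusters, interest={"c1": 5}, budget=6)
print(prune(["invoice"], samples))       # ['c1']
```

Because the user mostly searches cluster `c1`, it receives most of the budget and a query for "invoice" is pruned to that single cluster. A query for "fever" falls back to all clusters, since `c0` was represented only by "flu"; this is the quality trade-off that an adaptive allocation tries to optimize.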
The disclosure of an important yet sensitive link between two users of a social graph may cause a serious privacy crisis. Deleting only the sensitive link (the target link), which is often what adversaries attack, is not enough, because adversarial link prediction can still infer the existence of the missing target link. Thus, to defend against a specific adversarial link prediction method, a budget-limited number of other, non-target links should be optimally removed. First, we propose a path-based dissimilarity function as the optimization objective and prove that greedy link deletion to preserve target link privacy (GLD2Privacy), whose objective is monotone and submodular, achieves a near-optimal solution. However, enumerating all length-limited paths between every pair of nodes, as GLD2Privacy requires, is infeasible in large-scale social graphs. Second, we propose the Walk2Privacy mechanism, which uses self-avoiding random walks that scale to large graphs to sample paths of given lengths between the two ends of each missing target link; based on the sampled paths, we select the non-target links to delete for privacy purposes. Finally, we conduct experiments demonstrating that Walk2Privacy remarkably reduces running time and achieves a solution very close to that of GLD2Privacy.
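The two ingredients of Walk2Privacy (sampling paths with self-avoiding random walks, then greedily choosing non-target links to delete) can be sketched as follows. As a stand-in for the paper's path-based dissimilarity function, this sketch greedily maximizes the number of sampled paths broken; that proxy objective is an assumption, chosen because it is a maximum-coverage objective and therefore also monotone and submodular, so greedy selection is within a factor of (1 - 1/e) of optimal on the sampled paths. Function names and the toy graph are hypothetical.

```python
import random
from collections import Counter


def sample_paths(adj, s, t, length, walks, rng):
    """Self-avoiding random walks from s; keep walks that reach t in exactly `length` hops."""
    paths = []
    for _ in range(walks):
        path, visited = [s], {s}
        for _ in range(length):
            choices = [v for v in adj[path[-1]] if v not in visited]
            if not choices:              # walk is stuck
                break
            nxt = rng.choice(choices)
            path.append(nxt)
            visited.add(nxt)
        if len(path) == length + 1 and path[-1] == t:
            paths.append(tuple(path))
    return paths


def greedy_delete(paths, budget):
    """Greedily delete the non-target link that breaks the most remaining sampled paths."""
    remaining = [{frozenset(e) for e in zip(p, p[1:])} for p in paths]
    deleted = []
    for _ in range(budget):
        counts = Counter(e for edges in remaining for e in edges)
        if not counts:                   # every sampled path is already broken
            break
        best, _ = counts.most_common(1)[0]
        deleted.append(tuple(sorted(best)))
        remaining = [edges for edges in remaining if best not in edges]
    return deleted


# Target link (0, 5) has already been removed; 0 and 5 stay linked via node 4.
adj = {0: [1, 2, 3], 1: [0, 4], 2: [0, 4], 3: [0, 4], 4: [1, 2, 3, 5], 5: [4]}
paths = sample_paths(adj, s=0, t=5, length=3, walks=200, rng=random.Random(7))
print(greedy_delete(paths, budget=1))   # [(4, 5)]
```

Every sampled path of length 3 from 0 to 5 passes through link (4, 5), so a budget of one deletion removes it. Sampling replaces exhaustive path enumeration, which is what makes the approach feasible on large graphs.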
The threat of cybercrime is becoming increasingly complex and diverse, putting citizens' data or money in danger. Cybercrime threats often originate from trusted, malicious, or negligent insiders who have excessive access privileges to sensitive data. Analyzing cybercrime insider investigations presents many opportunities to obtain actionable intelligence that improves the quality and value of digital evidence. Applying Deep Packet Inspection (DPI) methods to cybercrime insider investigation has several advantages. This paper introduces a DPI method that can help investigators develop new techniques and carry out the digital investigation process in a forensically sound and timely manner, and it provides a survey of packet inspection techniques that can be applied to cybercrime insider investigation.
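The abstract does not describe a specific DPI engine, so the following is only a minimal sketch of the core idea that separates DPI from header-based filtering: parse past the IPv4 and TCP headers and match signatures against the application payload. It assumes unfragmented IPv4/TCP packets, does no stream reassembly or checksum validation, and uses hypothetical signatures; production DPI would add reassembly and a multi-pattern matcher such as Aho-Corasick.

```python
import socket
import struct


def build_ipv4_tcp(payload, src="10.0.0.5", dst="10.0.0.9", sport=40000, dport=80):
    """Assemble a minimal IPv4 + TCP packet for testing (checksums left at zero)."""
    tcp = struct.pack("!HHLLBBHHH", sport, dport, 0, 0, 5 << 4, 0x18, 65535, 0, 0)
    total_len = 20 + len(tcp) + len(payload)
    ip = struct.pack("!BBHHHBBH4s4s", (4 << 4) | 5, 0, total_len, 0, 0, 64, 6, 0,
                     socket.inet_aton(src), socket.inet_aton(dst))
    return ip + tcp + payload


def tcp_payload(packet):
    """Skip the IPv4 and TCP headers, honouring their length fields."""
    if packet[0] >> 4 != 4:
        raise ValueError("not IPv4")
    if packet[9] != 6:                        # protocol field: 6 = TCP
        return b""
    ip_len = (packet[0] & 0x0F) * 4           # IHL is counted in 32-bit words
    total_len = struct.unpack("!H", packet[2:4])[0]
    tcp_len = (packet[ip_len + 12] >> 4) * 4  # TCP data offset, also in 32-bit words
    return packet[ip_len + tcp_len:total_len]


def match_signatures(packet, signatures):
    """Return the labels of all signatures found in the TCP payload."""
    payload = tcp_payload(packet)
    return sorted(label for sig, label in signatures.items() if sig in payload)


SIGNATURES = {b"confidential": "sensitive-keyword", b"BEGIN RSA PRIVATE KEY": "key-material"}
leak = build_ipv4_tcp(b"POST /upload HTTP/1.1\r\n\r\nconfidential payroll.xlsx")
print(match_signatures(leak, SIGNATURES))   # ['sensitive-keyword']
```

A header-only firewall would see an ordinary HTTP POST to port 80 here; only inspecting the payload reveals that an insider is uploading a document marked confidential, which is the kind of evidence an investigation needs.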
With advances in technology, the volume of data grows every day, and so does the daunting task of managing it. Our present databases are not a long-term solution to this problem: these data volumes must be stored safely and retrieved safely for use. This paper presents an overview of security issues for big data. Big data encompasses the configuration, distribution, and analysis of data in ways that overcome the drawbacks of traditional data processing technology. Big data systems acquire, store, and manage data quickly and cost-effectively with the help of tools, technologies, and frameworks.
To study the application of an improved image hashing algorithm to image tampering detection, a new image hashing technique based on compressed sensing and ring segmentation is proposed. First, the algorithm preprocesses the input image. Then, ring segmentation is used to extract the set of pixels in each ring region, and compressed sensing measurements are performed separately on each set. Finally, the hash value is constructed by calculating the inner product of the measurement vector and the random vector. The results show that the algorithm has good perceptual robustness, uniqueness, and security. Receiver operating characteristic (ROC) curves are used to analyze classification performance, and their comparison shows that the proposed algorithm outperforms FM-CS, GF-LVQ, and RT-DCT.
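The four steps in the abstract can be sketched with NumPy as follows. Several details are assumptions not stated in the source: preprocessing is reduced to z-score normalization of a square grayscale array (the paper likely also resizes and filters), pixels in each ring are sorted to make them independent of rotation, each ring gets its own Gaussian measurement matrix and random vector drawn from a secret `key`, and the resulting real-valued hash is compared by Euclidean distance without quantization. The function names are hypothetical.

```python
import numpy as np


def preprocess(image):
    """Z-score normalization: removes global brightness and contrast changes."""
    img = np.asarray(image, dtype=float)
    std = img.std()
    return (img - img.mean()) / (std if std > 0 else 1.0)


def ring_cs_hash(image, n_rings=8, m=16, key=2024):
    """One hash value per ring: <random vector, CS measurements of the ring's pixels>."""
    img = preprocess(image)
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    yy, xx = np.indices(img.shape)
    radius = np.hypot(yy - cy, xx - cx)
    edges = np.linspace(0.0, min(cy, cx), n_rings + 1)   # equal-width rings
    rng = np.random.default_rng(key)                      # secret key seeds the matrices
    hash_values = np.empty(n_rings)
    for i in range(n_rings):
        ring = np.sort(img[(radius >= edges[i]) & (radius < edges[i + 1])])
        phi = rng.standard_normal((m, ring.size)) / np.sqrt(m)   # measurement matrix
        y = phi @ ring                                            # CS measurement vector
        v = rng.standard_normal(m)                                # random vector
        hash_values[i] = v @ y
    return hash_values


rng = np.random.default_rng(0)
original = rng.random((64, 64))
adjusted = 0.8 * original + 0.1          # contrast/brightness change: content preserved
tampered = original.copy()
tampered[20:30, 20:30] = 1.0             # pasted-over block: content altered
h = ring_cs_hash(original)
print(np.linalg.norm(h - ring_cs_hash(adjusted)))   # near zero: content-preserving change
print(np.linalg.norm(h - ring_cs_hash(tampered)))   # larger: tampering changes the hash
```

A tampering detector would flag an image when its hash distance to the reference exceeds a threshold; sweeping that threshold is what produces the ROC curves mentioned in the abstract. Without the key, an attacker cannot reproduce the measurement matrices, which is the source of the hash's security.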
Tele-radiology is a technology that enables communication between radiologists, patients, and healthcare units located at distant places. It involves the exchange of medical data, which may be stored as Electronic Health Records (EHRs) containing X-rays, CT scans, and MRI reports. Hundreds of scans across multiple radiology centers produce medical big data (MBD), which can be handled by a healthcare cloud. Because a lack of security for EHRs can cause havoc in medical IT, the healthcare cloud must be secure and must ensure secure sharing and storage of EHRs. EHRs are at risk of both internal attacks and external intrusion. This paper proposes applying a decoy technique to secure EHRs, focusing on internal attacks. It also studies honeypots and intrusion detection techniques. The proposed system identifies possible intrusions, alerts the administrator, and logs the details of each intrusion.
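The decoy technique can be illustrated with a minimal sketch of the general honeytoken pattern: fake EHRs are planted among real ones, no legitimate workflow ever references them, and so any read of a decoy is treated as a likely insider intrusion that is logged and reported to the administrator. The class, its methods, and the record fields are hypothetical and are not the paper's implementation.

```python
import logging
import secrets
from datetime import datetime, timezone

logger = logging.getLogger("ehr.decoy")


class DecoyEHRStore:
    """EHR store seeded with decoy records that no legitimate workflow references."""

    def __init__(self, notify=None):
        self.records = {}
        self.decoy_ids = set()
        self.alerts = []                 # intrusion log kept for later investigation
        self.notify = notify             # callback that alerts the administrator

    def add_record(self, record_id, data):
        self.records[record_id] = data

    def plant_decoys(self, count):
        planted = []
        while len(planted) < count:
            rid = "EHR-" + secrets.token_hex(4)
            if rid in self.records:      # avoid clashing with an existing ID
                continue
            # Plausible-looking but fake content (hypothetical fields).
            self.records[rid] = {"patient": "decoy", "study": "CT chest", "report": "normal"}
            self.decoy_ids.add(rid)
            planted.append(rid)
        return planted

    def read(self, user, record_id):
        if record_id in self.decoy_ids:
            alert = {"time": datetime.now(timezone.utc).isoformat(),
                     "user": user, "record": record_id}
            self.alerts.append(alert)
            logger.warning("decoy EHR %s accessed by %s", record_id, user)
            if self.notify:
                self.notify(alert)
        return self.records[record_id]


store = DecoyEHRStore(notify=lambda a: print("ALERT:", a["user"], a["record"]))
store.add_record("EHR-0001", {"patient": "P-17", "study": "MRI brain"})
store.plant_decoys(3)
store.read("dr_rao", "EHR-0001")         # normal access, no alert
for rid in list(store.records):          # insider bulk-browsing every record
    store.read("insider_42", rid)
```

A clinician who opens only the records assigned to them never touches a decoy, while an insider who browses or exports records in bulk is caught on the first decoy read. Keeping the alert details in `alerts` provides the intrusion log described in the abstract.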