Biblio

Found 570 results

Filters: Keyword is Data models
2021-03-01
Davis, B., Glenski, M., Sealy, W., Arendt, D..  2020.  Measure Utility, Gain Trust: Practical Advice for XAI Researchers. 2020 IEEE Workshop on TRust and EXpertise in Visual Analytics (TREX). :1–8.
Research into the explanation of machine learning models, i.e., explainable AI (XAI), has seen a commensurate exponential growth alongside deep artificial neural networks throughout the past decade. For historical reasons, explanation and trust have been intertwined. However, the focus on trust is too narrow, and has led the research community astray from tried and true empirical methods that produced more defensible scientific knowledge about people and explanations. To address this, we contribute a practical path forward for researchers in the XAI field. We recommend researchers focus on the utility of machine learning explanations instead of trust. We outline five broad use cases where explanations are useful and, for each, we describe pseudo-experiments that rely on objective empirical measurements and falsifiable hypotheses. We believe that this experimental rigor is necessary to contribute to scientific knowledge in the field of XAI.
Tao, J., Xiong, Y., Zhao, S., Xu, Y., Lin, J., Wu, R., Fan, C..  2020.  XAI-Driven Explainable Multi-view Game Cheating Detection. 2020 IEEE Conference on Games (CoG). :144–151.
Online gaming is one of the most successful applications, with a large number of players interacting in an online persistent virtual world through the Internet. However, some cheating players gain improper advantages over normal players by using illegal automated plugins, which has brought huge harm to game health and player enjoyment. Game industries have devoted much effort to cheating detection with multi-view data sources and achieved great accuracy improvements by applying artificial intelligence (AI) techniques. However, generating explanations for cheating detection from multiple views still remains a challenging task. To respond to the different purposes of explainability in AI models for different audience profiles, we propose EMGCD, the first explainable multi-view game cheating detection framework driven by explainable AI (XAI). It combines cheating explainers with cheating classifiers from different views to generate individual, local and global explanations, which contribute to evidence generation, reason generation, model debugging and model compression. The EMGCD has been implemented and deployed in multiple game productions at NetEase Games, achieving remarkable and trustworthy performance. Our framework can also easily generalize to other types of related tasks in online games, such as explainable recommender systems, explainable churn prediction, etc.
Kuppa, A., Le-Khac, N.-A..  2020.  Black Box Attacks on Explainable Artificial Intelligence (XAI) methods in Cyber Security. 2020 International Joint Conference on Neural Networks (IJCNN). :1–8.

The cybersecurity community is slowly leveraging Machine Learning (ML) to combat ever-evolving threats. One of the biggest drivers for successful adoption of these models is how well domain experts and users are able to understand and trust their functionality. As these black-box models are being employed to make important predictions, the demand for transparency and explainability from stakeholders is increasing. Explanations supporting the output of ML models are crucial in cyber security, where experts require far more information from the model than a simple binary output for their analysis. Recent approaches in the literature have focused on three different areas: (a) creating and improving explainability methods which help users better understand the internal workings of ML models and their outputs; (b) attacks on interpreters in the white-box setting; (c) defining the exact properties and metrics of the explanations generated by models. However, they have not covered the security properties and threat models relevant to the cybersecurity domain, nor attacks on explainable models in black-box settings. In this paper, we bridge this gap by proposing a taxonomy for Explainable Artificial Intelligence (XAI) methods, covering various security properties and threat models relevant to the cyber security domain. We design a novel black-box attack for analyzing the consistency, correctness and confidence security properties of gradient-based XAI methods. We validate our proposed system on three security-relevant data sets and models, and demonstrate that the method achieves the attacker's goal of misleading both the classifier and the explanation report, as well as misleading only the explainability method without affecting the classifier output. Our evaluation of the proposed approach shows promising results and can help in designing secure and robust XAI methods.

2021-02-23
Park, S. H., Park, H. J., Choi, Y..  2020.  RNN-based Prediction for Network Intrusion Detection. 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC). :572—574.
We investigate a prediction model using an RNN for network intrusion detection in industrial IoT environments. For intrusion detection, we use an anomaly detection method that predicts the next packet and scores the distance between the prediction and the real packet to distinguish normal packets from abnormal ones. When packets were learned in the LSTM model, the two-gram and sliding-window N-gram representations showed the best performance in terms of error, and the LSTM model outperformed other data mining regression techniques. Finally, cosine similarity was used as the scoring function, and anomaly detection was performed by setting a boundary on the cosine similarity below which a packet is no longer considered normal.
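To make the scoring step concrete, here is a minimal sketch of cosine-similarity anomaly scoring between a predicted next-packet vector and the observed packet; the vector representation, threshold value, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two packet feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def is_anomalous(predicted: np.ndarray, observed: np.ndarray, threshold: float = 0.9) -> bool:
    """Flag the observed packet as abnormal when it deviates too far
    from the packet the model predicted (threshold is illustrative)."""
    return cosine_similarity(predicted, observed) < threshold

# Example: a predicted next-packet embedding vs. the embedding of the real packet.
predicted = np.array([0.2, 0.7, 0.1, 0.0])
observed = np.array([0.9, 0.0, 0.1, 0.0])
print(is_anomalous(predicted, observed))  # True: low similarity, scored as anomaly
```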
2021-02-22
Afanasyev, A., Ramani, S. K..  2020.  NDNconf: Network Management Framework for Named Data Networking. 2020 IEEE International Conference on Communications Workshops (ICC Workshops). :1–6.
The rapid growth of the Internet is, in part, powered by the broad participation of numerous vendors building network components. All these network devices require that they be properly configured and maintained, which creates a challenge for system administrators of complex networks with a growing variety of heterogeneous devices. This challenge holds for today's networks, as well as for the networking architectures of the future, such as Named Data Networking (NDN). This paper gives a preliminary design of the NDNconf framework, an adaptation of the recently developed NETCONF protocol, to realize unified configuration and management for NDN. The presented design is built leveraging the benefits provided by NDN, including the structured naming shared among network and application layers, stateful data retrieval with name-based interest forwarding, in-network caching, the data-centric security model, and others. Specifically, the configuration data models, the heart of NDNconf, the elements of the models, and the models themselves are represented as secured NDN data, allowing fetching of models, fetching of configuration data corresponding to elements of the model, and issuing of commands using the standard Interest-Data exchanges. On top of that, the security of models, data, and commands is realized through native data-centric NDN mechanisms, providing highly secure systems with a high granularity of control.
2021-02-16
Liu, F., Eugenio, E., Jin, I. H., Bowen, C..  2020.  Differentially Private Generation of Social Networks via Exponential Random Graph Models. 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC). :1695—1700.
Many social networks contain sensitive relational information. One approach to protect the sensitive relational information while offering flexibility for social network research and analysis is to release synthetic social networks at a pre-specified privacy risk level, given the original observed network. We propose the DP-ERGM procedure that synthesizes networks satisfying differential privacy (DP) via the exponential random graph model (ERGM). We apply DP-ERGM to a college student friendship network and compare its preservation of original network information in the generated private networks with two other approaches: differentially private DyadWise Randomized Response (DWRR) and Sanitization of the Conditional probability of Edge given Attribute classes (SCEA). The results suggest that DP-ERGM preserves the original information significantly better than DWRR and SCEA in both network statistics and inferences from ERGMs and latent space models. In addition, DP-ERGM satisfies node DP, a stronger notion of privacy than the edge DP that DWRR and SCEA satisfy.
Wu, J. M.-T., Srivastava, G., Pirouz, M., Lin, J. C.-W..  2020.  A GA-based Data Sanitization for Hiding Sensitive Information with Multi-Thresholds Constraint. 2020 International Conference on Pervasive Artificial Intelligence (ICPAI). :29—34.
In this work, we propose a new concept of multiple support thresholds to sanitize the database for specific sensitive itemsets. The proposed method assigns a stricter threshold to each sensitive itemset for data sanitization. Furthermore, a genetic-algorithm (GA)-based model is involved in the designed algorithm to minimize side effects. In our experimental results, the GA-based PPDM approach is compared with the traditional compact GA-based model, and the results clearly show that our proposed method can obtain better performance with less computational cost.
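The following toy sketch illustrates only the multiple-support-threshold constraint, using a naive greedy deletion in place of the paper's GA-based optimization; the transactions, thresholds, and helper names are assumptions for demonstration.

```python
from typing import Dict, FrozenSet, List

def support(db: List[FrozenSet[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions containing the itemset."""
    return sum(itemset <= t for t in db) / len(db)

def sanitize(db: List[FrozenSet[str]],
             thresholds: Dict[FrozenSet[str], float]) -> List[FrozenSet[str]]:
    """Greedily drop supporting transactions until every sensitive itemset
    falls below its own (stricter) threshold. A GA, as in the paper, would
    instead search for the modifications that minimize side effects."""
    db = list(db)
    for itemset, thr in thresholds.items():
        while db and support(db, itemset) >= thr:
            # remove one transaction that still supports the sensitive itemset
            db.remove(next(t for t in db if itemset <= t))
    return db

db = [frozenset(t) for t in (["a", "b"], ["a", "b", "c"], ["b", "c"], ["a"])]
print(sanitize(db, {frozenset({"a", "b"}): 0.3}))
```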
Mujib, M., Sari, R. F..  2020.  Performance Evaluation of Data Center Network with Network Micro-segmentation. 2020 12th International Conference on Information Technology and Electrical Engineering (ICITEE). :27—32.

Research on the design of data center infrastructure is increasing, both from academia and industry, due to the rapid development of cloud-based applications such as search engines, social networks, and large-scale computing. On a large scale, data centers can consist of hundreds to thousands of servers that require systems with high performance requirements and low downtime. To meet the network's needs in a dynamic data center, the infrastructure of applications and services keeps growing, and the network topology must be designed so that it can guarantee availability and security. One way to achieve this is by implementing the zero trust security model based on micro-segmentation. Zero trust is a security concept based on the principle of "never trust, always verify", in which no network traffic is implicitly trusted; under the zero trust security model, all network traffic is treated as untrusted. Micro-segmentation is a way to achieve zero trust by dividing a network into smaller logical segments to restrict traffic. In this research, the performance of a data center network based on software-defined networking with the zero trust security model using micro-segmentation has been evaluated using a testbed simulation of Cisco Application Centric Infrastructure, measuring round trip time, jitter, and packet loss during the experiments. The evaluation results show that micro-segmentation adds an average round trip time of 4 μs and jitter of 11 μs without packet loss, so security can be improved without significantly affecting network performance in the data center.

2021-02-10
Lei, L., Chen, M., He, C., Li, D..  2020.  XSS Detection Technology Based on LSTM-Attention. 2020 5th International Conference on Control, Robotics and Cybernetics (CRC). :175—180.
Cross-site scripting (XSS) is one of the main threats to Web applications and causes great harm. How to effectively detect and defend against XSS attacks has become more and more important. Due to the malicious obfuscation of attack code and its gradually increasing volume, traditional XSS detection methods have defects such as poor recognition of malicious attack code, inadequate feature extraction and low efficiency. Therefore, we present a novel approach to detect XSS attacks based on the attention mechanism of the Long Short-Term Memory (LSTM) recurrent neural network. First of all, the data need to be preprocessed: we used decoding techniques to restore the XSS code to its unencoded state to improve its readability, and then used word2vec to extract XSS payload features and map them to feature vectors. We then improved the LSTM model by adding an attention mechanism; the LSTM-Attention detection model was designed to train and test on the data. We used the ability of the LSTM model to extract context-related features for deep learning, and the added attention mechanism enables the model to extract more effective features. Finally, we used a classifier to classify the abstract features. Experimental results show that the proposed XSS detection model based on LSTM-Attention achieves a precision rate of 99.3% and a recall rate of 98.2% on the actually collected dataset. Compared with traditional machine learning methods and other deep learning methods, this method can more effectively identify XSS attacks.
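A compact sketch of an LSTM model with attention pooling for payload classification, assuming tokenized (already decoded) payloads; an embedding layer stands in for the paper's word2vec features, and all layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class LSTMAttentionClassifier(nn.Module):
    """Single-layer LSTM with a simple learned attention pooling over the
    hidden states, followed by a binary (benign vs. XSS) classifier."""
    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # stand-in for word2vec vectors
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)              # scores each time step
        self.fc = nn.Linear(hidden_dim, 2)                # benign vs. XSS

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h, _ = self.lstm(self.embed(tokens))              # (batch, seq, hidden)
        weights = torch.softmax(self.attn(h), dim=1)      # attention over time steps
        context = (weights * h).sum(dim=1)                # weighted sum of hidden states
        return self.fc(context)

# Example: a batch of two tokenized payloads of length 10.
model = LSTMAttentionClassifier(vocab_size=5000)
logits = model(torch.randint(0, 5000, (2, 10)))
print(logits.shape)  # torch.Size([2, 2])
```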
2021-02-08
Zhang, J..  2020.  DeepMal: A CNN-LSTM Model for Malware Detection Based on Dynamic Semantic Behaviours. 2020 International Conference on Computer Information and Big Data Applications (CIBDA). :313–316.
Malware refers to any software accessing or being installed in a system without the authorisation of administrators. Various malware has been widely used by cyber-criminals to accomplish their evil intentions and goals. To combat the increasing amount and reduce the threat of malicious programs, a novel deep learning framework is proposed that draws on NLP techniques and combines CNN and LSTM neurons to capture locally spatial correlations and learn from sequential long-term dependencies. Hence, high-level abstractions and representations are automatically extracted for the malware classification task. The classification accuracy improves from 0.81 (the best result, obtained by Random Forest) to approximately 1.0.
Pelissero, N., Laso, P. M., Puentes, J..  2020.  Naval cyber-physical anomaly propagation analysis based on a quality assessed graph. 2020 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA). :1–8.
As any other infrastructure relying on cyber-physical systems (CPS), naval CPS are highly interconnected and collect considerable data streams, on which multiple command and navigation decisions depend. Being a data-driven decision system requiring optimized supervisory control on a permanent basis, it is critical to examine the CPS vulnerability to anomalies and their propagation. This paper presents an approach to detect CPS anomalies and estimate their propagation applying a quality assessed graph, which represents the CPS physical and digital subsystems, combined with system variables dependencies and a set of data and information quality measures vectors. Following the identification of variables dependencies and high-risk nodes in the CPS, data and information quality measures reveal how system variables are modified when an anomaly is detected, also indicating its propagation path. Taking as reference the normal state of a naval propulsion management system, four anomalies in the form of cyber-attacks - port scan, programmable logic controller stop, and man in the middle to change the motor speed and operation of a tank valve - were produced. Three anomalies were properly detected and their propagation path identified. These results suggest the feasibility of anomaly detection and propagation estimation in CPS by applying data and information quality analysis to a system graph.
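A minimal sketch of propagation-path estimation over a variable-dependency graph, assuming the graph structure and node names; the paper's data and information quality measure vectors are not modeled here.

```python
import networkx as nx

# Toy dependency graph of a propulsion management system (structure assumed).
G = nx.DiGraph()
G.add_edges_from([
    ("plc", "motor_speed"),
    ("motor_speed", "shaft_rpm"),
    ("tank_valve", "fuel_flow"),
    ("fuel_flow", "motor_speed"),
])

def propagation_path(graph: nx.DiGraph, anomalous_node: str) -> list:
    """Nodes that may be affected once an anomaly is detected at one node,
    i.e. everything reachable through the variable-dependency edges."""
    return list(nx.descendants(graph, anomalous_node))

# A quality-measure drop on the tank valve propagates downstream
# (result order may vary, since descendants are returned as a set).
print(propagation_path(G, "tank_valve"))
```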
2021-02-01
Wickramasinghe, C. S., Marino, D. L., Grandio, J., Manic, M..  2020.  Trustworthy AI Development Guidelines for Human System Interaction. 2020 13th International Conference on Human System Interaction (HSI). :130–136.
Artificial Intelligence (AI) is influencing almost all areas of human life. Even though these AI-based systems frequently provide state-of-the-art performance, humans still hesitate to develop, deploy, and use AI systems. The main reason for this is the lack of trust in AI systems caused by the deficiency of transparency of existing AI systems. As a solution, the “Trustworthy AI” research area emerged with the goal of defining guidelines and frameworks for improving user trust in AI systems, allowing humans to use them without fear. While trust in AI is an active area of research, very little work exists where the focus is on building human trust to improve the interactions between humans and AI systems. In this paper, we provide a concise survey of concepts of trustworthy AI. Further, we present trustworthy AI development guidelines for improving user trust and enhancing the interactions between AI systems and humans that happen during the AI system life cycle.
2021-01-28
Wang, N., Song, H., Luo, T., Sun, J., Li, J..  2020.  Enhanced p-Sensitive k-Anonymity Models for Achieving Better Privacy. 2020 IEEE/CIC International Conference on Communications in China (ICCC). :148—153.

To the best of our knowledge, the p-sensitive k-anonymity model is a sophisticated model for resisting linking attacks and homogeneity attacks in data publishing. However, if the distribution of sensitive values is skewed, the model has difficulty defending against skew attacks and even faces sensitivity attacks. In practice, the privacy requirements of different sensitive values are not always identical. The “one size fits all” unified privacy protection level may cause unnecessary information loss. To address these problems, the paper quantifies privacy requirements with the concept of IDF and pays more attention to sensitive groups. Two enhanced anonymous models with personalized protection characteristics, the (p, α_isg)-sensitive k-anonymity model and the (p_i, α_isg)-sensitive k-anonymity model, are then proposed to resist skew attacks and sensitivity attacks. Furthermore, two clustering algorithms with global search and local search are designed to implement our models. Experimental results show that the two enhanced models provide better privacy at the expense of a small loss of data utility.
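For orientation, a minimal check of the baseline p-sensitive k-anonymity property that the paper's enhanced models build on; the α_isg-style per-group requirements are not reproduced, and the sample records are illustrative.

```python
from collections import defaultdict
from typing import List, Tuple

def is_p_sensitive_k_anonymous(records: List[Tuple[tuple, str]], k: int, p: int) -> bool:
    """records: (quasi-identifier tuple, sensitive value) pairs.
    Each equivalence class (same quasi-identifiers) must contain at least
    k records and at least p distinct sensitive values."""
    classes = defaultdict(list)
    for qi, sensitive in records:
        classes[qi].append(sensitive)
    return all(len(vals) >= k and len(set(vals)) >= p for vals in classes.values())

data = [(("30-40", "M"), "flu"), (("30-40", "M"), "hiv"),
        (("30-40", "M"), "flu"), (("20-30", "F"), "flu"),
        (("20-30", "F"), "cold"), (("20-30", "F"), "hiv")]
print(is_p_sensitive_k_anonymous(data, k=3, p=2))  # True
```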

Li, Y., Chen, J., Li, Q., Liu, A..  2020.  Differential Privacy Algorithm Based on Personalized Anonymity. 2020 5th IEEE International Conference on Big Data Analytics (ICBDA). :260—267.

The existing anonymized differential privacy model adopts a unified anonymity method, ignoring differences in personal privacy, which may lead to excessive or insufficient protection of the original data [1]. Therefore, this paper proposes a personalized k-anonymity model for tuples (PKA) and a differential privacy data publishing algorithm (DPPA) based on personalized anonymity. First, based on the tuple personality factor set by the user in the original data set, the values are classified and the corresponding privacy protection relevance is calculated. Then, according to the tuple personality factor classification value, the data set is clustered using a clustering method with different anonymity levels, and the quasi-identifier attributes of each cluster are aggregated and noise is added to realize anonymized differential privacy. Finally, the subsets are merged to obtain a data set that meets the release requirements. In this paper, the correctness of the algorithm is analyzed theoretically, and the feasibility and effectiveness of the proposed algorithm are verified by comparison with similar algorithms.

Kariyappa, S., Qureshi, M. K..  2020.  Defending Against Model Stealing Attacks With Adaptive Misinformation. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). :767—775.

Deep Neural Networks (DNNs) are susceptible to model stealing attacks, which allow a data-limited adversary with no knowledge of the training dataset to clone the functionality of a target model, just by using black-box query access. Such attacks are typically carried out by querying the target model using inputs that are synthetically generated or sampled from a surrogate dataset to construct a labeled dataset. The adversary can use this labeled dataset to train a clone model, which achieves a classification accuracy comparable to that of the target model. We propose "Adaptive Misinformation" to defend against such model stealing attacks. We identify that all existing model stealing attacks invariably query the target model with Out-Of-Distribution (OOD) inputs. By selectively sending incorrect predictions for OOD queries, our defense substantially degrades the accuracy of the attacker's clone model (by up to 40%), while minimally impacting the accuracy (< 0.5%) for benign users. Compared to existing defenses, our defense has a significantly better security vs. accuracy trade-off and incurs minimal computational overhead.
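A simplified sketch of the defense idea, using a max-softmax confidence score as a stand-in OOD detector and a hard threshold; the paper's actual OOD detection and adaptive (gradual) misinformation mechanism are not reproduced.

```python
import numpy as np

def serve_prediction(logits: np.ndarray, ood_threshold: float = 0.6) -> int:
    """Return the true prediction for in-distribution queries and a
    deliberately misleading one for likely out-of-distribution queries.
    The max-softmax OOD score and the threshold are illustrative choices."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    top = int(np.argmax(probs))
    if probs[top] >= ood_threshold:        # confident => likely in-distribution
        return top
    # likely OOD: answer with an incorrect class instead of the real one
    return int(np.argsort(probs)[-2])      # second-best class as misinformation

print(serve_prediction(np.array([4.0, 0.5, 0.2])))   # confident -> true class 0
print(serve_prediction(np.array([0.4, 0.3, 0.35])))  # uncertain -> misleading class
```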

Drašar, M., Moskal, S., Yang, S., Zat'ko, P..  2020.  Session-level Adversary Intent-Driven Cyberattack Simulator. 2020 IEEE/ACM 24th International Symposium on Distributed Simulation and Real Time Applications (DS-RT). :1—9.

Recognizing the need for proactive analysis of cyber adversary behavior, this paper presents a new event-driven simulation model and implementation to reveal the efforts needed by attackers who have various entry points into a network. Unlike previous models which focus on the impact of attackers' actions on the defender's infrastructure, this work focuses on the attackers' strategies and actions. By operating on a request-response session level, our model provides an abstraction of how the network infrastructure reacts to access credentials the adversary might have obtained through a variety of strategies. We present the current capabilities of the simulator by showing three variants of Bronze Butler APT on a network with different user access levels.

2021-01-22
Mani, G., Pasumarti, V., Bhargava, B., Vora, F. T., MacDonald, J., King, J., Kobes, J..  2020.  DeCrypto Pro: Deep Learning Based Cryptomining Malware Detection Using Performance Counters. 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS). :109—118.
Autonomy in cybersystems depends on their ability to be self-aware by understanding the intent of services and applications that are running on those systems. In the case of mission-critical cybersystems that are deployed in dynamic and unpredictable environments, newly integrated unknown applications or services can either be benign and essential for the mission or they can be cyberattacks. In some cases, these cyberattacks are evasive Advanced Persistent Threats (APTs) where the attackers remain undetected for reconnaissance in order to ascertain system features for an attack, e.g., Trojan Laziok. In other cases, the attackers can use the system only for computing, e.g., cryptomining malware. APTs such as cryptomining malware neither disrupt normal system functionalities nor trigger any warning signs because they simply perform bitwise and cryptographic operations like any other benign compression or encoding application. Thus, it is difficult for defense mechanisms such as antivirus applications to detect these attacks. In this paper, we propose an Operating Context profiling system based on deep neural networks, namely Long Short-Term Memory (LSTM) networks, using Windows Performance Counters data for detecting these evasive cryptomining applications. In addition, we propose Deep Cryptomining Profiler (DeCrypto Pro), a detection system with a novel model selection framework containing a utility function that can select a classification model for behavior profiling from both lightweight machine learning models (Random Forest and k-Nearest Neighbors) and a deep learning model (LSTM), depending on available computing resources. Given data from performance counters, we show that individual models perform with high accuracy and can be trained with limited training data. We also show that the DeCrypto Profiler framework reduces the use of computational resources and accurately detects cryptomining applications by selecting an appropriate model, given constraints such as data sample size and system configuration.
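A toy illustration of resource-aware model selection in the spirit of the DeCrypto Pro utility function; the thresholds and decision rules below are placeholders, since the abstract does not specify them.

```python
def select_profiling_model(cpu_cores: int, memory_gb: float, n_samples: int) -> str:
    """Toy utility-style selection between lightweight models and an LSTM,
    driven by available compute and data volume. The actual utility function
    and thresholds used by DeCrypto Pro are not published in the abstract,
    so the values below are placeholders."""
    if memory_gb < 4 or cpu_cores < 4:
        return "k-nearest-neighbors"   # cheapest to run on constrained nodes
    if n_samples < 10_000:
        return "random-forest"         # strong with limited training data
    return "lstm"                      # enough data and resources for deep profiling

print(select_profiling_model(cpu_cores=2, memory_gb=2.0, n_samples=50_000))   # k-nearest-neighbors
print(select_profiling_model(cpu_cores=8, memory_gb=16.0, n_samples=200_000)) # lstm
```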
2021-01-20
Focardi, R., Luccio, F. L..  2020.  Automated Analysis of PUF-based Protocols. 2020 IEEE 33rd Computer Security Foundations Symposium (CSF). :304—317.

Physical Unclonable Functions (PUFs) are a promising technology to secure low-cost devices. A PUF is a function whose values depend on the physical characteristics of the underlying hardware: the same PUF implemented on two identical integrated circuits will return different values. Thus, a PUF can be used as a unique fingerprint identifying one specific physical device among (apparently) identical copies that run the same firmware on the same hardware. PUFs, however, are tricky to implement, and a number of attacks have been reported in the literature, often due to wrong assumptions about the provided security guarantees and/or the attacker model. In this paper, we present the first mechanized symbolic model for PUFs that allows for precisely reasoning about their security with respect to a varied set of attackers. We consider mutual authentication protocols based on different kinds of PUFs and model attackers that are able to access PUF values stored on servers, abuse the PUF APIs, model the PUF behavior and exploit error correction data to reproduce the PUF values. We prove security properties and we formally specify the capabilities required by the attacker to break them. Our analysis points out various subtleties, and allows for a systematic comparison between different PUF-based protocols. The mechanized models are easily extensible and can be automatically checked with the Tamarin prover.

2021-01-11
Li, Y., Chang, T.-H., Chi, C.-Y..  2020.  Secure Federated Averaging Algorithm with Differential Privacy. 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP). :1–6.
Federated learning (FL), as a recent advance in distributed machine learning, is capable of learning a model over the network without directly accessing the clients' raw data. Nevertheless, the clients' sensitive information can still be exposed to adversaries via differential attacks on messages exchanged between the parameter server and clients. In this paper, we consider the widely used federated averaging (FedAvg) algorithm and propose to enhance the data privacy by the differential privacy (DP) technique, which obfuscates the exchanged messages by properly adding Gaussian noise. We analytically show that the proposed secure FedAvg algorithm maintains an O(1/T) convergence rate, where T is the total number of stochastic gradient descent (SGD) updates for local model parameters. Moreover, we demonstrate how various algorithm parameters impact the communication efficiency of the algorithm. Experiment results are presented to justify the obtained analytical results on the performance of the proposed algorithm in terms of testing accuracy.
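A minimal sketch of the Gaussian-mechanism step applied to a client's local update before it is exchanged with the parameter server; the clipping and noise calibration shown here are generic DP practice, not the paper's exact parameter choices.

```python
import numpy as np

def privatize_update(update: np.ndarray, clip_norm: float, noise_std: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Clip a client's local model update and add Gaussian noise before it is
    sent to the parameter server. clip_norm and noise_std would be calibrated
    from the target (epsilon, delta) budget; that calibration is omitted here."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, noise_std, size=update.shape)

rng = np.random.default_rng(0)
local_update = np.array([0.8, -1.5, 0.3])
print(privatize_update(local_update, clip_norm=1.0, noise_std=0.1, rng=rng))
```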
Wu, N., Farokhi, F., Smith, D., Kaafar, M. A..  2020.  The Value of Collaboration in Convex Machine Learning with Differential Privacy. 2020 IEEE Symposium on Security and Privacy (SP). :304–317.
In this paper, we apply machine learning to distributed private data owned by multiple data owners, entities with access to non-overlapping training datasets. We use noisy, differentially-private gradients to minimize the fitness cost of the machine learning model using stochastic gradient descent. We quantify the quality of the trained model, using the fitness cost, as a function of privacy budget and size of the distributed datasets to capture the trade-off between privacy and utility in machine learning. This way, we can predict the outcome of collaboration among privacy-aware data owners prior to executing potentially computationally-expensive machine learning algorithms. Particularly, we show that the difference between the fitness of the machine learning model trained using differentially-private gradient queries and the fitness of the machine learning model trained in the absence of any privacy concerns is inversely proportional to the size of the training datasets squared and the privacy budget squared. We successfully validate the performance prediction with the actual performance of the proposed privacy-aware learning algorithms, applied to: financial datasets for determining interest rates of loans using regression; and detecting credit card fraud using support vector machines.
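Written as a formula, with notation assumed here (F for the fitness cost, N for the size of each training dataset, ε for the privacy budget), the scaling reported in the abstract is:

```latex
F_{\text{private}} - F_{\text{non-private}} \;\propto\; \frac{1}{N^{2}\,\varepsilon^{2}}
```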
Lyu, L..  2020.  Lightweight Crypto-Assisted Distributed Differential Privacy for Privacy-Preserving Distributed Learning. 2020 International Joint Conference on Neural Networks (IJCNN). :1–8.
The emergence of distributed learning allows multiple participants to collaboratively train a global model: instead of directly sharing their private training data with the server, participants iteratively share their local model updates (parameters) with it. However, recent attacks demonstrate that sharing local model updates is not sufficient to provide reasonable privacy guarantees, as local model updates may result in significant privacy leakage about the local training data of participants. To address this issue, in this paper, we present an alternative approach that combines distributed differential privacy (DDP) with a three-layer encryption protocol to achieve a better privacy-utility tradeoff than the existing DP-based approaches. An unbiased encoding algorithm is proposed to cope with floating-point values, while largely reducing the mean squared error due to rounding. Our approach dispenses with the need for any trusted server, and enables each party to add less noise to achieve the same privacy and similar utility guarantees as centralized differential privacy. Preliminary analysis and performance evaluation confirm the effectiveness of our approach, which achieves significantly higher accuracy than the local differential privacy approach, and comparable accuracy to the centralized differential privacy approach.
Zhang, H., Zhang, D., Chen, H., Xu, J..  2020.  Improving Efficiency of Pseudonym Revocation in VANET Using Cuckoo Filter. 2020 IEEE 20th International Conference on Communication Technology (ICCT). :763–769.
In VANETs, pseudonyms are often used to replace the identity of vehicles in communication. When vehicles drive out of the network or misbehave, their pseudonym certificates need to be revoked by the certificate authority (CA). Certificate revocation lists (CRLs) are usually used to store the revoked certificates before their expiration. However, using CRLs incurs additional storage, communication and computation overhead. Some existing schemes have proposed to use a Bloom Filter to compress the original CRLs, but they are unable to delete expired certificates and introduce the false positive problem. In this paper, we propose an improved pseudonym certificate revocation scheme, using a Cuckoo Filter for compression to reduce the impact of these problems. In order to optimize deletion efficiency, we propose the concept of a Certificate Expiration List (CEL), which can be implemented with a priority queue. The experimental results show that our scheme can effectively reduce the storage and communication overhead of pseudonym certificate revocation, while retaining moderately low false positive rates. In addition, our scheme can also greatly improve the lookup performance on CRLs, and reduce revocation operation costs by allowing deletion.
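A minimal sketch of the Certificate Expiration List (CEL) idea as a priority queue keyed on expiry time; class and method names are assumptions, and the companion Cuckoo Filter holding the revoked, unexpired certificates is not reproduced.

```python
import heapq
import time

class CertificateExpirationList:
    """Min-heap keyed on expiry time: expired pseudonym certificates can be
    purged from the revocation structure instead of lingering in it."""
    def __init__(self):
        self._heap = []                      # (expiry_timestamp, cert_id)

    def add(self, cert_id: str, expiry_ts: float) -> None:
        heapq.heappush(self._heap, (expiry_ts, cert_id))

    def pop_expired(self, now: float) -> list:
        """Return and remove every certificate whose lifetime has ended."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expired.append(heapq.heappop(self._heap)[1])
        return expired

cel = CertificateExpirationList()
cel.add("pseudonym-cert-42", expiry_ts=time.time() - 10)    # already expired
cel.add("pseudonym-cert-77", expiry_ts=time.time() + 3600)
print(cel.pop_expired(time.time()))  # ['pseudonym-cert-42'] -> delete from the filter too
```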
Cao, S., Zou, J., Du, X., Zhang, X..  2020.  A Successive Framework: Enabling Accurate Identification and Secure Storage for Data in Smart Grid. ICC 2020 - 2020 IEEE International Conference on Communications (ICC). :1–6.
Due to malicious eavesdropping, forgery and other risks, it is challenging to process and store power data collected from the smart grid in a secure manner. Blockchain technology has become a novel method to solve the above problems because of its decentralization and tamper-proof characteristics. It is well known that data stored in a blockchain cannot be changed, so it is vital to seek out mechanisms that ensure the data are of high quality (namely, that the power data are accurate) before being stored in the blockchain. This helps avoid losses due to the modification or deletion of low-quality data in the smart grid. Thus, we apply parallel vision theory to the identification of meter readings to obtain accurate power data. A cloud-blockchain fusion model (CBFM) is proposed for the storage of accurate power data, allowing flexible transactions to be conducted securely. Only power data calculated by the parallel visual system, instead of the image data originally collected by robots, are stored in the blockchain. Hence, we define the quality assurance before data are uploaded to the blockchain and the security guarantee after data are stored in the blockchain as a successive framework, which is a brand new solution to manage efficiency and security as a whole for power data and similar data in other scenarios. Security analysis and performance evaluations are performed, which prove that CBFM is highly secure and efficient.
2020-12-28
Lee, H., Cho, S., Seong, J., Lee, S., Lee, W..  2020.  De-identification and Privacy Issues on Bigdata Transformation. 2020 IEEE International Conference on Big Data and Smart Computing (BigComp). :514—519.

As the amount of data in various industries and government sectors grows exponentially, the `7V' concept of big data aims to create new value by indiscriminately collecting and analyzing information from various fields. At the same time, as the ecosystem of the ICT industry matures, big data utilization is threatened by privacy attacks such as infringement due to the large amount of data. To manage and sustain a controllable privacy level, recommended de-identification techniques are needed. This paper examines those de-identification processes and three types of commonly used privacy models. Furthermore, this paper presents use cases in which these kinds of technologies can be adopted, as well as future development directions.

Yang, H., Huang, L., Luo, C., Yu, Q..  2020.  Research on Intelligent Security Protection of Privacy Data in Government Cyberspace. 2020 IEEE 5th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA). :284—288.

Based on an analysis of the difficulties and pain points of privacy protection in the opening and sharing of government data, this paper proposes a new method for the intelligent discovery and protection of structured and unstructured private data. Building on improvements to the existing government data masking process, this method introduces NLP and machine learning technologies and studies the intelligent discovery of sensitive data, the automatic recommendation of masking algorithms, and the fully automatic execution of the improved masking process. In addition, dynamic and static masking prototypes with text and databases as data sources are designed and implemented with agent-based intelligent masking middleware. The results show that the recognition range and protection efficiency for government private data, especially government unstructured text, have been significantly improved.
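As a closing illustration of the masking step only (not the paper's NLP- and machine-learning-based discovery), here is a toy rule-based masker that replaces detected sensitive values with short irreversible tokens; the patterns and token format are assumptions.

```python
import hashlib
import re

# Toy rules for discovering sensitive fields; a production system would use
# NLP/NER models, as the paper describes, rather than two regular expressions.
PATTERNS = {
    "phone": re.compile(r"\b\d{3}-\d{4}-\d{4}\b"),
    "id_number": re.compile(r"\b\d{17}[\dX]\b"),
}

def mask_text(text: str) -> str:
    """Replace every detected sensitive value with a short, irreversible token."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(
            lambda m: f"<{label}:{hashlib.sha256(m.group().encode()).hexdigest()[:8]}>",
            text,
        )
    return text

print(mask_text("Contact 010-1234-5678 for citizen 11010519491231002X."))
```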