Visible to the public Biblio

Filters: Keyword is data publishing  [Clear All Filters]
2021-01-28
Wang, N., Song, H., Luo, T., Sun, J., Li, J..  2020.  Enhanced p-Sensitive k-Anonymity Models for Achieving Better Privacy. 2020 IEEE/CIC International Conference on Communications in China (ICCC). :148—153.

To our best knowledge, the p-sensitive k-anonymity model is a sophisticated model to resist linking attacks and homogeneous attacks in data publishing. However, if the distribution of sensitive values is skew, the model is difficult to defend against skew attacks and even faces sensitive attacks. In practice, the privacy requirements of different sensitive values are not always identical. The “one size fits all” unified privacy protection level may cause unnecessary information loss. To address these problems, the paper quantifies privacy requirements with the concept of IDF and concerns more about sensitive groups. Two enhanced anonymous models with personalized protection characteristic, that is, (p,αisg) -sensitive k-anonymity model and (pi,αisg)-sensitive k-anonymity model, are then proposed to resist skew attacks and sensitive attacks. Furthermore, two clustering algorithms with global search and local search are designed to implement our models. Experimental results show that the two enhanced models have outstanding advantages in better privacy at the expense of a little data utility.

2020-12-28
Tojiboev, R., Lee, W., Lee, C. C..  2020.  Adding Noise Trajectory for Providing Privacy in Data Publishing by Vectorization. 2020 IEEE International Conference on Big Data and Smart Computing (BigComp). :432—434.

Since trajectory data is widely collected and utilized for scientific research and business purpose, publishing trajectory without proper privacy-policy leads to an acute threat to individual data. Recently, several methods, i.e., k-anonymity, l-diversity, t-closeness have been studied, though they tend to protect by reducing data depends on a feature of each method. When a strong privacy protection is required, these methods have excessively reduced data utility that may affect the result of scientific research. In this research, we suggest a novel approach to tackle this existing dilemma via an adding noise trajectory on a vector-based grid environment.

2020-06-22
Gao, Ruichao, Ma, Xuebin.  2019.  Dynamic Data Publishing with Differential Privacy via Reinforcement Learning. 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC). 1:746–752.
Differential privacy, which is due to its rigorous mathematical proof and strong privacy guarantee, has become a standard for the release of statistics with privacy protection. Recently, a lot of dynamic data publishing algorithms based on differential privacy have been proposed, but most of the algorithms use a native method to allocate the privacy budget. That is, the limited privacy budget is allocated to each time point uniformly, which may result in the privacy budget being unreasonably utilized and reducing the utility of data. In order to make full use of the limited privacy budget in the dynamic data publishing and improve the utility of data publishing, we propose a dynamic data publishing algorithm based on reinforcement learning in this paper. The algorithm consists of two parts: privacy budget allocation and data release. In the privacy budget allocation phase, we combine the idea of reinforcement learning and the changing characteristics of dynamic data, and establish a reinforcement learning model for the allocation of privacy budget. Finally, the algorithm finds a reasonable privacy budget allocation scheme to publish dynamic data. In the data release phase, we also propose a new dynamic data publishing strategy to publish data after the privacy budget is exhausted. Extensive experiments on real datasets demonstrate that our algorithm can allocate the privacy budget reasonably and improve the utility of dynamic data publishing.
2020-04-20
Esquivel-Quiros, Luis Gustavo, Barrantes, Elena Gabriela, Darlington, Fernando Esponda.  2018.  Measuring data privacy preserving and machine learning. 2018 7th International Conference On Software Process Improvement (CIMPS). :85–94.

The increasing publication of large amounts of data, theoretically anonymous, can lead to a number of attacks on the privacy of people. The publication of sensitive data without exposing the data owners is generally not part of the software developers concerns. The regulations for the data privacy-preserving create an appropriate scenario to focus on privacy from the perspective of the use or data exploration that takes place in an organization. The increasing number of sanctions for privacy violations motivates the systematic comparison of three known machine learning algorithms in order to measure the usefulness of the data privacy preserving. The scope of the evaluation is extended by comparing them with a known privacy preservation metric. Different parameter scenarios and privacy levels are used. The use of publicly available implementations, the presentation of the methodology, explanation of the experiments and the analysis allow providing a framework of work on the problem of the preservation of privacy. Problems are shown in the measurement of the usefulness of the data and its relationship with the privacy preserving. The findings motivate the need to create optimized metrics on the privacy preferences of the owners of the data since the risks of predicting sensitive attributes by means of machine learning techniques are not usually eliminated. In addition, it is shown that there may be a hundred percent, but it cannot be measured. As well as ensuring adequate performance of machine learning models that are of interest to the organization that data publisher.

2018-04-02
Jia, J., Chen, L..  2017.  (L, m, d) \#x2014; Anonymity : A Resisting Similarity Attack Model for Multiple Sensitive Attributes. 2017 IEEE 2nd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). :756–760.

Preserving privacy is extremely important in data publishing. The existing privacy-preserving models are mostly oriented to single sensitive attribute, can not be applied to multiple sensitive attributes situation. Moreover, they do not consider the semantic similarity between sensitive attribute values, and may be vulnerable to similarity attack. In this paper, we propose a (l, m, d)-anonymity model for multiple sensitive attributes similarity attack, where m is the dimension of the sensitive attributes. This model uses the semantic hierarchical tree to analyze and compute the semantic dissimilarity between sensitive attribute values, and each equivalence class must exist at least l sensitive attribute values that satisfy d-different on each dimension sensitive attribute. Meanwhile, in order to make the published data highly available, our model adopts the distance-based measurement method to divide the equivalence class. We carry out extensive experiments to certify the (1, m, d)-anonymity model can significantly reduce the probability of sensitive information leakage and protect individual privacy more effectively.

2015-05-05
Lei Xu, Chunxiao Jiang, Jian Wang, Jian Yuan, Yong Ren.  2014.  Information Security in Big Data: Privacy and Data Mining. Access, IEEE. 2:1149-1176.

The growing popularity and development of data mining technologies bring serious threat to the security of individual,'s sensitive information. An emerging research topic in data mining, known as privacy-preserving data mining (PPDM), has been extensively studied in recent years. The basic idea of PPDM is to modify the data in such a way so as to perform data mining algorithms effectively without compromising the security of sensitive information contained in the data. Current studies of PPDM mainly focus on how to reduce the privacy risk brought by data mining operations, while in fact, unwanted disclosure of sensitive information may also happen in the process of data collecting, data publishing, and information (i.e., the data mining results) delivering. In this paper, we view the privacy issues related to data mining from a wider perspective and investigate various approaches that can help to protect sensitive information. In particular, we identify four different types of users involved in data mining applications, namely, data provider, data collector, data miner, and decision maker. For each type of user, we discuss his privacy concerns and the methods that can be adopted to protect sensitive information. We briefly introduce the basics of related research topics, review state-of-the-art approaches, and present some preliminary thoughts on future research directions. Besides exploring the privacy-preserving approaches for each type of user, we also review the game theoretical approaches, which are proposed for analyzing the interactions among different users in a data mining scenario, each of whom has his own valuation on the sensitive information. By differentiating the responsibilities of different users with respect to security of sensitive information, we would like to provide some useful insights into the study of PPDM.