Visible to the public Biblio

Filters: Keyword is big data privacy  [Clear All Filters]
2020-08-07
Dilmaghani, Saharnaz, Brust, Matthias R., Danoy, Grégoire, Cassagnes, Natalia, Pecero, Johnatan, Bouvry, Pascal.  2019.  Privacy and Security of Big Data in AI Systems: A Research and Standards Perspective. 2019 IEEE International Conference on Big Data (Big Data). :5737—5743.

The huge volume, variety, and velocity of big data have empowered Machine Learning (ML) techniques and Artificial Intelligence (AI) systems. However, the vast portion of data used to train AI systems is sensitive information. Hence, any vulnerability has a potentially disastrous impact on privacy aspects and security issues. Nevertheless, the increased demands for high-quality AI from governments and companies require the utilization of big data in the systems. Several studies have highlighted the threats of big data on different platforms and the countermeasures to reduce the risks caused by attacks. In this paper, we provide an overview of the existing threats which violate privacy aspects and security issues inflicted by big data as a primary driving force within the AI/ML workflow. We define an adversarial model to investigate the attacks. Additionally, we analyze and summarize the defense strategies and countermeasures of these attacks. Furthermore, due to the impact of AI systems in the market and the vast majority of business sectors, we also investigate Standards Developing Organizations (SDOs) that are actively involved in providing guidelines to protect the privacy and ensure the security of big data and AI systems. Our far-reaching goal is to bridge the research and standardization frame to increase the consistency and efficiency of AI systems developments guaranteeing customer satisfaction while transferring a high degree of trustworthiness.

Mehta, Brijesh B., Gupta, Ruchika, Rao, Udai Pratap, Muthiyan, Mukesh.  2019.  A Scalable (\$\textbackslashtextbackslashalpha, k\$)-Anonymization Approach using MapReduce for Privacy Preserving Big Data Publishing. 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT). :1—6.
Different tools and sources are used to collect big data, which may create privacy issues. k-anonymity, l-diversity, t-closeness etc. privacy preserving data publishing approaches are used data de-identification, but as multiple sources is used to collect the data, chance of re-identification is very high. Anonymization large data is not a trivial task, hence, privacy preserving approaches scalability has become a challenging research area. Researchers explore it by proposing algorithms for scalable anonymization. We further found that in some scenarios efficient anonymization is not enough, timely anonymization is also required. Hence, to incorporate the velocity of data with Scalable k-Anonymization (SKA) approach, we propose a novel approach, Scalable ( α, k)-Anonymization (SAKA). Our proposed approach outperforms in terms of information loss and running time as compared to existing approaches. With best of our knowledge, this is the first proposed scalable anonymization approach for the velocity of data.
2020-07-13
Andrew, J., Karthikeyan, J., Jebastin, Jeffy.  2019.  Privacy Preserving Big Data Publication On Cloud Using Mondrian Anonymization Techniques and Deep Neural Networks. 2019 5th International Conference on Advanced Computing Communication Systems (ICACCS). :722–727.

In recent trends, privacy preservation is the most predominant factor, on big data analytics and cloud computing. Every organization collects personal data from the users actively or passively. Publishing this data for research and other analytics without removing Personally Identifiable Information (PII) will lead to the privacy breach. Existing anonymization techniques are failing to maintain the balance between data privacy and data utility. In order to provide a trade-off between the privacy of the users and data utility, a Mondrian based k-anonymity approach is proposed. To protect the privacy of high-dimensional data Deep Neural Network (DNN) based framework is proposed. The experimental result shows that the proposed approach mitigates the information loss of the data without compromising privacy.

Mahmood, Shah.  2019.  The Anti-Data-Mining (ADM) Framework - Better Privacy on Online Social Networks and Beyond. 2019 IEEE International Conference on Big Data (Big Data). :5780–5788.
The unprecedented and enormous growth of cloud computing, especially online social networks, has resulted in numerous incidents of the loss of users' privacy. In this paper, we provide a framework, based on our anti-data-mining (ADM) principle, to enhance users' privacy against adversaries including: online social networks; search engines; financial terminal providers; ad networks; eavesdropping governments; and other parties who can monitor users' content from the point where the content leaves users' computers to within the data centers of these information accumulators. To achieve this goal, our framework proactively uses the principles of suppression of sensitive data and disinformation. Moreover, we use social-bots in a novel way for enhanced privacy and provide users' with plausible deniability for their photos, audio, and video content uploaded online.
2020-01-06
Mo, Ran, Liu, Jianfeng, Yu, Wentao, Jiang, Fu, Gu, Xin, Zhao, Xiaoshuai, Liu, Weirong, Peng, Jun.  2019.  A Differential Privacy-Based Protecting Data Preprocessing Method for Big Data Mining. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :693–699.

Analyzing clustering results may lead to the privacy disclosure issue in big data mining. In this paper, we put forward a differential privacy-based protecting data preprocessing method for distance-based clustering. Firstly, the data distortion technique differential privacy is used to prevent the distances in distance-based clustering from disclosing the relationships. Differential privacy may affect the clustering results while protecting privacy. Then an adaptive privacy budget parameter adjustment mechanism is applied for keeping the balance between the privacy protection and the clustering results. By solving the maximum and minimum problems, the differential privacy budget parameter can be obtained for different clustering algorithms. Finally, we conduct extensive experiments to evaluate the performance of our proposed method. The results demonstrate that our method can provide privacy protection with precise clustering results.

2019-03-06
Suwansrikham, P., She, K..  2018.  Asymmetric Secure Storage Scheme for Big Data on Multiple Cloud Providers. 2018 IEEE 4th International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing, (HPSC) and IEEE International Conference on Intelligent Data and Security (IDS). :121-125.

Recently, cloud computing is an emerging technology along with big data. Both technologies come together. Due to the enormous size of data in big data, it is impossible to store them in local storage. Alternatively, even we want to store them locally, we have to spend much money to create bit data center. One way to save money is store big data in cloud storage service. Cloud storage service provides users space and security to store the file. However, relying on single cloud storage may cause trouble for the customer. CSP may stop its service anytime. It is too risky if data owner hosts his file only single CSP. Also, the CSP is the third party that user have to trust without verification. After deploying his file to CSP, the user does not know who access his file. Even CSP provides a security mechanism to prevent outsider attack. However, how user ensure that there is no insider attack to steal or corrupt the file. This research proposes the way to minimize the risk, ensure data privacy, also accessing control. The big data file is split into chunks and distributed to multiple cloud storage provider. Even there is insider attack; the attacker gets only part of the file. He cannot reconstruct the whole file. After splitting the file, metadata is generated. Metadata is a place to keep chunk information, includes, chunk locations, access path, username and password of data owner to connect each CSP. Asymmetric security concept is applied to this research. The metadata will be encrypted and transfer to the user who requests to access the file. The file accessing, monitoring, metadata transferring is functions of dew computing which is an intermediate server between the users and cloud service.

Mito, M., Murata, K., Eguchi, D., Mori, Y., Toyonaga, M..  2018.  A Data Reconstruction Method for The Big-Data Analysis. 2018 9th International Conference on Awareness Science and Technology (iCAST). :319-323.
In recent years, the big-data approach has become important within various business operations and sales judgment tactics. Contrarily, numerous privacy problems limit the progress of their analysis technologies. To mitigate such problems, this paper proposes several privacy-preserving methods, i.e., anonymization, extreme value record elimination, fully encrypted analysis, and so on. However, privacy-cracking fears still remain that prevent the open use of big-data by other, external organizations. We propose a big-data reconstruction method that does not intrinsically use privacy data. The method uses only the statistical features of big-data, i.e., its attribute histograms and their correlation coefficients. To verify whether valuable information can be extracted using this method, we evaluate the data by using Self Organizing Map (SOM) as one of the big-data analysis tools. The results show that the same pieces of information are extracted from our data and the big-data.
Leung, C. K., Hoi, C. S. H., Pazdor, A. G. M., Wodi, B. H., Cuzzocrea, A..  2018.  Privacy-Preserving Frequent Pattern Mining from Big Uncertain Data. 2018 IEEE International Conference on Big Data (Big Data). :5101-5110.
As we are living in the era of big data, high volumes of wide varieties of data which may be of different veracity (e.g., precise data, imprecise and uncertain data) are easily generated or collected at a high velocity in many real-life applications. Embedded in these big data is valuable knowledge and useful information, which can be discovered by big data science solutions. As a popular data science task, frequent pattern mining aims to discover implicit, previously unknown and potentially useful information and valuable knowledge in terms of sets of frequently co-occurring merchandise items and/or events. Many of the existing frequent pattern mining algorithms use a transaction-centric mining approach to find frequent patterns from precise data. However, there are situations in which an item-centric mining approach is more appropriate, and there are also situations in which data are imprecise and uncertain. Hence, in this paper, we present an item-centric algorithm for mining frequent patterns from big uncertain data. In recent years, big data have been gaining the attention from the research community as driven by relevant technological innovations (e.g., clouds) and novel paradigms (e.g., social networks). As big data are typically published online to support knowledge management and fruition processes, these big data are usually handled by multiple owners with possible secure multi-part computation issues. Thus, privacy and security of big data has become a fundamental problem in this research context. In this paper, we present, not only an item-centric algorithm for mining frequent patterns from big uncertain data, but also a privacy-preserving algorithm. In other words, we present- in this paper-a privacy-preserving item-centric algorithm for mining frequent patterns from big uncertain data. Results of our analytical and empirical evaluation show the effectiveness of our algorithm in mining frequent patterns from big uncertain data in a privacy-preserving manner.
Cuzzocrea, A., Damiani, E..  2018.  Pedigree-Ing Your Big Data: Data-Driven Big Data Privacy in Distributed Environments. 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). :675-681.
This paper introduces a general framework for supporting data-driven privacy-preserving big data management in distributed environments, such as emerging Cloud settings. The proposed framework can be viewed as an alternative to classical approaches where the privacy of big data is ensured via security-inspired protocols that check several (protocol) layers in order to achieve the desired privacy. Unfortunately, this injects considerable computational overheads in the overall process, thus introducing relevant challenges to be considered. Our approach instead tries to recognize the "pedigree" of suitable summary data representatives computed on top of the target big data repositories, hence avoiding computational overheads due to protocol checking. We also provide a relevant realization of the framework above, the so-called Data-dRIven aggregate-PROvenance privacypreserving big Multidimensional data (DRIPROM) framework, which specifically considers multidimensional data as the case of interest.
Khan, Latifur.  2018.  Big IoT Data Stream Analytics with Issues in Privacy and Security. Proceedings of the Fourth ACM International Workshop on Security and Privacy Analytics. :22-22.
Internet of Things (IoT) Devices are monitoring and controlling systems that interact with the physical world by collecting, processing and transmitting data using the internet. IoT devices include home automation systems, smart grid, transportation systems, medical devices, building controls, manufacturing and industrial control systems. With the increase in deployment of IoT devices, there will be a corresponding increase in the amount of data generated by these devices, therefore, resulting in the need of large scale data processing systems to process and extract information for efficient and impactful decision making that will improve quality of living.
El Haourani, Lamia, Elkalam, Anas Abou, Ouahman, Abdelah Ait.  2018.  Knowledge Based Access Control a Model for Security and Privacy in the Big Data. Proceedings of the 3rd International Conference on Smart City Applications. :16:1-16:8.
The most popular features of Big Data revolve around the so-called "3V" criterion: Volume, Variety and Velocity. Big Data is based on the massive collection and in-depth analysis of personal data, with a view to profiling, or even marketing and commercialization, thus violating citizens' privacy and the security of their data. In this article we discuss security and privacy solutions in the context of Big Data. We then focus on access control and present our new model called Knowledge-based Access Control (KBAC); this strengthens the access control already deployed in the target company (e.g., based on "RBAC" role or "ABAC" attributes for example) by adding a semantic access control layer. KBAC offers thinner access control, tailored to Big Data, with effective protection against intrusion attempts and unauthorized data inferences.
Yan, Li, Hao, Xiaowei, Cheng, Zelei, Zhou, Rui.  2018.  Cloud Computing Security and Privacy. Proceedings of the 2018 International Conference on Big Data and Computing. :119-123.
Cloud computing is an emerging technology that can provide organizations, enterprises and governments with cheaper, more convenient and larger scale computing resources. However, cloud computing will bring potential risks and threats, especially on security and privacy. We make a survey on potential threats and risks and existing solutions on cloud security and privacy. We also put forward some problems to be addressed to provide a secure cloud computing environment.
Guerriero, Michele, Tamburri, Damian Andrew, Di Nitto, Elisabetta.  2018.  Defining, Enforcing and Checking Privacy Policies in Data-Intensive Applications. Proceedings of the 13th International Conference on Software Engineering for Adaptive and Self-Managing Systems. :172-182.
The rise of Big Data is leading to an increasing demand for large-scale data-intensive applications (DIAs), which have to analyse massive amounts of personal data (e.g. customers' location, cars' speed, people heartbeat, etc.), some of which can be sensitive, meaning that its confidentiality has to be protected. In this context, DIA providers are responsible for enforcing privacy policies that account for the privacy preferences of data subjects as well as for general privacy regulations. This is the case, for instance, of data brokers, i.e. companies that continuously collect and analyse data in order to provide useful analytics to their clients. Unfortunately, the enforcement of privacy policies in modern DIAs tends to become cumbersome because (i) the number of policies can easily explode, depending on the number of data subjects, (ii) policy enforcement has to autonomously adapt to the application context, thus, requiring some non-trivial runtime reasoning, and (iii) designing and developing modern DIAs is complex per se. For the above reasons, we need specific design and runtime methods enabling so called privacy-by-design in a Big Data context. In this article we propose an approach for specifying, enforcing and checking privacy policies on DIAs designed according to the Google Dataflow model and we show that the enforcement approach behaves correctly in the considered cases and introduces a performance overhead that is acceptable given the requirements of a typical DIA.
Gursoy, Mehmet Emre, Liu, Ling, Truex, Stacey, Yu, Lei, Wei, Wenqi.  2018.  Utility-Aware Synthesis of Differentially Private and Attack-Resilient Location Traces. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. :196-211.
As mobile devices and location-based services become increasingly ubiquitous, the privacy of mobile users' location traces continues to be a major concern. Traditional privacy solutions rely on perturbing each position in a user's trace and replacing it with a fake location. However, recent studies have shown that such point-based perturbation of locations is susceptible to inference attacks and suffers from serious utility losses, because it disregards the moving trajectory and continuity in full location traces. In this paper, we argue that privacy-preserving synthesis of complete location traces can be an effective solution to this problem. We present AdaTrace, a scalable location trace synthesizer with three novel features: provable statistical privacy, deterministic attack resilience, and strong utility preservation. AdaTrace builds a generative model from a given set of real traces through a four-phase synthesis process consisting of feature extraction, synopsis learning, privacy and utility preserving noise injection, and generation of differentially private synthetic location traces. The output traces crafted by AdaTrace preserve utility-critical information existing in real traces, and are robust against known location trace attacks. We validate the effectiveness of AdaTrace by comparing it with three state of the art approaches (ngram, DPT, and SGLT) using real location trace datasets (Geolife and Taxi) as well as a simulated dataset of 50,000 vehicles in Oldenburg, Germany. AdaTrace offers up to 3-fold improvement in trajectory utility, and is orders of magnitude faster than previous work, while preserving differential privacy and attack resilience.
Colombo, Pietro, Ferrari, Elena.  2018.  Access Control in the Era of Big Data: State of the Art and Research Directions. Proceedings of the 23Nd ACM on Symposium on Access Control Models and Technologies. :185-192.
Data security and privacy issues are magnified by the volume, the variety, and the velocity of Big Data and by the lack, up to now, of a standard data model and related data manipulation language. In this paper, we focus on one of the key data security services, that is, access control, by highlighting the differences with traditional data management systems and describing a set of requirements that any access control solution for Big Data platforms may fulfill. We then describe the state of the art and discuss open research issues.
2018-02-06
Mehrpouyan, H., Azpiazu, I. M., Pera, M. S..  2017.  Measuring Personality for Automatic Elicitation of Privacy Preferences. 2017 IEEE Symposium on Privacy-Aware Computing (PAC). :84–95.

The increasing complexity and ubiquity in user connectivity, computing environments, information content, and software, mobile, and web applications transfers the responsibility of privacy management to the individuals. Hence, making it extremely difficult for users to maintain the intelligent and targeted level of privacy protection that they need and desire, while simultaneously maintaining their ability to optimally function. Thus, there is a critical need to develop intelligent, automated, and adaptable privacy management systems that can assist users in managing and protecting their sensitive data in the increasingly complex situations and environments that they find themselves in. This work is a first step in exploring the development of such a system, specifically how user personality traits and other characteristics can be used to help automate determination of user sharing preferences for a variety of user data and situations. The Big-Five personality traits of openness, conscientiousness, extroversion, agreeableness, and neuroticism are examined and used as inputs into several popular machine learning algorithms in order to assess their ability to elicit and predict user privacy preferences. Our results show that the Big-Five personality traits can be used to significantly improve the prediction of user privacy preferences in a number of contexts and situations, and so using machine learning approaches to automate the setting of user privacy preferences has the potential to greatly reduce the burden on users while simultaneously improving the accuracy of their privacy preferences and security.

Wang, Y., Rawal, B., Duan, Q..  2017.  Securing Big Data in the Cloud with Integrated Auditing. 2017 IEEE International Conference on Smart Cloud (SmartCloud). :126–131.

In this paper, we review big data characteristics and security challenges in the cloud and visit different cloud domains and security regulations. We propose using integrated auditing for secure data storage and transaction logs, real-time compliance and security monitoring, regulatory compliance, data environment, identity and access management, infrastructure auditing, availability, privacy, legality, cyber threats, and granular auditing to achieve big data security. We apply a stochastic process model to conduct security analyses in availability and mean time to security failure. Potential future works are also discussed.

Heifetz, A., Mugunthan, V., Kagal, L..  2017.  Shade: A Differentially-Private Wrapper for Enterprise Big Data. 2017 IEEE International Conference on Big Data (Big Data). :1033–1042.

Enterprises usually provide strong controls to prevent cyberattacks and inadvertent leakage of data to external entities. However, in the case where employees and data scientists have legitimate access to analyze and derive insights from the data, there are insufficient controls and employees are usually permitted access to all information about the customers of the enterprise including sensitive and private information. Though it is important to be able to identify useful patterns of one's customers for better customization and service, customers' privacy must not be sacrificed to do so. We propose an alternative — a framework that will allow privacy preserving data analytics over big data. In this paper, we present an efficient and scalable framework for Apache Spark, a cluster computing framework, that provides strong privacy guarantees for users even in the presence of an informed adversary, while still providing high utility for analysts. The framework, titled Shade, includes two mechanisms — SparkLAP, which provides Laplacian perturbation based on a user's query and SparkSAM, which uses the contents of the database itself in order to calculate the perturbation. We show that the performance of Shade is substantially better than earlier differential privacy systems without loss of accuracy, particularly when run on datasets small enough to fit in memory, and find that SparkSAM can even exceed performance of an identical nonprivate Spark query.

Palanisamy, B., Li, C., Krishnamurthy, P..  2017.  Group Privacy-Aware Disclosure of Association Graph Data. 2017 IEEE International Conference on Big Data (Big Data). :1043–1052.

In the age of Big Data, we are witnessing a huge proliferation of digital data capturing our lives and our surroundings. Data privacy is a critical barrier to data analytics and privacy-preserving data disclosure becomes a key aspect to leveraging large-scale data analytics due to serious privacy risks. Traditional privacy-preserving data publishing solutions have focused on protecting individual's private information while considering all aggregate information about individuals as safe for disclosure. This paper presents a new privacy-aware data disclosure scheme that considers group privacy requirements of individuals in bipartite association graph datasets (e.g., graphs that represent associations between entities such as customers and products bought from a pharmacy store) where even aggregate information about groups of individuals may be sensitive and need protection. We propose the notion of $ε$g-Group Differential Privacy that protects sensitive information of groups of individuals at various defined group protection levels, enabling data users to obtain the level of information entitled to them. Based on the notion of group privacy, we develop a suite of differentially private mechanisms that protect group privacy in bipartite association graphs at different group privacy levels based on specialization hierarchies. We evaluate our proposed techniques through extensive experiments on three real-world association graph datasets and our results demonstrate that the proposed techniques are effective, efficient and provide the required guarantees on group privacy.

Shi, Y., Piao, C., Zheng, L..  2017.  Differential-Privacy-Based Correlation Analysis in Railway Freight Service Applications. 2017 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC). :35–39.

With the development of modern logistics industry railway freight enterprises as the main traditional logistics enterprises, the service mode is facing many problems. In the era of big data, for railway freight enterprises, coordinated development and sharing of information resources have become the requirements of the times, while how to protect the privacy of citizens has become one of the focus issues of the public. To prevent the disclosure or abuse of the citizens' privacy information, the citizens' privacy needs to be preserved in the process of information opening and sharing. However, most of the existing privacy preserving models cannot to be used to resist attacks with continuously growing background knowledge. This paper presents the method of applying differential privacy to protect associated data, which can be shared in railway freight service association information. First, the original service data need to slice by optimal shard length, then differential method and apriori algorithm is used to add Laplace noise in the Candidate sets. Thus the citizen's privacy information can be protected even if the attacker gets strong background knowledge. Last, sharing associated data to railway information resource partners. The steps and usefulness of the discussed privacy preservation method is illustrated by an example.

Guan, Z., Si, G., Du, X., Liu, P., Zhang, Z., Zhou, Z..  2017.  Protecting User Privacy Based on Secret Sharing with Fault Tolerance for Big Data in Smart Grid. 2017 IEEE International Conference on Communications (ICC). :1–6.

In smart grid, large quantities of data is collected from various applications, such as smart metering substation state monitoring, electric energy data acquisition, and smart home. Big data acquired in smart grid applications is usually sensitive. For instance, in order to dispatch accurately and support the dynamic price, lots of smart meters are installed at user's house to collect the real-time data, but all these collected data are related to user privacy. In this paper, we propose a data aggregation scheme based on secret sharing with fault tolerance in smart grid, which ensures that control center gets the integrated data without revealing user's privacy. Meanwhile, we also consider fault tolerance during the data aggregation. At last, we analyze the security of our scheme and carry out experiments to validate the results.

Zebboudj, S., Brahami, R., Mouzaia, C., Abbas, C., Boussaid, N., Omar, M..  2017.  Big Data Source Location Privacy and Access Control in the Framework of IoT. 2017 5th International Conference on Electrical Engineering - Boumerdes (ICEE-B). :1–5.

In the recent years, we have observed the development of several connected and mobile devices intended for daily use. This development has come with many risks that might not be perceived by the users. These threats are compromising when an unauthorized entity has access to private big data generated through the user objects in the Internet of Things. In the literature, many solutions have been proposed in order to protect the big data, but the security remains a challenging issue. This work is carried out with the aim to provide a solution to the access control to the big data and securing the localization of their generator objects. The proposed models are based on Attribute Based Encryption, CHORD protocol and $μ$TESLA. Through simulations, we compare our solutions to concurrent protocols and we show its efficiency in terms of relevant criteria.

Nosouhi, M. R., Pham, V. V. H., Yu, S., Xiang, Y., Warren, M..  2017.  A Hybrid Location Privacy Protection Scheme in Big Data Environment. GLOBECOM 2017 - 2017 IEEE Global Communications Conference. :1–6.

Location privacy has become a significant challenge of big data. Particularly, by the advantage of big data handling tools availability, huge location data can be managed and processed easily by an adversary to obtain user private information from Location-Based Services (LBS). So far, many methods have been proposed to preserve user location privacy for these services. Among them, dummy-based methods have various advantages in terms of implementation and low computation costs. However, they suffer from the spatiotemporal correlation issue when users submit consecutive requests. To solve this problem, a practical hybrid location privacy protection scheme is presented in this paper. The proposed method filters out the correlated fake location data (dummies) before submissions. Therefore, the adversary can not identify the user's real location. Evaluations and experiments show that our proposed filtering technique significantly improves the performance of existing dummy-based methods and enables them to effectively protect the user's location privacy in the environment of big data.

Chen, D., Irwin, D..  2017.  Weatherman: Exposing Weather-Based Privacy Threats in Big Energy Data. 2017 IEEE International Conference on Big Data (Big Data). :1079–1086.

Smart energy meters record electricity consumption and generation at fine-grained intervals, and are among the most widely deployed sensors in the world. Energy data embeds detailed information about a building's energy-efficiency, as well as the behavior of its occupants, which academia and industry are actively working to extract. In many cases, either inadvertently or by design, these third-parties only have access to anonymous energy data without an associated location. The location of energy data is highly useful and highly sensitive information: it can provide important contextual information to improve big data analytics or interpret their results, but it can also enable third-parties to link private behavior derived from energy data with a particular location. In this paper, we present Weatherman, which leverages a suite of analytics techniques to localize the source of anonymous energy data. Our key insight is that energy consumption data, as well as wind and solar generation data, largely correlates with weather, e.g., temperature, wind speed, and cloud cover, and that every location on Earth has a distinct weather signature that uniquely identifies it. Weatherman represents a serious privacy threat, but also a potentially useful tool for researchers working with anonymous smart meter data. We evaluate Weatherman's potential in both areas by localizing data from over one hundred smart meters using a weather database that includes data from over 35,000 locations. Our results show that Weatherman localizes coarse (one-hour resolution) energy consumption, wind, and solar data to within 16.68km, 9.84km, and 5.12km, respectively, on average, which is more accurate using much coarser resolution data than prior work on localizing only anonymous solar data using solar signatures.

Hassoon, I. A., Tapus, N., Jasim, A. C..  2017.  Enhance Privacy in Big Data and Cloud via Diff-Anonym Algorithm. 2017 16th RoEduNet Conference: Networking in Education and Research (RoEduNet). :1–5.

The main issue with big data in cloud is the processed or used always need to be by third party. It is very important for the owners of data or clients to trust and to have the guarantee of privacy for the information stored in cloud or analyzed as big data. The privacy models studied in previous research showed that privacy infringement for big data happened because of limitation, privacy guarantee rate or dissemination of accurate data which is obtainable in the data set. In addition, there are various privacy models. In order to determine the best and the most appropriate model to be applied in the future, which also guarantees big data privacy, it is necessary to invest in research and study. In the next part, we surfed some of the privacy models in order to determine the advantages and disadvantages of each model in privacy assurance for big data in cloud. The present study also proposes combined Diff-Anonym algorithm (K-anonymity and differential models) to provide data anonymity with guarantee to keep balance between ambiguity of private data and clarity of general data.