Biblio
We regularly use communication apps such as Facebook and WhatsApp on our smartphones, and the exchange of media, particularly images, has grown at an exponential rate. Over 3 billion images are shared every day on WhatsApp alone. In such a scenario, the management of images on a mobile device has become highly inefficient, leading to problems such as low storage, manual deletion of images, and disorganization. In this paper, we present a solution to these issues: every image on a smartphone is automatically classified into a set of predefined categories, segregating spam images so that the user can delete them seamlessly.
Nowadays, as the Internet has developed, the requirements for online and offline data storage have increased. Large storage IT projects involve high costs and a high level of business risk. A storage service provider (SSP) provides computer storage space and management, and also offers backup and archiving. Despite this, many companies fear for the security, privacy, and integrity of outsourced data. As a solution, File Assured Deletion (FADE) is a system built upon standard cryptographic techniques. By encrypting outsourced data files, it aims to guarantee their privacy and integrity and, most importantly, to assuredly delete files so that they become unrecoverable to anybody (including those who manage the cloud storage) upon revocation of file access policies. Unfortunately, this system remains weak in case the key manager's security is compromised. Our work provides a new scheme that aims to improve the security of FADE by using a TPM (Trusted Platform Module), which safely stores keys, passwords, and digital certificates.
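A minimal sketch of the policy-based deletion idea that FADE builds on, not the paper's actual protocol or its TPM extension: each file's data key is wrapped with a policy-specific control key held by a key manager, and revoking the policy destroys the control key, leaving the stored ciphertext unrecoverable. The `KeyManager` class, policy names, and use of the `cryptography` package's Fernet primitive are illustrative assumptions.

```python
# Illustrative sketch of policy-based assured deletion (not FADE's actual protocol).
# Assumes the `cryptography` package; the KeyManager class is hypothetical.
from cryptography.fernet import Fernet


class KeyManager:
    """Holds one control key per access policy (in FADE, hardened with a TPM)."""

    def __init__(self):
        self._control_keys = {}

    def wrap(self, policy, data_key):
        ck = self._control_keys.setdefault(policy, Fernet.generate_key())
        return Fernet(ck).encrypt(data_key)

    def unwrap(self, policy, wrapped_key):
        return Fernet(self._control_keys[policy]).decrypt(wrapped_key)

    def revoke(self, policy):
        # Destroying the control key renders every file under the policy unrecoverable.
        del self._control_keys[policy]


km = KeyManager()
data_key = Fernet.generate_key()
ciphertext = Fernet(data_key).encrypt(b"outsourced file contents")
wrapped = km.wrap("project-x", data_key)          # only the wrapped key is stored remotely

recovered = Fernet(km.unwrap("project-x", wrapped)).decrypt(ciphertext)
assert recovered == b"outsourced file contents"

km.revoke("project-x")                            # assured deletion: ciphertext is now unreadable
```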
In existing remote data integrity checking schemes, dynamic update operates at the block level, which usually restricts the location of data inserted into a file due to the fixed size of a data block. In this paper, we propose a remote data integrity checking scheme with fine-grained update for big data storage. The proposed scheme achieves the basic operations of insertion, modification, and deletion at the line level, at any location in a file, by designing a mapping relationship between line-level updates and block-level updates. Scheme analysis shows that the proposed scheme supports public verification and privacy preservation. Meanwhile, it performs data integrity checking with low computation and communication cost.
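A toy illustration, not the paper's construction, of how line-level edits might be mapped to block-level operations when each block holds a fixed number of lines; the block capacity, data layout, and function names are assumptions.

```python
# Toy mapping from line-level updates to fixed-capacity block-level updates.
# Block size and structure are illustrative assumptions, not the paper's design.
BLOCK_CAPACITY = 4


def locate(blocks, line_no):
    """Map a global line number to (block index, offset within block)."""
    for i, block in enumerate(blocks):
        if line_no < len(block):
            return i, line_no
        line_no -= len(block)
    return len(blocks) - 1, len(blocks[-1])        # append position


def insert_line(blocks, line_no, text):
    i, off = locate(blocks, line_no)
    blocks[i].insert(off, text)
    if len(blocks[i]) > BLOCK_CAPACITY:            # block-level update: split the block
        blocks[i:i + 1] = [blocks[i][:BLOCK_CAPACITY // 2], blocks[i][BLOCK_CAPACITY // 2:]]


def delete_line(blocks, line_no):
    i, off = locate(blocks, line_no)
    del blocks[i][off]
    if not blocks[i]:                              # block-level update: drop the empty block
        del blocks[i]


blocks = [["l0", "l1", "l2"], ["l3", "l4"]]
insert_line(blocks, 3, "new")                      # line-level insert lands inside block 1
delete_line(blocks, 0)
print(blocks)                                      # [['l1', 'l2'], ['new', 'l3', 'l4']]
```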
In this paper we study the densest subgraph problem, which plays a key role in many graph mining applications. The goal of the problem is to find a subset of nodes that induces a graph with maximum average degree. The problem has been extensively studied in the past few decades under a variety of different settings, and several exact and approximation algorithms have been proposed. However, as a normal graph can only model objects with pairwise relationships, the densest subgraph problem fails to identify communities under relationships that involve more than 2 objects, e.g., in a network connecting authors by publications. In this work we consider the densest subgraph problem in hypergraphs, which generalizes the problem to a wider class of networks in which edges may have different cardinalities and contain more than 2 nodes. We present two exact algorithms and a near-linear time r-approximation algorithm for the problem, where r is the maximum cardinality of an edge in the hypergraph. We also consider the dynamic version of the problem, in which an adversary can insert or delete an edge from the hypergraph in each round and the goal is to efficiently maintain an approximation of the densest subgraph. We present two dynamic approximation algorithms with amortized poly-logarithmic update time, for any ε > 0. For the insertion-only case, the approximation ratio we maintain is r(1+ε), while for the fully dynamic case the ratio is r²(1+ε). Extensive experiments on large real datasets validate the effectiveness and efficiency of our algorithms.
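For intuition, the sketch below shows a generic greedy peeling baseline for the densest subhypergraph problem: repeatedly remove a minimum-degree node and keep the best intermediate density, where density is taken as the number of hyperedges fully contained in the node set divided by the number of nodes. This is an illustration of the problem setting, not the paper's near-linear algorithm; the naive implementation here is quadratic.

```python
# Generic greedy peeling baseline for densest subhypergraph (an illustration,
# not the near-linear algorithm from the paper). Density = |induced edges| / |nodes|.
def densest_subhypergraph_peel(nodes, edges):
    nodes = set(nodes)
    edges = [frozenset(e) for e in edges]
    best_density, best_nodes = 0.0, set(nodes)

    while nodes:
        live = [e for e in edges if e <= nodes]            # edges fully inside current set
        density = len(live) / len(nodes)
        if density > best_density:
            best_density, best_nodes = density, set(nodes)
        degree = {v: 0 for v in nodes}
        for e in live:
            for v in e:
                degree[v] += 1
        nodes.remove(min(nodes, key=degree.__getitem__))   # peel a minimum-degree node

    return best_nodes, best_density


# Toy hypergraph: authors connected by joint publications.
nodes = ["a", "b", "c", "d", "e"]
edges = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"d", "e"}]
print(densest_subhypergraph_peel(nodes, edges))
```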
Outsourcing data storage and IT workloads to a third-party cloud provider introduces security risks and time performance degradation. Moreover, controlling access to this data becomes very difficult when the volume of the data and the number of users are very high. Various access control techniques have been proposed to address this issue. However, those techniques rely on complex schemes that are costly to apply in real scenarios, and they have limited flexibility and scalability for large volumes of data and users. In this paper we propose ESSAC, an enhanced version of the SSAC scheme. ESSAC introduces a fine-grained access control scheme based on a classified Attribute Based Encryption, Role Based Encryption and Single Key Encryption methodology, which achieves high security without degrading performance. We validate our scheme using a simulation on top of Amazon S3 and compare it to current schemes.
Box queries on a dataset in a multidimensional data space are a type of query which specifies a set of allowed values for each dimension. Indexing a dataset in a multidimensional Non-ordered Discrete Data Space (NDDS) for supporting efficient box queries is becoming increasingly important in many application domains such as genome sequence analysis. The BoND-tree was recently introduced as an index structure specifically designed for box queries in an NDDS. Earlier work focused on developing strategies for building an effective BoND-tree to achieve high query performance. Developing efficient and effective techniques for deleting indexed vectors from the BoND-tree remains an open issue. In this paper, we present three deletion algorithms based on different underflow handling strategies in an NDDS. Our study shows that incorporating a new BoND-tree inspired heuristic can provide improved performance compared to the traditional underflow handling heuristics in NDDSs.
Critical business applications in domains ranging from technical support to healthcare increasingly rely on large-scale, automatically constructed knowledge graphs. These applications use the results of complex queries over knowledge graphs to help users take crucial decisions, such as which drug to administer or whether certain actions are compliant with all the regulatory requirements. However, these knowledge graphs constantly evolve, and newer versions may adversely impact the results of queries that previously taken business decisions were based on. We propose a framework based on provenance polynomials to track the impact of knowledge graph changes on arbitrary SPARQL query results. Focusing on the deletion of facts, we show how to efficiently determine the queries impacted by the change, develop ways to incrementally maintain these polynomials, and present an efficient implementation on top of RDF graph databases. Our experimental evaluation over large-scale RDF/SPARQL benchmarks shows the effectiveness of our proposal.
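A minimal sketch of how provenance annotations can flag query results affected by fact deletions, under simplifying assumptions: each result carries a sum-of-products polynomial over fact identifiers (products for joins, sums for alternative derivations), and a result is impacted only when every one of its derivations uses a deleted fact. This illustrates the general semiring idea rather than the paper's incremental maintenance or RDF-store implementation; all names and data below are made up.

```python
# Provenance as a sum of products of fact identifiers (Boolean specialization of
# provenance polynomials). Illustrative only; names and data are assumptions.
def impacted(results, deleted_facts):
    """Return the query results whose every derivation uses a deleted fact."""
    deleted = set(deleted_facts)
    hit = []
    for result, monomials in results.items():
        # Each monomial is the set of facts one derivation of `result` relies on.
        if all(m & deleted for m in monomials):
            hit.append(result)
    return hit


# Two derivations for ?x=alice (facts f1*f2 or f3), one for ?x=bob (f2*f4).
results = {
    "alice": [{"f1", "f2"}, {"f3"}],
    "bob": [{"f2", "f4"}],
}
print(impacted(results, {"f2"}))          # ['bob']  -- alice is still derivable via f3
print(impacted(results, {"f2", "f3"}))    # ['alice', 'bob']
```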
Inclusion dependencies form one of the most fundamental classes of integrity constraints. Their importance in classical data management is reinforced by modern applications such as data profiling, data cleaning, entity resolution, and schema matching. Their discovery in an unknown dataset is at the core of any data analysis effort. Therefore, several research approaches have focused on their efficient discovery in a given, static dataset. However, none of these approaches are appropriate for applications on dynamic datasets, such as transactional datasets, scientific applications, and social networks. In these cases, discovery techniques should be able to efficiently update the inclusion dependencies after an update to the dataset, without reprocessing the entire dataset. We present the first approach for incrementally updating the unary inclusion dependencies. In particular, our approach is based on the concept of attribute clustering, from which the unary inclusion dependencies are efficiently derivable. We incrementally update the clusters after each update of the dataset. Updating the clusters does not require accessing the dataset, thanks to special data structures designed to efficiently support the updating process. We perform an exhaustive analysis of our approach by applying it to large datasets with several hundred attributes and more than 116,200,000 tuples. The results show that the incremental discovery significantly reduces the runtime needed by the static discovery. This reduction in the runtime is up to 99.9996% for both inserts and deletes.
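A simplified illustration, not the paper's incremental data structures, of how unary inclusion dependencies can be derived from per-value attribute sets: every value is mapped to the set of attributes containing it, and A ⊆ B holds exactly when B appears in every such set that contains A. The column names and data are assumptions.

```python
# Deriving unary INDs from an inverted index of value -> attributes that contain it.
# A simplified, static illustration; the paper maintains such clusters incrementally.
from collections import defaultdict


def unary_inds(columns):
    """columns: dict attribute -> iterable of values. Returns pairs (A, B) with A ⊆ B."""
    value_to_attrs = defaultdict(set)
    for attr, values in columns.items():
        for v in values:
            value_to_attrs[v].add(attr)

    # Start from all candidate pairs and intersect away violations.
    candidates = {a: set(columns) - {a} for a in columns}
    for attrs in value_to_attrs.values():
        for a in attrs:
            candidates[a] &= attrs                 # B must contain every value of A
    return [(a, b) for a, bs in candidates.items() for b in sorted(bs)]


columns = {
    "order.customer_id": [1, 2, 3],
    "customer.id": [1, 2, 3, 4],
    "archive.id": [2, 3],
}
print(unary_inds(columns))
# e.g. [('order.customer_id', 'customer.id'), ('archive.id', 'customer.id'), ...]
```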
We propose a formalism to model database-driven systems, called database manipulating systems (DMS). The actions of a DMS modify the current instance of a relational database by adding new elements into the database, deleting tuples from the relations, and adding tuples to the relations. The elements modified by an action are chosen by (full) first-order queries. DMS is a highly expressive model and can be thought of as a succinct representation of an infinite-state relational transition system, in line with similar models proposed in the literature. We propose a monadic second-order logic (MSO-FO) to reason about sequences of database instances appearing along a run. Unsurprisingly, the linear-time model checking problem of DMS against MSO-FO is undecidable. Towards decidability, we propose under-approximate model checking of DMS, where the under-approximation parameter is the "bound on recency". In a k-recency-bounded run, only the most recent k elements in the current active domain may be modified by an action. More runs can be verified by increasing the bound on recency. Our main result shows that recency-bounded model checking of DMS against MSO-FO is decidable, by a reduction to the satisfiability problem of MSO over nested words.
Software effort estimation (SEE) is a crucial step in software development. Missing effort data usually occur in real-world data collection. Focusing on the missing data problem, existing SEE methods employ the deletion, ignoring, or imputation strategy to address it, where the imputation strategy was found to be more helpful for improving the estimation performance. Current imputation methods in SEE use classical imputation techniques for missing data, yet these techniques have their respective disadvantages and might not be appropriate for effort data. In this paper, we aim to provide an effective solution for the missing effort data problem. Incomplete data fall into two cases: missing drive factors and missing effort labels. We introduce a low-rank recovery technique to address missing drive factors, and we employ a semi-supervised regression technique to impute missing effort labels. We then propose a novel effort data imputation approach, named low-rank recovery and semi-supervised regression imputation (LRSRI). Experiments on 7 widely used software effort datasets indicate that (1) the proposed approach obtains better effort data imputation effects than other methods, and (2) the data imputed by our approach work well with multiple estimators.
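As a point of reference for the drive-factor case, the sketch below shows one standard way to realize low-rank recovery of missing entries, iterative soft-thresholded SVD (soft-impute); it is not necessarily the formulation used in LRSRI, and the threshold, iteration count, and synthetic data are assumptions.

```python
# Low-rank recovery of missing drive-factor entries via iterative soft-thresholded SVD
# (a generic matrix-completion baseline, not necessarily the LRSRI formulation).
import numpy as np


def soft_impute(X, observed_mask, tau=1.0, n_iters=200):
    """Fill missing entries of X (where observed_mask is False) with a low-rank estimate."""
    Z = np.where(observed_mask, X, 0.0)
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s = np.maximum(s - tau, 0.0)               # soft-threshold the singular values
        low_rank = (U * s) @ Vt
        Z = np.where(observed_mask, X, low_rank)   # keep observed entries, impute the rest
    return Z


rng = np.random.default_rng(0)
true = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 8))   # rank-3 "drive factor" matrix
mask = rng.random(true.shape) > 0.3                         # ~30% of entries missing
imputed = soft_impute(true, mask)
print("imputation RMSE on missing entries:",
      np.sqrt(np.mean((imputed - true)[~mask] ** 2)))
```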
The prodigious amount of user-generated content continues to grow at an enormous rate. While it greatly facilitates the flow of information and ideas among people and communities, it may pose a great threat to our individual privacy. In this paper, we demonstrate that the private traits of individuals can be inferred from user-generated content by using text classification techniques. Specifically, we study three private attributes of Twitter users: religion, political leaning, and marital status. The ground truth labels of the private traits can be readily collected from the Twitter bio field. Based on the tweets posted by the users and their corresponding bios, we show that text classification identifies these personal attributes with high accuracy, which poses a great privacy risk to user-generated content. We further propose a constrained utility maximization framework for preserving user privacy. The goal is to maximize the utility of the data when modifying the user-generated content, while degrading the prediction performance of the adversary. The KL divergence between the prior knowledge about the private attribute and the posterior probability after seeing the user-generated data is minimized. Based on this framework, we investigate several specific data sanitization operations for privacy preservation: adding, deleting, or replacing words in the tweets. We derive the exact transformation of the data under each operation. The experiments demonstrate the effectiveness of the proposed framework.
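A toy illustration of the delete-words operation: greedily remove the words that most reduce an adversary classifier's confidence in the private attribute, subject to a small budget. The linear keyword "adversary" and the greedy scoring below are stand-ins for illustration only, not the paper's constrained utility-maximization solution.

```python
# Toy "delete" sanitization: drop the words that contribute most to an adversary's
# confidence in a private attribute. The linear keyword model below is a stand-in
# for a real classifier, not the paper's constrained utility-maximization solver.
import math

ADVERSARY_WEIGHTS = {"church": 2.0, "prayer": 1.5, "sermon": 1.2, "coffee": 0.1}


def adversary_confidence(words):
    score = sum(ADVERSARY_WEIGHTS.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))          # probability of the private trait


def sanitize_by_deletion(tweet, budget=2):
    words = tweet.split()
    for _ in range(budget):
        # Delete the single word whose removal lowers the adversary's confidence most.
        best = min(range(len(words)),
                   key=lambda i: adversary_confidence(words[:i] + words[i + 1:]))
        words.pop(best)
    return " ".join(words)


tweet = "great sermon at church today then coffee with friends"
print(adversary_confidence(tweet.split()))          # high confidence before sanitization
print(sanitize_by_deletion(tweet))                  # "church" and "sermon" removed
print(adversary_confidence(sanitize_by_deletion(tweet).split()))
```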
We present history-independent alternatives to a B-tree, the primary indexing data structure used in databases. A data structure is history independent (HI) if it is impossible to deduce any information by examining the bit representation of the data structure that is not already available through the API. We show how to build a history-independent cache-oblivious B-tree and a history-independent external-memory skip list. One of the main contributions is a data structure we build along the way: a history-independent packed-memory array (PMA). The PMA supports efficient range queries, one of the most important operations for answering database queries. Our HI PMA matches the asymptotic bounds of prior non-HI packed-memory arrays and sparse tables. Specifically, a PMA maintains a dynamic set of elements in sorted order in a linear-sized array. Inserts and deletes take an amortized O(log^2 N) element moves with high probability. Simple experiments with our implementation of HI PMAs corroborate our theoretical analysis. Comparisons to regular PMAs give preliminary indications that the practical cost of adding history-independence is not too large. Our HI cache-oblivious B-tree bounds match those of prior non-HI cache-oblivious B-trees. Searches take O(log_B N) I/Os; inserts and deletes take O((log^2 N)/B + log_B N) amortized I/Os with high probability; and range queries returning k elements take O(log_B N + k/B) I/Os. Our HI external-memory skip list achieves optimal bounds with high probability, analogous to in-memory skip lists: O(log_B N) I/Os for point queries and amortized O(log_B N) I/Os for inserts/deletes. Range queries returning k elements run in O(log_B N + k/B) I/Os. In contrast, the best possible high-probability bound for inserting into the folklore B-skip list, which promotes elements with probability 1/B, is just Theta(log N) I/Os. This is no better than the bounds one gets from running an in-memory skip list in external memory.
We investigate minimization of tree pattern queries that use the child relation, descendant relation, node labels, and wildcards. We prove that minimization for such tree patterns is Sigma_2^P-complete and thus solve a problem first attacked by Flesca, Furfaro, and Masciari in 2003. We first provide an example that shows that tree patterns cannot be minimized by deleting nodes. This example shows that the M-NR conjecture, which states that minimality of tree patterns is equivalent to their nonredundancy, is false. We then show how the example can be turned into a gadget that allows us to prove Sigma_2^P-completeness.
In this paper we provide a framework to analyze the effect of uniform sampling on graph optimization problems. Interestingly, we apply this framework to a general class of graph optimization problems that we call heavy subgraph problems, and show that uniform sampling preserves a (1-ε)-approximate solution to these problems. This class contains many interesting problems such as densest subgraph, directed densest subgraph, densest bipartite subgraph, d-max cut, and d-sum-max clustering. As an immediate impact of this result, one can use uniform sampling to solve these problems in streaming, turnstile, or Map-Reduce settings. Indeed, by characterizing heavy subgraph problems, our results address Open Problem 13 from the IITK Workshop on Algorithms for Data Streams in 2006 regarding the effects of subsampling, in the context of graph streams. Recently, Bhattacharya et al. (STOC 2015) provided the first one-pass algorithm for the densest subgraph problem in the streaming model with additions and deletions to its edges, i.e., for dynamic graph streams. They present a (0.5-ε)-approximation algorithm using Õ(n) space, where factors of ε and log(n) are suppressed in the Õ notation. In this paper we improve on the (0.5-ε)-approximation algorithm of Bhattacharya et al. by providing a (1-ε)-approximation algorithm using Õ(n) space.
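A small numerical illustration of the uniform-sampling idea, not the paper's streaming algorithm: sample each edge independently with probability p, and the density of any fixed subgraph is approximately preserved after rescaling by 1/p. The graph model, p, and the chosen node set below are assumptions.

```python
# Uniform edge sampling approximately preserves subgraph density after rescaling by 1/p.
# A numerical illustration of the sampling idea, not the paper's streaming algorithm.
import random

random.seed(1)
n, p = 300, 0.2
edges = [(u, v) for u in range(n) for v in range(u + 1, n) if random.random() < 0.1]
sample = [e for e in edges if random.random() < p]         # keep each edge with prob. p


def density(edge_list, node_set):
    inside = sum(1 for u, v in edge_list if u in node_set and v in node_set)
    return inside / len(node_set)


community = set(range(100))                                 # some candidate subgraph
print("true density:    ", density(edges, community))
print("sampled estimate:", density(sample, community) / p)  # close to the true density
```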
Unlike most social media, where automatic archiving of data is the default, Snapchat defaults to ephemerality: deleting content shortly after it is viewed by a receiver. Interviews with 25 Snapchat users show that ephemerality plays a key role in shaping their practices. Along with friend-adding features that facilitate a network of mostly close relations, default deletion affords everyday, mundane talk and reduces self-consciousness while encouraging playful interaction. Further, although receivers can save content through screenshots, senders are notified; this selective saving with notification supports complex information norms that preserve the feel of ephemeral communication while supporting the capture of meaningful content. This dance of giving and taking, sharing and showing, and agency for both senders and receivers provides the basis for a rich design space of mechanisms, levels, and domains for ephemerality.
Conventional overwriting-based and encryption-based secure deletion schemes can only sanitize the data themselves. However, the past existence of the deleted data may leave artifacts in the layout at all layers of a computing system. These structural artifacts may be utilized by the adversary to infer sensitive information about the deleted data or even to fully recover them, and conventional secure deletion solutions unfortunately cannot sanitize them. In this work, we introduce truly secure deletion, a novel security notion that is much stronger than conventional secure deletion. Truly secure deletion requires sanitizing both the obsolete data and the corresponding structural artifacts, so that the resulting storage layout after a delete operation is indistinguishable from one in which the deleted data never appeared. We propose TedFlash, a Truly secure deletion scheme for Flash-based block devices. TedFlash can successfully sanitize both the data and the structural artifacts, while satisfying the design constraints imposed by flash memory. Security analysis and experimental evaluation show that TedFlash achieves the truly secure deletion guarantee with a small additional overhead compared to conventional secure deletion solutions.
In cloud storage, users lose direct control over their data. How to surely delete data in the cloud thus becomes a crucial problem for a secure cloud storage system. The existing approach to this problem is to encrypt the data before outsourcing and to destroy the encryption key when deleting. However, this solution may cause heavy computation overhead on the user side, and the encrypted data remain intact in the cloud after the deletion operation. To solve this challenging problem, we propose a novel method to surely delete data in cloud storage by overwriting. Different from existing works, our scheme is efficient on the user side and is able to wipe out the deleted data from the drives of the cloud servers.
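For context, a minimal local illustration of deletion by overwriting, the general technique the paper adapts to the server side; this sketch says nothing about the paper's cloud protocol, and its effectiveness on journaling file systems or flash translation layers is not guaranteed.

```python
# Local overwrite-before-delete, illustrating the general technique only; the paper's
# contribution is performing the overwrite efficiently on the cloud servers.
import os


def overwrite_and_delete(path, passes=1):
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))              # replace contents with random bytes
            f.flush()
            os.fsync(f.fileno())                   # push the overwrite to the device
    os.remove(path)


with open("secret.txt", "wb") as f:
    f.write(b"credit card: 1234 5678 9012 3456")
overwrite_and_delete("secret.txt")
print(os.path.exists("secret.txt"))                # False
```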
Inadvertent exposure of sensitive data is a major concern for potential cloud customers. Much focus has been on other data leakage vectors, such as side channel attacks, while issues of data disposal and assured deletion have not received enough attention to date. However, data that is not properly destroyed may lead to unintended disclosures, in turn, resulting in heavy financial penalties and reputational damage. In non-cloud contexts, issues of incomplete deletion are well understood. To the best of our knowledge, to date, there has been no systematic analysis of assured deletion challenges in public clouds. In this paper, we aim to address this gap by analysing assured deletion requirements for the cloud, identifying cloud features that pose a threat to assured deletion, and describing various assured deletion challenges. Based on this discussion, we identify future challenges for research in this area and propose an initial assured deletion architecture for cloud settings. Altogether, our work offers a systematization of requirements and challenges of assured deletion in the cloud, and a well-founded reference point for future research in developing new solutions to assured deletion.