Biblio

Filters: Author is Glavic, Boris
2017-05-16
Arab, Bahareh Sadat, Gawlick, Dieter, Krishnaswamy, Vasudha, Radhakrishnan, Venkatesh, Glavic, Boris.  2016.  Reenactment for Read-Committed Snapshot Isolation. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. :841–850.

Provenance for transactional updates is critical for many applications such as auditing and debugging of transactions. Recently, we have introduced MV-semirings, an extension of the semiring provenance model that supports updates and transactions. Furthermore, we have proposed reenactment, a declarative form of replay with provenance capture, as an efficient and non-invasive method for computing this type of provenance. However, this approach is limited to the snapshot isolation (SI) concurrency control protocol, while many real-world applications apply the read-committed version of snapshot isolation (RC-SI) to improve performance at the cost of consistency. We present nontrivial extensions of the model and reenactment approach to compute provenance of RC-SI transactions efficiently. In addition, we develop techniques for applying reenactment across multiple RC-SI transactions. Our experiments demonstrate that our implementation in the GProM system supports efficient reconstruction and querying of provenance.

2017-03-07
Santoro, Donatello, Arocena, Patricia C., Glavic, Boris, Mecca, Giansalvatore, Miller, Renée J., Papotti, Paolo.  2016.  BART in Action: Error Generation and Empirical Evaluations of Data-Cleaning Systems. Proceedings of the 2016 International Conference on Management of Data. :2161–2164.

Repairing erroneous or conflicting data that violate a set of constraints is an important problem in data management. Many automatic or semi-automatic data-repairing algorithms have been proposed in the last few years, each with its own strengths and weaknesses. Bart is an open-source error-generation system conceived to support thorough experimental evaluations of these data-repairing systems. The demo is centered around three main lessons. To start, we discuss how generating errors in data is a complex problem with several facets. We introduce the important notions of detectability and repairability of an error, which stand at the core of Bart. Then, we show how, by changing the features of errors, it is possible to significantly influence the performance of the tools. Finally, we put five data-repairing algorithms to work on dirty data of various kinds generated using Bart, and discuss their performance.