Title | Debugging Distributed Systems with Why-Across-Time Provenance |
Publication Type | Conference Paper |
Year of Publication | 2018 |
Authors | Whittaker, Michael; Teodoropol, Cristina; Alvaro, Peter; Hellerstein, Joseph M. |
Conference Name | Proceedings of the ACM Symposium on Cloud Computing |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-6011-1 |
Keywords | composability, data provenance, Distributed Systems, Human Behavior, Metrics, Provenance, pubcrawl, Resiliency, state machines |
Abstract | Systematically reasoning about the fine-grained causes of events in a real-world distributed system is challenging. Causality, from the distributed systems literature, can be used to compute the causal history of an arbitrary event in a distributed system, but the event's causal history is an over-approximation of the true causes. Data provenance, from the database literature, precisely describes why a particular tuple appears in the output of a relational query, but data provenance is limited to the domain of static relational databases. In this paper, we present wat-provenance: a novel form of provenance that provides the benefits of causality and data provenance. Given an arbitrary state machine, wat-provenance describes why the state machine produces a particular output when given a particular input. This enables system developers to reason about the causes of events in real-world distributed systems. We observe that automatically extracting the wat-provenance of a state machine is often infeasible. Fortunately, many distributed systems components have simple interfaces from which a developer can directly specify wat-provenance using a technique we call wat-provenance specifications. Leveraging the theoretical foundations of wat-provenance, we implement a prototype distributed debugging framework called Watermelon. |
URL | http://doi.acm.org/10.1145/3267809.3267839 |
DOI | 10.1145/3267809.3267839 |
Citation Key | whittaker_debugging_2018 |