Debugging Distributed Systems with Why-Across-Time Provenance

Title: Debugging Distributed Systems with Why-Across-Time Provenance
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Whittaker, Michael; Teodoropol, Cristina; Alvaro, Peter; Hellerstein, Joseph M.
Conference Name: Proceedings of the ACM Symposium on Cloud Computing
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-6011-1
Keywords: composability, data provenance, Distributed Systems, Human Behavior, Metrics, Provenance, pubcrawl, Resiliency, state machines
Abstract: Systematically reasoning about the fine-grained causes of events in a real-world distributed system is challenging. Causality, from the distributed systems literature, can be used to compute the causal history of an arbitrary event in a distributed system, but the event's causal history is an over-approximation of the true causes. Data provenance, from the database literature, precisely describes why a particular tuple appears in the output of a relational query, but data provenance is limited to the domain of static relational databases. In this paper, we present wat-provenance: a novel form of provenance that provides the benefits of causality and data provenance. Given an arbitrary state machine, wat-provenance describes why the state machine produces a particular output when given a particular input. This enables system developers to reason about the causes of events in real-world distributed systems. We observe that automatically extracting the wat-provenance of a state machine is often infeasible. Fortunately, many distributed systems components have simple interfaces from which a developer can directly specify wat-provenance using a technique we call wat-provenance specifications. Leveraging the theoretical foundations of wat-provenance, we implement a prototype distributed debugging framework called Watermelon.
URL: http://doi.acm.org/10.1145/3267809.3267839
DOI: 10.1145/3267809.3267839
Citation Key: whittaker_debugging_2018
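
Illustrative sketch (not from the paper): the abstract says a developer can directly specify, for a component with a simple interface, why it produces a given output for a given input. The Python below is a minimal sketch of that idea for a toy key-value store. All names here (`Set`, `KvStore`, `run`, `get_provenance_spec`) are hypothetical. The rule that the cause of a `get` is the most recent `set` on the same key is an assumption made for this example. It is not the paper's formal definition and not the Watermelon API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Set:
    """A write request in the trace (hypothetical request type)."""
    key: str
    value: str


class KvStore:
    """Toy deterministic state machine: its state is a dict of keys to values."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}

    def apply(self, req: Set) -> None:
        self.state[req.key] = req.value

    def get(self, key: str) -> Optional[str]:
        return self.state.get(key)


def run(trace: List[Set], key: str) -> Optional[str]:
    """Replay a trace from the empty state, then answer get(key)."""
    store = KvStore()
    for req in trace:
        store.apply(req)
    return store.get(key)


def get_provenance_spec(trace: List[Set], key: str) -> List[Tuple[int, Set]]:
    """Hand-written specification (an assumption for this sketch):
    the cause of get(key) is the most recent Set on that key, if any.
    Returns (trace index, request) pairs."""
    for idx in range(len(trace) - 1, -1, -1):
        if trace[idx].key == key:
            return [(idx, trace[idx])]
    return []


trace = [Set("x", "1"), Set("y", "2"), Set("x", "3")]
cause = get_provenance_spec(trace, "x")
# Replaying only the specified cause reproduces the original answer.
same_answer = run([req for _, req in cause], "x") == run(trace, "x")
```

In this sketch the specification singles out one request from the trace, whereas the full causal history of the `get` would include every earlier request, which is the over-approximation the abstract describes.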