The Good, the Bad, and the Differences: Better Network Diagnostics with Differential Provenance

Title: The Good, the Bad, and the Differences: Better Network Diagnostics with Differential Provenance
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Chen, Ang; Wu, Yang; Haeberlen, Andreas; Zhou, Wenchao; Loo, Boon Thau
Conference Name: Proceedings of the 2016 ACM SIGCOMM Conference
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4193-6
Keywords: composability, Debugging, Human Behavior, Metrics, Network diagnostics, Provenance, pubcrawl, Resiliency
Abstract

In this paper, we propose a new approach to diagnosing problems in complex distributed systems. Our approach is based on the insight that many of the trickiest problems are anomalies. For instance, in a network, problems often affect only a small fraction of the traffic (e.g., perhaps a certain subnet), or they only manifest infrequently. Thus, it is quite common for the operator to have "examples" of both working and non-working traffic readily available - perhaps a packet that was misrouted, and a similar packet that was routed correctly. In this case, the cause of the problem is likely to be wherever the two packets were treated differently by the network. We present the design of a debugger that can leverage this information using a novel concept that we call differential provenance. Differential provenance tracks the causal connections between network states and state changes, just like classical provenance, but it can additionally perform root-cause analysis by reasoning about the differences between two provenance trees. We have built a diagnostic tool that is based on differential provenance, and we have used our tool to debug a number of complex, realistic problems in two scenarios: software-defined networks and MapReduce jobs. Our results show that differential provenance can be maintained at relatively low cost, and that it can deliver very precise diagnostic information; in many cases, it can even identify the precise root cause of the problem.
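The core intuition in the abstract, that the cause is likely to be wherever the two packets were treated differently, can be illustrated with a small sketch. This is not the paper's algorithm, only a simplified comparison of two toy provenance trees. The `ProvNode` type, the event labels, and the helper names are hypothetical and chosen for illustration. The sketch walks a "good" and a "bad" tree in parallel and reports the differing vertex that lies deepest in the causal chain, on the assumption that shallower differences are consequences of it.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ProvNode:
    """A vertex in a toy provenance tree: an event plus the events that caused it."""
    event: str
    causes: List["ProvNode"] = field(default_factory=list)


Diff = Tuple[int, str, str]  # (depth, event in good tree, event in bad tree)


def divergences(good: ProvNode, bad: ProvNode, depth: int = 0) -> List[Diff]:
    """Collect every position where the two trees record different events.

    Children are paired by position; unmatched extra children are ignored,
    which is a simplification made for this sketch.
    """
    found: List[Diff] = []
    if good.event != bad.event:
        found.append((depth, good.event, bad.event))
    for g, b in zip(good.causes, bad.causes):
        found.extend(divergences(g, b, depth + 1))
    return found


def deepest_divergence(good: ProvNode, bad: ProvNode) -> Optional[Diff]:
    """Return the difference farthest from the root, or None if the trees agree."""
    diffs = divergences(good, bad)
    return max(diffs, key=lambda d: d[0]) if diffs else None


# Hypothetical example: both packets pass switch S1 identically,
# but switch S2 applies a different rule to the misrouted packet.
good_tree = ProvNode("delivered to web server", [
    ProvNode("S2 applied rule: subnet A -> port 1", [
        ProvNode("S1 applied rule: default -> S2"),
    ]),
])
bad_tree = ProvNode("delivered to backup server", [
    ProvNode("S2 applied rule: any -> port 2", [
        ProvNode("S1 applied rule: default -> S2"),
    ]),
])

print(deepest_divergence(good_tree, bad_tree))
```

In this toy example the outcomes differ at the root, but the deepest difference is the rule applied at S2, which is where an operator would look first. Real provenance trees for two different packets will differ in many incidental ways, so a plain comparison like this should be read only as an illustration of the underlying idea.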

URL: http://doi.acm.org/10.1145/2934872.2934910
DOI: 10.1145/2934872.2934910
Citation Key: chen_good_2016