Principled Workflow-centric Tracing of Distributed Systems

Title: Principled Workflow-centric Tracing of Distributed Systems
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Sambasivan, Raja R.; Shafer, Ilari; Mace, Jonathan; Sigelman, Benjamin H.; Fonseca, Rodrigo; Ganger, Gregory R.
Conference Name: Proceedings of the Seventh ACM Symposium on Cloud Computing
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4525-5
Keywords: Human Behavior, Metrics, multiple fault diagnosis, pubcrawl, Resiliency
Abstract

Workflow-centric tracing captures the workflow of causally-related events (e.g., work done to process a request) within and among the components of a distributed system. As distributed systems grow in scale and complexity, such tracing is becoming a critical tool for understanding distributed system behavior. Yet, there is a fundamental lack of clarity about how such infrastructures should be designed to provide maximum benefit for important management tasks, such as resource accounting and diagnosis. Without research into this important issue, there is a danger that workflow-centric tracing will not reach its full potential. To help, this paper distills the design space of workflow-centric tracing and describes key design choices that can help or hinder a tracing infrastructure's utility for important tasks. Our design space and the design choices we suggest are based on our experiences developing several previous workflow-centric tracing infrastructures.

URL: http://doi.acm.org/10.1145/2987550.2987568
DOI: 10.1145/2987550.2987568
Citation Key: sambasivan_principled_2016