AESOP: Automatic Policy Learning for Predicting and Mitigating Network Service Impairments

Title: AESOP: Automatic Policy Learning for Predicting and Mitigating Network Service Impairments
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Deb, Supratim; Ge, Zihui; Isukapalli, Sastry; Puthenpura, Sarat; Venkataraman, Shobha; Yan, He; Yates, Jennifer
Conference Name: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Date Published: August 2017
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4887-4
Keywords: Automated Response Actions, composability, network management, policy learning, pubcrawl, Resiliency, supervised learning
Abstract

Efficient management and control of modern and next-gen networks is of paramount importance as networks have to maintain highly reliable service quality whilst supporting rapid growth in traffic demand and new application services. Rapid mitigation of network service degradations is a key factor in delivering high service quality. Automation is vital to achieving rapid mitigation of issues, particularly at the network edge, where scale and diversity are greatest. This automation involves the rapid detection, localization and (where possible) repair of service-impacting faults and performance impairments. However, the most significant challenge here is knowing which events to detect, how to correlate events to localize an issue, and which mitigation actions should be performed in response to the identified issues. These are defined as policies to systems such as ECOMP. In this paper, we present AESOP, a data-driven intelligent system to facilitate automatic learning of policies and rules for triggering remedial actions in networks. AESOP combines best operational practices (domain knowledge) with a variety of measurement data to learn and validate operational policies to mitigate service issues in networks. AESOP's design addresses the following key challenges: (i) learning from high-dimensional noisy data, (ii) capturing multiple fault models, (iii) modeling the high service-cost of false positives, and (iv) accounting for the evolving network infrastructure. We present the design of our system and report results from our ongoing experiments that demonstrate the effectiveness of our policy learning framework.

URL: http://doi.acm.org/10.1145/3097983.3098157
DOI: 10.1145/3097983.3098157
Citation Key: deb_aesop:_2017