Biblio
Efficient management and control of modern and next-gen networks is of paramount importance as networks have to maintain highly reliable service quality whilst supporting rapid growth in traffic demand and new application services. Rapid mitigation of network service degradations is a key factor in delivering high service quality. Automation is vital to achieving rapid mitigation of issues, particularly at the network edge where the scale and diversity is the greatest. This automation involves the rapid detection, localization and (where possible) repair of service-impacting faults and performance impairments. However, the most significant challenge here is knowing what events to detect, how to correlate events to localize an issue and what mitigation actions should be performed in response to the identified issues. These are defined as policies to systems such as ECOMP. In this paper, we present AESOP, a data-driven intelligent system to facilitate automatic learning of policies and rules for triggering remedial actions in networks. AESOP combines best operational practices (domain knowledge) with a variety of measurement data to learn and validate operational policies to mitigate service issues in networks. AESOP's design addresses the following key challenges: (i) learning from high-dimensional noisy data, (ii) capturing multiple fault models, (iii) modeling the high service-cost of false positives, and (iv) accounting for the evolving network infrastructure. We present the design of our system and show results from our ongoing experiments to show the effectiveness of our policy leaning framework.