AI in network operations: How do you build trust in a self-healing network?

How to build trust in a self-healing network

One of the key challenges in the shift towards more autonomous networks is establishing trust in the AI models that support automation, especially among network engineers, who remain accountable for network performance even when they rely on automated and intelligent systems.

The goal of a self-healing, self-optimising network is to monitor the network in real time in order to identify and rectify potential faults before they occur. So what is actually happening in practice?

  1. Real-time network data is collected
  2. The data is fed into a machine learning (ML) powered engine that identifies potential issues and recommends a next best action
  3. The next best action is based on the policies and playbooks already set by the network operator
  4. The model then delivers an actionable insight, which ideally is implemented in an automated fashion

These four steps all occur in real time. Underpinning them are some less time-sensitive activities:

  5. Event-based data and a whole host of other data on network operations are fed into the centralised data lake
  6. There it is analysed retrospectively to develop better insights into how to improve the management of network functions, with the aim of updating policies to meet changing usage patterns and needs on the network
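To make the real-time part of this loop more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any operator's system: the metric names, the `PLAYBOOK` mapping, the `recommend` and `handle_sample` functions and the in-memory `data_lake` list are all hypothetical, and a simple z-score check stands in for the ML engine.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional

# Hypothetical operator playbook (step 3): maps a metric to its next best action.
PLAYBOOK = {
    "latency_ms": "reroute_traffic",
    "packet_loss_pct": "restart_interface",
}


@dataclass
class Insight:
    metric: str
    action: str
    explanation: str  # human-readable reason the sample was flagged


def recommend(metric: str, history: list[float], latest: float,
              threshold: float = 3.0) -> Optional[Insight]:
    """Steps 2-4: flag an unusual sample and look up the operator's action.

    A z-score test stands in for the ML engine; it is not a real model.
    `history` needs at least two samples for the standard deviation.
    """
    mu, sigma = mean(history), stdev(history)
    if sigma == 0 or metric not in PLAYBOOK:
        return None
    z = (latest - mu) / sigma
    if z <= threshold:
        return None
    explanation = (f"{metric}={latest} is {z:.1f} standard deviations "
                   f"above the recent mean of {mu:.1f}")
    return Insight(metric, PLAYBOOK[metric], explanation)


# Stand-in for the centralised data lake (step 5).
data_lake: list[dict] = []


def handle_sample(metric: str, history: list[float],
                  latest: float) -> Optional[Insight]:
    """Run one pass of the loop and log the event for retrospective analysis."""
    insight = recommend(metric, history, latest)
    data_lake.append({"metric": metric, "value": latest,
                      "action": insight.action if insight else None})
    return insight
```

Calling `handle_sample("latency_ms", [10, 11, 9, 10, 10, 11, 9, 10], 30)` returns an `Insight` recommending `reroute_traffic`, while a sample of 10.5 returns `None`. Both events land in `data_lake`, where step 6 could later use them to revise the playbook.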

To achieve self-healing network operations, operators need to build trust in ML and automation at several stages (indicated by green tick marks on the graphic):

  • Trust that the recommendations on real-time actions are in fact in line with what a domain expert would recommend. This means the algorithms need some level of explainability about how they reach a decision, particularly if the recommendations differ from current practices
  • That the next best action suggested will not have unintended consequences and is in line with the intentions of the network operations team
  • That the automation of the intended action will occur smoothly, as expected
  • That the analysis of the historical data in the centralised data lake is incorporating the right data to make a well-informed recommendation
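One way to picture the first two trust points is as a gate in front of the automation. The sketch below is a hypothetical illustration only: the `Proposal` fields, the `decide` function, the 0.9 confidence threshold and the action names are assumptions made for this example, not part of any specific product.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    action: str         # next best action suggested by the model
    confidence: float   # model's own confidence, from 0.0 to 1.0
    explanation: str    # shown to the engineer when a review is needed


def decide(proposal: Proposal, expert_action: str,
           approved_actions: set[str], min_confidence: float = 0.9) -> str:
    """Return 'auto_apply', 'review' or 'reject' for a proposed action.

    expert_action is what current practice (the playbook) would do here.
    """
    # Actions outside the operator's approved set are never executed.
    if proposal.action not in approved_actions:
        return "reject"
    # A recommendation that differs from current practice goes to a human,
    # together with its explanation.
    if proposal.action != expert_action:
        return "review"
    # Low-confidence recommendations are also escalated.
    if proposal.confidence < min_confidence:
        return "review"
    return "auto_apply"
```

In this sketch, only recommendations that match current practice, sit inside the approved set and carry high confidence are automated. Everything else is routed to an engineer, which is where the explanation becomes useful.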

A good way of enabling this is to include network engineers, who are responsible for delivering on SLAs, in the development of the models and systems powering closed-loop operations, so that they can trust that the AI systems will perform as expected.

See our research on AI in networks: