Improving Fault Isolation With Active Assurance and Passive Assurance
When something goes wrong in the network, we often call it an “event” or a “fault.” That sounds so innocuous. After all, “events” happen, right? But for some reason, customers don’t see it that way. One little interruption of service or poor experience can drive them away. The challenge will only get worse as 5G brings exploding demands, higher customer expectations, and increased complexity. So how can you assure that you will catch performance problems and resolve underlying faults before they cascade into crises?
Readers of our 5G Transformation Needs Automated Assurance blog series have joined us on a journey to understand the challenges presented by 5G and effective new approaches including automated assurance, quality monitoring for 5G network slicing, and automated change management. Now let’s look at methods that can help operators detect and isolate faults in the automated network.
Two modes of Automated Assurance
We’ve discussed automated assurance in our previous blogs, but today we’re going to drill down to another level of detail. Automated assurance consists of two main categories of virtualized assurance functions: Active Assurance and Passive Assurance. Both categories are designed to be integrated with the network through orchestration so that fault detection and resolution can be automated. Let’s look at the strengths and weaknesses of each approach and why having both is going to be critical for 5G.
Passive Assurance is the traditional method for ascertaining the health of the network. Passive Assurance collects telemetry from virtual and physical network functions, DPI (Deep Packet Inspection) probes and protocol probes that passively monitor the signaling and user traffic on the network. Data could also be gathered from other sources of subscriber data such as billing records that inform the operator about the subscribers and how they’re using the network.
Strengths:
Passive Assurance gathers data from the operator’s real customers as they use the service. It’s particularly good at determining how many customers are affected by a problem. Because it works with vast volumes of data, it’s best at detecting issues that can be inferred by correlating statistical data from a variety of sources such as signaling traffic, user plane packet headers and network element counters. Passive Assurance works best once a new function, slice or service is up and running, and as long as traffic stays above the minimum level needed to produce meaningful statistics.
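To make the "how many customers are affected" idea concrete, here is a minimal sketch in Python. The record layout (subscriber ID, cell ID, failure flag) and the sample values are assumptions for illustration only; real probe and billing feeds will look different.

```python
from collections import defaultdict

# Hypothetical passive-probe session records: (subscriber_id, cell_id, failed).
# The field names and values are illustrative assumptions, not a real probe format.
records = [
    ("sub-1", "cell-A", True),
    ("sub-2", "cell-A", True),
    ("sub-2", "cell-A", False),
    ("sub-3", "cell-B", False),
    ("sub-4", "cell-A", False),
    ("sub-5", "cell-B", True),
]


def affected_subscribers(session_records):
    """Return {cell_id: set of subscribers with at least one failed session}."""
    affected = defaultdict(set)
    for subscriber_id, cell_id, failed in session_records:
        if failed:
            affected[cell_id].add(subscriber_id)
    return dict(affected)


impact = affected_subscribers(records)
for cell_id, subscribers in sorted(impact.items()):
    print(cell_id, len(subscribers))
```

Counting distinct subscribers rather than failed sessions is a deliberate choice here: one subscriber retrying ten times should not look like ten affected customers.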
Limitations:
Passive Assurance produces valuable insights for detecting major issues. But real usage of the network is highly variable. Small changes in the performance of the network or services may be due to varying traffic levels, changes in the geographic traffic distribution or differences in the mix of applications being used. But these minor performance changes could also indicate a network fault that isn’t apparent from the correlated, high-level data that Passive Assurance relies on—until it escalates into a severe issue.
Because passive probes are expensive to deploy and the data they collect is costly to analyze, operators must be judicious about where they place their monitoring, typically inserting probes into only key parts of the network. It’s not cost effective to monitor all traffic, in all parts of the network, at all times. As a result, operators may not have coverage at the spot where a problem is occurring at a particular time.
Relying on signaling and statistics makes it difficult, if not impossible, to get an accurate sense of the real user experience. For instance, if you’re collecting data at the core of the network, you may not get an accurate understanding of the customer’s experience at a distant end point.
Most critically, Passive Assurance relies on data generated by use of the network and services. As its name implies, it’s passive—it waits for customers to use the network—and is therefore not very helpful at turn-up, before users create traffic, or for critical, always-on but mostly inactive services such as IoT alarms and public safety networks.
Active Assurance presents a perfect complement to Passive Assurance. Active Assurance, which is also referred to as Active Test, consists of an Active Assurance Controller and Virtual Test Agents (VTAs). Directed by the Controller, VTAs emulate portions of the network, end-user devices and the use of specific applications to create small amounts of synthetic traffic that are injected into the network. The synthetic traffic allows the Active Assurance system to evaluate network performance even when there’s no real user traffic in the network. This makes it ideal for when you first turn up a function, want to ensure a public safety network is working properly, or need to isolate a complex problem.
With Active Assurance, you insert a known quantity—synthetic traffic—into the network. When you know what you’re putting in, you can easily measure what’s coming out at the endpoint. Remember what we said about the variability of user traffic causing uncertainty about whether small performance fluctuations were normal or the first sign of a real problem? By inserting known traffic into the network, Active Assurance enables you to benchmark and track fine variations in performance over time, allowing you to differentiate normal variations from significant issues.
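Here is one way the benchmarking idea could be sketched in Python. It assumes the synthetic probes report round-trip latency in milliseconds and uses a simple three-sigma rule against a baseline; both the metric and the threshold are illustrative assumptions, not a description of any particular product.

```python
from statistics import mean, stdev


def is_significant(baseline_ms, sample_ms, threshold_sigmas=3.0):
    """Flag a sample that deviates from the baseline mean by more than
    threshold_sigmas sample standard deviations."""
    mu = mean(baseline_ms)
    sigma = stdev(baseline_ms)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return sample_ms != mu
    return abs(sample_ms - mu) > threshold_sigmas * sigma


# Hypothetical latencies measured with identical synthetic traffic.
baseline = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]

print(is_significant(baseline, 20.4))  # small drift inside the normal band
print(is_significant(baseline, 23.5))  # well outside the normal band
```

Because the injected traffic is the same every time, the baseline stays tight, so a shift of a few milliseconds stands out in a way it would not against variable user traffic.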
What’s more, you can instantiate a VTA in a specific part of the network, run your tests and then de-instantiate the agent, easily and cost-effectively. VTAs can be inserted anywhere in the network, enabling you to perform similar tests at different locations. VTAs can also be run continuously to assure the availability and performance of critical links and services.
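The instantiate, test, de-instantiate cycle maps naturally onto a scoped resource. The sketch below simulates it in plain Python; the VirtualTestAgent class, the deployed_agent helper and the site names are all hypothetical stand-ins, since a real deployment would go through the operator's orchestrator and the Active Assurance Controller.

```python
from contextlib import contextmanager


class VirtualTestAgent:
    """Hypothetical in-process stand-in for a VTA (not a real product API)."""

    def __init__(self, location):
        self.location = location
        self.active = False

    def run_test(self, name):
        if not self.active:
            raise RuntimeError("agent is not instantiated")
        # A real agent would inject synthetic traffic here and measure the result.
        return {"location": self.location, "test": name, "status": "completed"}


@contextmanager
def deployed_agent(location):
    """Instantiate an agent, hand it to the caller, then always tear it down."""
    agent = VirtualTestAgent(location)
    agent.active = True  # instantiate
    try:
        yield agent
    finally:
        agent.active = False  # de-instantiate, even if a test raises


# Run the same test at two hypothetical locations.
results = []
for location in ["edge-site-1", "core-site-1"]:
    with deployed_agent(location) as agent:
        results.append(agent.run_test("latency"))

print(results)
```

Wrapping the teardown in a finally block means an agent is never left running in the network after a failed test, which is what keeps short-lived tests cheap.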
Because it uses known, synthetic traffic, Active Assurance not only enables operators to troubleshoot complex issues but also provides full visibility of user plane performance, and of the customer’s true end-to-end service experience, in a way not possible with Passive Assurance.
Passive Assurance is ideal to:
Monitor performance once the network is up and running
Track services and links that have consistent traffic flows
Detect major issues and determine how many users are impacted
Troubleshoot issues for high-priority parts of the network
Active Assurance complements Passive Assurance by enabling you to:
Evaluate performance at turn-up before customer usage starts
Continuously check critical services and links, such as an IoT alarm sensor, redundant network link or public safety network, regardless of the level of usage
Proactively identify minor issues before they become major
Get full visibility of the user’s end-to-end performance and service experience
Troubleshoot complex issues in any part of the network; because the agents are virtualized and inject only small amounts of traffic, it’s not a heavy lift
Performance challenges are growing. But so are our tools.
It’s true: network automation and 5G, along with snowballing demand from impatient consumers for data-intensive, bandwidth-scarfing applications, will challenge our industry as never before. But by thoughtfully pairing Passive Assurance and Active Assurance, operators can meet the expectations of their customers and fulfill their promises of a quality experience.