Information About Enhanced Application-Aware Routing
Without enhanced application-aware routing enabled, Cisco IOS XE Catalyst SD-WAN devices require several minutes to switch traffic from one network path to another to meet SLA requirements when loss, latency, and jitter exceed specific threshold values. Enabling enhanced application-aware routing speeds the detection of tunnel performance issues. This enables Cisco IOS XE Catalyst SD-WAN devices to redirect traffic away from tunnels that do not meet SLA requirements.
Overview of Enhanced Application-Aware Routing
BFD (Bidirectional Forwarding Detection) detects link failure conditions and gathers Performance Routing (PfR) data, including loss, latency, and jitter information, for Cisco Catalyst SD-WAN tunnels (both IPsec and GRE). Each BFD hello packet collects the following information:
Latency: Round-trip time (RTT) between a BFD echo request and its reply.
Jitter: The variation in the delay of packet arrival times in a network. It is a measure of the irregularity in the timing of data packets as they are transmitted and received.
Loss: Number of echo requests that fail to receive a reply.
By default, with a BFD hello timer of 1 second, one sample of PfR data is collected every second. This data is collected over the duration of the poll interval (default 10 minutes), and the average of each statistic is computed for that interval. To make dynamic path decisions against the thresholds specified in application-aware routing SLAs, multiple poll-interval averages are reviewed; the default multiplier is 6. A poll-interval average is the average of one statistic (loss, latency, or jitter) over the samples collected during a single poll interval.
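The following is a minimal Python sketch of how poll-interval averages and the multiplier could interact. It assumes that the SLA check compares the mean of the most recent poll-interval averages with a threshold; the exact aggregation the device uses is not stated here, and the function names and sample values are hypothetical.

```python
from statistics import mean


def poll_interval_average(samples):
    """Average one statistic (for example, latency in ms) over one poll interval."""
    return mean(samples)


def meets_sla(bucket_averages, threshold, multiplier=6):
    """Assumed check: mean of the last `multiplier` poll-interval averages vs. a threshold."""
    recent = bucket_averages[-multiplier:]
    return mean(recent) <= threshold


# Hypothetical latency averages (ms) for six consecutive poll intervals
# on a circuit that is slowly degrading.
latency_buckets = [40.0, 42.0, 45.0, 80.0, 120.0, 150.0]

print(meets_sla(latency_buckets, threshold=100.0))                # six buckets still pass
print(meets_sla(latency_buckets, threshold=100.0, multiplier=2))  # two newest buckets fail
```

With six buckets the older, healthy intervals mask the recent degradation, which illustrates why detection of slowly degrading circuits is slow with the default multiplier.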
Convergence time refers to the amount of time it takes for the network to recover and resume normal operations after a failure or disruption. With the default settings, the convergence time for detecting slowly degrading WAN circuits is between 10 minutes and 1 hour. Even with the lowest recommended poll-interval of 2 minutes and 6 intervals, the convergence time is between 2 minutes and 12 minutes. Setting a very low poll interval can cause PfR false positives and traffic instability, because too few samples are available for loss, latency, and jitter measurements.
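The convergence ranges quoted above follow from simple arithmetic: the best case is one poll interval, and the worst case is the poll interval multiplied by the number of intervals. A short Python sketch (the function name is illustrative only):

```python
def convergence_window_minutes(poll_interval_s, multiplier):
    """Return (best_case, worst_case) detection time in minutes."""
    best = poll_interval_s / 60
    worst = poll_interval_s * multiplier / 60
    return best, worst


print(convergence_window_minutes(600, 6))  # defaults: (10.0, 60.0)
print(convergence_window_minutes(120, 6))  # lowest recommended poll-interval: (2.0, 12.0)
```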
PfR Measurements
Metric | Source | Description |
---|---|---|
Loss | BFD | Measured as the loss of BFD packets sent at 1 pps, or at one packet every n_app_probe_class (n-apc) seconds. If the application probe class (APC) configuration is not set, BFD packets are sent at a rate of 1 packet per second (1 pps). With the APC configuration, the rate is reduced to 1 packet every N seconds. For more information, see Application Probe Class. |
Latency | BFD | RTT measured at 1 pps, or at one packet every n-apc seconds. Without the APC configuration, RTT is measured at a rate of 1 packet per second (1 pps). With the APC configuration, the rate is reduced to 1 packet every N seconds. |
Jitter | BFD | Variation in RTT |
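As an illustration of the three metrics in the table, the sketch below derives loss, latency, and jitter from a list of per-probe RTT samples. It is not the device's implementation: the jitter formula (mean absolute difference between consecutive RTTs) is an assumption chosen to express "variation in RTT", and the function name and sample values are hypothetical.

```python
from statistics import mean


def pfr_metrics(rtts_ms):
    """rtts_ms holds one entry per BFD echo request; None marks a request with no reply."""
    if not rtts_ms:
        raise ValueError("at least one sample is required")
    replies = [r for r in rtts_ms if r is not None]
    loss = len(rtts_ms) - len(replies)
    latency = mean(replies) if replies else None
    # Assumed jitter definition: mean absolute change between consecutive RTTs.
    deltas = [abs(b - a) for a, b in zip(replies, replies[1:])]
    jitter = mean(deltas) if deltas else 0.0
    return {
        "loss": loss,
        "loss_pct": 100.0 * loss / len(rtts_ms),
        "latency_ms": latency,
        "jitter_ms": jitter,
    }


# Five hypothetical probes, one of which received no reply.
print(pfr_metrics([20.0, 22.0, None, 30.0, 26.0]))
```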
Application-Aware Routing Design and Measurements
-
The default BFD hello-interval is 1 sec, and the app-route/SLA poll-interval is 10 mins:
The BFD hello-interval is how often BFD sends hello packets to detect the liveness of a network path. By default, the hello-interval is 1 second. The app-route/SLA poll-interval is the period over which the collected network metrics are averaged for application routes or Service Level Agreements (SLAs). The default app-route/SLA poll-interval is 10 minutes.
-
By default, the SLA is calculated over 60 minutes using 1 pps x 600 sec x 6 buckets:
At 1 packet per second (pps), each 600-second (10-minute) poll interval collects 600 samples. With 6 buckets, the calculation covers 3,600 samples spanning 60 minutes. The 60 minutes is the total window over which the SLA is evaluated by default; the poll-interval itself remains 10 minutes.
-
A commonly suggested setting is a poll-interval of 120 seconds (2 minutes) and a multiplier of 5, which results in a 10-minute evaluation window.
-
Reducing the poll-interval/multiplier helps improve detection time but may lead to false positives with a small number of samples for PfR metrics:
Decreasing the poll-interval, the multiplier, or both can speed up the detection of network performance issues. However, reducing these values also increases the likelihood of false positives; that is, the system may incorrectly identify issues because it has only a small number of data samples. Detection time must be balanced against the accuracy of the PfR (Performance Routing) metrics.
-
The only option for obtaining accurate measurements faster is to reduce the BFD hello interval:
To achieve faster and more accurate measurement of network performance, the recommended approach is to decrease the BFD hello-interval. Network path liveness refers to the connectivity and availability of network paths. When hello packets are exchanged at a shorter interval, more samples are collected in the same period and path liveness is checked more often, which improves measurement accuracy.
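The trade-off described in the list above can be made concrete by counting samples. The sketch below computes how many BFD samples fall in one poll interval and in the full evaluation window. The function and parameter names are illustrative, and the 500 ms hello interval is a hypothetical value used only to show the effect of a shorter interval.

```python
def samples_per_window(poll_interval_s, multiplier, hello_interval_ms=1000):
    """Return (samples per poll interval, samples across all buckets)."""
    per_poll = poll_interval_s * 1000 // hello_interval_ms
    return per_poll, per_poll * multiplier


print(samples_per_window(600, 6))                        # defaults: (600, 3600)
print(samples_per_window(120, 5))                        # 2-minute poll, multiplier 5: (120, 600)
print(samples_per_window(10, 6))                         # very short poll interval: (10, 60)
print(samples_per_window(10, 6, hello_interval_ms=500))  # shorter hello interval: (20, 120)
```

A shorter poll interval leaves very few samples per average, which is the source of the false positives; a shorter hello interval restores some of that sample count.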
Benefits of Enhanced Application-Aware Routing
-
Improved measurement of the PfR metrics (loss, latency, and jitter) through the introduction of inline data, which allows for more accurate and detailed measurements of these metrics. Inline data refers to the traffic that is processed and inspected directly at the edge of the network, within the Cisco IOS XE Catalyst SD-WAN devices. Instead of routing all the traffic to a central location for analysis and security checks, inline data allows for real-time inspection and decision-making at the network edge.
-
Quick enhanced app-route detection and SLA enforcement, which involves reducing the PfR poll-interval to a very low value (minimum of 10 seconds). This allows Cisco IOS XE Catalyst SD-WAN devices to quickly detect slow degradation of circuits. If a circuit fails to meet the SLA threshold, its tunnels are swiftly switched out of SLA forwarding to maintain efficient and reliable network performance. SLA (Service Level Agreement) forwarding refers to the capability of the Cisco Catalyst SD-WAN solution to dynamically route network traffic based on predefined performance criteria or SLAs.
-
The speed of SLA switch-over is improved.
-
SLA dampening is introduced for a smoother transition to SLA forwarding. Before a tunnel is returned to SLA forwarding, it goes through a process called dampening, which helps prevent disruptions and instabilities. This ensures a smooth transition back to SLA forwarding and minimizes negative effects on network performance.
-
Enhancements are made to measure loss, latency, and jitter.
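The dampening behavior is described here only at a high level. The sketch below shows one plausible hold-down scheme, assumed purely for illustration: a tunnel that violates the SLA is removed from SLA forwarding immediately, and is restored only after it meets the SLA for a set number of consecutive intervals. The class name, the `dampening_intervals` parameter, and its value are hypothetical and do not represent the device's actual algorithm.

```python
class DampenedTunnel:
    """Hypothetical hold-down model of SLA dampening for a single tunnel."""

    def __init__(self, dampening_intervals=3):
        self.dampening_intervals = dampening_intervals
        self.in_sla = True
        self._good_streak = 0

    def update(self, meets_sla):
        """Record one interval's SLA result and return whether the tunnel is in SLA forwarding."""
        if not meets_sla:
            # A violation removes the tunnel at once and restarts the hold-down.
            self.in_sla = False
            self._good_streak = 0
        elif not self.in_sla:
            # The tunnel is recovering: count consecutive compliant intervals.
            self._good_streak += 1
            if self._good_streak >= self.dampening_intervals:
                self.in_sla = True
                self._good_streak = 0
        return self.in_sla


tunnel = DampenedTunnel(dampening_intervals=3)
results = [True, False, True, True, False, True, True, True]
history = [tunnel.update(ok) for ok in results]
print(history)
```

In this model a brief recovery (two good intervals followed by another violation) does not return the tunnel to SLA forwarding, which is the kind of flapping that dampening is meant to prevent.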
Guidelines of Enhanced Application-Aware Routing
-
Both GRE and IPsec tunnels are supported.
-
All existing TLOCs and WAN interface types, including physical, subinterface, loopback bind, dialer, and LTE interfaces, are supported.
-
TLOC Extension tunnels are supported.
-
Both IPv4 and IPv6 underlay tunnels are supported.
-
SLA update and switchover occur at a minimum interval of 10 seconds.
-
Tunnel scale is not impacted, with minimal impact on memory and performance.
-
Support is provided with and without app-probe class configuration in SLA classes.
-
SLA dampening is supported.
Compatibility With Cisco IOS XE Catalyst SD-WAN Devices Not Running Enhanced Application-Aware Routing
-
In the following scenario:
-
On the local side: The Cisco IOS XE Catalyst SD-WAN device is upgraded to Cisco IOS XE Catalyst SD-WAN Release 17.12.1a or later and has EAAR (Enhanced Application-Aware Routing) enabled.
-
On the remote side: The Cisco IOS XE Catalyst SD-WAN device is not upgraded to Cisco IOS XE Catalyst SD-WAN Release 17.12.1a and the EAAR is not enabled.
Then the system falls back to BFD-based measurements, which maintains compatibility with older releases and with devices on which the feature is disabled.
-
If both the local and remote sides are using Cisco IOS XE Catalyst SD-WAN Release 17.12.1a but the EAAR feature is not enabled, the system reverts to BFD-based measurements.
Note: The EAAR feature is disabled by default to support existing deployments.
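The compatibility rules above can be summarized as a small decision function. This is a sketch under stated assumptions: the `Device` type, its fields, and the returned labels are hypothetical, and the case where both sides run a supporting release with EAAR enabled is inferred to use the enhanced measurements.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical view of one tunnel endpoint."""
    supports_eaar: bool          # running Release 17.12.1a or later
    eaar_enabled: bool = False   # EAAR is disabled by default


def measurement_mode(local, remote):
    """Return 'eaar' only when both ends support and enable EAAR; otherwise 'bfd'."""
    both_ready = all(d.supports_eaar and d.eaar_enabled for d in (local, remote))
    return "eaar" if both_ready else "bfd"


upgraded_on = Device(supports_eaar=True, eaar_enabled=True)
upgraded_off = Device(supports_eaar=True)
legacy = Device(supports_eaar=False)

print(measurement_mode(upgraded_on, legacy))         # remote not upgraded: bfd
print(measurement_mode(upgraded_off, upgraded_off))  # feature not enabled: bfd
print(measurement_mode(upgraded_on, upgraded_on))    # assumed: eaar
```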