The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product. Learn more about how Cisco is using Inclusive Language.
This document describes general troubleshooting steps and how to properly report high CPU/QFP issues to TAC for faster case resolution.
Cisco recommends that you have basic knowledge of these topics:
This document is not restricted to specific software and hardware versions. It applies to any Cisco IOS-XE® routing platform with a physical or virtualized QFP, such as ASR1000, ISR4000, ISR1000, Cat8000, or Cat8000v.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
This document outlines the commands TAC needs for the initial triage of a high CPU/QFP problem, so that you get a good TAC experience from the very first contact.
It also contains troubleshooting tips to identify high Central Processing Unit (CPU) or high Quantum Flow Processor (QFP) utilization, so you can find a solution before you open a TAC case.
The purpose of this document is not to explain troubleshooting procedures extensively. Where available, references to more in-depth troubleshooting guides are provided.
At the end of this document, block diagrams are provided for educational purposes as a visual representation of the components.
High utilization of a component (memory, TCAM, CPU, or QFP) typically indicates one of these:
Identifying the underlying cause of the high utilization is vital to determine the proper course of action to solve the problem.
You can confirm a high CPU or QFP condition with monitoring tools or with these commands:
show process cpu sorted
iosxe_router#show process cpu sorted
CPU utilization for five seconds: 90%/0%; one minute: 0%; five minutes: 0%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
395 78769 1242162 63 89.07% 88.04% 89.02% 0 CDP Protocol
1 8 88 90 0.00% 0.00% 0.00% 0 Chunk Manager
--- snip ---
In the line "CPU utilization for five seconds: 90%/0%; one minute: 0%; five minutes: 0%", focus on the first value after the "five seconds" string. Here, 90% is the overall CPU utilization. The number to the right of the slash (0% in this case) is the CPU usage due to interrupts. The difference between these two numbers is the CPU utilization due to processes, 90% - 0% = 90% in this example. In this scenario, the CDP Protocol process consumes most of the CPU (control plane) resources.
Because Cisco IOS-XE runs on a Linux-based kernel, a problem can sometimes originate in any of the processes that run on top of it. Use the show process cpu platform sorted command to display the processes of the underlying operating system and check whether any of them causes problems (focus on the 5Sec column).
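If you collect this output repeatedly, you can script the arithmetic above. The following is a minimal Python sketch. It assumes the header and process-row layouts exactly as they appear in the example output, and layouts can differ between releases. The function names split_cpu_header and parse_process_row are hypothetical and are not part of any Cisco tool.

```python
import re

# Header line layout taken from the "show process cpu sorted" example output.
HEADER_RE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def split_cpu_header(line: str) -> dict:
    """Split the five-second value into total, interrupt and process CPU."""
    match = HEADER_RE.search(line)
    if match is None:
        raise ValueError("not a 'show process cpu' header line")
    total, interrupt, one_min, five_min = (int(g) for g in match.groups())
    return {
        "total_5s": total,
        "interrupt_5s": interrupt,
        # CPU used by processes = overall value minus the interrupt share.
        "process_5s": total - interrupt,
        "one_min": one_min,
        "five_min": five_min,
    }


def parse_process_row(row: str) -> tuple:
    """Return (process name, 5Sec %) from one process row of the same output."""
    # Columns: PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
    fields = row.split(maxsplit=8)
    return fields[8], float(fields[4].rstrip("%"))


header = "CPU utilization for five seconds: 90%/0%; one minute: 0%; five minutes: 0%"
print(split_cpu_header(header)["process_5s"])
print(parse_process_row("395 78769 1242162 63 89.07% 88.04% 89.02% 0 CDP Protocol"))
```

Because the split is limited to eight cuts, process names that contain spaces, such as CDP Protocol, stay intact.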
iosxe_router#show process cpu platform sorted
-- depending on the architecture, there can be multiple cores, deleting for brevity --
Pid PPid 5Sec 1Min 5Min Status Size Name
--------------------------------------------------------------------------------
18009 18001 323% 325% 328% R 266740 ucode_pkt_PPE0
11168 11160 1% 1% 1% S 914556 linux_iosd-imag
96 2 1% 0% 0% S 0 ksmd
--- snip ---
Note: Routers with a virtual QFP run the ucode_pkt_PPE0 process, which is the software process that emulates the data plane. Therefore, you can ignore that process when you look for processes that contribute to CPU utilization.
QFP is the System on a Chip responsible for all packet forwarding. Additional information is in the section Understanding High QFP on IOS-XE routers.
iosxe_router#show platform hardware qfp active datapath utilization
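The following Python sketch applies that filtering to the show process cpu platform sorted output. It assumes the column layout shown above (Pid PPid 5Sec 1Min 5Min Status Size Name) and ranks the remaining processes by their 5Sec value. The function name and the IGNORED_PROCESSES set are illustrative only.

```python
# Processes to leave out of the analysis. On routers with a virtual QFP,
# ucode_pkt_PPE0 emulates the data plane, as described in the note above.
IGNORED_PROCESSES = {"ucode_pkt_PPE0"}


def top_platform_processes(output: str, limit: int = 3) -> list:
    """Return up to `limit` (name, 5Sec %) pairs, busiest first."""
    rows = []
    for line in output.splitlines():
        # Columns: Pid PPid 5Sec 1Min 5Min Status Size Name
        fields = line.split(maxsplit=7)
        if len(fields) != 8 or not fields[0].isdigit() or not fields[2].endswith("%"):
            continue  # header, separator or "--- snip ---" line
        name, five_sec = fields[7], float(fields[2].rstrip("%"))
        if name not in IGNORED_PROCESSES:
            rows.append((name, five_sec))
    # sorted() is stable, so processes with equal load keep their original order.
    return sorted(rows, key=lambda row: row[1], reverse=True)[:limit]


CPU_PLATFORM_SAMPLE = """\
Pid PPid 5Sec 1Min 5Min Status Size Name
--------------------------------------------------------------------------------
18009 18001 323% 325% 328% R 266740 ucode_pkt_PPE0
11168 11160 1% 1% 1% S 914556 linux_iosd-imag
96 2 1% 0% 0% S 0 ksmd
"""
print(top_platform_processes(CPU_PLATFORM_SAMPLE))
```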
CPP 0: Subdev 0 5 secs 1 min 5 min 60 min
--- snip ---
(bps) 21992 13648 13736 13720
Processing: Load (pct) 0 0 0 0
Crypto/IO
RX: Load (pct) 0 0 0 0
TX: Load (pct) 1 1 1 0
Idle (pct) 99 99 99 99
In the show platform hardware qfp active datapath utilization output, focus on the Processing: Load value in the 5 secs column, because it shows the most recent overall QFP usage. Some devices also display the usage of the Crypto/IO module. For that module, focus on Idle: the closer to 100%, the better.
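The following Python sketch extracts those two values from the command output. It assumes the line labels shown in the example ("Processing: Load (pct)" and "Idle (pct)"), where the first number after each label is the 5 secs column. Other platforms or releases can format these lines differently.

```python
import re


def qfp_utilization_5s(output: str) -> dict:
    """Extract the 5-second Processing Load and, if present, the Crypto/IO Idle value."""
    result = {}
    load = re.search(r"Processing:\s*Load \(pct\)\s+(\d+)", output)
    if load:
        result["processing_load_pct"] = int(load.group(1))
    # The Idle line is only printed on devices that report the Crypto/IO module.
    idle = re.search(r"^\s*Idle \(pct\)\s+(\d+)", output, re.MULTILINE)
    if idle:
        result["crypto_io_idle_pct"] = int(idle.group(1))
    return result


QFP_SAMPLE = """\
CPP 0: Subdev 0 5 secs 1 min 5 min 60 min
(bps) 21992 13648 13736 13720
Processing: Load (pct) 0 0 0 0
Crypto/IO
RX: Load (pct) 0 0 0 0
TX: Load (pct) 1 1 1 0
Idle (pct) 99 99 99 99
"""
print(qfp_utilization_5s(QFP_SAMPLE))
```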
By default, the system does not generate any log that shows high CPU utilization on IOSd, which uses CPU 0, the first CPU on Cisco IOS-XE systems.
To have a syslog message generated for high utilization on the first core, you must first configure the CPU threshold notification. Use the syntax described in CPU Thresholding Notification:
process cpu threshold type {total | process | interrupt} rising percentage interval seconds [falling percentage interval seconds]
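As an illustration, this Python sketch renders the command from its parameters, following the syntax above. The values in the example call are placeholders: 80% matches the rule of thumb used in this document, and the intervals are arbitrary. The sketch also applies two checks of its own. Percentages must be between 1 and 100, and the falling value must not exceed the rising value. Treat both as assumptions and confirm the accepted ranges in CPU Thresholding Notification.

```python
from typing import Optional

VALID_TYPES = ("total", "process", "interrupt")


def cpu_threshold_command(kind: str, rising: int, rising_interval: int,
                          falling: Optional[int] = None,
                          falling_interval: Optional[int] = None) -> str:
    """Render the CPU threshold notification command from its parameters."""
    if kind not in VALID_TYPES:
        raise ValueError(f"type must be one of {VALID_TYPES}")
    # Assumed sanity check of this sketch, not taken from the Cisco syntax.
    if not 1 <= rising <= 100:
        raise ValueError("rising percentage must be between 1 and 100")
    command = f"process cpu threshold type {kind} rising {rising} interval {rising_interval}"
    if falling is None:
        return command
    if falling_interval is None:
        raise ValueError("the falling threshold needs its own interval")
    # Assumed sanity check: the falling value must not exceed the rising value.
    if not 1 <= falling <= rising:
        raise ValueError("falling percentage must be between 1 and the rising value")
    return f"{command} falling {falling} interval {falling_interval}"


print(cpu_threshold_command("total", 80, 5, falling=20, falling_interval=5))
```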
With this configured, you see notifications like this one:
%SYS-1-CPURISINGTHRESHOLD: Threshold: Total CPU Utilization(Total/Intr): 91%/2%, Top 3 processes(Pid/Util):
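If these messages go to a syslog server, you can extract the numbers from them with a short script. This Python sketch parses only the Total/Intr pair visible in the example. It does not parse the "Top 3 processes" list, because the example message is cut off and does not show that list's layout. The function name is hypothetical.

```python
import re

RISING_RE = re.compile(
    r"%SYS-1-CPURISINGTHRESHOLD: Threshold: Total CPU Utilization\(Total/Intr\):\s*"
    r"(\d+)%/(\d+)%"
)


def parse_rising_threshold(message: str):
    """Return (total, interrupt, process) CPU % from the syslog message, or None."""
    match = RISING_RE.search(message)
    if match is None:
        return None
    total, interrupt = int(match.group(1)), int(match.group(2))
    # As with "show process cpu", process CPU = total minus the interrupt share.
    return total, interrupt, total - interrupt


message = ("%SYS-1-CPURISINGTHRESHOLD: Threshold: Total CPU Utilization(Total/Intr): "
           "91%/2%, Top 3 processes(Pid/Util):")
print(parse_rising_threshold(message))
```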
You can also detect high usage on this core with SNMP or telemetry measurements.
In some cases, you see a resource limit alert like this one when other cores reach high usage:
PLATFORM_INFRA-5-IOS_INTR_OVER_LIMIT:
For the data plane, you see a QFP alert like this one in the log, which generally indicates that the load has exceeded the configured threshold:
MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 93% exceeds the setting threshold(80%).
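The three log messages above are natural triggers for data collection. On the router itself, EEM is the usual mechanism for this. The following off-box Python sketch only illustrates the matching logic. The mapping from each message to commands in this document is a hypothetical triage choice, not a TAC requirement. The QFP alert parser assumes the exact message text shown above.

```python
import re

# Hypothetical mapping from the log messages above to the commands in this
# document that are worth collecting while the condition is present.
TRIAGE_COMMANDS = {
    "CPURISINGTHRESHOLD": [
        "show process cpu sorted",
        "show process cpu platform sorted",
        "show platform resources",
    ],
    "IOS_INTR_OVER_LIMIT": [
        "show process cpu platform sorted",
        "show platform software status control-processor brief",
    ],
    "MCPRP-QFP-ALERT": [
        "show platform hardware qfp active datapath utilization",
        "show platform resources",
    ],
}

QFP_ALERT_RE = re.compile(
    r"MCPRP-QFP-ALERT: Slot: (\d+), QFP:(\d+), Load (\d+)% "
    r"exceeds the setting threshold\((\d+)%\)"
)


def commands_for(log_line: str) -> list:
    """Return the commands to collect for a log line, or [] if it is unrelated."""
    for tag, commands in TRIAGE_COMMANDS.items():
        if tag in log_line:
            return commands
    return []


def parse_qfp_alert(log_line: str):
    """Return slot, QFP, load and threshold from a QFP load alert, or None."""
    match = QFP_ALERT_RE.search(log_line)
    if match is None:
        return None
    slot, qfp, load, threshold = (int(g) for g in match.groups())
    return {"slot": slot, "qfp": qfp, "load": load, "threshold": threshold}


alert = "MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 93% exceeds the setting threshold(80%)."
print(parse_qfp_alert(alert))
print(commands_for(alert))
```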
If the CPU is not stuck at a constant 100%, include a show tech output. This greatly helps TAC, and you benefit from the automations TAC has developed to find issues faster.
Note: A high CPU condition must be troubleshot while the problem is present, because the device does not store historical data about process run time.
Note: Make sure you run a supported version. Look for the End-of-Sale and End-of-Life document for the release. If needed, move to a version that is currently under Software Maintenance Releases. Otherwise, TAC troubleshooting and resolution options are limited.
As a rule of thumb, a CPU or QFP is considered to be running high when its utilization is above 80%.
High utilization on Cisco IOS-XE routers can occur in the control plane (CPU) or in the data plane (QFP).
Note: Ideally, a high CPU/QFP utilization must be evaluated relative to the typical usage patterns of the device over time. For example, if a device normally operates at 10% CPU usage but suddenly jumps to 40%, this could indicate high CPU usage for that device. On the other hand, a device consistently running at 80% CPU usage is not necessarily a problem if that is its usual operating level. Monitoring systems with CPU graphs can help collect and analyze this data to establish a baseline for each device.
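This Python sketch shows both checks side by side: the absolute 80% rule of thumb and a comparison against the device's own baseline. The baseline is the plain mean of past samples. The jump_factor of 2.0 is a hypothetical choice for "well above normal", and you would tune it to your monitoring data.

```python
from statistics import mean

ABSOLUTE_HIGH_PCT = 80.0  # rule of thumb from this document


def assess_utilization(history: list, current: float, jump_factor: float = 2.0) -> dict:
    """Compare the current CPU/QFP value with the 80% rule and the device baseline."""
    baseline = mean(history)
    return {
        "baseline": baseline,
        "above_rule_of_thumb": current > ABSOLUTE_HIGH_PCT,
        # Flag values well above what this device usually runs at. The floor of
        # 1% avoids flagging every non-zero value when the baseline is 0%.
        "unusual_for_device": current >= max(baseline, 1.0) * jump_factor,
    }


# A device that normally runs around 10% and jumps to 40%.
print(assess_utilization([10, 12, 9, 11], 40))
# A device that always runs around 80%.
print(assess_utilization([78, 82, 80, 81], 81))
```

The first device is flagged as unusual even though it stays below 80%. The second device crosses the rule of thumb but behaves as it always does, which matches the reasoning in the note.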
On a Cisco IOS-XE router, CPU refers to the processor responsible for the administrative and control plane operations of the device. Many processes run on the device, all on top of a Linux-based kernel, and each of them runs on a general-purpose CPU.
When a high CPU condition is present, it is typically an indicator of:
Some platforms have multiple general-purpose CPUs, which follow these rules:
On Cisco IOS-XE devices, CPU cores are generally dedicated either to the data plane or to the control plane.
Generally, CPU 0 (the first CPU) is tied to IOSd (the IOS daemon), so that core is control plane-related. The other CPUs can be a mix of control plane and data plane cores.
On the ASR 1000, which is generally modular, commands such as show platform resources and show platform software status control-processor brief show the usage of the control plane (RP) and data plane (ESP) CPUs.
Control plane CPUs are dedicated to protocol processing, such as BGP, STP, CDP, and SSH. They process the packets destined to the router itself.
The data plane generally handles transit packets, which the router does not consume in the Route Processor (RP). These packets are processed only in the Quantum Flow Processor (QFP), the packet processor, where lookups determine how to send each transit packet to its intended destination.
The Quantum Flow Processor (QFP) is the System on a Chip (SoC) in charge of all the packet forwarding operations in the device.
The QFP runs a specialized piece of software called microcode. This microcode is responsible for executing and applying features to all the packets passing through the device based on the input/output interface configuration. It also interacts with the rest of the system through the different processes.
When a high QFP condition is present, it is typically an indicator of:
To better understand the situation, TAC must collect the Feature Invocation Array (FIA) trace for further analysis. This procedure is documented in Troubleshoot with the IOS-XE Datapath Packet Trace Feature.
These are the basic commands to gather while the issue is present (EEM logic can be implemented to match the log notification and collect the output):
router_non_modular#show platform resources
**State Acronym: H - Healthy, W - Warning, C - Critical
Resource Usage Max Warning Critical State
----------------------------------------------------------------------------------------------------
RP0 (ok, active) H
Control Processor 10.64% 100% 80% 90% H
DRAM 2143MB(54%) 3913MB 88% 93% H
bootflash 2993MB(97%) 3099MB 70% 90% C
ESP0(ok, active) H
QFP H
DRAM 52844KB(20%) 262144KB 85% 95% H
IRAM 207KB(10%) 2048KB 85% 95% H
CPU Utilization 0.00% 100% 90% 95% H
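The State column is the quickest thing to check in this output. This Python sketch lists every resource row whose state is not H (Healthy), together with the section it belongs to. It assumes the row layout shown above (name, usage, max, warning %, critical %, state) and treats any other line that ends in a state letter as a section header. In the sample, the only finding is bootflash under RP0, in state C (Critical).

```python
import re

# Resource rows: <name> <usage> <max> <warning%> <critical%> <state>
RESOURCE_RE = re.compile(
    r"^(?P<name>.+?)\s+(?P<usage>\S+)\s+(?P<max>\S+)\s+"
    r"(?P<warning>\d+%)\s+(?P<critical>\d+%)\s+(?P<state>[HWC])$"
)
STATE_AT_END_RE = re.compile(r"\s[HWC]$")


def unhealthy_resources(output: str) -> list:
    """Return (section, resource, usage, state) for every row not in state H."""
    section = None
    findings = []
    for raw in output.splitlines():
        line = raw.strip()
        row = RESOURCE_RE.match(line)
        if row:
            if row["state"] != "H":
                findings.append((section, row["name"], row["usage"], row["state"]))
        elif STATE_AT_END_RE.search(line):
            # Section header such as "RP0 (ok, active) H", "ESP0(ok, active) H" or "QFP H".
            section = line.rsplit(maxsplit=1)[0].split("(")[0].strip()
    return findings


RESOURCES_SAMPLE = """\
**State Acronym: H - Healthy, W - Warning, C - Critical
Resource Usage Max Warning Critical State
RP0 (ok, active) H
Control Processor 10.64% 100% 80% 90% H
DRAM 2143MB(54%) 3913MB 88% 93% H
bootflash 2993MB(97%) 3099MB 70% 90% C
ESP0(ok, active) H
QFP H
DRAM 52844KB(20%) 262144KB 85% 95% H
IRAM 207KB(10%) 2048KB 85% 95% H
CPU Utilization 0.00% 100% 90% 95% H
"""
print(unhealthy_resources(RESOURCES_SAMPLE))
```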
Router#show platform software status control-processor brief
Load Average
Slot Status 1-Min 5-Min 15-Min
RP0 Healthy 1.75 1.25 1.14
Memory (kB)
Slot Status Total Used (Pct) Free (Pct) Committed (Pct)
RP0 Healthy 4003008 2302524 (58%) 1700484 (42%) 3043872 (76%)
CPU Utilization
Slot CPU User System Nice Idle IRQ SIRQ IOwait
RP0 0 5.60 10.80 0.00 75.00 0.00 0.10 8.50
1 8.10 11.81 0.00 66.66 0.00 0.20 13.21
2 4.69 9.49 0.00 80.81 0.00 0.19 4.79
3 4.80 10.20 0.00 79.30 0.00 0.10 5.60
4 3.70 3.20 0.00 92.90 0.00 0.00 0.20
5 1.09 2.99 0.00 95.00 0.00 0.09 0.79
6 20.00 33.10 0.00 46.90 0.00 0.00 0.00
7 0.00 0.00 0.00 100.00 0.00 0.00 0.00
Router#
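For multi-core output like this, it helps to turn the Idle column into a per-core busy value. This Python sketch computes 100 - Idle for every (slot, core) pair in the CPU Utilization table. It counts everything that is not Idle, including IOwait, as busy, which is a simplifying assumption. It also assumes that the slot name appears only on the first row of each slot, as in the sample. In the sample, core 6 is the busiest at about 53%.

```python
def core_busy_percent(output: str) -> dict:
    """Map (slot, core) to busy % (100 - Idle) from the CPU Utilization table."""
    busy = {}
    in_cpu_table = False
    slot = None
    for line in output.splitlines():
        stripped = line.strip()
        if stripped == "CPU Utilization":
            in_cpu_table = True
            continue
        if not in_cpu_table or not stripped or stripped.startswith("Slot"):
            continue
        fields = stripped.split()
        # The first row of each slot carries the slot name, e.g. "RP0 0 5.60 ...".
        if len(fields) == 9 and not fields[0].isdigit():
            slot, fields = fields[0], fields[1:]
        if len(fields) != 8 or not fields[0].isdigit():
            continue  # not a CPU row, e.g. the trailing "Router#" prompt
        # Columns after the slot: CPU User System Nice Idle IRQ SIRQ IOwait
        core, idle = int(fields[0]), float(fields[4])
        busy[(slot, core)] = round(100.0 - idle, 2)
    return busy


CP_BRIEF_SAMPLE = """\
Memory (kB)
Slot Status Total Used (Pct) Free (Pct) Committed (Pct)
RP0 Healthy 4003008 2302524 (58%) 1700484 (42%) 3043872 (76%)
CPU Utilization
Slot CPU User System Nice Idle IRQ SIRQ IOwait
RP0 0 5.60 10.80 0.00 75.00 0.00 0.10 8.50
1 8.10 11.81 0.00 66.66 0.00 0.20 13.21
6 20.00 33.10 0.00 46.90 0.00 0.00 0.00
7 0.00 0.00 0.00 100.00 0.00 0.00 0.00
Router#
"""
busy = core_busy_percent(CP_BRIEF_SAMPLE)
print(max(busy, key=busy.get), busy)
```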
On a modular Cisco IOS-XE router, a high CPU condition can occur on the Route Processor (RP) card, the Embedded Service Processor (ESP), or the SPA Interface Processor (SIP) card. These commands help you determine which card within the device the high CPU condition is related to:
ios_xe_modular_router#show platform resources
**State Acronym: H - Healthy, W - Warning, C - Critical
Resource Usage Max Warning Critical State
----------------------------------------------------------------------------------------------------
RP0 (ok, active) H
Control Processor 11.62% 100% 90% 95% H
DRAM 1730MB(45%) 3783MB 90% 95% H
ESP0(ok, active) H
Control Processor 19.59% 100% 90% 95% H
DRAM 616MB(65%) 946MB 90% 95% H
QFP H
TCAM 8cells(0%) 65536cells 45% 55% H
DRAM 79212KB(30%) 262144KB 80% 90% H
IRAM 9329KB(7%) 131072KB 80% 90% H
SIP0 H
Control Processor 2.30% 100% 90% 95% H
DRAM 280MB(60%) 460MB 90% 95% H
* Depending on the Cisco IOS-XE version, the QFP section can include the processor usage. Otherwise, you need to collect the output of show platform hardware qfp active datapath utilization.
For the ASR1000, a good reference guide is Troubleshoot High CPU on ASR1000 Series Router.
Note: Commands can vary depending on the platform and version. When in doubt, refer to the documentation for your specific platform.
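On a modular router, the first question is which card is busy. This Python sketch collects the Control Processor row of each card (RP0, ESP0, SIP0 in the sample) from show platform resources and reports the usage and its headroom to the Warning threshold. The card-name prefixes RP, ESP, and SIP are taken from the sample. Other module types would need to be added to CARD_RE.

```python
import re

# Control Processor rows: usage %, max %, warning %, critical %, state.
CONTROL_RE = re.compile(
    r"^Control Processor\s+([\d.]+)%\s+\d+%\s+(\d+)%\s+\d+%\s+[HWC]$"
)
# Card prefixes assumed from the sample output; extend for other modules.
CARD_RE = re.compile(r"^(RP|ESP|SIP)\d+")


def control_processor_by_card(output: str) -> dict:
    """Map each card to (Control Processor usage %, Warning threshold %)."""
    card = None
    result = {}
    for raw in output.splitlines():
        line = raw.strip()
        header = CARD_RE.match(line)
        if header:
            card = header.group(0)
            continue
        row = CONTROL_RE.match(line)
        if row and card is not None:
            result[card] = (float(row.group(1)), int(row.group(2)))
    return result


MODULAR_SAMPLE = """\
RP0 (ok, active) H
Control Processor 11.62% 100% 90% 95% H
DRAM 1730MB(45%) 3783MB 90% 95% H
ESP0(ok, active) H
Control Processor 19.59% 100% 90% 95% H
DRAM 616MB(65%) 946MB 90% 95% H
QFP H
SIP0 H
Control Processor 2.30% 100% 90% 95% H
"""
cards = control_processor_by_card(MODULAR_SAMPLE)
busiest = max(cards, key=lambda c: cards[c][0])
headroom = {c: round(warn - usage, 2) for c, (usage, warn) in cards.items()}
print(busiest, headroom)
```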
Revision | Publish Date | Comments |
---|---|---|
1.0 | 23-Oct-2024 | Initial Release |