THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.
Revision | Publish Date | Comments |
---|---|---|
1.0 | 22-Aug-22 | Initial Release |
Affected OS Type | Affected Software Product | Affected Release | Affected Release Number | Comments |
---|---|---|---|---|
NON-IOS | Network Services Orchestrator Software | 5 | 5.6, 5.6.1, 5.6.2, 5.6.3, 5.6.3.1, 5.6.4, 5.6.5, 5.6.6, 5.6.6.1, 5.6.7, 5.7, 5.7.1, 5.7.1.1, 5.7.2, 5.7.2.1, 5.7.3, 5.7.4, 5.7.5, 5.8, 5.8.1, 5.8.2 | |
Defect ID | Headline |
---|---|
CSCwc14532 | NSO crashing due to NSO leaking resources |
Problem Description
Affected releases of the Cisco Network Services Orchestrator (NSO) can leak resources under certain circumstances, which causes Cisco NSO to crash. The leak occurs when a subscriber invokes the cdb-diff-iterate Application Programming Interface (API). Cisco NSO also uses the cdb-diff-iterate API internally for Smart Licensing and notification kickers. Hence, most NSO installations are at risk. See the Background section for more details.
Background
The resource leak is related to the usage of the cdb-diff-iterate API. This affects customers who use Smart Licensing, use notification-kicker, or directly call the API through C, Python, Java, or econfd. Any change under /kickers/notification-kicker or /devices/device triggers an internal subscription, which leads to the leak. These internal subscribers cannot be disabled. Once enough transactions have been performed, the Cisco NSO process is terminated because it exhausts all available resources, as described in the Problem Symptom section. The number of transactions that can be performed before termination depends on the resources allocated to Cisco NSO.
Problem Symptom
If Cisco NSO has crashed due to the issue described in this field notice, this error can be observed in the /var/log/ncs/ncs.log file:
NSO (Erlang VM) terminates due to "Internal error: Supervision terminated"
In order to check if a recent crash was due to this leak, complete these steps:
1. Locate the ncserr.log file, which is typically located in the /var/log/ncs/ folder.
2. Decode the ncserr.log file with the ncs --printlog ncserr.log > ncserror.txt command.
3. Search the ncserror.txt file for a crash report similar to this one:
{error_report,<0.205.0>, {<0.208.0>,crash_report, [[{initial_call,{capi_server,init,['Argument__1']}}, {pid,<0.208.0>}, {registered_name,capi_server}, {error_info, {error,system_limit, [{erlang,spawn_link, [proc_lib,init_p, [capi_server, [capi_sup,<0.206.0>], capi_server,session, [{session,5636202,cs_trans,6,292130769,read_write, data,undefined,[],false,undefined,[]}]]],
Workaround/Solution
Cisco recommends that you upgrade Cisco NSO as specified in this table in order to resolve this resource leak.
Affected Releases | Fixed Release |
---|---|
5.6, 5.6.1, 5.6.2, 5.6.3, 5.6.4, 5.6.5, 5.6.6, 5.6.7 | 5.6.7.1 |
5.7, 5.7.1, 5.7.2, 5.7.3, 5.7.4, 5.7.5 | 5.7.5.1 |
5.8, 5.8.1, 5.8.2 | 5.8.2.1 |
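The table can also be expressed as a small lookup, which may help when many installations need to be checked. The sketch below assumes that version strings are purely numeric and dot-separated, and that every release in a listed train below the fixed release is affected, which matches the rows above. The function and variable names are hypothetical.

```python
# Fixed release per release train, copied from the table in this notice.
FIXED_RELEASES = {"5.6": "5.6.7.1", "5.7": "5.7.5.1", "5.8": "5.8.2.1"}


def _as_tuple(version: str) -> tuple:
    """Turn "5.6.7.1" into (5, 6, 7, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def fixed_release_for(version: str):
    """Return the fixed release to upgrade to, or None.

    None means either that the train is not listed in the table or that
    the version is already at or above the fixed release. It does not
    prove that an unlisted release is unaffected.
    """
    train = ".".join(version.split(".")[:2])
    fixed = FIXED_RELEASES.get(train)
    if fixed is None or _as_tuple(version) >= _as_tuple(fixed):
        return None
    return fixed
```

For example, fixed_release_for("5.7.3") returns "5.7.5.1", while fixed_release_for("5.8.2.1") returns None because that release already contains the fix.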
Customers who cannot upgrade immediately can restart Cisco NSO periodically. The rate at which the Cisco NSO process consumes resources is directly tied to the number of transactions you perform or the usage frequency of the affected cdb-diff-iterate API.
Monitor and Avoid an Unplanned Cisco NSO Crash
In order to determine the frequency at which you need to restart Cisco NSO, complete these steps:
1. From the Linux CLI on the host where Cisco NSO runs, enter this command:
ps ax | grep ncs.smp | egrep "\-P\s+[0-9]+" -o
2. If the command yields results, use the number obtained. Otherwise, use 32,768.
3. To be on the safe side, deduct 2,000 to account for Cisco NSO internal startup processes. The result is the number of transactions Cisco NSO can complete before it should be restarted.
For example, if the ps ax | grep ncs.smp | egrep "\-P\s+[0-9]+" -o command returns "-P 3615000", the calculation is:
3,615,000 - 2,000 = 3,613,000
As another example, if the command returns nothing, the calculation is:
32,768 - 2,000 = 30,768
4. Count the transactions that Cisco NSO has completed, as recorded in the devel.log file, with this command:
grep "ncs progress.*datastore=running.*applying transaction: ok" devel.log | wc -l
Note: It is normal for the count to increase over time. After the upgrade, monitoring is no longer needed as it will return a false positive result.
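The arithmetic and the log count above can be combined into one check. The following is a sketch under stated assumptions rather than an official monitoring script: it takes the raw output of ps ax and the lines of devel.log as inputs, reuses the patterns and constants given in this notice, and assumes the log lines cover only the period since the last Cisco NSO restart. All function names are illustrative.

```python
import re

DEFAULT_PROCESS_LIMIT = 32768  # used when ps shows no -P value
STARTUP_MARGIN = 2000          # deducted for NSO internal startup processes

# Same patterns as the egrep and grep commands in this notice.
P_FLAG = re.compile(r"-P\s+([0-9]+)")
TRANSACTION_OK = re.compile(
    r"ncs progress.*datastore=running.*applying transaction: ok"
)


def transaction_budget(ps_output: str) -> int:
    """Return the number of transactions allowed before a restart."""
    limit = DEFAULT_PROCESS_LIMIT
    for line in ps_output.splitlines():
        if "ncs.smp" not in line:
            continue  # only the NSO Erlang VM process is relevant
        match = P_FLAG.search(line)
        if match:
            limit = int(match.group(1))
            break
    return limit - STARTUP_MARGIN


def count_transactions(devel_log_lines) -> int:
    """Count log lines that record a successfully applied transaction."""
    return sum(1 for line in devel_log_lines if TRANSACTION_OK.search(line))


def restart_due(ps_output: str, devel_log_lines) -> bool:
    """Return True once the transaction count has reached the budget."""
    return count_transactions(devel_log_lines) >= transaction_budget(ps_output)
```

In practice it is safer to plan the restart well before restart_due returns True, for example once the count reaches a chosen fraction of the budget. That threshold is a local policy decision and is not specified in this notice.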
If you require further assistance, or if you have any further questions regarding this field notice, contact the Cisco Systems Technical Assistance Center (TAC).
My Notifications—Set up a profile to receive email updates about reliability, safety, network security, and end-of-sale issues for the Cisco products you specify.