Field Notice: FN - 72438 - Specific Releases of Network Services Orchestrator Might Crash Due to a Resource Leak - Software Upgrade Recommended

Updated: August 29, 2022

Document ID: FN72438

Notice

THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.

Revision History

Revision	Publish Date	Comments
1.0	22-Aug-22	Initial Release

Products Affected

Affected OS Type	Affected Software Product	Affected Release	Affected Release Number	Comments
NON-IOS	Network Services Orchestrator Software	5	5.6, 5.6.1, 5.6.2, 5.6.3, 5.6.3.1, 5.6.4, 5.6.5, 5.6.6, 5.6.6.1, 5.6.7, 5.7, 5.7.1, 5.7.1.1, 5.7.2, 5.7.2.1, 5.7.3, 5.7.4, 5.7.5, 5.8, 5.8.1, 5.8.2

Defect Information

Defect ID	Headline
CSCwc14532	NSO crashing due to NSO leaking resources

Problem Description

Affected releases of the Cisco Network Services Orchestrator (NSO) can leak resources under certain circumstances, which causes Cisco NSO to crash. The leak occurs when a subscriber invokes the cdb-diff-iterate Application Programming Interface (API). Cisco NSO also internally uses the cdb-diff-iterate API for Smart Licensing and notification kickers. Hence, most NSO installations are at risk. See the Background section for more details.

Background

The resource leak is related to use of the cdb-diff-iterate API. It affects customers who use Smart Licensing, use notification kickers, or call the API directly through C, Python, Java, or econfd. Any change under /kickers/notification-kicker or /devices/device triggers an internal subscription, which leads to the leak. These internal subscribers cannot be disabled. Once enough transactions have been performed, the Cisco NSO process is terminated because it has exhausted all available resources, as described in the Problem Symptom section. The number of transactions that can be performed before termination depends on the resources allocated to Cisco NSO.

Problem Symptom

This error can be observed in the /var/log/ncs/ncs.log file if Cisco NSO has crashed due to the issue described in this field notice:

NSO (Erlang VM) terminates due to "Internal error: Supervision terminated"

In order to check if a recent crash was due to this leak, complete these steps:

Find the ncserr.log file, which is typically located in the /var/log/ncs/ folder.
Decrypt the ncserr.log file with the ncs --printlog ncserr.log > ncserror.txt command.

Look for the signature, as highlighted in this example, which indicates that the failure was due to this known trigger:

{error_report,<0.205.0>,
     {<0.208.0>,crash_report,
      [[{initial_call,{capi_server,init,['Argument__1']}},
        {pid,<0.208.0>},
        {registered_name,capi_server},
        {error_info,
            {error,system_limit,
                [{erlang,spawn_link,
                     [proc_lib,init_p,
                      [capi_server,
                       [capi_sup,<0.206.0>],
                       capi_server,session,
                       [{session,5636202,cs_trans,6,292130769,read_write,
                            data,undefined,[],false,undefined,[]}]]],
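
The signature check can also be scripted. The sketch below is a heuristic, not a Cisco tool: the function name has_leak_signature is hypothetical, and it assumes that the two substrings taken from the example above (registered_name,capi_server and error,system_limit) are enough to recognize the report. A match only suggests this defect; inspect the decrypted log to confirm that both strings belong to the same crash report.

```shell
# Heuristic check for the crash signature in a decrypted ncserr.log.
# Assumption: these two substrings, copied from the example report,
# identify the failure. Both must be present somewhere in the file.
has_leak_signature() {
    file=$1
    grep -q 'registered_name,capi_server' "$file" &&
        grep -q 'error,system_limit' "$file"
}

# Usage (ncserror.txt is the output of: ncs --printlog ncserr.log):
#   if has_leak_signature ncserror.txt; then echo "signature found"; fi
```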

Workaround/Solution

Cisco recommends that you upgrade Cisco NSO to the fixed release listed in this table in order to resolve this resource leak.

Affected Releases	Fixed Release
5.6, 5.6.1, 5.6.2, 5.6.3, 5.6.4, 5.6.5, 5.6.6, 5.6.7	5.6.7.1
5.7, 5.7.1, 5.7.2, 5.7.3, 5.7.4, 5.7.5	5.7.5.1
5.8, 5.8.1, 5.8.2	5.8.2.1
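
For scripted checks, the table can be expressed as a lookup. This is a minimal sketch: the function name fixed_release_for is hypothetical, the release list is copied from the Products Affected table, and mapping by release train is an assumption for the releases that the table above does not list individually (5.6.3.1, 5.6.6.1, 5.7.1.1, and 5.7.2.1).

```shell
# Print the fixed release for an affected NSO release.
# Returns a non-zero status for releases this field notice does not
# list as affected (including the fixed releases themselves).
fixed_release_for() {
    case "$1" in
        5.6|5.6.1|5.6.2|5.6.3|5.6.3.1|5.6.4|5.6.5|5.6.6|5.6.6.1|5.6.7)
            echo "5.6.7.1" ;;
        5.7|5.7.1|5.7.1.1|5.7.2|5.7.2.1|5.7.3|5.7.4|5.7.5)
            echo "5.7.5.1" ;;
        5.8|5.8.1|5.8.2)
            echo "5.8.2.1" ;;
        *)
            return 1 ;;
    esac
}

# Usage:
#   fixed_release_for 5.7.3    # prints 5.7.5.1
```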

Customers who cannot upgrade immediately can restart Cisco NSO periodically. The rate at which the Cisco NSO process consumes resources is directly tied to the number of transactions you perform or the usage frequency of the affected cdb-diff-iterate API.

Monitor and Avoid an Unplanned Cisco NSO Crash

In order to determine the frequency at which you need to restart Cisco NSO, complete these steps:

Determine the maximum value your system is configured to allow. This is the number that follows the -P option on the ncs.smp command line.
From the Linux CLI on the host where Cisco NSO runs, enter this command:
```
ps ax | grep ncs.smp | egrep "\-P\s+[0-9]+" -o
```
If the command yields results, use the number obtained. Otherwise, use 32,768.

To be on the safe side, deduct 2,000 to account for Cisco NSO internal startup processes to arrive at the number of transactions Cisco NSO can complete before it should be restarted.

For example, if the ps ax | grep ncs.smp | egrep "\-P\s+[0-9]+" -o command returns "-P 3615000", the calculation is:

3,615,000 - 2,000 = 3,613,000

As another example, if the ps ax | grep ncs.smp | egrep "\-P\s+[0-9]+" -o command returns nothing, the calculation is:

32,768 - 2,000 = 30,768
Check periodically to ensure that the system does not approach the calculated value. To do so, count the number of transactions performed since Cisco NSO was started. Assuming that the /var/log/ncs/devel.log file was empty when Cisco NSO started, count the transactions with this command:
```
grep "ncs progress.*datastore=running.*applying transaction: ok" devel.log | wc -l
```
When Cisco NSO approaches the number that might trigger a crash, perform a restart of Cisco NSO.

Note: It is normal for the count to increase over time. After you upgrade to a fixed release, this monitoring is no longer needed, because the transaction count no longer predicts a crash and would only produce false positives.
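
The monitoring steps in this section can be combined into a small script. The following is a sketch under stated assumptions, not a supported tool: the function names restart_threshold and count_transactions are hypothetical, and the log path in the usage comment is the default location named in this notice. The threshold function reads ps ax output on standard input so that it can be exercised without a running Cisco NSO instance.

```shell
# Restart threshold: the -P value from the ncs.smp command line minus
# the 2,000 safety margin. Falls back to 32768 when no -P value is found.
restart_threshold() {
    limit=$(grep 'ncs.smp' |
        grep -E -o -e '-P[[:space:]]+[0-9]+' |
        head -n 1 |
        grep -E -o '[0-9]+') || limit=32768
    echo $((limit - 2000))
}

# Count applied transactions in the given devel.log, using the same
# pattern as the grep command in this section (grep -c replaces wc -l).
count_transactions() {
    grep -c 'ncs progress.*datastore=running.*applying transaction: ok' "$1" || true
}

# Usage on the host where Cisco NSO runs:
#   threshold=$(ps ax | restart_threshold)
#   count=$(count_transactions /var/log/ncs/devel.log)
#   if [ "$count" -ge "$threshold" ]; then echo "restart Cisco NSO"; fi
```

Because the guidance is to restart as the count approaches the calculated value, a scheduled check might compare against a lower figure than the threshold itself. How much lower is a local judgment call.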

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, contact the Cisco Technical Assistance Center (TAC).
