THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.
Revision | Publish Date | Comments |
---|---|---|
1.7 | 06-Feb-23 | Updated the Workaround/Solution Section |
1.6 | 18-Sep-22 | Updated the Background Section |
1.5 | 08-Mar-22 | Updated the Problem Symptom and Workaround/Solution Sections |
1.4 | 23-Jul-21 | Updated the Workaround/Solution Section |
1.33 | 03-Jun-21 | Updated the Workaround/Solution Section |
1.32 | 01-Jun-21 | Updated the Defect Information Section |
1.31 | 24-May-21 | Updated the Problem Description and Background Sections |
1.3 | 14-May-21 | Updated the How to Identify Affected Products Section and Added the Serial Number Validation Section |
1.2 | 21-Apr-21 | Updated the Workaround/Solution Section |
1.1 | 02-Apr-21 | Updated the PIDs and Workaround Section |
1.0 | 29-Mar-21 | Initial Release |
Affected Product ID | Comments |
---|---|
N9K-C9236C | |
N9K-C9396TX | |
N9K-C9396PX | |
N9K-C93128TX | |
N9K-C9332PQ | |
N9K-C9372PX | |
N9K-C9372TX | |
N9K-C93120TX | |
N9K-C9372PX-E | |
N9K-C9372TX-E | |
N9K-C92160YC-X | |
N9K-C9272Q | |
N9K-C93180YC-EX | |
N9K-C93108TC-EX | |
N9K-C9232C | |
N9K-C93180YC-EX-24 | |
N9K-C93108TC-EX-24 | |
N9K-C93180LC-EX | |
N9K-SUP-B+ | |
N9K-SUP-B | |
N9K-SUP-A+ | |
N9K-SUP-A | |
N3K-C3232C | |
N3K-C3264Q | |
N3K-C31128PQ-10GE | |
N3K-C31108PC-V | |
N3K-C31108TC-V | |
N3K-C3164Q-40GE | |
N9K-C9236C= | |
N9K-C9396TX= | |
N9K-C9396PX= | |
N9K-C93128TX= | |
N9K-C9332PQ= | |
N9K-C9372PX= | |
N9K-C9372TX= | |
N9K-C93120TX= | |
N9K-C9372PX-E= | |
N9K-C9372TX-E= | |
N9K-C92160YC-X= | |
N9K-C9272Q= | |
N9K-C93180YC-EX= | |
N9K-C93108TC-EX= | |
N9K-C9232C= | |
N9K-SUP-B+= | |
N9K-SUP-B= | |
N9K-SUP-A+= | |
N9K-SUP-A= | |
N3K-C3232C= | |
N3K-C3264Q= | |
N3K-C31128PQ-10GE= | |
N3K-C31108PC-V= | |
N3K-C31108TC-V= | |
N3K-C3164Q-40GE= | |
Defect ID | Headline |
---|---|
CSCvx21260 | Nexus 9000/3000 NXOS : M500IT Bootflash in readonly mode |
Due to a flaw in the Solid State Drive (SSD) firmware, the SSD will no longer respond after approximately 3.2 years of cumulative operation.
After the first unresponsive event, each power-cycle of the system allows the drive to operate for another 1008 hours (approximately six weeks) before it stops responding again.
After approximately 3.2 years (28,224 accumulated Power On Hours (POH)), a memory buffer overrun condition occurs which triggers the firmware event in the SSD.
This causes the drive to become unresponsive until the drive is power-cycled. No data loss will occur when the memory buffer overrun firmware event occurs. A power-cycle restores normal operation of the drive.
The drive continues to operate normally for approximately six weeks (1008 additional accumulated POH), at which time the drive will become unresponsive again.
Power-cycle the system in order to temporarily recover from this problem. However, this failure will reappear after 1008 hours of operation.
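The two thresholds quoted above can be sanity-checked with simple arithmetic. The Python sketch below converts them from Power On Hours into years and weeks of continuous operation. The constant and function names are illustrative only and are not part of any Cisco tool.

```python
# Thresholds quoted in this field notice, in accumulated Power On Hours (POH).
FIRST_EVENT_POH = 28_224   # first unresponsive event
REPEAT_EVENT_POH = 1_008   # additional POH available after each power-cycle

HOURS_PER_DAY = 24
HOURS_PER_YEAR = HOURS_PER_DAY * 365


def poh_to_years(poh: int) -> float:
    """Convert accumulated power-on hours to years of continuous operation."""
    return poh / HOURS_PER_YEAR


def poh_to_weeks(poh: int) -> float:
    """Convert accumulated power-on hours to weeks of continuous operation."""
    return poh / (HOURS_PER_DAY * 7)


if __name__ == "__main__":
    print(f"{FIRST_EVENT_POH} POH is about {poh_to_years(FIRST_EVENT_POH):.1f} years")
    print(f"{REPEAT_EVENT_POH} POH is {poh_to_weeks(REPEAT_EVENT_POH):g} weeks")
```

Because the counter is cumulative, a switch that is powered off for part of its life reaches the first threshold later in calendar time than 3.2 years after installation.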
The bootflash on Nexus 9000/3000 switches will no longer respond, which causes failure of operations such as configuration changes/saves, read/write operations, and so on. It might also cause an unexpected reload.
In addition, these log messages are displayed and indicate that the bootflash is in read-only mode.
%$ VDC-1 %$ %KERN-2-SYSTEM_MSG: [ 1677.470266] EXT4-fs error (device sda3) in ext4_write_begin:1358: Journal has aborted - kernel
%$ VDC-1 %$ %KERN-2-SYSTEM_MSG: [ 1677.470410] EXT4-fs error (device sda3): ext4_journal_check_start:61: Detected aborted journal - kernel
%$ VDC-1 %$ %KERN-2-SYSTEM_MSG: [ 1677.470411] EXT4-fs (sda3): Remounting filesystem read-only - kernel
Further, the logs also indicate a bootflash diagnostic test failure.
%$ VDC-1 %$ %DIAGCLIENT-2-EEM_ACTION_HM_SHUTDOWN: Test <BootFlash> has been disabled as a part of default EEM action
%$ VDC-1 %$ %DEVICE_TEST-2-COMPACT_FLASH_FAIL: Module 1 has failed test BootFlash 5 times on device BootFlash due to error Failure
The switch might continue to work, but there will be an error when you try to save the configuration or write to any file on bootflash.
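When many switches have to be checked, the log messages above can be searched for automatically. The Python sketch below scans syslog lines for the four message types shown in this section. The pattern names and the `find_symptoms` helper are hypothetical, and generalizing the partition from `sda3` to any `sda` partition is an assumption.

```python
import re
from typing import Iterable, Set

# Patterns derived from the sample messages in this notice. The samples show
# partition sda3; matching any sda partition number is an assumption.
SYMPTOM_PATTERNS = {
    "journal_aborted": re.compile(
        r"EXT4-fs error \(device sda\d+\).*"
        r"(Journal has aborted|Detected aborted journal)"
    ),
    "remounted_read_only": re.compile(
        r"EXT4-fs \(sda\d+\): Remounting filesystem read-only"
    ),
    "bootflash_test_disabled": re.compile(
        r"%DIAGCLIENT-2-EEM_ACTION_HM_SHUTDOWN: Test <BootFlash>"
    ),
    "bootflash_test_failed": re.compile(r"%DEVICE_TEST-2-COMPACT_FLASH_FAIL"),
}


def find_symptoms(log_lines: Iterable[str]) -> Set[str]:
    """Return the names of all symptom patterns seen in the given log lines."""
    found = set()
    for line in log_lines:
        for name, pattern in SYMPTOM_PATTERNS.items():
            if pattern.search(line):
                found.add(name)
    return found


if __name__ == "__main__":
    sample = [
        "%$ VDC-1 %$ %KERN-2-SYSTEM_MSG: [ 1677.470411] EXT4-fs (sda3): "
        "Remounting filesystem read-only - kernel",
        "%$ VDC-1 %$ %DEVICE_TEST-2-COMPACT_FLASH_FAIL: Module 1 has failed "
        "test BootFlash 5 times on device BootFlash due to error Failure",
    ]
    print(sorted(find_symptoms(sample)))
```

A match is only an indication. Other faults can also put a file system into read-only mode, so the drive model and firmware version still need to be confirmed on the switch.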
Workaround
Power-cycle the system in order to temporarily recover from this problem. However, this failure will reappear after 1008 hours of operation.
Solution
Upgrade the firmware of the SSD.
In order to prevent this issue and the resulting disruption to the network and operations, Cisco recommends that you upgrade the firmware of the SSD proactively, before the uptime reaches 28,224 hours. See the How to Identify Affected Products section and follow the firmware upgrade procedure accordingly.
If the system is already impacted, the SSD firmware upgrade will permanently resolve this defect.
Note: A Return Material Authorization (RMA) is not recommended as the upgrade process will resolve the issue.
There are three options to mitigate this issue. For all options, it is strongly recommended to upgrade the firmware in a Maintenance Window.
Precheck Before You Upgrade the SSD Firmware
Check the RAW_VALUE of the Temperature_Celsius attribute with the smartctl -a /dev/sda | egrep 'Temperature_Celsius|ID#' command. If it is 128, then power-cycle/reload the switch before you proceed with the SSD Firmware upgrade options.
Note: An upgrade of the SSD Firmware of the switch with a RAW_VALUE of 128 might result in unexpected behavior after a firmware upgrade (for example, an unexpected reload or read-only drive). Any RAW_VALUE other than 128 for Temperature_Celsius is valid.
Configure bash if not enabled and then run bash:
switch# feature bash
switch# run bash sudo su
bash-4.2#
For Nexus 9500, enter the rlogin command from the Active supervisor in order to log in to the Standby supervisor.
If slot 28 is the Standby supervisor, enter this command:
bash-4.2# rlogin sup28
root@switch#
If slot 27 is the Standby supervisor, enter this command:
bash-4.2# rlogin sup27
root@switch#
bash-4.4# smartctl -a /dev/sda | egrep 'Temperature_Celsius|ID#'
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 128 (0 65 0 10 255)
The nxos.CSCvx21260-n9k_ALL-1.0.1-<NX-OS_Release>.lib32_n9000.rpm bundle (note the 1.0.1 version) automatically performs this Temperature_Celsius attribute precheck. If the Temperature_Celsius attribute is read as 128, the bundle stops without upgrading and recommends that you reload the switch.
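For the script-based option, or for an audit across many switches, the same precheck can be applied to captured smartctl output. The Python sketch below extracts the first number of the RAW_VALUE column for Temperature_Celsius and flags the value 128. The function names are hypothetical, and the parsing assumes the whitespace-separated column layout shown in the sample output above.

```python
BLOCKING_RAW_VALUE = 128  # value that requires a reload before the upgrade


def temperature_raw_value(smartctl_output: str) -> int:
    """Return the first number in the RAW_VALUE column for Temperature_Celsius.

    Assumes the column layout of the sample output in this notice:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Temperature_Celsius":
            return int(fields[9])
    raise ValueError("Temperature_Celsius attribute not found in output")


def reload_needed_before_upgrade(smartctl_output: str) -> bool:
    """True if the switch should be power-cycled/reloaded before the upgrade."""
    return temperature_raw_value(smartctl_output) == BLOCKING_RAW_VALUE


if __name__ == "__main__":
    sample = (
        "ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE\n"
        "194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 128 (0 65 0 10 255)\n"
    )
    print(reload_needed_before_upgrade(sample))
```

The sketch raises an error when the attribute is missing rather than reporting the drive as safe, so an incomplete capture is not mistaken for a passed precheck.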
Option 1 - Upgrade the NX-OS Version
The issue has been fixed in these NX-OS versions:
When the switch is upgraded (disruptive/non-disruptive) or reloaded using the fixed NX-OS version, the SSD firmware version will be automatically upgraded. See the Cisco Nexus 9000 Series NX-OS Software Upgrade and Downgrade Guide, Release 9.3(x) for more information.
Option 2 - Upgrade the SSD Firmware Using the SMU (No Reload Required)
This option is for Versions 7.0(3)I7(x), 9.2(x), 9.3(x), and 10.1(1).
In order to download and install the Software Maintenance Upgrades (SMUs) from the Software Download Center on Cisco.com, complete these steps:
Note: The SMU filename follows this format: nxos.CSCvx21260-n9k_ALL-1.0.0-<NX-OS_Release>.lib32_n9000.rpm or nxos.CSCvx21260-n9k_ALL-1.0.1-<NX-OS_Release>.lib32_n9000.rpm (the 1.0.1 version adds the automatic Temperature_Celsius check described in the Precheck Before You Upgrade the SSD Firmware section, step 2). If a previous upgrade was performed with nxos.CSCvx21260-n9k_ALL-1.0.0-<NX-OS_Release>.lib32_n9000.rpm, there is no need to reinstall with the nxos.CSCvx21260-n9k_ALL-1.0.1-<NX-OS_Release>.lib32_n9000.rpm file.
Examples for N9K-C93180YC-EX:
install add bootflash:<SMU_filename> activate
This example shows the command to install the SMU for Cisco NX-OS Software Release 7.0(3)I7(9):
switch# install add bootflash:nxos.CSCvx21260-n9k_ALL-1.0.1-7.0.3.I7.9.lib32_n9000.rpm activate
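The relationship between the NX-OS release and the SMU filename in the example above can be sketched in Python. The conversion rule used here (parentheses replaced with dots) is inferred from the single 7.0(3)I7(9) example in this notice and is an assumption. Confirm the actual filename on the Software Download Center before use. The function names are hypothetical.

```python
import re


def release_token(release: str) -> str:
    """Turn an NX-OS release such as '7.0(3)I7(9)' into '7.0.3.I7.9'.

    Assumption: the rule is inferred from one example in the field notice.
    """
    token = re.sub(r"[()]", ".", release)   # '7.0.3.I7.9.'
    token = re.sub(r"\.+", ".", token)      # collapse any doubled dots
    return token.strip(".")


def smu_filename(release: str, smu_version: str = "1.0.1") -> str:
    """Build the SMU filename in the format quoted in this notice."""
    return (
        f"nxos.CSCvx21260-n9k_ALL-{smu_version}-"
        f"{release_token(release)}.lib32_n9000.rpm"
    )


def install_command(release: str, smu_version: str = "1.0.1") -> str:
    """Build the 'install add ... activate' command for the given release."""
    return f"install add bootflash:{smu_filename(release, smu_version)} activate"


if __name__ == "__main__":
    print(install_command("7.0(3)I7(9)"))
```

For release 7.0(3)I7(9), the sketch reproduces the command shown in the example above.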
Notes:
Option 3 - Upgrade the SSD Firmware Using a Script (No Reload Required)
In order to download and install the firmware upgrade script from the Software Download center on Cisco.com, complete these steps:
For 9500 Series Switches with dual supervisors, copy upgrade_m500_firmware.tar.gz to the bootflash of both the Active and the Standby supervisor. Perform the upgrade on the Standby supervisor first, and then on the Active supervisor.
switch# dir bootflash: | grep upgrade
2151467 Mar 08 19:17:00 2021 upgrade_m500_firmware.tar.gz
For Nexus 9500, verify upgrade_m500_firmware.tar.gz is also in Standby supervisor bootflash.
switch# dir bootflash://sup-standby/ | grep upgrade
2151467 Mar 08 19:18:00 2021 upgrade_m500_firmware.tar.gz
switch# feature bash
switch# run bash sudo su
bash-4.2#
For Nexus 9500, enter the rlogin command from the Active supervisor in order to log in to the Standby supervisor.
If slot 28 is the Standby supervisor, enter this command:
bash-4.2# rlogin sup28
root@switch#
If slot 27 is the Standby supervisor, enter this command:
bash-4.2# rlogin sup27
root@switch#
bash-4.2# cp /bootflash/upgrade_m500_firmware.tar.gz /tmp
bash-4.2# cd /tmp
bash-4.2# tar -xvzf upgrade_m500_firmware.tar.gz
upgrade_m500_firmware
M500_MC03.bin
M500_MU05.bin
bash-4.2# ./upgrade_m500_firmware
Checking SSD firmware ...
Model Number: Micron_M500IT_MTFDDAT064SBD
Serial Number: MSA2226001B
Firmware Revision: MU01.00
SSD Model: Micron_M500IT_MT
Current SSD Firmware Version: 1
Your SSD firmware needs update and will be upgraded
Updating the SSD firmware ...
/dev/sda:
fwdownload: xfer_mode=3 min=1 max=255 size=512
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
.......................................................... Done.
Model Number: Micron_M500IT_MTFDDAT064SBD
Serial Number: MSA2226001B
Firmware Revision: MU05.00
Current SSD Firmware is 5
SSD Firmware has been updated successfully
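If the script output is captured to a file, the result can be verified without reading it by eye. The Python sketch below collects the "Firmware Revision" values from the output and checks that the last one matches the expected revision and that the success message is present. The helper names are hypothetical, and the default expected revision MU05.00 is taken from the sample output above, so it is an assumption for any other drive.

```python
import re
from typing import List

FIRMWARE_LINE = re.compile(r"Firmware Revision:\s*(\S+)")
SUCCESS_MESSAGE = "SSD Firmware has been updated successfully"


def firmware_revisions(script_output: str) -> List[str]:
    """Return every 'Firmware Revision' value, in order of appearance."""
    return FIRMWARE_LINE.findall(script_output)


def upgrade_succeeded(script_output: str, expected: str = "MU05.00") -> bool:
    """True if the last reported revision is the expected one and the
    script printed its success message.

    Assumption: 'MU05.00' comes from the sample output in this notice.
    """
    revisions = firmware_revisions(script_output)
    return (
        bool(revisions)
        and revisions[-1] == expected
        and SUCCESS_MESSAGE in script_output
    )


if __name__ == "__main__":
    sample = (
        "Firmware Revision: MU01.00\n"
        "Updating the SSD firmware ...\n"
        "Firmware Revision: MU05.00\n"
        "SSD Firmware has been updated successfully\n"
    )
    print(firmware_revisions(sample), upgrade_succeeded(sample))
```

On a dual-supervisor system, the check would be applied to the output from each supervisor separately.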
Notes:
Additional Information
If you attempt to run the upgrade script on a system already upgraded to latest firmware, the script logs will indicate that the firmware is up-to-date and no action will be taken.
Check the model and firmware version of the bootflash.
switch# conf t
switch(config)# feature bash
switch(config)# run bash sudo su
bash-4.2# smartctl -a /dev/sda | egrep 'Model|Firmware|Hours'
Device Model: Micron_M500IT_MTFDDAT064SBD
Firmware Version: MU01.00
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 4872
bash-4.2#
From the output of the previous commands, both of these conditions are true on an affected switch:
Power_On_Hours from the output can be used to calculate how much time is left before this issue occurs.
If either condition is not true, then the switch is not affected and no action is required.
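The remaining time can be estimated from the smartctl output with a few lines of Python. The sketch below extracts the drive model, firmware version, and Power_On_Hours, and then subtracts the hours from the 28,224-hour threshold stated in this notice. The function names are hypothetical, the parsing assumes the output layout shown above, and the sketch deliberately does not decide whether a switch is affected.

```python
import re
from typing import Dict, Union

FIRST_EVENT_POH = 28_224  # threshold stated in this field notice


def parse_smartctl(output: str) -> Dict[str, Union[str, int]]:
    """Extract model, firmware version and Power_On_Hours from smartctl output.

    Assumes the line layout of the sample output in this notice.
    """
    info: Dict[str, Union[str, int]] = {}
    for line in output.splitlines():
        fields = line.split()
        if line.startswith("Device Model:"):
            info["model"] = line.split(":", 1)[1].strip()
        elif line.startswith("Firmware Version:"):
            info["firmware"] = line.split(":", 1)[1].strip()
        elif len(fields) >= 10 and fields[1] == "Power_On_Hours":
            info["power_on_hours"] = int(fields[9])
    return info


def hours_remaining(power_on_hours: int) -> int:
    """Hours left before the first event; 0 if the threshold has passed."""
    return max(FIRST_EVENT_POH - power_on_hours, 0)


if __name__ == "__main__":
    sample = (
        "Device Model: Micron_M500IT_MTFDDAT064SBD\n"
        "Firmware Version: MU01.00\n"
        "9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 4872\n"
    )
    info = parse_smartctl(sample)
    left = hours_remaining(info["power_on_hours"])
    print(info)
    print(f"{left} hours (about {left / 24:.0f} days) before the first event")
```

With the sample value of 4872 hours, 23,352 hours (973 days) of powered-on operation remain before the threshold is reached.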
This field notice provides the ability to determine if the serial number(s) of a device is impacted by this issue. In order to verify your serial number(s), enter it in the Serial Number Validation tool at https://snvui.cisco.com/snv/FN72150.
If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC) by one of the following methods:
My Notifications—Set up a profile to receive email updates about reliability, safety, network security, and end-of-sale issues for the Cisco products you specify.