THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.
Revision | Publish Date | Comments |
---|---|---|
1.0 | 24-Oct-12 | Initial Release |
10.0 | 12-Oct-17 | Migration to new field notice system |
10.1 | 04-Mar-19 | Updated the Defect Information Section |
Affected Product ID | Comments |
---|---|
8-10GBE | |
8-10GBE= | |
Defect ID | Headline |
---|---|
CSCvf34445 | There were no defects filed with this field notice at the time of publication. |
CRS1 8-10GBE line cards (LCs) built between 9 January 2006 and 5 October 2007 may encounter an issue whereby a fuse fails and the board ceases to operate. Under certain conditions, failure can occur on installed boards that are in steady-state operation. Failures may also occur while powering up or during online insertion and removal (OIR).
Note: There are no safety concerns regarding the failure mode of this fuse.
In 2006, a new fuse was introduced on the 8-10GBE Line Card to meet industry regulations. Used in a specific placement on the 8-10GBE, this fuse has encountered long-term reliability issues, and failures can occur after OIR operations.
The rate of degradation can change due to variability of the fuse's metallic layers as well as the ambient temperature of the CRS system. Ambient temperatures of 35 deg C and above will increase the likelihood of the fuse failing.
The CRS1 8-10GBE board will fail to power up when the fuse has failed.
Sample error log for when LC fuse fails during operation:
SP/0/0/SP:Jan 28 02:49:25.369 : i2c_server[60]:%PLATFORM-I2C-6-LC_POWER_FAIL : LC power-up failed because - 5V_A or 5V_B or 5V_C is bad - as indicated by power status registers
SP/0/0/SP:Jan 28 02:49:25.380 : i2c_server[60]: %PLATFORM-I2C-6-LC_POWER_FAIL : LC power-up failed because - 1.5V or 1.8V or 3.3V is bad - as indicated by power status registers
SP/0/0/SP:Jan 28 02:49:25.381 : i2c_server[60]:%PLATFORM-I2C-6-LC_POWER_FAIL : LC power-up failed because - egress-pse power is bad - as indicated by power status registers
SP/0/0/SP:Jan 28 02:49:25.382 : i2c_server[60]:%PLATFORM-I2C-6-LC_POWER_FAIL : LC power-up failed because - CPU power is bad - as indicated by power status registers
SP/0/0/SP:Jan 28 02:49:25.383 : i2c_server[60]:%PLATFORM-I2C-6-LC_POWER_FAIL : LC power-up failed because - PLIM power is bad - as indicated by power status registers
SP/0/0/SP:Jan 28 02:49:25.384 : i2c_server[60]:%PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 1.6V1 on CPU is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.384 : i2c_server[60]:%PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 1.8V on CPU is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.385 : i2c_server[60]:%PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 2.5V on CPU is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.386 : i2c_server[60]:%PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 3.3V on CPU is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.388 : i2c_server[60]:%PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 1.2V on METRO1 is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.388 : i2c_server[60]:%PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 2.5V on METRO1 is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.389 : i2c_server[60]:%PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 5V_C on LC is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.390 : i2c_server[60]:%PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 3.3V on LC is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.391 : i2c_server[60]:%PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 1.8V on LC is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.391 : i2c_server[60]:%PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - 1.5V on LC is bad - as indicated by power good registers
SP/0/0/SP:Jan 28 02:49:25.392 : i2c_server[60]:%PLATFORM-I2C-7-LC_BAD_VRM_INFO : LC power-up failed because - power-supply on PLIM is bad - as indicated by power good registers
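The power-failure signature in the sample above can be searched for in collected syslog output. The following is a minimal sketch, assuming the log lines keep the layout shown in the sample; the function name `find_power_failures` and the grouping by node are illustrative choices, not part of any Cisco tool. A match only shows that the log resembles the sample and does not by itself confirm a failed fuse.

```python
import re
from collections import defaultdict

# Message mnemonics are copied from the sample log above. The regular
# expression assumes the "node:timestamp : process: %MSG : text" layout
# seen there; other software releases may format lines differently.
POWER_MSG = re.compile(
    r"^(?P<node>\S+?):.*?%PLATFORM-I2C-\d-"
    r"(?P<mnemonic>LC_POWER_FAIL|LC_BAD_VRM_INFO)\s*:"
    r"\s*LC power-up failed because - (?P<rail>.+?) is bad"
)


def find_power_failures(lines):
    """Return {node: [(mnemonic, rail), ...]} for lines matching the signature."""
    hits = defaultdict(list)
    for line in lines:
        match = POWER_MSG.search(line.strip())
        if match:
            hits[match.group("node")].append(
                (match.group("mnemonic"), match.group("rail"))
            )
    return dict(hits)
```

Calling `find_power_failures(open("router.log"))` (the file name is hypothetical) would list, per node, every supply rail reported as bad.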
Sample error log when the LC fuse fails during a subsequent boot attempt:
LC/0/0/CPU0:Apr 12 07:23:21.449 : cpuctrl[220]: %PLATFORM-CPUCTRL-3-HW_DETECTED_ERROR_LINK : HW error interrupt link, port = 9 interrupt_id = 0x0, port_link_error = 0x00000001, port_link_crc_count = 0x00000003
LC/0/0/CPU0:Apr 12 07:23:21.458 : pse_driver[173]: %L2-PSE-7-ERR_EXIT : Exit on error: M0: Head FIFO overflow. Threshold value=0xffffffff: Caused by Input/output error : pkg/bin/pse_driver : (PID=36914) : -Traceback= 482251f0 4820f7e4 48213980 48213cb0 48214204 fc5d3dd4 fc5cd020 fc1b7f88
LC/0/0/CPU0:Apr 12 07:23:21.454 : egressq[125]: %L2-EGRESSQ-3-HW_ERROR : Sharq ENQ packet length error occurred.
RP/0/RP0/CPU0:Apr 12 07:23:35.068 : shelfmgr[333]: %PLATFORM-SHELFMGR-3-NODE_RESET_BRINGDOWN : Reset node 0/0/CPU0 due to heartbeat loss
LC/0/5/CPU0:Apr 12 07:23:38.427 : ingressq[156]: %DRIVERS-INGRESSQ_DLL-4-LNS_LOP_DROP : low availability of planes, aggr cell drop count: 110
LC/0/4/CPU0:Apr 12 07:23:38.430 : ingressq[156]: %DRIVERS-INGRESSQ_DLL-4-LNS_LOP_DROP : low availability of planes, aggr cell drop count: 130
LC/0/1/CPU0:Apr 12 07:23:38.460 : ingressq[156]: %DRIVERS-INGRESSQ_DLL-4-LNS_LOP_DROP : low availability of planes, aggr cell drop count: 356
RP/0/RP0/CPU0:Apr 12 07:23:40.219 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/SP, state: BRINGDOWN
RP/0/RP0/CPU0:Apr 12 07:23:40.609 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/CPU0, state: BRINGDOWN
RP/0/RP0/CPU0:Apr 12 07:23:40.920 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/CPU0, state: PRESENT
RP/0/RP0/CPU0:Apr 12 07:23:51.232 : shelfmgr[333]: %PLATFORM-MBIMGR-7-IMAGE_VALIDATED : 0/0/SP: MBI bootflash:mbis/hfr-os-mbi-3.3.1.CSCek61756-1.0.0/cfc2413f7ad0e7e65a1c7f12c0f7aec4/mbihfr-sp.vm validated
RP/0/RP0/CPU0:Apr 12 07:23:52.085 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/SP, state: MBI-BOOTING
RP/0/RP0/CPU0:Apr 12 07:24:08.049 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/SP, state: MBI-RUNNING
RP/0/RP0/CPU0:Apr 12 07:24:25.426 : invmgr[205]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/0/SP, state: IOS XR RUN
SP/0/0/SP:Apr 12 07:24:06.014 : init[65541]: %OS-INIT-7-MBI_STARTED : total time 8.478 seconds
SP/0/0/SP:Apr 12 07:24:16.489 : sysmgr[73]: %OS-SYSMGR-5-NOTICE : Card is COLD started
SP/0/0/SP:Apr 12 07:24:18.712 : init[65541]: %OS-INIT-7-INSTALL_READY : total time 21.192 seconds
SP/0/0/SP:Apr 12 07:24:34.784 : envmon[104]: %PLATFORM-CCTL-3-ERROR_EXIT : Envmon process exiting because read the board type from hardware, error code 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511)
SP/0/0/SP:Apr 12 07:24:36.588 : sysmgr[73]: envmon(1) (jid 104) abnormally terminated, restart scheduled
SP/0/0/SP:Apr 12 07:24:39.405 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon(104) (fail count 2) will be respawned in 5 seconds
SP/0/0/SP:Apr 12 07:24:39.399 : sysmgr[73]: envmon(1) (jid 104) abnormally terminated, restart scheduled
SP/0/0/SP:Apr 12 07:24:45.363 : envmon[104]: %PLATFORM-CCTL-3-ERROR_EXIT : Envmon process exiting because read the board type from hardware, error code 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511)
SP/0/0/SP:Apr 12 07:24:45.406 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon(104) (fail count 3) will be respawned in 5 seconds
SP/0/0/SP:Apr 12 07:24:45.402 : sysmgr[73]: envmon(1) (jid 104) abnormally terminated, restart scheduled
SP/0/0/SP:Apr 12 07:24:51.778 : envmon[104]: %PLATFORM-CCTL-3-ERROR_EXIT : Envmon process exiting because read the board type from hardware, error code 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511)
SP/0/0/SP:Apr 12 07:24:51.853 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon(104) (fail count 4) will be respawned in 5 seconds
SP/0/0/SP:Apr 12 07:24:51.849 : sysmgr[73]: envmon(1) (jid 104) abnormally terminated, restart scheduled
SP/0/0/SP:Apr 12 07:24:57.662 : envmon[104]: %PLATFORM-CCTL-3-ERROR_EXIT : Envmon process exiting because read the board type from hardware, error code 'Subsystem(8191)' detected the 'unknown' condition 'Code(63)': Unknown Error(511)
SP/0/0/SP:Apr 12 07:24:57.714 : sysmgr[73]: %OS-SYSMGR-2-REBOOT : reboot required, process (envmon) reason (maximum restart attempts exceeded)
SP/0/0/SP:Apr 12 07:24:58.111 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon(1) (jid 104) can not be restarted, entering slow-restart mode
SP/0/0/SP:Apr 12 07:24:58.118 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon(104) (fail count 5) will be respawned in 30 seconds
SP/0/0/SP:Apr 12 07:24:57.709 : sysmgr[73]: envmon(1) (jid 104) abnormally terminated, restart scheduled
SP/0/0/SP:Apr 12 07:24:58.129 : sysmgr[73]: %OS-SYSMGR-3-ERROR : envmon[104] (pid 69678) has not sent proc-ready within 45 seconds
SP/0/0/SP:Apr 12 07:24:58.614 : /pkg/bin/sysmgr_log[65585]: %OS-SYSMGR-4-CHECK_LOG : /pkg/bin/shutdown_debug_script invoked by sysmgr. Reason: (envmon) maximum restart attempts exceeded, Compressed output will be saved.
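In the boot-attempt sample above, the envmon process exits repeatedly until sysmgr reports that the maximum restart attempts were exceeded. The following is a rough sketch for spotting that pattern in a saved log, assuming the message text matches the sample; the function name `envmon_restart_loop` is hypothetical, and the same messages could have causes other than a failed fuse.

```python
import re

# Both patterns are derived from the sample log above and are assumptions
# about message wording, not a documented interface.
ENVMON_EXIT = re.compile(r"envmon\[\d+\]:\s*%PLATFORM-CCTL-3-ERROR_EXIT")
REBOOT_REQUIRED = re.compile(
    r"%OS-SYSMGR-2-REBOOT\s*:.*process \(envmon\)"
    r".*maximum restart attempts exceeded"
)


def envmon_restart_loop(log_text):
    """Return (number of envmon error exits, whether sysmgr asked for a reboot)."""
    exit_count = len(ENVMON_EXIT.findall(log_text))
    reboot_requested = REBOOT_REQUIRED.search(log_text) is not None
    return exit_count, reboot_requested
```

Several error exits followed by a reboot request on the SP of the same slot would match the sequence shown in the sample.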
Cisco recommends replacing the suspect LC hardware (8-10GBE). The upgrade program is now closed, and replacement is supported through the Cisco RMA process.
As of approximately 1st November 2007 new products that were manufactured under Engineering Change Order (ECO) E097834 should be free of this problem. Refer to "How to Identify Affected Products" below for instructions on how to view the version and serial number.
Note: Products with ECO E097834 applied are not affected, even when the LC falls within the serial number range that the tool lists as affected. Boards with TAN 800-24545-08 have ECO E097834 applied.
LC TAN or Part number | Steps | Action |
---|---|---|
800-24545-07 and lower | Check Serial number | If Affected, request replacement by following Cisco RMA process |
800-24545-08 and higher | No check required | LC is good; no replacement required |
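The table above reduces to a comparison on the last two digits of the TAN. The following is a minimal sketch, assuming the TAN always has the form 800-24545-NN; the function name `triage_by_tan` and the returned strings are illustrative and not part of any Cisco tool.

```python
import re

# Assumption: the 8-10GBE TAN is "800-24545-" followed by a two-digit revision.
TAN_PATTERN = re.compile(r"^800-24545-(\d{2})$")


def triage_by_tan(tan):
    """Map an 8-10GBE TAN to the step listed in the table above."""
    match = TAN_PATTERN.match(tan.strip())
    if match is None:
        raise ValueError(f"not an 8-10GBE TAN: {tan!r}")
    revision = int(match.group(1))
    if revision >= 8:
        # 800-24545-08 and higher: ECO E097834 is already applied.
        return "no check required"
    # 800-24545-07 and lower: the serial number still has to be validated.
    return "check serial number"
```

A result of "check serial number" does not mean the card is affected; it only means the serial number check in the table still applies.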
The hardware level and serial number of the 8-10GBE Line Card can be verified by running a CLI command or by physically inspecting the LC. Both methods are described below.
A) Using CLI Command:
1) Check the CRS1 8-10GBE TAN by using the show diag command below. If the TAN is 800-24545-08 or higher, the 8-10GBE is already upgraded and does NOT need replacing, and no further checks are necessary.
2) If the TAN is 800-24545-07 or lower, check the serial number with the SN Validation tool. If the tool returns 'Affected', the 8-10GBE is suspect. Request replacement of the board(s) through the Cisco RMA process.
Sample 'show diag' output (in admin mode) for identifying an 8-10GBE that needs to be replaced:
RP/0/RP0/CPU0:ios#sh diag
... Output truncated.....
  ECI: 173644
PLIM 0/PL0/* : Cisco CRS-1 Series 8x10GbE Interface Module
  MAIN: board type 600095
        800-24545-05 rev A0 <--- TAN
        dev N/A
        S/N SAD1nnnnnn <---- Serial Number
  PCA: 73-9231-09 rev A0
  PID: 8-10GBE
  VID: V05
  CLEI: IPUIA1CRAA
  ECI: 147655
  Interface port config: 8 Ports
  Optical reach type: Unknown
  Connector type: SC
NODE 0/0/CPU0
  Node State : IOS XR RUN
  PLD: Motherboard: 0x0015, Processor: 0x0015, Power: N/A
  MONLIB: QNXFFS Monlib Version 3.1
  ROMMON: Version 1.54(20091016:214209) [CRS-1 ROMMON]
CARD 0/1/* : Cisco CRS-1 Series Modular Services Card revision B
  MAIN: board type 500063
        800-27067-08 rev A0
        dev N/A
        S/N SAD1403008B
  PCA: 73-10334-08 rev A0
Sample show diag output (in admin mode) for identifying a good 8-10GBE LC that does not need to be replaced:
RP/0/RP0/CPU0:ios#sh diag
... Output truncated.....
  ECI: 173644
PLIM 0/PL0/* : Cisco CRS-1 Series 8x10GbE Interface Module
  MAIN: board type 600095
        800-24545-08 rev A1 <--- TAN
        dev N/A
        S/N SAD1nnnnnn <---- Serial Number
  PCA: 73-9231-09 rev A0
  PID: 8-10GBE
  VID: V05
  CLEI: IPUIA1CRAA
  ECI: 147655
  Interface port config: 8 Ports
  Optical reach type: Unknown
  Connector type: SC
NODE 0/0/CPU0
  Node State : IOS XR RUN
  PLD: Motherboard: 0x0015, Processor: 0x0015, Power: N/A
  MONLIB: QNXFFS Monlib Version 3.1
  ROMMON: Version 1.54(20091016:214209) [CRS-1 ROMMON]
CARD 0/1/* : Cisco CRS-1 Series Modular Services Card revision B
  MAIN: board type 5000
        800-27067-08 rev A0
        dev N/A
        S/N SAD1403008B
  PCA: 73-10334-08 rev A0
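When many chassis have to be checked, the TAN and serial number can be pulled out of saved 'show diag' output with a script. The following is a minimal sketch, assuming the output resembles the samples above; the regular expression and the function name `find_8_10gbe_cards` are illustrative, and real output may differ between software releases. The pattern ignores line breaks, so it accepts the output whether it is wrapped or on a single line.

```python
import re

# Assumed layout, taken from the samples above:
#   PLIM|CARD <slot> : <description> MAIN: board type <id> <TAN> rev <rev> ... S/N <serial>
ENTRY = re.compile(
    r"\b(?:PLIM|CARD)\s+(?P<slot>\S+)\s*:\s*(?P<desc>.+?)\s+"
    r"MAIN:\s*board type\s+\S+\s+"
    r"(?P<tan>\d{3}-\d{5}-\d{2})\s+rev\s+\S+.*?S/N\s+(?P<serial>\S+)",
    re.DOTALL,
)


def find_8_10gbe_cards(show_diag_text):
    """Return (slot, TAN, serial) for each entry whose TAN starts with 800-24545-."""
    return [
        (m.group("slot"), m.group("tan"), m.group("serial"))
        for m in ENTRY.finditer(show_diag_text)
        if m.group("tan").startswith("800-24545-")
    ]
```

Each returned TAN can then be compared against 800-24545-08, and serial numbers of older boards checked with the SN Validation tool.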
B) Physically Checking the Line Card
Refer to the picture below for the location of the TAN and serial number. The picture shows a suspect 8-10GBE LC.
If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC) by one of the following methods:
Cisco Notification Service—Set up a profile to receive email updates about reliability, safety, network security, and end-of-sale issues for the Cisco products you specify.