Overview

Cisco NX-OS is a resilient operating system that is specifically designed for high availability at the network, system, and process level.

This chapter describes high-availability (HA) concepts and features for Cisco NX-OS devices.

Supported Platforms

Starting with Cisco NX-OS Release 7.0(3)I7(1), use the Nexus Switch Platform Support Matrix to find the Cisco NX-OS release in which a given Cisco Nexus 9000 or 3000 switch first supports a selected feature.

About High Availability

To prevent or minimize traffic disruption during hardware or software failures, Cisco NX-OS has these features:

  • Redundancy—Cisco NX-OS HA provides physical and software redundancy at every component level, spanning across the physical, environmental, power, and system software aspects of its architecture.

  • Isolation of planes and processes—Cisco NX-OS HA provides isolation between control and data forwarding planes within the device and between software components so that a failure within one plane or process does not disrupt others.

  • Restartability—Most system functions and services are isolated so that they can be restarted independently after a failure while other services continue to run. In addition, most system services can perform stateful restarts, which allow the service to resume operations transparently to other services.

  • Supervisor stateful switchover—The Cisco Nexus 9504, 9508, and 9516 chassis support an active and standby dual supervisor configuration. State and configuration remain constantly synchronized between the two supervisor modules to provide seamless and stateful switchover in the event of a supervisor module failure.

  • Nondisruptive upgrades—Cisco NX-OS supports the in-service software upgrade (ISSU) feature, which allows you to upgrade the device software while the switch continues to forward traffic. ISSU reduces or eliminates the downtime typically caused by software upgrades.

Service-Level High Availability

Cisco NX-OS has a modularized architecture that compartmentalizes components for fault isolation, redundancy, and resource efficiency.

Isolation of Processes

In the Cisco NX-OS software, independent processes, known as services, perform a function or set of functions for a subsystem or feature set. Each service and service instance runs as an independent, protected process. This approach provides a highly fault-tolerant software infrastructure and fault isolation between services. A failure in a service instance (such as BGP) does not affect any other services that are running at that time (such as the Link Aggregation Control Protocol [LACP]). In addition, each instance of a service can run as an independent process, which means that two instances of a routing protocol (for example, two instances of the Open Shortest Path First [OSPF] protocol) can run as separate processes.
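As a rough illustration of this isolation idea (not of NX-OS internals), the following Python sketch starts two independent operating-system processes, lets one of them fail, and shows that the other still runs to completion. The `run_service` and `demo` helpers and the way the "services" are simulated are assumptions made for the example.

```python
import subprocess
import sys


def run_service(name: str, crash: bool) -> subprocess.Popen:
    """Start a hypothetical 'service' as its own OS process."""
    body = (
        "import sys\n"
        f"print('{name} starting')\n"
        + ("sys.exit(1)\n" if crash else f"print('{name} done')\n")
    )
    return subprocess.Popen(
        [sys.executable, "-c", body],
        stdout=subprocess.PIPE,
        text=True,
    )


def demo() -> dict:
    # Two services in separate processes; only "bgp" is told to fail.
    procs = {
        "bgp": run_service("bgp", crash=True),
        "lacp": run_service("lacp", crash=False),
    }
    results = {}
    for name, proc in procs.items():
        output, _ = proc.communicate()
        results[name] = (proc.returncode, output.splitlines())
    return results


if __name__ == "__main__":
    for name, (code, lines) in demo().items():
        print(name, "exit code", code, lines)
```

The failing process exits with a nonzero code, while the second process is unaffected because the two share no memory or control flow.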

Process Restartability

Cisco NX-OS processes run in a protected memory space independently from each other and the kernel. This process isolation provides fault containment and enables rapid restarts. Process restartability ensures that process-level failures do not cause system-level failures. In addition, most services can perform stateful restarts, which allow a service that experiences a failure to be restarted and to resume operations transparently to other services within the platform and to neighboring devices within the network.
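The stateful restart described above can be sketched conceptually in Python. This is a simplified model under stated assumptions, not the NX-OS implementation: the `Checkpoint`, `Service`, and `ServiceManager` classes are hypothetical, and the "state" is just a dictionary of routes saved after every change.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical persistent store that survives a service failure."""
    data: dict = field(default_factory=dict)


class Service:
    def __init__(self, name: str, checkpoint: Checkpoint) -> None:
        self.name = name
        self.checkpoint = checkpoint
        # A stateful restart resumes from the last saved state.
        self.routes = dict(checkpoint.data.get(name, {}))

    def learn_route(self, prefix: str, next_hop: str) -> None:
        self.routes[prefix] = next_hop
        # Save state after every change so that a restart can recover it.
        self.checkpoint.data[self.name] = dict(self.routes)


class ServiceManager:
    """Restarts a failed service without touching the others."""

    def __init__(self) -> None:
        self.checkpoint = Checkpoint()
        self.services = {}
        self.restarts = {}

    def start(self, name: str) -> Service:
        self.services[name] = Service(name, self.checkpoint)
        self.restarts.setdefault(name, 0)
        return self.services[name]

    def handle_failure(self, name: str) -> Service:
        # Replace only the failed instance; other services keep running.
        self.restarts[name] += 1
        return self.start(name)


if __name__ == "__main__":
    mgr = ServiceManager()
    mgr.start("ospf")
    bgp = mgr.start("bgp")
    bgp.learn_route("10.0.0.0/8", "192.0.2.1")
    restarted = mgr.handle_failure("bgp")
    print(restarted.routes)
```

The restarted instance is a new object, but it recovers its routes from the checkpoint, and the other service instance is never replaced.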

System-Level High Availability

The Cisco Nexus 9000 Series switches are protected from system failure by redundant hardware components and a high-availability software framework.

Physical Redundancy

The Cisco Nexus 9000 Series switches have the following physical redundancies:

  • Power Supply Redundancy—To provide redundant power input to the chassis, the Cisco Nexus 9000 Series switches support the following number of power supply modules:

    Cisco Nexus 9000 Series Switches                                        Maximum Number of Supported Power Supply Modules
    9200, 9300, 9300-EX, 9300-FX, 9300-FX2, and 9300-FXP platform switches  2
    9504 switch                                                             4
    9508 switch                                                             8
    9516 switch                                                             10

  • Fan Tray Redundancy—For cooling the system, the Cisco Nexus 9000 Series switches support the following number of fan trays:

    Cisco Nexus 9000 Series Switches          Maximum Number of Supported Fan Trays
    9272Q, 92304QC, and 93120TX switches      2
    9336C-FX2 switch                          3
    9348GC-FXP switch                         3
    9364C switch                              3
    9396PX/TX and 93128TX switches            3
    9504, 9508, and 9516 switches             3
    9236C, 92160YC-X, and 92300YC switches    4
    9332PQ and 9372PX/PX-E/TX/TX-E switches   4
    9300-EX platform switches                 4
    93108TC-FX and 93180YC-FX switches        4
    C9332C switch                             5
    N9K-C93360YC-FX2                          3
    N9K-C92348GC-X                            3
    N9K-C9364C-GX                             4
    N9K-C9316D-GX                             6
    N9K-C93600CD-GX                           6
    N9K-C93180YC-FX3                          4

  • Fabric Redundancy—Cisco NX-OS provides switching fabric availability through redundant switch fabric modules. You can configure a single Cisco Nexus 9500 platform chassis with one to six switch fabric cards for capacity and redundancy.


    Note


    The Cisco Nexus 9200, 9300, 9300-EX, and 9300-FX platform chassis do not have fabric modules.
  • System Controller Redundancy—A pair of redundant system controllers in the Cisco Nexus 9500 platform chassis offloads chassis management functions from the supervisor modules. You can have two of the same type or you can mix as follows:

    Active    Standby    Ok?
    A         A          Yes
    B         B          Yes
    A         A+         Yes
    B         B+         Yes
    A         B          Not unless A is able to failover to B
    B         A          Not unless A is able to failover to B
    A+        B+         Not unless A+ is able to failover to B+
    B+        A+         Not unless A+ is able to failover to B+


    Note


    The Cisco Nexus 9200, 9300, 9300-EX, and 9300-FX platform chassis do not contain system controllers.

    Note


    Supervisor A and A+ are not supported on N9K-C950x-FM-R fabric modules.


  • Supervisor Module Redundancy—The Cisco Nexus 9500 platform chassis support dual supervisor modules to provide redundancy for the control and management plane.


    Note


    The Cisco Nexus 9200, 9300, 9300-EX, and 9300-FX platform chassis do not support supervisor module redundancy.
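The Active/Standby pairing table under System Controller Redundancy can be encoded as a simple lookup, for example in a pre-deployment check script. The sketch below is illustrative only: the `PAIRINGS` dictionary and `pairing_status` function are hypothetical names, and the "Not unless ..." rows are reported as `"conditional"` rather than interpreted.

```python
from typing import Optional

# Pairings copied from the Active/Standby table.
# "yes"         = the table lists the pair as "Yes".
# "conditional" = the table lists the pair as "Not unless ... is able to
#                 failover to ..."; this sketch does not evaluate the condition.
PAIRINGS = {
    ("A", "A"): "yes",
    ("B", "B"): "yes",
    ("A", "A+"): "yes",
    ("B", "B+"): "yes",
    ("A", "B"): "conditional",
    ("B", "A"): "conditional",
    ("A+", "B+"): "conditional",
    ("B+", "A+"): "conditional",
}


def pairing_status(active: str, standby: str) -> Optional[str]:
    """Return the table entry for a pair, or None if the pair is not listed."""
    return PAIRINGS.get((active, standby))


if __name__ == "__main__":
    for pair in [("A", "A+"), ("A", "B"), ("A+", "A")]:
        print(pair, pairing_status(*pair))
```

Pairs that the table does not list return `None`, so a caller can distinguish "not documented" from "conditionally supported".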

ISSU

Cisco NX-OS allows you to perform an in-service software upgrade (ISSU), which is also known as a nondisruptive upgrade. The modular software architecture of Cisco NX-OS supports plug-in-based services and features, which allow you to perform complete image upgrades of supervisors and switching modules with little to no impact on other modules. Because of this design, you can upgrade Cisco NX-OS nondisruptively, with no impact on the data forwarding plane, so that the switch continues nonstop forwarding during a software upgrade, even between full image versions.


Note


An ISSU is disruptive on any chassis that has N9K-C95xx-FM-Ex or N9K-C950x-FM-R fabric modules.
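A pre-upgrade script could flag chassis affected by this note. The following Python sketch is only an assumption-laden example: it treats the lowercase "x" characters in the two model strings as wildcards, and the `issu_is_disruptive` function and the product IDs passed to it are hypothetical inputs that you would obtain from your own inventory.

```python
import re

# Assumption: the "x" characters in the note are wildcards, so
# N9K-C95xx-FM-Ex is read as "any 95xx chassis size, FM-E with an optional
# suffix" and N9K-C950x-FM-R as "any 950x chassis size, FM-R".
DISRUPTIVE_FABRIC_PATTERNS = [
    re.compile(r"N9K-C95\d{2}-FM-E\w*"),
    re.compile(r"N9K-C950\d-FM-R"),
]


def issu_is_disruptive(fabric_module_pids) -> bool:
    """Return True if any fabric module product ID matches the note."""
    return any(
        pattern.fullmatch(pid) is not None
        for pid in fabric_module_pids
        for pattern in DISRUPTIVE_FABRIC_PATTERNS
    )


if __name__ == "__main__":
    print(issu_is_disruptive(["N9K-C9508-FM-E"]))
    print(issu_is_disruptive(["N9K-C9508-FM"]))
```

Confirm the exact list of affected fabric modules against the release notes for your software version before relying on a pattern match like this.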


Network-Level High Availability

Cisco NX-OS optimizes network convergence by providing tools and functions that make both failover and fallback transparent and fast.

Layer 2 HA Features

Cisco NX-OS provides these Layer 2 HA features:

  • Spanning Tree Protocol (STP) enhancements, such as Bridge Protocol Data Unit (BPDU) Guard, Loop Guard, Root Guard, BPDU Filters, and Bridge Assurance, to guarantee the health of the STP control plane

  • Unidirectional Link Detection (UDLD) Protocol

  • IEEE 802.3ad link aggregation


    Note


    Virtual port channels (vPCs) allow you to create redundant physical links between two systems that act as a logical single link.


Layer 3 HA Features

Cisco NX-OS provides these Layer 3 HA features:

  • Nonstop forwarding (NSF) graceful restart extensions for routing protocols

    Open Shortest Path First version 2 (OSPFv2), OSPFv3, Intermediate System to Intermediate System (IS-IS), Enhanced Interior Gateway Routing Protocol (EIGRP), and Border Gateway Protocol (BGP) use graceful restart extensions to the base protocols to provide nonstop forwarding and the least obtrusive routing recovery in those environments.

  • Shortest Path First (SPF) optimizations such as link-state advertisement (LSA) pacing and incremental SPF

  • Protocol-based periodic refresh

  • Millisecond timers for First-Hop Redundancy Protocols (FHRPs) such as the Hot Standby Router Protocol (HSRP) and the Virtual Router Redundancy Protocol (VRRP)


Note


For more information on these Layer 3 routing protocols, see the Cisco Nexus 9000 Series NX-OS Unicast Routing Configuration Guide.


Additional Management Tools for Availability

Cisco NX-OS incorporates several Cisco system management tools for monitoring and notification of system availability events.

EEM

Cisco Embedded Event Manager (EEM) consists of Event Detectors, the Event Manager, and an Event Manager Policy Engine. Using EEM, you can define policies to take specific actions when the system software recognizes certain events through the Event Detectors. The result is a flexible set of tools to automate many network management tasks and to direct the operation of Cisco NX-OS to increase availability, collect information, and notify external systems or personnel about critical events.
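The relationship between event detectors, policies, and actions can be pictured with a small Python model. This is a conceptual sketch only and does not use EEM syntax or APIs: the `Policy` and `PolicyEngine` classes, the event dictionaries, and the "fan-alert" policy are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Policy:
    name: str
    matches: Callable[[dict], bool]  # which events trigger the policy
    action: Callable[[dict], str]    # what to do when the policy triggers


class PolicyEngine:
    def __init__(self) -> None:
        self.policies: List[Policy] = []
        self.log: List[str] = []

    def register(self, policy: Policy) -> None:
        self.policies.append(policy)

    def publish(self, event: dict) -> None:
        """Called by an event detector when it recognizes an event."""
        for policy in self.policies:
            if policy.matches(event):
                self.log.append(policy.action(event))


if __name__ == "__main__":
    engine = PolicyEngine()
    engine.register(
        Policy(
            name="fan-alert",
            matches=lambda e: e.get("type") == "fan_failure",
            action=lambda e: f"notify: fan {e['fan']} failed",
        )
    )
    engine.publish({"type": "fan_failure", "fan": 2})
    engine.publish({"type": "link_up", "port": "Eth1/1"})
    print(engine.log)
```

Only the event that matches a registered policy produces an action; unmatched events pass through without side effects.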

For information about configuring EEM, see the Cisco Nexus 9000 Series NX-OS System Management Configuration Guide.

Smart Call Home

Combining Cisco Generic Online Diagnostics (GOLD) and Cisco EEM capabilities, Smart Call Home provides e-mail-based notification of critical system events. Smart Call Home has message formats that are compatible with pager services, standard e-mail, and XML-based automated parsing applications. You can use this feature to page a network support engineer, e-mail a network operations center, or use Cisco Smart Call Home services to automatically generate a case with Cisco’s Technical Assistance Center (TAC).

For information about configuring Smart Call Home, see the Cisco Nexus 9000 Series NX-OS System Management Configuration Guide.

Software Image

The Cisco NX-OS software consists of one NXOS software image.

Virtual Device Contexts

Cisco NX-OS can segment operating system and hardware resources into virtual device contexts (VDCs) that emulate virtual devices. The Cisco Nexus 9000 Series switches currently do not support multiple VDCs. All switch resources are managed in the default VDC.