Understand DNA Center 3-Node Cluster HA Scenarios and Network Connectivity Details

Available Languages

Download Options

PDF (587.7 KB)
View with Adobe Reader on a variety of devices
ePub (661.9 KB)
View in various apps on iPhone, iPad, Android, Sony Reader, or Windows Phone
Mobi (Kindle) (619.5 KB)
View on Kindle device or Kindle app on multiple devices

Updated:September 13, 2019

Document ID:214471

Bias-Free Language

The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product. Learn more about how Cisco is using Inclusive Language.

Failure Scenarios and Cluster Behavior

Physical Topology Option 1

Physical Topology Option 2 (Most Recommended)

Physical Topology Option 3 (For Data Center Type Environment)

Physical Topology Option 4 (Not Recommended)

Introduction

This document describes the supported network connectivity for various networks involved in Cisco DNA Center 3-node cluster deployment.

Prerequisites

Familiarize yourself with the basic information about the 3-node Cisco DNA Center cluster and High Availability (HA) with these articles:

Cisco DNA Center Install Guide - This guide describes step-by-step how to bring up a 3-node cluster.
Administrator Guide for Cisco DNA Center

Description

For Cisco DNA Center 1.2.8 version, the 3-node HA cluster is supported for Base Automation and SD-Access Automation. In 1.2.8/1.2.10, HA is still in Beta release for Assurance.

The HA of Cisco DNA Center offers more resiliency and reduces downtime when a node or service or a network link goes down. When a failure occurs, this framework helps to restore your network to its previous operational state. If this is not possible, Cisco DNA Center indicates that there is an issue that requires your attention.

Any time the HA framework of Cisco DNA Center determines that a change on a cluster node has taken place, it synchronizes this change with the other nodes. The supported synchronization types include:

Database changes, such as updates related to configuration, performance, and data supervision
File changes, such as report configurations, configuration templates, TFTP-root directory, administration settings, licensing files, and the key store

The current Cisco DNA Center software supports a minimum 3-node cluster for HA to work. Once the cluster is set up, it can manage single-node failure. A minimum of two nodes is required to set the quorum. Without a 2-node quorum, the cluster is declared down. If you use SD-Access Fabric then cluster failure only results in failure to provision Automation but still, your SD-Access fabric user network traffic continues to forward as Cisco DNA Center is not responsible for any Control or Data traffic.

In this document you look at various failure points and how the cluster mitigates the downtime in order to keep Cisco DNA Center operational at all times. You mainly focus on network connectivity aspects of a 3-node cluster. For services and all other information refer to the install and admin guide.

Network Connectivity

Cisco DNA Center uses these types of network connectivity:

1. 10 Gbps Cluster Link

2. 1 Gbps GUI/Management Link

3. 1 Gbps Cloud Link (Optional)

4. 10 Gbps Enterprise Link

5. 1 Gbps CIMC Link

It is assumed that proper intra-cluster-IP-ARP resolution happens and connectivity is ensured between all three nodes. In addition, it is recommended to have <10ms RTT between cluster links for all the scenarios.

Failure Scenarios and Cluster Behavior

In general, cluster services redistribution happens under these conditions:

1. Single node goes down: Services are distributed to the other two nodes and the cluster is still operational.

2. Enterprise network link goes down for single node: No service redistribution. Only reachability to the enterprise network from the failed node does not work.

3. Cluster Network link goes down: Services are redistributed to the other two nodes and the cluster is still operational.

4. All other network links go down except the cluster link for a single node: A node is not able to service the expected functions but all services and clusters operate normally.

5. Service failure on a single node: The service tries to restart. In most of the scenarios, it tries to restart on the same node but currently, there is no affinity to the node so it can start on any node.

6. Network switch goes down: Based upon different types of topology, the cluster operates normally or the service is redistributed or everything is down.

Physical Topology Option 1

Initially, connectivity to the network was recommended by the engineering. Both Image 1 and Image 2 provide connectivity where each type of Network link from all nodes is connected to the same physical switch. For example, the Enterprise network link from all three nodes is connected to the same physical switch.

Image 1

Recommended Connectivity Configuration to Ensure Cluster Stability

Image 2

This topology provides these types of failure scenarios where the cluster is still operational:

1. Single Node Failure

2. Enterprise Network Link Failure

3. Cluster Link Failure

4. Service Failure

This topology is not able to manage a complete switch down for any of the network links.

Failure Condition	Impact/Cluster State
Single Node Down	The cluster is still operational with the other two nodes.
Single Link Down for any of the Network Link	The cluster continues to operate normally. Services are distributed only if the cluster link goes down.
Switch goes Down	The cluster is unusable for automation.

Physical Topology Option 2 (Most Recommended)

Image 3 provides the connectivity where all of the Network Links from the same node are connected to the same physical switch. All links from a node are connected to the same physical switch with separation with the use of VLANs, or they can be connected to different switches. For example, Link from Node-1 is connected to Switch-1, Link from Node-2 is connected to Switch-2, and so forth.

Image 3

Enhanced Connectivity Configuration with Individual Node-Link Mapping

This topology provides these types of failure scenarios where the cluster is still operational:

1. Single Node Failure

2. Enterprise Network Link Failure for Single Node

3. Cluster Link Failure for Single Node

4. Service Failure for Single Node

5. Single Network Switch Failure for Single Node

Failure Condition	Impact/Cluster State
Single Node Down	The cluster is still operational with the other two nodes.
Single Link Down for any of the Network Link	The cluster continues to operate normally. Services are distributed only if the cluster link goes down.
Single switch goes down	The cluster is still operational with the other two nodes.

Physical Topology Option 3 (For Data Center Type Environment)

This topology is similar to Option 2, except, you can have three Layer-2 switches that connect to the gateway. All information is similar to Option 2.

Image 4

Enhanced Connectivity Configuration with Multiple Layer-2 Switches

Physical Topology Option 4 (Not Recommended)

Image 4 provides the connectivity where two nodes are connected to the same switch while the other node is connected to a different switch. This topology is least recommended as failure to switch which has multiple links connected can bring down the cluster.

Recommended Connectivity Configuration to Ensure Cluster Stability

This topology provides these types of failure scenarios where the cluster is still operational.

1. Single Node Failure

2. Enterprise Network Link Failure for Single Node

3. Cluster Link Failure for Single Node

4. Service Failure for Single Node

This topology is not able to manage a complete switch down for any of the network links.

Failure Condition	Impact / Cluster State
Single Node Down	The cluster is still operational with the other two nodes.
Single Link Down for any of the network links except the Cluster Link	The cluster continues to operate normally.
Single Cluster Link Down	Services are distributed to the other two nodes and continue operation.
Single switch goes down	The cluster can go down if a switch that has multiple links goes down.

Some additional failure scenarios and states are covered in Administrator Guide for Cisco DNA Center 1.2.10.

Revision History

Revision	Publish Date	Comments
1.0	24-May-2019	Initial Release

Contributed by

Parthiv Shah
EN Eng Switching Escalation

Was this Document Helpful?

Feedback

Contact Cisco

Open a Support Case
(Requires a Cisco Service Contract)

Understand DNA Center 3-Node Cluster HA Scenarios and Network Connectivity Details

Available Languages

Download Options

Bias-Free Language

Contents

Introduction

Prerequisites

Description

Network Connectivity

Failure Scenarios and Cluster Behavior

Physical Topology Option 1

Physical Topology Option 2 (Most Recommended)

Physical Topology Option 3 (For Data Center Type Environment)

Physical Topology Option 4 (Not Recommended)

Revision History

Contributed by

Was this Document Helpful?

Contact Cisco

This Document Applies to These Products