Scale-Up Your Network Monitoring Strategy Using Telemetry

Table 1. Feature History

Feature Name

Release Information

Feature Description

Telemetry

Cisco IOS XR Release 6.5.31

The traditional methods of monitoring your network such as SNMP, Syslog, and CLI use a pull model to request information at regular intervals. The data that you collect may help you to efficiently monitor your network of a manageable size. However, as your network grows in complexity and scale, the data that you poll may be insufficient for efficient and effective monitoring.

Telemetry uses a push model that automatically streams data from a server. Instead of a client requesting data at periodic intervals, the server streams operational data in real time. Telemetry focuses on the power of scale, speed, and automation.

Are you monitoring your network using traditional polling methods such as SNMP, Syslog, and CLI? If yes, does the data that you extract from your network help you answer these questions?
  • What percentage of the network bandwidth does the network traffic currently consume?

  • Do all the links in the network run at a hundred percent utilization rate?

  • If an unmanned router fails, is the network operator notified in real time about the issue and its related consequences?

  • Is the CPU over- or under-utilized?

  • Can the efficiency of the network be calculated based on traffic and data loss?

  • What are the possible performance issues that cause traffic loss or network latency?

  • How do you proactively prevent issues that may arise? Does the data support the study of network patterns in real time?

These traditional methods use a pull model to request information at regular intervals. The data that you collect may help you to efficiently monitor your network of a manageable size. However, as your network grows in complexity and scale, the data that you poll may be insufficient for efficient and effective monitoring. Additionally, the polling methods are resource-intensive, and network operators face information gaps in the data that they collect. With the pull model, the network device (the server) sends data only when the data collector (the client) requests it. Initiating such requests requires continual manual intervention. This manual intervention makes this model unsuitable, and limits automation and the ability to scale. It inhibits the visibility of the network and therefore provides inefficient control of the network. You need monitoring strategy that adds resiliency and stability to your network.

Telemetry does just that. Telemetry uses a push model that automatically streams data from a network device. Instead of a collector requesting data at periodic intervals, the network device streams operational data in real time.

Telemetry focuses on the power of scale, speed, and automation. With the power of flexibility, you can select data of interest from the routers and transmit it in a structured format to remote management stations for monitoring. Using the finer granularity and higher frequency of data available through telemetry, DevOps (development and operations) engineers in your organization can quickly locate and investigate issues as soon as they occur. They can, thus, collaborate to monitor and have better control over the network.

The following image shows the comparative benefits of streaming telemetry data using the telemetry push model over traditional pull models. The pull models create resource bottlenecks that prevent retrieving valuable operational data from the router. On the other hand, the push model is designed to remove such bottlenecks and deliver data efficiently.

Figure 1. Comparison Between Traditional Pull Models and Telemetry Push Model

This article describes the benefits of using telemetry data and the various methods to stream meaningful data from your network device:

Benefits of Shifting Network Monitoring from Pull Models to Telemetry Push Model

Real-time telemetry data is useful in:

  • Managing network remotely: The primary benefit of telemetry is the ability it offers you as an end user to monitor the state of a network element remotely. After the network is deployed, you cannot be physically present at the network site to find out what works, and what is cumbersome. With telemetry, those insights can be analyzed, leveraged, and acted upon from a remote location.

  • Optimizing traffic: When link utilization and packet drops in a network are monitored at frequent intervals, it is easier to add or remove links, re-direct traffic, modify policing, and so on. With technologies like fast reroute, the network can switch to a new path and re-route faster than the traditional SNMP poll interval mechanism. Streaming telemetry data helps in providing quick response time for faster transport of traffic.

  • Preventive troubleshooting: Network state indicators, network statistics, and critical infrastructure information are exposed to the application layer, where they are used to enhance operational performance and to reduce troubleshooting time. The finer granularity and higher frequency of data available through telemetry enables better performance monitoring and therefore, better troubleshooting.

  • Visualizing data: Telemetry data acts as a data lake that analytics toolchains and applications use to visualize valuable insights into your network deployments.

  • Monitoring and controlling distributed devices: The monitoring function is decoupled from the storage and analysis functions. This decoupling helps to reduce device dependency, while providing flexibility to transform data using pipelines. These pipelines are utilities that consume telemetry data, transform it, and forward the resulting content to a downstream, typically off-the-shelf, consumer. The supported downstream consumers include Apache Kafka, Influxdata, Prometheus, and Grafana.

Streaming telemetry, thus, converts the monitoring process into a Big Data proposition that enables the rapid extraction and analysis of massive data sets to improve decision-making.

Review Mechanisms to Stream Telemetry Data from a Router to a Destination

Telemetry data can be streamed using either cadence-driven or event-driven mechanisms.

Figure 2. Cadence-driven and Event-driven Telemetry

Cadence-driven Telemetry

Cadence-driven telemetry continually streams data (operational statistics and state transitions) at a configured cadence. The higher frequency of the data that is continuously streamed helps you closely identify emerging patterns in the network.

The following image shows a continuous stream of data after a configured time interval:

Figure 3. Cadence-driven Telemetry

Event-driven Telemetry

Event-driven telemetry optimizes data that is collected at the receiver and streams data only when a state transition occurs and thus optimizes data that is collected at the receiver. For example, EDT streams data about interface state transitions, IP route updates, and so on.

The following image shows a stream of data after a state change:

Figure 4. Event-driven Telemetry

Learn About the Elements that Enable Streaming Telemetry Data

These elements are the building blocks in enabling telemetry in a network.

Sensor Path

The sensor path describes a YANG path or a subset of data definitions in a YANG data model with a container. In a YANG model, the sensor path can be specified to end at any level in the container hierarchy.

A YANG module defines a data model through the data of the router, and the hierarchical organization and constraints on that data.

YANG defines four node types. Each node has a name. Depending on the node type, the node either defines a value or contains a set of child nodes. The nodes types for data modeling are:
  • leaf node - contains a single value of a specific type

  • leaf-list node - contains a sequence of leaf nodes

  • list node - contains a sequence of leaf-list entries, each of which is uniquely identified by one or more key leaves

  • container node - contains a grouping of related nodes that have only child nodes, which can be any of the four node types

For more information about data models, see the Programmability Configuration Guide for Cisco NCS 4000 Series Routers.

The following table shows few examples of sensor paths. For the complete list of supported sensor paths, see Supported Sensor Paths.

Table 2. Sensor Paths

Feature

Sensor Path

CPU

Cisco-IOS-XR-wdsysmon-fd-oper:system-monitoring/cpu-utilization

Memory

Cisco-IOS-XR-nto-misc-oper:memory-summary/nodes/node/summary

Interface

Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters

Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/data-rate

openconfig-interfaces:interfaces/interface

Optical power levels

Cisco-IOS-XR-dwdm-ui-oper:dwdm/ports/port/info/optics-info

Node summary

Cisco-IOS-XR-nto-misc-oper:memory-summary/nodes/node/summary

Forwarding information base (FIB)

Cisco-IOS-XR-fib-common-oper:fib-statistics/nodes/node/drops

Cisco-IOS-XR-fib-common-oper:fib/nodes/node/protocols/protocol/vrfs/vrf/summary

MPLS Traffic engineering (MPLS-TE)

Cisco-IOS-XR-mpls-te-oper:mpls-te/tunnels/summary

Cisco-IOS-XR-ip-rsvp-oper:rsvp/interface-briefs/interface-brief

Cisco-IOS-XR-mpls-te-oper:mpls-te/fast-reroute/protections/protection

Cisco-IOS-XR-mpls-te-oper:mpls-te/signalling-counters/signalling-summary

Cisco-IOS-XR-mpls-te-oper:mpls-te/p2p-p2mp-tunnel/tunnel-heads/tunnel-head

MPLS Label distribution protocol (MPLS-LDP)

Cisco-IOS-XR-mpls-ldp-oper:mpls-ldp/nodes/node/bindings-summary-all

Cisco-IOS-XR-mpls-ldp-oper:mpls-ldp/global/active/default-vrf/summary

Cisco-IOS-XR-mpls-ldp-oper:mpls-ldp/nodes/node/default-vrf/neighbors/neighbor

Routing

Cisco-IOS-XR-clns-isis-oper:isis/instances/instance/statistics-global

Cisco-IOS-XR-clns-isis-oper:isis/instances/instance/neighbors/neighbor

Cisco-IOS-XR-ip-rib-ipv4-oper:rib/rib-table-ids/rib-table-id/summary-protos/summary-proto

Cisco-IOS-XR-clns-isis-oper:isis/instances/instance/levels/level/adjacencies/adjacency

Cisco-IOS-XR-ipv4-bgp-oper:bgp/instances/instance/instance-active/default-vrf/process-info

Cisco-IOS-XR-ip-rib-ipv6-oper:ipv6-rib/rib-table-ids/rib-table-id/summary-protos/summary-proto


Note

Use specific paths to avoid streaming data that you may not be interested. For example, if you want to stream information about only the summary of MPLS-TE, use sensor-path Cisco-IOS-XR-mpls-te-oper:mpls-te/autotunnel/mesh/summary instead of sensor-path Cisco-IOS-XR-mpls-te-oper:mpls-te sensor path.


The router streams telemetry data at predefined gather points in the data model even if sensor-path configuration is to an individual leaf. The gather points are collection units; collection always happens at that level for operational data.

The router supports the following sensor-path resolutions:
  • Streaming data at the leaf-level or at the container-level under a gather point for cadence-based subscriptions.

    If a subscritpion has multiple sensor-paths that resolve to the same gather point and have the same cadence and encoding, data is pushed in a single collection stream for all the leaves. For example:
    
    telemetry model-driven
    destination-group tftp_server
      address-family ipv4 209.165.201.1 port 1234
       encoding json
       protocol tcp
      !
    !
    sensor-group Group1
      sensor-path Cisco-IOS-XR-ip-rsvp-oper:rsvp/nsr/status
    !
    subscription sub1
      sensor-group-id Group1 sample-interval 6000
      destination-id tftp_server
    !
    !
    
  • For event-driven subscriptions, streaming is always at the gather point in the model, even if specific leaves or leaf is configured as sensor-path. There is configuration to restrict streaming specific leaves for event-driven subscriptions. If this configuration is used, the sensor-path of the configured leaf streams data even if there is a change in one of its adjacent leaves. This indicates that even if there is no change in value of the configured leaf, data can stream out to the collector. The collector must be set to check if the leaf value changed before taking action on the streamed data.
    
    telemetry model-driven
    include select-leaves-on-events
    

    Note

    It is not recommended to configure sensor-paths with the same gather point into different subscriptions.


An MDT-capable device, such as a router, associates the sensor path to the nearest container path in the model. The router encodes and streams the container path within a single telemetry message. A receiver receives data about all the containers and leaf nodes at and below this container path. The router streams telemetry data, for one or more sensor-paths, at the configured frequency (Cadence-driven Telemetry), or when an event occurs (Event-driven Telemetry), to one or more collectors through subscribed sessions.

Subscription

A subscription binds one or more sensor paths and destinations.

The collector uses the subscription to receive updates about the state of data on the router. A subscription can consist of one or more sensor paths. The data for the paths that you have subscribed starts streaming until the session is terminated by the collector or the telemetry subscription configuration is removed to cancel the subscription.

Encoder

Data that is streamed from a router can be encoded using one of these formats:

  • GPB encoding: Configuring for GPB encoding requires metadata in the form of compiled .proto files. A .proto file describes the GPB message format which is used to stream data. The .proto files are available at Cisco Network Telemetry Proto in Github.

  • Self-describing GPB encoding: Data streamed for each sensor path is in a self-describing and ASCII text format. A single .proto file, telemetry.proto, is used by the collector to decode any sensor path data. Self-describing GPB encoding is easier to manage because it needs single .proto file to decode any sensor path data, even though the message size is large.

  • JSON encoding: Data is streamed in strings of keys and its values in a human-readable format.

Transport

In the telemetry push model, the router streams telemetry data using a transport protocol. The generated data is encapsulated into the desired format using encoders.

Model-Driven Telemetry (MDT) data is streamed through Transmission Control Protocol (TCP) that is used only for dial-out mode.

Dial-Out Mode

In a dial-out mode, the router dials out to the receiver to establish a subscription-based telemetry session. Because the router initiates the connection, there is no need to manage the ports for inbound traffic. In this default mode of operation, the protocol you use to establish a session is TCP. A simple protocol requires only accessibility to the socket on the collector. A secure protocol, additionally, offers security capabilities to authenticate and encrypt the session. You can, therefore, secure your collector, and establish a much advanced method of communication with the router. If the connection between the router and the destination is lost, the router re-establishes the connection with the destination and continues to push data again. However, data transmitted during the time of reconnection is lost.

To explore the dial-out mode, and to create a dial-out session, see Establish a Model-Driven Telemetry Session from a Router to a Collector.