Introduction
This document describes how to configure and use Speech Synthesis Markup Language (SSML) with Google Text-to-Speech (Google TTS).
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
- Webex Contact Center (WxCC) 2.0
Components Used
The information in this document is based on these software versions:
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Background Information
Speech Synthesis Markup Language (SSML) gives you finer control over the Google Text-to-Speech audio response. With SSML you can insert pauses and control how acronyms, dates, times, abbreviations, and text that must be censored are spoken.
Configuration
Refer to the Google SSML documentation on the Google Cloud portal for information about all supported SSML elements.
Here are several examples of WxCC Flow configuration with some of the SSML elements:
<speak>
SSML is an XML-based markup language, and its root element is <speak>. All other elements must be placed inside the <speak> tags. If you place plain text inside the <speak> element, the caller hears it in the default TTS voice that is configured in the WxCC flow.
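As a minimal sketch, a prompt that contains only plain text inside the root element could look like this. The greeting wording is an illustrative assumption and is not taken from a specific flow.

```xml
<speak>
  <!-- Plain text only: spoken with the default TTS voice of the flow -->
  Welcome to the contact center. Please hold while we connect you.
</speak>
```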
<say-as>
This element lets you indicate the type of text construct that is contained within the element. It also helps specify the level of detail with which the contained text is rendered.
The <say-as> element has one required attribute, interpret-as, which determines how the value is spoken. The optional attributes format and detail can be used, depending on the particular interpret-as value.
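To illustrate the optional attributes, the sketch below combines format and detail with interpret-as="date", a pairing that the Google SSML documentation describes. The sentence and the date value are assumptions chosen for illustration, and how the date is read out depends on the voice and language in use.

```xml
<speak>
  <!-- Hypothetical date value; format tells the engine the field order -->
  Your appointment is on
  <say-as interpret-as="date" format="yyyymmdd" detail="1">2024-03-15</say-as>.
</speak>
```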
Example 1:
<say-as> element with interpret-as="currency" as the required attribute and language="en-US" as the optional attribute. In this example, the caller hears: "Your current balance is fifty-three dollars and twenty-one cents."
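The SSML for this example could look like the following sketch. The literal amount $53.21 is assumed here so that it matches the spoken result described above; in a real flow the amount would typically come from a flow variable.

```xml
<speak>
  <!-- Literal amount assumed for illustration -->
  Your current balance is
  <say-as interpret-as="currency" language="en-US">$53.21</say-as>.
</speak>
```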
Example 2:
<say-as> element with interpret-as="ordinal" as the required attribute. In this setup, the caller hears their position in the queue in ordinal format, such as first, second, and so on.
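A possible SSML body for this example is sketched below. The literal 2 stands in for whatever flow variable holds the caller's position in the queue; the variable name depends on the flow and is not shown here. With this value, the caller would be expected to hear "second".

```xml
<speak>
  <!-- Literal 2 is a placeholder for the queue-position variable -->
  You are
  <say-as interpret-as="ordinal">2</say-as>
  in the queue.
</speak>
```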