IVR and Text to Speech Speed Questions

September 16, 2023

Hello everyone,

I am hoping to get feedback from the general community here as well as from Zoom employees. I'll break my questions into two parts:

1. Within an IVR, we want customers to be able to press option 1, hear some text-to-speech general information about one of our locations, and then either press star to replay the message or press 1 to return to the main menu of the IVR. To test this, I have Auto Receptionist 1 (AR1) with an IVR (IVR1), where option 1 of IVR1 is the option for general information. I ended up creating a second auto receptionist (AR2) with its own IVR (IVR2), and when a customer presses option 1 in IVR1, the call is sent to AR2/IVR2. Within IVR2, I have a text-to-speech IVR main menu audio file play the general information message, and customers can press star to replay the message or press 1 to return to the main menu of IVR1. Am I thinking about this the right way, or is there an easier way? Initially, I had AR1/IVR1 option 1 point to a call queue, but unless an agent was assigned to the call queue, it would try to route the call to voicemail, so that didn't seem like an effective option. I would really appreciate any feedback on whether we are doing this the right way or whether there is an easier and/or "best practice" way to accomplish this.

2. For the text-to-speech pre-canned voices like "Matthew-Male," the voices themselves are great for being robotic, but they speak too fast, and I won't be able to use them as they are because our customers will not be able to understand them speaking so quickly. Is there some setting or other way to slow down the voices so that the elderly and those who are hard of hearing can understand them more easily?

Really looking forward to hearing what people have to say here, so thanks in advance for any feedback on these two topics.

Thanks!

Jason

Eliot

Community Super Champion | Partner

hi jason,

with respect to text to speech, you can control the speed of the text using Amazon Polly Speech Synthesis Markup Language (SSML).

here is a simple example. the first part of the text ("for dramatic purposes, you might wish to") is read at normal speed, while the second part ("slow up the speaking rate of your text") is read at an extra-slow speed. this is accomplished with ssml.

the beginning is marked with <speak> and the end with </speak>.

the prosody tag has three attributes which are volume, rate and pitch. i think zoom only supports volume and rate.

rate supports values of x-slow, slow, medium, fast and x-fast plus a non-negative percentage change with a range of 20-200%.

in my example, i

(a) start with <speak>,

(b) change the rate of speaking by inserting <prosody rate="x-slow"> before the slow text and </prosody> after it, and

(c) finish at the very end with </speak>.
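
put together, the steps above might look like the following. this is a sketch reconstructed from the description, using the two phrases quoted earlier; whether zoom accepts every value listed in the polly documentation is an assumption to verify in your own ivr.

```xml
<speak>
    For dramatic purposes, you might wish to
    <prosody rate="x-slow">slow up the speaking rate of your text.</prosody>
</speak>
```

if x-slow turns out to be too slow, a named value such as slow, or a percentage such as rate="80%" (within the 20-200% range), may give a milder effect.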

for more details, please see

Supported SSML Tags - Amazon Polly

Did my response answer your question on your second topic? If so, please don't forget to mark the reply as an accepted solution.

thanks, eliot

ITMan318

Newcomer

When I add in the <speak> tags, it actually speaks the tags also

What do I need to do to not have the tags spoken? I am using the most updated version of Zoom Phone and trying to create IVR menus

Eliot

Community Super Champion | Partner

hi itman318,

when you use ssml tags such as <speak>, you have to close each one with a forward slash followed by the tag name, i.e. </speak>.

here is an example with two ssml tags:
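
a minimal sketch of such an example, assuming the two tags are <speak> and <prosody>; the wording of the prompt is made up for illustration:

```xml
<speak>
    <prosody rate="slow">Thank you for calling. Press star to replay this message.</prosody>
</speak>
```

each opening tag has a matching closing tag, and the inner tag is closed before the outer one.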

Did my response answer your question? If so, please don't forget to mark the reply as an accepted solution.

thanks,

eliot

Jason2023Author

Newcomer

Hello Eliot,

Thank you very much for the SSML information. That certainly solves my second question and once I get my first question answered, I will be glad to mark your response as a solution.

I see numerous folks have looked at this thread, so any thoughts on my first question? Surely there have been other users wanting to accomplish the same, so I'm just looking for the best answer to it.

Thanks!

Jason2023Author

Newcomer

Also, Eliot, I had another question now that I am playing around with SSML. I see in Zoom Phone there is a Spanish option with one of the voices being "Pedro-Male." Let's say I have an English IVR with six options and I want option six ONLY to be the Pedro-Male voice speaking the language for option six. Can Zoom support such a specific example?

Eliot

Community Super Champion | Partner

hi jason,

using the <lang> tag may do what you want. you may want to use a bilingual voice such as Joanna or Matthew. see example below with a mixture of american english and american spanish:
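
a sketch of what such a mixed-language prompt could look like, assuming the voice selected in zoom is one of the bilingual ones; the prompt wording is hypothetical, and es-US is the polly language code for us spanish:

```xml
<speak>
    For store hours, press five.
    <lang xml:lang="es-US">Para español, oprima seis.</lang>
</speak>
```

only the text inside the <lang> pair is read as spanish; the rest stays in the voice's default language.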

seems to work as advertised.

amazon article says:

"Specify another language for a specific word, phrase, or sentence with the <lang> tag. Foreign language words and phrases are generally spoken better when they are enclosed within a pair of <lang> tags. To specify the language, use the xml:lang attribute. For a complete list of available languages, see Languages Supported by Amazon Polly. "

Supported SSML Tags - Amazon Polly

thanks, eliot

Jason2023Author

Newcomer

Thanks Eliot - I found that what we are looking for with language is not an option. Yet. We want to be able to change the language for a single IVR option to be a different TTS language like "American-Spanish/Pedro-Male" while having the rest of the IVR be "American-English/Matthew-Male." Zoom Support sent me a product enhancement form, so I am submitting that today, as I feel it would be very beneficial to be able to customize particular IVR options with differing languages.

Anyone out there have some feedback on my first question about the IVR? Still not sure about that one and seeking help on it.

Thanks,

Jason

Eliot

Community Super Champion | Partner

hi annianni,

great idea!

i do not know what voiceover api zoom supports.

you can upload a file as long as format and size meet zoom specs, i.e. mp3 or wav less than 10 MB.