Does anyone have an example of Amazon Polly Speech Synthesis Markup Language (SSML) in text to speech?

November 16, 2021

Zoom team - do you have a full example of how to use SSML for text to speech?

The help link goes directly to an AWS web page with a table of tags.

For a person not familiar with the tags, it's confusing.

Can you provide an example of how to use it in a typical text-to-speech prompt like the one shown below?

Best answer by naveen (Community Champion | Employee)

@bvanbens - Generally there is no need to use any SSML tags; plain text is converted to speech (a wave file) as is.

For example, if you want a longer pause between option 1 and option 2, you need to use SSML tags. The example here adds an additional 3-second delay at each break:

<speak> Thank you for calling the IT Helpdesk <break time="3s"/> To open a new ticket press 1 <break time="3s"/> To check existing ticket press 2</speak>

For more sample tags, please refer to: supportedtags
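
If you have several menu lines and want the same pause between each, the prompt can be assembled rather than typed by hand. This is only a rough sketch in Python: `build_menu_prompt` and its `pause` parameter are made-up names for illustration, not part of Zoom or Amazon Polly, and the output is meant to be pasted into the text-to-speech field.

```python
from xml.sax.saxutils import escape


def build_menu_prompt(sentences, pause="3s"):
    """Join sentences into one SSML string with a <break> between each pair.

    Hypothetical helper for illustration; it only builds text.
    """
    # Escape &, < and > in the spoken text so it cannot break the markup.
    parts = [escape(sentence.strip()) for sentence in sentences]
    separator = f' <break time="{pause}"/> '
    # Everything, including the plain text, sits inside a single <speak> element.
    return "<speak>" + separator.join(parts) + "</speak>"


prompt = build_menu_prompt([
    "Thank you for calling the IT Helpdesk.",
    "To open a new ticket, press 1.",
    "To check an existing ticket, press 2.",
])
print(prompt)
```

The printed string has the same shape as the example above: one <speak> wrapper, with a <break> tag between each sentence.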

JoshRyder

Newcomer

The SSML tags are not working for me. The voice simply reads the tags and their contents.

For example the following:

Please wait while we connect your call.
<break time="3s"/>
A representative will be with you shortly.

Is read as:

Please wait while we connect your call.
Break time equals 3's. Greater than
A representative will be with you shortly

JoshRyder

Newcomer

Solved.

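Any text outside of the <speak> tags appears to break the markup, and the tags are then read out literally. Wrapping the whole message, plain text included, in a single <speak> element fixed it.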
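
For anyone who wants to check a prompt before pasting it in, here is a small sketch using Python's standard XML parser. It assumes the only requirement is well-formed XML with <speak> as the single root; Zoom's actual validation is not documented in this thread, and `is_wrapped_in_speak` is just an illustrative name.

```python
import xml.etree.ElementTree as ET


def is_wrapped_in_speak(text):
    """Return True if text is well-formed XML whose single root is <speak>."""
    try:
        root = ET.fromstring(text.strip())
    except ET.ParseError:
        # Text outside the root element, or unbalanced tags, ends up here.
        return False
    return root.tag == "speak"


# The prompt from the earlier post: the text sits outside any <speak> element.
unwrapped = (
    "Please wait while we connect your call.\n"
    '<break time="3s"/>\n'
    "A representative will be with you shortly."
)
# The same prompt with everything inside one <speak> element.
wrapped = "<speak>\n" + unwrapped + "\n</speak>"

print(is_wrapped_in_speak(unwrapped))  # False
print(is_wrapped_in_speak(wrapped))    # True
```

Text placed after the closing </speak> tag fails the same check, which matches the behaviour described above.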

Dave-Myers

Newcomer

Adding onto this thread: in the Amazon docs there is a column in the table for "Availability with Neural Voices", with values of "full availability", "partial availability" and "not available". It looks like Zoom supports the ones marked as "full availability" but not "partial availability". Is this correct?

The one I'm trying to get working is <say-as> to pronounce digits correctly as well as say text characters:

<say-as interpret-as=\"digits\">123456</say-as>

<say-as interpret-as=\"characters\">ID</say-as>

When I add these to the message to play field, I'm getting an error of "Invalid SSML request".

thanks!

Dave-Myers

Newcomer

My bad - it looks like <say-as> is supported. I had escaped the double quotes (\"). This works:
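
The snippet from the post itself is not reproduced here. As an illustration only of why the escaped quotes fail, the sketch below runs both forms through Python's standard XML parser. Both SSML strings are reconstructions, wrapped in <speak> on the assumption (from the earlier replies) that Zoom expects it. The error Zoom reports may come from a different check, so this shows only that the escaped form is not well-formed XML.

```python
import xml.etree.ElementTree as ET

# Backslash-escaped quotes, as in the earlier post (raw string keeps the backslashes).
escaped = r'<speak><say-as interpret-as=\"digits\">123456</say-as></speak>'

# The same tags with plain double quotes around the attribute values.
plain = (
    '<speak><say-as interpret-as="digits">123456</say-as> '
    '<say-as interpret-as="characters">ID</say-as></speak>'
)


def parses(ssml):
    """Return True if the string is well-formed XML."""
    try:
        ET.fromstring(ssml)
        return True
    except ET.ParseError:
        return False


print(parses(escaped))  # False: an attribute value must start with a quote, not a backslash
print(parses(plain))    # True
```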