The TTS and Recorded Voice Selection facility of BeVocal VoiceXML allows the voice developer to specify TTS and recorded voices within their VoiceXML programs. This chapter describes:
| | Specifying TTS Voices |
| | Specifying Recorded Voices |
| | Lists of Fallback Voices |
| | Overriding Recorded Voices |
Note: The TTS and Recorded Voice Selection facility is an experimental extension to VoiceXML; its implementation and behavior are subject to change. The current BeVocal VoiceXML implementation contains the feature before it has been standardized so that developers may provide feedback. If this capability becomes a standard part of a future version of VoiceXML, the BeVocal VoiceXML implementation will change as necessary to match the VoiceXML standard.
Text-to-speech (TTS) output can occur in many places in a VoiceXML application, for example within <block>, <prompt>, <audio>, and <say-as> tags, among others.
The following are the currently supported TTS voices and their characteristics:
| Voice name | Characteristics |
| jennifer | |
| julie | |
| katarina | |
| laurie | |
| maria | |
| mark | |
| reed | |
Note: If not specified, the default TTS voice is jennifer.
TTS voices can be specified in two ways: through the VoiceXML property bevocal.voice.name, or through the name attribute of the <voice> tag. The same syntax is used for both.
<property name="bevocal.voice.name" value="TTS_voice_name"/>
BeVocal VoiceXML has introduced a property, bevocal.voice.name, which allows you to define a voice name. A TTS voice defines the characteristics of the TTS played by a TTS engine. A TTS voice may correspond to a single TTS engine with certain parameters set. Because the BeVocal interpreter can support multiple TTS engines, two TTS voices may correspond to different TTS engines.
As with all properties, the bevocal.voice.name property is taken from the innermost property scope that applies to the TTS in question. When no property is specified, the default TTS voice is used.
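The scoping rule can be sketched as follows. This is a minimal illustration that assumes bevocal.voice.name follows ordinary VoiceXML property scoping as described above; the form id and the choice of julie as the document-level voice are arbitrary.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document scope: applies to all TTS unless a closer scope overrides it -->
  <property name="bevocal.voice.name" value="julie"/>

  <form id="order">
    <block>
      <!-- No closer property applies here, so julie is expected -->
      <prompt>Welcome.</prompt>
    </block>

    <field name="myphone">
      <!-- Field scope is the innermost scope for this field's prompts,
           so mark is expected here -->
      <property name="bevocal.voice.name" value="mark"/>
      <grammar src="builtin:grammar/phone"/>
      <prompt>Please say a phone number</prompt>
    </field>
  </form>
</vxml>
```

If the document-level property were removed, the welcome prompt would be expected to use the default voice, jennifer.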
The following statement would specify the mark TTS voice for the prompts within the field, including the prompts "Please say a phone number" and "You said 408-555-1212":
<field name="myphone">
  <property name="bevocal.voice.name" value="mark"/>
  <grammar src="builtin:grammar/phone"/>
  <prompt>Please say a phone number</prompt>
  <filled>
    You said
    <prompt><say-as type="telephone"><value expr="myphone"/></say-as></prompt>
  </filled>
</field>
See the <say-as> tag for more details. When specifying TTS voices, be aware that some <say-as> types do extra processing on the content of the <say-as> tag before passing it to the TTS engine, or set parameters on the TTS engine so that the result is output more naturally.
In the case of the telephone type used above, inserting appropriate pauses in 10-digit US numbers helps the TTS output sound more natural. It is usually a good idea to use <say-as> whenever you output a type that <say-as> supports. In special cases you can instead process the result yourself, for example in ECMAScript, and insert the appropriate SSML or other TTS markup.
An error.noresource event is thrown when an invalid TTS voice is specified.
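A handler for this event might look like the following sketch. The voice name no_such_voice is deliberately invalid and hypothetical, and the exact point at which the interpreter throws the event (when the prompt is queued or when it is played) is not stated here, so treat the placement of the handler as an assumption.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="greeting">
    <!-- Hypothetical invalid voice name, used only to trigger the event -->
    <property name="bevocal.voice.name" value="no_such_voice"/>

    <!-- Catch the event thrown for the invalid TTS voice.
         The handler logs and exits rather than prompting, because a prompt
         here would still be in the scope of the invalid voice property. -->
    <catch event="error.noresource">
      <log>Requested TTS voice is not available</log>
      <exit/>
    </catch>

    <block>
      <prompt>Hello.</prompt>
    </block>
  </form>
</vxml>
```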
The TTS voice can alternatively be specified using the <voice> tag's name attribute:
<voice name="mark"> Say Hello using Mark's voice! </voice>
The <voice> tag syntax lets you specify TTS voices at a finer level of granularity, within SSML, wherever <voice> tags are allowed. With just two TTS voices, one female and one male, the use cases for switching voices within SSML are few, and in general it is probably not good practice to do so. However, as an illustration, suppose you are reading out a passage from a gripping novel:
<voice name="jennifer"> Would you like an apple or a banana? </voice> <voice name="mark"> Didn't you say you had oranges? </voice>
A more typical use case is to use a more natural-sounding voice for most TTS but a more intelligible TTS voice to read out email. The BeVocal interpreter can support multiple TTS engines and would surface these as TTS voices.
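The email case might be written along these lines. This is only a sketch: which of the listed voices is the more intelligible one is not stated, so the use of reed is an arbitrary placeholder, and the message text is made up for illustration.

```xml
<prompt>
  <!-- Text outside <voice> uses the voice currently in effect
       (jennifer by default) -->
  You have one new message.
  <!-- Placeholder choice: substitute whichever voice you find most
       intelligible for reading message content -->
  <voice name="reed">
    Subject: lunch on Friday.
  </voice>
</prompt>
```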
A recorded voice is a set of interpreter prompts recorded by a human voice talent. All of the recorded voices listed below have a TTS voice as a fallback. A recorded voice is output when you use the <say-as> tag with the attribute bevocal:mode="recorded".
| Voice name | Characteristics | Types | TTS fallback |
| bv_ann_en_us | | | jennifer |
| bv_adam_en_us | | equity | mark |
| bv_ben_en_us | | citystate | mark |
| bv_cecelia_es_us | | | maria |
You specify the recorded voice name in exactly the same way as the TTS voice name, using the bevocal.voice.name property. The currently available recorded voices are listed in the table above.
Note: Currently, the recorded voice cannot be set using the SSML <voice> tag.
The following example uses the bv_adam_en_us recorded voice within a field which prompts for a stock name and outputs the result using a recorded prompt:
<field name="mystock">
  <property name="bevocal.voice.name" value="bv_adam_en_us"/>
  <grammar src="builtin:grammar/equity"/>
  <prompt>Please say a stock name</prompt>
  <filled>
    You said
    <prompt><say-as type="equity" bevocal:mode="recorded"><value expr="mystock"/></say-as></prompt>
  </filled>
</field>
In the above example, the equity is read out in the bv_adam_en_us recorded voice unless a recording is not available; in that case, the output falls back to TTS in the mark voice. This could happen, for example, if that equity has not yet been recorded in this voice.
Note that if the recorded voice is not available for a particular <say-as> type, the TTS fallback is used. As the table shows, bv_adam_en_us supports only the equity type, so if any other type is specified the output falls back to the mark TTS voice.
Important: Again, see the <say-as> tag for more details. When you combine specific <say-as> types with recorded voices, the exact format of the content of the <say-as> tag is important.
Providing lists of voices gives extensibility and flexibility as the BeVocal interpreter adds support for more recorded and TTS voices. It also provides a mechanism for BeVocal partners to create applications that override recorded voices for any type.
There are some cases in which a recorded output may not be available for a recorded voice:
Also note that the interpreter may in the future provide support for:
For these reasons and more, the BeVocal interpreter supports a space-delimited list of recorded and TTS voices within the bevocal.voice.name property.
| | recorded voice with a TTS fallback voice defined. |
| | recorded voice only |
| | TTS voice only. |
For any space delimited list of voices, the following algorithm is used.
For example, assume there is a newly available voice called bv_joe_en_us that supports all types, but bv_adam_en_us has more up-to-date equity coverage. The following property declaration:
<property name="bevocal.voice.name"
          value="bv_joe_en_us bv_adam_en_us"/>
and the subsequent <say-as> statement:
<say-as type="equity" bevocal:mode="recorded"> <value expr="mystock"/> </say-as>
would output the equity value in the variable mystock in the bv_joe_en_us voice if possible, but would fall back to the bv_adam_en_us voice if mystock was not available in the first. This assumes that you prefer Joe for consistency with the other types in your application but want to fall back to Adam to ensure a recorded male voice for equity.
What about TTS fallback in this case? If the bv_joe_en_us voice has a TTS fallback voice it will be used, since it is the first. This is true even if bv_adam_en_us was selected for the recorded voice.
What if you didn't like the TTS fallback for the Joe voice, and you wanted to override it? Then you could use a statement like the following:
<property name="bevocal.voice.name"
          value="bv_superman_TTS bv_joe_en_us bv_adam_en_us"/>
Alternatively, if the Joe voice didn't have a TTS fallback but you didn't want Adam's TTS fallback used, you could use the same statement.
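The same pattern can be tried with voices from the tables earlier in this chapter. In the sketch below, a TTS voice is placed ahead of a recorded voice. Based on the description above, the expectation is that bv_adam_en_us supplies the recorded equity prompt while julie, rather than mark, serves as the TTS fallback. This is an inference from the text, not documented behavior, and the choice of julie is arbitrary.

```xml
<field name="mystock">
  <!-- First entry: TTS voice intended to override Adam's default fallback (mark).
       Second entry: recorded voice that covers the equity type. -->
  <property name="bevocal.voice.name" value="julie bv_adam_en_us"/>
  <grammar src="builtin:grammar/equity"/>
  <prompt>Please say a stock name</prompt>
  <filled>
    <!-- The bevocal: prefix must be declared on the enclosing <vxml> element -->
    <prompt><say-as type="equity" bevocal:mode="recorded"><value expr="mystock"/></say-as></prompt>
  </filled>
</field>
```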
In most cases you should observe the following principles in your voice selection:
| | Specify one recorded voice, and one TTS voice per application for consistency in the interface. |
| | If you specify multiple voices they should have similar characteristics. |
For BeVocal VoiceXML recorded voices:
| | All recorded voices will have TTS fallbacks. |
| | Similar voices should have the same TTS fallback. |
BeVocal provides a mechanism to allow special BeVocal partners, carrier and enterprise customers to override interpreter voices for certain types or all types.
This can be combined with BeVocal Services for recording voice talent.
If you are interested in these services and capabilities, send email to CafePartners@bevocal.com.
Part No. 520-0001-02 | © 1999-2007, BeVocal, Inc. All rights reserved | 1.877.33.VOCAL