VoiceXML is a markup language derived from XML for writing telephone-based speech applications. Users call applications by telephone. They listen to spoken instructions and questions instead of viewing a screen display; they provide input using the spoken word and the touchtone keypad instead of entering information with a keyboard or mouse.
This section covers the following topics:
- VoiceXML
- User Interaction
- Flow of Execution
- Collecting Input and Playing Prompts
Just as a web browser renders HTML documents visually, a VoiceXML interpreter renders VoiceXML documents audibly. You can think of the VoiceXML interpreter as a telephone-based voice browser.
As with HTML documents, VoiceXML documents have web URIs and can be located on any web server. However, whereas a standard web browser runs locally on your machine, the VoiceXML interpreter runs remotely, for example at a VoiceXML hosting site, and you access it using your telephone.
In order to support a telephone interface, the VoiceXML interpreter runs within an execution environment that includes a telephony component, a text-to-speech (TTS) speech-synthesis component, and a speech-recognition component.
The VoiceXML interpreter transparently interacts with these infrastructure components as needed. For example, it uses the telephony component to answer and manage calls, the TTS component to render prompts as speech, and the speech-recognition component to interpret what the caller says.
VoiceXML uses markup tags and plain text. A tag is a keyword enclosed by the angle bracket characters (< and >). A tag may have attributes inside the angle brackets. Each attribute consists of a name and a value separated by an equal sign (=); the value must be enclosed in quotes.
Tags occur in pairs; corresponding to the start tag <keyword> is the end tag </keyword>. Between the start and end tags, other tags and text may appear. Everything from the start tag to the end tag is called an element. For example, the following three lines constitute a prompt element:
<prompt>
What is your telephone number?
</prompt>
If there are no other tags or text between the start and end tag, a syntactic shorthand is permitted. You can precede the closing angle bracket ( > ) of the start tag with a slash ( / ) and omit the end tag. For example, instead of writing a value element as:
<value expr="result"></value>
you can use the shorthand notation:
<value expr="result"/>
Because the syntax specifies the end of each element, the VoiceXML interpreter can check that the entire document has been received.
If one element contains another, the containing element is called the parent element of the contained element. The contained element is called a child element of its containing element. The parent element may also be called a container.
Although both HTML and VoiceXML use markup tags, the two languages use tags differently. Whereas the markup tags in HTML describe how to render the data, the markup tags in XML (and consequently in VoiceXML) describe the data itself. This allows an XML interpreter or browser to display the data in whatever way is appropriate.
BeVocal VoiceXML generally complies with the VoiceXML 2.0 Specification. It also includes several handy extensions that you can use if you choose. VoiceXML Tag Summary lists any differences between BeVocal VoiceXML and the standard.
In VoiceXML, the <form> element is analogous to an HTML form that contains items for the user to enter. In VoiceXML forms, each logical piece of information to be collected from the user is identified with a <field> tag.
The form in the following example collects one piece of information from the user. Once this information is obtained, execution proceeds to the field's <filled> element. Other tags used in the example include the following:
- The <script> tag specifies a block of client-side JavaScript code.
- The <var> tag declares a variable to be used within the form.
- The <prompt> tag produces audio output for the user.
- The <assign> tag assigns a value to a variable.
- The <value> tag evaluates an expression and produces spoken output of the result.
This example requests a number from the caller, computes the factorial of that number, and repeats the answer to the caller.
<?xml version="1.0" ?>
<!DOCTYPE vxml PUBLIC "-//BeVocal Inc//VoiceXML 2.0//EN"
  "http://cafe.bevocal.com/libraries/dtd/vxml2-0-bevocal.dtd">
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">

  <!-- This piece of JavaScript code actually computes the factorial. -->
  <script>
    <![CDATA[
    function factorial(n) { return (n <= 1) ? 1 : n * factorial(n-1); }
    ]]>
  </script>

  <!-- Primary VoiceXML form for the application -->
  <form id="computefactorial">
    <!-- Variable to hold the result -->
    <var name="result"/>
    <!-- The field element holds one piece of information obtained from the caller. -->
    <field name="num" type="number">
      <!-- Ask the caller for the information. -->
      <prompt>please say a number</prompt>
      <!-- The filled element is executed when the interpreter -->
      <!-- gets a valid answer from the caller. -->
      <filled>
        <!-- Compute the factorial and assign it to the result variable. -->
        <assign name="result" expr="factorial(num)"/>
        <!-- Tell the caller the answer. -->
        <prompt>
          The factorial of <value expr="num"/> is <value expr="result"/>
        </prompt>
        <!-- Close everything off. -->
      </filled>
    </field>
  </form>
</vxml>
VoiceXML contains no explicit instructions about how to present the prompt "please say a number" or how to present the results. In theory, these could be presented textually on a different kind of browser.
In practice, the example document is run as a telephone application and results in conversations such as the following.
| Application: | Please say a number. |
| User: | Five. |
| Application: | The factorial of five is one hundred twenty. |
An executable VoiceXML file is called a document. The VoiceXML interpreter loads a document file to execute it.
Every VoiceXML document must start with header information that conforms to the XML standard:
<?xml version="1.0"?>
<!DOCTYPE vxml PUBLIC "-//W3C//DTD VOICEXML 2.0//EN"
  "http://www.w3.org/TR/voicexml20/vxml.dtd">
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
These headers describe the language in which the document is written:
- The first tag indicates that the document is an XML document. This tag is required.
  Always use this tag exactly as specified; it must come at the very beginning of the document. To be a legal XML document, the first five characters of any XML file (including a VoiceXML document) must be:
  <?xml
  No characters, not even whitespace characters such as space or newline, can come before these five characters in a VoiceXML document.
- The second tag identifies the Document Type Definition (DTD), which is used to validate that the contents represent well-formed VoiceXML. This tag is optional.
  A DTD describes the format of the data that might appear in an XML document. That is, the DTD defines the valid tags by specifying what attributes each tag can have and what child tags or other content each tag can contain.
  If your document contains only standard VoiceXML elements, you can use the DTD shown above. If you use any of the BeVocal extensions to VoiceXML, you'll need to use the BeVocal DTD instead. In this case, replace the DOCTYPE element with the following:
  <!DOCTYPE vxml PUBLIC "-//BeVocal Inc//VoiceXML 2.0//EN"
    "http://cafe.bevocal.com/libraries/dtd/vxml2-0-bevocal.dtd">
  You should include a DOCTYPE declaration during development, as it allows better error checking by the interpreter. You may remove it during deployment for performance.
- The third tag identifies the version of VoiceXML used in this document and the designated namespace for VoiceXML. This tag is required.
  For VoiceXML 2.0, this tag should always include these two attributes. It can also include optional attributes described in the section on the <vxml> tag.
Apart from headers and possibly comments, all the content in a VoiceXML document is contained within a <vxml> element, that is, between the <vxml> start tag and the </vxml> end tag.
A VoiceXML application consists of one or more documents. Any multidocument application has a single application root document. Each document in an application identifies the application root document with the application attribute of the <vxml> tag:
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" application="myAppRoot.vxml" >
Whenever the interpreter executes a document, it loads that document. If the document specifies an application root document, that document is also loaded.
You can use an application root document for global items or interactions that you want to be active throughout the application. For example, suppose the application root document myAppRoot.vxml declares a variable named company that has an initial value of BeVocal:
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
<var name="company" expr="'BeVocal'"/>
...
This variable has application scope. That is, any document in the application can use the variable.
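For example, any document that names myAppRoot.vxml as its application root could greet the caller with the company name. A minimal sketch (the prompt wording is illustrative):

<block>
  <prompt>Welcome to <value expr="application.company"/>.</prompt>
</block>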
Within a document, a user interacts with dialogs, in which the application produces auditory output, typically asking for information. The user provides input by speaking or pressing keys on the telephone. User speech must be recognized and its meaning interpreted. The telephone key input is interpreted as a sequence of tones in the Dual Tone Multifrequency (DTMF) signalling system.
VoiceXML has two kinds of dialogs: forms and menus.
- A form interacts with the user to fill in a number of fields. Every field has an associated variable, called its input-item variable, or just input variable. Initially, the variable has a value of undefined. It is filled in when the speech-recognition engine recognizes a valid response in a user utterance. Note: In VoiceXML 1.0, an input-item variable was known as a field-item variable.
- A menu presents the user with a number of choices; it transitions to a different dialog based on the user's selection.
The VoiceXML <form> tag defines a form and the <field> tag defines a field in a form. You specify the name of the input variable with the name attribute of the <field> tag. You can use the input variable's name in expressions to refer to the stored value.
In the example in Simple Example, the input variable is named num:
<field name="num" type="number">
When the user says the number, the number is stored in the num variable. Then the interpreter proceeds to execute the field's <filled> element. Here, the num variable in the <assign> element is evaluated before being passed as the parameter to the factorial function.
<assign name="result" expr="factorial(num)"/>
The <menu> tag defines a menu; each choice consists of a <choice> element. The next attribute of a <choice> element specifies the destination dialog to which the interpreter should transition when the user selects that choice. If a <form> or <menu> element is to be the destination of a transition, the id attribute for the destination dialog should specify a unique identifier.
For example, the following menu consists of three choices.
<menu>
  <prompt>
    Please choose one of <enumerate/>
  </prompt>
  <choice next="#MovieForm">
    local movies
  </choice>
  <choice next="localBroadcast.vxml#RadioForm">
    local radio stations
  </choice>
  <choice next="http://www.nationTV.org/tv.vxml">
    national TV listings
  </choice>
</menu>
The prompt in this menu includes an <enumerate> tag. This tag lets you set up a template for an automatically generated description of the choices. By default, the <enumerate> template simply lists all the choices. In the above example, the prompt is "Please choose one of local movies, local radio stations, national TV listings."
The destination dialog specified by the next attribute can be in the current document or in a different document:
- If the user says "local movies", the interpreter transitions to the dialog named MovieForm in the same document.
- If the user says "local radio stations", the interpreter transitions to the dialog named RadioForm in the document localBroadcast.vxml.
- If the user says "national TV listings", the interpreter transitions to the first dialog in the document tv.vxml in the national TV web site.
You can set properties to customize the behavior of the interpreter. The <property> tag specifies the property to set and the value for that property.
Various properties control how the interpreter behaves when prompting the user for input, recognizing speech or DTMF input, and fetching documents and other resources. For additional information, see Chapter 12, Properties.
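For example, a document could lengthen the time the interpreter waits for input before throwing a no-input event. A minimal sketch using the standard timeout property (the value is illustrative):

<!-- Wait up to 10 seconds of silence before throwing a no-input event -->
<property name="timeout" value="10s"/>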
The speech-recognition engine uses grammars to interpret user input. See the Grammar Reference for details on creating and using grammars. Here, we only cover a portion of the relevant information.
Each field in a form can have a grammar that specifies the valid user responses for that field. An entire form can have a grammar that specifies how to fill multiple input variables from a single user utterance. Each choice in a menu has a grammar that specifies the user input that can select the choice.
A VoiceXML application can use built-in grammars and application-defined grammars.
The following basic grammars are built into all standard VoiceXML interpreters:
| Grammar Type | Description |
| boolean | Recognizes an affirmative or negative response, such as "yes" or "no". |
| currency | Recognizes a currency amount. |
| date | Recognizes a calendar date. |
| digits | Recognizes a sequence of digits spoken individually, such as "one two three". |
| number | Recognizes a number, including an optional decimal point and sign. |
| phone | Recognizes a telephone number adhering to the North American Dialing Plan (with no extension). |
| time | Recognizes a time of day. |
BeVocal VoiceXML contains additional built-in grammars as an extension to standard VoiceXML, including grammars that recognize the following:
- An airport name or code, such as DFW or Dallas-Fort Worth.
- An airline name or code, such as AA or American Airlines.
- A company symbol or full name, such as IBM or Cisco Systems.
- US city and state names, for example, "Sunnyvale, California".
- The names of the major US stock indexes, such as "Nasdaq".
You can reference a built-in grammar in either of two ways:
- You can use a standard built-in grammar as the type attribute of a <field> element. The example in Simple Example uses the built-in number grammar:
  <field name="num" type="number">
  This means that the speech-recognition engine tries to interpret what the user says as a number.
- You can use any built-in grammar (standard or BeVocal VoiceXML extension) in a <grammar> element by specifying the src attribute with a URI of the form:
  builtin:grammar/type
  where type is the name of the built-in grammar. For example:
  <grammar src="builtin:grammar/boolean"/>
Although the built-in grammars can be useful, you typically need to define your own grammars.
An application-defined grammar can be specified in the following forms:
- Augmented BNF (ABNF) form of the W3C Speech Recognition Grammar Format
- XML form of the W3C Speech Recognition Grammar Format
- Nuance Grammar Specification Language (GSL)
- Java Speech Grammar Format (JSGF)
A simple grammar can be defined in the document. An inline grammar is defined within the <grammar> element itself. For example, the following inline ABNF grammar matches the words "add" and "subtract".
<field name="operator">
  <grammar>
    #ABNF 1.0;
    root $op;
    $op = add | subtract;
  </grammar>
  ...
With this grammar, if the user says "add," the input variable operator is set to add.
More complex grammars can be written externally. An external grammar is defined in a file separate from the VoiceXML document file and is referenced by the src attribute of the <grammar> element. For example, the following field uses a grammar rule named Colors in an external XML grammar defined in the file partGrammar.grxml.
<field name="part">
  <grammar src="http://www.mySite/partGrammar.grxml#Colors"/>
  ...
The named rule (Colors in the preceding example) is the one the interpreter will use to start recognition. The specified file may include other grammar rules, which may be used as subrules of this rule.
The grammar for a menu choice can be specified explicitly with a <grammar> child of the <choice> element. Alternatively, a grammar can be generated automatically from the choice text.
If the accept attribute of the <menu> tag is set to approximate, the user can say a subset of the words in the choice text to select that choice. Adding this attribute to the preceding example allows the user to say "TV listings" or just "TV" to select the third choice:
<menu accept="approximate">
  ...
  <choice ...>
    national TV listings
  </choice>
</menu>
Note that the words must be spoken in the correct order; "listings, TV" would not be recognized. If you want some choices to be matched exactly and others to allow a subset of the words, you can specify the accept attribute on individual <choice> elements.
The speech-recognition engine uses active grammars to interpret user input. A field grammar is active whenever the interpreter is executing that field. A menu-choice grammar is active whenever the interpreter is executing the containing menu. A form grammar is active whenever the interpreter is executing the containing form.
A form grammar or the collection of choice grammars in a menu can optionally be made active at higher scopes:
| | A grammar with document scope is active whenever the interpreter is executing any dialog in the document. |
| | A grammar with application scope is active whenever the interpreter is executing any document in the application. |
If the interpreter is executing one dialog and the user's input matches an active grammar for a different dialog, control transfers to the latter dialog. If the grammar is in application scope, control might transfer to a dialog in a different document.
Note that within a field, you can temporarily turn off grammars from higher scopes by setting the field's modal attribute to true.
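For example, a field that collects a PIN might turn off all higher-scope grammars while it is active. A minimal sketch (the field name and prompt are illustrative):

<!-- While this field is active, only its own grammars can match -->
<field name="pin" type="digits" modal="true">
  <prompt>Please say your PIN.</prompt>
</field>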
The VoiceXML interpreter can throw a number of predefined events based on errors, telephone disconnects, or user input. For example:
- A no-input event is thrown if the user does not respond to a question.
- A no-match event is thrown when the user does not respond intelligibly--that is, when the user's utterance does not match any active grammar.
- A help event is thrown when the user requests help.
- An error event is thrown when any kind of error occurs.
An application can define additional events and can use a <throw> element to throw an event of a specified kind.
An application can catch an event and take the appropriate response in an event handler. A <catch> element is a general-purpose event handler; its event attribute specifies the kinds of event that it handles. Additional event-handling tags are syntactic shorthand: <noinput>, <nomatch>, <help>, and <error>. Each of these shorthand tags catches one type of event, indicated by its name. For example, a <nomatch> element catches no-match events.
When an event is thrown, the associated event handler, if one exists, is invoked. If the handler does not cause the application to terminate, execution resumes in the element that was being executed when the event was thrown.
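For example, the field from Simple Example could be extended with handlers for the most common events. A minimal sketch (the handler wording is illustrative):

<field name="num" type="number">
  <prompt>please say a number</prompt>
  <!-- Thrown if the caller says nothing before the timeout expires -->
  <noinput>
    I didn't hear anything. <reprompt/>
  </noinput>
  <!-- Thrown if the caller's utterance does not match the number grammar -->
  <nomatch>
    Sorry, I didn't understand. <reprompt/>
  </nomatch>
  <!-- Thrown when the caller asks for help -->
  <help>
    Say any number and I will compute its factorial.
  </help>
</field>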
For more information, see Chapter 3, Event Handling.
A link specifies a grammar that is independent of any particular dialog.
A <link> element defines a link. Each <link> element contains a <grammar> element. A link's grammar is active in the scope of the element that contains the link. For example, if the link is in a form, its grammar is active when the interpreter is executing that form. If a link is under a <vxml> element, its grammar has document scope; if the link is in the application root document, its grammar has application scope. Links in a <vxml> element can implement global behaviors.
A link can specify one of two possible actions to take if the speech-recognition engine detects a match with its grammar:
- The link can cause a transition to a different location; in that case, its next attribute specifies the destination of the transition. Links, like menu choices, can cause transitions to other dialogs or documents.
- The link can throw an event; in that case, its event attribute specifies the event to throw. After the event is handled, execution resumes with the element that was being executed when the link grammar was matched.
For example, the following link is defined at document level; its grammar is active whenever the interpreter is executing any dialog in the document. If the user says "operator," the link transfers control to a different document.
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <link next="operator_xfer.vxml">
    <grammar type="application/x-nuance-gsl">
      operator
    </grammar>
  </link>
  ...
A universal command is always available--the user can give the command at any point in an interaction. A universal grammar specifies user utterances that can be recognized as a universal command.
The following predefined universal grammars are available to all applications:
| Grammar | Description |
| help | The user wants help with the current interaction. |
| exit | The user wants to exit the application. |
| cancel | The user wants to cancel the current prompt or operation. |
| goback | The user wants to retract the last response and go back to an earlier part of the interaction. |
If one of these predefined universal grammars is activated and a user utterance matches the grammar, an event of the same name is thrown. For example, a help event is thrown when the user says "help."
An application creates its own universal command by defining and enabling a new universal grammar and implementing its response to the command.
To define a universal grammar, set the universal attribute in the <grammar> tag that defines the grammar for the command. The attribute value is a name that uniquely identifies the grammar among all universal grammars in the application. In the following example, the new universal grammar is named joke; the user utterance "Tell me a joke" will be a universal command when this universal grammar is activated.
<grammar universal="joke" type="application/x-nuance-gsl">
  (tell me a joke)
</grammar>
An application can activate any of the universal grammars to enable the corresponding universal commands. When a universal grammar is activated, a user utterance that matches the grammar is treated as a universal command.
All universal grammars are deactivated by default. The application can activate some or all universal grammars by setting the universals property. This property specifies which of the universal grammars should be active; all other universal grammars are deactivated.
- Set the universals property to all to activate all universal grammars (both predefined and application-defined):
  <!-- Activate help, exit, cancel, and goback -->
  <property name="universals" value="all" />
- Set the universals property to a space-separated list of grammar names to activate those universal grammars and deactivate all others:
  <!-- Activate only help, goback, and joke -->
  <property name="universals" value="help goback joke" />
- Set the universals property to none to deactivate all previously activated universal grammars in the current scope:
  <!-- Deactivate all universal grammars -->
  <property name="universals" value="none" />
Note: (VoiceXML 1.0 only) If the <vxml> tag's version attribute is 1.0, all universal grammars are activated by default.
A <link> element containing a universal grammar implements the application's response to the corresponding universal command. Your application can respond to the command in whatever manner is appropriate. Typically, the response is to throw an event or to transition to a different form.
If you throw an application-specific event, you must provide an event handler to take the appropriate action. For example:
<!-- Throw an event when the command is given -->
<link event="joke">
  <!-- Define the universal grammar -->
  <grammar universal="joke" type="application/x-nuance-gsl">
    (tell me a joke)
  </grammar>
</link>

<!-- Invoke a subdialog when the event is thrown -->
<catch event="joke">
  <subdialog name="joker" src="telljoke.vxml"/>
</catch>
You can use procedural logic, called executable content, within a few basic elements: <block>, <filled>, and event handlers. Within executable content, you can declare and assign values to variables, use simple conditional logic, perform iteration (a BeVocal VoiceXML extension), output speech or audio to the user, or run a JavaScript script.
Variables are declared by the <var> tag. Declarations can appear in a document, a form, or executable content. The <var> tag can optionally specify the variable's initial value; if it doesn't, the variable is initialized to undefined.
A variable has the scope of the element that contains its declaration:
- A variable has document scope if it is declared in a <vxml> element, or in a <block> or event handler that is a child of the <vxml> element. If the document is the application root document, then the variable has application scope.
  You can refer to a variable x with document scope either as x or document.x (for clarity or to resolve ambiguity). If the variable is in the application root document, then you can refer to it in other documents as application.x.
- A variable has dialog scope if it is declared in a <form> element, or in a <block> or <filled> element that is a child of a <form> element, or in an event handler that is a child of a <form> or <menu> element.
  You can refer to a variable x with dialog scope either as x or dialog.x.
- A variable has an anonymous scope, local to a field, if it is declared in an event handler or <filled> element that is a child of a <field> element.
If a <var> element specifies a variable that is already in scope, it does not declare a new variable with the same name, but simply assigns a value to the existing variable. If the <var> element has an expr attribute, the variable is assigned the specified value; otherwise, the variable is assigned the value undefined.
You can set a variable's value with the <assign> tag.
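For example, a minimal sketch (the variable name is illustrative); note that a string literal inside an expr attribute is quoted because expr is a JavaScript expression:

<!-- Declare a variable with an initial value -->
<var name="greeting" expr="'Hello'"/>
<!-- Later, give it a new value -->
<assign name="greeting" expr="greeting + ' world'"/>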
VoiceXML variables are in all respects equivalent to JavaScript variables--they are part of the same namespace. For additional information, see Scripts.
You can use an <if> element to execute a block of code if a condition is satisfied. Within that element, you can use a sequence of <elseif> elements to execute alternative blocks of code if all previous conditions failed and the condition of the <elseif> element is satisfied. You can use an <else> element to execute an alternative block of code if all previous conditions failed.
The conditions in <if> and <elseif> elements are expressed as Boolean-valued JavaScript expressions.
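For example, the following sketch speaks a different message depending on the value of num (the thresholds and wording are illustrative); note that <elseif> and <else> are written as empty tags:

<if cond="num > 100">
  <prompt>That is a big number.</prompt>
<elseif cond="num > 10"/>
  <prompt>That is a medium-sized number.</prompt>
<else/>
  <prompt>That is a small number.</prompt>
</if>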
You can use the BeVocal VoiceXML extension <bevocal:foreach> to execute the contained elements once for each element of a specified array.
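A minimal sketch, assuming the bevocal namespace prefix is declared on the <vxml> tag and assuming item and array attributes like those of the <foreach> element later standardized in VoiceXML 2.1; see the tag reference for the exact BeVocal syntax:

<!-- Speak each element of the flavors array in turn -->
<var name="flavors" expr="new Array('vanilla', 'chocolate', 'strawberry')"/>
<bevocal:foreach item="flavor" array="flavors">
  <prompt><value expr="flavor"/></prompt>
</bevocal:foreach>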
A <prompt> or <reprompt> element generates speech output; an <audio> element plays a prerecorded audio clip. The <value> tag evaluates an expression and produces spoken output of the result.
Prompts can appear in executable contents as well as in elements for collecting user input. Anywhere a <prompt> is valid, text is interpreted as a prompt even if the enclosing <prompt> and </prompt> tags are omitted.
An input item and the <initial> item of a mixed-initiative form each have a prompt counter that lets you play different prompts if the user revisits the item several times. For example, you may want to play shorter descriptions after the first or second time the user is prompted for the same information. The prompt counters are reset on each form invocation.
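For example, the standard count attribute selects among alternative prompts based on the prompt counter. A minimal sketch (the wording is illustrative):

<field name="num" type="number">
  <!-- Played on the first and second visits to the field -->
  <prompt count="1">Please say the number whose factorial you want computed.</prompt>
  <!-- Played on the third and later visits -->
  <prompt count="3">A number, please.</prompt>
</field>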
A <script> element executes a JavaScript script, which is run in the scope of the parent element. A <script> element can also define functions that can be called by JavaScript expressions in the same scope.
VoiceXML variables are equivalent to JavaScript variables and are part of the same namespace. VoiceXML variables can be used in a script just as variables defined in a <script> element can be used in VoiceXML. Declaring a variable using a <var> element is equivalent to using a var statement in a <script> element.
If your JavaScript expression contains any of the characters "<", ">", or "&", that character must be escaped. Inside a <script> element, you can do so in one of two ways. You can replace the individual characters with the corresponding escape sequences "&lt;", "&gt;", and "&amp;"; this may result in code that is difficult to read. Alternatively, you can place the entire script inside a CDATA section. For example, either of the following is correct:

<script>
  function factorial(n) {
    return (n &lt;= 1) ? 1 : n * factorial(n-1);
  }
</script>

<script>
  <![CDATA[
  function factorial(n) {
    return (n <= 1) ? 1 : n * factorial(n-1);
  }
  ]]>
</script>
You might argue that the second is a little easier to read.
VoiceXML supports both application-directed and mixed-initiative interactions with a user.
In an application-directed (or simply directed) interaction, the application prompts for the information it needs and the user supplies the requested information by answering the prompts. The application controls the interaction; the user cannot volunteer information. To be more accurate, the application does not understand volunteered information:
- If the application is executing a form, the only active grammar is the one for the current field of the form. The only valid user input is one that provides a value for the current field's variable.
- If the application is executing a menu, the only active grammars are the grammars of the menu's choices. The only valid user input is one that selects a choice for the current menu.
In a mixed-initiative interaction, the user and the application both participate in determining what the application does next. A single utterance from the user may provide input for multiple input variables in a form. In response to a prompt in one dialog, the user may provide input that matches a grammar defined in a different form. When this happens, the interpreter transitions to that dialog and fills its input variables from the user input. Similarly, the user may provide input that selects a choice from a different menu or that matches a link grammar, causing a transition to the destination specified by that choice or link.
If an application does not use links or grammars with document or application scope, it may still include mixed-initiative forms. A mixed-initiative form includes a form grammar. It can include an <initial> element to control the initial interaction in the form. This element can request user input or perform other non-interactive initialization tasks. In response to a prompt from the <initial> element, the user could provide input that fills in multiple input variables. If the form prompts for individual fields, any user input that matches the form grammar is valid--even if that input does not fill in the field for which the user was just prompted.
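For example, a travel form might combine a form-level grammar with an <initial> element. A minimal sketch (the grammar file and field names are illustrative):

<form id="travel">
  <!-- Form grammar that can fill both fields from a single utterance -->
  <grammar src="travel.grxml"/>
  <initial name="start">
    <prompt>What trip would you like to take?</prompt>
  </initial>
  <field name="fromCity">
    <prompt>Where are you leaving from?</prompt>
  </field>
  <field name="toCity">
    <prompt>Where are you going?</prompt>
  </field>
</form>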
Note: Fewer speech-recognition errors occur in directed interactions than in mixed-initiative interactions.
Execution within a VoiceXML document flows in document order until a dialog (form or menu) is entered. Execution flows from the current dialog to a different dialog or document, based on either:
- An explicit transition statement in the current dialog.
- Speech recognition in the current dialog that causes a transition to a different dialog.
In addition, execution can temporarily leave the current dialog to execute a subdialog, returning to the current dialog when execution of the subdialog is complete.
If the current dialog completes execution without transitioning to a different location, the application exits. In addition, you can use an <exit> element to end the application explicitly.
You can set up explicit transitions to other dialogs or documents in your application using <goto> or <submit> tags. These transition elements can be placed inside <block> or <filled> elements or event handlers.
The <goto> element lets you transition to another input item in the current form, to another dialog in the current document, or to another document. When you make the transition to the new location, the local variables from the old form or document are lost. This happens even if you transition to the same form you were in before. However, the values of local variables are not affected when you use <goto> to transition between items within a form.
The <submit> tag lets you pass values to another document using an HTTP GET or POST request. Since you use a URI to specify the next document, it need not be a VoiceXML document; for example, it could be a CGI script document.
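For example, a minimal sketch of both kinds of transition (the dialog name and server URL are illustrative):

<!-- Transition to another dialog in the current document -->
<goto next="#confirmOrder"/>

<!-- Send variable values to a server script with an HTTP POST request -->
<submit next="http://www.example.com/order.cgi" method="post" namelist="num result"/>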
User input to a dialog may cause a transition to a different location:
- If the speech-recognition engine matches the grammar of a menu's <choice> element that has a next or expr attribute, the interpreter transitions to the destination specified by that attribute.
- If the speech-recognition engine matches the grammar of a <link> element that has a next or expr attribute, the interpreter transitions to the destination specified by that attribute.
- If the speech-recognition engine matches a grammar with document or application scope that is defined in a different dialog, the interpreter transitions to that dialog.
A subdialog is a reusable VoiceXML dialog that you can pass data to and get return values from:
- The current dialog passes control to a subdialog with a <subdialog> element. It can pass data to the subdialog with <param> elements inside the <subdialog> element.
- A subdialog returns control to the calling dialog with the <return> element. It can pass values back using the namelist attribute of the <return> element, as shown in the sketch below.
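A minimal sketch (the file, field, and parameter names are illustrative):

<!-- In the calling dialog: invoke the subdialog and pass it data -->
<subdialog name="pinCheck" src="getpin.vxml">
  <param name="maxTries" expr="3"/>
  <filled>
    <!-- Returned values are properties of the subdialog's variable -->
    <prompt>Thanks. You entered <value expr="pinCheck.pin"/>.</prompt>
  </filled>
</subdialog>

<!-- In getpin.vxml: return control, passing back the pin variable -->
<return namelist="pin"/>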
At any moment, the VoiceXML interpreter is either waiting for input in an input item, such as a field, or transitioning between input items in response to some input. In this sense, input can be a spoken user utterance, a series of DTMF key presses, or an input-related event such as invalid input. What happens in the waiting and transitioning states is rather intertwined.
While waiting for input (also referred to as being in a recognition state), the interpreter is listening for and attempting to match spoken utterances or DTMF key presses against the currently active grammars.
When the interpreter listens for speech input, it constantly compares the incoming audio stream to all active grammars, looking for a match. At some point after the user stops talking, the interpreter decides whether the input is valid. The timing for this is controlled by several properties; the properties are different for spoken grammars and for DTMF grammars. For details on how these properties interact, see Chapter 12, Properties.
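For example, two of the standard timing properties. A minimal sketch (the values are illustrative):

<!-- Silence required after a complete speech match before the result is accepted -->
<property name="completetimeout" value="1s"/>
<!-- Maximum pause allowed between DTMF key presses -->
<property name="interdigittimeout" value="3s"/>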
While transitioning between input items, the interpreter completely ignores spoken utterances. If the property bevocal.dtmf.flushbuffer is set to false, then it does listen for DTMF key presses. It queues (or buffers) any key presses for the next recognition state and it keeps track of timing information for the key presses. The interpreter also queues asynchronously generated events that are not related directly to execution of the transition (such as the user hanging up).
During this transitioning state, prompts and audio are queued to be played and a program's executable content is run. Prompts get played either at the start of the next waiting state or sometimes when the interpreter goes off to fetch a resource, such as another document. For details on fetching resources, see Chapter 4, Fetching and Caching Resources.
At the beginning of a waiting state, there may be DTMF key presses queued during the previous transitioning state. By default, those key presses are not available for the waiting state to use for recognition. If you do want to use those keys, set the bevocal.dtmf.flushbuffer property to false.