This chapter describes the syntax for defining a grammar using the XML format. This format uses XML elements to represent the grammar constructs. The structure and possible content of a grammar are almost the same, whether the grammar occurs inline in the <grammar> tag of a VoiceXML document or externally as a separate grammar file referenced by a URI from the <grammar> tag of a VoiceXML document.
A grammar consists of a header followed by a body. The header is information relevant to defining the grammar as a whole. The body is a set of rule definitions that are used to match against user input. All the rules defined in the grammar are active only when the grammar is active.
The current W3C specification for this syntax is the Speech Recognition Grammar Specification, http://www.w3c.org/TR/2002/CR-speech-grammar-20020626. That specification defines the syntax for both ABNF and XML grammars and guarantees that these two formats are semantically equivalent. That is, you can represent exactly the same set of utterances in either grammar format. See Chapter 5, ABNF Grammar Format for details of the ABNF syntax.
This chapter contains the following sections:
| | XML Tags |
| | Differences between Inline and External Definitions |
| | Comments |
| | Header |
| | Rule Definitions |
| | Rule Expansions |
An XML grammar, like HTML and VoiceXML, uses markup tags and plain text. A tag is a keyword enclosed by the angle bracket (< and >) characters. A tag may have attributes inside the angle brackets. Each attribute consists of a name and a value, separated by an equals sign (=); the value must be enclosed in quotes.
Tags occur in pairs; corresponding to the start tag <keyword> is the end tag </keyword>. Between the start and end tag, other tags and text may appear. Everything from the start tag to the end tag is called an element. If one element contains another, the containing element is called the parent element of the contained element. The contained element is called a child element of its containing element. The parent element may also be called a container.
If an element contains no child elements, you can omit the end tag by replacing the final ">" of the start tag with "/>". Such elements are called empty elements.
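As a small illustration of these terms, the fragment below shows an element with an attribute, a parent with child elements, and an empty element. The rule name pet is only a placeholder chosen for this sketch.

```xml
<!-- <rule> is an element; id is an attribute whose value is in quotes -->
<rule id="pet">
  <!-- <one-of> is the parent (container) of the two <item> child elements -->
  <one-of>
    <item>cat</item>
    <item>dog</item>
  </one-of>
</rule>

<!-- An empty element: it has no children, so the end tag is folded into "/>" -->
<ruleref uri="#pet"/>
```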
An inline grammar is defined completely within the <grammar> element in a VoiceXML document. An external grammar, on the other hand, is defined completely in an external file and referenced in the VoiceXML document.
The recognized extensions for an external XML grammar file are .grxml and .xml. Because the .xml extension can be used for any XML file, not just a grammar file, .grxml is the preferred extension for XML grammar files.
When you include an XML grammar definition directly in your VoiceXML document, you put all of the attributes on the enclosing VoiceXML <grammar> tag. Also, an inline XML grammar inherits the XML prolog of the VoiceXML document; it does not have one of its own. So, an inline XML grammar looks like:
<grammar ...usage and header attributes...>
...grammar header elements...
...grammar rule definitions...
</grammar>
On the other hand, an external XML grammar file must contain a valid XML prolog. An external XML grammar file also contains its own <grammar> element. Remember that this <grammar> element is only for defining the grammar, not for specifying its usage. Consequently, it can have fewer attributes than its inline cousin:
...XML prolog...
<grammar ...header attributes only...>
...grammar header elements...
...grammar rule definitions...
</grammar>
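To make the external layout concrete, here is a minimal sketch of a complete external grammar file. The file name pets.grxml, the rule name pet, and the UTF-8 encoding are assumptions made for the illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical external file pets.grxml: it has its own prolog, and its
     <grammar> element carries header attributes only -->
<grammar version="1.0" xml:lang="en-US" mode="voice" root="pet"
         xmlns="http://www.w3.org/2001/06/grammar">
  <!-- Body: a single rule definition -->
  <rule id="pet" scope="public">
    <one-of>
      <item>cat</item>
      <item>dog</item>
    </one-of>
  </rule>
</grammar>
```

The inline form of the same grammar would drop the first line (the prolog comes from the VoiceXML document), and any usage attributes would be added to the same <grammar> tag.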
The following sections describe exactly what goes in the header and body for both inline and external XML grammars.
Comments may be placed in most places in a grammar definition. Use standard XML comments of the form:
<!-- this is a comment -->
The header of an XML grammar is split across attributes of the <grammar> element and some special elements that, if present, must be the first children of the <grammar> element.
The header consists of the legal XML Prolog (only for an external grammar) and an appropriately constructed Root Element (always). The root <grammar> element can have attributes specifying the following information:
| | XML Version |
| | XML namespace |
| | Schema attributes |
| | Language |
| | Grammar Mode |
| | Root Rule |
| | Tag Format |
| | Base URI |
The XML version, XML namespace, grammar version, and language attributes are required; all of the other attributes are optional. If the grammar is inline, the root <grammar> element may include additional attributes described in VoiceXML <grammar> element.
The <grammar> element can contain any number of the following elements in any order:
| | Pronunciation Lexicon (any number) |
| | Meta and HTTP-Equiv (any number) |
| | Metadata (any number) |
If the definition contains any of these subelements, they must all occur before any rule definitions.
These attributes and subelements of the <grammar> element constitute the header of the document. The rest of the subelements of the <grammar> are the actual rule definitions and constitute the body of the document.
The following is an example header for an external grammar file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN"
"http://www.w3.org/TR/speech-grammar/grammar.dtd">
<grammar version="1.0"
xml:lang="en"
mode="voice"
root="myRule"
xmlns="http://www.w3.org/2001/06/grammar"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/06/grammar
http://www.w3.org/TR/speech-grammar/grammar.xsd"
xml:base="http://www.example.com/base-file-path">
The XML prolog contains the XML declaration and an optional DOCTYPE declaration referencing the grammar DTD. The XML prolog may also contain XML comments, processing instructions, and other content.
If the grammar is an inline grammar, then it inherits its prolog from the prolog for the VoiceXML document as a whole; that is, an inline XML grammar cannot have a separate XML prolog. An external XML grammar file, on the other hand, must start with its own XML prolog.
The required version number of the XML declaration indicates which version of XML is being used; currently, its value must be 1.0. The encoding attribute indicates the scheme used for encoding character data in the document. For example, for US applications it would be common to use US-ASCII, UTF-8 (8-bit Unicode) or ISO-8859-1. For Japanese grammars, character encodings such as EUC-JP and UTF-16 (16-bit Unicode) could be used. The declaration of the character encoding is optional but strongly recommended.
The following are examples of XML headers with and without the character encoding declaration.
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml version="1.0" encoding="EUC-JP"?>
<?xml version="1.0"?>
The optional DOCTYPE declaration, if present, must be as follows:
<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN"
"http://www.w3.org/TR/speech-grammar/grammar.dtd">
Note: To be a legal XML document, the first 4 characters of any XML file (including an external XML grammar file) must be:
<?xm
No characters, not even whitespace characters such as space or newline, can come before these 4 characters in an external grammar file.
The root element of the grammar definition is a <grammar> element. The attributes of this element and some of its subelements constitute the rest of the grammar's header. There are four standard attributes whose values, if present, never change: version, xmlns, xmlns:xsi, and xsi:schemaLocation.
So, in an external file, these standard attributes would be as follows:
<grammar version="1.0"
xmlns="http://www.w3.org/2001/06/grammar"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/06/grammar
http://www.w3.org/TR/speech-grammar/grammar.xsd">
...
</grammar>
The other header attributes are described in the following sections.
In addition to the version attribute of the XML declaration, you must also specify the version attribute for the grammar. This attribute specifies the version of the grammar format; it does not let you specify a version for a grammar you write. This attribute is required for all XML grammars; currently, its value must be 1.0.
The xml:lang attribute indicates the primary language contained in the document and optionally indicates a country or other variation. Additionally, any legal rule expansion may be labeled with a language identifier to override the language for that expansion. The format is:
<grammar ... xml:lang="lang">
Currently, the supported values for lang are en, en-US, es, es-US, and fr-ca.
This attribute is ignored for DTMF grammars; it is required for voice grammars.
The mode of a grammar indicates whether the grammar is for recognizing speech or DTMF input. The default mode is voice. A single grammar cannot recognize both DTMF and user speech. If you want to have both recognized in a single recognition state, you must have multiple grammars active in that state. The format is:
<grammar ... mode="theMode">
where theMode is either voice or dtmf.
In a DTMF grammar, the language header is ignored and any language attachments found in rules are ignored.
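The following sketch shows one way to accept both speech and key presses in the same recognition state, by placing a voice grammar and a DTMF grammar side by side. The field name, the rule names, and the choice of keys 1 and 2 are assumptions made for this illustration; how the application maps the keys to answers is not shown.

```xml
<!-- Hypothetical VoiceXML field with two inline grammars active together -->
<field name="answer">
  <!-- Voice grammar: xml:lang is required -->
  <grammar version="1.0" xml:lang="en-US" mode="voice" root="yesno"
           xmlns="http://www.w3.org/2001/06/grammar">
    <rule id="yesno">
      <one-of>
        <item>yes</item>
        <item>no</item>
      </one-of>
    </rule>
  </grammar>
  <!-- DTMF grammar: matches the 1 key or the 2 key; no xml:lang is needed -->
  <grammar version="1.0" mode="dtmf" root="yesnoKeys"
           xmlns="http://www.w3.org/2001/06/grammar">
    <rule id="yesnoKeys">
      <one-of>
        <item>1</item>
        <item>2</item>
      </one-of>
    </rule>
  </grammar>
</field>
```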
As described in Chapter 1, Using VoiceXML Grammars, a reference to a grammar may or may not specify a rule from which to start recognition. If the reference does not specify a rule to start from, the grammar must specify a root rule; in this case, recognition will start from the root rule.
For an inline grammar, the grammar definition must always specify the rule to use as its root rule, even if the grammar contains only one rule. This requirement exists because there is no rule reference from which the interpreter can determine which rule to use.
For an external grammar, however, specifying the root rule for the grammar in the definition is optional.
If you specify a root rule in the external grammar definition, then when you reference the grammar, you do not have to include a rule name in the URI that references the grammar. If you do not specify a root rule in the grammar definition, then you must include a rule name in the URI (see Referencing Grammars).
Whether or not the grammar definition specifies a root rule, the URI that references the grammar can specify a different rule from which to start recognition.
For example, you might have a general-purpose grammar that you use for recognizing all kinds of animals. One rule might recognize all animals; that could be the specified root rule. However, within the grammar, you might also have a large number of rules for recognizing specific groups of animals--birds, fish, land mammals, marine mammals, coral, and so on.
You could use this same grammar in different recognition states in your VoiceXML document, specifying different rules in the grammar reference, to recognize more or less specific groups of animals. So, in one place you'd specify the rule as the one for all mammals, in another the rule for fish, and so on.
The rule specified as the root rule must be defined within the grammar. It can be scoped as either public or private. (For information on rule scoping, see Scoping of Rule Definitions.)
The format for specifying the grammar's root rule is:
<grammar root="rulename" ...>
where rulename is the name of a rule defined later in the grammar file.
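As a sketch of the animal example, the external grammar below declares a root rule and two more specific public rules. The file name animals.grxml, the rule names, and the vocabulary are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical external file animals.grxml -->
<grammar version="1.0" xml:lang="en-US" mode="voice" root="animals"
         xmlns="http://www.w3.org/2001/06/grammar">
  <!-- Root rule: recognition starts here when a reference names no rule -->
  <rule id="animals" scope="public">
    <one-of>
      <item><ruleref uri="#fish"/></item>
      <item><ruleref uri="#birds"/></item>
    </one-of>
  </rule>
  <!-- More specific rules that a reference can select by name -->
  <rule id="fish" scope="public">
    <one-of>
      <item>trout</item>
      <item>garibaldi</item>
    </one-of>
  </rule>
  <rule id="birds" scope="public">
    <one-of>
      <item>heron</item>
      <item>pelican</item>
    </one-of>
  </rule>
</grammar>
```

Under these assumptions, a reference to animals.grxml would start recognition at the animals rule, while a reference to animals.grxml#fish would start at the fish rule.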
The tag-format declaration is a URI that indicates the content type of all tags contained within the grammar. It should also indicate a version. Tags typically contain content for a semantic interpretation processor and in such cases the identifier, if present, should indicate the semantic processor to use. See Tags (Semantic Interpretations) for a discussion of tags.
<grammar tag-format="uri/ver" ...>
Currently, the only accepted value is semantics/1.0. The BeVocal VoiceXML interpreter ignores the value; it treats any tags it finds as semantic interpretations and expects them to be in the format described in Tags (Semantic Interpretations).
Relative URIs are resolved according to a base URI, which may come from a variety of sources. You can use the base URI attribute to specify a grammar's base URI explicitly. The path information specified by the base URI declaration only affects URIs in the grammar where the element appears. The format is:
<grammar ... xml:base="uri">
For example:
<grammar ... xml:base="http://www.myCompany.com/grammars/base-file-path">
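The sketch below illustrates how an explicit base URI would affect rule references inside the grammar, following standard relative URI resolution. The host names, file names, and rule names are placeholders.

```xml
<grammar version="1.0" xml:lang="en-US" root="pets"
         xmlns="http://www.w3.org/2001/06/grammar"
         xml:base="http://www.example.com/grammars/">
  <rule id="pets">
    <one-of>
      <!-- Relative URI: with the xml:base above it resolves to
           http://www.example.com/grammars/fish.grxml#butterflies -->
      <item><ruleref uri="fish.grxml#butterflies"/></item>
      <!-- Absolute URI: xml:base has no effect on it -->
      <item><ruleref uri="http://www.example.org/other/birds.grxml#parrots"/></item>
    </one-of>
  </rule>
</grammar>
```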
The VoiceXML interpreter calculates the base URI for resolving relative URIs according to the following precedences (highest priority to lowest):
A grammar may optionally reference one or more external pronunciation lexicon documents. A lexicon document is identified by a URI with an optional media type. A lexicon is a child element of the root <grammar> element. The format is:
<lexicon uri="URI" type="mediaType"/>
where URI identifies the location of the pronunciation lexicon and mediaType optionally identifies the media type of the lexicon document. All <lexicon> elements must occur before any <rule> elements.
Currently, the BeVocal VoiceXML interpreter ignores the <lexicon> element.
Metadata is information about the grammar definition itself, rather than information about the content of the grammar definition. You can specify metadata with the <meta> element, using either its name attribute or its http-equiv attribute, or with the <metadata> element. You can have as many of these elements as you want in your grammar.
The only predefined meta property name is seeAlso; it is used to specify a resource that might provide additional metadata information about the containing grammar; however, you can specify any name-value pair you want. Providing a meta property name lets you provide general document information such as author, copyright, description, keywords, and so on.
Providing an http-equiv lets you provide information to use as HTTP response headers. You can provide any http-equiv value you want; however, the only ones that the BeVocal VoiceXML interpreter recognizes are those that have to do with caching and fetching of resources. See Chapter 4, Fetching and Caching Resources in the VoiceXML Programmer's Guide.
The legal formats for the <meta> element are:
<meta name="propName" content="value"/>
<meta http-equiv="headerName" content="value"/>
where propName is the meta property name, headerName is an HTTP header, and value is an appropriate value for the property or header. For example:
<grammar ...>
<meta name="Creator" content="Jane Doe"/>
<meta name="seeAlso" content="http://example.com/my-grammar-metadata.xml"/>
<meta http-equiv="Expires" content="0"/>
<meta http-equiv="Date" content="Thu, 12 Dec 2000 23:27:21 GMT"/>
...
</grammar>
All <meta> elements must occur before any <rule> elements.
The <metadata> element is a container in which information about the document can be placed using a metadata schema.
Currently, the BeVocal VoiceXML interpreter ignores the <metadata> element.
The body of a grammar consists of a set of rule definitions or, simply, rules. The format of rule definitions is the same, regardless of whether they appear inline or in an external file.
Each rule definition associates a rule name with a rule expansion. The definition also defines the scope of the rule--whether it is local to the grammar in which it is defined or whether it may be referenced within other grammars. Finally, the XML format provides a special syntax for you to provide examples of matching user utterances.
The core purpose of a rule definition is to associate a legal rule expansion with a rule name. You use the <rule> element to define a rule:
<rule id="ruleName" scope="theScope">... ruleExpansion ...</rule>
The id attribute indicates the name of the rule. The content of the <rule> element is the rule expansion. See Scoping of Rule Definitions for a description of the scope attribute.
Because most grammars in VoiceXML identify a set of possible words that a user might say, the top-level rule expansion in a grammar rule is usually a set of alternatives. For example:
<rule id="fish">
<one-of>
<item>Garibaldi</item>
<item>"Nassau Grouper"</item>
<item>Trout</item>
</one-of>
</rule>
This rule is named fish, and the rule expansion is the set of alternatives. The rule is matched if the user says "Garibaldi", "Nassau Grouper", or "Trout".
The rule name is simply a string that identifies the rule. The rule name for each rule definition must be unique within the grammar. The same rule name may be used in multiple grammars.
In a rule reference (described in Rule References), you can use the rule name to specify a particular rule as the rule from which to start recognition.
A rule name is a case-sensitive character string that does not contain any of the characters:
. : -
In addition, a rule name cannot be the name of a special rule. That is, you cannot name your rule any of the following:
GARBAGE NULL VOID
Rules with the names GARBAGE, NULL, and VOID are predefined. Your grammar must not contain rules with these names. The special rules have the following meanings:
You can use the NULL and VOID rules together to dynamically change whether or not a rule is active. For example, assume your chamber of commerce application provides information that varies from season to season. During the winter, it answers questions about the amount of snow at the local ski resorts; during the summer, it answers questions about the hours of the local amusement parks; all year round, it answers questions about the plays at local theaters. You don't want the application to recognize questions about ski resorts during the summer.
You could completely change your grammars at every season. This might be difficult to maintain. You could ease the problem by organizing your grammars with a judicious combination of VOID and NULL rules. You would use the VOID and NULL rules to turn on and off recognition of the appropriate questions at the appropriate times of year.
The format you use to refer to a special rule is a particular instance of a rule reference. The <ruleref> element can either specify the special attribute, as shown here, or the uri attribute; it cannot specify both (see Rule References).
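As a sketch of the seasonal idea described above, the fragment below uses special rule references to switch alternatives on and off. It assumes the meanings given in the W3C specification: a NULL reference always matches without any input, and a VOID reference can never be matched. The rule names are hypothetical.

```xml
<!-- Summer configuration of a hypothetical chamber of commerce grammar -->
<rule id="question" scope="public">
  <one-of>
    <!-- Always available -->
    <item><ruleref uri="#theaterQuestion"/></item>
    <!-- Disabled: a sequence that contains VOID can never be matched -->
    <item><ruleref special="VOID"/> <ruleref uri="#skiQuestion"/></item>
    <!-- Enabled: NULL matches with no input, so only parkQuestion must be spoken -->
    <item><ruleref special="NULL"/> <ruleref uri="#parkQuestion"/></item>
  </one-of>
</rule>
```

For the winter configuration, you would swap the two special values instead of rewriting the rules themselves.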
Each defined rule has a scope of either private or public.
By default, a rule has private scope. The format is:
<rule id="ruleName" scope="theScope"> ...ruleExpansion... </rule>
where theScope is either public or private. If you do not provide the scope attribute, the rule is made private.
For example, the following set of definitions creates one public rule named snapper and two private rules named snapperType and fishColors:
<rule id="snapper" scope="public">
<ruleref uri="#snapperType"/>
<token>Snapper</token>
</rule>
<rule id="snapperType">
<token>Mutton</token>
<ruleref uri="#fishColors"/>
</rule>
<rule id="fishColors" scope="private">
<one-of>
<item>Black</item>
<item>Gray</item>
<item>Red</item>
</one-of>
</rule>
You should make public only the rules in your grammar that you want to be visible to other grammars. The use of private and public scope allows you to write more modular and maintainable grammars. For example, you can define a grammar that has a set of internal "worker" rules that are combined to provide a smaller number of externally accessible rules. Hiding worker rules prevents their accidental misuse.
Note: Do not confuse the scope of a rule with the scope of its containing grammar. The scope of the grammar indicates where in the VoiceXML application the grammar is active. The scope of the rule indicates whether or not the rule is available to other active grammars.
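To illustrate rule scope from the referencing side, the sketch below assumes that the snapper rules shown earlier are stored in a hypothetical external file named fish.grxml and are used from a second grammar. The dinner rule is also hypothetical.

```xml
<!-- A rule in another grammar that uses the rules from fish.grxml -->
<rule id="dinner">
  <item>I would like the</item>
  <!-- Allowed: snapper has public scope -->
  <ruleref uri="fish.grxml#snapper"/>
</rule>
<!-- Not allowed from outside fish.grxml, because these rules are private:
     <ruleref uri="fish.grxml#snapperType"/>
     <ruleref uri="fish.grxml#fishColors"/> -->
```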
You can write a rule that refers to itself either directly or indirectly through a rule that it references.
<!-- Rule that refers to itself directly -->
<rule id="digits">
  <one-of>
    <item><ruleref uri="#digit"/></item>
    <item><ruleref uri="#digit"/> <ruleref uri="#digits"/></item>
  </one-of>
</rule>
<!-- Rule that indirectly refers to itself -->
<rule id="nounPhrase">
  <ruleref uri="#noun"/>
  <ruleref uri="#prepositionalPhrase"/>
</rule>
<rule id="prepositionalPhrase">
  <ruleref uri="#preposition"/>
  <ruleref uri="#nounPhrase"/>
</rule>
You should be careful in writing rules that refer to themselves. Left-recursive rules are not supported. That is, you cannot define a rule whose first sequential subcomponent contains the rule itself:
<!-- Legal -->
<rule id="digits">
  <one-of>
    <item><ruleref uri="#digit"/></item>
    <item>
      <ruleref uri="#digit"/>
      <ruleref uri="#digits"/>
    </item>
  </one-of>
</rule>
<!-- Also legal -->
<rule id="digits">
  <one-of>
    <item>
      <ruleref uri="#digit"/>
      <ruleref uri="#digits"/>
    </item>
    <item><ruleref uri="#digit"/></item>
  </one-of>
</rule>
<!-- Illegal -->
<rule id="digits">
  <one-of>
    <item>
      <ruleref uri="#digits"/>
      <ruleref uri="#digit"/>
    </item>
    <item><ruleref uri="#digit"/></item>
  </one-of>
</rule>
<!-- Also illegal -->
<rule id="digits">
  <one-of>
    <item><ruleref uri="#digit"/></item>
    <item>
      <ruleref uri="#digits"/>
      <ruleref uri="#digit"/>
    </item>
  </one-of>
</rule>
This restriction ensures that the interpreter doesn't get lost down an infinite path trying to match a rule.
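When recursion is used only to express "one or more of something", repetition is often a simpler alternative that avoids the question of left recursion altogether. The following is a minimal sketch, assuming the repeat attribute on <item> as defined in the W3C specification; the shortDigits rule and the digit vocabulary are made up for the example.

```xml
<!-- One or more digits, written with repetition instead of recursion -->
<rule id="digits">
  <item repeat="1-"><ruleref uri="#digit"/></item>
</rule>
<!-- A bounded variant: between one and four digits -->
<rule id="shortDigits">
  <item repeat="1-4"><ruleref uri="#digit"/></item>
</rule>
<rule id="digit">
  <one-of>
    <item>one</item>
    <item>two</item>
    <item>three</item>
  </one-of>
</rule>
```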
It can be a great help to people using your grammar if you include with a rule examples of phrases that match the rule definition. The XML format supports a special syntax for including example phrases. You can provide any number of example phrases in each definition. Because the examples follow a specific syntax, instead of simply being free-form comments, automated tools for regression testing and grammar documentation can make use of the examples.
You include an example phrase in an <example> element. Any <example> elements must be before any other content in the <rule> element. For example:
<rule id="snapper" scope="public">
  <!-- matches several common varieties of snapper -->
  <example>red snapper</example>
  <example>mutton snapper</example>
  <ruleref uri="#snapperType"/>
  <token>Snapper</token>
</rule>
A rule expansion is the part of a rule definition that actually describes what utterances match the rule. A rule expansion is a token, a rule reference, a semantic interpretation, or an arbitrarily complex combination of these things. For example, a rule expansion might express any of these ideas:
You can change the language associated with any rule expansion; that is, with any token, rule reference, or combination. This allows you to change the language for a short time. The new language affects only the tokens of the expansion; it does not affect rule references or semantic interpretations.
The allowed languages are those described in Language. To change the language of a rule expansion, assign a value to the xml:lang attribute of the appropriate element. For example:
<!-- Default grammar language is English -->
<grammar ... xml:lang="en-US">
<!-- Single language attachment to tokens -->
<rule id="yes">
<one-of>
<item>yes</item>
<item xml:lang="es-US">si</item>
</one-of>
</rule>
<!-- Single language attachment to an expansion -->
<rule id="painters">
<one-of xml:lang="es-US">
<item>Frida Kahlo</item>
<item>Diego Rivera</item>
</one-of>
</rule>
This ability to change language will be further illustrated in later sections.
A token is the part of a rule expansion that names a word that a user might speak or a DTMF key that a user might press. If the mode of the grammar (see Grammar Mode) is voice, then all tokens must be voice tokens; conversely, if the mode of the grammar is dtmf, then all tokens must be DTMF tokens.
Voice tokens are words that a user can say. Any unmarked text is a token. A token that contains whitespace or other special characters can be enclosed in double quotes. Alternatively, the token can be contained in a <token> element, in which case it must contain only CDATA and is not enclosed in double quotes. The following table shows examples of tokens.
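| Token Type | Example |
| Single unquoted token | Garibaldi |
| Single unquoted token (a number) | 2 |
| Single quoted token containing whitespace | "Triton Trigger Fish" |
| Single quoted token without whitespace | "garibaldi" |
| Two unquoted tokens separated by whitespace | trigger fish |
| Single token in a <token> element | <token>San Francisco</token> |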
You can specify the language for an individual token, if that language is different from the language of the entire grammar. To do so, use this format:
<token xml:lang="lang">token</token>
where token is the token and lang is a language identifier, as described in Language.
To improve the portability of your grammar, you should follow some simple rules:
DTMF tokens are keys that a user can press. The DTMF tokens are as follows:
0 1 2 3 4 5 6 7 8 9 * # A B C D
In a DTMF grammar, any unmarked text must be a legal token. As in a speech grammar, tokens must be separated by white space.
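As an illustration of DTMF tokens in a complete grammar, the sketch below matches four digit keys followed by the pound key. The rule names are hypothetical, and the example assumes the exact-count form of the repeat attribute (repeat="4") from the W3C specification.

```xml
<grammar version="1.0" mode="dtmf" root="pin"
         xmlns="http://www.w3.org/2001/06/grammar">
  <!-- Exactly four digit keys followed by the pound key, for example 1 2 3 4 # -->
  <rule id="pin" scope="public">
    <item repeat="4"><ruleref uri="#digit"/></item>
    <item>#</item>
  </rule>
  <!-- Each alternative is a single DTMF token -->
  <rule id="digit">
    <one-of>
      <item>0</item> <item>1</item> <item>2</item> <item>3</item> <item>4</item>
      <item>5</item> <item>6</item> <item>7</item> <item>8</item> <item>9</item>
    </one-of>
  </rule>
</grammar>
```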
Every rule has a name that is unique within the grammar. You can refer to a rule from within another rule either in the same or a different grammar definition. In summary, the formats for doing so are:
See External Grammar Files for a description of how relative URIs work in grammar files. See Special Rules for information on the special rules.
When a grammar rule contains a rule reference, the effect is the same as if the referenced rule's rule expansion appeared in place of the rule name. For example, here the PrimaryColors grammar rule refers to the Shades rule:
<rule id="PrimaryColors">
<item repeat="0-1"><ruleref uri="#Shades"/></item>
<one-of>
<item>red</item>
<item>blue</item>
<item>green</item>
</one-of>
</rule>
<rule id="Shades">
<one-of>
<item>dark</item>
<item>light</item>
</one-of>
</rule>
The PrimaryColors rule could also be written as:
<rule id="PrimaryColors">
<item repeat="0-1">
<one-of>
<item>dark</item>
<item>light</item>
</one-of>
</item>
<one-of>
<item>red</item>
<item>blue</item>
<item>green</item>
</one-of>
</rule>
When referencing rules defined in the same grammar as the reference, you should always use a simple rule name reference which consists of the local rule name only.
The <ruleref> is always an empty element with a uri attribute. Here, the value of the uri attribute is simply the rule name prefixed by a # character (sometimes called a URI fragment). For example:
<ruleref uri="#fish"/>
<ruleref uri="#marineMammals"/>
You can reference rules defined in a different grammar. Here, the other grammar must be an external grammar which you can reference by its URI.
From one grammar, you cannot reference a rule in another grammar if that other grammar is of a different mode. That is, you cannot reference a rule in a DTMF grammar from a rule in a speech grammar or vice-versa.
You can optionally specify a specific rule in that grammar from which to start recognition. If you do not specify a rule, then the referenced grammar must itself specify a root rule.
You can optionally specify a media type, indicating the content type of the referenced grammar. For details on accepted media types, see Grammar Formats.
You can optionally specify a language, indicating the language of the referenced grammar. For details on accepted languages, see Language. XML, ABNF, and JSGF grammars all require that a grammar declare its language. For these grammar formats, any language you specify with a rule reference is ignored. The GSL format does not provide a way to specify the language of a grammar; if a rule reference to a GSL grammar includes the xml:lang attribute, that attribute is used by the interpreter.
An external reference has one of the following formats:
<ruleref uri="uri"/>
<ruleref uri="uri" type="mediaType"/>
<ruleref uri="uri" xml:lang="lang"/>
where uri is a standard URI (optionally followed by a #ruleName fragment to indicate a rule in the grammar), mediaType is one of the media types described in Grammar Formats, and lang is one of the languages described in Language. For example:
<!-- Reference to a specific rule of an external grammar -->
<ruleref uri="../fish.gram#butterflies"/>
<!-- Reference to the root rule of an external grammar -->
<ruleref uri="http://www.myCompany.com/grammars/fish.gram"/>
<!-- References with associated media types -->
<ruleref
  uri="http://www.myCompany.com/grammars/fish#butterflies"
  type="application/srgs"
/>
<ruleref uri="../fish" type="application/srgs"/>
<!-- Reference with an associated media type and language -->
<ruleref
  uri="http://www.myCompany.com/grammars/animals#butterflies"
  type="application/x-nuance-gsl"
  xml:lang="en-US"
/>
XML grammars combine tokens and rule references into more complex expressions. The basic types of combination are:
A set of alternatives matches if the caller says one of the things in the set. You specify a set of alternatives with the <one-of> element. You use the <item> element to indicate each alternative in the set. For example:
<one-of>
  <item>cat</item>
  <item>dog</item>
  <item>fish</item>
</one-of>
You can have as many alternatives as you want in the set.
You can optionally specify a different language for the set of alternatives. You specify a language with the same syntax as described in Language. That is, you add the xml:lang attribute to the <one-of> element or the appropriate <item> element. These two <one-of> elements express the same thing:
<one-of xml:lang="es">
  <item>Diego</item>
  <item>Rivera</item>
</one-of>
<one-of>
  <item xml:lang="es">Diego</item>
  <item xml:lang="es">Rivera</item>
</one-of>
You can optionally provide a weight for any number of the alternatives. A weight indicates how likely a particular alternative is. You specify a weight as a positive floating point number, such as 2, 2.5, 0.8, or .4. A weight of 1 is the same as not specifying a weight at all. A weight larger than 1 indicates that the alternative is more likely; a weight less than 1 indicates that the alternative is less likely.
You specify a weight by specifying the weight attribute on the <item> element. For example:
<one-of>
  <item weight="3.1415">cat</item>
  <item>dog</item>
  <item weight=".25">fish</item>
</one-of>
<one-of>
  <item weight="10">fish</item>
  <item weight="2">angel fish</item>
  <item weight=".1">anthia</item>
</one-of>
The first of these says that it is quite likely the user will say "cat", less likely but still fairly likely the user will say "dog", and not very likely the user will say "fish". The second says the user will almost certainly say "fish", might say "angel fish", and probably will not say "anthia".
A sequence is a set of expansions that must all be said in the order specified. Sequence is not a separate piece of syntax; rather, think of sequence as the natural order of things. That is, the interpreter always uses all of the expressions in the order presented, unless the expression specifies alternation. Some examples:
<!-- sequence of tokens -->
what is coral
<!-- sequence of rule references -->
<ruleref uri="#question"/> <ruleref uri="#subject"/>
<!-- sequence of tokens and rule references -->
<ruleref uri="#subject"/> is <ruleref uri="#type"/>
Grouping is merely a way to treat a set of things as a single term. You use grouping, for example, to attach a language identifier or repeat operator (next) to the whole group. The <item> element indicates a group. For example:
<item>this is a group</item>
You can specify that an expression should be repeated some particular number of times. The syntax for this allows you to specify a variety of repetition types:
Repetition is indicated by use of the repeat attribute on the appropriate element. For example:
<rule id="gear">
  <one-of>
    <item>mask</item> <item>fins</item> <item>snorkel</item>
    <item>booties</item> <item>gloves</item> <item>regulator</item>
  </one-of>
</rule>
<rule id="action">
  <one-of> <item>buy</item> <item>rent</item> </one-of>
</rule>
<rule id="makeRequest">
  <item>I want to</item>
  <ruleref uri="#action"/>
  <item repeat="1-5"><ruleref uri="#gear"/></item>
  <item repeat="0-1">
    <token>and</token>
    <ruleref uri="#gear"/>
  </item>
</rule>
This set of rules matches a variety of utterances, including:
I want to buy gloves.
I want to rent mask and fins.
I want to rent mask, fins, snorkel, and regulator.
Although you're allowed to create a rule that recognizes an unbounded number of expressions, users do not actually speak forever. The speech recognition will proceed more effectively if you are more careful and only indicate a limited range of occurrences.
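The fragment below sketches the common forms of the repeat attribute side by side, including a bounded range in place of an open-ended one. It assumes the repeat values defined in the W3C specification (an exact count, a range n-m, and an open range n-); the word "please" and the digit rule are placeholders.

```xml
<!-- Optional: zero or one occurrence -->
<item repeat="0-1">please</item>
<!-- Exactly four occurrences -->
<item repeat="4"><ruleref uri="#digit"/></item>
<!-- Bounded range: between two and four occurrences -->
<item repeat="2-4"><ruleref uri="#digit"/></item>
<!-- Open range: three or more occurrences; prefer a bounded range where you can -->
<item repeat="3-"><ruleref uri="#digit"/></item>
```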
You can attach a probability to a repeat operator. The value indicates the probability of successive repetition of the repeated expression. A repeat probability must be in the range between 0.0 and 1.0; note that this is different from a weight attached to an entire expression.
A simple example is an optional expansion (zero or one occurrences) with a probability, for example, of 0.6. The grammar indicates that the chance that the expansion will be matched is 60% and that the chance that the expansion will not be present is 40%.
You can only use a repeat probability when specifying a range of repetitions. The syntax is:
repeat="n-m" repeat-prob="prob"
repeat="n-" repeat-prob="prob"
That is, you use the repeat-prob attribute to specify the repeat probability; you can only use this attribute if you also use the repeat attribute. Here, n and m are integers and prob is the probability.
<!-- The word "angel" is optional and is not very likely to occur. -->
<item repeat="0-1" repeat-prob="0.25">angel</item>
<!-- The rule reference to digit must occur between 2 and 4 times -->
<!-- and it is very likely it will occur 3 or 4 times. -->
<item repeat="2-4" repeat-prob="0.8">
  <ruleref uri="#digit"/>
</item>
In general, a tag is an arbitrary string that may be included inline within any rule expansion. You can include as many tags as you want within a single expansion. Tags do not affect what constitutes a legal utterance for a rule nor do they affect how the recognition proceeds.
You use tags to return information (a semantic interpretation) about a recognition to the element that invoked the grammar. Upon successful recognition, the BeVocal VoiceXML interpreter creates a JavaScript object whose properties and values are determined by the tags occurring in the matched rule.
The BeVocal VoiceXML interpreter implements a subset of the W3C specification for Semantic Interpretation for Speech Recognition (http://www.w3.org/TR/semantic-interpretation/). The following functionality from this specification is not currently implemented:
| | Support for attachment of ECMAScript to tags (section 3.2.1) |
| | Use of $ syntax for accessing a grammar rule's Rule Variable |
If you want the BeVocal VoiceXML interpreter to make use of your tags for this purpose, they must be in a specific format. The following is the basic format, where property and value are arbitrary names and values:
<tag>property="value" </tag>
The following format allows the grammar to return multiple slots for a spoken utterance:
<tag>property1="value1";property2="value2";property3="value3"; </tag>
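As a sketch of how the multiple-slot format might appear inside a rule, the following example fills two slots from a single spoken phrase. The rule name, phrases, and slot names (size and drink) are hypothetical.

```xml
<rule id="order">
  <one-of>
    <!-- Each alternative returns two slots, size and drink -->
    <item>
      large coffee
      <tag>size="large";drink="coffee";</tag>
    </item>
    <item>
      small tea
      <tag>size="small";drink="tea";</tag>
    </item>
  </one-of>
</rule>
```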
The following format allows the grammar to return the slot value without a name. The browser constructs an internal magic slot for this value and assigns it to the appropriate form input item.
<tag>
"value"
</tag>
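A minimal sketch of the unnamed format in context follows. The rule name and the returned values are assumptions made for illustration; each alternative returns a bare value with no property name.

```xml
<rule id="yesno">
  <one-of>
    <!-- Synonyms return the same unnamed value -->
    <item>yes <tag>"true"</tag></item>
    <item>yeah <tag>"true"</tag></item>
    <item>no <tag>"false"</tag></item>
  </one-of>
</rule>
```

Because the value has no name, a grammar like this is best suited to an input item that collects a single piece of information.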
For example, the following grammar sets two different tags:
<grammar ...>
  <rule id="coloredObject">
    <ruleref uri="#color"/>
    <ruleref uri="#object"/>
  </rule>
  <rule id="color">
    <one-of>
      <item> red <tag> color="red" </tag> </item>
      <item> pink <tag> color="red" </tag> </item>
      <item> yellow <tag> color="yellow" </tag> </item>
      <item> canary <tag> color="yellow" </tag> </item>
      <item> green <tag> color="green" </tag> </item>
      <item> khaki <tag> color="green" </tag> </item>
    </one-of>
  </rule>
  <rule id="object">
    <one-of>
      <item>
        <tag> object="vehicle" </tag>
        <one-of><item>truck</item><item>car</item></one-of>
      </item>
      <item>
        <tag> object="toy" </tag>
        <one-of><item>ball</item><item>block</item></one-of>
      </item>
      <item>
        <tag> object="clothing" </tag>
        <one-of><item>shirt</item><item>blouse</item></one-of>
      </item>
    </one-of>
  </rule>
</grammar>
This grammar recognizes phrases such as "yellow shirt" or "canary blouse". For both of those phrases, it will return the same semantic interpretation:
{
  color: "yellow",
  object: "clothing"
}
This simple example shows how a grammar can accept synonyms and return a canonical result that can be used later in your VoiceXML application.
See Setting Input Variables for information on how the interpreter will use the semantic interpretation.
Part No. 520-0004-02 | © 1999-2007, BeVocal, Inc. All rights reserved