3   Defining Grammars

An application grammar is one that you define from scratch for your application. Although the built-in grammars are very useful, you typically need to also define grammars of your own. BeVocal VoiceXML supports grammars specified in the following formats:

 •  XML form of the W3C Speech Recognition Grammar Format
 •  Augmented BNF (ABNF) form of the W3C Speech Recognition Grammar Format
 •  Nuance Grammar Specification Language (GSL)
 •  Java Speech Grammar Format (JSGF)

The following chapters describe each of these formats in detail. This chapter describes general information about grammars you define.

A simple grammar can be defined directly in the VoiceXML document. An inline grammar is defined within the <grammar> element itself; an inline grammar is sometimes called an embedded grammar. For example, the following field uses an inline GSL grammar that matches either the word "add" or the word "subtract". (In GSL, square brackets enclose a set of alternatives, whereas parentheses enclose a sequence.)

 <field name="operator">
   <grammar> 
     [add subtract]
   </grammar>
   ...
 </field>

Here, if the user says "add," the input variable operator is set to add.

More complex grammars are typically written externally. An external grammar is defined in a file separate from the VoiceXML document file and is referenced by the src or expr attribute of the <grammar> element. For example, the following field uses a grammar rule named Colors in an external ABNF grammar defined in the file partGrammar.gram:

 <field name="part">
   <grammar src="http://www.mySite/partGrammar.gram#Colors"/>
   ...
 </field>

The interpreter uses the named rule (Colors in this example) as the starting point for recognition. The specified file may include other grammar rules. Depending on the grammar definition, some of these rules (the public ones) could also be named as the starting point for recognition. Other rules (the private ones) cannot be used as the starting point, but are instead used by other rules in the grammar. Rules used by other rules are sometimes referred to as subrules or subgrammars.

The grammar for a menu choice can be specified explicitly with a <grammar> child of the <choice> element. Alternatively, a grammar can be generated automatically from the choice text.
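
As a rough illustration of the two options, the following menu sketch gives the first choice an explicit inline GSL grammar and leaves the second choice to the automatically generated grammar. The menu id, the target anchors (#sports and #weather), and the phrases are made up for this example.

```xml
<menu id="news">
  <prompt>Say sports or weather.</prompt>

  <!-- Explicit grammar: only the phrases listed here match this choice -->
  <choice next="#sports">
    <grammar>
      [sports (sports news) scores]
    </grammar>
    Sports
  </choice>

  <!-- No grammar child: a grammar is generated from the choice text "Weather" -->
  <choice next="#weather">Weather</choice>
</menu>
```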

For more information on referencing external grammars, see Referencing Grammars.

Grammar Construction

Although VoiceXML grammars can be written using various grammar formats, those formats share some basic characteristics. This section gives an introduction to this basic information. See the individual format chapters for details on each of these.

Header versus Body

Conceptually, a grammar definition is always divided into a header and a body. The header contains information that is relevant to the grammar as a whole, such as the language or the identification of the root rule. The body contains individual rule definitions.

In the ABNF, JSGF, and GSL formats, the header information is contained in declarations at the start of the grammar definition. In the XML format, the header information is split across the attributes and the initial children of the <grammar> element.

As noted in the section VoiceXML <grammar> element, the attributes mode, root, tag-format, version, xml:base, and xml:lang are allowed only for XML grammars. These attributes correspond to information in the header of the definition; for the other formats, this information (when available) is instead encoded in the header declarations.

Inline Definition versus External Grammar Files

An inline grammar is defined completely within the <grammar> element in a VoiceXML document. An external grammar, on the other hand, is defined completely in an external file and referenced in the VoiceXML document.

When you include a grammar definition directly in your VoiceXML document, you may have to take special care. ABNF, JSGF, and GSL grammars use characters that have special meaning in XML, such as angle brackets (< and >). The safest approach is to always enclose the grammar rules in a CDATA section, that is, between <![CDATA[ and ]]>. Without the CDATA section, you would need to replace each such character with its entity equivalent (for example, &lt; and &gt; instead of < and >), and you would need to know exactly which characters to replace. Using CDATA avoids both problems.

 <grammar ...usage attributes...>
   <![CDATA[
   ...grammar header declarations...
   ...grammar rule definitions...
   ]]>
 </grammar>
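
For a concrete case, the sketch below wraps an inline GSL grammar in a CDATA section because its slot-assignment commands contain angle brackets. The slot name operator and the returned strings are assumptions chosen to match the earlier field example; they are not taken from a BeVocal sample.

```xml
<field name="operator">
  <prompt>Do you want to add or subtract?</prompt>
  <grammar>
    <![CDATA[
      [
        [add plus]       {<operator "add">}
        [subtract minus] {<operator "subtract">}
      ]
    ]]>
  </grammar>
</field>
```

Without the CDATA section, each {<operator "add">} command would have to be written as {&lt;operator "add"&gt;} to keep the VoiceXML document well-formed.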

On the other hand, if you have an external grammar file in ABNF, GSL, or JSGF format, the contents of that file should not be inside a CDATA section. Also, an external file in one of these formats should not contain a <grammar> element. So, an external grammar file for these formats simply looks like:

 ...grammar header declarations...
 ...grammar rule definitions...

XML grammars have their own peculiarities (described in detail in Chapter 4, XML Speech Grammar Format). An inline XML grammar puts all of its attributes on the enclosing VoiceXML <grammar> tag. Also, an inline XML grammar inherits the XML prolog of the VoiceXML document; it does not have one of its own. Because the XML grammar format uses XML tags, you do not make a CDATA section out of an inline XML grammar. So, an inline XML grammar looks like:

 <grammar ...usage and header attributes...>
   ...grammar header elements...
   ...grammar rule definitions...
 </grammar>
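
Filled in, the skeleton above might look like the following sketch. The field name, rule id, and word list are illustrative; the header attributes shown (mode, root, version, xml:lang) are the ones this chapter lists as allowed for XML grammars.

```xml
<field name="pet">
  <prompt>Which pet would you like?</prompt>
  <!-- Header attributes sit directly on the VoiceXML grammar element -->
  <grammar mode="voice" root="pet" version="1.0" xml:lang="en-US">
    <rule id="pet">
      <one-of>
        <item>cat</item>
        <item>dog</item>
        <item>goat</item>
      </one-of>
    </rule>
  </grammar>
</field>
```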

On the other hand, an external XML grammar file must contain a valid XML prolog. An external XML grammar file also contains its own <grammar> element. Remember that this <grammar> element is only for defining the grammar, not for specifying its usage. Consequently, it can have fewer attributes than its inline cousin:

 ...XML prolog...
 <grammar ...header attributes only...>
   ...grammar header elements...
   ...grammar rule definitions...
 </grammar>
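
A minimal sketch of such a file, assuming the W3C XML grammar namespace and a hypothetical pair of rules, is shown below. It also illustrates the public and private rules described earlier: Colors can be named as a starting point, while Shade can only be used from other rules in the file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice" root="Colors">

  <!-- Public rule: may be named as the starting point for recognition -->
  <rule id="Colors" scope="public">
    <!-- Optional reference to the private rule below -->
    <item repeat="0-1"><ruleref uri="#Shade"/></item>
    <one-of>
      <item>red</item>
      <item>green</item>
      <item>blue</item>
    </one-of>
  </rule>

  <!-- Private rule: usable only from other rules in this grammar -->
  <rule id="Shade" scope="private">
    <one-of>
      <item>light</item>
      <item>dark</item>
    </one-of>
  </rule>
</grammar>
```

Note that the file carries no usage attributes such as src; those belong on the <grammar> element in the VoiceXML document that references the file.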

Grammar files start out as uncompiled text files. For the GSL format, the BeVocal VoiceXML interpreter supports compiled versions of the files. See Compiled Grammar Files for information about how to compile a grammar file and how to use the result.

Rule Definitions

A grammar body consists of one or more rule definitions. A grammar rule definition can have two parts:

 •  A rule name that identifies the rule for use in other rules. In some GSL grammars, the name is optional; in all other cases, the name is required.
 •  A rule expansion, which defines the possible utterances associated with that rule. In GSL the expansion is also called a grammar expression.

Rule Expansions

In any format, a rule expansion consists of a combination of tokens, rule references, semantic interpretations, and syntax for combining these into more complex expressions.

Tokens

A token corresponds directly to words the user can legally say, such as "yes", "ten", or "elephant", or to the DTMF keys the user can press.

Rule References

A rule reference is simply a named grammar rule that is referenced (by name) from another rule. GSL grammars sometimes refer to rule references as subgrammars or subrules.

Rule references let you modularize your grammar descriptions. Because a rule reference is just a named rule, it can contain another rule reference, thus allowing you to create hierarchies of grammar rules.

When a grammar rule contains a rule reference, the effect is the same as if the referenced rule's grammar expansion appeared in place of the rule name.

Semantic Interpretations

All of the grammar formats let you specify information for the semantic interpretation of a rule. In GSL, this is done with assignment commands; in the other formats, it is done with tags. Do not confuse this use of the word "tag" with its more common meaning in the context of VoiceXML. Here, it refers to a particular piece of syntax in the grammar format: in XML, it happens to be the <tag> tag, but in GSL, ABNF, and JSGF, it is the use of curly braces.

Semantic interpretations are not needed in every grammar. For example, you probably don't need them for grammars in <link> or <choice> elements. If the user speaks a phrase that matches a link grammar, the <link> element is simply activated. If a semantic interpretation is present, the interpreter stores it in the application.lastresult$ variable, but does not use it for any other purpose; your application can use this variable to access the semantic interpretation.
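
The following document is a small sketch of that behavior, not a BeVocal sample. The link grammar has no semantic interpretation; matching it simply activates the link, and the target form reads the recognized words back from application.lastresult$. The form ids, prompts, and phrases are invented for the example.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">

  <!-- Link grammar without semantic interpretation: a match just activates the link -->
  <link next="#main_menu">
    <grammar>
      [(main menu) (start over)]
    </grammar>
  </link>

  <form id="ask_pet">
    <field name="pet">
      <prompt>Cat or dog?</prompt>
      <grammar>
        [cat dog]
      </grammar>
    </field>
  </form>

  <form id="main_menu">
    <block>
      <!-- application.lastresult$ describes the most recent recognition -->
      <prompt>You said <value expr="application.lastresult$.utterance"/>.</prompt>
    </block>
  </form>
</vxml>
```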

Rules for grammars that appear within <field> or <form> elements frequently specify semantic interpretations. In some simple grammars, however, you may not need to create a semantic interpretation.

Combinations

The various formats use different syntax for how they combine tokens and rule references. The basic combinations are the same, however. The formats have ways of representing:

 Combination   Description
 Alternation   A set of alternatives ("cat" or "dog" or "goat")
 Sequence      Multiple expressions that must all be said in a particular order ("great dane")
 Grouping      Multiple expressions treated as one for some purpose (for example, marking "Jacques Cousteau" as French)
 Repetition    A single expression repeated some number of times ("very" or "very very" or ...)
 Optional      A special case of repetition, 0 or 1 times (the "kitty" in a rule that matches "kitty cat" or "cat")
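
To make the combinations concrete, here is a sketch of one rule in the XML grammar format that uses each of them. The rule id and the phrases are illustrative only, and the other formats express the same ideas with their own syntax.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice" root="animal">
  <rule id="animal" scope="public">
    <!-- Alternation: exactly one of the enclosed items must match -->
    <one-of>
      <item>
        <!-- Optional: "kitty" may be said zero or one time -->
        <item repeat="0-1">kitty</item>
        cat
      </item>
      <!-- Sequence: both words, in this order -->
      <item>great dane</item>
      <item>
        <!-- Repetition: "very" said one or more times -->
        <item repeat="1-">very</item>
        <!-- Grouping: two words wrapped in one item and treated as a unit -->
        <item>big dog</item>
      </item>
    </one-of>
  </rule>
</grammar>
```

This rule accepts, for example, "cat", "kitty cat", "great dane", and "very very big dog".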

Compiled Grammar Files

The first time you reference a particular source grammar file in a VoiceXML document, that grammar file is compiled and the compiled file is cached by the interpreter. Subsequent references to the same source grammar file actually use the compiled file. The grammar file is not recompiled unless it is modified.

If you have an extremely large grammar, such as one that recognizes all company names in a major city, compilation may take a significant amount of time. For example, a 2-megabyte grammar file can take about 15 minutes to compile.

If a large grammar file is compiled when the application references it, the delay may be long enough to cause the application to fail with a timeout error. To avoid this problem, you can compile GSL grammar files before running the application and then refer to the compiled grammar files (instead of the source grammar files) from your application.

Grammar files larger than 200 kilobytes must be compiled (either with the Grammar Compiler tool or with the Grammar Compilation SOAP service), because applications are not allowed to reference a source grammar file that large. (This size restriction also applies to hosting customers.)

Typically, you develop and test your application to reference a small subset of your grammar in a source grammar file. When you are ready to test or deploy the application with the full grammar, you compile the full grammar file and modify your application to reference the compiled file.

You can compile GSL grammars in three different ways:

 •  Use the Grammar Compiler tool in the BeVocal Café
 •  Use Nuance's grammar compilation tool
 •  Use the Grammar Compilation Service

Using the Grammar Compilation Service, you can also compile XML grammars. For information on this service, see the Grammar Compilation Service documentation.

Finally, you can compile Nuance SayAnything grammars using Nuance's grammar compilation tool. For information on SayAnything grammars, see Chapter 8, Nuance SayAnything Grammars.

Using the BeVocal Café Grammar Compiler

You can use the Grammar Compiler tool to submit a request for offline compilation of a GSL grammar file. For a complete description of this tool, see Chapter 7, Grammar Compiler in Using the BeVocal Café Tools.

The request must specify the URI of the grammar file and an email address where you can be notified when the compilation is completed. You may also specify the root rule of the grammar; if you do not, the first public rule in the grammar file is used as the root rule. When the grammar file is compiled, it is assigned a unique key and you are sent email informing you of this key. You use this key in a VoiceXML application to reference the compiled grammar.

To reference the grammar in a grammar file compiled with the Grammar Compiler, you set the src attribute of the <grammar> element to a URI of the form:

 compiled:grammar/key

where key is the unique key for the compiled file that you received by email after the file was compiled.
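
In a field, such a reference could look like the sketch below. The field name and prompt are hypothetical, and key stands for the actual key from the notification email.

```xml
<field name="company">
  <prompt>Which company are you looking for?</prompt>
  <!-- Replace key with the unique key assigned when the grammar was compiled -->
  <grammar src="compiled:grammar/key"/>
</field>
```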

If you want to combine a compiled grammar in a more complex way with another grammar, you can refer to a compiled grammar from within an ABNF grammar:

 <grammar>
   <![CDATA[
     #ABNF 1.0 en-US;
     root $script;
     $script = prescription $<compiled:/key>;
   ]]>
 </grammar>

Using Nuance's Grammar Compilation Tool

Nuance developers can create a Nuance Grammar Object to use with their BeVocal VoiceXML application. You create a Nuance Grammar Object from a GSL grammar file using Nuance compilation tools.

Nuance developers can get information about these tools by visiting http://extranet.nuance.com. The basic steps for creating and using a Nuance Grammar Object (NGO) with BeVocal VoiceXML are:

1. Compile a static package against the English.America.3 master package with the -enable_jit option turned on. The command line for the nuance-compile utility would look like:

    nuance-compile Main.grammar English.America.3 -enable_jit

   where Main.grammar contains the contents of the grammar to precompile into the NGO.
2. Compile the NGO against this static package. The command line for the nuance-compile-ngo utility would look like:

    nuance-compile-ngo Main.grammar -package main

   where main is the name of the package compiled in step 1 and Main.grammar contains the contents of the grammar that will be precompiled into the NGO.
3. From your VoiceXML document, reference the resulting NGO in the same way as any other external grammar. For example:

    <grammar src="http://www.yourserver.com/grammar/foo.ngo"/>

   Either the extension of the NGO must be .ngo or its Content-Type HTTP header must be application/x-nuance-dynagram-binary.

There are trade-offs to be considered before compiling your own NGOs:

 •  After compilation of NGOs, you reference them as you do any other resource (such as a grammar, audio file, or JSP). This means that NGOs are subject to the caching rules that apply to any resource and you must take care to appropriately cache them.
 •  An NGO is much larger than the original uncompiled GSL grammar. Consequently, fetching the NGO might take a long time.
 •  The advantage of using NGOs is that the platform does not need to compile your grammars, keeping you in total control of the compilation. Allowing you to maintain control over the compilation could be considered to align more closely with the concept of VoiceXML applications as "Web resources", because you can now view your compiled grammar as you would any other Web resource.


Part No. 520-0004-02 | © 1999-2007, BeVocal, Inc. All rights reserved | 1.877.33.VOCAL