Nuance SayAnything grammars allow users to speak freely to an application and to have their sentences interpreted without the application developer having to write complex grammar rules covering the entire sentence. A SayAnything grammar is created using a combination of statistical language models (SLM) and robust natural language interpretation technologies.
The BeVocal VoiceXML interpreter supports the use of SayAnything grammars in your application and provides several properties to improve recognition performance with a SayAnything grammar. However, the BeVocal Café development environment does not provide tools to help you create these grammars; you must create a SayAnything grammar with the Nuance 8.0 tools. Once you have created your SayAnything grammar, you can use it within your VoiceXML application by specifying the grammar as the value of the <grammar> tag's src attribute.
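For example, a field might reference a compiled SayAnything grammar by URL. The following sketch uses a hypothetical grammar location; substitute the URL of your own compiled grammar:

  <vxml version="2.0">
    <form id="main">
      <field name="request">
        <!-- Hypothetical URL for a compiled SayAnything grammar -->
        <grammar src="http://www.example.com/grammars/sayanything.grammar"/>
        <prompt>How can I help you?</prompt>
      </field>
    </form>
  </vxml>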
Note: Creating an efficient SayAnything grammar is a complex task, requiring substantial speech-science expertise and in-depth knowledge of natural language semantics. Many steps are involved in creating and fine-tuning these grammars. Please contact BeVocal Professional Services for details or for help creating commercial-grade grammars specific to your application.
Your application may be best served with an open-ended prompt such as "How can I help you?". In this situation, not only can the user's response be highly variable and hard to predict, but it can also contain disfluencies--things such as restarts, filled pauses (um and uh), and ungrammatical sentences. In addition, the grammar for this state must fill several slots from only one utterance. The challenge is to write a grammar that lets callers say an arbitrary phrase within the domain of the task, fills many slots, and still achieves high accuracy.
Using a normal grammar and devising a comprehensive set of grammar rules may be impractical. In most cases, the out-of-grammar rate obtained with handcrafted grammar rules is very high and any attempt to add more rules often leads to poor in-grammar recognition accuracy.
For an example of using a SayAnything grammar, see the VoiceXML Samples page.
A language model that assigns probabilities to sequences of words is called a statistical language model (SLM).
A very simple form of an SLM is a list of words with probabilities assigned to each; this is commonly called a unigram. A unigram is an overly simplistic model for many reasons, primarily because the probability of a word is independent of its position in a sentence. In this model, for example, a sentence such as "I want to travel to Boston next Monday" is considered as likely as any permutation of its words, such as "To want to Boston I travel next Monday".
In contrast, an n-gram SLM is one in which the probability of a word depends on the previous N-1 words; N is called the order of the model.
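In standard notation, an N-gram model approximates the probability of a word sequence w_1 ... w_m as a product of conditional probabilities, each conditioned on only the preceding N-1 words:

  P(w_1, \ldots, w_m) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-N+1}, \ldots, w_{i-1})

For a bigram model (N = 2), each factor reduces to P(w_i | w_{i-1}). This is the standard formulation; the exact estimation and smoothing methods used by the training tools are not described here.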
Unlike a normal grammar, an n-gram SLM grammar is not manually written but instead is trained from a set of examples that models the user's speech. To train an SLM grammar, you pass this set of examples (and optionally a domain-specific vocabulary) to a utility, which estimates the model probabilities. Because the probability assigned to a phrase depends on the context and is driven by data, an SLM provides a grammar in which more plausible phrases are given higher probabilities.
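For illustration only, a few hypothetical training sentences for a flight reservation SLM might look like this:

  i want to fly from san francisco to boston next monday
  i'd like a flight to denver tomorrow morning
  um can you get me to new york on friday
  show me flights from seattle to chicago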
SLMs are useful for recognizing free-style speech, especially when the out-of-grammar rate is high. SLMs are not meant to replace normal grammars, which are quite suitable when the application's prompts sufficiently restrict the user's response. Because SLMs require a large set of training examples, a data collection system or a pilot based on a normal grammar must often be developed to gather them.
SLMs take the first step toward understanding free-style speech by recognizing a wider range of user utterances. However, recognizing the utterance is only half the battle. The other half is interpreting the meaning of the utterance. To do that, you still need to fill slots with the appropriate values. Writing these grammar rules can be a tedious task and can defeat the advantages of using an SLM in the first place.
To address this problem, the BeVocal platform offers a robust natural language (NL) parsing capability that lets you write slot-filling grammars that consider only the meaningful phrases in an utterance, ignoring the parts that don't matter. The robust NL interpretation engine spots meaningful phrases in the text output of the recognizer and fills the appropriate slots.
Natural language interpretation has two modes--full and robust. In the conventional operating mode, the full mode, the recognition engine and the NL engine are driven by the same grammar. This grammar defines both the valid phrases and how the slots are filled.
In robust mode, you can use two different grammars: the first drives the recognition phase and the second drives the interpretation phase. For example, the recognition could use an SLM grammar, allowing the application to recognize a wide range of user speech. The text output by the recognition engine could then be processed by the NL engine running a grammar that parses only certain phrases--the meaningful ones--and fills the appropriate slots.
While the conventional NL parser requires a "full" parse of the speaker's sentence by a top-level grammar rule--all the words in the sentence must be matched by a single grammar rule--the robust parser eliminates this requirement. If a full parse is not found, the robust NL parser attempts to fill as many slots as it can from partial parses, by using subsentence phrase fragments.
For example, consider a grammar for a typical flight reservation application and the following user's sentence:
I'd like to um I want to go to boston tomorrow.
The speech recognition engine, driven by the SLM, recognizes the sentence and sends the result to the NL engine, which tries to interpret the text. In full mode, the NL engine would not parse the text, because the sentence cannot be completely parsed by the grammar; consequently, it would reject the sentence and fill no slots. However, in robust mode, the engine could ignore the babbling at the beginning of the sentence and fill the destination and day slots with boston and tomorrow, respectively.
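In VoiceXML, the slots filled by the NL engine map to field items in a mixed-initiative form. The following is a minimal sketch, assuming a hypothetical SLM grammar file and slot names destination and day:

  <form id="reservation">
    <!-- Hypothetical form-level SayAnything grammar; its slots fill the fields below -->
    <grammar src="http://www.example.com/grammars/flights.grammar"/>
    <initial name="start">
      <prompt>How can I help you?</prompt>
    </initial>
    <field name="destination">
      <prompt>Where would you like to go?</prompt>
    </field>
    <field name="day">
      <prompt>When would you like to travel?</prompt>
    </field>
    <filled mode="all">
      <prompt>Booking a flight to <value expr="destination"/> <value expr="day"/>.</prompt>
    </filled>
  </form>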
Once you have your SayAnything grammar ready, you can use the following properties of the VoiceXML interpreter to help improve recognition within the grammar.
bevocal.grammar.interpretationtype
Extension. The bevocal.grammar.interpretationtype property specifies whether to use the NL engine in standard mode (full) or in robust mode (robust).
Setting this property to robust facilitates the NL engine's interpretation of more spontaneous utterances from SLM grammars.
Note: This property is relevant only when recognizing against SLM grammars. It should always be set to its default value when recognizing against conventional grammars.
bevocal.grammar.phoneticpruning
Extension. The bevocal.grammar.phoneticpruning property specifies whether the recognizer should perform phonetic pruning. For SLM grammars, set this property to true except for grammars with small vocabularies.
Note: This property is relevant only when recognizing against SLM grammars. It should always be set to its default value when recognizing against conventional grammars.
bevocal.grammar.weightfactor
Extension. The bevocal.grammar.weightfactor property controls the relative weighting of acoustic and linguistic scores during recognition.
The default value is 0.5, which maps to a setting of 5 in the speech engine. The corresponding speech engine property is in the range between 0 and 100. For well-trained SLM grammars, the optimum value is between 0.58 and 0.6, corresponding to a range of 9-10 in the speech engine.
As this value increases, the recognizer runs faster; hence, the value of the speedvsaccuracy property should be increased to get better recognition.
Note: This property is relevant only when recognizing against SLM grammars. It should always be set to its default value when recognizing against conventional grammars.
bevocal.grammar.wordtransitionpenalty
Extension. The bevocal.grammar.wordtransitionpenalty property controls the word transition weight, which governs the trade-off between inserted and deleted words. For SLM-based grammars, the optimal value is in the range 0 to -50.
Note: This property is relevant only when recognizing against SLM grammars. It should always be set to its default value when recognizing against conventional grammars.
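Putting these properties together, an application might configure them at the form level before recognizing against an SLM grammar. The values below are illustrative only (and the grammar URL is hypothetical); tune them for your particular grammar:

  <form id="main">
    <!-- Use robust NL interpretation for the SLM grammar -->
    <property name="bevocal.grammar.interpretationtype" value="robust"/>
    <!-- Enable phonetic pruning (recommended unless the vocabulary is small) -->
    <property name="bevocal.grammar.phoneticpruning" value="true"/>
    <!-- Weight factor within the 0.58-0.6 range suggested for well-trained SLMs -->
    <property name="bevocal.grammar.weightfactor" value="0.58"/>
    <!-- Word transition penalty within the 0 to -50 range suggested for SLMs -->
    <property name="bevocal.grammar.wordtransitionpenalty" value="-25"/>
    <field name="request">
      <grammar src="http://www.example.com/grammars/sayanything.grammar"/>
      <prompt>How can I help you?</prompt>
    </field>
  </form>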