in each <grammar>, the developer can write ScenarioXML documents simply
by referring to the appropriate <grammar>s.
In speech systems, it is important to keep <grammar>s small; as perplexity increases, so does the likelihood of recognition errors. Building <grammar>s therefore requires a balance between two competing constraints: minimizing <grammar> size for optimal recognition accuracy, and expanding the <grammar> to achieve sufficient coverage for the given task. A skilled programmer may be able to construct such <grammar>s by hand, but it would be useful to have a system that can create them automatically.
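For a finite-state grammar, the perplexity driving this trade-off is often approximated by the average branching factor, i.e., how many words may legally follow each state. The Python sketch below is our own illustration, not part of the original system; the dictionary-of-arcs representation is a hypothetical stand-in for whatever structure a real toolkit would use.

    # Hypothetical FSM representation: state -> list of outgoing word arcs.
    fsm = {
        "S0": ["how"],
        "S1": ["can", "do"],
        "S2": ["i"],
        "S3": ["get", "go", "travel"],
    }

    def average_branching_factor(fsm):
        """Rough perplexity proxy: mean number of word arcs leaving a state."""
        return sum(len(arcs) for arcs in fsm.values()) / len(fsm)

    print(average_branching_factor(fsm))  # 1.75 -- grows as the grammar expands

Every word added to the grammar raises this figure, which is why pruning unused structure pays off in recognition accuracy.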
Figure 5-6 describes a procedure for automatic <grammar> creation using a corpus [9]. We refer to this process as “grammar compilation”: a basic grammar is written by hand and then compiled into a form that is harder for humans to read but better suited to the specific task. First we create a unification grammar (UG) [10], written in a human-readable format familiar to computational linguists. The UG is then compiled into a context-free grammar (CFG) by expanding all constraints: for each UG rule, a set of CFG rules is created in which each CFG rule corresponds to a single set of legal feature-value assignments on the right-hand side of the original UG rule. The CFG is then compiled into a regular grammar (RG) by imposing an upper limit on the number of recursions allowed for recursive rules [11]. The derived RG can be expressed as a finite state machine (FSM), as shown in Figure 5-6. The FSM is then used to parse the sentences in the corpus. After all sentences have been parsed, only the nodes and arcs in the FSM that were activated by at least one sentence are retained; all other nodes and arcs are deleted. This procedure yields a reduced regular grammar that covers every sentence in the corpus and is smaller than the original grammar.
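To illustrate the final pruning step, the following Python sketch parses each corpus sentence with the FSM and retains only the arcs that at least one sentence activates. This is a minimal sketch under our own assumptions, not the authors' implementation; the arc-set representation, state names, and toy corpus are all hypothetical.

    # FSM as a set of arcs: (source_state, word, target_state).
    ARCS = {
        ("S0", "how", "S1"), ("S1", "can", "S2"), ("S2", "i", "S3"),
        ("S3", "get", "S4"), ("S3", "go", "S4"), ("S3", "travel", "S4"),
        ("S4", "to", "S5"), ("S5", "kyoto", "END"), ("S5", "osaka", "END"),
    }
    START, FINAL = "S0", "END"

    def parse(sentence):
        """Return the arcs used on an accepting path, or None if no parse."""
        def search(state, words, used):
            if not words:
                return used if state == FINAL else None
            for (src, word, dst) in ARCS:
                if src == state and word == words[0]:
                    result = search(dst, words[1:], used | {(src, word, dst)})
                    if result is not None:
                        return result
            return None
        return search(START, sentence.split(), frozenset())

    corpus = ["how can i get to kyoto", "how can i go to osaka"]
    activated = set()
    for sentence in corpus:
        used = parse(sentence)
        if used:  # skip sentences the grammar cannot parse
            activated |= used

    pruned = ARCS & activated  # keep only activated arcs; "travel" is deleted
    print(len(ARCS), "->", len(pruned), "arcs")  # 9 -> 8

Here the arc for “travel” is never exercised by the corpus, so it is deleted, shrinking the grammar while preserving coverage of every corpus sentence.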
On the other hand, we have yet to create an automatic procedure for lexicon compilation. If we use only the words from the original corpus that were recognized by arcs in the grammar, the reduced grammar's coverage will be very weak: the utterance “How can I get to Tokyo?” would not be covered, even if the corpus includes the utterance “How can I get to Kyoto?”. However, if we generalize arcs to accept any word with the appropriate part of speech, the generalization becomes too strong, degrading speech recognition performance. To boost grammar coverage, we currently use semantic word recognition categories (e.g., LOCATION), which are created for each dialog task. Automatic lexicon compilation using a corpus is part of our ongoing research.
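To make the idea concrete, here is a minimal Python sketch of category-based arc generalization; the category table and the CATEGORY: labeling scheme are our own hypothetical devices, not the system's format. An arc learned from “Kyoto” accepts any LOCATION word, including “Tokyo”, without accepting arbitrary words of the same part of speech.

    # Hypothetical task-specific semantic categories, built per dialog task.
    CATEGORIES = {
        "kyoto": "LOCATION",
        "tokyo": "LOCATION",
        "osaka": "LOCATION",
    }

    def matches(arc_label, word):
        """An arc labeled CATEGORY:X accepts any word in category X;
        otherwise the word must match the arc label literally."""
        if arc_label.startswith("CATEGORY:"):
            return CATEGORIES.get(word) == arc_label.split(":", 1)[1]
        return word == arc_label

    # Arc learned from the corpus phrase "... to Kyoto", then generalized.
    print(matches("CATEGORY:LOCATION", "tokyo"))   # True  -- now covered
    print(matches("CATEGORY:LOCATION", "please"))  # False -- not over-generalized
    print(matches("to", "to"))                     # True  -- literal arcs unchanged

Because the category table is built per dialog task, the generalization stays narrower than part-of-speech matching while still covering unseen task-relevant words.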