in each <grammar>, the developer can write ScenarioXML documents simply
by referring to the appropriate <grammar>s.
In speech systems, it is important to keep <grammar>s small; as perplexity increases, so does the likelihood of recognition errors. Building <grammar>s therefore requires a balance between two competing constraints: minimizing <grammar> size for optimal recognition accuracy, and expanding the <grammar> to achieve sufficient coverage for the given task. A skilled programmer may be able to construct such <grammar>s by hand, but it would be useful to have a system that can create them automatically.
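For a finite-state grammar, the perplexity driving this trade-off is often approximated by the average branching factor, i.e., how many words may legally follow each state. The Python sketch below is our own illustration, not part of the original system; the dictionary-of-arcs representation is a hypothetical stand-in for whatever structure a real toolkit would use.

    # Hypothetical FSM representation: state -> list of outgoing word arcs.
    fsm = {
        "S0": ["how"],
        "S1": ["can", "do"],
        "S2": ["i"],
        "S3": ["get", "go", "travel"],
    }

    def average_branching_factor(fsm):
        """Rough perplexity proxy: mean number of word arcs leaving a state."""
        return sum(len(arcs) for arcs in fsm.values()) / len(fsm)

    print(average_branching_factor(fsm))  # 1.75 -- grows as the grammar expands

Every word added to the grammar raises this figure, which is why pruning unused structure pays off in recognition accuracy.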
Figure 5-6 describes a procedure for automatic <grammar> creation using a corpus [9]. We refer to this process as “grammar compilation”: a basic grammar is written by hand and then compiled into a form that is harder for humans to read but better suited to the specific task. First we create a unification grammar (UG) [10], written in a human-readable format familiar to computational linguists. The UG is then compiled into a context-free grammar (CFG) by expanding all constraints: for each UG rule, a set of CFG rules is created in which each CFG rule corresponds to a single set of legal feature-value assignments on the right-hand side of the original UG rule. The CFG is then compiled into a regular grammar (RG) by imposing an upper limit on the number of recursions allowed for recursive rules [11]. The derived RG can be expressed as a finite state machine (FSM), as shown in Figure 5-6. The FSM is then used to parse the sentences in the corpus. After all sentences have been parsed, only the nodes and arcs in the FSM that were activated by at least one sentence are retained; all other nodes and arcs are deleted. This procedure yields a reduced regular grammar that covers every sentence in the corpus and is smaller than the original grammar.
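To illustrate the final pruning step, the following Python sketch parses each corpus sentence with the FSM and retains only the arcs that at least one sentence activates. This is a minimal sketch under our own assumptions, not the authors' implementation; the arc-set representation, state names, and toy corpus are all hypothetical.

    # FSM as a set of arcs: (source_state, word, target_state).
    ARCS = {
        ("S0", "how", "S1"), ("S1", "can", "S2"), ("S2", "i", "S3"),
        ("S3", "get", "S4"), ("S3", "go", "S4"), ("S3", "travel", "S4"),
        ("S4", "to", "S5"), ("S5", "kyoto", "END"), ("S5", "osaka", "END"),
    }
    START, FINAL = "S0", "END"

    def parse(sentence):
        """Return the arcs used on an accepting path, or None if no parse."""
        def search(state, words, used):
            if not words:
                return used if state == FINAL else None
            for (src, word, dst) in ARCS:
                if src == state and word == words[0]:
                    result = search(dst, words[1:], used | {(src, word, dst)})
                    if result is not None:
                        return result
            return None
        return search(START, sentence.split(), frozenset())

    corpus = ["how can i get to kyoto", "how can i go to osaka"]
    activated = set()
    for sentence in corpus:
        used = parse(sentence)
        if used:  # skip sentences the grammar cannot parse
            activated |= used

    pruned = ARCS & activated  # keep only activated arcs; "travel" is deleted
    print(len(ARCS), "->", len(pruned), "arcs")  # 9 -> 8

Here the arc for “travel” is never exercised by the corpus, so it is deleted, shrinking the grammar while preserving coverage of every corpus sentence.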
On the other hand, we have yet to create an automatic procedure for lexicon compilation. If we use only the words from the original corpus that were recognized by arcs in the grammar, the reduced grammar's coverage will be very weak: the utterance “How can I get to Tokyo?” would not be covered, even if the corpus includes the utterance “How can I get to Kyoto?”. However, if we generalize arcs to accept any word with the appropriate part of speech, the generalization becomes too strong, degrading speech recognition performance. To boost grammar coverage, we currently use semantic word recognition categories (e.g., LOCATION), which are created for each dialog task. Automatic lexicon compilation using a corpus is part of our ongoing research.
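To make the idea concrete, here is a minimal Python sketch of category-based arc generalization; the category table and the CATEGORY: labeling scheme are our own hypothetical devices, not the system's format. An arc learned from “Kyoto” accepts any LOCATION word, including “Tokyo”, without accepting arbitrary words of the same part of speech.

    # Hypothetical task-specific semantic categories, built per dialog task.
    CATEGORIES = {
        "kyoto": "LOCATION",
        "tokyo": "LOCATION",
        "osaka": "LOCATION",
    }

    def matches(arc_label, word):
        """An arc labeled CATEGORY:X accepts any word in category X;
        otherwise the word must match the arc label literally."""
        if arc_label.startswith("CATEGORY:"):
            return CATEGORIES.get(word) == arc_label.split(":", 1)[1]
        return word == arc_label

    # Arc learned from the corpus phrase "... to Kyoto", then generalized.
    print(matches("CATEGORY:LOCATION", "tokyo"))   # True  -- now covered
    print(matches("CATEGORY:LOCATION", "please"))  # False -- not over-generalized
    print(matches("to", "to"))                     # True  -- literal arcs unchanged

Because the category table is built per dialog task, the generalization stays narrower than part-of-speech matching while still covering unseen task-relevant words.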