VERBKEY - A SINGLE-CHIP SPEECH CONTROL FOR THE AUTOMOBILE ENVIRONMENT - DSP for In-Vehicle and Mobile Systems

The following dynamic network aggregates the time-varying local distances d into a time-varying distance vector g. In continuous recognition, hypotheses are selected from the search space according to their scores and stored in a time-varying n-best list.
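
As a minimal sketch of how local distances d can be aggregated into a distance vector g, the following Python fragment uses a conventional dynamic-programming recursion. The source does not state the recursion used by the ASD classifier, so the step pattern (stay in a state, advance by one, or skip one) is an assumption, and the names `aggregate_distances` and `local_d` are hypothetical.

```python
import math


def aggregate_distances(local_d):
    """Aggregate local distances into accumulated distance vectors.

    local_d[t][j] is the local distance d between input frame t and
    reference state j. Returns a list g where g[t][j] is the smallest
    accumulated distance of any path ending in state j at frame t.
    Assumed step pattern (not taken from the source): stay, advance, skip one.
    """
    n_states = len(local_d[0])

    # Assumption: every path starts in the first reference state.
    g_prev = [math.inf] * n_states
    g_prev[0] = local_d[0][0]
    g = [g_prev]

    for d_t in local_d[1:]:
        g_cur = [math.inf] * n_states
        for j in range(n_states):
            best_pred = g_prev[j]                          # stay in state j
            if j >= 1:
                best_pred = min(best_pred, g_prev[j - 1])  # advance one state
            if j >= 2:
                best_pred = min(best_pred, g_prev[j - 2])  # skip one state
            g_cur[j] = d_t[j] + best_pred
        g.append(g_cur)
        g_prev = g_cur
    return g


# Three input frames matched against three reference states.
example_d = [[1, 5, 9],
             [4, 1, 6],
             [7, 3, 1]]
example_g = aggregate_distances(example_d)
```

The last entry of the final vector, `example_g[-1][-1]`, is the accumulated distance of the best path that ends in the last reference state, which is the kind of score a hypothesis would carry into the n-best list.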

Efficient pruning at the command, word, and acoustic levels. For continuous word recognition, intermediate hypotheses are stored in n-best lists ordered by their scores. If the command syntax allows it, hypothesis trees are grown during the recognition process, and new hypotheses are started only when a possible word end is found. The search space of word and subword units can thus be reduced substantially, and the search is conducted through only a subset of the reference models. On the acoustic level, score-based pruning reduces the number of active grid points in the matching process [2]. All these measures reduce the processing load and allow the recognition engine to be implemented even on simple, low-performance processor platforms.
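
The two pruning mechanisms described above can be sketched as follows. This is an illustration under stated assumptions, not the VERBKEY implementation: the source does not specify the pruning criterion, so a fixed beam width relative to the best score is assumed, lower scores (distances) are treated as better, and the names `prune_grid_points` and `update_n_best` are hypothetical.

```python
import heapq


def prune_grid_points(g_t, beam_width):
    """Return the indices of grid points that stay active.

    g_t holds the accumulated distances of all grid points at one frame.
    Assumed criterion: keep a point if its distance is within
    beam_width of the best (smallest) distance at this frame.
    """
    best = min(g_t)
    return [j for j, score in enumerate(g_t) if score <= best + beam_width]


def update_n_best(n_best, words, score, n):
    """Add a hypothesis and keep only the n best, smallest distance first.

    n_best is a list of (score, words) tuples. A new list is returned,
    so the caller's list is not modified.
    """
    candidates = n_best + [(score, words)]
    return heapq.nsmallest(n, candidates, key=lambda item: item[0])


# Acoustic level: only grid points close to the best score remain active.
active = prune_grid_points([12, 5, 3, 40], beam_width=4)

# Word level: a 2-best list that drops the weakest hypothesis.
hyps = []
hyps = update_n_best(hyps, "radio on", 4.2, n=2)
hyps = update_n_best(hyps, "radio off", 3.1, n=2)
hyps = update_n_best(hyps, "radio volume", 6.0, n=2)
```

A relative beam is only one possible choice; a fixed cap on the number of active grid points would bound the processing load more strictly, at the cost of sometimes discarding the correct path.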

Figure 12-1. Associative Dynamic (ASD) classifier in network representation: x, primary feature vector; y, secondary feature vector; d, local distance; g, aggregated optimal distance during the matching process.

Temporal compression of reference patterns. Temporal redundancy is avoided by compressing the reference patterns for the basic acoustic units in such a way that the remaining reference states represent only the stable and, in terms of classification, relevant parts of the original pattern. For reasons
