recognizer offers a recognition rate near human performance. It is
suitable for difficult recognition scenarios (e.g. fluent speech with a large
vocabulary, or spontaneous speech). Usually, however, HMMs demand high
modeling effort and floating-point arithmetic for the necessary
computational precision. The component costs of running HMMs are high
and often oversized for simple control applications with a small number of
commands.
A further speech recognition technique is based on artificial neural
networks (ANN). ANNs are well suited to static patterns and self-adapting
processes, and low-cost solutions sometimes employ ANN techniques.
Unless the more complex Time Delay Neural Network (TDNN) approach is
used, however, these solutions usually do not achieve satisfactory
recognition accuracy.
Recognizers based on the principle of Dynamic Time Warping (DTW)
require less computational precision and modeling effort than HMMs. A
major drawback is their increased memory demand when they are trained
speaker-independently. In general, DTW recognizers can achieve
recognition accuracy similar to that of HMM recognizers.
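To make the warping idea concrete, the following is a minimal C sketch of a DTW match between an input and a reference sequence, using integer (fixed-point) feature vectors and a city-block local distance. All names, dimensions and values here are illustrative assumptions, not taken from the text:

    #include <stdio.h>
    #include <stdlib.h>

    #define DIM 2   /* feature dimension (illustrative) */

    /* City-block local distance between two feature vectors;
     * integer arithmetic matches a fixed-point implementation. */
    static int local_dist(const int *a, const int *b)
    {
        int d = 0;
        for (int k = 0; k < DIM; k++)
            d += abs(a[k] - b[k]);
        return d;
    }

    /* Accumulate the minimal warping cost between an input sequence
     * x[0..nx-1] and a reference sequence r[0..nr-1] (classic DTW
     * recursion with vertical, diagonal and horizontal predecessors). */
    static int dtw(int x[][DIM], int nx, int r[][DIM], int nr)
    {
        int *prev = malloc(nr * sizeof *prev);
        int *cur  = malloc(nr * sizeof *cur);

        prev[0] = local_dist(x[0], r[0]);
        for (int j = 1; j < nr; j++)
            prev[j] = prev[j - 1] + local_dist(x[0], r[j]);

        for (int i = 1; i < nx; i++) {
            cur[0] = prev[0] + local_dist(x[i], r[0]);
            for (int j = 1; j < nr; j++) {
                int best = prev[j];                          /* vertical   */
                if (prev[j - 1] < best) best = prev[j - 1];  /* diagonal   */
                if (cur[j - 1]  < best) best = cur[j - 1];   /* horizontal */
                cur[j] = best + local_dist(x[i], r[j]);
            }
            int *tmp = prev; prev = cur; cur = tmp;
        }
        int total = prev[nr - 1];
        free(prev);
        free(cur);
        return total;
    }

    int main(void)
    {
        /* Two short toy sequences of 2-dimensional feature vectors. */
        int input[4][DIM] = {{1, 2}, {3, 4}, {3, 4}, {5, 6}};
        int ref[3][DIM]   = {{1, 2}, {3, 4}, {5, 6}};
        printf("DTW distance: %d\n", dtw(input, 4, ref, 3));
        return 0;
    }

Each additional reference sequence stored for speaker-independent operation adds another array like ref, which is the source of the memory growth mentioned above.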
2.2 ASD algorithm
The patented Associative-Dynamic (ASD) recognizer was developed at
the Dresden University of Technology as a very cost-efficient and simple
recognizer alternative [1]. It requires ultra-low resources, is suitable for
most command-and-control tasks in mobile applications, and can be
implemented on low-cost processor platforms. Several measures support the
memory reduction and the low processing load:
- Reduced feature dimensions through a discriminative network, without
loss of classification accuracy. An associative network at the front end of
the classifier transforms the primary feature vectors x (which describe the
object to be classified and arrive from the analyzer at equidistant time
intervals) into secondary feature vectors y of reduced dimension and
improved discrimination properties. This transformation adapts the input
pattern to the statistical characteristics of the classifier's reference
knowledge. The transformation weights are optimized for a given
recognition task in a training step by an evolutionary procedure [1]
(see the transform sketch after this list).
- Task-dependent distance operators. A choice of distance operators
allows optimal classifier performance for a given recognition task and
under varying accuracy conditions (fixed- vs. floating-point arithmetic).
Local distances are calculated by applying the chosen operator to each
pair of input and reference vectors (see the distance sketch after this list).
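As an illustration of the front-end transform described in the first item, the sketch below maps a primary vector x onto a shorter secondary vector y by a trained weight matrix. The matrix values, the Q4 fixed-point scaling, and all dimensions are placeholder assumptions; the actual patented transform and its trained weights are not given in the text:

    #include <stdio.h>

    #define NX 8   /* primary feature dimension (illustrative)   */
    #define NY 3   /* reduced secondary dimension (illustrative) */

    /* Map a primary feature vector x onto a secondary vector y by a
     * weighted sum per output component: y[i] = sum_j W[i][j] * x[j].
     * The weights would come from the evolutionary training step. */
    static void associative_transform(int W[NY][NX], const int *x, int *y)
    {
        for (int i = 0; i < NY; i++) {
            int acc = 0;
            for (int j = 0; j < NX; j++)
                acc += W[i][j] * x[j];
            y[i] = acc >> 4;  /* rescale in fixed-point (Q4 weights) */
        }
    }

    int main(void)
    {
        /* Placeholder weights in Q4 fixed-point (16 == 1.0). */
        int W[NY][NX] = {
            {16, 0, 0, 16, 0, 0, 0, 0},
            {0, 16, 0, 0, 16, 0, 0, 0},
            {0, 0, 16, 0, 0, 16, 0, 0},
        };
        int x[NX] = {3, 1, 4, 1, 5, 9, 2, 6};
        int y[NY];

        associative_transform(W, x, y);
        for (int i = 0; i < NY; i++)
            printf("y[%d] = %d\n", i, y[i]);
        return 0;
    }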
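For the second item, one plausible way to realize selectable distance operators is a function pointer chosen per recognition task. The two operators shown (city-block for cheap fixed-point arithmetic, squared Euclidean where more dynamic range is available) are common examples assumed here, not necessarily the operator set of the ASD recognizer:

    #include <stdio.h>
    #include <stdlib.h>

    #define DIM 4  /* feature dimension (illustrative) */

    typedef long (*dist_op)(const int *a, const int *b);

    /* City-block distance: cheap on fixed-point hardware. */
    static long city_block(const int *a, const int *b)
    {
        long d = 0;
        for (int k = 0; k < DIM; k++)
            d += labs((long)a[k] - b[k]);
        return d;
    }

    /* Squared Euclidean distance: needs a higher dynamic range. */
    static long sq_euclid(const int *a, const int *b)
    {
        long d = 0;
        for (int k = 0; k < DIM; k++) {
            long diff = (long)a[k] - b[k];
            d += diff * diff;
        }
        return d;
    }

    int main(void)
    {
        int in[DIM]  = {1, 2, 3, 4};
        int ref[DIM] = {2, 2, 1, 4};

        /* The operator is selected once per recognition task. */
        dist_op op = city_block;
        printf("city-block: %ld\n", op(in, ref));
        op = sq_euclid;
        printf("sq-euclid:  %ld\n", op(in, ref));
        return 0;
    }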