How Computers Recognize - Robots Unlimited: Life in a Virtual Age


One of the many problems in the science of automatic speech recognition is that the same word, and even segments of the same word, will often be spoken at different speeds, even by the same speaker. So a numerical technique called Dynamic Time Warping has been devised, which has the effect of stretching and compressing segments of the speech sound in a word, in order to make the waveform of the word easier to match with a stored waveform. In essence, the effect of Dynamic Time Warping is to stretch those segments of a speech waveform that are shorter than their stored templates, and to compress those segments of the waveform that are longer than their stored templates. 17
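
The stretching and compressing described above can be illustrated with a minimal sketch of the classic dynamic-programming form of Dynamic Time Warping. This is an illustration under stated assumptions, not the method of any particular recognizer: the function name `dtw_distance`, the use of plain lists of numbers in place of real speech features, and the absolute difference as the frame-to-frame cost are all choices made for this example.

```python
from math import inf


def dtw_distance(sample, template):
    """Return the cheapest alignment cost between two numeric sequences.

    Assumption for this sketch: each "frame" is a single number and the
    cost of matching two frames is their absolute difference.
    """
    n, m = len(sample), len(template)
    # cost[i][j] = cheapest alignment of sample[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(sample[i - 1] - template[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # several sample frames share one template frame (sample compressed)
                cost[i][j - 1],      # one sample frame covers several template frames (sample stretched)
                cost[i - 1][j - 1],  # one-to-one match
            )
    return cost[n][m]


# Hypothetical data: the same "shape" spoken at half speed, and a different shape.
template = [1.0, 3.0, 4.0, 3.0, 1.0]
slow = [1.0, 1.0, 3.0, 3.0, 4.0, 4.0, 3.0, 3.0, 1.0, 1.0]
other = [4.0, 1.0, 1.0, 4.0, 1.0]

print(dtw_distance(slow, template))
print(dtw_distance(other, template))
```

In this sketch the slowed-down sequence aligns with the template at zero cost, because every repeated frame can be mapped onto the same template frame, while the differently shaped sequence cannot be aligned without some mismatch.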

A pure matching process, by itself, will often be good enough to recognize isolated words with a high degree of accuracy, even when the software is running on a microprocessor with relatively little computing power. But some additional intelligence can be applied to the task, taking into account a knowledge of the context in which a speech segment appears. Within a word, this contextual information can be applied to improve the accuracy of recognition of individual segments of the word, using a technique called Hidden Markov Models, or HMMs. Here is how these models work.

Consider the word “tomato”:

1. Let us assume that the probability 18 of a system recognizing the first sound in the word, the phoneme “t”, is 1.

2. But assume that the system is not certain whether the next sound is the phoneme “ah” (for which it has a probability of 0.4), or “ow” (for which its probability is 0.6).

3. The system is 100 percent confident that the sound after the “ah” or “ow” is an “m”, i.e. the probability of an “m” is 1.

4. But again, it is not sure what follows the “m”: it might be an “ey” sound (a probability of 0.5), or it could be “aa” (also with a probability of 0.5).

5. Then there is another “t”, about which the system is 100 percent certain (so the probability is 1).
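
The steps listed above can be sketched in a few lines of Python that enumerate every phoneme sequence the system considers possible and multiply the probabilities along each one. This is only an illustration: it covers just the five steps given here, it assumes the choices at steps 2 and 4 are independent of one another (so that a path's probability is simply the product of its step probabilities), and the names `steps` and `path_probabilities` are hypothetical.

```python
from itertools import product

# One dictionary per step in the list above: candidate phoneme -> probability.
steps = [
    {"t": 1.0},
    {"ah": 0.4, "ow": 0.6},
    {"m": 1.0},
    {"ey": 0.5, "aa": 0.5},
    {"t": 1.0},
]


def path_probabilities(steps):
    """Map each possible phoneme sequence to the product of its step probabilities.

    Assumption for this sketch: the choice made at one step does not
    affect the probabilities at any other step.
    """
    paths = {}
    for combo in product(*(step.items() for step in steps)):
        phonemes = tuple(phoneme for phoneme, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        paths[phonemes] = prob
    return paths


paths = path_probabilities(steps)
# List the candidate pronunciations from most to least likely.
for phonemes, prob in sorted(paths.items(), key=lambda kv: -kv[1]):
    print("-".join(phonemes), round(prob, 2))
```

Under these assumptions there are four candidate sequences: the two beginning “t”, “ow” each have a probability of 0.6 × 0.5 = 0.3, the two beginning “t”, “ah” each have a probability of 0.4 × 0.5 = 0.2, and the four probabilities add up to 1.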

17 This description of the effect of Dynamic Time Warping is not precisely how the process works, but provides an easy-to-understand explanation.

18 A probability of 1 represents a 100 percent certainty. To convert from a percentage certainty to a probability, simply express the percentage as a fraction or a decimal; for example, a 60 percent certainty corresponds to a probability of 60/100, i.e., 0.6.
