One of the many problems in the science of automatic speech recog-
nition is that the same word, and even segments of the same word, will
often be spoken at different speeds, even when the same person speaks
them. A numerical technique called Dynamic Time Warping has therefore been
devised, which has the effect of stretching and compressing segments of
the speech sound in a word, in order to make the waveform of the word
easier to match with a stored waveform. In essence, the effect of Dynamic
Time Warping is to stretch those segments of a speech waveform that are
shorter than their stored templates, and to compress those segments of
the waveform that are longer than their stored templates. 17
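
To make the idea concrete, here is a minimal sketch of the usual dynamic-programming form of Dynamic Time Warping. It rests on assumptions that are not in the text above: each sound is reduced to a short list of plain numbers (a real recognizer would compare feature vectors), the function name dtw_distance is hypothetical, and the absolute difference is used as the local cost. A total cost of zero means the spoken sequence can be stretched or compressed to fit the template exactly.

```python
def dtw_distance(a, b):
    """Return the cumulative alignment cost between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the cheapest way to align a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Repeating an element of either sequence is what
            # "stretches" or "compresses" the time axis.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b element is reused
                cost[i][j - 1],      # b advances, a element is reused
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Illustrative data only: a stored template, the same shape spoken
# at half speed, and a differently shaped sequence.
template = [1, 2, 3, 2, 1]
slow_version = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]
other_word = [3, 1, 3, 1, 3]

print(dtw_distance(template, slow_version))
print(dtw_distance(template, other_word))
```

The slowly spoken version aligns with the template at no cost, while the differently shaped sequence does not, which is the property a template matcher relies on.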
A pure matching process, by itself, will often be good enough to
recognize isolated words with a high degree of accuracy, even when the
software is running on a microprocessor with relatively little computing
power. But some additional intelligence can be applied to the task, tak-
ing into account a knowledge of the context in which a speech segment
appears. Within a word this contextual information can be applied to
improve the accuracy of recognition of individual segments of the word,
using a technique called Hidden Markov Models or HMMs. Here is how
these models work.
Consider the word “tomato”:
1. Let us assume that the probability 18 of a system recognizing the
first sound in the word, the phoneme “t”, is 1.
2. But assume that the system is not certain whether the next sound
is the phoneme “ah” (for which it has a probability of 0.4), or “ow”,
for which its probability is 0.6.
3. The system is 100 percent confident that the sound after the “ah”
or “ow” is an “m”, i.e. the probability of an “m” is 1.
4. But again, it is not sure what follows the “m”: it might be an “ey”
sound (a probability of 0.5), or it could be “aa” (also with a proba-
bility of 0.5).
5. Then there is another “t”, about which the system is 100 percent
certain (so the probability is 1).
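
The arithmetic behind the steps above can be sketched in a few lines. This is not a full Hidden Markov Model: it only multiplies the probabilities given in steps 1 to 5 along each possible path, and it assumes that the choice at one step does not influence the choice at another. The names stages and sequence_probabilities are hypothetical, introduced only for this illustration.

```python
from itertools import product

# One entry per step above: candidate sounds and their probabilities.
stages = [
    {"t": 1.0},
    {"ah": 0.4, "ow": 0.6},
    {"m": 1.0},
    {"ey": 0.5, "aa": 0.5},
    {"t": 1.0},
]


def sequence_probabilities(stages):
    """Map every possible sound sequence to the product of its step probabilities."""
    results = {}
    for combo in product(*(stage.items() for stage in stages)):
        sounds = tuple(sound for sound, _ in combo)
        probability = 1.0
        for _, p in combo:
            probability *= p
        results[sounds] = probability
    return results


probs = sequence_probabilities(stages)
# List the candidate sequences, most probable first.
for sounds, p in sorted(probs.items(), key=lambda item: -item[1]):
    print("-".join(sounds), round(p, 3))
```

Two uncertain steps with two choices each give four candidate sequences. The two that pass through “ow” each have probability 0.6 × 0.5 = 0.3, and the two that pass through “ah” each have probability 0.4 × 0.5 = 0.2, so the four together sum to 1.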
17 This description of the effect of Dynamic Time Warping is not precisely how the process works,
but provides an easy-to-understand explanation.
18 A probability of 1 represents a 100 percent certainty. To convert from a percentage certainty
to a probability, simply express the percentage as a fraction or a decimal; for example, a 60 percent
certainty corresponds to a probability of 60/100, i.e., 0.6.