Digital Signal Processing Reference
Speech recognition as an example of highly complex, real-world pattern
recognition
The pattern recognition of the four periodic signals was "synthetic" in a certain sense: the
patterns can be described formally (theory!) and are therefore easy to recognize. Furthermore,
the training material was not subject to any fluctuations.
By comparison, voice signals are the exact opposite. They cannot be described formally
and are always subject to large fluctuations. Every repetition of a word differs considerably
in time and spectrum from the previous one. Since we recognize the words nevertheless,
there must be a similarity!
Thinking about possible methods of speech recognition first requires thinking about how
nature, our schoolmaster, accomplishes this pattern recognition so excellently, with the
human ear as the example.
Remark:
The exact process of acoustic signal recognition by the ear and brain is still not
completely understood. It is known that we only hear "frequencies", so the signal
always undergoes a kind of FOURIER transformation in the ear. From the
physical point of view, there is no indication of a fundamentally different process.
This transformation, however, has to be applied continuously to short voice or music
samples; otherwise the music could not be heard until the symphony had ended.
On the other hand, such an acoustic signal cannot simply be cut into time
segments, because the information would be cut apart and thus lost. To avoid
this, the segments have to overlap. The uncertainty principle (chapter 3)
calls for a "smooth" overlap with the help of a suitable window
function, since abrupt changes in the time course would produce additional
frequencies that are not contained in the actual acoustic signal.
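The overlapping, smoothly windowed segmentation described above can be sketched as follows. This is only a minimal illustration and not the method used in this book: the names frame_signal and short_time_spectra, the sampling rate of 8000 Hz, the 440 Hz test tone, the frame length of 512 samples and the choice of a Hann window with 50 % overlap are all assumptions made for the example.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames, one frame per row (hypothetical helper)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_spectra(x, frame_len=512, hop=256):
    """Magnitude spectra of Hann-windowed frames with 50 % overlap."""
    # Periodic Hann window: it rises and falls smoothly, so no abrupt
    # edges are introduced where the signal is cut.
    n = np.arange(frame_len)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)
    frames = frame_signal(x, frame_len, hop) * window
    return np.abs(np.fft.rfft(frames, axis=1)), window

fs = 8000                                # assumed sampling rate in Hz
t = np.arange(fs) / fs                   # one second of signal
x = np.sin(2 * np.pi * 440 * t)          # assumed test tone of 440 Hz

spectra, window = short_time_spectra(x)

# Two neighbouring windows, shifted by half a frame, add up to 1 everywhere,
# so every sample is weighted equally overall and no information is lost.
overlap_sum = window[:256] + window[256:]
```

Each row of spectra is the spectrum of one short segment, so the analysis can follow the signal while it is still running. With this particular window and overlap, overlap_sum is constant, which is one way of making the "smooth" overlap concrete.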
According to the symmetry principle (chapter 5), an equivalent approach is to divide
the frequency band into frequency sections. The narrower these are, the longer the
complementary time sections have to last (uncertainty principle), which leads
to an overlap in the time domain.
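The trade-off between narrow frequency sections and long time sections can be made tangible with a small calculation. The sketch below uses the bin spacing of a discrete FOURIER transformation as a stand-in for the width of a frequency section; this substitution, the function name time_frequency_resolution and the sampling rate of 8000 Hz are assumptions for illustration. The product of exactly 1 follows from this particular definition, whereas the uncertainty principle itself only states a lower bound.

```python
def time_frequency_resolution(frame_len, fs):
    """Return (duration in s, bin spacing in Hz) of one analysis frame."""
    duration = frame_len / fs      # how long the time section lasts
    bin_width = fs / frame_len     # how narrow the frequency section is
    return duration, bin_width

fs = 8000.0                        # assumed sampling rate in Hz
resolutions = {n: time_frequency_resolution(n, fs) for n in (64, 256, 1024)}

for n, (dt, df) in resolutions.items():
    print(f"{n:5d} samples: {dt * 1000:6.1f} ms long, "
          f"{df:7.2f} Hz wide, product {dt * df:.1f}")
```

Going from 64 to 1024 samples makes the frequency section sixteen times narrower, but the time section sixteen times longer.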
Looking at the physics of the cochlea, nature seems to prefer this process (see
Illustration 79 in chapter 4). The transient responses of these frequency-selective
regions (filters) finally lead to a superposition of many time signals into one
whole.
While in the time domain this superposition results in a rather confusing "bustle", the
situation in the frequency domain is entirely different. Because of the vowels that shape
the voice and the tones of music, these near-periodic parts produce a kind of
bed-of-nails pattern, in which the actual voice information is contained.
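The bed-of-nails pattern can be reproduced with a synthetic, strictly periodic signal. The following sketch is not a model of a real voice: the fundamental of 125 Hz, the ten harmonics with amplitudes 1/k and the sampling rate of 8000 Hz are assumptions chosen so that every harmonic falls exactly on one frequency bin.

```python
import numpy as np

fs = 8000                          # assumed sampling rate in Hz
n = np.arange(fs)                  # one second, so bin k corresponds to k Hz
f0 = 125                           # assumed fundamental frequency in Hz
harmonics = np.arange(1, 11)

# Periodic signal: sum of ten harmonics with amplitudes 1/k
x = sum(np.sin(2 * np.pi * k * f0 * n / fs) / k for k in harmonics)

# Magnitude spectrum, scaled so that a sine of amplitude A shows up as A
magnitude = np.abs(np.fft.rfft(x)) / (len(x) / 2)

# The ten strongest bins, sorted by frequency
strongest = np.sort(np.argsort(magnitude)[-10:])
print(strongest)
```

In the time domain x looks rather irregular, but its spectrum consists only of narrow lines at multiples of the fundamental, the "nails". A real vowel is only near-periodic, so its lines are somewhat broadened rather than perfectly sharp.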