Digital Signal Processing Reference
In-Depth Information
\delta_j = \frac{\partial E_n}{\partial a_j} = \sum_k \frac{\partial E_n}{\partial a_k}\,\frac{\partial a_k}{\partial a_j}    (2.37)
where the subscripts indicate the branches of the network connecting unit j
with unit k.
For the output unit, the error is given by:
\delta_k = y_k - t_k    (2.38)
and from (2.32) and (2.37), the error at any unit of the network can be
expressed as:
\delta_j = h'(a_j) \sum_k w_{jk}\,\delta_k    (2.39)
Notice that the previous analysis is independent of the number of layers
and of the type of nonlinear activation function. However, as mentioned,
the activation function is required to be differentiable. Once the
derivatives are calculated based on (2.36), the weights can be updated
by (2.31).
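The recursion in (2.37)–(2.39) followed by the gradient-descent update can be sketched as below. The two-layer network shape, the tanh hidden activation, the linear output unit, and the learning rate are illustrative assumptions, not taken from the text.

```python
import numpy as np

def tanh_prime(a):               # h'(a) as it appears in (2.39)
    return 1.0 - np.tanh(a) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=3)           # one input feature vector
t = np.array([0.5])              # target value
W1 = rng.normal(size=(4, 3))     # input -> hidden weights
W2 = rng.normal(size=(1, 4))     # hidden -> output weights

# Forward pass: activations a, unit outputs z = h(a)
a1 = W1 @ x
z1 = np.tanh(a1)
y = W2 @ z1                      # linear output unit

# (2.38): error at the output unit
delta2 = y - t

# (2.39): error back-propagated to the hidden units
delta1 = tanh_prime(a1) * (W2.T @ delta2)

# Gradient-descent weight update as in (2.31)
eta = 0.01
W2 -= eta * np.outer(delta2, z1)
W1 -= eta * np.outer(delta1, x)
```

Because each delta depends only on the deltas of the units it feeds, the same recursion applies unchanged to networks with more layers.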
2.3 Acoustic Models
In the previous section we have observed the most common feature extraction
techniques which correspond to the front-end of a speech decoder, shown in
Fig. 2.1. This section presents the fundamentals of the statistical decoder
which is commonly based on Hidden Markov Models (HMMs) [Baum 67]. The
HMM is probably the most powerful statistical method for modeling speech
signals: it can characterize observed data samples, such as sequences of
feature vectors of variable time length, for pattern classification. This task
is performed efficiently by introducing the dynamic programming principle.
The HMM assumes that the observed samples are generated by a parametric
random process, and it provides a well-defined framework for estimating the
parameters of that stochastic process [Huang 01].
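As a concrete illustration of the dynamic-programming evaluation alluded to above, the forward algorithm computes the likelihood of an observation sequence under an HMM in O(T·N²) rather than summing over all Nᵀ state paths. The two-state discrete model and its parameter values below are invented for illustration; they are not from the text.

```python
import numpy as np

pi = np.array([0.6, 0.4])            # initial state probabilities
A  = np.array([[0.7, 0.3],           # state transition matrix A[i, j]
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],           # emission probabilities B[state, symbol]
               [0.2, 0.8]])

def forward_likelihood(obs):
    """P(obs | model), summed over all state paths by dynamic programming."""
    alpha = pi * B[:, obs[0]]        # initialization at t = 0
    for o in obs[1:]:                # recursion over time
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()               # termination
```

Note that the recursion works for any sequence length, which is exactly why HMMs can compare observation sequences of variable duration.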
The general assumption of the speech decoder is that the message carried
in the speech signal is encoded as a sequence of symbols. In the previous sec-
tion, we have observed that the front-end of the recognizer extracts relevant
information from the speech signal and embeds it in feature vectors. Then,
the task of the statistical decoder is to map the sequence of feature vectors
to the sequence of symbols.
The statistical decoder has to deal in particular with two problems. First,
as has been mentioned, feature vectors are extracted at a fixed rate,
typically every 10 ms. Therefore, the length of the feature-vector sequence
depends on the duration of the speech signal. However, several speech
signals of different lengths may carry the same message, depending on
factors such as speaking rate, speaker mood, etc. As a consequence, there