Digital Signal Processing Reference
coefficients are a good choice.⁵ The multiplication of the bandlimited feature vector
$x(n)$ with the $N_y \times N_x$ matrix $W$ can be interpreted as a set of $N_y$ FIR filter
operations: each row of $W$ corresponds to an impulse response that is convolved with the
signal vector $x(n)$, yielding one element $y_i(n)$ of the wideband feature vector. As is
common in linear estimation theory, the mean values $m_x$ and $m_y$ of the feature vectors
are estimated in a preprocessing stage. To obtain the matrix $W$, a cost function has to
be specified. A very simple approach is to minimize the sum of the squared errors over a
large database:
$$F(W) = \sum_{n=0}^{N-1} \left\| y(n) - \hat{y}(n) \right\|^2 \to \min. \tag{7.43}$$
In the case of cepstral coefficients, this results in the distance measure described in
Section 7.4.3 [see (7.30) and (7.31)]. If we define the entire database, consisting of $N$
zero-mean feature vectors, by two matrices
$$X = [\, x(0) - m_x,\; x(1) - m_x,\; \ldots,\; x(N-1) - m_x \,], \tag{7.44}$$
$$Y = [\, y(0) - m_y,\; y(1) - m_y,\; \ldots,\; y(N-1) - m_y \,], \tag{7.45}$$
the optimal solution [28] is given by
$$W_{\text{opt}} = Y X^T \left( X X^T \right)^{-1}. \tag{7.46}$$
Since the sum of the squared differences of cepstral coefficients is a well-accepted
distance measure in speech processing, cepstral coefficients are often used as feature
vectors. Even though the assumption that a single matrix $W$ can transform all kinds of
bandlimited spectral envelopes into their broadband counterparts is quite unrealistic,
this simple approach yields astonishingly good results. The basic single-matrix scheme
can nevertheless be enhanced by using several matrices, each optimized for a certain
feature class. In a two-matrix scenario, one matrix $W_v$ can be optimized for voiced
sounds and the other matrix $W_u$ for unvoiced sounds. In this case, the class to which
the current feature vector $x(n)$ belongs is determined first. In a second stage, the
corresponding matrix is applied to generate the estimated wideband feature vector⁶
$$\hat{y}(n) = \begin{cases} W_v \left( x(n) - m_{x,v} \right) + m_{y,v}, & \text{if the classification indicates a voiced frame,} \\ W_u \left( x(n) - m_{x,u} \right) + m_{y,u}, & \text{else.} \end{cases} \tag{7.47}$$
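The class-switched estimate of (7.47) can be sketched as follows. This is a minimal illustration, assuming NumPy and that the voiced/unvoiced decision is already available as a boolean flag; the text does not specify the classifier, and the names `ClassModel` and `estimate_wideband` as well as the toy numbers are hypothetical.

```python
from typing import NamedTuple

import numpy as np


class ClassModel(NamedTuple):
    """Per-class parameters: mapping matrix and the two mean vectors."""
    W: np.ndarray    # shape (N_y, N_x)
    m_x: np.ndarray  # shape (N_x,)
    m_y: np.ndarray  # shape (N_y,)


def estimate_wideband(x, is_voiced, voiced, unvoiced):
    """Apply (7.47): pick the class model, then map the centered vector."""
    model = voiced if is_voiced else unvoiced
    return model.W @ (x - model.m_x) + model.m_y


# Toy parameters, chosen only so the arithmetic is easy to follow.
voiced_model = ClassModel(W=2.0 * np.eye(2),
                          m_x=np.array([1.0, 1.0]),
                          m_y=np.array([0.0, 0.0]))
unvoiced_model = ClassModel(W=np.eye(2),
                            m_x=np.array([0.0, 0.0]),
                            m_y=np.array([10.0, 10.0]))

x_frame = np.array([2.0, 3.0])
y_voiced = estimate_wideband(x_frame, True, voiced_model, unvoiced_model)
y_unvoiced = estimate_wideband(x_frame, False, voiced_model, unvoiced_model)
```

Keeping the matrix and both mean vectors together in one record per class mirrors the fact that each class uses its own means as well as its own matrix.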
⁵ Quadratic cost functions lead to rather simple optimization problems. However, the human auditory
system weights errors in a logarithmic rather than a quadratic sense. For that reason the cepstral distance
(a quadratic function applied to nonlinearly modified LPC coefficients) is a good choice, since it
represents the accumulated logarithmic difference between two spectral envelopes (see Section 7.4.3).
⁶ Besides the two different matrices $W_v$ and $W_u$, different mean vectors, $m_{x,v}$ and $m_{x,u}$ as well as
$m_{y,v}$ and $m_{y,u}$, are applied for voiced and unvoiced frames.
 