BANDWIDTH EXTENSION OF TELEPHONY SPEECH - Adaptive Signal Processing: Next Generation Solutions


coefficients are a good choice.⁵ The multiplication of the bandlimited feature vector $x(n)$ with the $N_y \times N_x$ matrix $W$ can be interpreted as a set of $N_y$ FIR filter operations. Each row of $W$ corresponds to an impulse response which is convolved with the signal vector $x(n)$, resulting in one element $\hat{y}_i(n)$ of the estimated wideband feature vector. As is common in linear estimation theory, the mean values $m_x$ and $m_y$ of the feature vectors are estimated in a preprocessing stage. To obtain the matrix $W$, a cost function has to be specified. A very simple approach is to minimize the sum of the squared errors over a large database:

$$ F(W) = \sum_{n=0}^{N-1} \left\| y(n) - \hat{y}(n) \right\|^2 \to \min. \tag{7.43} $$

In the case of cepstral coefficients, this results in the distance measure described in Section 7.4.3 [see (7.30) and (7.31)]. If we define the entire database consisting of $N$ zero-mean feature vectors by two matrices

$$ X = [\, x(0) - m_x,\; x(1) - m_x,\; \ldots,\; x(N-1) - m_x \,], \tag{7.44} $$

$$ Y = [\, y(0) - m_y,\; y(1) - m_y,\; \ldots,\; y(N-1) - m_y \,], \tag{7.45} $$

the optimal solution [28] is given by

$$ W_{\mathrm{opt}} = Y X^{T} \left( X X^{T} \right)^{-1}. \tag{7.46} $$
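The closed-form solution (7.46) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names (`train_linear_mapping`, `estimate_wideband`, `squared_error_cost`) and the synthetic data are hypothetical and not part of the original text, feature vectors are stored as matrix columns as in (7.44) and (7.45), and the estimate is assumed to take the form $\hat{y}(n) = W(x(n) - m_x) + m_y$, consistent with (7.47).

```python
import numpy as np


def train_linear_mapping(x_feats, y_feats):
    """Fit W_opt = Y X^T (X X^T)^{-1} from paired feature vectors.

    x_feats: shape (N_x, N), one bandlimited feature vector per column.
    y_feats: shape (N_y, N), the matching wideband feature vectors.
    """
    # Mean vectors, estimated in a preprocessing stage.
    m_x = x_feats.mean(axis=1, keepdims=True)
    m_y = y_feats.mean(axis=1, keepdims=True)
    # Zero-mean data matrices X and Y as in (7.44) and (7.45).
    X = x_feats - m_x
    Y = y_feats - m_y
    # X X^T is symmetric, so W (X X^T) = Y X^T is equivalent to
    # (X X^T) W^T = X Y^T. Solving this system avoids forming the inverse.
    W = np.linalg.solve(X @ X.T, X @ Y.T).T
    return W, m_x, m_y


def estimate_wideband(W, m_x, m_y, x_feats):
    """Apply the linear mapping to every column of x_feats (2-D array)."""
    return W @ (x_feats - m_x) + m_y


def squared_error_cost(y_feats, y_hat):
    """Sum of squared errors over all frames, as in (7.43)."""
    return float(np.sum((y_feats - y_hat) ** 2))


# Synthetic demo data (an assumption for illustration, not real speech features).
rng = np.random.default_rng(0)
n_x, n_y, n_frames = 4, 6, 500
x_feats = rng.standard_normal((n_x, n_frames))
W_true = rng.standard_normal((n_y, n_x))
y_feats = W_true @ x_feats + 0.5 + 0.01 * rng.standard_normal((n_y, n_frames))

W, m_x, m_y = train_linear_mapping(x_feats, y_feats)
y_hat = estimate_wideband(W, m_x, m_y, x_feats)
print("cost at W_opt:", squared_error_cost(y_feats, y_hat))
```

Solving the linear system instead of inverting $X X^T$ explicitly is a numerical design choice of this sketch; it still requires $X X^T$ to be nonsingular, that is, the database must contain at least $N_x$ linearly independent zero-mean feature vectors.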

Since the sum of the squared differences of cepstral coefficients is a well-accepted distance measure in speech processing, cepstral coefficients are often used as feature vectors. Even though the assumption that a single matrix $W$ transforms all kinds of bandlimited spectral envelopes into their broadband counterparts is quite unrealistic, this simple approach yields astonishingly good results. The basic single-matrix scheme can nevertheless be enhanced by using several matrices, each optimized for a certain class of features. In a two-matrix scenario, one matrix $W_v$ can be optimized for voiced sounds and the other matrix $W_u$ for unvoiced sounds. In this case, the first stage determines the class to which the current feature vector $x(n)$ belongs. In the second stage, the corresponding matrix is applied to generate the estimated wideband feature vector⁶

$$ \hat{y}(n) = \begin{cases} W_v \left( x(n) - m_{x,v} \right) + m_{y,v}, & \text{if the classification indicates a voiced frame,} \\ W_u \left( x(n) - m_{x,u} \right) + m_{y,u}, & \text{else.} \end{cases} \tag{7.47} $$
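The class-dependent mapping in (7.47) amounts to selecting one set of parameters per frame. The following minimal sketch assumes that the voiced/unvoiced decision is already available as a boolean; the text does not specify the classifier, so none is implemented here. The names `LinearMapping` and `estimate_wideband_two_class`, as well as the toy numbers, are hypothetical and serve only as illustration.

```python
import numpy as np
from typing import NamedTuple


class LinearMapping(NamedTuple):
    """Parameters of one class: matrix W and the two mean vectors."""
    W: np.ndarray    # shape (N_y, N_x)
    m_x: np.ndarray  # shape (N_x,)
    m_y: np.ndarray  # shape (N_y,)


def estimate_wideband_two_class(x, voiced, voiced_map, unvoiced_map):
    """Evaluate (7.47) for a single bandlimited feature vector x of shape (N_x,).

    `voiced` is the result of an external classification stage (assumed given).
    """
    mapping = voiced_map if voiced else unvoiced_map
    return mapping.W @ (x - mapping.m_x) + mapping.m_y


# Toy parameters chosen by hand (not trained on speech data).
voiced_map = LinearMapping(
    W=np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]),
    m_x=np.array([1.0, 1.0]),
    m_y=np.zeros(3),
)
unvoiced_map = LinearMapping(
    W=np.zeros((3, 2)),
    m_x=np.zeros(2),
    m_y=np.array([5.0, 5.0, 5.0]),
)

x = np.array([2.0, 3.0])
y_voiced = estimate_wideband_two_class(x, True, voiced_map, unvoiced_map)
y_unvoiced = estimate_wideband_two_class(x, False, voiced_map, unvoiced_map)
print(y_voiced)    # x - m_x = [1, 2], so W_v maps it to [1, 4, 3]
print(y_unvoiced)  # W_u is zero here, so only the mean [5, 5, 5] remains
```

In practice each `LinearMapping` would be trained with (7.46) on the subset of database frames belonging to its class, with the mean vectors computed on that same subset.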

⁵ Quadratic cost functions lead to rather simple optimization problems. However, the human auditory system weights errors in a logarithmic rather than a quadratic sense. For that reason, the cepstral distance (a quadratic function applied to nonlinearly modified LPC coefficients) is a good choice, since this distance represents the accumulated logarithmic difference between two spectral envelopes (see Section 7.4.3).

⁶ Besides the two different matrices $W_v$ and $W_u$, different mean vectors, $m_{x,v}$ and $m_{x,u}$ as well as $m_{y,v}$ and $m_{y,u}$, are applied for voiced and unvoiced frames.
