Digital Signal Processing Reference
with:

$$p^0(n) = \sum_{i=n+1}^{\infty} h(i)\, y(n-i)$$

$$p^1(n) = b \sum_{i=0}^{n} h(i)\, y(n-i-Q)$$

$$p^k(n) = g_k \sum_{i=0}^{n} h(i)\, c^{j(k)}(n-i)$$
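As a numerical check of the three terms above, the following minimal Python sketch splits the filtered excitation over one frame into the contribution of the past excitation and two zero-state responses. It rests on assumptions that are not stated in the text: the impulse response h is truncated to a finite length (so the infinite sum stops at len(h) - 1), the excitation has the form y(n) = b y(n - Q) + g c(n) with Q >= N, and a single fixed-codebook vector c is used. The function name and buffer layout are hypothetical.

```python
def split_response(h, past, b, Q, g, c):
    """Return (p, p0, p1, p2) over one frame of N = len(c) samples.

    past[-1] holds y(-1), past[-2] holds y(-2), and so on (assumed layout).
    Requires Q >= N so that y(n - Q) only involves past samples.
    """
    N, L, M = len(c), len(past), len(h)
    if not N <= Q <= L:
        raise ValueError("need N <= Q <= len(past)")

    def y_past(m):
        # y(m) for m < 0; samples older than the buffer are taken as zero.
        return past[L + m] if -L <= m < 0 else 0.0

    # Assumed excitation model for the current frame.
    y_cur = [b * y_past(n - Q) + g * c[n] for n in range(N)]

    def y(m):
        return y_cur[m] if m >= 0 else y_past(m)

    # Direct filtering of the whole excitation (past and present).
    p = [sum(h[i] * y(n - i) for i in range(M)) for n in range(N)]
    # p0: contribution of the past excitation only (terms with i > n).
    p0 = [sum(h[i] * y_past(n - i) for i in range(n + 1, M)) for n in range(N)]
    # p1, p2: zero-state filtering (terms with i <= n), each weighted by a gain.
    p1 = [b * sum(h[i] * y_past(n - i - Q) for i in range(min(n, M - 1) + 1))
          for n in range(N)]
    p2 = [g * sum(h[i] * c[n - i] for i in range(min(n, M - 1) + 1))
          for n in range(N)]
    return p, p0, p1, p2
```

Under these assumptions, p(n) equals p0(n) + p1(n) + p2(n) for every n in the frame, which is what allows the past contribution to be subtracted from the target before the gains are searched.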
We notice that $p^1$ and $p^k$ have exactly the same form and can be interpreted in the
same way: as the filtering of a known vector by the perceptual filter, starting from zero
initial conditions, and weighted by a gain. The standard algorithm therefore consists
first of minimizing the criterion:
$$\| (p - p^0) - p^1 \|^2$$

to determine Q and b, then to minimize the criterion:

$$\| (p - p^0 - p^1) - p^2 \|^2$$
to determine $j(2)$ and $g_2$, and so on. Note that the two long-term predictor parameters
Q and b can be calculated in exactly the same way as the parameters $j(k)$ and $g_k$,
provided that we construct an adaptive codebook from the past excitation:
$$C = \begin{bmatrix}
y(-Q_{\max}) & \cdots & y(-2N) & y(-2N+1) & \cdots & y(-N) \\
\vdots & & \vdots & \vdots & & \vdots \\
y(-Q_{\max}+N-1) & \cdots & y(-N-1) & y(-N) & \cdots & y(-1)
\end{bmatrix}$$
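The search for Q and b over this codebook can be sketched as follows. This is a minimal illustration, not the procedure of any particular coder: the function names, the buffer layout (past[-1] holds y(-1)) and the truncation of h are assumptions. For a filtered vector f, the gain minimizing ||t - b f||^2 is b = <t, f> / <f, f>, and the best vector is the one maximizing <t, f>^2 / <f, f>, where t stands for the target p - p^0.

```python
def zero_state_filter(h, x):
    """f(n) = sum_{i=0}^{n} h(i) x(n - i), for n = 0 .. len(x) - 1."""
    return [sum(h[i] * x[n - i] for i in range(min(n, len(h) - 1) + 1))
            for n in range(len(x))]


def adaptive_codebook(past, N, q_max):
    """Map each lag Q in N..q_max to the column y(-Q), ..., y(-Q + N - 1).

    past[-1] holds y(-1) (assumed layout), so y(-m) is past[len(past) - m].
    """
    L = len(past)
    if not N <= q_max <= L:
        raise ValueError("need N <= q_max <= len(past)")
    return {Q: past[L - Q:L - Q + N] for Q in range(q_max, N - 1, -1)}


def gain_shape_search(target, h, vectors):
    """Return (index, gain) minimizing ||target - gain * filtered(vector)||^2."""
    best_index, best_gain, best_score = None, 0.0, -1.0
    for index, c in vectors.items():
        f = zero_state_filter(h, c)
        energy = sum(v * v for v in f)
        if energy == 0.0:
            continue  # an all-zero vector cannot reduce the error
        corr = sum(t * v for t, v in zip(target, f))
        score = corr * corr / energy  # error reduction obtained with the optimal gain
        if score > best_score:
            best_index, best_gain, best_score = index, corr / energy, score
    return best_index, best_gain
```

Calling gain_shape_search(t, h, adaptive_codebook(past, N, q_max)) returns a lag and a gain playing the roles of Q and b. The same function applied to a dictionary of fixed-codebook vectors, with the target reduced by the first contribution, would return an index and a gain playing the roles of j(2) and g_2.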
This codebook has two interesting properties. First, the associated matrix has a
Toeplitz structure. This property makes it possible to reduce the number of
operations required when the codebook is filtered. Second, when we pass from
one analysis frame to the next, the codebook does not have to be rebuilt entirely: only N
vectors must be updated, and the others are obtained by shifting to the left.
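Both properties can be illustrated with a short Python sketch. The text does not spell out how the Toeplitz structure is exploited; one common way, assumed here, is a recursion between neighboring lags. Since the column of lag Q is the column of lag Q - 1 shifted down by one sample with y(-Q) placed on top, the filtered vectors satisfy f_Q(0) = y(-Q) h(0) and f_Q(n) = f_{Q-1}(n - 1) + y(-Q) h(n) for n >= 1. The function names and the buffer layout (past[-1] holds y(-1)) are hypothetical.

```python
def direct_filter(h, c):
    """Reference: f(n) = sum_{i=0}^{n} h(i) c(n - i), with len(h) >= len(c)."""
    return [sum(h[i] * c[n - i] for i in range(n + 1)) for n in range(len(c))]


def filter_all_lags(h, past, N, q_max):
    """Filtered codebook vectors for lags N..q_max, computed recursively.

    Only the smallest lag needs a full convolution; every other lag costs
    about N multiply-adds. h is zero-padded or cut to N taps.
    """
    L = len(past)
    if not N <= q_max <= L:
        raise ValueError("need N <= q_max <= len(past)")
    hN = (list(h) + [0.0] * N)[:N]
    f = direct_filter(hN, past[L - N:])      # lag N: full convolution
    filtered = {N: f}
    for Q in range(N + 1, q_max + 1):
        y_new = past[L - Q]                  # y(-Q), the only new sample
        f = [y_new * hN[0]] + [f[n - 1] + y_new * hN[n] for n in range(1, N)]
        filtered[Q] = f
    return filtered


def update_codebook(columns, past, new_frame, N, q_max):
    """Codebook for the next frame, given the N excitation samples just produced.

    Returns (new_columns, new_past). Columns with lag >= 2N are reused from
    the old codebook (old lag Q - N); only lags N..2N-1 are rebuilt.
    """
    if len(new_frame) != N:
        raise ValueError("new_frame must contain N samples")
    new_past = past[N:] + list(new_frame)
    L = len(new_past)
    new_columns = {}
    for Q in range(N, q_max + 1):
        if Q >= 2 * N:
            new_columns[Q] = columns[Q - N]              # shifted, not rebuilt
        else:
            new_columns[Q] = new_past[L - Q:L - Q + N]   # one of the N new vectors
    return new_columns, new_past
```

In update_codebook, exactly N columns (lags N to 2N - 1) are read from the new excitation, which matches the statement that only N vectors must be updated.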
Remarks
The constraint Q ≥ N is too strong in practice. It is necessary to introduce sub-frame
processing and to determine an excitation model for each sub-frame. Indeed,
the mean fundamental frequency is of the order of 100 Hz for a male speaker and
250 Hz for a female speaker. The likely values for Q are therefore 80 and 32
(which corresponds to a sampling frequency of 8 kHz).
Since the value usually chosen for N is 160, it seems necessary to divide the analysis
frame into at least five sub-frames, but in practice this is generally limited to four
sub-frames of N = 40 samples, since more sub-frames would be too costly in terms of bit rate.
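The arithmetic behind these figures can be reproduced with a few lines of Python. The 8 kHz sampling rate is an assumption inferred from the quoted lags (8000 / 100 = 80 and 8000 / 250 = 32); the function names are hypothetical.

```python
import math


def pitch_lag(sampling_rate_hz, f0_hz):
    """Lag Q, in samples, of one fundamental period."""
    return round(sampling_rate_hz / f0_hz)


def min_subframes(frame_len, smallest_lag):
    """Smallest number of equal sub-frames whose length does not exceed the lag."""
    return math.ceil(frame_len / smallest_lag)


fs = 8000                       # assumed sampling rate in Hz
q_male = pitch_lag(fs, 100)     # 80 samples
q_female = pitch_lag(fs, 250)   # 32 samples
n_sub = min_subframes(160, q_female)  # 5 sub-frames of 32 samples
```

With four sub-frames of 40 samples, the constraint Q ≥ N is not met for a lag of 32, which is the compromise on bit rate that the remark describes.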