prediction is evident from the graphs, where similar performance is obtained
for the MA-MSVQ with around 3 bits less than for the MSVQ without
prediction. This 3-bit advantage is present for all performance measures.
Therefore, it is possible to achieve a saving of 10-15% in bit rate by using
MA prediction with an MSVQ quantizer, on top of the bit reduction already
obtained by using MSVQ instead of SVQ. The only cost of the MA prediction
is a slightly increased sensitivity to channel errors. However, during testing
of coders using such schemes, this extra sensitivity did not turn out to be a
significant problem, since the prediction order is limited to one and an error
therefore propagates for only one frame.
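The MA-predictive scheme described above can be illustrated with a minimal sketch: each frame is predicted from the previously quantized residual, the residual is quantized by a codebook search, and the memory length of one frame bounds error propagation. The codebook, mean vector and MA coefficient `b` below are illustrative placeholders, not the values of any particular coder, and a single-stage search stands in for the multi-stage MSVQ search.

```python
import numpy as np

def ma_predictive_quantize(lsf_frames, codebook, mean, b=0.5):
    """First-order MA predictive quantization (sketch).

    Each frame is predicted from the previously *quantized* residual,
    so a channel error only propagates for one frame (prediction
    order = 1). `codebook`, `mean` and `b` are illustrative values.
    """
    prev_res = np.zeros_like(mean)      # quantized residual of frame k-1
    reconstructed = []
    for lsf in lsf_frames:
        pred = mean + b * prev_res      # MA prediction of frame k
        target = lsf - pred             # residual to be quantized
        # nearest-neighbour search over the residual codebook
        # (a multi-stage MSVQ search would go here instead)
        idx = np.argmin(np.sum((codebook - target) ** 2, axis=1))
        q_res = codebook[idx]
        reconstructed.append(pred + q_res)
        prev_res = q_res                # memory of length one
    return np.array(reconstructed)
```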
5.9.5 Joint Quantization of LSFs
Prediction is an efficient way of removing correlation from two or more
neighbouring sets of parameters. However, it is a one-way process:
information from frame k − 1 is used in the prediction and quantization of
frame k, but information from frame k is not used for the prediction and
quantization of frame k − 1. Indeed, it is assumed that frame k is not known
when quantizing frame k − 1, in order to keep the delay to a minimum.
However, in some applications it is worth accepting a slight increase in delay
and using a quantization scheme which makes use of the extra redundancies.
A simple way of achieving this is to jointly quantize several sets of parameters.
For example, a 1.2 kb/s version of the SB-LPC coder jointly quantizes three
sets of parameters extracted at 20ms intervals, giving a 60ms frame size. This
enables the coder to quantize the three sets of parameters jointly, making
the best use of the redundancies existing between them. This quantizer will
be referred to as JQ-MSVQ, and the large frame composed of several speech
frames will be referred to as a meta-frame. JQ-MSVQ is also used in a 4 kb/s
version of the SB-LPC, where two sets of LSFs extracted every 10ms are
quantized jointly, forming a 20ms meta-frame.
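The core of joint quantization is simply that the sets in a meta-frame are stacked into one long vector and matched against a codebook of equally long entries, so the search exploits the correlation between neighbouring sets. The sketch below assumes a single-stage codebook for brevity; the JQ-MSVQ in the text uses multiple stages.

```python
import numpy as np

def jq_search(lsf_sets, meta_codebook):
    """Joint quantization of several LSF sets (sketch).

    The sets of a meta-frame are concatenated into one meta-vector
    (e.g. 3 sets of 10 LSFs -> one 30-dimensional vector) and the
    nearest codebook entry is returned. `meta_codebook` is an
    illustrative single-stage codebook.
    """
    meta = np.concatenate(lsf_sets)
    distances = np.sum((meta_codebook - meta) ** 2, axis=1)
    idx = int(np.argmin(distances))
    return idx, meta_codebook[idx]
```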
One significant issue with a JQ quantizer is that of weighting. Various
weighting functions have been discussed above and they can be used to
provide weights for each individual set of LSFs. However, the sets of LSFs in
a meta-frame are usually not of equal importance. For example, at a speech
onset, the first set can be in a nonspeech region, whereas the other sets can
be in a speech-active region. Therefore the weight vector should ideally take
this into consideration, so as to maximize the quantization efficiency for the
important sets and not waste bits on a set of LSFs which will have very
little influence on the speech quality. This can be achieved by including a
bias based on the relative energies of the speech for each set of LSFs and
multiplying the weights for the nonspeech LSFs by a factor smaller than one.
A value of 0.1 has been found to give good performance. It is risky to use a
smaller value, as problems can arise from interpolation at the decoder if the
'not so important' set of LSFs is too poorly quantized.
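The energy-based bias described above can be sketched as a simple scaling of the per-set weight vectors. The energy threshold used to classify a set as non-speech is an assumed parameter here; the factor of 0.1 is the value quoted in the text.

```python
import numpy as np

def bias_meta_weights(weights, energies, thresh, factor=0.1):
    """Bias the perceptual weights of a meta-frame (sketch).

    `weights` is a list of per-set weight vectors and `energies` the
    speech energy of each set. Sets whose energy falls below `thresh`
    (assumed to be non-speech) have their weights multiplied by
    `factor`; 0.1 is quoted in the text as a good compromise, since a
    smaller value can harm interpolation at the decoder.
    """
    return [w * (factor if e < thresh else 1.0)
            for w, e in zip(weights, energies)]
```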