Table 11.2  Recommended input-output characteristics of the quantizer for 24-kbits-per-second operation.

Input Range               Label     Output
$\log_2 (d_k/\alpha_k)$   $|I_k|$   $\log_2 (d_k/\alpha_k)$
$[2.58, \infty)$          3         2.91
$[1.70, 2.58)$            2         2.13
$[0.06, 1.70)$            1         1.05
$(-\infty, 0.06)$         0         $-\infty$
The Quantizer
The recommendation assumes that the speech output is sampled at the rate of 8000 samples
per second, so the rates of 40, 32, 24, and 16 kbits per second translate to 5 bits per sample, 4
bits per sample, 3 bits per sample, and 2 bits per sample. Compared with the PCM rate of
8 bits per sample, these correspond to compression ratios of 1.6:1, 2:1, 2.67:1, and 4:1. Except
for the 16 kbits per second system, the number of levels in the quantizer is $2^{n_b} - 1$, where
$n_b$ is the number of bits per sample. Thus, the number of levels in the quantizer is odd, which
means that for the higher rates we use a midtread quantizer.
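The arithmetic in this paragraph can be checked with a short Python sketch. The function name `rate_summary` and the constants are illustrative choices, not part of the recommendation. The level count for the 16 kbits per second system is left as `None` because the text above only states that it is an exception.

```python
SAMPLE_RATE_HZ = 8000  # sampling rate assumed by the recommendation
PCM_BITS = 8           # bits per sample of the PCM reference


def rate_summary(kbits_per_second):
    """Return (bits per sample, compression ratio vs. PCM, quantizer levels)."""
    bits = kbits_per_second * 1000 // SAMPLE_RATE_HZ
    ratio = PCM_BITS / bits
    # The text gives 2**n_b - 1 levels for every rate except 16 kbits/s;
    # the count for 16 kbits/s is not stated there, so it is left as None.
    levels = None if kbits_per_second == 16 else 2 ** bits - 1
    return bits, ratio, levels


for rate in (40, 32, 24, 16):
    bits, ratio, levels = rate_summary(rate)
    print(f"{rate} kbits/s: {bits} bits/sample, {ratio:.2f}:1, levels={levels}")
```

For the 24 kbits per second system this gives 7 levels, which agrees with Table 11.2 if the labels 0 to 3 are read as magnitudes with a separate sign.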
The quantizer is a backward adaptive quantizer with an adaptation algorithm that is similar
to the Jayant quantizer. The recommendation describes the adaptation of the quantization
interval in terms of the adaptation of a scale factor. The input $d_k$ is normalized by a scale
factor $\alpha_k$. This normalized value is quantized, and the normalization removed by multiplying
with $\alpha_k$. In this way the quantizer is kept fixed and $\alpha_k$ is adapted to the input. Therefore, for
example, instead of expanding the step size, we would increase the value of $\alpha_k$.
The fixed quantizer is a nonuniform midtread quantizer. The recommendation describes
the quantization boundaries and reconstruction values in terms of the log of the scaled input.
The input-output characteristics for the 24-kbits-per-second system are shown in Table 11.2. An output
value of $-\infty$ in the table corresponds to a reconstruction value of 0.
The adaptation algorithm is described in terms of the logarithm of the scale factor:
$$y(k) = \log_2 \alpha_k \qquad (60)$$
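As a rough illustration of how a fixed quantizer and a scale factor work together, the following Python sketch applies the boundaries and outputs of Table 11.2 to one difference sample. It rests on several assumptions that the text does not state: the table is applied to the magnitude of the input, the sign is carried separately, and the scale factor is positive. The names `TABLE_11_2` and `quantize` are hypothetical.

```python
import math

# Rows of Table 11.2: (lower bound of the log2 input range, label |I_k|, log2 output).
TABLE_11_2 = [
    (2.58, 3, 2.91),
    (1.70, 2, 2.13),
    (0.06, 1, 1.05),
]


def quantize(d_k, alpha_k):
    """Return (signed label, reconstructed value) for one difference sample.

    Assumes alpha_k > 0 and that the table applies to |d_k| / alpha_k.
    """
    if d_k == 0:
        return 0, 0.0  # log2(0) is -infinity, which falls in the lowest range
    log_input = math.log2(abs(d_k) / alpha_k)  # log of the scaled input
    sign = 1 if d_k > 0 else -1
    for lower, label, log_output in TABLE_11_2:
        if log_input >= lower:
            # Undo the log and the normalization to recover a value near d_k.
            return sign * label, sign * alpha_k * 2.0 ** log_output
    # Lowest range: the table output of -infinity means reconstruction value 0.
    return 0, 0.0


print(quantize(8.0, 1.0))
print(quantize(8.0, 4.0))
print(quantize(1.0, 1.0))
```

The same input of 8.0 lands in the top range when the scale factor is 1 and in a lower range when the scale factor is 4, which is the sense in which the quantizer stays fixed while the scale factor does the adapting.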
The adaptation of the scale factor $\alpha$ or its log $y(k)$ depends on whether the input is speech or
speechlike, where the sample-to-sample difference can fluctuate considerably, or whether the
input is voice-band data, which might be generated by a modem, where the sample-to-sample
fluctuation is quite small. In order to handle both these situations, the scale factor is composed
of two values, a locked slow scale factor for when the sample-to-sample differences are quite
small, and an unlocked value for when the input is more dynamic:
$$y(k) = a_l(k)\, y_u(k-1) + (1 - a_l(k))\, y_l(k-1) \qquad (61)$$
The value of $a_l(k)$ depends on the variance of the input. It will be close to one for speech
inputs and close to zero for tones and voice-band data.
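The weighting in Equation (61) can be illustrated with a few lines of Python. This is only a sketch of the combination step: the text here does not describe how $a_l(k)$, $y_u(k)$, or $y_l(k)$ are themselves updated, so they are passed in as plain arguments, and the numeric values below are made up for illustration. The function names are hypothetical.

```python
def combined_log_scale(a_l, y_u_prev, y_l_prev):
    """Equation (61): y(k) = a_l(k) * y_u(k-1) + (1 - a_l(k)) * y_l(k-1)."""
    return a_l * y_u_prev + (1.0 - a_l) * y_l_prev


def scale_factor(y_k):
    """Invert Equation (60): alpha_k = 2 ** y(k)."""
    return 2.0 ** y_k


# Illustrative values only: an unlocked log scale of 3.0 and a locked one of 1.0.
y_u_prev, y_l_prev = 3.0, 1.0
for a_l in (1.0, 0.5, 0.0):
    y_k = combined_log_scale(a_l, y_u_prev, y_l_prev)
    print(f"a_l={a_l}: y(k)={y_k}, alpha_k={scale_factor(y_k)}")
```

With $a_l(k)$ near one, as for speech, the result follows the unlocked value. With $a_l(k)$ near zero, as for tones and voice-band data, it follows the locked value.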
 