Digital Signal Processing Reference
In-Depth Information
envelope and the bit rate. The 4.8 kb/s STC uses a 14th-order all-pole model
and quantizes the predictor coefficients in the LSF domain. In addition to the
LSFs, the STC transmits gain, pitch, and voicing.
8.4.2 Improved Multi-Band Excitation, INMARSAT-M Version
Improved multi-band excitation (IMBE), operating at 4.15 kb/s for
INMARSAT-M, divides the speech spectrum into several voiced and unvoiced
frequency bands, using the multi-band approach described in Section 8.3.1.
However, IMBE makes the voicing decisions for groups of three harmonics,
and a single bit is allocated to each group. The total number of voicing bits,
$B_v$, is limited to a maximum of 12, and the harmonics beyond the coverage of
the voicing bits are declared unvoiced. The refined pitch is transmitted using eight
bits. The frame length is 20 ms, giving 83 bits per frame at 4.15 kb/s, and the
remaining bits, i.e. $83 - 8 - B_v$, are allocated for spectral amplitudes. The voiced
amplitudes are estimated using equation (8.11) and the unvoiced amplitudes
are estimated using equation (8.13). The voiced bands are synthesized as
follows:
$$s_v(n) = \sum_{k \in \text{voiced}} \hat{A}_k \cos\big(k\,\phi_0(n)\big), \qquad n = 0, 1, 2, \ldots, N-1 \tag{8.21}$$

where $N$ is the frame length and the fundamental phase evolution, $\phi_0(n)$, is
defined by the following equations:

$$\phi_0(n) = \phi_0(n-1) + \omega_0(n) \tag{8.22}$$

$$\omega_0(n) = \frac{1}{N}\left[(N-n)\,\omega_0^{l-1} + n\,\omega_0^{l}\right] \tag{8.23}$$

where $\phi_0(-1)$ is $\phi_0(N-1)$ of the previous frame and $\omega_0^{l}$ is the normalized
fundamental frequency estimated at the end of the $l$th frame. The amplitudes of
the voiced harmonics are linearly interpolated between the analysis points. If
the corresponding harmonic at one of the analysis points does not exist or is declared
unvoiced, then its amplitude is set to zero and the harmonic frequency stays
constant (set to the frequency of the existing voiced harmonic). However,
if the pitch estimate is not steady, neither the pitch nor the amplitudes are
interpolated for any harmonics; instead, the overlap-and-add method is used.
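
The steady-pitch case of equations (8.21) to (8.23) can be sketched in a few lines of Python. This is a minimal illustration, not the codec's reference implementation. The function name `synthesize_voiced`, the dictionary representation of the harmonic amplitudes, and the 160-sample frame (20 ms at an assumed 8 kHz sampling rate) are all assumptions made for the example. The amplitude interpolation is assumed to use the same linear weights as equation (8.23). The sketch sets a missing harmonic's amplitude to zero but, for brevity, does not hold that harmonic's frequency constant, and it does not cover the overlap-and-add fallback used when the pitch is not steady.

```python
import math


def synthesize_voiced(amps_prev, amps_curr, w0_prev, w0_curr, phase_prev, N):
    """Synthesize one frame of the voiced component (steady-pitch case).

    amps_prev, amps_curr: {harmonic number k: amplitude} for the harmonics
        declared voiced at the previous and current analysis points
        (hypothetical data layout chosen for this sketch).
    w0_prev, w0_curr: normalized fundamental frequency in radians/sample
        at the end of frames l-1 and l.
    phase_prev: phi_0(N-1) of the previous frame, used as phi_0(-1).
    Returns (list of N samples, phi_0(N-1) for the next frame).
    """
    harmonics = sorted(set(amps_prev) | set(amps_curr))
    samples = []
    phi = phase_prev
    for n in range(N):
        # Equation (8.23): linear interpolation of the fundamental frequency.
        w0 = ((N - n) * w0_prev + n * w0_curr) / N
        # Equation (8.22): accumulate the fundamental phase.
        phi += w0
        s = 0.0
        for k in harmonics:
            # A harmonic missing (or unvoiced) at one analysis point
            # gets amplitude zero at that point.
            a_prev = amps_prev.get(k, 0.0)
            a_curr = amps_curr.get(k, 0.0)
            # Assumed: same linear weights as for the frequency.
            a = ((N - n) * a_prev + n * a_curr) / N
            # Equation (8.21): sum of harmonics of the fundamental phase.
            s += a * math.cos(k * phi)
        samples.append(s)
    return samples, phi


# Usage: two voiced harmonics, pitch drifting slightly across the frame.
frame, next_phase = synthesize_voiced(
    {1: 1.0, 2: 0.5}, {1: 0.8, 2: 0.0}, 0.10, 0.11, 0.0, 160
)
```

Returning the final phase lets the caller pass it back in as `phase_prev` for the next frame, which keeps the fundamental phase continuous across frame boundaries, as equation (8.22) requires.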
The unvoiced component is synthesized using filtered white Gaussian
noise. White noise is generated in the time domain and transformed into the
frequency domain; the bands corresponding to the voiced components are
set to zero; and the unvoiced bands are scaled according to the unvoiced
gain factors. The inverse Fourier transform of the modified spectrum gives
the unvoiced component, $\hat{s}_{uv}(n)$, which is produced using the overlap and