Digital Signal Processing Reference
band excitation (MBE) coders used a constant threshold for all the bands.
However, the most recent versions use several heuristic rules to obtain
better performance [7]. For example, the threshold function is decreased as
the frequency increases, if the same band of the previous frame was unvoiced,
if the high-frequency energy exceeds the low-frequency energy, and if the
speech energy approaches the energy of the background noise.
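The heuristic rules above can be sketched in Python as follows. This is only an illustration of how such rules might lower a per-band threshold: the function name `band_threshold`, its parameters, the scale factor 0.8, the linear frequency taper, and the noise margin of 2.0 are all hypothetical, since the text gives no numeric values.

```python
def band_threshold(base, band_index, num_bands, prev_band_unvoiced,
                   high_energy, low_energy, speech_energy, noise_energy):
    """Return an adapted voicing threshold for one band (illustrative only)."""
    # Rule 1: the threshold decreases as the band frequency increases.
    # The linear taper down to half of `base` is an assumed shape.
    taper = 0.5 * band_index / max(num_bands - 1, 1)
    threshold = base * (1.0 - taper)

    # Rule 2: lower it if the same band of the previous frame was unvoiced.
    if prev_band_unvoiced:
        threshold *= 0.8  # hypothetical scale factor

    # Rule 3: lower it if high-frequency energy exceeds low-frequency energy.
    if high_energy > low_energy:
        threshold *= 0.8  # hypothetical scale factor

    # Rule 4: lower it if the speech energy approaches the background noise.
    # "Approaches" is modelled here as being within an assumed factor of 2.
    if speech_energy < 2.0 * noise_energy:
        threshold *= 0.8  # hypothetical scale factor

    return threshold
```

Each rule only ever reduces the threshold, which matches the direction stated in the text. How the resulting threshold is compared against a band's voicing measure is left out, because the excerpt does not specify it.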
Sinusoidal Model Approach
McAulay et al. proposed a different voicing determination technique for their
sinusoidal transform coder (STC) [2]. The speech spectrum is divided into
two bands, determined by a voicing transition frequency above which the
spectrum is declared unvoiced. This method estimates the similarity between
the harmonically synthesized signal, $\hat{s}(n, \omega_0)$, and the original
speech signal, $s(n)$. The signal-to-noise ratio (SNR), $\delta$, between
$s(n)$ and $\hat{s}(n, \omega_0)$ is given by
$$\delta = \frac{\sum_{n=0}^{N-1} s^2(n)}{\sum_{n=0}^{N-1} \left[ s(n) - \hat{s}(n, \omega_0) \right]^2} \tag{8.2}$$
where $N$ is the analysis frame length and $\hat{s}(n, \omega_0)$ is given by
$$\hat{s}(n, \omega_0) = \sum_{l=1}^{K(\omega_0)} A_l \exp\left[ j (n l \omega_0 + \theta_l) \right] \tag{8.3}$$
where the harmonic amplitudes, $A_l$, are obtained from the spectral envelope
and $\theta_l$ are the harmonic phases. McAulay simplified equation (8.2) for
reduced computational complexity, and the simplified $\delta$ is given by
$$\delta = \frac{\sum_{l=1}^{L} A_l^2}{\sum_{l=1}^{L} A_l^2 - 2N\rho(\omega_0)} \tag{8.4}$$
where $A_l$ are the harmonic-frequency spectral amplitudes of the original
signal, as shown below:
$$s(n) = \sum_{l=1}^{L} A_l \exp\left[ j (n \omega_l + \theta_l) \right] \tag{8.5}$$
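The direct SNR measure of equations (8.2) and (8.3) can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the coder's implementation: the function names `synthesize_harmonic` and `snr` are hypothetical, the amplitudes and phases are supplied by hand rather than taken from a spectral envelope, and squared magnitudes are used in place of plain squares so that the complex exponentials of (8.3) can be compared directly.

```python
import cmath
import math

def synthesize_harmonic(amps, phases, w0, n_samples):
    """Eq. (8.3): sum of complex exponentials at harmonics l*w0, l = 1..K."""
    return [
        sum(a * cmath.exp(1j * (n * l * w0 + th))
            for l, (a, th) in enumerate(zip(amps, phases), start=1))
        for n in range(n_samples)
    ]

def snr(s, s_hat):
    """Eq. (8.2): signal energy over error energy (a plain ratio, not dB)."""
    # Assumption: |x|^2 replaces x^2 so complex-valued signals are handled.
    signal_energy = sum(abs(x) ** 2 for x in s)
    error_energy = sum(abs(x - y) ** 2 for x, y in zip(s, s_hat))
    return math.inf if error_energy == 0.0 else signal_energy / error_energy

# Illustrative values only: a 160-sample frame with a 40-sample pitch period.
N = 160
w0 = 2 * math.pi / 40
phases = [0.0, 0.3]
s = synthesize_harmonic([1.0, 0.5], phases, w0, N)

# A candidate at the correct pitch with a small amplitude error fits well.
delta_matched = snr(s, synthesize_harmonic([0.9, 0.5], phases, w0, N))
# A candidate at a 10% higher pitch fits poorly.
delta_mismatched = snr(s, synthesize_harmonic([1.0, 0.5], phases, 1.1 * w0, N))
print(delta_matched, delta_mismatched)
```

A large value of $\delta$ indicates that the harmonic model fits the frame well, while a small value indicates a poor fit. The double loop over samples and harmonics in this direct form is the cost that the simplified expression (8.4) is meant to reduce.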