Digital Signal Processing Reference
band excitation (MBE) coders used a constant threshold for all the bands.
However, the most recent versions use several heuristic rules to obtain
better performance [7]. For example, the threshold function is decreased as
the frequency increases, if the same band of the previous frame was unvoiced,
if the high-frequency energy exceeds the low-frequency energy, and if the
speech energy approaches the energy of the background noise.
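The four rules listed above could be sketched as a per-band threshold function. This is only an illustration: the function name, the base value, the linear frequency taper, the 0.8 scale factors and the factor-of-two test for "approaches the background noise" are all hypothetical placeholders, not values taken from [7].

```python
def adaptive_threshold(band_freq_hz, prev_band_unvoiced, e_high, e_low,
                       e_speech, e_noise, base=0.5, max_freq_hz=4000.0):
    """Return a voicing threshold for one band.

    Every constant in this function is an assumed placeholder.
    """
    threshold = base
    # Rule 1: decrease the threshold as frequency increases (assumed linear taper).
    threshold *= 1.0 - 0.5 * min(band_freq_hz / max_freq_hz, 1.0)
    # Rule 2: the same band of the previous frame was unvoiced.
    if prev_band_unvoiced:
        threshold *= 0.8
    # Rule 3: high-frequency energy exceeds low-frequency energy.
    if e_high > e_low:
        threshold *= 0.8
    # Rule 4: speech energy approaches the background-noise energy
    # (assumed here to mean "within a factor of two").
    if e_speech < 2.0 * e_noise:
        threshold *= 0.8
    return threshold


# Low band in clean speech versus a high band in a noisy, previously unvoiced frame.
print(adaptive_threshold(500.0, False, 1.0, 4.0, 100.0, 1.0))
print(adaptive_threshold(3000.0, True, 4.0, 1.0, 1.5, 1.0))
```

Multiplicative scaling is used so that the rules compose without driving the threshold negative; an additive scheme would be an equally valid reading of the text.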
Sinusoidal Model Approach
McAulay et al. proposed a different voicing determination technique for their
sinusoidal transform coder (STC) [2]. The speech spectrum is divided into
two bands, determined by a voicing transition frequency above which the
spectrum is declared unvoiced. This method estimates the similarity between
the harmonically-synthesized signal, $\hat{s}(n, \omega_0)$, and the original
speech signal, $s(n)$. The signal-to-noise ratio (SNR), $\delta$, between
$s(n)$ and $\hat{s}(n, \omega_0)$ is given by
$$\delta = \frac{\sum_{n=0}^{N-1} s^2(n)}{\sum_{n=0}^{N-1} \left( s(n) - \hat{s}(n, \omega_0) \right)^2} \tag{8.2}$$
where $N$ is the analysis frame length and $\hat{s}(n, \omega_0)$ is given by
$$\hat{s}(n, \omega_0) = \sum_{l=1}^{K(\omega_0)} A_l \exp\left( j n l \omega_0 + j \theta_l \right) \tag{8.3}$$
where the harmonic amplitudes, $A_l$, are obtained from the spectral envelope
and $\theta_l$ are the harmonic phases. McAulay simplified equation (8.2) for
reduced computational complexity, and the simplified $\delta$ is given by
$$\delta = \frac{\sum_{l=1}^{L} A_l^2}{\sum_{l=1}^{L} A_l^2 - 2 N \rho(\omega_0)} \tag{8.4}$$
where $A_l$ are the harmonic-frequency spectral amplitudes of the original
signal, as shown below:
$$s(n) = \sum_{l=1}^{L} A_l \exp\left( j n \omega_l + j \phi_l \right) \tag{8.5}$$
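As a rough illustration of equations (8.2) and (8.3), the sketch below synthesizes a harmonic signal for a candidate fundamental frequency and measures its SNR against a frame. It is a minimal sketch under stated assumptions, not McAulay's implementation: the function names and the test frame are hypothetical, the real part of the complex sum is taken so that it can be compared with a real-valued frame, and the simplified form (8.4) is not covered because $\rho(\omega_0)$ is not defined in this section.

```python
import cmath
import math


def harmonic_synthesis(amps, phases, w0, n_samples):
    """Eq. (8.3): sum harmonics l = 1..K at multiples of w0 (rad/sample).

    Assumption: the real part is returned so the result can be compared
    with a real-valued speech frame.
    """
    return [
        sum(a * cmath.exp(1j * (n * l * w0 + theta))
            for l, (a, theta) in enumerate(zip(amps, phases), start=1)).real
        for n in range(n_samples)
    ]


def voicing_snr(s, s_hat):
    """Eq. (8.2): frame energy divided by the energy of the modelling error."""
    signal_energy = sum(x * x for x in s)
    error_energy = sum((x - y) ** 2 for x, y in zip(s, s_hat))
    return math.inf if error_energy == 0.0 else signal_energy / error_energy


# Hypothetical test frame: two harmonics of a 200 Hz pitch sampled at 8 kHz.
N = 160
w0 = 2 * math.pi * 200 / 8000
amps, phases = [1.0, 0.5], [0.3, 1.0]
frame = harmonic_synthesis(amps, phases, w0, N)

# The correct fundamental reproduces the frame; a 20% pitch error does not.
snr_match = voicing_snr(frame, harmonic_synthesis(amps, phases, w0, N))
snr_mismatch = voicing_snr(frame, harmonic_synthesis(amps, phases, 1.2 * w0, N))
print(snr_match > snr_mismatch)
```

A high SNR indicates that the harmonic model fits the frame well, which is the similarity measure the voicing decision is based on. How the SNR is mapped to a voicing transition frequency is not specified here and is left out of the sketch.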