Digital Signal Processing Reference
8.3.1 Voicing Determination
There are many ways of performing the voicing classification of speech,
which was discussed in Chapter 6, but here we briefly summarize two
common techniques.
Multi-Band Approach
Harmonic voicing is estimated by computing the normalized mean squared
error of a synthetic voiced spectrum, $S_w(\omega, \omega_0)$, with respect
to the speech spectrum, $S_w(\omega)$, and comparing it against a threshold
function for each harmonic band [6]. The normalized mean squared error,
$D_k$, of the $k$th harmonic band is given by

$$
D_k = \frac{\displaystyle\int_{(k-0.5)\omega_0}^{(k+0.5)\omega_0} \left| S_w(\omega) - S_w(\omega, \omega_0) \right|^2 \, d\omega}{\displaystyle\int_{(k-0.5)\omega_0}^{(k+0.5)\omega_0} \left| S_w(\omega) \right|^2 \, d\omega}
\quad \text{for } k = 1, 2, \ldots, K \qquad (8.1)
$$

where $K = \pi/\omega_0$ and $\omega_0$ is the normalized fundamental frequency.
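As a rough illustration of the band-wise error in equation (8.1), the following Python sketch approximates each integral by a sum over uniformly spaced spectral bins. It makes several assumptions that are not in the text: the names `band_errors` and `voiced_bands` are hypothetical, the spectra are taken to be sampled on [0, π), K is rounded down to an integer, an empty or zero-energy band is treated as a total mismatch, and the threshold is passed in by the caller because the threshold function itself is not specified here.

```python
import numpy as np

def band_errors(S, S_syn, w0):
    """Approximate D_k of eq. (8.1) for k = 1..K.

    S, S_syn : speech and synthetic spectra sampled uniformly on [0, pi)
               (assumed layout; real or complex values are accepted).
    w0       : normalized fundamental frequency in radians per sample.
    """
    S = np.asarray(S)
    S_syn = np.asarray(S_syn)
    n = S.size
    w = np.arange(n) * np.pi / n       # frequency of each bin
    K = int(np.pi / w0)                # number of harmonic bands (rounded down)
    D = np.empty(K)
    for k in range(1, K + 1):
        # Bins inside the k-th band, [(k - 0.5) w0, (k + 0.5) w0)
        band = (w >= (k - 0.5) * w0) & (w < (k + 0.5) * w0)
        num = np.sum(np.abs(S[band] - S_syn[band]) ** 2)
        den = np.sum(np.abs(S[band]) ** 2)
        # Assumption: a band with no energy counts as a total mismatch
        D[k - 1] = num / den if den > 0 else 1.0
    return D

def voiced_bands(D, threshold):
    """Mark a band voiced when its error lies below the threshold.

    threshold may be a scalar or one value per band; its actual shape
    as a function of frequency is not given in the text.
    """
    return np.asarray(D) < np.asarray(threshold)
```

Because the bin width is constant, it cancels between the numerator and denominator sums, so no explicit scaling by the frequency step is needed. A perfect spectral match gives an error of 0 in every band, while an all-zero synthetic spectrum gives an error of 1.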
Figure 8.3 illustrates the $D_k$ values of two speech spectra with the
corresponding synthetic spectra. If $D_k$ is below the threshold function,
i.e. a small error and a good spectral match, the $k$th band is declared
voiced. The initial multi-
Figure 8.3 Two speech spectra: (a) original spectrum $S_w(\omega)$, (b) synthetic spectrum $S_w(\omega, \omega_0)$, and (c) normalized $D_k$. Horizontal axis of both panels: frequency (Hz), 0 to 4000.