Digital Signal Processing Reference
Figure 6.32 The relationship between the energy levels used in determining voiced and unvoiced speech: (a) local average energy and average energy; (b) local maximum, local average and local minimum energy. Horizontal axis: number of frames.
every speech frame according to [28],
$$E_{av}(n) = 0.7\,E_{av}(n-1) + 0.3\,E_0 \tag{6.64}$$

$$E_{max}(n) = \begin{cases} 0.5\,E_{max}(n-1) + 0.5\,E_0; & \text{if } E_0 > E_{max}(n-1) \\ 0.99\,E_{max}(n-1) + 0.01\,E_0; & \text{otherwise} \end{cases} \tag{6.65}$$

$$E_{min}(n) = \begin{cases} 0.5\,E_{min}(n-1) + 0.5\,E_0; & \text{if } E_0 \le E_{min}(n-1) \\ 0.975\,E_{min}(n-1) + 0.025\,E_0; & \text{if } E_{min}(n-1) < E_0 < 2\,E_{min}(n-1) \\ 1.025\,E_{min}(n-1); & \text{otherwise} \end{cases} \tag{6.66}$$
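The per-frame updates in (6.64) to (6.66) can be sketched in Python as below. This is only an illustrative sketch: the class name `EnergyTracker`, its field names and the initial values in the usage note are assumptions made for the example, and the text does not say how the trackers are initialized. The comparison in the first branch of (6.66) is taken to be "less than or equal to".

```python
from dataclasses import dataclass


@dataclass
class EnergyTracker:
    """Hypothetical container for the three running energy levels."""
    e_av: float   # average energy, E_av
    e_max: float  # local maximum energy, E_max
    e_min: float  # local minimum energy, E_min

    def update(self, e0: float) -> None:
        """Update all three levels from the current frame energy E_0."""
        # (6.64): fixed-coefficient smoothing of the average energy.
        self.e_av = 0.7 * self.e_av + 0.3 * e0

        # (6.65): fast rise when the frame exceeds the maximum,
        # slow decay towards the frame energy otherwise.
        if e0 > self.e_max:
            self.e_max = 0.5 * self.e_max + 0.5 * e0
        else:
            self.e_max = 0.99 * self.e_max + 0.01 * e0

        # (6.66): fast fall when the frame is at or below the minimum,
        # slow tracking when it is within a factor of two of it,
        # and a slow upward drift otherwise.
        if e0 <= self.e_min:
            self.e_min = 0.5 * self.e_min + 0.5 * e0
        elif e0 < 2.0 * self.e_min:
            self.e_min = 0.975 * self.e_min + 0.025 * e0
        else:
            self.e_min = 1.025 * self.e_min
```

For example, starting from the assumed values `EnergyTracker(e_av=10.0, e_max=20.0, e_min=5.0)` and calling `update(30.0)` moves the average towards the new frame, pulls the maximum halfway up to it, and lets the minimum drift up by 2.5%, since the frame energy is more than twice the previous minimum.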
Relative variations of these energy levels are illustrated in Figure 6.32. The
voicing decision for each band is made by comparing the normalized error
for the band with the value of the threshold function which is computed
using the above procedure. If the normalized error is less than the threshold
function, the corresponding frequency band is declared voiced; otherwise,
the frequency band is declared unvoiced. The variations of the threshold and
the corresponding error function are shown in Figure 6.33.
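The band-wise decision rule described above amounts to a strict comparison per band. The following minimal sketch assumes the normalized errors and the threshold values have already been computed and are passed in as equal-length sequences. The function name `voicing_decisions` is hypothetical, and the threshold function itself is not reproduced here.

```python
from typing import List, Sequence


def voicing_decisions(errors: Sequence[float],
                      thresholds: Sequence[float]) -> List[bool]:
    """Return True (voiced) or False (unvoiced) for each frequency band.

    A band is declared voiced only if its normalized error is strictly
    less than the threshold for that band.
    """
    if len(errors) != len(thresholds):
        raise ValueError("errors and thresholds must have the same length")
    return [err < thr for err, thr in zip(errors, thresholds)]
```

A band whose error equals its threshold is treated as unvoiced, following the "less than" wording of the rule.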