Digital Signal Processing Reference
In-Depth Information
[Figure 9.20 plot: panels (a) and (b). Labels recoverable from the plot: voiced, unvoiced, s(n). Horizontal axis: samples, 0 to 5000.]
Figure 9.20 (a) Zero-crossing rate and (b) Voicing decision of the initial classification
[Figure 9.21 plot: curves labelled low-band to high-band energy ratio, tracked energy, and zero-crossing rate, shown with the voiced/unvoiced decision and the speech signal s(n). Horizontal axis: samples, 0 to 4000.]
Figure 9.21 Voicing metrics of the initial classification
signal is shifted down for clarity. Figure 9.20b depicts the voicing decision
made by the initial classification. Figure 9.21 depicts the three metrics used
and the final voicing decision over the same speech segment.
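
As a rough illustration of how three such metrics could be computed per frame, the following Python sketch may help. It is not the book's implementation. The two-tap band split, the exponential smoothing used for the "tracked" energy, the decision thresholds, and the names frame_metrics and classify_frame are all assumptions made for this example.

```python
import math

def frame_metrics(frame, prev_tracked=0.0, alpha=0.9):
    """Return (zcr, band_ratio, tracked_energy) for one frame of samples."""
    n = len(frame)
    pairs = list(zip(frame, frame[1:]))
    # Zero-crossing rate: fraction of adjacent sample pairs whose signs differ.
    crossings = sum(1 for a, b in pairs if (a >= 0) != (b >= 0))
    zcr = crossings / (n - 1)
    # Crude band split (assumption): two-tap average as the low band,
    # two-tap difference as the high band.
    e_low = sum(((a + b) / 2) ** 2 for a, b in pairs)
    e_high = sum(((b - a) / 2) ** 2 for a, b in pairs)
    band_ratio = e_low / (e_high + 1e-12)
    # Tracked energy (assumption): exponentially smoothed mean frame energy.
    energy = sum(x * x for x in frame) / n
    tracked = alpha * prev_tracked + (1 - alpha) * energy
    return zcr, band_ratio, tracked

def classify_frame(zcr, band_ratio, tracked,
                   zcr_max=0.3, ratio_min=1.0, energy_min=1e-4):
    """Hypothetical decision rule: voiced if all three metrics agree."""
    return zcr < zcr_max and band_ratio > ratio_min and tracked > energy_min

# A 100 Hz tone sampled at 8 kHz behaves like a voiced frame in this sketch.
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(160)]
print(classify_frame(*frame_metrics(tone)))
```

A low-frequency, high-energy frame yields a low zero-crossing rate and a large band ratio, so it is labelled voiced. A frame dominated by high-frequency content fails the first two tests.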
Although plosives have a significant amount of energy at high
frequencies and a high zero-crossing rate, synthesizing their high-energy
spikes with ACELP instead of noise excitation improves speech quality.
The plosives, which the initial classification labels as unvoiced,
therefore need to be detected and switched to ACELP mode.
A typical plosive appears at the beginning of the speech segment in
Figure 9.20b.
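
The text above does not say how the plosives are detected. One simple possibility, shown purely as an assumption, is to look for a sharp energy jump between consecutive sub-blocks of a frame that the initial classification has already marked unvoiced. The function names, the sub-block length, and the jump threshold below are hypothetical.

```python
def is_plosive(frame, sub_len=20, jump_ratio=8.0, floor=1e-9):
    """Hypothetical test: does any sub-block energy jump sharply
    above the energy of the sub-block just before it?"""
    energies = [sum(x * x for x in frame[i:i + sub_len])
                for i in range(0, len(frame) - sub_len + 1, sub_len)]
    for prev, cur in zip(energies, energies[1:]):
        if cur > jump_ratio * (prev + floor):
            return True
    return False

def unvoiced_mode(frame):
    """Choose the synthesis mode for a frame already classified as unvoiced."""
    return "ACELP" if is_plosive(frame) else "NOISE"

# Near-silence followed by a burst, loosely imitating a plosive onset.
burst = [0.001 * (-1) ** n for n in range(80)] + [0.5 * (-1) ** n for n in range(80)]
print(unvoiced_mode(burst))
```

Steady noise-like frames keep noise excitation under this rule, while a frame containing an abrupt energy spike is routed to ACELP.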