Digital Signal Processing Reference
In-Depth Information
6 Experiments and Analysis
After discarding the sentences with recording problems from the 1025 collected
sentences, the remaining 1000 sentences were manually tagged and used as the
experimental data. For these sentences we prepared the mono-phone list lab files,
the time-tagged lab files, and the context-sensitive tagged lab files. With the data
prepared, we trained a mono-phone HMM model and a context-sensitive HMM model
and performed automatic segmentation with each of the two models.
We carried out the segmentation using 900 sentences as training data and the
remaining 100 sentences as test data. Finally, we compared the results of the two
kinds of automatic segmentation with the manual segmentation, comparing the time
span of every phoneme to determine the error range.
6.1 Performance Evaluation Method of Automatic Segmentation
To evaluate automatic segmentation performance, the manually tagged boundary is
taken as the correct boundary. When the time deviation between an automatically
segmented boundary point and the corresponding correct boundary does not exceed a
given threshold, the segmented boundary is counted as a right boundary; otherwise
it is counted as a wrong boundary. This threshold is called the fault-tolerant
threshold.
The segmentation accuracy is the percentage of correctly and automatically
segmented boundaries out of the total number of segmented boundaries. In speech
synthesis, the fault-tolerant threshold for automatic segmentation is usually
20 ms.
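The evaluation rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' tool: the function name `segmentation_accuracy`, the boundary lists in milliseconds, and the choice to treat a deviation exactly equal to the threshold as correct are all assumptions made for the example.

```python
def segmentation_accuracy(manual_ms, auto_ms, threshold_ms=20.0):
    """Fraction of automatic boundaries within threshold_ms of the manual ones.

    manual_ms and auto_ms are equal-length sequences of boundary times in
    milliseconds, paired by position (hypothetical input format).
    A deviation equal to the threshold is counted as correct (assumption).
    """
    if len(manual_ms) != len(auto_ms):
        raise ValueError("boundary lists must have the same length")
    if not manual_ms:
        raise ValueError("at least one boundary is required")
    correct = sum(
        1 for m, a in zip(manual_ms, auto_ms) if abs(a - m) <= threshold_ms
    )
    return correct / len(manual_ms)


# Made-up boundary times, for illustration only.
manual = [100.0, 250.0, 400.0, 620.0]
auto = [105.0, 275.0, 398.0, 640.0]
print(segmentation_accuracy(manual, auto, threshold_ms=20.0))
```

Calling the function with several thresholds (for example 5, 10, 20, 30 and 40 ms) gives one accuracy figure per fault-tolerant threshold.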
6.2 Mono-Phones Based HMM Automatic Segmentation
The 100 test sentences contain 6955 phonemes. After computing the fundamental
frequency and spectral parameters of these 100 sentences, we segmented each
sentence using its mono-phone list (one phoneme per line) and the trained
mono-phone HMM. The automatic and manual segmentation results were then compared
and analyzed. Table 2 shows the statistics of the phoneme timing error for each
error range.
Table 2. Automatic segmentation accuracy of mono-phone based HMM model
Error range (ms)    Number of phonemes    Proportion
5                   3225                  46.37%
10                  4250                  61.11%
20                  5102                  73.36%
30                  5581                  80.24%
40                  5877                  84.50%
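As a quick arithmetic check, the proportions in Table 2 follow from dividing each phoneme count by the 6955 phonemes in the test set. The sketch below recomputes them; the names `TOTAL_PHONEMES`, `COUNTS` and `proportions` are illustrative, and the numbers are copied from the table.

```python
TOTAL_PHONEMES = 6955  # phonemes in the 100 test sentences

# Error range in ms -> number of phonemes, as listed in Table 2.
COUNTS = {5: 3225, 10: 4250, 20: 5102, 30: 5581, 40: 5877}


def proportions(counts, total):
    """Return each count as a percentage of total, rounded to 2 decimals."""
    return {rng: round(100.0 * n / total, 2) for rng, n in counts.items()}


for rng, pct in proportions(COUNTS, TOTAL_PHONEMES).items():
    print(f"{rng:>2} ms: {pct:.2f}%")
```

At the 20 ms threshold usual in speech synthesis, this gives 5102 / 6955, or about 73.36%.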