Table 4. Word accuracy of various noise reduction methods (%)

Channel \ Method   Baseline   SS      HP+GJ   HP+DS   HP+E-EVD
Ch 3 + Ch 5        38.32      57.66   80.66   87.23   89.14
Ch 4 + Ch 7        83.94      85.77   88.50   90.51   91.88

Table 5. PESQ score of various noise reduction methods

Channel \ Method   Baseline   SS      HP+GJ   HP+DS   HP+E-EVD
Ch 3 + Ch 5        2.68       2.74    2.74    2.93    2.91
Ch 4 + Ch 7        3.23       3.23    3.19    3.17    3.28

Table 6. Driving tests on a real car

           Office   Low-speed   High-speed   Average   Car
Off-line   99.69%   94.44%      92.10%       -         Avante (1800 cc)
Men        -        95.4%       96%          95.7%     EF Sonata, SM5 (2000 cc)
Women      -        92.5%       93.42%       92.96%    EF Sonata, SM5 (2000 cc)
Average    -        93.95%      94.71%       94.33%    EF Sonata, SM5 (2000 cc)

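The Average row and the Average column of Table 6 are consistent with unweighted means: the row averages the Men and Women rows, and the column averages the low-speed and high-speed results. A small check, with the values copied from the table (the variable and function names are ours, not the authors'):

```python
# Recognition rates (%) copied from Table 6.
men = {"low": 95.4, "high": 96.0}
women = {"low": 92.5, "high": 93.42}


def mean2(a, b):
    """Unweighted mean of two percentages, rounded to two decimals."""
    return round((a + b) / 2, 2)


# Average column: mean of low-speed and high-speed for each group.
men_avg = mean2(men["low"], men["high"])
women_avg = mean2(women["low"], women["high"])

# Average row: mean of the Men and Women rows for each condition.
avg_low = mean2(men["low"], women["low"])
avg_high = mean2(men["high"], women["high"])
avg_all = mean2(men_avg, women_avg)

print(men_avg, women_avg, avg_low, avg_high, avg_all)
```

The five computed values match the 95.7, 92.96, 93.95, 94.71 and 94.33 reported in the table.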
Next, a speech recognition experiment was performed. The speech recognition function of the speech interactive agent is divided into embedded ASR and a DSR front-end. The embedded ASR can recognize more than 5,000 words in total. However, a tree-based dynamic word recognition approach is applied according to the operational scenario of each multimedia service application. The DSR vocabulary is about 10,000 words for each city. The embedded speech recognition engine was developed and optimized for car noise, and we implemented DSR using a third-party DSR Software Development Kit (SDK).
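The text does not describe how the tree-based dynamic vocabulary is organized. One plausible reading is that scenarios form a tree and only the words attached to the current scenario node are active at any time. The sketch below illustrates that idea only; the class, the function, the scenario names and the word lists are all hypothetical and are not taken from the system described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScenarioNode:
    """One node of a hypothetical scenario tree (illustrative names only)."""
    name: str
    words: List[str] = field(default_factory=list)
    children: Dict[str, "ScenarioNode"] = field(default_factory=dict)

    def add_child(self, node: "ScenarioNode") -> "ScenarioNode":
        self.children[node.name] = node
        return node


def active_vocabulary(root: ScenarioNode, path: List[str]) -> List[str]:
    """Follow `path` down from the root and return that node's word list."""
    node = root
    for name in path:
        node = node.children[name]
    return node.words


# Hypothetical scenario tree: the recognizer would load only the active list.
root = ScenarioNode("top", ["navigation", "music", "phone"])
nav = root.add_child(ScenarioNode("navigation", ["destination", "cancel"]))
nav.add_child(ScenarioNode("destination", ["home", "office", "airport"]))

print(active_vocabulary(root, []))
print(active_vocabulary(root, ["navigation", "destination"]))
```

Under this assumption the recognizer searches only a small word list per dialogue state, even though the total vocabulary is much larger.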
For embedded ASR, an isolated word recognizer with dynamic vocabularies is applied to reduce computing time and memory size [15][22]. Speech signals are analyzed in 125 ms frames with 10 ms overlap into a 26th-order feature vector consisting of 13th-order MFCCs, including log energy, and their first derivatives. To cope with car noise, we applied a feature compensation scheme based on multivariate Gaussian-based cepstral normalization (RATZ) [18], together with a tied-mixture hidden Markov model (HMM) [7][17]. The driving test was performed in a real car, as summarized in Table 6. Low-speed driving was between 20 and 60 km/h, and high-speed driving was between 70 and 110 km/h. A total of 40 men and women were tested in a Hyundai EF Sonata and a Samsung SM5. The number of recognizable words was 100 for each given scenario.
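The 26th-order feature vector is the 13 static coefficients followed by their 13 first derivatives. The text does not say how the derivatives are computed; the sketch below assumes the common regression formula over a window of two frames on each side, with the edge frames repeated. The function names and the toy input are ours.

```python
def delta(frames, window=2):
    """First-order regression deltas for a list of equal-length static vectors.

    Assumed formula: d_t = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2),
    for n = 1..window, with indices clamped at the first and last frame.
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                nxt = frames[min(t + n, num_frames - 1)][k]  # clamp at the end
                prv = frames[max(t - n, 0)][k]               # clamp at the start
                acc += n * (nxt - prv)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def append_deltas(frames, window=2):
    """Concatenate each static vector with its delta vector."""
    return [s + d for s, d in zip(frames, delta(frames, window))]


# Toy input: 5 frames of 13 static coefficients, each rising by 1 per frame.
static = [[float(t + k) for k in range(13)] for t in range(5)]
features = append_deltas(static)

print(len(features[0]))  # dimension of one feature vector
print(features[2][13])   # first delta coefficient of the middle frame
```

For this linear ramp, each feature vector has 26 entries and the delta at the middle frame equals the slope of 1.0; the edge frames give smaller values because of the clamping.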
Finally, a text-to-speech experiment was performed. The speech interactive agent has two TTS child processes: one serves the CNS, and the other serves the application services. For speed on the embedded system, the execution time and code size of the TTS are also optimized. However, fetching a specific triphone waveform from storage takes considerable time, and this access time depends on the flash memory used.