Table 7 reports the results of the sound quality test as mean opinion scores (MOS)
for men and women. The output sampling rate of each version is 16 kHz. TTS 1, 2,
4 and 5 are versions of the engine developed for embedded environments.
TTS 1 is 32 MB in size with a man's voice, and TTS 2 is 64 MB, also with a
man's voice. TTS 4 is 32 MB with a woman's voice, and TTS 5 is 64 MB with a
woman's voice. TTS 3 is a 32 MB engine with a woman's voice developed by a
benchmark developer. Finally, we applied a 16 kHz, 40 MB DB (TTS 6) with a woman's
voice, obtained by compressing the 16 kHz, 64 MB DB. Even though the sampling rate is
reduced and 8 MB more memory is required, its sound quality is much better than that
of the 16 kHz, 32 MB DB. This shows that the sound quality of a TTS system depends on
its TTS database.
Table 7. MOS (Mean Opinion Score) test for TTS

        TTS1      TTS2      TTS3      TTS4      TTS5      TTS6
        (32 MB)   (64 MB)   (32 MB)   (32 MB)   (64 MB)   (40 MB)
Men     2.93125   3.41675   4.00625   3.65625   4.01875   3.95421
Women   2.66875   3.04375   3.25      3.0625    3.44375   3.3478
Avg.    2.8       3.23025   3.628125  3.359375  3.73125   3.651005
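The Avg. row of Table 7 appears to be the unweighted mean of the Men and Women rows. The short Python sketch below recomputes it from the table values as a consistency check. The variable and function names are illustrative only, and the assumption that both groups carry equal weight is ours, not stated in the text.

```python
# MOS values transcribed from Table 7, in column order TTS1..TTS6.
ENGINES = ["TTS1", "TTS2", "TTS3", "TTS4", "TTS5", "TTS6"]
MEN = [2.93125, 3.41675, 4.00625, 3.65625, 4.01875, 3.95421]
WOMEN = [2.66875, 3.04375, 3.25, 3.0625, 3.44375, 3.3478]


def row_average(first, second):
    """Unweighted mean of two score rows (assumes equal weight per group)."""
    return [(a + b) / 2 for a, b in zip(first, second)]


AVERAGES = row_average(MEN, WOMEN)

if __name__ == "__main__":
    for name, score in zip(ENGINES, AVERAGES):
        print(f"{name}: {score:.6f}")
```

Under that assumption the recomputed values agree with the Avg. row, including the comparison drawn in the text between the 40 MB TTS6 and the 32 MB TTS4.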
The respective modules are integrated according to the behavior of the speech interactive
agent. A speech interactive agent is not the best solution for a human-machine interface
if it is not easy to use. Thus, to provide an efficient tool for speech interaction
and to improve the speech recognition rate, usability issues are considered.
These issues include a start button to signal speech recognition, an undo function, command
modes, a verification function, out-of-vocabulary rejection, speech guidance and so on. The
following are implemented in our proposed system. Speech recognition is started by
pushing the external push-to-talk (PTT) button. Speech recognition is disabled
automatically if the user does not speak for 3 seconds after pushing the PTT button.
The verification function uses TTS to announce the recognition result. The undo function
returns the system to the previous state when the user pushes the PTT button again
within 1 second after a failed recognition. Commands are classified into global and
local commands, and the user can choose between two command modes, expand mode and
local mode. Expand mode includes both global and local commands, while local mode
includes only local commands. Out-of-vocabulary (OOV) rejection rejects a word that is
not in the given recognition list. Lastly, speech guidance uses TTS to give the user
guideline information for easy use. Under these conditions, many people used the system
for some time. The most frequently used applications were, in order, road navigation,
MP3 player, radio, and TV.
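The usability rules described above can be sketched as a small state machine. The following Python example is a minimal illustration, not the authors' implementation. The 3 second silence timeout and the 1 second undo window come from the text. The class name SpeechAgent, the method names, the example vocabularies, and the use of a list in place of real TTS output are all hypothetical. It also assumes that a second PTT push within the undo window after an announced result is how the user signals a failed recognition.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

SILENCE_TIMEOUT_S = 3.0  # from the text: disable recognition after 3 s of silence
UNDO_WINDOW_S = 1.0      # from the text: second PTT push within 1 s means undo

# Hypothetical vocabularies, for illustration only.
GLOBAL_COMMANDS = {"navigation", "mp3 player", "radio", "tv"}
LOCAL_COMMANDS = {"play", "stop", "next"}


def active_vocabulary(mode: str) -> Set[str]:
    """Expand mode covers global and local commands; local mode only local ones."""
    if mode == "expand":
        return GLOBAL_COMMANDS | LOCAL_COMMANDS
    if mode == "local":
        return set(LOCAL_COMMANDS)
    raise ValueError(f"unknown command mode: {mode}")


@dataclass
class SpeechAgent:
    mode: str = "expand"
    state: str = "idle"
    listening: bool = False
    previous_state: Optional[str] = None
    ptt_time: Optional[float] = None
    result_time: Optional[float] = None
    spoken: List[str] = field(default_factory=list)  # stand-in for TTS output

    def push_ptt(self, now: float) -> None:
        """Start listening, or undo if pushed soon after a recognition result."""
        if self.result_time is not None and now - self.result_time <= UNDO_WINDOW_S:
            self.state = self.previous_state or "idle"
            self.result_time = None
            self.spoken.append("undone")
            return
        self.result_time = None
        self.listening = True
        self.ptt_time = now

    def tick(self, now: float) -> None:
        """Disable recognition once the silence timeout has elapsed."""
        if self.listening and now - self.ptt_time >= SILENCE_TIMEOUT_S:
            self.listening = False
            self.spoken.append("recognition off")

    def hear(self, word: str, now: float) -> bool:
        """Handle one recognized word; return True if it was accepted."""
        self.tick(now)
        if not self.listening:
            return False
        self.listening = False
        if word not in active_vocabulary(self.mode):
            self.spoken.append("command not recognized")  # OOV rejection
            return False
        self.previous_state = self.state
        self.state = word
        self.result_time = now
        self.spoken.append(f"recognized: {word}")  # verification through TTS
        return True
```

Passing timestamps in explicitly, rather than reading a clock inside the class, keeps the timing rules easy to exercise without waiting in real time.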
5 Discussions and Conclusions
5.1 Discussions
As the quality of automatic speech recognition (ASR) and text-to-speech (TTS)
steadily improves, a variety of multimedia application services using embedded ASR