Decision Theoretic Fusion Framework for Actionability Using Data Mining on an Embedded System - Data Mining: Theory, Methodology, Techniques, and Applications

The sound quality test is evaluated in Table 7 for men and women in terms of mean opinion score (MOS). The output sampling rate of each version is 16 kHz. TTS 1, 2, 4, and 5 are versions of the engine developed for embedded environments. TTS 1 is 32M in size with a man's voice, and TTS 2 is 64M in size, also with a man's voice. TTS 4 is 32M with a woman's voice, and TTS 5 is 64M with a woman's voice. TTS 3 is 32M with a woman's voice and was developed by a benchmark developer. Finally, we applied a 16 kHz, 40M DB (TTS 6) with a woman's voice, obtained by compressing the 16 kHz, 64M DB. Even though the sampling rate is reduced and the memory required is more than 8 MB, the sound quality is much better than that of the 16 kHz, 32M DB. This shows that the sound quality of a TTS system depends on its TTS database.

Table 7. MOS (Mean Opinion Score) test for TTS

        TTS1 (32M)  TTS2 (64M)  TTS3 (32M)  TTS4 (32M)  TTS5 (64M)  TTS6 (40M)
Men     2.93125     3.41675     4.00625     3.65625     4.01875     3.95421
Women   2.66875     3.04375     3.25        3.0625      3.44375     3.3478
Avg.    2.8         3.23025     3.628125    3.359375    3.73125     3.651005
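
The Avg. row of Table 7 is consistent with an unweighted mean of the Men and Women rows. As a minimal sketch, assuming that is how the averages were obtained (the source does not state the formula), the following Python snippet recomputes them and ranks the engines. The names `MOS`, `average_mos`, and `rank_engines` are illustrative and do not come from the original system.

```python
# MOS values copied from Table 7 (row -> engine -> score).
MOS = {
    "Men": {"TTS1": 2.93125, "TTS2": 3.41675, "TTS3": 4.00625,
            "TTS4": 3.65625, "TTS5": 4.01875, "TTS6": 3.95421},
    "Women": {"TTS1": 2.66875, "TTS2": 3.04375, "TTS3": 3.25,
              "TTS4": 3.0625, "TTS5": 3.44375, "TTS6": 3.3478},
}


def average_mos(scores):
    """Return the unweighted mean of the Men and Women rows per engine.

    Assumption: the Avg. row is a plain mean of the two rows.
    """
    return {
        engine: (scores["Men"][engine] + scores["Women"][engine]) / 2
        for engine in scores["Men"]
    }


def rank_engines(avg):
    """Return engine names sorted from highest to lowest average MOS."""
    return sorted(avg, key=avg.get, reverse=True)


avg = average_mos(MOS)
ranking = rank_engines(avg)

for engine in ranking:
    print(f"{engine}: {avg[engine]:.6f}")
```

Under this assumption the compressed 40M database (TTS6) averages about 3.651, above the 32M woman's-voice engine TTS4 at about 3.359, which matches the comparison made in the text.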

The respective modules are integrated based on the behavior of the speech interactive agent. A speech interactive agent is not the best solution for a human-machine interface if it is not easy to use. Thus, to provide an efficient tool for speech interaction and to improve the speech recognition rate, usability issues are considered. These issues include a start button to signal speech recognition, an undo function, a command mode, a verification function, out-of-vocabulary rejection, speech guidance, and so on. In our proposed system, the following considerations are implemented. Speech recognition is started by pushing the external push-to-talk (PTT) button. Speech recognition is disabled automatically if the user does not speak any word for 3 seconds after pushing the PTT button. The verification function uses TTS to announce the recognition result. The undo function returns the system to the previous state when the user pushes the PTT button again within 1 second after a failed recognition. Commands are classified as global or local. The user can choose between two command modes: expand mode and local mode. Expand mode includes both global and local commands, and local mode includes only local commands. Out-of-vocabulary (OOV) rejection is applied to reject a word that is not in the given recognition list. Lastly, speech guidance using TTS is applied to give the user guideline information for easy use. Under these conditions, many people used the system for some time. The most frequently used applications were, in order, road navigation, the MP3 player, radio, and TV.
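
The usability rules described above can be sketched as a small state holder. This is only an illustration under stated assumptions, not the authors' implementation: the class `SpeechAgent`, its method names, and the example commands are hypothetical, timestamps are passed in explicitly as seconds, and the 1-second undo window is assumed to start when a recognition result is accepted. The 3-second and 1-second limits are the only values taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

LISTEN_TIMEOUT_S = 3.0  # from the text: disable recognition after 3 s of silence
UNDO_WINDOW_S = 1.0     # from the text: second PTT push within 1 s means undo


@dataclass
class SpeechAgent:
    """Hypothetical sketch of the PTT, undo, command-mode, and OOV rules."""
    global_commands: Set[str]
    local_commands: Set[str]
    mode: str = "expand"                 # "expand" or "local"
    state: str = "idle"                  # current application state (illustrative)
    history: List[str] = field(default_factory=list)
    listening_since: Optional[float] = None
    last_result_at: Optional[float] = None

    def active_vocabulary(self) -> Set[str]:
        # Expand mode: global and local commands; local mode: local only.
        if self.mode == "expand":
            return self.global_commands | self.local_commands
        return set(self.local_commands)

    def push_ptt(self, now: float) -> str:
        # A push shortly after a result is treated as an undo request.
        if (self.last_result_at is not None
                and now - self.last_result_at <= UNDO_WINDOW_S
                and self.history):
            self.state = self.history.pop()
            self.last_result_at = None
            return "undo"
        self.listening_since = now
        return "listening"

    def hear(self, word: str, now: float) -> str:
        # Ignore speech if PTT was not pushed or the 3 s window has passed.
        if (self.listening_since is None
                or now - self.listening_since > LISTEN_TIMEOUT_S):
            self.listening_since = None
            return "disabled"
        self.listening_since = None
        # OOV rejection: the word must be in the active recognition list.
        if word not in self.active_vocabulary():
            return "rejected"
        self.history.append(self.state)
        self.state = word
        self.last_result_at = now
        return "accepted"


agent = SpeechAgent(global_commands={"navigation", "radio"},
                    local_commands={"next track"})
print(agent.push_ptt(0.0), agent.hear("radio", 1.0), agent.push_ptt(1.5))
```

Passing the time as an argument, rather than reading a clock inside the class, keeps the timing rules easy to test. A real embedded system would additionally announce each accepted result through TTS for verification, which this sketch omits.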

5 Discussions and Conclusions

5.1 Discussions

As the quality of automatic speech recognition (ASR) and text-to-speech (TTS) steadily improves, a variety of multimedia application services using embedded ASR
