Decision Theoretic Fusion Framework for Actionability Using Data Mining on an Embedded System - Data Mining: Theory, Methodology, Techniques, and Applications

Table 4. Word accuracy of various noise reduction methods (%)

Channel / Method   Baseline   SS      HP+GJ   HP+DS   HP+E-EVD
Ch 3 + Ch 5        38.32      57.66   80.66   87.23   89.14
Ch 4 + Ch 7        83.94      85.77   88.50   90.51   91.88

Table 5. PESQ score of various noise reduction methods

Column headers (Channel / Method): Baseline, SS, HP+GJ, HP+DS, HP+E-EVD
Ch 3 + Ch 5: 2.68, 2.74, 2.93, 2.91
Ch 4 + Ch 7: 3.23, 3.19, 3.17, 3.28

Table 6. Driving tests on a real car

            Office    Low-speed   High-speed   Average   Car
Off-line    99.69%    94.44%      92.10%       -         Avante (1800CC)
Men         -         95.4%       96%          95.7%     EF Sonata, SM5 (2000CC)
Women       -         92.5%       93.42%       92.96%    EF Sonata, SM5 (2000CC)
Average     -         93.95%      94.71%       94.33%    EF Sonata, SM5 (2000CC)
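
The Average column of Table 6 appears to be the unweighted mean of the low-speed and high-speed rates in each row. The short check below recomputes it from the table values; the dictionary layout and the function name `mean_rate` are illustrative choices, not part of the original system.

```python
# Recognition rates copied from Table 6 (percent); keys are the row labels.
table6 = {
    "Men":     {"low": 95.4,  "high": 96.0,  "average": 95.7},
    "Women":   {"low": 92.5,  "high": 93.42, "average": 92.96},
    "Average": {"low": 93.95, "high": 94.71, "average": 94.33},
}


def mean_rate(low, high):
    """Unweighted mean of the low-speed and high-speed rates, to 2 decimals."""
    return round((low + high) / 2, 2)


# Recompute the Average column row by row and compare with the table.
recomputed = {row: mean_rate(v["low"], v["high"]) for row, v in table6.items()}
for row, value in recomputed.items():
    print(row, value, "table:", table6[row]["average"])
```

The same arithmetic applied column-wise to the Men and Women rows also reproduces the Average row, which suggests that both groups were weighted equally.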

Next, a speech recognition experiment is performed. The speech recognition function of the speech interactive agent is divided into embedded ASR and a DSR front-end. The total vocabulary of the embedded ASR exceeds roughly 5,000 words; however, a tree-based dynamic word recognition approach is applied according to the operational scenarios of each multimedia service application. The DSR vocabulary is about 10,000 words for each city. The embedded speech recognition engine was developed and optimized for car noise, and we implemented DSR using a third-party DSR Software Development Kit (SDK).
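
One way to picture the tree-based dynamic vocabulary is as a tree of dialogue states in which only the words leading out of the current state are active. The sketch below is a minimal illustration under that assumption; the class `ScenarioNode`, the function `step`, and the command words are all hypothetical and do not come from the original system.

```python
from dataclasses import dataclass, field


@dataclass
class ScenarioNode:
    """One dialogue state; `words` maps a spoken command to the next state."""
    name: str
    words: dict = field(default_factory=dict)

    def add(self, word, child):
        """Attach a child state reachable by saying `word`."""
        self.words[word] = child
        return child

    def active_vocabulary(self):
        """Words the recognizer needs to consider in this state."""
        return sorted(self.words)


def step(node, recognized_word):
    """Move to the state selected by the word; stay put if it is not active."""
    return node.words.get(recognized_word, node)


# Hypothetical scenario tree; the command words are illustrative only.
root = ScenarioNode("main")
audio = root.add("audio", ScenarioNode("audio"))
navi = root.add("navigation", ScenarioNode("navigation"))
audio.add("play", ScenarioNode("audio/play"))
audio.add("stop", ScenarioNode("audio/stop"))
navi.add("destination", ScenarioNode("navigation/destination"))

state = root
print(state.active_vocabulary())   # ['audio', 'navigation']
state = step(state, "audio")
print(state.active_vocabulary())   # ['play', 'stop']
```

Because the recognizer only scores the active list at each step, the search space stays small even when the total vocabulary is large, which is the motivation the text gives for dynamic vocabularies.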

For embedded ASR, an isolated word recognizer with dynamic vocabularies is applied to reduce computing time and memory size [15][22]. Speech signals are analyzed in 125 ms frames with a 10 ms overlap and converted into 26th-order feature vectors consisting of 13th-order MFCCs, including log energy, and their first derivatives. To cope with car noise, we applied a feature compensation scheme based on multivariate Gaussian-based cepstral normalization (RATZ) [18], together with a tied-mixture hidden Markov model (HMM) [7][17]. The driving test was performed in a real car, as summarized in Table 6. Low-speed driving was between 20 and 60 km/h, while high-speed driving was between 70 and 110 km/h. A total of 40 men and women were tested in a Hyundai EF Sonata and a Samsung SM5. The recognizable vocabulary is 100 words in each given scenario.
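
The 26th-order feature vector can be read as 13 static coefficients per frame with their 13 first derivatives appended. The text does not say how the derivatives are computed, so the sketch below assumes the common regression formula over a window of two frames on each side, with edge frames repeated as padding; the function names `delta` and `add_deltas` and the toy input are illustrative only.

```python
def delta(frames, window=2):
    """First-order regression deltas; edge frames are repeated as padding.

    Assumed formula: d[t] = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2).
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_deltas(frames, window=2):
    """Append the delta coefficients to each static frame (13 -> 26 values)."""
    return [static + d for static, d in zip(frames, delta(frames, window))]


# Toy input: 5 frames of 13 static coefficients, each rising by 1.0 per frame.
static = [[float(t)] * 13 for t in range(5)]
features = add_deltas(static)
print(len(features[0]))   # 26
```

For the toy ramp, the delta of the middle frame equals the per-frame slope of 1.0, which is a quick sanity check on the regression formula.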

Finally, a text-to-speech experiment is performed. The speech interactive agent has two TTS child processes: one serves the CNS, and the other serves the application services. To run fast on the embedded system, the execution time and code size of the TTS are also optimized. However, retrieving a specific tri-phone waveform from storage takes a long time, and this access time depends on the flash memory used.
