After parameter optimisation, classification results for the ternary problem and by regression for the full 0-100 score range will be shown alongside the out-of-vocabulary resolution and attempts at synergistic fusion of knowledge and data.
However, let us begin with a binary classification by excluding the instances of the mixed class. This leaves 33 942 instances for training, and 36 094 instances for testing. A development partition is realised as a subset of the training data by choosing 'every other odd year', starting at 1, i.e., all years for which (year − 1) mod 4 = 0. This gives 15 730 instances for evaluation and 18 212 instances for training during development.
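The 'every other odd year' rule can be sketched in a few lines; the helper name below is illustrative, not from the original experiments.

```python
# Development-partition rule: starting at year 1, 'every other odd year'
# is selected, i.e. all years for which (year - 1) mod 4 = 0.

def in_development_partition(year: int) -> bool:
    return (year - 1) % 4 == 0

selected = [y for y in range(1, 14) if in_development_partition(y)]
# selects years 1, 5, 9, 13; the other odd years 3, 7, 11 are skipped
```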
To cope with the bias towards positive reviews (cf. Sect. 10.4.1.1), down-sampling without replacement is used for the training material. This is the only example in this book of down-sampling instead of up-sampling; the reason is the sheer size of the data to handle. After balancing, 15 063 training instances are obtained, of which 8 158 instances are used for training during development.
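The balancing step can be sketched as follows; this is a minimal illustration of down-sampling without replacement, with hypothetical instance/label containers rather than the original experimental code.

```python
import random

# Class balancing by down-sampling without replacement: every class is
# reduced to the size of the smallest class by drawing a random subset,
# so each instance appears at most once in the balanced set.

def downsample(instances_by_class, seed=0):
    rng = random.Random(seed)
    n_min = min(len(items) for items in instances_by_class.values())
    balanced = {}
    for label, items in instances_by_class.items():
        # random.sample draws without replacement
        balanced[label] = rng.sample(items, n_min)
    return balanced

data = {"positive": list(range(100)), "negative": list(range(40))}
balanced = downsample(data)
# both classes now hold 40 instances, none of them duplicated
```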
For direct comparison, the decay function in [124] is reached by setting c = 1 and e = 1. In Fig. 10.5 the WA is visualised depending on c and e. The maximum WA reached is 70.29 %.
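The search behind Fig. 10.5 amounts to a two-parameter grid search for the (c, e) pair with the highest WA on the development set. The sketch below assumes a generic evaluation callback; `dummy_wa` is a stand-in surface for illustration only, not the reported results.

```python
# Grid search over the decay parameters c and e: train/evaluate once per
# parameter pair and keep the setting with the highest weighted accuracy.

def grid_search(evaluate_wa, c_values, e_values):
    """Return (best_wa, best_c, best_e) over the Cartesian grid."""
    best = (float("-inf"), None, None)
    for c in c_values:
        for e in e_values:
            wa = evaluate_wa(c, e)
            if wa > best[0]:
                best = (wa, c, e)
    return best

# Hypothetical WA surface standing in for the real train/evaluate cycle.
def dummy_wa(c, e):
    return 70.29 - 0.1 * (c - 1.0) ** 2 - 0.2 * e ** 2

grid = [round(0.2 * i, 1) for i in range(11)]  # 0.0, 0.2, ..., 2.0
best_wa, best_c, best_e = grid_search(dummy_wa, grid, grid)
```

In practice `evaluate_wa` would retrain the classifier with the re-weighted features for every grid point, which is why the grid in Fig. 10.5 is kept coarse.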
For classification of the BoW and BoNG features, SMO-trained SVMs with polynomial kernels [131] are used. After stemming and with N = 1, WA gains 0.23 % for c = 1 and e = 0. 62 k word stems are left over from the
83 k vocabulary entries of the Metacritic database. Thus, a minimum term frequency f_min with a 'gentle' value of f_min = 2 is employed to remove infrequent words, taking into account that low-frequency words are likely to be meaningful features for opinionated sentences [132]. Further, 'periodic pruning' is applied to ensure reduction without dropping potentially relevant features: the data set is partitioned with configurable partition size, and the pruning discards features that occurred only once after processing of each partition by the word or N-Gram tokeniser. With a higher partition size (25 % of the data set was chosen in the experiments), the probability of eliminating relevant features is lowered. Next, optimal feature transformation
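The two vocabulary-reduction steps can be sketched as below; this is a simplified reading of the described procedure (partition-wise pruning of features seen only once, then a global minimum-term-frequency filter), with illustrative names rather than the original tooling.

```python
from collections import Counter

# Vocabulary reduction: 'periodic pruning' discards, after each data
# partition has been tokenised, all features counted only once so far;
# a final minimum term frequency f_min removes the remaining rare words.

def prune_vocabulary(token_partitions, f_min=2):
    counts = Counter()
    for partition in token_partitions:
        for token in partition:
            counts[token] += 1
        # periodic pruning: drop features that occurred only once so far
        for token in [t for t, n in counts.items() if n == 1]:
            del counts[token]
    # final minimum-term-frequency filter
    return {t for t, n in counts.items() if n >= f_min}
```

A larger partition size gives each feature more chances to occur twice before pruning, which is why the probability of eliminating relevant features drops as the partitions grow.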
[Figure: surface plot of WA [%], ranging from about 69.50 to 70.30, over c ∈ [0.0, 2.0] and e ∈ [0.0, 2.0]]

Fig. 10.5 WA throughout optimisation of the decay function parameters c and e [71]