After parameter optimisation, classification results for the ternary problem and regression results for the full 0-100 score range will be shown, alongside out-of-vocabulary resolution and attempts at a synergistic fusion of knowledge and data.
However, let us begin with a binary classification by excluding the instances of the mixed class. This leaves 33 942 instances for training and 36 094 instances for testing. A development partition is realised as a subset of the training data by choosing 'every other odd year', starting at 1, i.e., all years for which (year - 1) mod 4 = 0. This gives 15 730 instances for evaluation and 18 212 instances for training during development.
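Expressed as code, the year-based selection rule is straightforward. The following is a minimal sketch, assuming each instance carries its review year in a field named year (the field name is illustrative, not taken from the Metacritic database):

```python
def is_development_year(year: int) -> bool:
    """'Every other odd year', starting at 1: all years congruent to 1 modulo 4,
    i.e. (year - 1) mod 4 == 0."""
    return (year - 1) % 4 == 0

def split_for_development(train_instances):
    """Split the training data into a development-training and a
    development-evaluation part according to the year rule."""
    dev_eval = [x for x in train_instances if is_development_year(x["year"])]
    dev_train = [x for x in train_instances if not is_development_year(x["year"])]
    return dev_train, dev_eval
```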
To cope with the bias towards positive reviews (cf. Sect. 10.4.1.1), down-sampling without replacement is used for the training material. This is the only example in this book of down-sampling instead of up-sampling; the reason is the sheer size of the data to handle. After balancing, 15 063 training instances are obtained, of which 8 158 are used for training during development.
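Down-sampling without replacement amounts to keeping, for each class, a random subset of the size of the smallest class, where every instance is drawn at most once. A minimal sketch under these assumptions (the label accessor and seed are illustrative):

```python
import random

def downsample_without_replacement(instances, label_of, seed=0):
    """Balance a data set by randomly discarding majority-class instances;
    each kept instance is drawn at most once (no replacement)."""
    rng = random.Random(seed)
    by_label = {}
    for x in instances:
        by_label.setdefault(label_of(x), []).append(x)
    n_min = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n_min))  # sampling without replacement
    rng.shuffle(balanced)
    return balanced
```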
To start, the parameters c and e of the decay function (cf. Sect. 6.3) are optimised. In direct comparison to the decay function in [124], which is reached by setting c = 1 and e = 1, WA gains 0.23 % for c = 1 and e = 0.1. In Fig. 10.5 the WA is visualised depending on c and e; the maximum WA reaches 70.29 %.
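This parameter optimisation can be read as a plain grid search over the development partition. The sketch below assumes an evaluate_wa(c, e) callback that runs the full feature extraction and classification for one parameter pair and returns the WA on the development-evaluation part; the 0.2 grid step is only an assumption suggested by the axis ticks of Fig. 10.5, and the decay function of Sect. 6.3 itself is not reproduced here:

```python
def grid_search_decay(evaluate_wa, c_values, e_values):
    """Exhaustively evaluate all (c, e) pairs and keep the one with the
    highest weighted accuracy (WA) on the development-evaluation part."""
    best_c, best_e, best_wa = None, None, float("-inf")
    for c in c_values:
        for e in e_values:
            wa = evaluate_wa(c, e)  # train on dev-train, score on dev-eval
            if wa > best_wa:
                best_c, best_e, best_wa = c, e, wa
    return best_c, best_e, best_wa

# Hypothetical usage with a 0.2-step grid over the ranges shown in Fig. 10.5:
# c_grid = [round(0.2 * i, 1) for i in range(11)]  # 0.0 ... 2.0
# e_grid = [round(0.2 * i, 1) for i in range(11)]
# best_c, best_e, best_wa = grid_search_decay(evaluate_wa, c_grid, e_grid)
```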
SMO-trained SVMs with polynomial kernels [131] serve for the classification of the BoW and BoNG features. After stemming, 62 k word stems are left over from the 83 k vocabulary entries of the Metacritic database. Thus, a minimum term frequency f_min with a 'gentle' value of f_min = 2 is employed to remove infrequent words, taking into account that low-frequency words are likely to be meaningful features for opinionated sentences [132]. Further, 'periodic pruning' is applied to ensure reduction without dropping potentially relevant features: the data set is partitioned with a configurable partition size, and the pruning discards features that have occurred only once after each partition has been processed by the word or N-Gram tokeniser. With a higher partition size (25 % of the data set was chosen in the experiments), the probability of eliminating relevant features is lowered. Next, optimal feature transformation
Fig. 10.5 WA [%] throughout optimisation of the decay function parameters c and e (both varied from 0.0 to 2.0) [71]
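The vocabulary handling described above, a minimum term frequency of f_min = 2 combined with 'periodic pruning' after each partition, can be sketched as follows. The whitespace tokenisation, the partition count of four (corresponding to a partition size of 25 %), and scikit-learn's SVC as a stand-in for an SMO-trained polynomial-kernel SVM are assumptions for illustration only:

```python
from collections import Counter

def build_vocabulary(texts, n_partitions=4, f_min=2):
    """Count tokens partition by partition; after each partition, discard
    entries that have occurred only once so far ('periodic pruning'), and
    finally keep only terms with frequency >= f_min."""
    counts = Counter()
    size = max(1, len(texts) // n_partitions)
    for start in range(0, len(texts), size):
        for text in texts[start:start + size]:
            counts.update(text.lower().split())  # stand-in tokeniser
        # periodic pruning after the partition has been processed
        counts = Counter({t: c for t, c in counts.items() if c > 1})
    return sorted(t for t, c in counts.items() if c >= f_min)

# Hypothetical usage with scikit-learn (train_texts / train_labels assumed):
# from sklearn.feature_extraction.text import CountVectorizer
# from sklearn.svm import SVC
# vocab = build_vocabulary(train_texts)
# bow = CountVectorizer(vocabulary=vocab)
# clf = SVC(kernel="poly")  # polynomial kernel as stand-in, cf. [131]
# clf.fit(bow.fit_transform(train_texts), train_labels)
```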
 