After parameter optimisation, classification results for the ternary problem and by regression for the full 0-100 score range will be shown alongside the out-of-vocabulary resolution and attempts at synergistic fusion of knowledge and data.
However, let us begin with a binary classification by excluding the instances of the mixed class. This leaves 33 942 instances for training, and 36 094 instances for testing. A development partition is realised as a subset of the training data by choosing 'every other odd year', starting at 1, i.e., all years for which (year − 1) mod 4 = 0. This gives 15 730 instances for evaluation and 18 212 instances for training during development.
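The 'every other odd year' rule can be sketched in a few lines; the helper name below is illustrative, not from the original experiments.

```python
# Development-partition rule: starting at year 1, 'every other odd year'
# is selected, i.e. all years for which (year - 1) mod 4 = 0.

def in_development_partition(year: int) -> bool:
    return (year - 1) % 4 == 0

selected = [y for y in range(1, 14) if in_development_partition(y)]
# selects years 1, 5, 9, 13; the other odd years 3, 7, 11 are skipped
```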
To cope with the bias towards positive reviews (cf. Sect. 10.4.1.1), down-sampling without replacement is used for the training material. This is the only example in this book of down-sampling instead of up-sampling; the reason is the sheer size of the data to handle. After balancing, 15 063 training instances are obtained, of which 8 158 instances are used for training during development.
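The balancing step can be sketched as follows; this is a minimal illustration of down-sampling without replacement, with hypothetical instance/label containers rather than the original experimental code.

```python
import random

# Class balancing by down-sampling without replacement: every class is
# reduced to the size of the smallest class by drawing a random subset,
# so each instance appears at most once in the balanced set.

def downsample(instances_by_class, seed=0):
    rng = random.Random(seed)
    n_min = min(len(items) for items in instances_by_class.values())
    balanced = {}
    for label, items in instances_by_class.items():
        # random.sample draws without replacement
        balanced[label] = rng.sample(items, n_min)
    return balanced

data = {"positive": list(range(100)), "negative": list(range(40))}
balanced = downsample(data)
# both classes now hold 40 instances, none of them duplicated
```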
For direct comparison, the decay function in [124] is reached by setting c = 1 and e = 1. In Fig. 10.5 the WA is visualised depending on c and e. The maximum WA reached is 70.29 %.
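The search behind Fig. 10.5 amounts to a two-parameter grid search for the (c, e) pair with the highest WA on the development set. The sketch below assumes a generic evaluation callback; `dummy_wa` is a stand-in surface for illustration only, not the reported results.

```python
# Grid search over the decay parameters c and e: train/evaluate once per
# parameter pair and keep the setting with the highest weighted accuracy.

def grid_search(evaluate_wa, c_values, e_values):
    """Return (best_wa, best_c, best_e) over the Cartesian grid."""
    best = (float("-inf"), None, None)
    for c in c_values:
        for e in e_values:
            wa = evaluate_wa(c, e)
            if wa > best[0]:
                best = (wa, c, e)
    return best

# Hypothetical WA surface standing in for the real train/evaluate cycle.
def dummy_wa(c, e):
    return 70.29 - 0.1 * (c - 1.0) ** 2 - 0.2 * e ** 2

grid = [round(0.2 * i, 1) for i in range(11)]  # 0.0, 0.2, ..., 2.0
best_wa, best_c, best_e = grid_search(dummy_wa, grid, grid)
```

In practice `evaluate_wa` would retrain the classifier with the re-weighted features for every grid point, which is why the grid in Fig. 10.5 is kept coarse.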
For classification of the BoW and BoNG features, SMO-trained SVMs with polynomial kernels [131] are used. After stemming and with N = 1, WA gains 0.23 % for c = 1 and e = 0. 62 k word stems are left over from the
83 k vocabulary entries of the Metacritic database. Thus, a minimum term frequency f_min with a 'gentle' value of f_min = 2 is employed to remove infrequent words, taking into account that low-frequency words are likely to be meaningful features for opinionated sentences [132]. Further, 'periodic pruning' is applied to ensure reduction without dropping potentially relevant features: the data set is partitioned with configurable partition size, and the pruning discards features that occurred only once after processing of each partition by the word or N-Gram tokeniser. With a higher partition size (25 % of the data set was chosen in the experiments), the probability of eliminating relevant features is lowered. Next, optimal feature transformation
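The two vocabulary-reduction steps can be sketched as below; this is a simplified reading of the described procedure (partition-wise pruning of features seen only once, then a global minimum-term-frequency filter), with illustrative names rather than the original tooling.

```python
from collections import Counter

# Vocabulary reduction: 'periodic pruning' discards, after each data
# partition has been tokenised, all features counted only once so far;
# a final minimum term frequency f_min removes the remaining rare words.

def prune_vocabulary(token_partitions, f_min=2):
    counts = Counter()
    for partition in token_partitions:
        for token in partition:
            counts[token] += 1
        # periodic pruning: drop features that occurred only once so far
        for token in [t for t, n in counts.items() if n == 1]:
            del counts[token]
    # final minimum-term-frequency filter
    return {t for t, n in counts.items() if n >= f_min}
```

A larger partition size gives each feature more chances to occur twice before pruning, which is why the probability of eliminating relevant features drops as the partitions grow.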
[Figure: surface plot of WA [%], ranging from about 69.50 to 70.30, over c ∈ [0.0, 2.0] and e ∈ [0.0, 2.0]]

Fig. 10.5 WA throughout optimisation of the decay function parameters c and e [71]