Table 10.17
UA and WA plus recalls by early and late fusion for the three review classes (negative
(−) / mixed (0) / positive (+)) in Metacritic compared to the baseline (no fusion) with different
conditions for the knowledge-based score S, and the SVM class-wise pseudo-probabilities P−, P0,
P+ to decide for the positive class by the classifier

[%]                            UA     WA     Recall −  Recall 0  Recall +
Baseline                       53.99  53.71  60.43     43.91     57.62
Early fusion                   54.09  53.84  60.67     43.65     57.96
Late fusion
  (S > 0)                      45.50  55.93  32.65     16.83     86.72
  (S > 0.6)                    46.93  55.95  37.04     20.60     83.14
  (S > 0) (P+ > 0)             52.67  57.77  51.64     29.74     76.64
  (S > 0.6) (P+ > 0)           52.92  57.37  52.74     31.56     74.45
  (S > 0) (P− = 0)             53.72  56.19  60.43     29.74     70.98
  (S > 0.6) (P− = 0)           53.83  55.99  60.43     31.56     69.49
  (S > 0) (P+ > 0) (P0 > 0)    53.82  56.83  59.14     29.74     72.58
  (S > 0.6) (P+ > 0) (P0 > 0)  53.90  56.54  59.25     31.56     70.89

result, the late fusion significantly outperforms the individual approaches (one-tailed
z-test, 0.1 % level).
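
The late fusion conditions of Table 10.17 and the two accuracy measures can be illustrated by a minimal sketch. It assumes that the fused decision is the positive class whenever the condition holds and that the SVM decision is kept otherwise; the table states only the conditions, so this fallback is an assumption. The function names, the toy labels, scores, and pseudo-probabilities are hypothetical and serve only to show how UA (mean of the class-wise recalls) and WA (overall accuracy) react to the fusion.

```python
from collections import Counter

CLASSES = ("-", "0", "+")  # negative, mixed, positive


def fuse(svm_label, score, probs, s_threshold=0.0):
    """Decide '+' if S > s_threshold and P+ > 0; otherwise keep the SVM label.

    Keeping the SVM label as fallback is an assumption of this sketch.
    """
    if score > s_threshold and probs["+"] > 0.0:
        return "+"
    return svm_label


def recalls(y_true, y_pred):
    """Class-wise recall for every class that occurs in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in CLASSES if totals[c]}


def ua(y_true, y_pred):
    """Unweighted accuracy: mean of the class-wise recalls."""
    r = recalls(y_true, y_pred)
    return sum(r.values()) / len(r)


def wa(y_true, y_pred):
    """Weighted accuracy: recalls weighted by class frequency (overall accuracy)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: reference labels, SVM decisions, scores S, probabilities.
y_true = ["+", "+", "+", "+", "0", "-", "+"]
y_svm = ["+", "0", "-", "+", "0", "-", "0"]
scores = [0.8, 0.7, 0.2, -0.1, 0.9, -0.5, 0.5]
probs = [
    {"-": 0.1, "0": 0.2, "+": 0.7},
    {"-": 0.1, "0": 0.6, "+": 0.3},
    {"-": 0.7, "0": 0.3, "+": 0.0},
    {"-": 0.1, "0": 0.1, "+": 0.8},
    {"-": 0.1, "0": 0.5, "+": 0.4},
    {"-": 0.9, "0": 0.1, "+": 0.0},
    {"-": 0.2, "0": 0.6, "+": 0.2},
]

y_fused = [fuse(l, s, p) for l, s, p in zip(y_svm, scores, probs)]

print("baseline UA/WA:", ua(y_true, y_svm), wa(y_true, y_svm))
print("fused    UA/WA:", ua(y_true, y_fused), wa(y_true, y_fused))
```

On this toy data the fusion raises the recall of the positive class and WA while lowering the recall of the mixed class and UA, which is the same qualitative trade-off visible in the late fusion rows of the table.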
Finally, to model the 'continuous' values, SVR is chosen for the determination of
the Metacritic score value in the range of 0-100. As the kernel, a radial basis function
with the variance parameter γ = 0.01 proved optimal on the development set. Given
the continuous approximation task, CC and MLE serve as evaluation measures (cf.
Sect. 7.5.2). On the test data of Metacritic, the result is a CC of 0.570 and an MLE of
14.1, i.e., on average, the regressor is mistaken by 14.1 with respect to the score. An
obvious challenge for regression training is the uneven distribution of score values
within the Metacritic database (cf. Fig. 10.4).
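
The two evaluation measures and the kernel can be sketched as follows. The sketch assumes that CC is the Pearson correlation coefficient between predicted and reference scores, that MLE is the mean absolute deviation between them (in line with "on average, the regressor is mistaken by 14.1"), and that the radial basis function takes the common form exp(−γ‖x − y‖²). All function names and the toy scores are hypothetical; no values from the Metacritic experiments are reproduced.

```python
import math


def rbf_kernel(x, y, gamma=0.01):
    """Radial basis function kernel, assumed form exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def correlation_coefficient(pred, ref):
    """Pearson correlation coefficient between predictions and references."""
    n = len(pred)
    mean_pred = sum(pred) / n
    mean_ref = sum(ref) / n
    cov = sum((p - mean_pred) * (r - mean_ref) for p, r in zip(pred, ref))
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    var_ref = sum((r - mean_ref) ** 2 for r in ref)
    return cov / math.sqrt(var_pred * var_ref)


def mean_linear_error(pred, ref):
    """Mean absolute deviation between predictions and references."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


# Hypothetical review scores on the 0-100 scale.
reference = [80, 60, 40, 90]
predicted = [70, 65, 55, 75]

cc = correlation_coefficient(predicted, reference)
mle = mean_linear_error(predicted, reference)
print("CC:", cc, "MLE:", mle)
```

The toy predictions are compressed towards the middle of the scale, so CC stays high although every score is off by several points; this is why the two measures are reported together.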
10.4.1.3 Summary
Two main approaches towards automatic sentiment analysis and opinion mining
were discussed in this section: one open-domain approach based on on-line
knowledge sources ranging from annotated dictionaries (General Inquirer) to
comprehensive semantic networks (ConceptNet), and one based on data. Further,
benchmarks were presented for the particular task of film reviews, but the methods
can be applied to other sentiment tasks, as will be shown in Sect. 11.7, where song
lyrics are analysed in such a way.
The advantage of the approach based on on-line knowledge sources, which uses linguistic
methods, dictionaries, and semantic networks, is that no learning material is required.
Overall, it led to usable results, but the in-domain data-driven approach based on
BoNG features and SVMs reached higher recognition rates. On-line knowledge could
thereby be integrated to resolve 40.5 % of the OOV events and slightly improve
performance. As another way of combining the two techniques, a late fusion
 