• For the expert assessment: the scores of the different criteria for every hypothesis³
are averaged. Note that this will produce values between 1 and 5, with 5 being
the best.
• For the model evaluation: for every hypothesis, both the objective value and
the fitness are considered. A lower fitness score indicates a better hypothesis,
whereas a higher objective value indicates a better hypothesis.
Therefore, we subtract the fitness from 1 for each hypothesis and then add
the result to the average of the objective values for this hypothesis. Note that
this will produce values between 0 and 2, with 2 being the best.
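The two scoring rules above can be sketched as follows. This is only a minimal illustration, not the original implementation: the function names are hypothetical, and the code assumes that the fitness and every objective value lie in [0, 1], which is what the stated 0 to 2 range implies.

```python
from statistics import mean


def expert_score(criterion_scores):
    """Average the expert scores over the criteria (each assumed to be in 1..5)."""
    return mean(criterion_scores)


def system_score(fitness, objective_values):
    """Combine fitness and objective values into a single score in [0, 2].

    Lower fitness is better, so it is flipped with (1 - fitness) before being
    added to the mean objective value. Assumes all inputs lie in [0, 1].
    """
    return (1.0 - fitness) + mean(objective_values)
```

For instance, a hypothesis with a fitness of 0.25 and objective values 0.5 and 1.0 (made-up numbers) would receive 0.75 + 0.75 = 1.5 on the 0 to 2 scale.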
We then calculated the pair of values for every hypothesis and obtained a (Spearman)
correlation r = 0.43 (t-test = 23.75, df = 24, p < 0.001). This result shows that
the system's evaluation predicts the human assessment reasonably well. It
indicates that for such a complex task (knowledge discovery), the model's behavior
is not too different from the experts' (see Figure 9.6).
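As an illustration of the statistic used here, the sketch below computes a Spearman rank correlation between two lists of per-hypothesis scores using only the standard library. The helper names are hypothetical, and tied values are given their average rank, which is a common convention but an assumption here, since the text does not say how ties were handled.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Calling `spearman(expert_scores, system_scores)` on the two lists of per-hypothesis values would give the correlation coefficient; the significance test reported in the text is not reproduced by this sketch.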
[Figure 9.6: plot of Average Score (EXPERT) and Average Fitness (SYSTEM), on a
vertical axis from 1 to 4, against HYPOTHESIS, on a horizontal axis from 0 to 25]
Fig. 9.6. Correlation between human and system evaluation of discovered hypotheses
Note that in Mooney's experiment using simple discovered rules, a lower human-system
correlation of r = 0.386 was obtained. Considering also that the human
subjects there were not domain experts, as they were in our case, our results are
encouraging, since they involve a more demanding process which requires further
comprehension of both the hypothesis itself and the working domain. In addition,
our model achieved this without the external linguistic resources used in Mooney's
experiments [26].
In order to show what the final hypotheses look like and how the good charac-
teristics and less desirable features above are exhibited, we picked one of the best
hypotheses as assessed by the experts (i.e., we picked one of the best 25 of the
100 final hypotheses) based on the average value of the 5 scores they assigned. For
example, hypothesis 65 of run 4 looks like:
³ ADD is not considered here as this does not measure a typical KDD aspect,