[Figure: four panels (Coherence, Cohesion, Swanson Plausibility, Relevance), each plotting the average objective value against the number of generations (0 to 1000) for runs 1 to 5.]
Fig. 9.5. GA evaluation for some of the criteria
number of generations is placed against the average objective value for some of
the eight criteria.
Some interesting facts can be noted. Almost all the criteria seem to stabilise
after (roughly) generation 700 for all the runs; that is, no further improvement
is achieved beyond this point, which may give us an approximate indication
of the limits of the objective function values.
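One way to make "stabilises after roughly generation N" concrete is to look for the earliest point after which the averaged objective value stays within a small band. The following is a minimal sketch only: the function name `stabilisation_generation`, the `tolerance` parameter and the sample series are assumptions made for illustration and are not taken from the model or from the figure.

```python
def stabilisation_generation(averages, tolerance=1e-3):
    """Return the earliest index g such that all values from g onwards
    differ from each other by at most `tolerance` (None for an empty series).

    `averages` is assumed to hold one average objective value per sampled
    generation for a single criterion and a single run.
    """
    if not averages:
        return None
    lo = hi = averages[-1]
    start = len(averages) - 1
    # Walk backwards from the last generation, widening the [lo, hi] band
    # until it exceeds the tolerance.
    for g in range(len(averages) - 1, -1, -1):
        lo = min(lo, averages[g])
        hi = max(hi, averages[g])
        if hi - lo > tolerance:
            break
        start = g
    return start


# Illustrative values only (not read from the figure).
series = [0.70, 0.74, 0.78, 0.80, 0.8004, 0.8001, 0.8003]
plateau_start = stabilisation_generation(series)
```

The index returned refers to a position in the sampled series, so it has to be mapped back to a generation number according to how often the averages were recorded.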
Another aspect worth highlighting is that despite a steady-state strategy being
used by the model to produce solutions, the individual evaluation criteria behave
in unstable ways to accommodate solutions which had to be removed or added.
As a consequence, it is not necessarily the case that all the criteria have to
monotonically increase.
To see this behavior, consider the results for all the criteria over the same
period, between generations 200 and 300 of run 4. For an average
hypothesis, Coherence, Cohesion, Simplicity and Structure get worse, whereas
Coverage, Interestingness and Relevance improve and Plausibility shows some
variability. Note that not all the criteria are shown in the graph.
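The non-monotonic behaviour described above can be illustrated with a toy steady-state step, in which a single individual is replaced and the per-criterion population averages are recomputed. This is a sketch under stated assumptions: the two-criterion population, the scores and the helper names (`criterion_means`, `replace_individual`) are invented for illustration and do not reproduce the model's actual replacement policy or data.

```python
from statistics import mean


def criterion_means(population):
    """Average each criterion over the whole population."""
    criteria = population[0].keys()
    return {c: mean(ind[c] for ind in population) for c in criteria}


def replace_individual(population, index, newcomer):
    """Steady-state step: swap one individual, leave the rest untouched."""
    updated = list(population)
    updated[index] = newcomer
    return updated


# Hypothetical scores, chosen only to show the effect.
population = [
    {"coherence": 0.80, "relevance": 0.10},
    {"coherence": 0.70, "relevance": 0.20},
]
# The newcomer is more relevant but less coherent than the one it replaces.
newcomer = {"coherence": 0.60, "relevance": 0.40}

before = criterion_means(population)
after = criterion_means(replace_individual(population, 1, newcomer))
```

Here the average Relevance rises while the average Coherence falls after a single replacement, which is the kind of trade-off that keeps individual criteria from increasing monotonically.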
b) Expert Assessment: this aims at assessing the quality (and therefore,
effectiveness) of the discovered knowledge on different criteria by human domain experts.
For this, we designed an experiment in which 20 human experts were involved
and each assessed 5 hypotheses selected from the Pareto set. We then asked
the experts to assess the hypotheses from 1 (worst) to 5 (best) in terms of the
 