solution is better than another in global terms, that is, a child is better if it
becomes a non-dominated hypothesis.
Next, since our model is based on a multi-criteria approach, we have to face
three important issues in order to assess every hypothesis' fitness: Pareto dominance,
fitness assignment and the diversity problem [5]. Despite the considerable number of
state-of-the-art methods for handling these issues [5], only a few of them
address the problem in an integrated and representation-independent way. In
particular, Zitzler [35] proposes an interesting method, the Strength Pareto Evolutionary
Algorithm (SPEA), which uses a mixture of established methods and new techniques
to find multiple Pareto-optimal solutions in parallel while, at the same time,
keeping the population as diverse as possible. We have also adapted the original
SPEA algorithm to allow for incremental updating of the Pareto-optimal set
along with our steady-state replacement method.
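To make the dominance and fitness-assignment steps concrete, the following is a minimal sketch in Python (not the Prolog prototype described below), assuming each hypothesis is reduced to a tuple of objective scores to be maximised. The strength and raw-fitness formulas follow the original SPEA description; the function names and data layout are illustrative only.

# Minimal sketch of Pareto dominance and SPEA-style strength fitness.
# Hypothetical layout: each hypothesis is a tuple of objective scores,
# all assumed to be maximised.

def dominates(a, b):
    """True if a Pareto-dominates b: no worse on every objective,
    strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def spea_strength(archive, population):
    """Assign SPEA-like fitness values.

    Each archive (non-dominated) member gets a strength equal to the
    fraction of population members it dominates; each population member
    gets a raw fitness of one plus the strengths of the archive members
    that dominate it (lower is better for population members)."""
    strengths = {}
    for a in archive:
        covered = sum(1 for p in population if dominates(a, p))
        strengths[a] = covered / (len(population) + 1)
    fitness = {}
    for p in population:
        fitness[p] = 1.0 + sum(s for a, s in strengths.items() if dominates(a, p))
    return strengths, fitness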
9.4 Analysis and Results
In order to assess the quality of the knowledge (hypotheses) discovered by the model,
a Prolog-based prototype has been built. The IE task has been implemented as a set
of modules whose main outcome is the set of rules extracted from the documents.
In addition, an intermediate training module is responsible for generating informa-
tion from the LSA analysis and from the rules just produced. The initial rules are
represented by facts containing lists of relations for both the antecedent and the consequent.
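Purely as an illustration (the actual predicate names and relation vocabulary are not given in the text), such a rule fact might be mirrored by a structure like the following, with separate relation lists for the antecedent and the consequent; every name below is invented for the example.

# Illustrative mirror of an initial rule "fact": an identifier plus two
# lists of relations, one for the antecedent and one for the consequent.
# Relation names and arguments are made up for the example.
rule = {
    "id": "hyp_001",
    "antecedent": [("causes", "nitrogen_deficiency", "leaf_yellowing"),
                   ("located_in", "crop", "soil")],
    "consequent": [("affects", "fertilisation", "yield")],
}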
For the purpose of the experiments, the corpus of documents was obtained
from the AGRIS database for agricultural and food science. We selected this kind of
corpus because it has been properly cleaned up and belongs to a scientific area about
which we have no prior knowledge, so as to avoid any possible bias and to make the
results more realistic. A set of 1000 documents was extracted, of which one third
was used for setting parameters and making general adjustments, and the rest was
used for the GA itself in the evaluation stage.
Next, we tried to answer two basic questions concerning our original
aims:
a) How well does the GA for KDT behave?
b) How good are the hypotheses produced, according to human experts, in terms of
text mining's ultimate goals: interestingness, novelty, usefulness, etc.?
In order to address these issues, we used a methodology consisting of two phases:
the system evaluation and the experts' assessment.
a) System Evaluation: this aims at investigating the behavior of, and the results pro-
duced by, the GA.
We set up the GA by generating an initial population of 100 semi-random hy-
potheses. In addition, we defined the main global parameters, such as the Mutation
Probability (0.2), Crossover Probability (0.8), Maximum Size of the Pareto set (5%),
etc. We ran five versions of the GA with the same configuration of parameters
but different pairs of terms to address the quest for explanatory novel hypothe-
ses.
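A minimal sketch of how one such run might be configured and driven is given below, assuming a simple loop around the steady-state replacement step. Only the parameter values come from the text; the helper callables, the stopping criterion and all other names are hypothetical and not part of the reported prototype.

# Hypothetical configuration and driver for one GA run, mirroring the
# parameters reported in the text. The evolve_step() callable stands in
# for selection, crossover, mutation and Pareto-archive updating.
CONFIG = {
    "population_size": 100,       # semi-random initial hypotheses
    "mutation_prob": 0.2,
    "crossover_prob": 0.8,
    "max_pareto_fraction": 0.05,  # maximum size of the Pareto set (5%)
    "generations": 500,           # illustrative stopping criterion
}

def run_ga(seed_terms, init_population, evolve_step, config=CONFIG):
    """Run one GA configuration for a given pair of target terms.

    seed_terms      -- the pair of terms guiding the search for hypotheses
    init_population -- callable producing the semi-random initial population
    evolve_step     -- callable performing one steady-state replacement step
    """
    population = init_population(seed_terms, config["population_size"])
    archive = []  # incrementally updated Pareto-optimal set
    for _ in range(config["generations"]):
        population, archive = evolve_step(population, archive, config)
    return archive

# Five runs would then use the same CONFIG but different term pairs, e.g.:
# for pair in term_pairs:
#     pareto_set = run_ga(pair, make_population, evolve_step)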
The results obtained from running the GA in our experiment are shown as a
representative behavior in figure 9.5, where the