reasoners and five different heuristics. The two reasoners are standard Pellet and Pellet combined with approximate reasoning (not described in detail here). The five heuristics are those described in Section 6.3. For each configuration of CELOE, we generate at most 10 suggestions exceeding a heuristic threshold of 90%. Overall, this means that there can be at most 2 × 5 × 10 = 100 suggestions per class - usually fewer, because different settings of CELOE will still result in similar suggestions. This list is shuffled and presented to the evaluators; a short sketch of this generation step follows the option lists below. For each suggestion, the evaluators can choose between 6 options (see Table 6):
1 the suggestion improves the ontology (improvement)
2 the suggestion is no improvement and should not be included (not acceptable) and
3 adding the suggestion would be a modelling error (error)
In the case of existing definitions for class A, we removed them prior to learning. The evaluator could then choose between three further options:
4 the learned definition is equal to the previous one and both are good (equal +)
5 the learned definition is equal to the previous one and both are bad (equal -) and
6 the learned definition is inferior to the previous one (inferior).
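To make the generation step described above concrete, the following is a minimal sketch, not the actual CELOE/DL-Learner code: the reasoner and heuristic identifiers and the run_celoe hook are hypothetical placeholders, while the 90% threshold, the cap of 10 suggestions per configuration, and the deduplication and shuffling follow the text.

import itertools
import random

# Hypothetical identifiers; the real heuristic names are those of
# Section 6.3 and are not reproduced exactly here.
REASONERS = ["pellet", "pellet_approx"]
HEURISTICS = ["h1", "h2", "h3", "h4", "h5"]
THRESHOLD = 0.90      # heuristic threshold of 90%
MAX_PER_CONFIG = 10   # at most 10 suggestions per configuration

def suggestions_for_class(cls, run_celoe):
    """Collect, deduplicate and shuffle the suggestions for one class.

    run_celoe(cls, reasoner, heuristic) is a hypothetical hook that
    returns (expression, score) pairs ranked by score; at most
    2 * 5 * 10 = 100 raw suggestions per class can accumulate here.
    """
    pool = {}
    for reasoner, heuristic in itertools.product(REASONERS, HEURISTICS):
        ranked = run_celoe(cls, reasoner, heuristic)
        kept = [(e, s) for e, s in ranked if s >= THRESHOLD][:MAX_PER_CONFIG]
        for expr, score in kept:
            # different settings often rediscover the same expression,
            # so the final list is usually well below the 100 bound
            pool[expr] = max(score, pool.get(expr, 0.0))
    final = list(pool.items())
    random.shuffle(final)  # shuffled before being shown to the evaluators
    return final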
We used the default settings of CELOE, e.g. a maximum execution time of 10 seconds for the algorithm. The knowledge engineers were five experienced members of our research group, who made themselves familiar with the domain of the test ontologies. Each researcher worked independently and had to choose between the options above for each suggestion, making 998 decisions for 92 classes. The time required to make those decisions was approximately 40 working hours per researcher. The raw agreement value of all evaluators is
0.535 (see e.g. [4] for details) with 4 out of 5 evaluators in strong pairwise agreement
(90%). The evaluation machine was a notebook with a 2 GHz CPU and 3 GB RAM.
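The raw agreement figure can be reproduced with a short calculation. The sketch below uses made-up decision vectors, since the individual decisions are not part of this text; it computes raw (observed) agreement as the fraction of evaluator pairs choosing the same option, averaged over all decisions, in the spirit of the measure referenced in [4].

from itertools import combinations

def raw_agreement(decisions):
    """Raw (observed) agreement across multiple evaluators.

    decisions holds one entry per evaluated suggestion, each entry
    listing the option chosen by every evaluator, e.g. [1, 1, 2, 1, 1]
    for five evaluators. The score is the fraction of evaluator pairs
    that agree, averaged over all suggestions.
    """
    total = 0.0
    for votes in decisions:
        pairs = list(combinations(votes, 2))
        total += sum(a == b for a, b in pairs) / len(pairs)
    return total / len(decisions)

# Illustrative toy data only (the real decision vectors are not given here)
print(raw_agreement([[1, 1, 1, 2, 1], [3, 3, 2, 3, 3], [1, 2, 1, 1, 2]]))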
Table 6 shows the evaluation results. All ontologies were taken from the Protégé OWL42 and TONES43 repositories. We randomly selected 5 ontologies comprising instance data from these two repositories, specifically the Earthrealm, Finance, Resist, Economy and Breast Cancer ontologies (see Table 5).
The results in Table 6 show which options were selected by the evaluators. It clearly indicates that the usage of approximate reasoning is sensible. The results are, however, more difficult to interpret with regard to the different employed heuristics. Using predictive accuracy did not yield good results and, surprisingly, generalised F-Measure also had a lower percentage of cases where option 1 was selected. The other three heuristics generated very similar results. One reason is that those heuristics are all based on precision and recall, but in addition the low quality of some of the randomly selected test ontologies posed a problem. In cases of too many very severe modelling errors, e.g. conjunctions and disjunctions mixed up in an ontology or inappropriate domain and range restrictions, the quality of suggestions decreases for each of the heuristics. This is the main reason why the results for the different heuristics are very close. In particular, generalised F-Measure can show its strengths mainly for properly designed ontologies. For instance, column 2 of Table 6 shows that it missed 7% of possible improvements.
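Since three of the five heuristics are, as stated above, based on precision and recall, a small worked example helps explain why their rankings end up so close. The sketch below is an assumed, generic F-measure computed from the individuals a candidate class expression retrieves versus the instances of the target class; the concrete sets are hypothetical, and the exact scoring variants are those of Section 6.3.

def f_measure(retrieved, positives, beta=1.0):
    """F-measure style score for a candidate class expression.

    retrieved is the set of individuals the expression covers (via
    instance retrieval) and positives the instances of the target
    class; both are hypothetical inputs here.
    """
    if not retrieved or not positives:
        return 0.0
    tp = len(retrieved & positives)       # correctly covered instances
    precision = tp / len(retrieved)
    recall = tp / len(positives)
    if precision + recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# Example: the expression covers {a, b, c, d}; the class has instances {a, b, c, e}
print(f_measure({"a", "b", "c", "d"}, {"a", "b", "c", "e"}))  # 0.75

Because such scores depend only on the overlap between retrieved and actual instances, heuristics built on them tend to rank the same suggestions highly, which matches the closeness of their results in Table 6.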
42 http://protegewiki.stanford.edu/index.php/Protege_Ontology_Library
43 http://owl.cs.manchester.ac.uk/repository/