analysis of Gal et al. [2005a], where the exact matching was found, on average, at K = 7 for the
combined matcher and at K > 100 for the term matcher.
Looking at the shape of the graphs, it appears that with both matchers, the improvement (in
terms of precision) levels off at about t = 0.9. In general, the stricter requirement imposed by a
higher threshold benefits the heuristic up to a point, beyond which the top-K algorithm can no
longer retain even its stronger attribute correspondences. This "break-even point" depends to a
great extent on the size of the schema, some evidence for which is given below. The term matcher
has a more erratic precision curve, with some decrease in precision at t = 0.1, 0.6, and 0.7.
Therefore, the performance of the term matcher is less predictable than that of the combined
matcher, although the difference is not statistically significant.
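The trade-off described above can be illustrated with a minimal sketch. It assumes that the heuristic keeps an attribute correspondence only if it appears in at least a fraction t of the top-K matchings; that reading, the function names, and the toy data are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def stable_correspondences(top_k_matchings, t):
    """Keep correspondences that appear in at least a fraction t of the matchings.

    top_k_matchings: list of sets of (source_attr, target_attr) pairs (assumed format).
    """
    k = len(top_k_matchings)
    if k == 0:
        return set()
    counts = Counter(c for matching in top_k_matchings for c in set(matching))
    return {c for c, n in counts.items() if n / k >= t}

def precision_recall(found, exact):
    """Precision and recall of a set of correspondences against the exact matching."""
    true_positives = len(found & exact)
    # Convention chosen here: an empty result has precision 0.0.
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(exact) if exact else 0.0
    return precision, recall

# Hypothetical toy data: three matchings over made-up attribute names.
top_k = [
    {("a", "x"), ("b", "y")},
    {("a", "x"), ("b", "z")},
    {("a", "x"), ("b", "y")},
]
exact = {("a", "x"), ("b", "y"), ("c", "w")}

for t in (0.3, 0.6, 1.0):
    kept = stable_correspondences(top_k, t)
    print(t, sorted(kept), precision_recall(kept, exact))
```

In this toy case, raising t removes the unstable pair first, which raises precision, and then removes a correct pair as well, which lowers recall.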
Table 5.4: Stability analysis of ontology classes

                          Term                                  Combined
                          Precision         Recall              Precision         Recall
Schema Class              Max Incr.   t     Max Decr.   t       Max Incr.   t     Max Decr.   t
Strongly Similar          19.8%       1.0   10.1%       1.0     28.5%       1.0   5.8%        1.0
Weakly Similar            12.1%       0.9   13.7%       1.0     23.3%       0.9   10.3%       1.0
Large                     7.6%        1.0   8.5%        1.0     16.7%       1.0   3.6%        1.0
Small                     20.7%       1.0   14.3%       1.0     29.5%       1.0   11.1%       1.0
Similar                   8.9%        0.9   [?].3%      1.0     [?].9%      0.9   7%          1.0
Dissimilar                24.1%       1.0   12.6%       1.0     32.3%       0.9   9%          1.0
Low Initial Precision     11.4%       0.9   [?].9%      1.0     [?].6%      0.9   11.6%       1.0
High Initial Precision    19.2%       1.0   7.1%        1.0     24.5%       1.0   4.7%        1.0

Table 5.4 summarizes different ways of partitioning schema pairs according to various properties.
A schema pair for which 60% or more of the attributes in one schema (called the target
schema) can be matched to attributes in the other is considered to be strongly similar. Of the 43
pairs, 22 were strongly similar, with similarity ranging from 60% to 94.7%. K was set to 10. The
results show improvement for strongly similar pairs over the whole group in the level of precision,
with a small drop in recall. The precision of the term matcher was up to 19.8% higher for strongly
similar pairs (with t = 1), while for the combined matcher, it increased by 28.5% (again with t = 1).
Recall decreased by 10.1% for the term matcher and by a smaller 5.8% for the combined matcher.
We can conclude from this that stability analysis works better for strongly similar ontologies.
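The 60% criterion can be expressed as a short check. The sketch below assumes that the exact matching is available as a set of (source attribute, target attribute) pairs; the function names and the attribute names in the example are hypothetical.

```python
STRONG_SIMILARITY_CUTOFF = 0.6  # "60% or more" of the target attributes, per the text

def matchable_fraction(target_attributes, exact_matching):
    """Fraction of target attributes that have a counterpart in the other schema.

    exact_matching: set of (source_attr, target_attr) pairs (assumed representation).
    """
    targets = set(target_attributes)
    if not targets:
        raise ValueError("target schema has no attributes")
    matched = {target for _, target in exact_matching} & targets
    return len(matched) / len(targets)

def is_strongly_similar(target_attributes, exact_matching):
    """True when at least 60% of the target attributes can be matched."""
    return matchable_fraction(target_attributes, exact_matching) >= STRONG_SIMILARITY_CUTOFF

# Hypothetical target schema with five attributes, four of them matched.
target = ["name", "email", "phone", "zip", "fax"]
exact = {("fullName", "name"), ("mail", "email"), ("tel", "phone"), ("postcode", "zip")}
print(matchable_fraction(target, exact), is_strongly_similar(target, exact))
```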
A schema was defined to be large in this experiment if it had more than 20 attributes. There
were 18 large target schemata. The results show less improvement in precision for larger schemata,
but smaller decreases in recall. Precision increased by up to 7.6% for the term matcher and 16.7%
for the combined matcher (with t = 1 in both cases). As for recall, it was lower by 8.5% for the
term matcher and by 3.6% for the combined matcher. The smaller gain in precision and the smaller
reduction in recall for larger schemata can be explained by the smaller marginal impact of a single
attribute on the matcher's overall performance. Generally speaking, however, it seems that stability
analysis is better suited for smaller schemata.
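The marginal-impact argument can be made concrete with a small calculation. The sketch below compares the precision gained by dropping one incorrect correspondence from a small result and from a large one; the counts are invented for illustration and are not taken from the experiment.

```python
LARGE_SCHEMA_CUTOFF = 20  # "more than 20 attributes", per the text

def is_large(attribute_count):
    """True when a schema counts as large under the definition above."""
    return attribute_count > LARGE_SCHEMA_CUTOFF

def precision(correct, returned):
    """Fraction of returned correspondences that are correct."""
    return correct / returned

def gain_from_dropping_one_error(correct, returned):
    """Precision gained when one incorrect correspondence is removed."""
    if returned < 2 or correct >= returned:
        raise ValueError("need at least one incorrect correspondence to drop")
    return precision(correct, returned - 1) - precision(correct, returned)

# Hypothetical counts: both results start at a precision of 0.75.
small_gain = gain_from_dropping_one_error(correct=6, returned=8)
large_gain = gain_from_dropping_one_error(correct=30, returned=40)
print(round(small_gain, 3), round(large_gain, 3))
```

With the same starting precision, removing a single error moves the small result considerably more than the large one, which is consistent with the explanation given above.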
Schema pairs were considered similar if the number of attributes in the two schemata differed
by less than 30% of the size of the target schema (there were 23 such pairs). The results show less improvement
 