Top-K Schema Matchings - Uncertain Schema Matching


analysis of Gal et al. [2005a], where the exact matching was found, on average, at K = 7 for the combined matcher and at K > 100 for the term matcher.

Looking at the shape of the graphs, it appears that with both matchers, the improvement (in terms of precision) levels off at about t = 0.9. In general, the greater demand of a higher threshold benefits the heuristic up to a point, beyond which it becomes impossible for the top-K algorithm to keep even its stronger attribute correspondences. The “break-even point” depends to a great extent on the size of the schema, some evidence for which is given below. The term matcher has a less smooth precision curve, with some decrease in precision at t = 0.1, 0.6, and 0.7. The performance of the term matcher is therefore less predictable than that of the combined matcher, although the difference is not statistically significant.
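The thresholding heuristic discussed above can be sketched in a few lines. The sketch rests on an assumption that this passage does not spell out: that stability analysis keeps an attribute correspondence only if it appears in at least a fraction t of the top-K matchings. The function name `stable_correspondences` and the attribute names are hypothetical and serve only as illustration.

```python
from collections import Counter


def stable_correspondences(top_k_matchings, t):
    """Keep correspondences that occur in at least a fraction t of the matchings.

    top_k_matchings: list of K matchings, each a set of
        (source_attribute, target_attribute) pairs.
    t: threshold in [0, 1]; t = 1 keeps only pairs present in every matching.
    (Assumed reading of the heuristic, not a definition taken from the text.)
    """
    k = len(top_k_matchings)
    if k == 0:
        return set()
    # Count in how many of the K matchings each correspondence appears.
    counts = Counter(pair for matching in top_k_matchings for pair in set(matching))
    return {pair for pair, n in counts.items() if n >= t * k}


# Hypothetical top-3 matchings between two small schemata.
top_3 = [
    {("name", "fullName"), ("addr", "address"), ("tel", "phone")},
    {("name", "fullName"), ("addr", "address"), ("tel", "fax")},
    {("name", "fullName"), ("addr", "city"), ("tel", "phone")},
]

strict = stable_correspondences(top_3, 1.0)   # only pairs found in all three
lenient = stable_correspondences(top_3, 0.5)  # pairs found in at least two
print(sorted(strict))
print(sorted(lenient))
```

Under this reading, raising t discards more correspondences, which is consistent with precision improving up to a point and recall dropping as the threshold grows.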

Table 5.4: Stability analysis of ontology classes

                          Term                                Combined
                          Precision         Recall            Precision         Recall
Schema Class              Max Incr.  t      Max Decr.  t      Max Incr.  t      Max Decr.  t
Strongly Similar          19.8%      1.0    10.1%      1.0    28.5%      1.0    5.8%       1.0
Weakly Similar            12.1%      0.9    13.7%      1.0    23.3%      0.9    10.3%      1.0
Large                     7.6%       1.0    8.5%       1.0    16.7%      1.0    3.6%       1.0
Small                     20.7%      1.0    14.3%      1.0    29.5%      1.0    11.1%      1.0
Similar                   8.9%       0.9    ?.3%       1.0    ?.9%       0.9    7%         1.0
Dissimilar                24.1%      1.0    12.6%      1.0    32.3%      0.9    ?          1.0
Low Initial Precision     11.4%      0.9    ?.9%       1.0    ?.6%       0.9    11.6%      1.0
High Initial Precision    19.2%      1.0    7.1%       1.0    24.5%      1.0    4.7%       1.0

Table 5.4 summarizes different ways of partitioning schema pairs according to various properties. A schema pair for which 60% or more of the attributes in one schema (called the target schema) can be matched to attributes in the other is considered to be strongly similar. Of the 43 pairs, 22 were strongly similar, with similarity ranging from 60% to 94.7%. K was set to 10. The results show improvement for strongly similar pairs over the whole group in the level of precision, with a small drop in recall. The precision of the term matcher was up to 19.8% higher for strongly similar pairs (with t = 1), while for the combined matcher, it increased by 28.5% (again with t = 1). Recall decreased by 10.1% for the term matcher and by a smaller 5.8% for the combined matcher. We can conclude from this that stability analysis works better for strongly similar ontologies.
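As a small illustration of the 60% criterion, the following sketch classifies a schema pair from the number of target attributes that can be matched. The function names and the attribute counts are hypothetical; only the 60% cutoff comes from the text.

```python
def similarity_ratio(num_matchable, target_size):
    """Fraction of target-schema attributes that can be matched in the other schema."""
    if target_size <= 0:
        raise ValueError("target schema must have at least one attribute")
    return num_matchable / target_size


def is_strongly_similar(num_matchable, target_size, cutoff=0.6):
    """A pair is strongly similar if at least 60% of target attributes are matchable."""
    return similarity_ratio(num_matchable, target_size) >= cutoff


# Hypothetical pairs: (matchable target attributes, target schema size).
pairs = {"pair A": (18, 19), "pair B": (13, 20), "pair C": (11, 20)}
for name, (matchable, size) in pairs.items():
    ratio = similarity_ratio(matchable, size)
    label = "strongly similar" if is_strongly_similar(matchable, size) else "weakly similar"
    print(f"{name}: {ratio:.1%} -> {label}")
```

The label "weakly similar" for pairs below the cutoff follows the row names of Table 5.4; the text itself defines only the strongly similar class.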

A schema was defined to be large in this experiment if it had more than 20 attributes. There were 18 large target schemata. The results show less improvement in precision for larger schemata, but smaller decreases in recall. Precision increased by up to 7.6% for the term matcher and 16.7% for the combined matcher (with t = 1 in both cases). As for recall, it was lower by 8.5% for the term matcher and by 3.6% for the combined matcher. The smaller gain in precision and the smaller reduction in recall for larger schemata can be explained by the smaller marginal impact of a single attribute on the matcher's overall performance. Generally speaking, however, it seems that stability analysis is better suited for smaller schemata.
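The marginal-impact argument can be made concrete with a little arithmetic. The sketch below uses the standard definition of precision (correct correspondences divided by returned correspondences) and compares the effect of discarding one incorrect correspondence in a small and in a large matching. All counts are hypothetical and chosen only so that both matchings start at the same precision.

```python
def precision(num_correct, num_returned):
    """Share of returned correspondences that are correct."""
    return num_correct / num_returned if num_returned else 0.0


def gain_from_dropping_one_wrong(num_correct, num_returned):
    """Precision gain when a single incorrect correspondence is discarded."""
    if num_returned - num_correct < 1:
        raise ValueError("no incorrect correspondence to drop")
    before = precision(num_correct, num_returned)
    after = precision(num_correct, num_returned - 1)
    return after - before


# Hypothetical matchings, both starting at precision 0.8.
small_gain = gain_from_dropping_one_wrong(8, 10)    # 10-attribute matching
large_gain = gain_from_dropping_one_wrong(32, 40)   # 40-attribute matching
print(f"small schema gain: {small_gain:.3f}")
print(f"large schema gain: {large_gain:.3f}")
```

With these counts, removing one wrong correspondence moves precision several times further in the small matching than in the large one, which is the effect the paragraph attributes to schema size.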

Schema pairs were considered similar if the number of attributes in the two schemata differed

by less than 30% of the target schema (there were 23 such pairs). The results show less improvement
