Towards Large-Scale Schema and Ontology Matching - Schema Matching and Mapping

Databases Reference

In-Depth Information

3.1.2

Partition-Based Matching

Partition-based matching is a divide-and-conquer strategy to first partition the input

schemas/ontologies and then perform a partition-wise matching. The idea is to per-

form partitioning in such a way that every partition of the first schema has to be

matched with only a subset of the partitions (ideally, only with one partition) of

the second schema. This results in a significant reduction of the search space and

thus improved efficiency. Furthermore, matching the smaller partitions reduces the

memory requirements compared to matching the full schemas. To further improve

performance, the partition-based match tasks may be performed in parallel.

There are many possible ways to perform partitioning, and finding the most effec-

tive approaches is still an open research problem. COMA

was one of the first

systems to support partition-based schema matching by a so-called fragment match-

ing ( Aumueller et al. 2005 ; Do and Rahm 2007 ). Fragment matching works in two

phases. In the first phase, the fragments of a specified type (e.g., user-specified frag-

ments or subschemas such as relational tables or message formats in large XML

schemas) are determined and compared with each other to identify the most similar

fragments from the other schema worth to be fully matched later. The search for

similar fragments is some kind of light-weight matching, e.g., based on the sim-

ilarity of the fragment roots. In the second phase, each pair of similar fragments

is independently matched to identify correspondences between their elements. The

fragment-based match results are finally merged to obtain the complete output map-

ping. In the evaluation for large XML schemas in ( Do and Rahm 2007 ), fragment

matching not only improved execution times significantly but also led to a slight

improvement of match quality.

The ontology matching system Falcon-AO also supports partition-based match-

ing to reduce the search space ( Hu et al. 2008 ). The approach is similar to fragment

matching but uses a structural clustering to initially partition the ontologies into rel-

atively small, disjoint blocks. Matching is then restricted to the most similar blocks

from the two ontologies. To determine the block similarity, Falcon-AO utilizes the

so-called anchors . Anchors are highly similar element pairs that are determined

before partitioning by a combined name/comment matcher. Figure 1.3 illustrates

the idea to limit matching to pairs of similar partitions sharing at least one anchor.

In the shown example, only partitions of the same color are matched with each other

(e.g., B S2 with B T2 ), while partitions without shared anchor

CC

are excluded from

matching. An extension of the Falcon approach has been proposed for the Ta x o m a p

system ( Hamdi et al. 2009 ). They first partition only one of the ontologies like in

Falcon and then try to partition the second ontology accordingly. In particular, it is

tried to achieve that the anchors per partition can be localized in few partitions of

the other ontology to reduce the number of pairs to be matched.

The taxonomy matching system AnchorFlood implements a dynamic partition-

based matching that avoids the a-priori partitioning of the ontologies ( Hanif and

Aono 2009 ). It also utilizes anchors (similar concept pairs) but takes them as a start-

ing point to incrementally match elements in their structural neighborhood until no

further matches are found or all elements are processed. Thus the partitions (called

.

B T3 /

Schema Matching and Mapping

Search WWH ::

Custom Search

Home