Databases Reference
In-Depth Information
3.1.2
Partition-Based Matching
Partition-based matching is a divide-and-conquer strategy to first partition the input
schemas/ontologies and then perform a partition-wise matching. The idea is to per-
form partitioning in such a way that every partition of the first schema has to be
matched with only a subset of the partitions (ideally, only with one partition) of
the second schema. This results in a significant reduction of the search space and
thus improved efficiency. Furthermore, matching the smaller partitions reduces the
memory requirements compared to matching the full schemas. To further improve
performance, the partition-based match tasks may be performed in parallel.
There are many possible ways to perform partitioning, and finding the most effec-
tive approaches is still an open research problem. COMA
was one of the first
systems to support partition-based schema matching by a so-called fragment match-
ing ( Aumueller et al. 2005 ; Do and Rahm 2007 ). Fragment matching works in two
phases. In the first phase, the fragments of a specified type (e.g., user-specified frag-
ments or subschemas such as relational tables or message formats in large XML
schemas) are determined and compared with each other to identify the most similar
fragments from the other schema worth to be fully matched later. The search for
similar fragments is some kind of light-weight matching, e.g., based on the sim-
ilarity of the fragment roots. In the second phase, each pair of similar fragments
is independently matched to identify correspondences between their elements. The
fragment-based match results are finally merged to obtain the complete output map-
ping. In the evaluation for large XML schemas in ( Do and Rahm 2007 ), fragment
matching not only improved execution times significantly but also led to a slight
improvement of match quality.
The ontology matching system Falcon-AO also supports partition-based match-
ing to reduce the search space ( Hu et al. 2008 ). The approach is similar to fragment
matching but uses a structural clustering to initially partition the ontologies into rel-
atively small, disjoint blocks. Matching is then restricted to the most similar blocks
from the two ontologies. To determine the block similarity, Falcon-AO utilizes the
so-called anchors . Anchors are highly similar element pairs that are determined
before partitioning by a combined name/comment matcher. Figure 1.3 illustrates
the idea to limit matching to pairs of similar partitions sharing at least one anchor.
In the shown example, only partitions of the same color are matched with each other
(e.g., B S2 with B T2 ), while partitions without shared anchor
CC
are excluded from
matching. An extension of the Falcon approach has been proposed for the Ta x o m a p
system ( Hamdi et al. 2009 ). They first partition only one of the ontologies like in
Falcon and then try to partition the second ontology accordingly. In particular, it is
tried to achieve that the anchors per partition can be localized in few partitions of
the other ontology to reduce the number of pairs to be matched.
The taxonomy matching system AnchorFlood implements a dynamic partition-
based matching that avoids the a-priori partitioning of the ontologies ( Hanif and
Aono 2009 ). It also utilizes anchors (similar concept pairs) but takes them as a start-
ing point to incrementally match elements in their structural neighborhood until no
further matches are found or all elements are processed. Thus the partitions (called
.
B T3 /
Search WWH ::




Custom Search