Databases Reference
In-Depth Information
3.1
Reduction of Search Space
The standard approach for pairwise schema matching is to compare every element
of the first schema with every element with the second schema to determine match-
ing schema elements, i.e., evaluation of the cross join. Such an approach has at least
a quadratic complexity with respect to schema size and does not scale well. There
are not only efficiency problems for large schemas but the large search space makes
it also very difficult to correctly identify matching element pairs. Hence, it is impor-
tant for large match tasks to reduce the search space in order to improve at least
efficiency and, potentially, match quality.
To reduce the search space for matching, we can adopt similar approaches as in
the area of entity resolution (or object matching), where the number of objects and
thus the search space is typically much larger than for schema matching. The initial
step to reduce the search space for entity matching has been called blocking, and
there exist numerous approaches for this task, e.g., based on clustering on selected
object attributes ( Elmagarmid et al. 2007 ).
For schema and ontology matching, two main types of approaches have been
considered to reduce the search space that we discuss in the following:
Early pruning of dissimilar element pairs
Partition-based matching
3.1.1
Early Pruning of Dissimilar Element Pairs
The idea is to limit the evaluation of the cartesian product to at most a few initial
steps in the match workflow, e.g., one matcher, and to eliminate all element pairs
with very low similarity from further processing since they are very unlikely to
match. This idea is especially suitable for workflows with sequential matchers where
the first matcher can evaluate the cartesian product, but all highly dissimilar element
pairs are excluded in the evaluation of subsequent matchers and the combination of
match results.
Quick ontology matching (QOM) was one of the first approaches to implement
this idea ( Ehrig and Staab 2004 ). It iteratively applies a sequence of matchers and
can restrict the search space for every matcher. The considered approaches to restrict
the search space include focusing on elements with similar names (labels) or similar
structural properties. The authors showed that the runtime complexity of QOM can
be reduced to
.n 2 /
.
Peukert et al. ( 2010a ) propose the use of filter operators within match work-
flows to prune dissimilar element pairs (whose similarity is below some minimal
threshold) from intermediate match results. The threshold is either statically pre-
determined or dynamically derived from the similarity threshold used in the match
workflow to finally select match correspondences. Peukert et al. ( 2010a )alsopro-
pose a rule-based approach to rewrite match workflows for improved efficiency,
particularly to place filter operators within sequences of matchers.
.
O
.n
log
.n//
instead of O
for ontologies of size
n
Search WWH ::




Custom Search