Towards Large-Scale Schema and Ontology Matching - Schema Matching and Mapping

Databases Reference

In-Depth Information

3.1

Reduction of Search Space

The standard approach for pairwise schema matching is to compare every element

of the first schema with every element with the second schema to determine match-

ing schema elements, i.e., evaluation of the cross join. Such an approach has at least

a quadratic complexity with respect to schema size and does not scale well. There

are not only efficiency problems for large schemas but the large search space makes

it also very difficult to correctly identify matching element pairs. Hence, it is impor-

tant for large match tasks to reduce the search space in order to improve at least

efficiency and, potentially, match quality.

To reduce the search space for matching, we can adopt similar approaches as in

the area of entity resolution (or object matching), where the number of objects and

thus the search space is typically much larger than for schema matching. The initial

step to reduce the search space for entity matching has been called blocking, and

there exist numerous approaches for this task, e.g., based on clustering on selected

object attributes ( Elmagarmid et al. 2007 ).

For schema and ontology matching, two main types of approaches have been

considered to reduce the search space that we discuss in the following:

Early pruning of dissimilar element pairs

Partition-based matching

3.1.1

Early Pruning of Dissimilar Element Pairs

The idea is to limit the evaluation of the cartesian product to at most a few initial

steps in the match workflow, e.g., one matcher, and to eliminate all element pairs

with very low similarity from further processing since they are very unlikely to

match. This idea is especially suitable for workflows with sequential matchers where

the first matcher can evaluate the cartesian product, but all highly dissimilar element

pairs are excluded in the evaluation of subsequent matchers and the combination of

match results.

Quick ontology matching (QOM) was one of the first approaches to implement

this idea ( Ehrig and Staab 2004 ). It iteratively applies a sequence of matchers and

can restrict the search space for every matcher. The considered approaches to restrict

the search space include focusing on elements with similar names (labels) or similar

structural properties. The authors showed that the runtime complexity of QOM can

be reduced to

.n 2 /

.

Peukert et al. ( 2010a ) propose the use of filter operators within match work-

flows to prune dissimilar element pairs (whose similarity is below some minimal

threshold) from intermediate match results. The threshold is either statically pre-

determined or dynamically derived from the similarity threshold used in the match

workflow to finally select match correspondences. Peukert et al. ( 2010a )alsopro-

pose a rule-based approach to rewrite match workflows for improved efficiency,

particularly to place filter operators within sequences of matchers.

.

O

.n

log

.n//

instead of O

for ontologies of size

n

Schema Matching and Mapping

Search WWH ::

Custom Search

Home