Intra-matcher parallelization is more versatile and deals with the internal parallelization of matchers, typically based on a partitioning of the schemas or ontologies to be matched. Partitioning leads to many smaller match tasks that can be executed in parallel with reduced memory requirements per task. By choosing appropriate partition sizes, the approach becomes very flexible and scalable. Furthermore, intra-matcher parallelism can be applied to sequential as well as independently executable matchers, i.e., it can also be combined with inter-matcher parallelism.
The partition-based matching discussed in Sect. 3.1 inherently supports intra-matcher parallelization as well as a reduction of the search space by limiting matching to pairs of similar partitions. However, intra-matcher parallelization can also be applied without a reduced search space by matching all partition pairs, i.e., by evaluating the Cartesian product in parallel. As discussed in Gross et al. (2010), such a simple, generic parallelization is applicable to virtually all element-level matchers (e.g., name matching) but can also be adapted for structural matching. In this case, one can also choose a very simple, size-based partitioning (same number of elements per partition) that supports good load balancing.
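
The following minimal sketch illustrates this kind of generic intra-matcher parallelization (the element names, partition size, and threshold are illustrative assumptions, and a simple trigram-based name matcher stands in for an arbitrary element-level matcher): the input element lists are split into equally sized partitions, and all partition pairs, i.e., the Cartesian product, are matched as independent parallel tasks.

from concurrent.futures import ProcessPoolExecutor
from itertools import product


def partition(elements, size):
    """Size-based partitioning: split an element list into chunks of equal size."""
    return [elements[i:i + size] for i in range(0, len(elements), size)]


def trigrams(s):
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)} or {s}


def name_sim(a, b):
    """Dice similarity over character trigrams, a typical element-level name matcher."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def match_partition_pair(task):
    """One independent match task: compare all element pairs of two partitions."""
    part_s, part_t, threshold = task
    correspondences = []
    for s, t in product(part_s, part_t):
        sim = name_sim(s, t)
        if sim >= threshold:
            correspondences.append((s, t, sim))
    return correspondences


def parallel_match(source, target, part_size=500, threshold=0.8, workers=4):
    """Evaluate the Cartesian product of all partition pairs in parallel."""
    tasks = [(ps, pt, threshold)
             for ps in partition(source, part_size)
             for pt in partition(target, part_size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(match_partition_pair, tasks)
    return [c for chunk in results for c in chunk]


if __name__ == "__main__":
    src = ["orderDate", "customerName", "shipAddress"]
    tgt = ["order_date", "client_name", "shipping_address"]
    print(parallel_match(src, tgt, part_size=2, threshold=0.3))

Each task only needs the two partitions it compares, which keeps the memory requirements per task low, and the equal partition sizes yield similarly sized tasks and hence good load balancing.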
3.3 Self-Tuning Match Workflows
The match workflows in most current systems need to be manually defined and configured. This concerns the choice of matchers to be applied as well as the specification of the methods used to combine matcher results and to finally select match correspondences. Obviously, these decisions have a significant impact on both effectiveness and efficiency and are thus especially critical for large-scale match tasks. Unfortunately, the huge number of possible configurations makes it very difficult even for expert users to define suitable match workflows. Hence, the adoption of semi-automatic tuning approaches becomes increasingly necessary and should especially consider the challenges of matching large schemas.
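
To make this configuration space concrete, the following minimal sketch (all matcher names, weights, and values are invented for illustration and do not refer to a particular system) lists the typical knobs of such a workflow: the selection of matchers, the aggregation of their similarity values, and the selection of the final correspondences.

from dataclasses import dataclass, field


@dataclass
class MatchConfig:
    # Matcher selection: which matchers participate in the workflow.
    matchers: tuple = ("name", "structure")
    # Aggregation: how per-matcher similarity values are combined.
    aggregation: str = "weighted_average"
    weights: dict = field(default_factory=lambda: {"name": 0.6, "structure": 0.4})
    # Selection: how final correspondences are chosen from the combined values.
    selection: str = "threshold"
    threshold: float = 0.75


def aggregate(sim_by_matcher, cfg):
    """Weighted average of the per-matcher similarity values for one element pair."""
    return sum(cfg.weights[m] * sim_by_matcher[m] for m in cfg.matchers)

Even this small configuration already spans a large search space once several matchers, aggregation functions, selection strategies, and threshold values are considered.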
The companion chapter (Bellahsene and Duchateau 2011) provides an overview of recent approaches, including tuning frameworks such as Apfel and eTuner (Ehrig et al. 2005; Lee et al. 2007). Most previous approaches for automatic tuning apply supervised machine learning methods. They use previously solved match tasks as training data to find effective choices for matcher selection and parameter settings such as similarity thresholds and weights to aggregate similarity values, e.g., Duchateau et al. (2009). A key problem of such approaches is the difficulty of collecting sufficient training data, which may itself incur a substantial effort. A further problem is that, even within a domain, a configuration that is successful for one match problem does not guarantee sufficient match quality for other problems, especially when matching large schemas. Therefore, one would need methods to preselect suitable and sufficient training correspondences for a given match task, which is an open challenge.
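
As a rough illustration of the basic idea, and not of the actual algorithms of eTuner or Apfel, the following sketch performs a simple grid search over an aggregation weight and a similarity threshold, using the correspondences of a previously solved match task as training data and F-measure as the tuning objective; the function names, the two-matcher setup, and the grids are assumptions.

from itertools import product


def f_measure(predicted, reference):
    """Harmonic mean of precision and recall w.r.t. the training correspondences."""
    tp = len(predicted & reference)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


def tune(pair_sims, reference, weight_grid, threshold_grid):
    """pair_sims: {(src, tgt): (name_sim, structure_sim)} from a solved match task."""
    best_score, best_cfg = 0.0, None
    for w, th in product(weight_grid, threshold_grid):
        predicted = {pair for pair, (s1, s2) in pair_sims.items()
                     if w * s1 + (1 - w) * s2 >= th}
        score = f_measure(predicted, reference)
        if score > best_score:
            best_score, best_cfg = score, (w, th)
    return best_cfg, best_score

In practice, the search space would also cover matcher selection, and, as noted above, a configuration tuned on one match problem may not carry over to other problems even within the same domain.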
Tan and Lambrix (2007) propose an alternative approach that recommends a promising match strategy for a given match problem. They first select a limited