Intra-matcher parallelization is more versatile and deals with the internal parallelization of matchers, typically based on a partitioning of the schemas or ontologies to be matched. Partitioning leads to many smaller match tasks that can be executed in parallel with reduced memory requirements per task. By choosing appropriate partition sizes, the approach becomes very flexible and scalable. Furthermore, intra-matcher parallelism can be applied to sequential as well as independently executable matchers, i.e., it can also be combined with inter-matcher parallelism.
The partition-based matching discussed in Sect. 3.1 inherently supports intra-matcher parallelization as well as a reduction of the search space by limiting matching to pairs of similar partitions. However, intra-matcher parallelization can also be applied without a reduced search space by matching all partition pairs, i.e., by evaluating the Cartesian product in parallel. As discussed in Gross et al. (2010), such a simple, generic parallelization is applicable to virtually all element-level matchers (e.g., name matching) but can also be adapted for structural matching. In this case, one can also choose a very simple, size-based partitioning (same number of elements per partition) that supports good load balancing.
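
The following minimal sketch illustrates this kind of generic intra-matcher parallelization (the element names, partition size, and threshold are illustrative assumptions, and a simple trigram-based name matcher stands in for an arbitrary element-level matcher): the input element lists are split into equally sized partitions, and all partition pairs, i.e., the Cartesian product, are matched as independent parallel tasks.

from concurrent.futures import ProcessPoolExecutor
from itertools import product


def partition(elements, size):
    """Size-based partitioning: split an element list into chunks of equal size."""
    return [elements[i:i + size] for i in range(0, len(elements), size)]


def trigrams(s):
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)} or {s}


def name_sim(a, b):
    """Dice similarity over character trigrams, a typical element-level name matcher."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def match_partition_pair(task):
    """One independent match task: compare all element pairs of two partitions."""
    part_s, part_t, threshold = task
    correspondences = []
    for s, t in product(part_s, part_t):
        sim = name_sim(s, t)
        if sim >= threshold:
            correspondences.append((s, t, sim))
    return correspondences


def parallel_match(source, target, part_size=500, threshold=0.8, workers=4):
    """Evaluate the Cartesian product of all partition pairs in parallel."""
    tasks = [(ps, pt, threshold)
             for ps in partition(source, part_size)
             for pt in partition(target, part_size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(match_partition_pair, tasks)
    return [c for chunk in results for c in chunk]


if __name__ == "__main__":
    src = ["orderDate", "customerName", "shipAddress"]
    tgt = ["order_date", "client_name", "shipping_address"]
    print(parallel_match(src, tgt, part_size=2, threshold=0.3))

Each task only needs the two partitions it compares, which keeps the memory requirements per task low, and the equal partition sizes yield similarly sized tasks and hence good load balancing.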
3.3 Self-Tuning Match Workflows
The match workflows in most current systems need to be manually defined and configured. This concerns the choice of matchers to be applied as well as the specification of the methods used to combine matcher results and to finally select match correspondences. Obviously, these decisions have a significant impact on both effectiveness and efficiency and are thus especially critical for large-scale match tasks. Unfortunately, the huge number of possible configurations makes it very difficult even for expert users to define suitable match workflows. Hence, the adoption of semi-automatic tuning approaches becomes increasingly necessary and should especially consider the challenges of matching large schemas.
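
To make this configuration space concrete, the following minimal sketch (all matcher names, weights, and values are invented for illustration and do not refer to a particular system) lists the typical knobs of such a workflow: the selection of matchers, the aggregation of their similarity values, and the selection of the final correspondences.

from dataclasses import dataclass, field


@dataclass
class MatchConfig:
    # Matcher selection: which matchers participate in the workflow.
    matchers: tuple = ("name", "structure")
    # Aggregation: how per-matcher similarity values are combined.
    aggregation: str = "weighted_average"
    weights: dict = field(default_factory=lambda: {"name": 0.6, "structure": 0.4})
    # Selection: how final correspondences are chosen from the combined values.
    selection: str = "threshold"
    threshold: float = 0.75


def aggregate(sim_by_matcher, cfg):
    """Weighted average of the per-matcher similarity values for one element pair."""
    return sum(cfg.weights[m] * sim_by_matcher[m] for m in cfg.matchers)

Even this small configuration already spans a large search space once several matchers, aggregation functions, selection strategies, and threshold values are considered.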
The companion chapter (Bellahsene and Duchateau 2011) provides an overview of recent approaches, including tuning frameworks such as Apfel and eTuner (Ehrig et al. 2005; Lee et al. 2007). Most previous approaches for automatic tuning apply supervised machine learning methods. They use previously solved match tasks as training data to find effective choices for matcher selection and parameter settings such as similarity thresholds and weights to aggregate similarity values, e.g., Duchateau et al. (2009). A key problem of such approaches is the difficulty of collecting sufficient training data, which may itself incur a substantial effort. A further problem is that, even within a domain, a configuration that is successful for one match problem does not guarantee sufficient match quality for other problems, especially when matching large schemas. Therefore, one would need methods to preselect suitable and sufficient training correspondences for a given match task, which is an open challenge.
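
As a rough illustration of the basic idea, and not of the actual algorithms of eTuner or Apfel, the following sketch performs a simple grid search over an aggregation weight and a similarity threshold, using the correspondences of a previously solved match task as training data and F-measure as the tuning objective; the function names, the two-matcher setup, and the grids are assumptions.

from itertools import product


def f_measure(predicted, reference):
    """Harmonic mean of precision and recall w.r.t. the training correspondences."""
    tp = len(predicted & reference)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


def tune(pair_sims, reference, weight_grid, threshold_grid):
    """pair_sims: {(src, tgt): (name_sim, structure_sim)} from a solved match task."""
    best_score, best_cfg = 0.0, None
    for w, th in product(weight_grid, threshold_grid):
        predicted = {pair for pair, (s1, s2) in pair_sims.items()
                     if w * s1 + (1 - w) * s2 >= th}
        score = f_measure(predicted, reference)
        if score > best_score:
            best_score, best_cfg = score, (w, th)
    return best_cfg, best_score

In practice, the search space would also cover matcher selection, and, as noted above, a configuration tuned on one match problem may not carry over to other problems even within the same domain.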
Tan and Lambrix (2007) propose an alternative approach that recommends a promising match strategy for a given match problem. They first select a limited