6.6 Conclusion
This last section underlines that selecting an appropriate schema matching tool is
the first issue to consider. A few works have been proposed in this domain, which
is recognized as one of the ten matching challenges for the next decade
[Shvaiko and Euzenat 2008]. If we exclude the AHP approach, for which no
experiment is provided, the remaining tools are all based on machine learning
techniques. This is an interesting feature, since more datasets with correct
correspondences are becoming available. However, discovering the features of a
dataset in order to determine the most appropriate tool could be a challenging task.
7 Conclusion
In this chapter, we have provided an overview of what has been done to tune
schema matchers. At first, schema matchers enabled users to configure some of
their low-level parameters (e.g., thresholds). These parameters mainly allow users
to filter or select the output (the set of mappings). The next step deals with
parameters for combining similarity measures. These add more flexibility, and the
set of discovered mappings depends on their configuration. More recently, some
works went one level further by selecting the appropriate matcher for a given
matching task. These tools lessen the burden on the user by automatically tuning
most of the low-level parameters.
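The two lowest tuning levels described above, a threshold that filters the output and weights that combine similarity measures, can be illustrated with a minimal sketch. The two measures, the function names, the default weights, and the default threshold are all hypothetical choices made for this example; they are not taken from any of the tools surveyed in this chapter.

```python
from difflib import SequenceMatcher


def edit_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], computed with difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the underscore-separated tokens of two names."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def match(source, target, weights=(0.5, 0.5), threshold=0.6):
    """Return (source, target, score) triples whose combined score
    reaches the threshold. Both `weights` and `threshold` are the
    tunable parameters; the defaults are arbitrary illustrations."""
    w_edit, w_token = weights
    mappings = []
    for s in source:
        for t in target:
            score = w_edit * edit_similarity(s, t) + w_token * token_similarity(s, t)
            if score >= threshold:
                mappings.append((s, t, round(score, 3)))
    return mappings
```

Raising `threshold` can only shrink the returned set of mappings, whereas changing `weights` alters the scores themselves and can therefore change which mappings are discovered at all.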
Meanwhile, much effort has also been spent on integrating user preferences and
input data parameters. Most of these approaches are based on machine learning
techniques, so that schema instances or expert feedback can be used in the process.
The integration of such parameters is often an extra means of improving matching
quality. User preferences such as the promotion of precision or recall let users
choose how they intend to manage post-match effort. These options are also
interesting in highly dynamic contexts where data sources evolve quickly, in which
case high precision is preferred. Conversely, recall can be promoted when data
sources are going to be fully integrated and manually checked.
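One common way to encode a preference for precision or recall is the F-beta measure, where beta < 1 weights precision more heavily and beta > 1 weights recall more heavily. The sketch below assumes that a reference set of correct mappings is available and uses it to pick a threshold. The chapter does not state which of the surveyed tools, if any, use this measure, and the function names and candidate thresholds are hypothetical.

```python
def precision_recall(found: set, reference: set) -> tuple[float, float]:
    """Precision and recall of the found mappings against a reference set."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: beta < 1 favours precision, beta > 1 favours recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def pick_threshold(scored: dict, reference: set, beta: float,
                   candidates=(0.3, 0.5, 0.7, 0.9)) -> float:
    """Return the candidate threshold with the highest F-beta.

    `scored` maps (source, target) pairs to similarity scores; the
    candidate thresholds are arbitrary values chosen for illustration.
    Ties are resolved in favour of the first candidate listed.
    """
    def quality(threshold: float) -> float:
        found = {pair for pair, score in scored.items() if score >= threshold}
        return f_beta(*precision_recall(found, reference), beta)

    return max(candidates, key=quality)
```

With a small beta the selected threshold tends to be high, leaving few but reliable mappings to validate; with a large beta it tends to be low, leaving more candidates for manual checking.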
Although a default configuration should still be proposed with a matcher, we
believe that we are heading towards a specific configuration of a schema matcher
for each matching task. Namely, various properties of the matching scenario can
be computed by the tool, which can then deduce the best configuration from
previous experiments or from the property values. Visual tools also have a strong
impact on the manual post-match effort. By displaying the results of different
matching strategies, they give the user sufficient information to check and
(in)validate the mappings. Combined with user preferences, these tools would
clearly reduce manual post-match effort. To the best of our knowledge, no work
currently studies the impact of tuning (the pre-match effort) on matching quality
(and hence on post-match effort). How to balance effort among the parameters that
have a significant impact on matching quality for a given matching task deserves
further investigation.