Tuning for Schema Matching - Schema Matching and Mapping


6.6 Conclusion

This last section underlines the fact that selecting an appropriate schema matching

tool is the first issue to be considered. A few works have been proposed in this

domain, which is recognized as one of the ten matching challenges for the next

decade [Shvaiko and Euzenat 2008]. If we exclude the AHP approach, for which

no experiment is provided, the remaining tools are all based on machine learning

techniques. This is an interesting feature since more datasets with correct

correspondences are becoming available. However, discovering the features of a dataset

to determine the most appropriate tool could be a challenging task.

7 Conclusion

In this chapter, we have provided an overview of what has been done for tuning

schema matchers. At first, schema matchers enabled users to configure some of

their low-level parameters (e.g., thresholds). These mainly serve to filter or select

the output (the set of mappings). The next step deals with parameters for combining

similarity measures. These add flexibility, and the set of discovered mappings

depends on the configuration of these parameters. More recently, some works have gone

one level further by selecting the appropriate matcher for a given matching task.

These tools lessen the burden on the user by automatically tuning most of the

low-level parameters.
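The two lowest tuning levels mentioned above can be illustrated with a minimal sketch. It is not taken from any of the surveyed matchers: the two similarity measures, the weights, the threshold values, and the sample schemas are all hypothetical and chosen only to show how the weights shape the scores and how the threshold filters the output.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """String similarity of two element names in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def type_similarity(a: str, b: str) -> float:
    """Toy data-type measure: 1.0 when the declared types are equal, else 0.0."""
    return 1.0 if a == b else 0.0


def match(source, target, weights=(0.7, 0.3), threshold=0.6):
    """Return (source_name, target_name, score) triples with score >= threshold.

    source, target: lists of (element_name, data_type) pairs.
    weights: hypothetical weights for the name and type measures.
    threshold: hypothetical cut-off applied to the combined score.
    """
    w_name, w_type = weights
    mappings = []
    for s_name, s_type in source:
        for t_name, t_type in target:
            # Weighted linear combination of the two similarity measures.
            score = (w_name * name_similarity(s_name, t_name)
                     + w_type * type_similarity(s_type, t_type))
            if score >= threshold:
                mappings.append((s_name, t_name, round(score, 3)))
    return mappings


# Hypothetical schemas used only for illustration.
source = [("customer_name", "string"), ("zip", "string")]
target = [("customerName", "string"), ("postal_code", "string"), ("age", "int")]

strict = match(source, target, threshold=0.9)  # few, high-scoring mappings
loose = match(source, target, threshold=0.3)   # many more candidates
```

Raising the threshold only removes mappings from the output, whereas changing the weights alters the scores themselves and can therefore change which mappings are discovered at all.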

Meanwhile, much effort has also been spent on integrating user preferences

or input data parameters. Most of these approaches are based on machine learning techniques so

that schema instances or expert feedback can be used in the process. The integration

of such parameters is often an extra means of improving matching quality. User

preferences such as the promotion of precision or recall let users choose how they

intend to manage post-match effort. These options are also useful in contexts

where high dynamicity makes data sources evolve quickly, so

that high precision is preferred. Conversely, recall can be promoted when data

sources are going to be fully integrated and manually checked.
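One common way to express a preference for precision or recall is the F-beta measure, where beta below 1 favours precision and beta above 1 favours recall. The sketch below assumes that the user preference is passed as beta and that a set of expected mappings is available for tuning. The scored candidates, the expected set, and the candidate thresholds are hypothetical, and the surveyed tools may encode this preference differently.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta measure: beta < 1 favours precision, beta > 1 favours recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def evaluate(scored, expected, threshold):
    """Return (precision, recall) of the mappings whose score >= threshold."""
    selected = {pair for pair, score in scored.items() if score >= threshold}
    if not selected:
        return 0.0, 0.0
    correct = len(selected & expected)
    return correct / len(selected), correct / len(expected)


def best_threshold(scored, expected, beta, candidates=(0.3, 0.5, 0.9)):
    """Pick the candidate threshold that maximises F-beta."""
    return max(candidates,
               key=lambda t: f_beta(*evaluate(scored, expected, t), beta))


# Hypothetical similarity scores and expert-validated mappings.
scored = {
    ("a", "x"): 0.95, ("b", "y"): 0.60, ("c", "z"): 0.40,
    ("a", "y"): 0.55, ("b", "z"): 0.35,
}
expected = {("a", "x"), ("b", "y"), ("c", "z")}

precision_first = best_threshold(scored, expected, beta=0.5)
recall_first = best_threshold(scored, expected, beta=2.0)
```

With these sample values, favouring precision selects the strictest candidate threshold and favouring recall selects the most permissive one, which mirrors the trade-off in post-match effort described above.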

Although a default configuration should still be proposed with a matcher, we

believe that we are heading towards a specific configuration of a schema matcher

for a given matching task. Namely, various properties of the matching scenario can

be computed by the tool. The latter can then deduce, based on previous experiments

or property values, the best configuration. Visual tools have a strong impact on the

manual post-match effort. By displaying the results of different matching strategies,

one has sufficient information to check and (in)validate the mappings. Combined

with user preferences, these tools would clearly reduce manual post-match effort. To

the best of our knowledge, no existing work studies the impact of

tuning (during pre-match effort) on matching quality (and post-match effort).

How to balance tuning effort across the parameters that have a significant impact on

matching quality for a given matching task could be investigated further.
