Tuning for Schema Matching - Schema Matching and Mapping


machine learning techniques [ Duchateau 2009 ]. Similarly, Anchor-PROMPT [ Noy

and Musen 2001 ] automatically computes the threshold values by averaging all

similarity scores obtained on different runs with various parameter configurations.
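The averaging idea can be illustrated with a minimal sketch. It assumes each run yields a plain list of similarity scores; the function name and the sample values are hypothetical, and the actual procedure in Anchor-PROMPT may differ in its details.

```python
from statistics import mean
from typing import List


def averaged_threshold(runs: List[List[float]]) -> float:
    """Return the mean of every similarity score seen across all runs.

    Each inner list holds the scores produced by one run, i.e. one
    parameter configuration (hypothetical data layout).
    """
    all_scores = [score for run in runs for score in run]
    return mean(all_scores)


# Three illustrative runs with made-up scores.
runs = [[0.2, 0.9, 0.4], [0.3, 0.8, 0.5], [0.1, 0.7, 0.6]]
threshold = averaged_threshold(runs)
```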

In a broader way, the authors of [ Melnik et al. 2002 ] discuss the notion of filters

to select the mappings. These filters include not only the thresholds, but also

constraints between elements (types and cardinality) and a selection function.

Note that the threshold may be a parameter applied to a global similarity value,

i.e., different similarity values are aggregated into a global one (given a strategy,

see Sect. 5 ) and the threshold represents the decision-maker for accepting the pair

of schema elements as a correspondence or not.
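As a rough illustration of a threshold applied to a global similarity value, the sketch below aggregates per-measure scores with a weighted average and then compares the result with the threshold. The weighted average is only one possible strategy, and the measure names, weights, and values are assumptions made for the example.

```python
from typing import Dict


def aggregate(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-measure scores into one global value (weighted average)."""
    total_weight = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total_weight


def accept(scores: Dict[str, float], weights: Dict[str, float],
           threshold: float) -> bool:
    """Accept the pair as a correspondence if the global value reaches the threshold."""
    return aggregate(scores, weights) >= threshold


# Hypothetical scores for one pair of schema elements.
scores = {"name": 0.8, "datatype": 1.0, "structure": 0.3}
weights = {"name": 2.0, "datatype": 1.0, "structure": 1.0}
global_similarity = aggregate(scores, weights)
```

With these values the same pair is accepted at a threshold of 0.6 and rejected at 0.8, which shows how strongly the outcome depends on this single parameter.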

4.3 Various

Contrary to most aggregation-based approaches, Similarity Flooding/Rondo

[ Melnik et al. 2002 , 2003 ] uses a graph propagation mechanism to refine similarities

between schema elements. Thus, it has its own specific parameters. The first one is the

fixpoint formula, which determines how updated similarities are computed and when the

propagation terminates. Different fixpoint formulas have been tested

and evaluated in Melnik et al. [ 2002 ]. In addition, several filters are proposed

to select among all candidate pairs the ones that Rondo displays as mappings.

Constraints (on cardinality and types) or thresholds are examples of filters.
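The following is a simplified sketch of a propagation loop followed by a threshold filter, meant only to show where the fixpoint formula and the filters act. It is not the Rondo implementation: the update rule (own value plus weighted neighbor values, normalized by the maximum), the stopping test, the graph layout, and all names and values are assumptions.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def propagate(sigma0: Dict[Pair, float],
              neighbors: Dict[Pair, List[Tuple[Pair, float]]],
              max_iter: int = 100,
              eps: float = 1e-4) -> Dict[Pair, float]:
    """Iterate a simple fixpoint update until the values stabilize.

    neighbors maps a pair to (neighboring pair, propagation weight) entries.
    """
    sigma = dict(sigma0)
    for _ in range(max_iter):
        # Update step: each pair keeps its value and receives flow from neighbors.
        raw = {p: sigma[p] + sum(w * sigma[q] for q, w in neighbors.get(p, []))
               for p in sigma}
        top = max(raw.values()) or 1.0  # avoid dividing by zero
        new = {p: v / top for p, v in raw.items()}
        # Termination test: Euclidean length of the change vector.
        delta = sum((new[p] - sigma[p]) ** 2 for p in sigma) ** 0.5
        sigma = new
        if delta < eps:
            break
    return sigma


def threshold_filter(sigma: Dict[Pair, float], threshold: float) -> Dict[Pair, float]:
    """Keep only the candidate pairs whose similarity reaches the threshold."""
    return {p: s for p, s in sigma.items() if s >= threshold}


# Two mutually reinforcing pairs and one isolated pair, all starting at 0.5.
sigma0 = {("a", "x"): 0.5, ("b", "y"): 0.5, ("a", "y"): 0.5}
neighbors = {("a", "x"): [(("b", "y"), 1.0)], ("b", "y"): [(("a", "x"), 1.0)]}
result = propagate(sigma0, neighbors)
selected = threshold_filter(result, 0.5)
```

In this toy input the two connected pairs reinforce each other while the isolated pair decays, so only the connected pairs pass the filter. A constraint on types or cardinality would be applied at the same stage as the threshold filter.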

For a given schema element, we do not know in advance to how many elements

it should be matched [ Avigdor 2005 ]. However, approaches such as COMA++

[ Aumueller et al. 2005 ] or iMAP [ Dhamankar et al. 2004 ] can display the top-K

correspondences (for future interactive mode), thus enabling users to disambiguate

complex correspondences. Other works have been specifically designed to discover

complex mappings, such as Porsche [ Saleem and Bellahsene 2009 ].
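Top-K selection can be sketched as follows, assuming the candidates for one source element are available as a mapping from target element to similarity score. The function and the element names are hypothetical and do not reflect the interface of any of the tools cited above.

```python
import heapq
from typing import Dict, List, Tuple


def top_k(candidates: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k target elements with the highest similarity, best first."""
    return heapq.nlargest(k, candidates.items(), key=lambda item: item[1])


# Made-up candidate targets for a single source element.
candidates = {"custName": 0.9, "custId": 0.4, "address": 0.7}
best_two = top_k(candidates, 2)
```

Showing several ranked candidates instead of a single best match leaves the final choice to the user, which is what makes this mode suitable for interactive use.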

4.4 Conclusion

This section describes the parameters related to similarity measures. Although they

have a significant impact, parameters inside the similarity measures are often set

to default values. Schema matching tools let users tune the threshold, which is the

traditional decision maker for accepting or rejecting a pair of schema elements.

Finally, we have detailed specific parameters that users have to understand before

optimizing the matchers. In the next section, we reach one level up by studying the

parameters related to the combination of similarity measures.
