machine learning techniques [ Duchateau 2009 ]. Similarly, Anchor-PROMPT [ Noy
and Musen 2001 ] automatically computes the threshold values by averaging all
similarity scores obtained on different runs with various parameter configurations.
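The averaging idea can be illustrated with a minimal sketch. It does not reproduce Anchor-PROMPT itself. It assumes that each run, made with a different parameter configuration, yields a dictionary from (source, target) element pairs to similarity scores. The function name `averaged_threshold` and the sample data are hypothetical.

```python
from statistics import mean

Pair = tuple[str, str]


def averaged_threshold(runs: list[dict[Pair, float]]) -> float:
    """Hypothetical helper: mean of every similarity score over all runs.

    Each run is assumed to map (source, target) pairs to a score and to
    come from a different parameter configuration of the matcher.
    """
    scores = [score for run in runs for score in run.values()]
    if not scores:
        raise ValueError("no similarity scores to average")
    return mean(scores)


runs = [
    {("address", "addr"): 0.8, ("name", "title"): 0.4},  # configuration 1
    {("address", "addr"): 0.6, ("name", "title"): 0.2},  # configuration 2
]
threshold = averaged_threshold(runs)  # 0.5
```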
More broadly, Melnik et al. [ 2002 ] discuss the notion of filters for selecting
the mappings. These filters include not only thresholds but also constraints
between elements (on types and cardinality) and a selection function.
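A filter chain of this kind can be sketched as follows. This is not Rondo's implementation: the function names, the per-schema type dictionaries and the greedy "best target per source" selection function (which also acts as an at-most-one-target cardinality constraint) are hypothetical choices made for illustration.

```python
from typing import Callable

Pair = tuple[str, str]
Scores = dict[Pair, float]


def best_target_per_source(pairs: Scores) -> Scores:
    """Hypothetical selection function: keep the highest-scoring target per source."""
    best: dict[str, tuple[str, float]] = {}
    for (src, tgt), score in pairs.items():
        if src not in best or score > best[src][1]:
            best[src] = (tgt, score)
    return {(src, tgt): score for src, (tgt, score) in best.items()}


def apply_filters(
    candidates: Scores,
    source_types: dict[str, str],
    target_types: dict[str, str],
    threshold: float,
    select: Callable[[Scores], Scores] = best_target_per_source,
) -> Scores:
    # Threshold filter: discard pairs whose similarity is below the threshold.
    kept = {p: s for p, s in candidates.items() if s >= threshold}
    # Type constraint: keep only pairs whose recorded data types match.
    kept = {
        (src, tgt): s
        for (src, tgt), s in kept.items()
        if source_types.get(src) == target_types.get(tgt)
    }
    # Selection function: choose which remaining pairs are returned as mappings.
    return select(kept)


candidates = {
    ("price", "cost"): 0.9,
    ("price", "id"): 0.7,
    ("name", "title"): 0.5,
    ("name", "id"): 0.2,
}
mappings = apply_filters(
    candidates,
    source_types={"price": "decimal", "name": "string"},
    target_types={"cost": "decimal", "id": "int", "title": "string"},
    threshold=0.3,
)
# mappings == {("price", "cost"): 0.9, ("name", "title"): 0.5}
```

Here ("name", "id") is removed by the threshold, ("price", "id") by the type constraint, and the selection function keeps one target per source element.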
Note that the threshold may apply to a global similarity value: the different
similarity values are first aggregated into a single one (according to a given
strategy, see Sect. 5 ), and the threshold then acts as the decision-maker that
accepts or rejects the pair of schema elements as a correspondence.
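As a minimal sketch, assuming a weighted average as the aggregation strategy (only one of several possible strategies), the global similarity and the threshold decision could look like this. The measure names, weights and function names are hypothetical.

```python
def global_similarity(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Aggregate per-measure similarities with a weighted average (one possible strategy)."""
    total = sum(weights[m] for m in scores)
    if total <= 0:
        raise ValueError("weights of the measures used must sum to a positive value")
    return sum(weights[m] * s for m, s in scores.items()) / total


def is_correspondence(scores: dict[str, float], weights: dict[str, float],
                      threshold: float) -> bool:
    # The threshold is the decision-maker applied to the aggregated value.
    return global_similarity(scores, weights) >= threshold


scores = {"trigram": 0.6, "synonym": 1.0}   # hypothetical measure names
weights = {"trigram": 1.0, "synonym": 3.0}
print(global_similarity(scores, weights))       # 0.9
print(is_correspondence(scores, weights, 0.8))  # True
```

The same pair would be rejected with a threshold of 0.95, which shows how strongly the final set of correspondences depends on this single parameter.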
4.3 Various
Unlike most aggregation-based approaches, Similarity Flooding/Rondo
[ Melnik et al. 2002, 2003 ] uses a graph propagation mechanism to refine
similarities between schema elements. It therefore has specific parameters. The
first is the fixpoint formula, which determines how similarities are updated at
each propagation step and hence when the propagation stops. Different fixpoint
formulas have been tested and evaluated in Melnik et al. [ 2002 ]. In addition,
several filters are proposed to select, among all candidate pairs, those that Rondo
displays as mappings. Constraints (on cardinality and types) and thresholds are
examples of such filters.
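The role of the fixpoint formula can be illustrated with a simplified propagation loop. The sketch assumes the four variants listed in Melnik et al. [ 2002 ] as we recall them (basic: normalize(σ + φ(σ)); A: normalize(σ⁰ + φ(σ)); B: normalize(φ(σ⁰ + σ)); C: normalize(σ⁰ + σ + φ(σ⁰ + σ))) and stops when the residual falls below ε or after a maximum number of iterations. The graph encoding, propagation coefficients, parameter names and defaults are hypothetical, and the way Similarity Flooding derives coefficients from the schema graphs is not reproduced.

```python
import math

Node = tuple[str, str]           # a map pair (source element, target element)
Vector = dict[Node, float]       # similarity value for every map pair
Incoming = dict[Node, list[tuple[Node, float]]]  # node -> [(neighbour, coefficient)]


def phi(sigma: Vector, incoming: Incoming) -> Vector:
    # Similarity each node receives from its neighbours, weighted by the
    # propagation coefficients on the incoming edges.
    return {n: sum(sigma[m] * w for m, w in incoming.get(n, [])) for n in sigma}


def add(*vectors: Vector) -> Vector:
    return {n: sum(v[n] for v in vectors) for n in vectors[0]}


def normalize(v: Vector) -> Vector:
    # Divide by the largest value so that all similarities lie in [0, 1].
    top = max(v.values())
    return {n: x / top for n, x in v.items()} if top > 0 else dict(v)


FIXPOINT_FORMULAS = {
    "basic": lambda s0, s, inc: add(s, phi(s, inc)),
    "A": lambda s0, s, inc: add(s0, phi(s, inc)),
    "B": lambda s0, s, inc: phi(add(s0, s), inc),
    "C": lambda s0, s, inc: add(s0, s, phi(add(s0, s), inc)),
}


def flood(sigma0: Vector, incoming: Incoming, formula: str = "C",
          eps: float = 1e-4, max_iter: int = 100) -> Vector:
    step = FIXPOINT_FORMULAS[formula]
    sigma = dict(sigma0)
    for _ in range(max_iter):
        nxt = normalize(step(sigma0, sigma, incoming))
        # Stop once the residual (Euclidean length of the change) is below eps.
        residual = math.sqrt(sum((nxt[n] - sigma[n]) ** 2 for n in sigma))
        sigma = nxt
        if residual < eps:
            break
    return sigma


# Hypothetical propagation graph: the pair (a, a1) spreads half of its
# similarity to each child pair, and each child pair sends all of its
# similarity back to (a, a1).
incoming = {
    ("a", "a1"): [(("a.x", "a1.y"), 1.0), (("a.z", "a1.y"), 1.0)],
    ("a.x", "a1.y"): [(("a", "a1"), 0.5)],
    ("a.z", "a1.y"): [(("a", "a1"), 0.5)],
}
sigma0 = {("a", "a1"): 0.5, ("a.x", "a1.y"): 0.6, ("a.z", "a1.y"): 0.2}
result = flood(sigma0, incoming, formula="C")
```

Changing `formula` to "basic", "A" or "B" on the same graph is one way to compare fixpoint formulas; in this sketch `eps` and `max_iter` are the remaining tuning knobs.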
For a given schema element, we do not know in advance how many elements it should
be matched to [ Avigdor 2005 ]. However, approaches such as COMA++
[ Aumueller et al. 2005 ] or iMAP [ Dhamankar et al. 2004 ] can display the top-K
correspondences (for later use in an interactive mode), thus enabling users to
disambiguate complex correspondences. Other works, such as Porsche
[ Saleem and Bellahsene 2009 ], have been specifically designed to discover
complex mappings.
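A top-K display can be sketched generically. This does not reproduce how COMA++ or iMAP rank candidates: it assumes a dictionary of scored (source, target) pairs, and the function name `top_k` and the sample data are hypothetical.

```python
import heapq
from collections import defaultdict

Pair = tuple[str, str]


def top_k(candidates: dict[Pair, float], k: int) -> dict[str, list[tuple[str, float]]]:
    """Return, for each source element, its k best-scoring target candidates."""
    by_source: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for (src, tgt), score in candidates.items():
        by_source[src].append((tgt, score))
    # nlargest returns the candidates sorted by decreasing score.
    return {src: heapq.nlargest(k, cands, key=lambda c: c[1])
            for src, cands in by_source.items()}


candidates = {
    ("name", "firstName"): 0.7,
    ("name", "lastName"): 0.65,
    ("name", "id"): 0.1,
    ("price", "cost"): 0.9,
}
shortlist = top_k(candidates, k=2)
# {"name": [("firstName", 0.7), ("lastName", 0.65)], "price": [("cost", 0.9)]}
```

Showing both firstName and lastName as candidates for name lets a user notice that name may correspond to their combination, i.e., a complex correspondence, which a single best match would hide.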
4.4 Conclusion
This section has described the parameters related to similarity measures. Although
they have a significant impact, the parameters inside similarity measures are often
left at their default values. Schema matching tools let users tune the thresholds,
which are the traditional decision-maker for accepting or rejecting a pair of schema
elements as a correspondence. Finally, we have detailed specific parameters that
users need to understand before optimizing the matchers. In the next section, we
move one level up by studying the parameters related to the combination of
similarity measures.