Schema Matcher Ensembles - Uncertain Schema Matching

precision when individual performance is considered, yet it was not even part of the SMB decision making! The most important matcher for SMB was (Term, Intersection), ranked 11th according to individual performance in terms of F-Measure and 10th in terms of precision. (Precedence, SM), ranked second for SMB, has a mediocre individual performance. Figure 4.2 (bottom) highlights the performance (on a precision vs. recall scale) of the four top matchers of SMB.

Our first observation is that the decision making of SMB is not linear in the individual performance of matchers, and therefore the SMB training process is valuable. Second, we observe that SMB seeks diversity in its decision making. It uses Term, Value (combined with Term because of its poor individual performance), Composition, and Precedence. Given these four matchers, SMB has no need for the Combined matcher, which provides a weighted average of the four. This explains the absence of (Combined, Dominants).
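
To make the redundancy argument concrete, the following is a minimal sketch of a Combined-style matcher computed as a weighted average of the four base matchers' similarity matrices. The function name `combine`, the equal weights, and the similarity scores are illustrative assumptions and are not taken from the text; the point is only that the combined score is a fixed linear mix of scores the ensemble can already weight on its own.

```python
from typing import Dict, List

Matrix = List[List[float]]


def combine(matrices: Dict[str, Matrix], weights: Dict[str, float]) -> Matrix:
    """Return the weighted average of same-shaped similarity matrices.

    Hypothetical helper: weights are normalized so that they sum to 1.
    """
    total = sum(weights[name] for name in matrices)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    names = list(matrices)
    rows = len(matrices[names[0]])
    cols = len(matrices[names[0]][0])
    combined = [[0.0] * cols for _ in range(rows)]
    for name in names:
        w = weights[name] / total  # normalized weight of this matcher
        for i in range(rows):
            for j in range(cols):
                combined[i][j] += w * matrices[name][i][j]
    return combined


# Made-up similarity scores for a 2x2 attribute pairing (assumption).
scores = {
    "Term": [[0.8, 0.2], [0.0, 0.6]],
    "Value": [[0.4, 0.0], [0.2, 0.2]],
    "Composition": [[0.6, 0.2], [0.2, 0.8]],
    "Precedence": [[0.2, 0.4], [0.4, 0.4]],
}
equal_weights = {name: 1.0 for name in scores}
combined = combine(scores, equal_weights)
```

Because every entry of `combined` is determined by the four base matrices, an ensemble that already assigns its own weights to those four matchers gains no new evidence from adding the average.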

As a final remark, Duchateau et al. [2008] discuss a different aspect of ensemble construction. In their work, a set of matchers is built into a decision tree. Then, at run time and based on intermediate results, the ensemble adapts itself to the needs of the specific matching instance. This setting can be considered a run-time dynamic ensemble construction, as opposed to the design-time construction of SMB.
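
The run-time idea can be illustrated with a toy sketch in which each internal node of a decision tree runs one matcher and the resulting score decides which branch, and therefore which matcher, comes next. This is not the algorithm of Duchateau et al.; the classes `Node` and `Leaf`, the two toy matchers, and the thresholds are all hypothetical and serve only to show how intermediate results can determine which matchers are run for a given attribute pair.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union

Matcher = Callable[[str, str], float]


@dataclass
class Leaf:
    is_match: bool  # final decision for the attribute pair


@dataclass
class Node:
    name: str                  # label of the matcher run at this node
    matcher: Matcher           # similarity function returning a score in [0, 1]
    threshold: float           # scores >= threshold follow the `high` branch
    low: Union["Node", Leaf]
    high: Union["Node", Leaf]


def decide(tree: Union[Node, Leaf], a: str, b: str) -> Tuple[bool, List[str]]:
    """Walk the tree; return the decision and the matchers actually run."""
    used: List[str] = []
    while isinstance(tree, Node):
        score = tree.matcher(a, b)
        used.append(tree.name)
        tree = tree.high if score >= tree.threshold else tree.low
    return tree.is_match, used


def exact_name(a: str, b: str) -> float:
    """Toy matcher: 1.0 if the names are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def token_overlap(a: str, b: str) -> float:
    """Toy matcher: Jaccard overlap of underscore-separated tokens."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


# The cheap matcher runs first; the second one runs only if it is inconclusive.
tree = Node(
    "exact_name", exact_name, 1.0,
    low=Node("token_overlap", token_overlap, 0.5, low=Leaf(False), high=Leaf(True)),
    high=Leaf(True),
)
```

For example, `decide(tree, "Name", "name")` stops after `exact_name`, while `decide(tree, "first_name", "name_first")` falls through to `token_overlap`, so the set of matchers applied differs from one pair to the next.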
