precision when individual performance is considered, yet it was not even part of the SMB decision making! The most important matcher for SMB was (Term, Intersection), ranked 11th according to individual performance in terms of F-Measure and 10th in terms of precision. (Precedence, SM), ranked second for SMB, has a mediocre individual performance. Figure 4.2 (bottom) highlights the performance (on a precision vs. recall scale) of the four top matchers of SMB.
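The individual rankings above rest on precision, recall, and F-Measure. As a minimal sketch, assuming the usual definition of F-Measure as the harmonic mean of precision and recall, the three values can be computed for one matcher's output as follows. The function name and the attribute pairs are hypothetical and serve only as illustration.

```python
def precision_recall_f(found, exact):
    """Return (precision, recall, F-measure) for a set of found correspondences."""
    found, exact = set(found), set(exact)
    true_positives = len(found & exact)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(exact) if exact else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall (assumed definition of F-Measure).
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical exact match and matcher output, for illustration only.
exact = {("poNum", "orderId"), ("city", "town"),
         ("zip", "postcode"), ("name", "fullName")}
found = {("poNum", "orderId"), ("city", "town"), ("zip", "fullName")}

p, r, f = precision_recall_f(found, exact)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Ranking matchers by `f` or by `p` over a set of such outputs gives the kind of individual ordering the text compares against the SMB ordering.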
Our first observation is that the decision making of SMB is not linear in the individual performance of matchers, and therefore the SMB training process is valuable. Second, we observe that SMB seeks diversity in its decision making. It uses Term, Value (combined with Term due to its poor individual performance), Composition, and Precedence. Given these four matchers, SMB has no need for the Combined matcher, which provides a weighted average of the four. This explains the absence of (Combined, Dominants).
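To make the redundancy argument concrete, the following is a minimal sketch of a weighted-average matcher, assuming each of the four matchers produces a similarity matrix over the same attribute pairs. The matrices and the uniform weights are invented for illustration; the text does not state the weights the Combined matcher uses.

```python
def combine(matrices, weights):
    """Entry-wise weighted average of equally sized similarity matrices."""
    total = sum(weights)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
         for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical 2x2 similarity matrices, one per matcher.
term = [[0.9, 0.1], [0.2, 0.8]]
value = [[0.5, 0.5], [0.4, 0.6]]
composition = [[0.7, 0.3], [0.1, 0.9]]
precedence = [[0.6, 0.2], [0.3, 0.5]]

# Uniform weights are an assumption, not taken from the text.
combined = combine([term, value, composition, precedence], [1, 1, 1, 1])
print(combined)
```

Every entry of `combined` is fully determined by the four input matrices, so an ensemble that already weighs those four matchers itself gains no independent evidence from it.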
As a final remark, Duchateau et al. [2008] discuss a different aspect of ensemble construction. In their work, a set of matchers is built into a decision tree. Then, at run time and based on intermediate results, the ensemble adapts itself to the needs of the specific matching instance. This setting can be considered a run-time, dynamic ensemble construction, as opposed to the design-time construction of SMB.
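The run-time idea can be illustrated with a small sketch in which each internal node of a decision tree applies one matcher and the resulting score selects the next branch. This is not the algorithm of Duchateau et al.; the tree layout, the thresholds, and the two toy matchers `exact_name` and `token_overlap` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    is_match: bool


@dataclass
class Node:
    matcher: Callable[[str, str], float]
    threshold: float
    low: Union["Node", Leaf]   # branch taken when score < threshold
    high: Union["Node", Leaf]  # branch taken when score >= threshold


def decide(tree, a, b):
    """Walk the tree for one attribute pair; return (decision, matchers used)."""
    used = []
    while isinstance(tree, Node):
        score = tree.matcher(a, b)
        used.append(tree.matcher.__name__)
        tree = tree.high if score >= tree.threshold else tree.low
    return tree.is_match, used


def exact_name(a, b):
    """Toy matcher: 1.0 if the names are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def token_overlap(a, b):
    """Toy matcher: Jaccard overlap of underscore-separated name tokens."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


# The second matcher runs only when the first one is inconclusive.
tree = Node(exact_name, 1.0,
            low=Node(token_overlap, 0.5, low=Leaf(False), high=Leaf(True)),
            high=Leaf(True))

print(decide(tree, "City", "city"))
print(decide(tree, "ship_city", "bill_city"))
```

Which matchers run depends on the pair being matched, which is the contrast with an ensemble such as SMB whose members and weights are fixed at design time.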