Databases Reference
In-Depth Information
boosting algorithms ( e.g. ,[ Schapire , 1990 ]). This class of algorithms can strengthen weak classifiers
to achieve arbitrarily high accuracy; they have been shown to be effective in the construction of
successful classifiers. Given a set of weak classifiers, the algorithm iterates over them while re-
weighting the importance of elements in the training set.
Many boosting algorithms have been proposed in the literature. Following Gal and Sagi
[ 2010 ], we describe a matcher based on the AdaBoost algorithm [ Freund and Schapire , 1999 ].
AdaBoost, presented in Section 4.3.1 , is the most popular and historically most significant boost-
ing algorithm. SMB , the boosting heuristic for schema matching, is described first, followed by a
discussion of its role in selecting matchers for ensembles.
4.3.1
ADABOOST
Algorithm 1 Boosting
1: Input: S
={
(x 1 ,y 1 ), …, (x m ,y m )
}
, and a space hypotheses
H
.
i m, x i
X
i m, y i {−
1 , +
}
2: /*
1
, and
1
1
*/
3: /* initialization: */
4: for all 1
i m do
5: D 1 (i) = 1 /m
6: end for
7: t = 1
8: repeat
9:
/* training phase: */
Find the classifier h t : X →{−
1 ,
+
1
}
, h t
H
that minimizes the error with respect to the
10:
distribution D t : h t = arg h j min ε j .
if ε t
0 . 5 then
11:
2 ln 1 ε t
Choose α t R . α t =
12:
ε t
D t (i) exp ( α t y i h t (x i ))
Z t
Update D t + 1 (i)
=
where Z t is a normalization factor
13:
t = t +
1
14:
15: end if
16: until t = T or ε t > 0 . 5
17: /* upon arrival of a new instance: */
18: Output the final classifier: H(x) = sign( min (t,T )
k = 1
α k h k (x))
The AdaBoost algorithm template is given in Algorithm 1 . The input to a boosting algorithm
is a set of m examples where each example (x i ,y i ) is a pair of an instance x i and the classification
of the instance mapping, y i . y i typically (though not always) accepts a binary value in
{−
1 ,
+
1 }
,
where
1 stands for a correct classification. Therefore,
the algorithm is aimed at binary classifications. The last input element is a hypothesis space
1 stands for an incorrect classification and
+
H
, a set
of weak classifiers.
 
Search WWH ::




Custom Search