Database Reference
In-Depth Information
L
1
2 L
=
arg
min
t F , y ∈{± 1 }
(1
y i h t , y ( x i )),
(13.1)
i
=
1
F = i = 1 {
F
where
is a set of candidate graphs or a feature set ( i.e. ,
t
|
t
x i }
) and
·
I (
) is the indicator function. The gain function for a rule
t , y
is defined as
L
=
gain (
t , y
)
y i h t , y ( x i ) .
(13.2)
i
=
1
Using the gain, the search problem in Eq. ( 13.1 ) becomes equivalent to the
problem:
t ,
y
ˆ
=
arg max t F , y ∈{± 1 } gain (
t , y
). Then the gain function is used
instead of error rate.
gboost applies AdaBoost [ 17 ] by repeatedly calling the decision stumps and
finally produces a hypothesis f , which is a linear combination of K hypotheses pro-
duced by the decision stumps f ( x )
sgn k = 1 α k h t k , y k ( x ) . In the k th iteration,
a decision stump is built with weights d ( k )
=
d ( k )
1
, ... , d ( k L on the training data,
=
where i = 1 d ( k )
1, d ( k )
i
0. The weights are calculated to concentrate more
on hard examples than easy ones. In the boosting framework, the gain function is
redefined as
=
i
L
gain (
t , y
)
=
y i d i h t , y ( x i ) .
(13.3)
i = 1
A Branch-and-Bound Search Approach According to the gain function in
Eq. ( 13.3 ), the problem of finding the optimal rule
t ,
y
from the training dataset is
defined as follows.
Problem 1 [Find Optimal Rule] Let T
={
x 1 , y 1 , d 1
, ... ,
x L , y L , d L }
be a training
data set where x i is a labeled graph, y i ∈{±
1
}
is a class label associated with x i and
d i ( i = 1 d i
=
1, d i
0) is a normalized weight assigned to x i .Given T , find the
t ,
t ,
optimal rule
y
ˆ
that maximizes the gain, i.e. ,
y
ˆ
=
arg max t F , y ∈{± 1 } y i d i h
,
t , y
F = i = 1 { t | t
where
.
A naive method is to enumerate all subgraphs
x i }
and then calculate the gains for
all subgraphs. However, this method is impractical since the number of subgraphs is
exponential to their size. To avoid such exhaustive enumeration, the method to find
the optimal rule is modeled as a branch-and-bound algorithm based on the upper
bound of the gain function which is defined as follows.
Lemma 13.10 (Upper bound of the gain) For any t
F
t and y
∈{±
1
}
, the gain of
t , y
t , y
is bounded by μ ( t ) (i.e., gain (
)
μ ( t ) ), where μ ( t ) is given by
2
{ i | y i =+ 1, t x i }
L
d i ,2
{ i | y i =− 1, t x i }
L
.
μ ( t )
=
max
d i
y i ·
d i +
y i ·
d i
i = 1
i = 1
(13.4)
Search WWH ::




Custom Search