3.3 gPLS: A Partial Least Squares Regression Approach
Saigo et al. [34] proposed gPLS, an iterative mining method based on partial least squares regression (PLS). To apply PLS to graph data, a sparse version of PLS is developed first and then combined with a weighted pattern mining algorithm. The mining algorithm is called iteratively with different weight vectors, creating one latent component per mining call. Branch-and-bound search is integrated into graph mining with a designed gain function and a pruning condition. In this sense, gPLS is very similar to the branch-and-bound mining approach in gboost.
Partial Least Squares Regression
This part is a brief introduction to partial least squares regression (PLS). Assume there are $n$ training examples $(x_1, y_1), \ldots, (x_n, y_n)$. The output $y_i$ is assumed to be centered, $\sum_i y_i = 0$. Denote by $X$ the design matrix, where each row corresponds to $x_i$. The regression function of PLS is
$$ f(x) = \sum_{i=1}^{m} \alpha_i w_i^T x, $$
where $m$ is the pre-specified number of components that form a subset of the original space, and $w_i$ are weight vectors that reduce the dimensionality of $x$, satisfying the following orthogonality condition,
$$ w_i^T X^T X w_j = \begin{cases} 1 & (i = j) \\ 0 & (i \neq j). \end{cases} $$
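As a quick sanity check of this condition, the following toy numpy sketch (the data and the eigenvector-based construction are illustrative assumptions, not part of the text) builds weight vectors satisfying it and verifies that the resulting latent components $X w_i$ are orthonormal:

```python
import numpy as np

# Sketch: the orthogonality condition w_i^T X^T X w_j = delta_ij means the
# latent components t_i = X w_i are orthonormal.  One way to construct such
# weight vectors (for illustration only, not the PLS algorithm itself) is to
# rescale the eigenvectors of the Gram matrix X^T X.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))

G = X.T @ X
lam, V = np.linalg.eigh(G)          # G = V diag(lam) V^T
W = V / np.sqrt(lam)                # columns w_i = v_i / sqrt(lam_i)

# w_i^T X^T X w_j is the identity matrix under this construction
print(np.allclose(W.T @ G @ W, np.eye(5)))   # True
# equivalently, the latent components T = X W are orthonormal
T = X @ W
print(np.allclose(T.T @ T, np.eye(5)))       # True
```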
Basically, $w_i$ are learned in a greedy way first, then the coefficients $\alpha_i$ are obtained by least squares regression without any regularization. The solutions to $\alpha_i$ and $w_i$ are
$$ \alpha_i = \sum_{k=1}^{n} y_k w_i^T x_k, \qquad (13.5) $$
and
$$ w_i = \arg\max_{w} \frac{\left( \sum_{k=1}^{n} y_k w^T x_k \right)^2}{w^T w}, $$
subject to $w^T X^T X w = 1$ and $w^T X^T X w_j = 0$, $j = 1, \ldots, i-1$.

Next we present an alternative derivation of PLS called non-deflation sparse PLS. Define the $i$th latent component as $t_i = X w_i$, and $T_{i-1} = (t_1, \ldots, t_{i-1})$ as the matrix of latent components obtained so far. The residual vector is computed by
$$ r_i = (I - T_{i-1} T_{i-1}^T) y. $$
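The residual can be formed without materializing the $n \times n$ projection matrix. A small numpy sketch (the toy data and the stand-in orthonormal matrix are assumptions for illustration):

```python
import numpy as np

# Sketch: r_i = (I - T_{i-1} T_{i-1}^T) y removes from y everything already
# explained by the latent components collected so far, so r_i is orthogonal
# to every column of T_{i-1}.  Computed as y - T (T^T y) to avoid building
# the n x n projection matrix explicitly.
rng = np.random.default_rng(2)
y = rng.standard_normal(10)
Q, _ = np.linalg.qr(rng.standard_normal((10, 3)))   # stand-in orthonormal T_{i-1}

r = y - Q @ (Q.T @ y)
print(np.allclose(Q.T @ r, 0))    # True: residual is orthogonal to T_{i-1}
```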
Then multiply it with $X^T$ to obtain
$$ v = \frac{1}{\eta} X^T (I - T_{i-1} T_{i-1}^T) y. $$
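The steps above can be sketched as a dense toy iteration in numpy. Two details are assumptions not stated in this excerpt: $\eta$ is taken to be the norm of $X^T r_i$, and each new latent component is orthogonalized against the previous ones by Gram–Schmidt. gPLS itself would replace the dense $v$ by a sparse weight vector found via weighted graph pattern mining.

```python
import numpy as np

# Dense sketch of the non-deflation iteration described above (toy data).
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 8))
y = rng.standard_normal(30)
y -= y.mean()                        # centered output, sum_i y_i = 0

m = 3                                # number of latent components
T = np.zeros((30, 0))                # T_{i-1}, initially empty
for i in range(m):
    r = y - T @ (T.T @ y)            # residual r_i = (I - T_{i-1} T_{i-1}^T) y
    v = X.T @ r                      # v = (1/eta) X^T r_i ...
    v /= np.linalg.norm(v)           # ... with eta = ||X^T r_i|| (assumption)
    t = X @ v
    t -= T @ (T.T @ t)               # Gram-Schmidt against previous components
    t /= np.linalg.norm(t)
    T = np.column_stack([T, t])

alpha = T.T @ y                      # least-squares coefficients, cf. Eq. (13.5)
y_hat = T @ alpha                    # in-sample prediction
print(np.allclose(T.T @ T, np.eye(m)))   # True: components are orthonormal
```

Because the components come out orthonormal, the final coefficients reduce to simple inner products with $y$, matching the unregularized least-squares step described earlier.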