$$A \approx WH,$$
where $W = (w_{ia})_{n \times k}$ contains the reduced $k$ ($k \le m$) basis vectors (factors), and $H = (h_{aj})_{k \times m}$ contains the coefficients of the linear combinations of the basis vectors (encoding vectors). All matrices $A$, $W$, $H$ are nonnegative, and the columns of $W$ are normalized. Thus, the entry $a_{ij}$ can be expressed as
$$a_{ij} \approx (WH)_{ij} = \sum_{a=1}^{k} w_{ia} h_{aj}.$$
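As a quick numerical check (a sketch with assumed dimensions, not taken from the source), the entry-wise sum agrees with the corresponding entry of the matrix product:

```python
import numpy as np

# Illustrative example: a random nonnegative factorization A ~ WH
# with n = 4 samples, m = 5 features, and k = 2 factors (sizes assumed).
rng = np.random.default_rng(0)
n, m, k = 4, 5, 2
W = rng.random((n, k))
W /= W.sum(axis=0)            # normalize the columns of W
H = rng.random((k, m))

# The (i, j) entry of WH equals the sum over the k factors.
i, j = 1, 3
entry = sum(W[i, a] * H[a, j] for a in range(k))
assert np.isclose(entry, (W @ H)[i, j])
```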
Based on the Poisson likelihood, the objective of this factorization is to minimize the divergence function, i.e.,
$$\min D(A, WH) = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( a_{ij} \log \frac{a_{ij}}{(WH)_{ij}} - a_{ij} + (WH)_{ij} \right).$$
The solution to this objective function, i.e., finding $W$ and $H$, uses an iterative algorithm with random initialization [8].
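The source only states that an iterative algorithm with random initialization is used. One concrete sketch uses the standard multiplicative updates for the KL divergence (due to Lee and Seung); the function names and parameters below are illustrative, not from the source:

```python
import numpy as np

def nmf_kl(A, k, n_iter=200, eps=1e-10, seed=0):
    """Sketch of KL-divergence NMF via multiplicative updates.

    Assumes A is a nonnegative n x m array; k is the number of factors.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(n_iter):
        # Update H:  H <- H * (W^T (A / WH)) / (W^T 1)
        WH = W @ H + eps
        H *= (W.T @ (A / WH)) / (W.sum(axis=0, keepdims=True).T + eps)
        # Update W:  W <- W * ((A / WH) H^T) / (1 H^T)
        WH = W @ H + eps
        W *= ((A / WH) @ H.T) / (H.sum(axis=1, keepdims=True).T + eps)
        # Keep the columns of W normalized; rescale H so WH is unchanged.
        s = W.sum(axis=0, keepdims=True) + eps
        W /= s
        H *= s.T
    return W, H

def kl_divergence(A, WH, eps=1e-10):
    """D(A, WH) = sum_ij a_ij log(a_ij / (WH)_ij) - a_ij + (WH)_ij."""
    return np.sum(A * np.log((A + eps) / (WH + eps)) - A + WH)
```

Each update multiplicatively rescales the current factors, so nonnegativity is preserved automatically and the divergence is non-increasing over iterations.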
The nsNMF method is an improvement of NMF for finding sparse structures in the data matrix; it is claimed to "produce more compact and localized feature representation of the data than standard NMF" [8]. The nsNMF method introduces a smooth distribution of the factors to obtain sparseness, and the decomposition of the data matrix $A$ is
$$A \approx WSH, \qquad S = (1 - \theta) I + \frac{\theta}{k}\, e^{T} e,$$
where the matrix $S$ is a positive smoothness matrix, $I$ is the $k \times k$ identity matrix, $e$ is a row vector of $k$ 1s, and $\theta$ controls the sparseness of the model, satisfying $0 \le \theta \le 1$. The objective function for the nsNMF method is now
$$\min D(A, WSH) = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( a_{ij} \log \frac{a_{ij}}{(WSH)_{ij}} - a_{ij} + (WSH)_{ij} \right).$$
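The effect of the smoothing matrix $S$ can be sketched numerically (a minimal illustration with assumed sizes, not code from the source):

```python
import numpy as np

def smoothing_matrix(k, theta):
    """S = (1 - theta) I + (theta / k) e^T e, for e a row vector of k ones.

    Note that (theta / k) e^T e is the k x k matrix with every entry theta / k.
    """
    assert 0.0 <= theta <= 1.0
    I = np.eye(k)
    ones = np.ones((k, k))          # e^T e
    return (1.0 - theta) * I + (theta / k) * ones

k = 4
X = np.array([4.0, 0.0, 0.0, 0.0])  # a sparse nonnegative vector

# theta = 0: S is the identity, so SX = X (plain NMF).
print(smoothing_matrix(k, 0.0) @ X)   # [4. 0. 0. 0.]
# theta = 1: SX is the constant vector of the mean of X (maximally smooth).
print(smoothing_matrix(k, 1.0) @ X)   # [1. 1. 1. 1.]
```

Each row of $S$ sums to 1, so $SX$ is a convex combination of $X$ and its average; larger $\theta$ pushes $SX$ closer to the constant mean vector.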
When $\theta = 0$, nsNMF reduces to standard NMF; when $\theta \to 1$, the vector $SX$ (where $X$ is a positive nonzero vector) tends to a constant vector whose entries are all almost equal to the average of the elements of $X$, that is, all entries take the same nonzero value, which is the smoothest possible vector in the sense of "nonsparseness." This objective function can be solved in the same way as the previous one, with small changes [8].

Bimax. Prelic et al. [44] presented a fast divide-and-conquer approach, the binary inclusion-maximal biclustering algorithm (Bimax). This algorithm assumes that the data matrix $A$ is binary, with $a_{ij} \in \{0, 1\}$, where an entry of 1 means that feature $j$ is important in sample $i$.
In this algorithm, an inclusion-maximal bicluster is defined as $B_k = (S_k, F_k)$ such that $a_{ij} = 1$ for all $i \in S_k$, $j \in F_k$, and there does not exist another