Database Reference
In-Depth Information
The permutation matrix then turns out to be
0010
0100
0001
1000
P =
.
(3.9)
The permuted similarity matrix S and the corresponding label vector λ and
data matrix X are:
S = PSP = P λ, X = PX .
(3.10)
So in our example
1
2
2
3
λ =
is now serialized (through row swapping) and the corresponding similarity
matrix S contains the entries shown in
Clusion
(through row and column
swapping).
For a “good” clustering algorithm and k
n, this is related to sparse
matrix reordering, for this results in the generation of a “banded matrix”
where high entries should all fall near the diagonal line from the upper left
to the lower right of the matrix. Because Eq. 3.10 is essentially a partial
ordering operation we also refer to it as coarse seriation, a phrase used in
disciplines such as anthropology and archaeology to describe the reordering
of the primary data matrix so that similar structures (e.g., genetic sequences)
are brought closer [3.20], [3.21].
3.4.2 Visualization
The seriation of the similarity matrix, S , is very useful for visualization. Be-
cause the similarity matrix is two dimensional, it can be readily visualized
as a gray-level image where a white (black) pixel corresponds to minimal
(maximal) similarity of 0 (1). The darkness (gray-level value) of the pixel
at row a and column b increases with the similarity between the samples
x a and x b . When looking at the image it is useful to consider the similar-
ity s as a random variable taking values from 0 to 1. The similarity within
cluster is thus represented by the average intensity within a square region
with side length n around the main diagonal of the matrix. The off-diagonal
rectangular areas visualize the relationships between clusters. The brightness
distribution in the rectangular areas yields insight toward the quality of the
clustering and possible improvements. To make these regions apparent, thin
Search WWH ::




Custom Search