Graphics Reference
In-Depth Information
ular and fairly successful nonlinear embedding technique is the Isomap algorithm
(Tenenbaum et al., ). he main algorithm computes for each point in the data
seta K-nearest neighbor graph and then stitches them together in the adjacency ma-
trix. It then calculates distances using the resulting graph and then applies MDS. he
main idea in the first step of the construction is to capture well the local geometry of
the data. An illustration of the idea based on the Swiss Roll data set is shown next.
Specifically, random points lying on a roll have been generated and their Eu-
clidean pairwise distances computed. In addition, a graph that connects data points
onlywiththeirclosest neighborsintheEuclideanmetricwascomputedandshort-
estpath distances calculated. Subsequently, MDSwasappliedtoboth distance matri-
ces and a -D embedding obtained. he coloring scheme shows that straightforward
MDS does not capture the underlying geometry of the roll (since the points do not
follow their progression on the roll, blue, cyan, green, etc.), whereas the first dimen-
sion using the Isomap algorithm recovers the underlying structure. he hole in the
middle is mostly due to the low density of orange points.
The Pulling Under Constraints Model
4.3.3
In this model, the similarity of the nodes is important. In the case of a simple graph,
only connections between nodes are taken into consideration, whereas in aweighted
graph, edges with large weights play a more prominent role. However, the normal-
ization constraint, as discussed in Sect. , pushes points apart and avoids the triv-
ial solution of all points collapsing to the origin. his model, under various dis-
tance functions, has been studied in a series of papers by Michailidis and de Leeuw
( , , ). We examine next the case of squared Euclidean distances, where
ϕ
d ij
,whichturns outtobeparticularly interesting fromadata visu-
alization point ofview. Somealgebra showsthat the objective function can bewritten
in the following matrix algebra form:
(
d
(
X
))
(
X
)
X LX
Q
(
X
A
)=
trace
(
)
( . )
whereL
AisthegraphLaplacian(Chung, ),with D beingadiagonalmatrix
containing therowsumsoftheadjacency matrix A.Itcanbeseen thatbyminimizing
( . ),nodessharingmanyconnections wouldbepulledtogether,whereasnodeswith
few connections would end up on the periphery of the layout. For a weighted graph,
the larger the weights, the stronger the bond between nodes and hence the more
pronounced the clustering pattern.
A normalization constraint that leads to a computationally easy-to-solve problem
is X DX
=
D
I p . Some routine calculations show that minimizing ( . ) subject to this
constraint corresponds to solving a generalized eigenvalue problem. Further, notice
that the solution is not orthogonal in Euclidean space, but in weighted (by D)Eu-
clidean space. Figure . shows the graph layout for the small protein interactions
network shown in Fig. . . It can be seen that proteins PIB and BET that have very
few interactions are located on the periphery of the layout. Moreover, the 'hub' pro-
=
Search WWH ::




Custom Search