Data Visualization Through Their Graph Representations - Data Visualization

Graphics Reference

In-Depth Information

ular and fairly successful nonlinear embedding technique is the Isomap algorithm

(Tenenbaum et al., ). he main algorithm computes for each point in the data

seta K-nearest neighbor graph and then stitches them together in the adjacency ma-

trix. It then calculates distances using the resulting graph and then applies MDS. he

main idea in the first step of the construction is to capture well the local geometry of

the data. An illustration of the idea based on the Swiss Roll data set is shown next.

Specifically, random points lying on a roll have been generated and their Eu-

clidean pairwise distances computed. In addition, a graph that connects data points

onlywiththeirclosest neighborsintheEuclideanmetricwascomputedandshort-

estpath distances calculated. Subsequently, MDSwasappliedtoboth distance matri-

ces and a -D embedding obtained. he coloring scheme shows that straightforward

MDS does not capture the underlying geometry of the roll (since the points do not

follow their progression on the roll, blue, cyan, green, etc.), whereas the first dimen-

sion using the Isomap algorithm recovers the underlying structure. he hole in the

middle is mostly due to the low density of orange points.

The Pulling Under Constraints Model

4.3.3

In this model, the similarity of the nodes is important. In the case of a simple graph,

only connections between nodes are taken into consideration, whereas in aweighted

graph, edges with large weights play a more prominent role. However, the normal-

ization constraint, as discussed in Sect. , pushes points apart and avoids the triv-

ial solution of all points collapsing to the origin. his model, under various dis-

tance functions, has been studied in a series of papers by Michailidis and de Leeuw

( , , ). We examine next the case of squared Euclidean distances, where

ϕ

d ij

,whichturns outtobeparticularly interesting fromadata visu-

alization point ofview. Somealgebra showsthat the objective function can bewritten

in the following matrix algebra form:

(

d

(

X

))

(

X

)

X ′ LX

Q

(

X

A

)=

trace

(

)

( . )

whereL

AisthegraphLaplacian(Chung, ),with D beingadiagonalmatrix

containing therowsumsoftheadjacency matrix A.Itcanbeseen thatbyminimizing

( . ),nodessharingmanyconnections wouldbepulledtogether,whereasnodeswith

few connections would end up on the periphery of the layout. For a weighted graph,

the larger the weights, the stronger the bond between nodes and hence the more

pronounced the clustering pattern.

A normalization constraint that leads to a computationally easy-to-solve problem

is X ′ DX

=

D

−

I p . Some routine calculations show that minimizing ( . ) subject to this

constraint corresponds to solving a generalized eigenvalue problem. Further, notice

that the solution is not orthogonal in Euclidean space, but in weighted (by D)Eu-

clidean space. Figure . shows the graph layout for the small protein interactions

network shown in Fig. . . It can be seen that proteins PIB and BET that have very

few interactions are located on the periphery of the layout. Moreover, the 'hub' pro-

=

Data Visualization

Search WWH ::

Custom Search

Home