Matrix Visualization - Data Visualization

15.3.1 Presentation of the Raw Data Matrix

The first step in the matrix visualization of continuous data is the production of a raw data matrix X and two corresponding proximity matrices, R for the rows and C for the columns, which are calculated with user-specified similarity (or dissimilarity) measures. The three matrices are then projected through suitable color spectra to construct corresponding matrix maps in which each matrix entry (raw data or proximity measurement) is represented by a color dot. The left panel in Fig. . shows the raw data matrix of log-transformed ratios of expressions coded by a bidirectional green-black-red spectrum for Dataset , with Pearson correlations for between-array relations coded by a bidirectional blue-white-red spectrum, and Euclidean distances for between-gene relations coded by a unidirectional rainbow spectrum.
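The construction of the two proximity matrices can be sketched as follows. This is a minimal illustration in plain Python, assuming Euclidean distance for the rows (R) and Pearson correlation for the columns (C), as in the figure described above; the function names and the toy matrix are hypothetical and not taken from the original text.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pearson(u, v):
    """Pearson correlation between two equal-length, non-constant vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)  # a constant vector would divide by zero here

def proximity_matrices(X):
    """Return (R, C): row-by-row distances and column-by-column correlations."""
    cols = [list(c) for c in zip(*X)]  # transpose X to get its columns
    R = [[euclidean(r1, r2) for r2 in X] for r1 in X]
    C = [[pearson(c1, c2) for c2 in cols] for c1 in cols]
    return R, C

# Toy data: 3 genes (rows) by 3 arrays (columns); values are illustrative only.
X = [
    [1.0, 2.0, -1.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 1.0],
]
R, C = proximity_matrices(X)
```

For an n-by-p data matrix, R is n-by-n and C is p-by-p, so the three maps can be laid out with R beside the rows of X and C beside its columns.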

In the raw data matrix map, a red (green) dot in the ijth position of the map for X means that the ith gene at the jth array is relatively up (down)-regulated. A black dot stands for a relatively nondifferentially expressed gene/array combination. A red (blue) point in the ijth position of the C matrix map represents a positive (negative) correlation between arrays i and j. Darker (lighter) intensities of color stand for stronger (weaker) absolute correlation coefficients, while white dots represent no correlations. A blue (red) point in the ijth position of the R matrix map represents a relatively small (large) distance between genes i and j, while a yellow dot represents a median distance.
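The two bidirectional color codings described above can be sketched as simple value-to-RGB mappings. This is only an illustration: the linear interpolation, the `scale` parameter, and the function names are assumptions, not the color spectra actually used for the figure, and the unidirectional rainbow spectrum for R is omitted.

```python
def clamp(t, lo=0.0, hi=1.0):
    """Restrict t to the interval [lo, hi]."""
    return max(lo, min(hi, t))

def green_black_red(value, scale=1.0):
    """Map a signed value to RGB: negative -> green, zero -> black, positive -> red.

    `scale` (hypothetical) is the absolute value at which the color saturates.
    """
    level = round(255 * clamp(abs(value) / scale))
    return (level, 0, 0) if value > 0 else (0, level, 0)

def blue_white_red(r):
    """Map a correlation in [-1, 1] to RGB: -1 -> blue, 0 -> white, +1 -> red."""
    fade = round(255 * (1 - clamp(abs(r))))
    return (255, fade, fade) if r >= 0 else (fade, fade, 255)

# Example: color one raw data entry and one correlation entry.
x_color = green_black_red(-2.0, scale=2.0)  # strongly down-regulated entry
c_color = blue_white_red(0.0)               # uncorrelated pair of arrays
```

Applying such a function to every entry of X or C yields the grid of colored dots that makes up a matrix map.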

Data Transformation

It may be necessary to apply transformations such as log, standardization (zero mean, unit variance), or normalization (normal score transformation) to the raw data before the data map is constructed or the proximity matrices are calculated, in order to get a meaningful visual representation of the data structure or comparable visual effects between displays. The transformation-visualization process may have to be repeated several times before the embedded information can be fully explored.
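The three transformations named above can be sketched with the Python standard library. Several details here are assumptions made for illustration: the logarithm base is left as a required argument, standardization uses the population standard deviation, and the normal score transformation uses the (rank - 0.5)/n plotting position without tie handling.

```python
import math
from statistics import NormalDist, mean, pstdev

def log_transform(values, base):
    """Logarithm of each positive value in the given base."""
    return [math.log(v, base) for v in values]

def standardize(values):
    """Shift and scale to zero mean and unit (population) variance."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def normal_scores(values):
    """Replace each value by the standard normal quantile of its rank.

    Uses the (rank - 0.5) / n plotting position; ties are not averaged.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = NormalDist().inv_cdf((rank - 0.5) / n)
    return scores

# Illustrative expression ratios; base 2 is chosen only for this example.
ratios = [0.5, 1.0, 2.0, 4.0]
logged = log_transform(ratios, 2.0)
z = standardize(logged)
ns = normal_scores(ratios)
```

Whether such a transformation is applied per row, per column, or to the whole matrix changes what the resulting map emphasizes, which is one reason the transformation-visualization cycle may need repeating.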

Selection of Proximity Measures

Proximity matrices have two major functions: (1) to serve as the direct visual representation of the relationships among variables and between samples; (2) to serve as the medium used to reorder the variables and samples for better visualization of the three matrix maps. The selection of proximity measures in matrix visualization plays a more important role than it does in numerical or modeling analyses. Pearson correlation often serves as the measure of proximity between variables, while Euclidean distance is commonly employed for samples (Fig. . ). For potential nonlinear relationships, Spearman's rank correlation and Kendall's tau coefficient can be used instead of the Pearson correlation to assess the between-variable relationship, while some nonlinear feature extraction methods such as the Isomap (Tenenbaum et al., ) distance can be used to measure nonlinear between-sample distances. More sophisticated kernel methods can also be applied when users see the need for them.
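To illustrate why a rank-based measure may be preferred for nonlinear relationships, the sketch below compares Pearson correlation with Spearman's rank correlation and Kendall's tau on a monotone but nonlinear pair of variables. The implementations are simplified assumptions for illustration (tau-a without tie correction, quadratic-time pair counting), and the data are made up.

```python
import math

def pearson(u, v):
    """Pearson correlation between two equal-length, non-constant vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result

def spearman(u, v):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(u), ranks(v))

def kendall_tau(u, v):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(u)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (u[i] - u[j]) * (v[i] - v[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

# A monotone but nonlinear relationship: v is the cube of u.
u = [1.0, 2.0, 3.0, 4.0, 5.0]
v = [a ** 3 for a in u]
r_pearson = pearson(u, v)
r_spearman = spearman(u, v)
r_kendall = kendall_tau(u, v)
```

On these data both rank-based measures reach their maximum because the relationship is perfectly monotone, whereas the Pearson correlation stays below 1 because the relationship is not linear.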
