15.3.1 Presentation of the Raw Data Matrix
The first step in the matrix visualization of continuous data is the production of
a raw data matrix X, and two corresponding proximity matrices for the rows,
R, and the columns, C, which are calculated with user-specified similarity
(or dissimilarity) measures. The three matrices are then projected through suitable
color spectra to construct corresponding matrix maps in which each matrix entry
(raw data or proximity measurement) is represented by a color dot. The left panel in
Fig. . shows the raw data matrix of log-transformed ratios of expressions coded
by a bidirectional green-black-red spectrum for Dataset , with Pearson correlations
for between-array relations coded by a bidirectional blue-white-red spectrum, and
Euclidean distances for between-gene relations coded by a unidirectional rainbow
spectrum.
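
The construction of the two proximity matrices can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `proximity_matrices` and the synthetic 6 x 4 matrix are assumptions, and the measures are fixed to the ones used in the figure (Euclidean distance for rows, Pearson correlation for columns) rather than being user-specified.

```python
import numpy as np

def proximity_matrices(X):
    """Return (R, C) for a raw data matrix X (rows = genes, columns = arrays).

    R holds Euclidean distances between rows; C holds Pearson
    correlations between columns. Both choices are illustrative.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise row differences via broadcasting, shape (n, n, p).
    diff = X[:, None, :] - X[None, :, :]
    R = np.sqrt((diff ** 2).sum(axis=-1))
    # rowvar=False treats each column of X as one variable.
    C = np.corrcoef(X, rowvar=False)
    return R, C

# Synthetic stand-in for a small expression matrix (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
R, C = proximity_matrices(X)
print(R.shape, C.shape)
```

Both matrices are symmetric, so each can be displayed as a square map next to the map for X.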
In the raw data matrix map, a red (green) dot in the ijth position of the map
for X means that the ith gene at the jth array is relatively up (down)-regulated.
A black dot stands for a relatively nondifferentially expressed gene/array
combination. A red (blue) point in the ijth position of the C matrix map represents
a positive (negative) correlation between arrays i and j. Darker (lighter) intensities
of color stand for stronger (weaker) absolute correlation coefficients, while white
dots represent no correlation. A blue (red) point in the ijth position of the R matrix
map represents a relatively small (large) distance between genes i and j, while
a yellow dot represents a median distance.
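
The bidirectional green-black-red coding of the raw data map described above can be sketched as a mapping from values to RGB triples. This is one plausible linear scheme under stated assumptions: the function name `green_black_red` and the symmetric scaling by the largest absolute value (`vmax`) are illustrative choices, not details given in the text.

```python
import numpy as np

def green_black_red(X, vmax=None):
    """Map values to RGB: negative -> green, zero -> black, positive -> red.

    vmax is the absolute value mapped to full intensity; by default
    the largest absolute entry of X (an assumed convention).
    """
    X = np.asarray(X, dtype=float)
    if vmax is None:
        vmax = np.abs(X).max() or 1.0   # avoid division by zero for all-zero input
    scaled = np.clip(X / vmax, -1.0, 1.0)
    rgb = np.zeros(X.shape + (3,))
    rgb[..., 0] = np.clip(scaled, 0.0, None)    # red channel: up-regulated
    rgb[..., 1] = np.clip(-scaled, 0.0, None)   # green channel: down-regulated
    return rgb                                  # blue channel stays 0

rgb = green_black_red([[2.0, 0.0, -1.0]])
print(rgb[0])
```

The resulting array has shape (rows, columns, 3) and can be passed to any image display routine that accepts RGB values in [0, 1].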
Data Transformation
It may be necessary to apply transformations such as log, standardization (zero mean,
unit variance), or normalization (normal score transformation) to the raw data
before the data map is constructed or proximity matrices calculated in order to get
a meaningful visual representation of the data structure, or comparable visual effects
between displays. The transformation-visualization process may have to be repeated
several times before the embedded information can be fully explored.
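
Two of the transformations named above can be sketched as follows. The helper names `standardize` and `normal_scores` are hypothetical, and several details are assumptions made for illustration: standardization is applied per column with the sample standard deviation, the normal scores use the plotting positions (rank - 0.5) / n, and ties are broken by order of appearance.

```python
import numpy as np
from statistics import NormalDist

def standardize(X, axis=0):
    """Shift and scale to zero mean and unit (sample) variance along axis."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=axis, keepdims=True)
    std = X.std(axis=axis, ddof=1, keepdims=True)
    return (X - mean) / std

def normal_scores(x):
    """Replace each value by the standard normal quantile of its rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1   # ordinal ranks 1..n, no tie averaging
    probs = (ranks - 0.5) / len(x)          # assumed plotting positions
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(p) for p in probs])

rng = np.random.default_rng(1)
X = rng.lognormal(size=(8, 3))      # skewed, positive synthetic data
Z = standardize(np.log(X))          # log first, then standardize each column
scores = normal_scores([3.0, 1.0, 2.0])
print(Z.mean(axis=0), scores)
```

Because the transformation step may be repeated several times, keeping each transformation as a small function makes it easy to recompute the maps after every change.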
Selection of Proximity Measures
Proximity matrices have two major functions: (1) to serve as the direct visual
representation of the relationships among variables and between samples; (2) to serve
as the medium used to reorder the variables and samples for better visualization of
the three matrix maps. The selection of proximity measures in matrix visualization
plays a more important role than it does in numerical or modeling analyses.
Pearson correlation often serves as the measure of proximity between variables, while
Euclidean distance is commonly employed for samples (Fig. . ). For potential
nonlinear relationships, Spearman's rank correlation and Kendall's tau coefficient can be
used instead of the Pearson correlation to assess the between-variable relationship,
while some nonlinear feature extraction methods such as the Isomap (Tenenbaum
et al., ) distance can be used to measure nonlinear between-sample distances.
More sophisticated kernel methods can also be applied when users see the need for
them.
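
The effect of switching from Pearson correlation to a rank-based measure can be illustrated on a monotone but nonlinear relationship. This is a minimal sketch using only NumPy: the helpers `spearman` and `kendall_tau` are hypothetical names, neither handles ties (the tau computed is the tau-a variant), and the exponential test data are synthetic.

```python
import numpy as np

def ordinal_ranks(x):
    """Ranks 0..n-1 of x; ties are not averaged (assumes distinct values)."""
    return np.argsort(np.argsort(x))

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return np.corrcoef(ordinal_ranks(x), ordinal_ranks(y))[0, 1]

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    # The sum runs over ordered pairs, so each unordered pair counts twice.
    return (sx * sy).sum() / (n * (n - 1))

x = np.linspace(0.0, 5.0, 20)
y = np.exp(x)                        # monotone, strongly nonlinear in x
pearson = np.corrcoef(x, y)[0, 1]
print(pearson, spearman(x, y), kendall_tau(x, y))
```

For these data both rank-based measures report a perfect monotone association, while the Pearson coefficient falls short of 1 because the relationship is not linear.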