Figure . . Schematic illustration of the relative efficiencies of the scatterplot matrix, the parallel
coordinates plot, and matrix visualization, for varying numbers of dimensions
combinations for the missing structure can be accessed visually. MV users can benefit
from a simple visual perception of the mechanism associated with the missing
observations (random or not, ignorable or unignorable) before formal statistical modeling
of the missing values is implemented.
Matrix Visualization of Binary Data
15.7
While scatterplots, PCP, and MV displays have their own advantages and
disadvantages for continuous data structures of various dimensions, an MV display is the only
statistical graph that can meaningfully display binary data sets over all dimensions.
We use the KEGG (Kyoto Encyclopedia of Genes and Genomes) metabolism
pathways (http://www.genome.jp/kegg/pathway.html) for Saccharomyces cerevisiae yeast
to illustrate how an MV display can be generalized to visually extract all of the
important information embedded in multivariate binary data. The KEGG website provides
detailed information on the related genes involved in metabolism pathways
in Saccharomyces cerevisiae yeast. We simplified the complex information structure
down to a two-way binary data matrix of genes by pathways. This binary
data matrix is called Dataset in our study. A one (zero) encoded at the ith row and
jth column of the matrix means that the ith gene is (not) involved in the activities
of the jth pathway.
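The encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene and pathway names are hypothetical placeholders rather than KEGG entries, and the function name build_binary_matrix is ours, not part of the original study.

```python
# Hypothetical gene-to-pathway annotations (placeholder names, not KEGG data).
annotations = {
    "geneA": {"pathway1", "pathway2"},
    "geneB": {"pathway2"},
    "geneC": set(),  # a gene involved in none of the listed pathways
}

def build_binary_matrix(annotations):
    """Return (genes, pathways, matrix) for a genes-by-pathways binary matrix.

    matrix[i][j] is 1 if the ith gene is involved in the jth pathway, else 0.
    """
    genes = sorted(annotations)
    # Collect every pathway mentioned by at least one gene.
    pathways = sorted(set().union(*annotations.values()))
    matrix = [
        [1 if pathway in annotations[gene] else 0 for pathway in pathways]
        for gene in genes
    ]
    return genes, pathways, matrix

genes, pathways, matrix = build_binary_matrix(annotations)
for gene, row in zip(genes, matrix):
    print(gene, row)
```

Rows and columns are sorted here only to make the layout reproducible; an MV display would normally permute them afterwards to reveal structure.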
Similarity Measure for Binary Data
15.7.1
The usual measures used to evaluate associations between samples and variables for
continuous data (Euclidean distance and correlation coefficients) cannot be applied
directly to binary data sets. Two issues are noted here in relation to the selection of
similarity measures for binary data in an MV display.
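As a rough illustration of why the choice of measure matters for binary data, the sketch below compares two well-known coefficients on a pair of sparse binary profiles. The text at this point does not name a specific measure, so the use of the Jaccard and simple matching coefficients is an assumption made for illustration, as are the example vectors and the convention of returning 0.0 when both profiles are all zeros.

```python
def jaccard(x, y):
    """Jaccard coefficient: shared ones divided by positions where either is one."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    # Assumed convention: two all-zero profiles get similarity 0.0.
    return both / either if either else 0.0

def simple_matching(x, y):
    """Simple matching coefficient: fraction of positions that agree (ones or zeros)."""
    agree = sum(1 for a, b in zip(x, y) if a == b)
    return agree / len(x)

# Two hypothetical genes, each involved in two of eight pathways,
# sharing only the first pathway.
gene_1 = [1, 1, 0, 0, 0, 0, 0, 0]
gene_2 = [1, 0, 1, 0, 0, 0, 0, 0]

print("Jaccard:", jaccard(gene_1, gene_2))
print("Simple matching:", simple_matching(gene_1, gene_2))
```

The simple matching coefficient counts joint absences as agreement, so sparse profiles look similar even when they share few pathways; the Jaccard coefficient ignores joint absences. Which behavior is appropriate depends on whether a zero carries information in the data at hand.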