problem to an image provides not only a standardization or an aggregation of data, but also a way of seeing through the messiness (this was often the way my informants described it) of biology to the hidden patterns in the data.
One of the most common images seen in contemporary biology papers is the heat map.44 A typical heat map, like the one shown in figure 6.5, shows the correspondence between phenotypic or disease data from patients and the levels of particular proteins reported in bioassays of the same patients' cells. Here, the labels across the top correspond to cell samples taken from patients with cancer, while the labels on the left identify the particular proteins measured within those cells. Squares are shaded lighter where the level of a protein is elevated above normal in a particular patient, darker where the level is depressed, and black if there is no measurable difference. Both the patients and the proteins are ordered using a "clustering algorithm": software that takes a multidimensional set of measurements and clusters them into a tree (shown at the far top and far left of the heat map) according to how similar they are to one another.45 Heat maps summarize a vast amount of information; their convenience and popularity stem from their ability to provide a quick visual representation of multidimensional data.
Figure 6.5, for instance, presents the results of at least 578 (17 × 34) distinct measurements.46
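The row and column ordering described above can be sketched as follows. This is a minimal illustration, not the software behind figure 6.5: it assumes naive agglomerative clustering with average linkage and Euclidean distance (correlation-based similarity is another common choice), and the function names and the small protein-by-patient matrix are hypothetical. Rows and columns are clustered independently, and the leaf order of each resulting tree becomes the display order.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length measurement vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_tree(vectors):
    """Naive average-linkage agglomerative clustering (a hypothetical helper).

    Returns (tree, order): `tree` is a nested tuple recording the merges
    (the dendrogram drawn beside a heat map) and `order` lists the vector
    indices as they appear left to right along that tree.
    """
    if not vectors:
        return None, []
    # Precompute distances between every pair of original vectors.
    dist = {}
    for i, j in combinations(range(len(vectors)), 2):
        dist[i, j] = dist[j, i] = euclidean(vectors[i], vectors[j])

    def linkage(members_a, members_b):
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(dist[i, j] for i in members_a for j in members_b)
        return total / (len(members_a) * len(members_b))

    # Each cluster is (subtree, member indices); start with singletons.
    clusters = [(i, [i]) for i in range(len(vectors))]
    while len(clusters) > 1:
        # Merge the two clusters that are most similar to each other.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: linkage(clusters[pq[0]][1], clusters[pq[1]][1]),
        )
        (tree_p, members_p), (tree_q, members_q) = clusters[p], clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(((tree_p, tree_q), members_p + members_q))
    return clusters[0]


def heat_map_order(matrix):
    """Cluster rows and columns independently; return their display orders."""
    _, row_order = cluster_tree(matrix)
    _, col_order = cluster_tree([list(col) for col in zip(*matrix)])
    return row_order, col_order


# Hypothetical data: rows are proteins, columns are patient samples.
levels = [
    [2.0, -1.0, 1.9, -1.1],   # protein A
    [-1.0, 2.1, -0.9, 2.0],   # protein B
    [2.1, -0.9, 2.0, -1.0],   # protein C
    [-1.1, 1.9, -1.0, 2.1],   # protein D
]
rows, cols = heat_map_order(levels)
print(rows, cols)  # [0, 2, 1, 3] [0, 2, 1, 3]
```

Proteins A and C (and B and D) end up adjacent because their profiles are nearly identical, which is what produces the visible blocks in a clustered heat map. The cubic-time merge loop is only for readability; real tools use far more efficient linkage algorithms. Note also that the left-right order at each merge is arbitrary, so the same tree can be drawn in several equally valid orders.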
Heat maps are often used for DNA microarray data, where they make it possible to display simultaneously the effects of multiple conditions (drugs, cell types, environmental conditions) on many genes (all the genes spotted onto the array). Indeed, heat maps have become an integral part of the presentation and understanding of microarray data. In one of their early papers, the developers of microarrays discussed the problems raised by the large data volumes produced by the arrays:
Although various clustering methods can usefully organize tables of gene expression measurements, the resulting ordered but still massive collection of numbers remains difficult to assimilate. Therefore, we always combine clustering methods with a graphical representation of the primary data by representing each data point with a color that quantitatively and qualitatively reflects the original experimental observations. The end product is a representation of complex gene expression data that, through statistical organization and graphical display, allows biologists to assimilate and explore the data in a natural intuitive manner.47
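The idea of "representing each data point with a color that quantitatively and qualitatively reflects" the observation can be illustrated with a small sketch. It assumes the values are log-ratios against a reference, so that zero means no change, and it uses a red (above reference) / green (below) / black (no change) scheme that is common for microarray displays; the function names and the saturation cutoff of 3.0 are hypothetical choices, not taken from the quoted paper. The hue carries the qualitative information (up or down) and the brightness the quantitative information (how much).

```python
def value_to_rgb(value, saturation=3.0):
    """Map a signed measurement to an (R, G, B) color with channels 0-255.

    Values above the reference (positive) shade toward red, values below it
    (negative) toward green, and values near zero stay close to black.
    Magnitudes at or beyond `saturation` receive the brightest color.
    """
    if saturation <= 0:
        raise ValueError("saturation must be positive")
    # Clamp to [-1, 1] so outliers do not overflow the color scale.
    scaled = max(-1.0, min(1.0, value / saturation))
    intensity = round(abs(scaled) * 255)
    return (intensity, 0, 0) if scaled >= 0 else (0, intensity, 0)


def matrix_to_colors(matrix, saturation=3.0):
    """Color every data point of a (rows x columns) matrix."""
    return [[value_to_rgb(v, saturation) for v in row] for row in matrix]


# Hypothetical log-ratios for two genes under three conditions.
ratios = [[0.0, 1.5, -3.0], [4.2, -0.3, 0.6]]
for row in matrix_to_colors(ratios):
    print(row)
```

Clamping at a saturation value is a design trade-off: it keeps a few extreme measurements from washing out the contrast among the many moderate ones, at the cost of making all values beyond the cutoff look identical.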