Biomedical Engineering Reference
through the window of transcriptional activity (see also this volume, Part III,
chapter 2.1, by Huang, Sultan, and Ingber). As is the case with other nascent
high-throughput technologies (such as protein arrays and single nucleotide
polymorphism profiling), completing a DNA microarray experiment requires a
concerted effort between the data producers and the data analysts. In this sense,
the wet lab protocols and the methods used to analyze the resulting data are two
faces of the same coin, and both are needed for this emerging technology to reach
its full potential.
Gene expression array experiments present us with vast amounts of data
containing substantial biological information. To extract the most from these
data, a considerable variety of statistical and algorithmic approaches has been
developed. Each of these analyses, if sufficiently different from the others, can
provide important biological clues. In this sense, gene expression data have
become a paradise for statisticians. Evidence of this is the exponential growth
in the number of publications on gene expression analysis during the last few
years (1). Many useful internet resources for gene expression data, links, and
literature have flourished (see, e.g., (2-6) and the links therein) to support
the growing needs of the field.
There are many types of questions that can be explored with microarray
experiments. Some of the common themes in DNA array data analysis, including
gene selection, clustering of genes with similar expression profiles, class
prediction, and pathway inference, have recently been reviewed in (1,7). In this
chapter we concentrate on supervised, statistically based methods to identify
genes that differ between two classes of tissue, a problem known as gene or
feature selection. Among these methods, we shall discuss univariate and
multivariate techniques for gene selection. In the former, genes are selected on
the basis of their individual ability to separate two or more classes of tissue
(typically cases and controls). In the latter, we consider the differential
behavior of groups of genes across distinct tissue types. Both types of analysis
yield important information and allow rich interpretations. For example, a gene
that univariate methods identify as strongly transcribed in cancer patients
compared to control subjects could be a good candidate oncogene, or its
overexpression could be the result of chromosomal instabilities that produced
more copies of that gene. A multivariate methodology, by contrast, can reveal
subtler changes in ensembles of genes that work in coordination, as may be the
case in a pathway that is deregulated in transformed cells. Multivariate analyses
are better aligned with a complex-systems perspective of the data, in that they
explore interactions between genes rather than genes in isolation. After all, it
is the coordinated activity of interacting genes that gives the cellular
environment its complex behaviors.
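The contrast between univariate and multivariate selection can be illustrated with a small, self-contained Python sketch. It is not the specific method discussed in this chapter. As assumptions, it uses Welch's t-statistic as the univariate score and a ridge-regularized Fisher linear discriminant as the multivariate score for a pair of genes. The gene names GENE_A and GENE_B, the toy expression values, and the helper functions (welch_t, rank_genes, fisher_direction_2d, project) are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(cases, controls):
    """Welch's t-statistic for one score measured in two classes."""
    se = sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    return (mean(cases) - mean(controls)) / se


def rank_genes(cases, controls):
    """Univariate selection: score each gene alone, rank by |t| (largest first).

    `cases` and `controls` map gene name -> list of expression values.
    """
    scores = {gene: welch_t(cases[gene], controls[gene]) for gene in cases}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


def _mean_and_scatter(points):
    """Class mean and scatter-matrix entries (sxx, sxy, syy) of (x, y) pairs."""
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return (mx, my), (sxx, sxy, syy)


def fisher_direction_2d(case_pairs, control_pairs, ridge=1e-3):
    """Multivariate score for a pair of genes: ridge-regularized Fisher LDA.

    Returns w = (S_w + ridge * I)^-1 (mu_cases - mu_controls), where S_w is
    the pooled within-class scatter matrix of the (gene1, gene2) pairs.
    """
    (ax, ay), (a_xx, a_xy, a_yy) = _mean_and_scatter(case_pairs)
    (bx, by), (b_xx, b_xy, b_yy) = _mean_and_scatter(control_pairs)
    sxx = a_xx + b_xx + ridge
    syy = a_yy + b_yy + ridge
    sxy = a_xy + b_xy
    det = sxx * syy - sxy * sxy
    dx, dy = ax - bx, ay - by
    # Closed-form inverse of the symmetric 2x2 matrix, applied to (dx, dy).
    return ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)


def project(pairs, w):
    """Collapse each (gene1, gene2) sample to a single discriminant score."""
    return [w[0] * x + w[1] * y for x, y in pairs]


# Hypothetical toy data: a shared baseline varies widely across samples and
# hides a small, opposite shift of GENE_A and GENE_B between the two classes.
baseline = [0.0, 10.0, 20.0, 30.0]
noise = [0.2, -0.1, 0.1, -0.2]
cases = {
    "GENE_A": [b + 1 for b in baseline],
    "GENE_B": [b + e for b, e in zip(baseline, noise)],
}
controls = {
    "GENE_A": list(baseline),
    "GENE_B": [b + 1 + e for b, e in zip(baseline, noise)],
}

# Univariate view: each gene judged on its own merits.
for gene, t in rank_genes(cases, controls):
    print(f"{gene}: t = {t:.2f}")

# Multivariate view: the two genes judged jointly.
case_pairs = list(zip(cases["GENE_A"], cases["GENE_B"]))
control_pairs = list(zip(controls["GENE_A"], controls["GENE_B"]))
w = fisher_direction_2d(case_pairs, control_pairs)
t_joint = welch_t(project(case_pairs, w), project(control_pairs, w))
print(f"GENE_A + GENE_B jointly: t = {t_joint:.2f}")
```

In this toy example each gene taken alone has |t| well below 1, whereas the joint discriminant score separates cases from controls with |t| above 10. This mirrors the point made above that coordinated changes in a group of genes can escape a gene-by-gene screen. The small ridge term is a design choice that keeps the 2x2 inversion stable when the two genes are strongly correlated within each class.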
Once we have selected genes that are differentially expressed between cases and
controls, we still need to establish the validity of the results. Two methods
are typically used to provide some degree of validation for the selected genes. In