High-Throughput Analysis of Microdissected Tissue Samples - High-Throughput Image Reconstruction and Analysis


the FDR at 10% would allow for 10% of declared differentially expressed genes

to be falsely identified.
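
As a rough illustration of what controlling the FDR at 10% means in practice, the sketch below applies the Benjamini-Hochberg step-up procedure, which is one common way to control the FDR. The text does not say which procedure is intended, so this choice is an assumption, and the function name benjamini_hochberg and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return indices of hypotheses declared significant at the given FDR."""
    m = len(p_values)
    # Sort gene indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Declare the cutoff_rank smallest p-values significant.
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for ten genes (illustrative only, not real data).
p_values = [0.001, 0.008, 0.012, 0.030, 0.045, 0.20, 0.35, 0.50, 0.70, 0.90]
selected = benjamini_hochberg(p_values, fdr=0.10)
print(selected)
```

With these illustrative values the five smallest p-values are declared significant, and on average about 10% of genes declared in this way would be expected to be false discoveries.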

11.6.5 Microarray Analysis

One of the major areas of focus among the bioinformatics community is the anal-

ysis of microarray (chip) data, where thousands of data points are generated in a

single experiment, either for mRNA expression (for example, oligonucleotide

expression arrays) or for DNA deletion studies (for example, SNP chips) [114]. Although a

thorough discussion of this topic is beyond the scope of this chapter, a few

key issues are worth raising. In general, there are two types of analytic

problems: (1) class comparison and (2) class prediction.

Class comparison involves comparing the high dimensional gene expression

profiles across groups or conditions (for example, high versus low grade tumors,

responsive versus resistant diseases, normal versus tumor cells, stromal versus

epithelial cells). One approach is to compare genes one by one in order to identify

overexpressed mRNAs. Here, the problems of multiple comparisons that were dis-

cussed above for qRT-PCR are magnified since thousands of genes are compared

across groups rather than only dozens of genes. It therefore becomes essential

to limit false positive results using a multiple-comparison procedure, such as

controlling the false discovery rate (FDR). One approach

that is commonly used is to do all testing at a 0.001 significance level (i.e., a gene

is declared differentially expressed when its t-test p-value is below 0.001).

A gene list is

then constructed from the genes that differ significantly between groups,

and a false discovery rate can be computed. An overall test of whether the pat-

terns in gene expression are different between groups can be constructed using a

permutation test. Specifically, we can scramble the class labels (group identifiers)

and redo the analysis many times (say, 5,000). The p-value for a test of whether

the gene expression profiles are different across groups can be computed as the

proportion of scrambled datasets in which the number of "significant" genes is at least

as large as the number of significant genes in the original data. Visually, these gene expression patterns can

be compared across groups by multidimensional scaling (MDS). MDS compresses

differences between sample expression profiles into three eigenvectors for plotting

in three-dimensional space.
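
The label-scrambling test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a fixed cutoff on the absolute Welch t statistic as a stand-in for the p-value threshold of 0.001, only 200 scrambles instead of 5,000 to keep the run short, and simulated expression data. All function names, parameters, and data are hypothetical.

```python
import random
import statistics


def welch_t(a, b):
    """Welch two-sample t statistic for two lists of expression values."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def count_significant(expr, labels, t_cutoff):
    """Count genes whose |t| exceeds t_cutoff when comparing label 1 with label 0."""
    count = 0
    for gene in expr:  # expr holds one list of per-sample values for each gene
        g1 = [v for v, lab in zip(gene, labels) if lab == 1]
        g0 = [v for v, lab in zip(gene, labels) if lab == 0]
        if abs(welch_t(g1, g0)) > t_cutoff:
            count += 1
    return count


def global_permutation_p(expr, labels, t_cutoff=4.0, n_perm=200, seed=0):
    """Proportion of label scrambles giving at least as many significant genes."""
    rng = random.Random(seed)
    observed = count_significant(expr, labels, t_cutoff)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # scramble the class labels across samples
        if count_significant(expr, shuffled, t_cutoff) >= observed:
            hits += 1
    return observed, hits / n_perm


# Simulated data (assumption): 50 genes, 8 samples per group,
# with the first 10 genes shifted upward in group 1.
data_rng = random.Random(1)
labels = [0] * 8 + [1] * 8
expr = []
for g in range(50):
    shift = 3.0 if g < 10 else 0.0
    expr.append([data_rng.gauss(shift * lab, 1.0) for lab in labels])

observed, p_value = global_permutation_p(expr, labels)
print(observed, p_value)
```

Because the whole gene-by-gene analysis is repeated on every scramble, the resulting p-value reflects the entire selection procedure rather than any single gene.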

Class prediction involves developing a predictor of disease outcome from

high-dimensional microarray data. There are many methods for developing a

class predictor including discriminant analysis, logistic regression, neural network

methodology, and classification trees (see [114] for a comparison of approaches).

It is particularly important to emphasize that any predictive model needs to be

validated on a completely independent dataset. Validating a predictive model on

the same dataset on which the model was developed can mask over-fitting and yield

an overoptimistic assessment of the quality of the predictive model. One approach

that is commonly used is to split the dataset into a training set, in which the predictive

model is developed, and a test set, in which the predictive model is validated.

This can be done by splitting the data in two and fitting the predictive models

in the first half of the data and evaluating the accuracy of the predictions in the

second half.
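
The split-half validation just described can be sketched as below. The nearest-centroid classifier is only a simple stand-in for the predictors listed earlier (it is not one of the methods named in the text), and the function names and simulated data are hypothetical. The point of the sketch is that the model sees the training half only, and accuracy is measured on the held-out half.

```python
import random


def centroid(rows):
    """Mean expression profile (one value per gene) of a list of samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_nearest_centroid(samples, labels):
    """Store one mean profile per class, computed from the training samples only."""
    return {c: centroid([s for s, lab in zip(samples, labels) if lab == c])
            for c in set(labels)}


def predict(model, sample):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((x - m) ** 2 for x, m in zip(sample, model[c])))


def split_half_accuracy(samples, labels, seed=0):
    """Fit on a random half of the samples and score on the held-out half."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    model = fit_nearest_centroid([samples[i] for i in train],
                                 [labels[i] for i in train])
    correct = sum(predict(model, samples[i]) == labels[i] for i in test)
    return correct / len(test)


# Simulated data (assumption): 40 samples, 20 genes,
# with the first 5 genes shifted upward in class 1.
data_rng = random.Random(2)
labels = [i % 2 for i in range(40)]
samples = [[data_rng.gauss(2.0 * lab if g < 5 else 0.0, 1.0) for g in range(20)]
           for lab in labels]

accuracy = split_half_accuracy(samples, labels)
print(accuracy)
```

Scoring the same model on the training half instead would typically give a higher, overoptimistic accuracy, which is the problem the split is meant to avoid.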
