Statistical Issues on the Microarray Data Analysis
To determine the embryo sac transcriptome, we used coa and wild-type pistil
samples (late 11 to late 12 floral stages [39]) in three biological replicates, and
followed the Affymetrix standard procedures from cRNA synthesis to hybridization
on the chip. Raw microarray data from the coa and wild-type samples in
triplicate were retrieved after scanning the Arabidopsis ATH1 'whole genome'
chips, which represent 24,000 annotated genes, and were subjected to statistical
analyses. The normalized data were examined for their quality using cluster
analysis [40]. There was strong positive correlation between samples within the
three replicates of wild-type and coa (Pearson coefficients: r = 0.967 for wild-type
and r = 0.973 for coa). Therefore, the data were considered to be of good
quality for further analyses. It was necessary to ensure that the arrays of both the
wild type and coa did not differ in RNA quality and hybridization efficiency. The
hybridization signal intensities of internal control gene probes were not significantly
altered across the analysed arrays, hence assuring the reliability of the results
(data not shown). The quality of data for the spl mutant and wild-type microarray
was described previously [34]. Subsequently, differentially expressed genes were
identified using three independent microarray data analysis software packages.
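
The replicate check described above can be sketched as follows. This is a minimal illustration only, not the procedure or software used in the study: the function names and the intensity values are hypothetical, and the sketch simply computes the Pearson coefficient for every pair of replicates within one genotype.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def replicate_correlations(replicates):
    """Map each pair of replicate names to its Pearson r."""
    return {
        (a, b): pearson(replicates[a], replicates[b])
        for a, b in combinations(sorted(replicates), 2)
    }


# Hypothetical normalized log2 intensities for five probe sets
# (illustrative values, not data from the study).
wild_type = {
    "wt_rep1": [7.1, 9.4, 5.2, 11.0, 8.3],
    "wt_rep2": [7.3, 9.1, 5.0, 10.8, 8.6],
    "wt_rep3": [6.9, 9.6, 5.5, 11.2, 8.1],
}

for pair, r in replicate_correlations(wild_type).items():
    print(pair, round(r, 3))
```

Coefficients close to 1 for every pair would indicate that the replicates agree, which is the criterion the text applies before proceeding to further analyses.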
To identify genes that are expressed in the female gametophyte, we subtracted
the transcriptomes of coa or spl from the corresponding wild type. Genes that
were identified as being upregulated in wild-type gynoecia are candidates for
female gametophytic expression, and genes highly expressed in coa and spl are
probable candidates for gain-of-expression in the sporophyte of these mutants.
However, this comparison was not straightforward because we were not in a position
to compare the mere four cell types of the mature embryo sac with the same
number of sporophytic cells. Whether using whole pistils or isolated ovules, a
large excess of sporophytic cells surrounds the embryo sac. The contaminating
cells originate from the ovule tissues such as endothelium, integuments and funiculus,
or from the tissues surrounding the ovules such as stigma, style, transmitting tract,
placenta, carpel wall and replum. Therefore, we anticipated that the transcript
subtraction for embryo sac expression would suffer from high experimental noise.
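
The subtraction logic described above can be sketched as a simple labelling step. This is an illustration under stated assumptions, not the analysis performed in the study: the function name, the gene identifiers, the intensity values and the two-fold threshold (min_log2_ratio = 1.0) are all hypothetical, and no significance test is applied here.

```python
from math import log2


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def classify(wild_type, mutant, min_log2_ratio=1.0):
    """Label one gene from replicate intensities on the linear scale.

    The threshold is a hypothetical choice made for illustration only.
    """
    ratio = log2(mean(wild_type) / mean(mutant))
    if ratio >= min_log2_ratio:
        # Higher in wild type: lost when the embryo sac is absent.
        return "candidate gametophytic"
    if ratio <= -min_log2_ratio:
        # Higher in the mutant: gain of expression in the sporophyte.
        return "candidate sporophytic gain"
    return "unchanged"


# Hypothetical genes with three wild-type and three mutant replicates.
genes = {
    "gene_A": ([820, 790, 845], [200, 190, 215]),
    "gene_B": ([150, 160, 140], [610, 640, 590]),
    "gene_C": ([500, 520, 480], [510, 490, 505]),
}

for name, (wt, mut) in genes.items():
    print(name, classify(wt, mut))
```

Because the embryo sac contributes only a small share of the sampled tissue, a real gametophytic signal may fall below any fixed ratio threshold, which is the source of noise the text anticipates.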
We examined the log transformed data points from the coa and spl datasets (with
their corresponding wild-type data) in volcano plots. This procedure allowed us to
visualize the trade-offs between the fold change and the statistical significance. As
we anticipated, the data points from the sporophytic gain outnumbered the embryo
sac transcriptome data points on a high-stringency scale (data not shown).
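
The coordinates of a volcano plot can be sketched as follows. This is a minimal illustration, assuming that a p-value for each gene has already been obtained from some statistical test; the function names, the cutoffs and the input values are hypothetical and are not taken from the study.

```python
from math import log10, log2


def volcano_point(mean_wild_type, mean_mutant, p_value):
    """Return (x, y) = (log2 fold change, -log10 p) for one gene."""
    if mean_wild_type <= 0 or mean_mutant <= 0:
        raise ValueError("mean intensities must be positive")
    if not 0 < p_value <= 1:
        raise ValueError("p_value must lie in (0, 1]")
    return log2(mean_wild_type / mean_mutant), -log10(p_value)


def passes(point, min_abs_log2_fc=1.0, max_p=0.05):
    """True if a point clears both the fold-change and significance cutoffs.

    Both cutoffs are hypothetical defaults chosen for illustration.
    """
    x, y = point
    return abs(x) >= min_abs_log2_fc and y >= -log10(max_p)


# Hypothetical inputs: (wild-type mean, mutant mean, p-value).
examples = {
    "large change, significant": (800.0, 200.0, 0.001),
    "small change, significant": (300.0, 200.0, 0.001),
    "large change, not significant": (800.0, 200.0, 0.2),
}

for label, args in examples.items():
    point = volcano_point(*args)
    print(label, point, passes(point))
```

Points with positive x are higher in wild type and points with negative x are higher in the mutant, so the two sides of the plot correspond to the embryo sac candidates and the sporophytic gain, respectively. Raising either cutoff is what the text refers to as a higher stringency.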
This problem of dilution in our data for embryo sac gene discovery was more
pronounced in the coa dataset than in the spl dataset, because we did not dissect out
the ovules from the carpel. Therefore, we made the following decisions in analyzing
the gametophytic data: to use advanced statistical packages that use different