Summarization to probe set level aggregates the intensities of multiple probes
that query the same transcript into a single number. This can be achieved by simply
taking the median or the average of the log2-values. However, other approaches try
to combine information from multiple samples to correct for systematic trends of the
different probes that may reflect probe affinity (RMA) or to achieve greater
robustness by giving non-equal weights to the different probes (error-weighted mean
in the Rosetta Resolver software, Weng et al., 2006; Tukey's biweight robust mean,
Hubbell et al., 2002). In the context of long isothermal probes, the variations between
probes are more limited and the choice of the aggregation method and its position in
the workflow have correspondingly less impact. While aggregation is generally
performed after normalization, for tiling array data with long near-isothermal probes
(Nicolas et al., 2012), the authors preferred to perform normalization as the last step
of the preprocessing workflow. Indeed, in their set-up, the choice of the aggregation
method was less critical than the choice of the normalization method and, depending
on the question, it might be preferable to work with data that have been normalized
using different approaches (see expression vs. differential expression). Being able to
redo the normalization without having to redo the aggregation has a clear practical
advantage. This also makes it easier to add or remove samples.
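
The simple aggregation options mentioned above (median or mean of the log2-values, and a Tukey biweight robust mean) can be illustrated with a minimal sketch. It uses only the Python standard library and is not the implementation of any of the cited packages. The function names, the one-step form of the biweight, and the tuning constants `c=5.0` and `eps=1e-4` are assumptions made for illustration.

```python
import math
from statistics import mean, median


def log2_values(intensities):
    """Convert raw (strictly positive) probe intensities to the log2 scale."""
    return [math.log2(v) for v in intensities]


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight robust mean of log-scale values.

    c and eps are illustrative tuning constants (assumptions): c sets how many
    median absolute deviations a probe may lie from the median before its
    weight drops to zero, eps avoids division by zero when all values agree.
    """
    m = median(values)
    mad = median(abs(v - m) for v in values)  # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)
        # Bisquare weight: close to 1 near the median, 0 for distant outliers.
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def summarize_probe_set(intensities, method="median"):
    """Aggregate the probes of one probe set into a single log2 value."""
    logs = log2_values(intensities)
    if method == "median":
        return median(logs)
    if method == "mean":
        return mean(logs)
    if method == "biweight":
        return tukey_biweight(logs)
    raise ValueError(f"unknown summarization method: {method!r}")
```

For a hypothetical probe set with raw intensities `[256, 512, 512, 1024, 65536]`, the last probe is an obvious outlier: the mean of the log2-values is pulled upwards by it, whereas the median and the biweight both stay at the level of the four concordant probes.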
With respect to quality checks, the first issue is the quality of the array measure-
ments, which depends on the success of the experimental steps from RNA prepara-
tion to scanning that include sample labelling, hybridization and washing. Quality
control (QC) is essential to identify arrays which need to be repeated or might have
to be excluded from the analysis. The QC procedures include visual inspection of the
image to identify spatial patterns indicative of artefacts as well as quantitative
analysis based on sets of quality metrics. In contrast to relative quality metrics such
as signal intensities and distribution, absolute measures rely on control probes such
as spike-in probes and replicate probes (Kauffmann and Huber, 2010). For
example, the variability across replicate probe measurements allows
intra-array reproducibility to be assessed. A dedicated R package,
"arrayQualityMetrics", has been proposed for the quality assessment of microarray data
(Kauffmann et al., 2009). For Agilent microarrays, application-specific QC reports
are generated when applying the Feature Extraction software. Raw data quality of
each array is evaluated using statistical measures largely based on specific control
probes designed for use with the One-Color or Two-Color RNA Spike-in Kits
(Agilent Technologies). A set of 10 spike-in transcripts is used to assess the linear
dynamic range of the microarray experiment and the reproducibility of replicate
probes. Another part of the QC report details outlier statistics and includes a display
of the spatial distribution of the outlier probes for detecting potential regional biases
or artefacts (Figure 6.2). In their work on tiling array data, the authors used a simple
statistic to summarize the signal quality of each sample in a single number: the ratio
of the average variance between probes querying the same protein-coding region to
the overall variance. A value below 0.1, interpreted as 10% of noise, was found to be
indicative of high quality, whereas above 0.2, the data were considered of poor
quality. Artefacts that generate