monitoring for batch effects;
appropriate application of replicated cross-validation including nesting of feature
selection;
application of a single predictive model to an independent test set;
comparison of the final predictive model to a model derived from available clinical
covariates [20,28-31] .
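As a minimal sketch of what nesting feature selection inside replicated cross-validation can look like, the example below repeats the feature ranking within every training fold so that held-out samples never influence which features are chosen. The ranking score (absolute difference in class means), the nearest-centroid classifier, the toy data, and all function names are illustrative assumptions, not the method of the cited studies.

```python
import random
from statistics import mean


def select_features(X, y, k):
    """Rank features by absolute difference in class means; return top-k indices."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(mean(pos) - mean(neg)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def fit_centroids(X, y, features):
    """Compute one centroid per class over the selected features."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]
    return centroids


def predict(centroids, row, features):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(label):
        return sum((row[j] - c) ** 2 for j, c in zip(features, centroids[label]))
    return min((0, 1), key=dist)


def replicated_cv_accuracy(X, y, k_features=2, n_folds=5, n_repeats=3, seed=0):
    """Mean accuracy over repeated k-fold CV with feature selection nested per fold.

    Folds are not stratified here, so every training fold is assumed to
    contain samples from both classes.
    """
    rng = random.Random(seed)
    indices = list(range(len(X)))
    accuracies = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        folds = [indices[i::n_folds] for i in range(n_folds)]
        correct = 0
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            X_train = [X[i] for i in train_idx]
            y_train = [y[i] for i in train_idx]
            # Feature selection sees only the training fold (the "nesting").
            features = select_features(X_train, y_train, k_features)
            centroids = fit_centroids(X_train, y_train, features)
            correct += sum(predict(centroids, X[i], features) == y[i]
                           for i in test_idx)
        accuracies.append(correct / len(X))
    return mean(accuracies)


# Toy data (assumption): 40 samples, 20 features, only feature 0 is informative.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[rng.uniform(-0.5, 0.5) + (4.0 if j == 0 and label == 1 else 0.0)
      for j in range(20)] for label in y]

print(replicated_cv_accuracy(X, y, k_features=2))
```

Selecting features once on the full data set and then cross-validating only the classifier would leak information from the held-out samples into the model, which is the bias that nesting is meant to avoid.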
The minimum prerequisites for predictive modeling are a matrix of expression values
paired with (typically) dichotomized class labels, so the approach does not necessarily
depend on whether the expression values originate from mRNAs or miRNAs. The same could
also be said for digital gene expression values derived from RNA-Seq (or microRNA-Seq),
with qualifiers for regions with insufficient coverage. That is, biomarkers with insufficient
coverage by RNA-Seq or low expression by microarray should be filtered from biomarker
discovery efforts. Although the algorithms for classification are not as dependent on
normalized expression data from different platforms, the tools for differential expression
analysis are more technology specific. For example, the limma software package [32], among
the most popular packages for performing differential expression analysis on microarray
platforms, inspired concepts in edgeR [33], an analogous procedure for moderating
mRNA-specific (or miRNA-specific) variance for digital gene expression.
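The filtering step described above can be sketched as follows, assuming a features-by-samples matrix of read counts. The function name, the feature labels, and the thresholds (at least 10 reads in at least 2 samples) are hypothetical placeholders; appropriate cutoffs depend on the platform and sequencing depth.

```python
def filter_low_expression(counts, feature_names, min_count=10, min_samples=2):
    """Keep features with at least `min_count` reads in at least `min_samples` samples.

    `counts` is a list of rows, one row per feature and one column per sample.
    Both thresholds are illustrative defaults, not recommended values.
    """
    kept_rows, kept_names = [], []
    for name, row in zip(feature_names, counts):
        n_expressed = sum(1 for value in row if value >= min_count)
        if n_expressed >= min_samples:
            kept_rows.append(row)
            kept_names.append(name)
    return kept_rows, kept_names


# Hypothetical counts for three features measured in four samples.
names = ["miR-A", "miR-B", "miR-C"]
counts = [
    [0, 1, 0, 2],      # essentially undetected
    [25, 40, 12, 30],  # well covered in every sample
    [15, 0, 3, 11],    # covered in only two samples
]

kept_counts, kept_names = filter_low_expression(counts, names)
print(kept_names)
```

Applying the filter before model training keeps features dominated by sampling noise out of the candidate signature.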
Biomarker discovery is a painstakingly incremental process with numerous challenges
[34]. The general aspects of bias and experimental design are critical considerations
independent of the biomarker discovery platform, although greater precedent exists in the
microarray literature [35]. Consistent implementation of valid methodology is essential for
reproducible research, but it is often challenged by negative results [36]. Guidelines
established from QUADAS and STARD can serve as useful checkpoints along the path from
discovery to clinical validity and utility [37,38].
5.2.2 Migration to a Clinical Platform
When selecting biomarkers for clinical validation, it is important to consider the changes
in model performance that accompany migration of the model and signature to a platform with
inherently different detection properties. Although the literature contains numerous
examples of model stability across microarray platforms [39-43], a very common problem is
migrating a model trained on microarray expression values (typically observed in the log
space) to accept inputs based on RT-qPCR data [typically observed as Ct (cycle threshold)
values]. Typical solutions include mean centering of inputs across samples, but that can be
complicated when samples are evaluated individually and prospectively. Another solution is
normalization of the microarray and RT-qPCR data in a gene-specific manner so that inputs
are always expressed as log ratios. Setting aside the platform-specific interpretations of
underlying analyte concentration, the process of model migration inherently introduces
noise. This noise will affect the number of samples
needed for a successful biomarker discovery study and, potentially, estimates of predictive
performance. Some work has focused on gene-specific behavior as opposed to model-specific
behavior during platform migration [44-47], although these studies did not address how the
lack of correlation between the platforms can affect the sample size estimates for biomarker