of the true outcome of interest. The exact yardstick for evaluating
the performance of a biomarker varies on the basis of the intended
use. Standards have been proposed for designing and reporting the
results of studies evaluating the performance of biomarkers for
diagnosis (1) and for prognosis (2). In general, the performance
of biomarkers is seldom as good in a second sample as in the sample
in which they were initially assessed. Consequently, it has been
proposed that biomarker discovery studies should be organized in
two experiments: (a) an initial discovery experiment (in our case a
DIGE analysis), which identifies potential biomarkers in a training
set of patient samples, and (b) a validation experiment, which
determines the performance of these markers in an independent
validation set. This approach yields a list of validated
biomarkers. These guidelines have been generally accepted in the
intervening years, and potential biomarkers without validation are
today regarded as insufficiently characterized for publication (3).
Here we outline the study design of a DIGE-based biomarker
discovery study, including both the discovery experiment and the
validation experiment.
The discovery experiment can be considered a sophisticated
comparative analysis of many (approximately 500 to 1,500) parameters
in parallel. Selecting useful biomarkers from such a large number
of candidate proteins is, however, a difficult exercise.
Generally, a statistical test is applied to each of the candidate
proteins, and the proteins with significant results are considered
potential biomarkers. However, if a large number of statistical tests
are performed simultaneously, the problem of multiple testing arises.
Consider first, as an example, a clinical study in which the expression
values of a single protein are compared between two groups (e.g.,
between responders and nonresponders to a specific therapy), i.e.,
only one statistical hypothesis is tested. In this case, we may perform
a two-sided two-sample t test at a prespecified level α for the type
1 error (usually, this so-called significance level is set to 0.05; see
also Table 1 for a list of statistical terms). Thus, we consider the
protein to be differentially expressed between the two
groups if the p value of the t test is smaller than the
significance level. Here α is the probability of declaring the
protein differentially expressed when, in truth, it is not
related to the outcome (in which case we make a type 1 error,
i.e., a false-positive decision).
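
The single-protein comparison described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, assuming equal variances in both groups (the classical pooled two-sample t test). The expression values, the group names, and the helper `pooled_t_statistic` are invented for illustration and do not come from the study design described here. Instead of computing a p value, the sketch compares |t| with the tabulated two-sided critical value for α = 0.05 and 18 degrees of freedom (about 2.101), which leads to the same decision.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for a pooled-variance two-sample t test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df


# Hypothetical expression values of one protein (arbitrary units).
responders = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 0.8, 1.2, 1.1, 0.9]
nonresponders = [1.6, 1.8, 1.5, 1.7, 1.9, 1.6, 1.4, 1.8, 1.7, 1.5]

# Two-sided critical value of the t distribution for alpha = 0.05 and
# df = 18, taken from a standard t table (valid only for this df).
T_CRITICAL_DF18 = 2.101

t_value, df = pooled_t_statistic(responders, nonresponders)
differentially_expressed = abs(t_value) > T_CRITICAL_DF18
print(f"t = {t_value:.2f}, df = {df}, significant: {differentially_expressed}")
```

In practice a statistics package would normally report the p value directly; the hard-coded critical value is used here only to keep the sketch free of external dependencies.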
Multiple testing refers to testing more than one hypothesis
(protein) at the same time. In proteomic studies, hundreds of
proteins can be investigated simultaneously. If a standard t test
with significance level 0.05 is applied to each of the proteins, the
probability of declaring at least one protein differentially expressed
that, in truth, is not related to the clinical outcome will greatly
exceed the prespecified level 0.05. The expected number of false-positive
decisions increases with the number of investigated
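
To put numbers on the multiple testing problem: if m proteins are each tested at level α and none of them is in truth related to the outcome, the expected number of false-positive decisions is m·α. If the tests are additionally assumed to be independent (a simplifying assumption made here for illustration only), the probability of at least one false positive is 1 − (1 − α)^m. The function names in this minimal sketch are hypothetical.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives when all null hypotheses are true."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """P(at least one false positive), assuming independent tests, all nulls true."""
    return 1 - (1 - alpha) ** n_tests


for m in (1, 10, 100, 500, 1500):
    print(
        f"{m:>5} tests: "
        f"expected false positives = {expected_false_positives(m):6.1f}, "
        f"P(at least one) = {prob_at_least_one_false_positive(m):.4f}"
    )
```

With 500 proteins, for instance, about 25 false-positive decisions are expected under these assumptions, and at least one false positive is almost certain.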