However, upon calculating $\sigma(\theta_0)$ (figure 4.13b), one finds that the
conditional distribution function $P(\delta\theta \mid \theta_0)$, as a function of the rescaled
noise $\delta\theta' = \delta\theta/\sigma(\theta_0)$, is roughly independent of $\theta_0$ for this
data as well. Thus, one can follow the procedure outlined above to
determine p-values using this one-zero null hypothesis data set.
The two-zero hypothesis (case 3) is formulated from data for which
one of the biological replicates shows zero counts for a given signature
in both sequencing replicates, while the other biological replicate
shows at least one nonzero measurement for the signature. The probability
distribution of the aggregate tpm values of the nonzero replicate
measurements is plotted (figure 4.13c), and the significance region for a
particular p-value is defined as the area under the high-signal tail of the
distribution whose ratio to the area under the entire curve equals the
desired p-value.
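The tail-area definition above can be sketched with an empirical distribution. This is only an illustration under stated assumptions: the function names and the toy list of null tpm values are hypothetical, and the null distribution is approximated by the raw sample rather than by a fitted or smoothed curve.

```python
from bisect import bisect_left


def empirical_upper_tail_p(null_values, observed):
    """Fraction of null values that are greater than or equal to `observed`."""
    ordered = sorted(null_values)
    n_at_or_above = len(ordered) - bisect_left(ordered, observed)
    return n_at_or_above / len(ordered)


def upper_tail_threshold(null_values, p_value):
    """Smallest null value t whose upper-tail fraction is at most `p_value`.

    Returns None when even the largest null value has a tail fraction
    above `p_value` (the sample is too small for that p-value).
    """
    ordered = sorted(null_values)
    n = len(ordered)
    for t in ordered:  # ascending, so the first hit is the smallest threshold
        if (n - bisect_left(ordered, t)) / n <= p_value:
            return t
    return None


# Hypothetical aggregate tpm values from the nonzero replicate (toy data).
null_tpm = list(range(1, 101))
threshold = upper_tail_threshold(null_tpm, 0.05)
print(threshold, empirical_upper_tail_p(null_tpm, threshold))
```

A measurement at or above the returned threshold would fall in the significance region for the chosen p-value, under the assumption that the sample represents the two-zero null distribution well.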
Figure 4.11d shows a plot of the fraction of points left out of the
delimiting curves given by $|\delta\theta| = |\delta\theta'|\,\sigma(\theta_0)$ as a function of the p-value
($|\delta\theta'|$ depends on the p-value). The curve with solid diamonds corresponds
to the subset of signatures for which the nonzero hypothesis
applies. The open diamonds consider all the measured signatures
and the corresponding null hypotheses. Both the nonzero hypothesis
and the all-hypothesis curves show that the fraction of points left
out of the delimiting curves is very well estimated by the p-value calculation
over four orders of magnitude. The precipitous drop-off of
the curves at the small p-value range is due to the two outliers indicated
with arrows in figure 4.12.
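A calibration check of this kind can be sketched as follows. The sketch assumes that the delimiting curves have the form |δθ| = |δθ'| σ(θ0), that the caller supplies the cutoff |δθ'| associated with each p-value, and that σ is available as a function of θ0. All names and the toy numbers are hypothetical and are not taken from the chapter's data.

```python
def fraction_outside(pairs, sigma, rescaled_cutoff):
    """Fraction of (theta0, delta_theta) pairs outside the delimiting curves.

    A pair is outside when |delta_theta| > rescaled_cutoff * sigma(theta0).
    """
    outside = sum(
        1 for theta0, delta in pairs
        if abs(delta) > rescaled_cutoff * sigma(theta0)
    )
    return outside / len(pairs)


def calibration_table(pairs, sigma, cutoff_by_p):
    """List of (nominal p-value, observed fraction outside), sorted by p-value."""
    return [
        (p, fraction_outside(pairs, sigma, cutoff))
        for p, cutoff in sorted(cutoff_by_p.items())
    ]


# Toy replicate differences and a constant noise model (both hypothetical).
pairs = [(1.0, 0.1), (1.0, 0.5), (2.0, 1.5), (2.0, 2.5), (3.0, -3.0)]
constant_sigma = lambda theta0: 1.0
for p, observed in calibration_table(pairs, constant_sigma, {0.4: 2.0, 0.6: 1.0}):
    print(p, observed)
```

If the p-value calculation is well calibrated, the observed fraction should track the nominal p-value, which is the agreement the figure is described as showing.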
The above p -value formalism has been developed in the context
of binary comparisons (typically case/control studies). Using the
same formalism it is also possible to estimate error bars for the log-tpm
for a given signature. To do this, simply notice that if the log-tpm
value of a signature yields a value $\theta$, then $[\theta - 2.13\,\sigma(\theta),\ \theta + 2.13\,\sigma(\theta)]$
is an estimate of the 95% confidence interval. In other words, the probability
that a subsequent measurement of that signature falls outside
that interval is smaller than 0.05. This confidence interval interpretation
is especially useful when data from only a single MPSS run of
a given condition is extant, and error bars need to be assigned to these
measurements. The calculation of a 95% confidence interval requires
an estimate of $\sigma(\theta_0)$. When replicate measurements are not available,
$\sigma(\theta_0)$ can be estimated from studies such as the one presented in this
chapter. Computational tools to analyze MPSS data for confidence
intervals, as well as p-values in case/control measurements and time
traces, can be obtained at the website www.research.ibm.com/FunGen.
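As a rough illustration of the confidence interval calculation (this is not the tool distributed at the website above), the sketch below applies the 2.13 σ half-width from the text. The lookup table of (θ0, σ) pairs is made-up example data standing in for a published replicate study, and the linear interpolation between table entries is an assumption of this sketch.

```python
from bisect import bisect_right


def make_sigma_lookup(table):
    """Build sigma(theta) by linear interpolation over (theta0, sigma) pairs.

    Values outside the tabulated range are clamped to the nearest entry.
    """
    points = sorted(table)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    def sigma(theta):
        if theta <= xs[0]:
            return ys[0]
        if theta >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, theta)  # xs[i - 1] <= theta < xs[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (theta - x0) / (x1 - x0)

    return sigma


def confidence_interval_95(theta, sigma, factor=2.13):
    """Return (theta - factor * sigma(theta), theta + factor * sigma(theta))."""
    half_width = factor * sigma(theta)
    return (theta - half_width, theta + half_width)


# Hypothetical noise table: log-tpm value versus its standard deviation.
sigma = make_sigma_lookup([(0.0, 1.0), (2.0, 0.5)])
low, high = confidence_interval_95(1.0, sigma)
print(low, high)
```

With a single MPSS run, the measured log-tpm value is passed as `theta` and the interval serves as its error bar, provided the borrowed noise table is a reasonable match for the experiment at hand.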
Finally, note that the method to determine the statistical significance
for binary MPSS measurements is essentially the same as the
USE-fold method introduced in ref. [12], and presented earlier in this
chapter in the context of DNA microarray analysis.