ignored (i.e., one takes the maximum of the two runs). If the two runs
in B1 measure zero and one of the runs in B2 is nonzero, then the
nonzero signature counts are distributed as in figure 4.10c (open and
solid squares). This latter distribution can be used to determine the null
hypothesis when, for a given signature, the two MPSS runs comprising
one of the replicates both yield values of zero.
THREE NULL HYPOTHESES ARE REQUIRED FOR BINARY COMPARISONS
To determine the biological significance of changes in tpm value
observed for different signatures in the LPS-activated macrophage
data, it is first necessary to formulate null hypotheses using biologi-
cal replicates. This is accomplished using the biological replicate data
from ref. [13] (where each biological replicate data set is composed of
two sequencing replicates). Each signature that was measured at
least once in a pair of biological replicates yields two aggregate (i.e.,
determined from two or more sequencing replicates) tpm values, t_1
and t_2. Three possibilities can arise in these measurements:
(1) none of the counts (n_i's) used to compute t_1 and t_2 were zero;
(2) at least one of the counts was zero, but neither t_1 nor t_2 is
zero; (3) either t_1 = 0 and t_2 > 0, or t_2 = 0 and t_1 > 0. As shown
above, the statistics characterizing the expression of signatures when
measurements of zero counts are observed are fundamentally different
from those that result when no zeros are observed. Thus, it is
necessary to formulate three distinct, conditional null hypotheses,
one for each of the three conditions above. That is, given two samples
and their respective MPSS measurements, one inspects the pattern of
zeros obtained in the different sequencing replicates and uses the
appropriate null hypothesis on a signature-by-signature basis.
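The case selection described above can be sketched in Python. This is
only a minimal illustration, not the authors' implementation: the
function name null_hypothesis_case and its arguments are hypothetical,
and the aggregate values t_1 and t_2 are passed in rather than computed,
because the aggregation rule is defined elsewhere in the text.

```python
from typing import Sequence


def null_hypothesis_case(runs_1: Sequence[float], runs_2: Sequence[float],
                         t1: float, t2: float) -> int:
    """Return 1, 2, or 3: which conditional null hypothesis applies.

    runs_1, runs_2: counts from the sequencing replicates that make up
                    each of the two biological replicates (hypothetical
                    argument names).
    t1, t2:         the aggregate tpm values derived from those counts.
    """
    if t1 < 0 or t2 < 0:
        raise ValueError("tpm values cannot be negative")
    if t1 == 0 and t2 == 0:
        # The text only considers signatures measured at least once.
        raise ValueError("signature was not measured in either replicate")
    if t1 == 0 or t2 == 0:
        return 3  # exactly one aggregate value is zero
    if any(n == 0 for n in list(runs_1) + list(runs_2)):
        return 2  # some run measured zero, but both aggregates are nonzero
    return 1      # no zero counts anywhere
```

Checking the aggregate values first means a signature with one zero
aggregate is always assigned to case 3, even though it necessarily
also contains zero counts.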
For signatures with no zero-count measurements (case 1), the
formulation of the null hypothesis begins by plotting, in figure 4.11a,
all pairs (q_{i,j}, q_{i,j'}), where j and j' are biological replicates
taken at t = 0 (where each q is the log of an aggregate tpm count), for
all signatures i that have nonzero tpm values in all replicate MPSS
runs. Also plotted are equivalent points for which j and j' are
biological replicates taken at t = 4 h. The standard deviation between
measurements as a function
of expression level, σ = σ(q_0), is calculated following the methods
discussed above. A plot of σ(q_0) derived from these data (along with
an exponential fit to the calculated values of σ(q_0)) is shown in
figure 4.11b.
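One way such a noise curve and its exponential fit might be computed is
sketched below. The details are assumptions made for illustration, since
the actual method is given earlier in the text: here q_0 is taken to be
the mean of the two replicate values, the noise is their difference, the
standard deviation is estimated in fixed bins of q_0, and the exponential
is fitted by ordinary least squares on the logarithm of the binned
values. The function names binned_sigma and fit_exponential are
hypothetical.

```python
import math
from statistics import mean, pstdev


def binned_sigma(pairs, edges):
    """Standard deviation of dq = q - q' within bins of q0 = (q + q') / 2.

    pairs: iterable of (q, q') log-tpm values from two biological replicates.
    edges: increasing bin edges for q0 (an assumed binning scheme).
    Returns bin centers and standard deviations for bins that hold at
    least two points and have nonzero spread.
    """
    pairs = list(pairs)
    centers, sigmas = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        dq = [q - qp for q, qp in pairs if lo <= (q + qp) / 2 < hi]
        if len(dq) >= 2:
            s = pstdev(dq)
            if s > 0:  # log(0) is undefined, so skip degenerate bins
                centers.append((lo + hi) / 2)
                sigmas.append(s)
    return centers, sigmas


def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by a least-squares line through log(y)."""
    log_y = [math.log(v) for v in y]
    mx, ml = mean(x), mean(log_y)
    sxx = sum((xi - mx) ** 2 for xi in x)  # zero if fewer than two distinct x
    b = sum((xi - mx) * (li - ml) for xi, li in zip(x, log_y)) / sxx
    a = math.exp(ml - b * mx)
    return a, b
```

With replicate pairs and bin edges in hand, the two steps chain as
a, b = fit_exponential(*binned_sigma(pairs, edges)), after which
a * math.exp(b * q0) gives the fitted standard deviation at any q0.
Fitting in log space weights relative rather than absolute errors, which
is a design choice of this sketch and not something stated in the text.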
For a given q_0, one can, as before, define a distribution for the
rescaled noise δq' ≡ δq/σ(q_0), where σ(q_0) is the standard deviation
at expression level q_0, and obtain the conditional distribution
function P_k(δq'|q_0). This distribution is plotted for several ranges
of q_0 in figure 4.11c. These ranges of values for q_0 correspond to
the regions delimited by the dashed lines in figure 4.11a, and the
symbols drawn in each region correspond to the symbols in figure
4.11c. Notice that, once normalized by its standard deviation, the
distribution of