t-test), Kruskal-Wallis (one-way ANOVA), or Friedman test (one-way repeated-measures ANOVA). These tests can also be easily performed in R using wilcox.test{stats}, kruskal.test{stats}, or friedman.test{stats}, respectively.
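As a minimal sketch, assuming simulated spot intensities (all data and variable names below are hypothetical), these functions can be called as follows:

set.seed(1)
control <- rnorm(8, mean = 100, sd = 10)   # simulated spot intensities
treated <- rnorm(8, mean = 115, sd = 10)
extra   <- rnorm(8, mean = 108, sd = 10)

# Mann-Whitney/Wilcoxon rank-sum test (nonparametric alternative to the t-test)
wilcox.test(control, treated)

# Kruskal-Wallis test for three or more independent groups
kruskal.test(list(control, treated, extra))

# Friedman test for repeated measures: rows = subjects, columns = conditions
friedman.test(cbind(control, treated, extra))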
3.7 Controlling Statistical Error of Multiple Comparisons

When multiple hypotheses are tested as a result of multiple comparisons, a method for controlling the error should be implemented in the workflow. For example, in a standard proteomic analysis of about 800 variables (spots, identified proteins, etc.) in which univariate statistics have been performed at a 95 % confidence level (p = 0.05), 40 false positives can be expected (800 × 0.05 = 40). Different solutions can be applied to control this error: family-wise error rate (FWER) procedures, such as the Bonferroni correction, or false discovery rate (FDR) procedures [12]. FWER procedures are very conservative because they control only the probability of falsely rejecting the null hypothesis (false positives), at the cost of increasing the number of false negatives. These methods are adequate when looking for a small number of strong biomarkers.
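For instance, assuming a vector of 800 raw p-values (simulated here purely for illustration), the Bonferroni correction can be applied with p.adjust{stats}:

set.seed(2)
pvals <- runif(800)                               # hypothetical raw p-values
p.bonf <- p.adjust(pvals, method = "bonferroni")  # FWER-controlling adjustment
sum(p.bonf < 0.05)                                # spots surviving FWER control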
On the other hand, the FDR controls the expected proportion of incorrectly rejected null hypotheses (e.g., 1 %, 5 %), being in consequence more powerful than FWER procedures, but at the cost of increasing the likelihood of type I errors among the rejected hypotheses. Classically, after an FDR analysis we will have p-values and q-values (the p-values corrected for the FDR). To interpret the q-values it is necessary to look at the ordered list of q-values. For example, if 52 spots fall below a q-value threshold of 0.023 and the 52nd spot in the ordered list has a q-value of 0.0181, we should expect 52 × 0.0181 ≈ 0.94 false positives among those first 52 spots. This is another way to use the q-values: order them and compute the number of expected false positives up to different thresholds, so that the cut-off can be adapted to the experiment (establishing a specific biomarker is not the same as descriptive proteomics). Obviously, using the q-values will not always result in a lower number of false positives, but it gives a more accurate indication of the level of false positives for a given cut-off value.
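A comparable sketch for the FDR, reusing the hypothetical pvals vector above. Here the Benjamini-Hochberg adjustment of p.adjust{stats} stands in for q-values; Storey's q-values proper are provided by the Bioconductor qvalue package:

q <- p.adjust(pvals, method = "BH")  # BH-adjusted p-values, used here as q-values
q.sorted <- sort(q)                  # ordered list of q-values
52 * q.sorted[52]                    # expected false positives among the top 52 spots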
3.8 Multivariate Statistics
These methods are intended to reduce the complexity of the sample to a minimum by "condensing" the original variation of the samples into a reduced number of elements (a process called dimensionality reduction). Multivariate data analysis methods are useful for pinpointing the variables relevant for treatment discrimination because they focus not only on single-spot differences but also on the covariance structure between proteins [13]. It is thereby possible to point out which combinations of spots could be valuable to identify and characterize in more detail, or to quickly detect outlier samples.
3.8.1 Principal Component Analysis

Principal Component Analysis (PCA) generates new variables, called principal components (PCs), which condense the variability of the samples. PCs are uncorrelated with each other and are ordered by the proportion of the total variance that each one explains.
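A minimal sketch with prcomp{stats}, assuming a hypothetical spot-intensity matrix with samples in rows and spots in columns (simulated here):

set.seed(3)
X <- matrix(rnorm(20 * 100), nrow = 20)   # 20 samples x 100 spots (simulated)
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of the total variance condensed into the first five PCs
summary(pca)$importance["Proportion of Variance", 1:5]

# Score plot of the first two PCs; outlier samples stand out here
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2")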