Biomedical Engineering Reference
while simultaneously estimating population allele frequencies. First, a clustering
algorithm estimates the number K of subpopulations into which the population is
structured. Then, using the estimated allele frequencies, the likelihood that a given
genotype (for all individuals and all loci) originates in a particular subpopulation
is calculated with a Bayesian approach, through the conditional probability
$P\{x_{il} = j \mid Z, P\}$, where Z is the original population, P are the allele frequencies of all
the subpopulations and $x_{il}$ is the genotype of individual i at locus l. Finally,
the probability $P(z_i = k)$ that each individual belongs to a particular subpopulation
is computed, starting from the condition that all these probabilities are equal to $1/K$,
where $z_i$ is the population from which individual i originates. Individuals of unknown
origin can be assigned to a specific population according to these likelihoods.
In this way, it is possible to estimate the substructure of the original population, but
it is not possible to correct for PS.
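The assignment step described above can be illustrated with a minimal sketch. It assumes that the allele frequencies of each subpopulation are already known and strictly positive, that loci are independent, and that the prior is the uniform 1/K; it computes only a single Bayes update for one individual, not the full iterative estimation that the software performs. The function name `assignment_posterior` and the data layout are hypothetical and chosen only for this illustration.

```python
import math


def assignment_posterior(genotype, allele_freqs):
    """Posterior probability that one individual belongs to each subpopulation.

    genotype:      list over loci; each entry is a tuple of observed allele
                   codes at that locus (two codes for a diploid individual).
    allele_freqs:  allele_freqs[k][l][j] is the frequency of allele j at
                   locus l in subpopulation k (assumed > 0 in this sketch).
    """
    K = len(allele_freqs)
    log_post = []
    for k in range(K):
        # Uniform prior: every subpopulation starts with probability 1/K.
        lp = math.log(1.0 / K)
        for l, alleles in enumerate(genotype):
            for j in alleles:
                # Likelihood of drawing allele j at locus l from subpopulation k.
                lp += math.log(allele_freqs[k][l][j])
        log_post.append(lp)

    # Normalise in log space to avoid underflow when many loci are used.
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Toy data (made up for illustration): 2 subpopulations, 2 biallelic loci.
freqs = [
    [[0.9, 0.1], [0.8, 0.2]],  # subpopulation 0
    [[0.2, 0.8], [0.3, 0.7]],  # subpopulation 1
]
individual = [(0, 0), (0, 1)]  # alleles observed at locus 0 and locus 1
print(assignment_posterior(individual, freqs))
```

With these toy frequencies the individual is assigned mostly to subpopulation 0, because its alleles are much more common there. An individual of unknown origin would be handled in the same way.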
9.2.2.3 Eigensoft
The Eigensoft package (version 2.0 for Linux platform, Department of Genetics,
Harvard Medical School, Boston, USA) assesses stratification by performing a PCA
with the highest possible number of SNPs. PCA involves a mathematical procedure
that transforms a number of possibly correlated variables into a smaller number of
uncorrelated variables called principal components. The first principal component
accounts for as much of the variability in the data as possible, and each succeeding
component accounts for as much of the remaining variability as possible. PCA was
invented in 1901 by Karl Pearson [30]. PCA involves the calculation of the eigenvalue
decomposition of a data covariance matrix. Eigensoft uses PCA to reduce the number
of variables that describe the sample (300K SNPs scattered along the genome) to
fewer dimensions, which allows the individuals to be clustered on the basis of their
genetic variance. The package contains many tools; the most important for our task
are smartpca and eigenstrat. smartpca was used to perform PCA on genotype
data and to generate eigenvectors (principal components, PCs) and eigenvalues. To
estimate the statistical significance of the population divergence in PC scores,
analysis of variance (ANOVA) is performed among individuals grouped into cases and
controls and also by ethnic group. Along each PC, the means and variances
within subgroups (case/control and ethnic group) are compared
in order to estimate the population differences. We draw the scree plot
of the PCA eigenvalues to evaluate which PCs describe the largest
genetic variance and to confirm the ANOVA results. A scree plot is a simple line
segment plot that shows the fraction of the total variance in the data explained
by each PC. We used the software to adjust genotypes and phenotypes
for the variation attributable to ancestry along each PC, by computing residuals of
linear regressions. The adjusted genotype is given by:
$$g_{ij}^{(\mathrm{adj})} = g_{ij} - \gamma_i a_j, \qquad \gamma_i = \frac{\sum_j a_j g_{ij}}{\sum_j a_j^{2}} \tag{9.3}$$
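The residual computation behind Eq. (9.3) can be sketched as follows. The text breaks off before defining the symbols, so this sketch assumes that the genotype values belong to a single SNP i across individuals j, that `a[j]` is the ancestry of individual j along one PC, and that the regression has no intercept. The function name `adjust_genotypes` is hypothetical and is not part of the Eigensoft package.

```python
def adjust_genotypes(g_i, a):
    """Remove from one SNP the variation explained by ancestry along one PC.

    g_i: genotype values of SNP i, one entry per individual j.
    a:   ancestry of each individual j along the chosen PC (same length).
    Returns the residuals g_ij - gamma_i * a_j of a no-intercept regression.
    """
    if len(g_i) != len(a):
        raise ValueError("g_i and a must have one entry per individual")

    denom = sum(a_j * a_j for a_j in a)
    if denom == 0:
        raise ValueError("ancestry vector must not be all zeros")

    # Regression coefficient of genotype on ancestry for this SNP.
    gamma_i = sum(a_j * g for a_j, g in zip(a, g_i)) / denom

    # Residual: the part of the genotype not predicted by ancestry.
    return [g - gamma_i * a_j for g, a_j in zip(g_i, a)]


# Toy data (made up for illustration): 4 individuals, one SNP, one PC.
ancestry = [0.5, -0.5, 0.5, -0.5]
genotypes = [2, 0, 1, 1]
adjusted = adjust_genotypes(genotypes, ancestry)
print(adjusted)
```

By construction the adjusted values are uncorrelated with the ancestry vector: the sum of `a[j] * adjusted[j]` over individuals is zero up to rounding. Under the same assumptions, phenotypes can be adjusted with the same function, and the adjustment can be repeated for each PC in turn.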