Biomedical Engineering Reference
condition in which some combinations of alleles or genetic markers occur more or
less frequently than can be accounted for by chance. LD indicates that alleles at
different loci on the same DNA strand are transmitted together. Owing to specific
“properties” of the single nucleotide polymorphisms (SNPs), like their high allelic
frequency and their unique position within the human genome, SNPs have been
shown to act as universal markers able to flag genes and/or chromosomal regions
potentially relevant for the disease under investigation. When a given SNP, or a
cluster of SNPs, shows a statistically significant difference in allelic or genotypic
frequency between cases and controls, this finding points to a role of the locus
mapped by those SNPs in the etiopathogenesis of the disease. If genetic variations
are more frequent in subjects with the disease, the variations are said to be positively
“associated” and represent risk factors for developing the disease. The associated
genetic variations are then considered pointers to the region of the human genome
where the hypothetical disease-causing locus resides [ 1 - 4 ]. In case-control asso-
ciation studies, population stratification (PS) occurs when allele frequencies differ
between cases and controls due to ancestry differences, ethnic background or even
to “hidden” stratification. Population structure can distort the apparent association between
a phenotype and unlinked candidate loci, causing either false positive or false negative
results when analysing SNPs for association [ 5 - 7 ]. To address these issues,
different strategies have been proposed. The Fst [ 8 ] and STRUCTURE [ 9 ] methods can
only detect, not correct, possible population substructure. The Fst test
measures population genetic differentiation and assesses the variation among
subpopulations by quantifying the loss of heterozygosity. Fst strongly depends on the
number of SNPs used. STRUCTURE assigns subjects to discrete subpopulations
by computing the likelihood that a given genotype originated in each population. The
major limitations of STRUCTURE are its intensive computational cost when applied to
genome-wide data sets and its sensitivity to the number of clusters
defined by users before analysis. Recently, Li et al. [ 10 ] proposed a likelihood-based
algorithm that can substantially speed up the calculations. Genomic control
(GC) [ 11 ], EIGENSTRAT, which is based on principal-component analysis (PCA) [ 12 ], and
the Cochran-Mantel-Haenszel (CMH) test in PLINK [ 13 ] are currently the most common
approaches used to correct for PS in genetic association analysis. GC rescales the
association statistics by a common overall factor (inflation factor) at each marker,
while EIGENSTRAT and PLINK use multivariate techniques designed to reduce
the data to a small number of dimensions, taking into account as much variabil-
ity as possible. These methods enable explicit detection and correction of PS on a
genome-wide scale. The main limitation of GC is that its uniform adjustment may
be insufficient, because it is not specific to each marker or to the related allele
frequency across populations. Moreover, the threshold of the inflation factor above which
a sample is considered sub-structured, and the resulting association inflated by
stratification, is not universally defined. PCA appears to be widely used together with
STRUCTURE to analyse the population structure of different worldwide populations, as
reported by Bauchet et al. [ 14 ], Tian et al. [ 15 ] and Price et al. [ 16 ] in European
and European-American populations. In several simulation studies, the PCA method
was also the most powerful to control stratification effects using PCs as covariates
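
The uniform rescaling that GC applies to the association statistics can be illustrated with a short Python sketch. It is not taken from the cited references. It assumes that each marker is tested with a 1-degree-of-freedom allelic chi-square test on a 2x2 table of allele counts (one of several possible association statistics), and that the inflation factor is estimated as the median observed statistic divided by the median of a chi-square distribution with 1 degree of freedom (about 0.455). Flooring the factor at 1 is a common convention and is also an assumption here. The function names and the allele counts are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df) for a 2x2 table of allele counts.

    case_a / case_b: counts of allele A / allele B in cases
    ctrl_a / ctrl_b: counts of allele A / allele B in controls
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    denom = (
        (case_a + case_b)    # total alleles in cases
        * (ctrl_a + ctrl_b)  # total alleles in controls
        * (case_a + ctrl_a)  # total copies of allele A
        * (case_b + ctrl_b)  # total copies of allele B
    )
    if denom == 0:
        # Monomorphic marker or empty group: the test is uninformative.
        return 0.0
    return n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / denom


def inflation_factor(chi2_stats):
    """Estimate the inflation factor (lambda) from many marker statistics."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    # Assumption: never deflate, so lambda is floored at 1.
    return max(lam, 1.0)


def genomic_control(chi2_stats):
    """Divide every statistic by the same lambda (the uniform GC adjustment)."""
    lam = inflation_factor(chi2_stats)
    return [x / lam for x in chi2_stats], lam


if __name__ == "__main__":
    # Hypothetical allele counts for five markers:
    # (case_a, case_b, ctrl_a, ctrl_b)
    markers = [
        (60, 40, 40, 60),
        (55, 45, 45, 55),
        (52, 48, 47, 53),
        (70, 30, 50, 50),
        (58, 42, 44, 56),
    ]
    raw = [allelic_chi2(*m) for m in markers]
    adjusted, lam = genomic_control(raw)
    print("lambda =", round(lam, 3))
    for before, after in zip(raw, adjusted):
        print(round(before, 3), "->", round(after, 3))
```

Because every marker is divided by the same factor, the sketch also makes the limitation described above visible: a marker whose allele frequency differs strongly across subpopulations receives exactly the same correction as one that does not. In practice the inflation factor would be estimated from a large number of markers rather than from a handful.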