are related, since regions of high LD exhibit reduced haplotype diversity and vice versa.
The availability of extensive LD data on multiple human populations via the International
HapMap Consortium (www.hapmap.org) has been a critical resource for designing
association studies [17, 18].
LD occurs when alleles at two loci co-occur at frequencies different from those expected
under independent assortment. The traditional measures of pairwise LD include the LD
coefficient $D = h_{11} - p_{A1} p_{B1}$, where $p_{A1}$ (or $p_{B1}$) is the allele
frequency of allele 1 at locus A (or locus B), and $h_{11}$ is the frequency of the
haplotype consisting of allele 1 at locus A and allele 1 at locus B (that is, the 1-1
haplotype). Because the range of $D$ varies depending on the allele frequencies at loci A
and B, $D$ is often normalized to range from $-1$ to $1$, giving the normalized
disequilibrium coefficient $D' = D/|D|_{\max}$, where $|D|_{\max}$ is the maximum possible
(absolute) value for $D$ given the allele frequencies of the two loci. High values of
$|D'|$ are often used to recognize genomic regions of reduced recombination. The
correlation coefficient is $r = D/(p_{A1} p_{A2} p_{B1} p_{B2})^{0.5}$, and $r^2$ is
commonly used for determining whether one locus may serve as a proxy for another due to
strong correlation between the loci, as discussed further below.
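The pairwise measures defined above can be sketched in a few lines of Python. This is a minimal illustration, not code from any cited tool. The function name `pairwise_ld` is hypothetical, and the expression used for |D|max (the smaller of the two allele-frequency products that bound D, chosen according to the sign of D) is the standard convention, which the text above does not spell out.

```python
def pairwise_ld(h11, p_a1, p_b1):
    """Return (D, D', r^2) for two biallelic loci.

    h11  : frequency of the 1-1 haplotype
    p_a1 : frequency of allele 1 at locus A
    p_b1 : frequency of allele 1 at locus B
    """
    p_a2 = 1.0 - p_a1
    p_b2 = 1.0 - p_b1

    # LD coefficient: observed minus expected 1-1 haplotype frequency
    d = h11 - p_a1 * p_b1

    # Assumed convention for |D|max: the bound depends on the sign of D
    if d >= 0:
        d_max = min(p_a1 * p_b2, p_a2 * p_b1)
    else:
        d_max = min(p_a1 * p_b1, p_a2 * p_b2)

    # Guard against monomorphic loci, where both measures are undefined
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a1 * p_a2 * p_b1 * p_b2
    r2 = (d * d) / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Example: allele 1 at locus A (frequency 0.2) always travels with
# allele 1 at locus B (frequency 0.5), so h11 = 0.2.
d, d_prime, r2 = pairwise_ld(h11=0.2, p_a1=0.2, p_b1=0.5)
```

In this example D' is about 1 while r^2 is only about 0.25, which shows how two loci with different allele frequencies can show no evidence of recombination and still be poor proxies for each other.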
A typical SNP tagging method in the haplotype-based category is to choose 'haplotype
tag SNPs' to represent contiguous blocks of SNPs that show reduced haplotype diversity
[19-21]. These tag SNPs then allow all haplotypes, or all common haplotypes, to be
differentiated from each other based on genotypes only at the tags.
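To make the idea of haplotype tag SNPs concrete, the sketch below searches exhaustively for the smallest set of SNP positions that still tells every distinct haplotype in a block apart. It is an illustrative brute-force search only, not the method of references [19-21]; the function name `minimal_haplotype_tags` and the encoding of haplotypes as strings of '0' and '1' are assumptions made for the example.

```python
from itertools import combinations


def minimal_haplotype_tags(haplotypes):
    """Return the smallest tuple of SNP indices that distinguishes all
    distinct haplotypes (exhaustive search; only practical for small blocks).

    haplotypes : list of equal-length strings, one character per SNP
    """
    distinct = set(haplotypes)
    n_snps = len(haplotypes[0])

    # Try subsets of increasing size so the first hit is a minimal one
    for k in range(1, n_snps + 1):
        for subset in combinations(range(n_snps), k):
            # Project each haplotype onto the candidate tag positions
            projected = {tuple(h[i] for i in subset) for h in distinct}
            if len(projected) == len(distinct):
                return subset
    return tuple(range(n_snps))


# Hypothetical block of four SNPs: SNPs 0 and 1 always agree, as do 2 and 3,
# so one SNP from each pair is enough to separate the four haplotypes.
block = ["0000", "0011", "1100", "1111"]
tags = minimal_haplotype_tags(block)
```

Restricting `haplotypes` to those above a chosen frequency cutoff would correspond to tagging only the common haplotypes.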
In contrast, popular LD-based methods do not necessarily take explicit note of the
haplotypes that may be inferred or observed across multiple SNPs. Rather, they use pairwise
LD measures (which for genotype data usually require estimation of two-locus haplotype
frequencies) to determine a group of markers which can serve as proxies for unassayed
markers based on the pairwise correlations. Carlson et al. [22] developed a greedy
algorithm to define 'bins' of markers, not necessarily contiguous, where at least one
marker in the bin has sufficiently high $r^2$ with all other markers in that bin; such a
marker may then be chosen as a tag for that bin.
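A simplified sketch of greedy binning in this spirit is shown below. It is not the LDselect implementation of [22]: it assumes a precomputed symmetric matrix of pairwise r^2 values, uses a hypothetical function name `greedy_r2_bins`, takes 0.8 as a default threshold, and reports a single tag per bin, breaking ties by marker order.

```python
def greedy_r2_bins(names, r2, threshold=0.8):
    """Partition markers into bins, each with one tag marker.

    names     : list of marker names
    r2        : symmetric matrix (list of lists) of pairwise r^2 values
    threshold : minimum r^2 between a tag and every other bin member
    Returns a list of (tag_name, sorted_member_names) tuples.
    """
    remaining = set(range(len(names)))
    bins = []

    while remaining:
        def covered(i):
            # Unbinned markers that marker i could stand in for
            return {j for j in remaining
                    if j != i and r2[i][j] >= threshold}

        # Greedy step: pick the marker that covers the most others;
        # iterating in sorted order makes tie-breaking deterministic
        best = max(sorted(remaining), key=lambda i: len(covered(i)))
        members = covered(best) | {best}

        bins.append((names[best], sorted(names[j] for j in members)))
        remaining -= members

    return bins


# Hypothetical r^2 values: s1 is strongly correlated with s2 and s3,
# while s2 and s3 are only weakly correlated with each other.
names = ["s1", "s2", "s3", "s4"]
r2 = [
    [1.00, 0.90, 0.85, 0.10],
    [0.90, 1.00, 0.50, 0.10],
    [0.85, 0.50, 1.00, 0.10],
    [0.10, 0.10, 0.10, 1.00],
]
bins = greedy_r2_bins(names, r2)
```

Here s1 becomes the tag for a bin containing s1, s2 and s3 even though s2 and s3 are not strongly correlated with each other, and s4 is left as a singleton bin that must be genotyped directly.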
The $r^2$ bin method has gained popularity due to ease of use and interpretability,
especially considering the following useful relationship between the power to detect
disease association and the strength of LD between the tags and bin members. For a given
value of $r^2$ between a disease locus and SNP marker, the sample size needed to have
equivalent power to detect disease association with alleles at the marker, rather than at
the true locus, is increased by a factor of approximately $1/r^2$ [23]. More precise
relationships can also be computed [24]. A popular threshold to define bin tags is to
require that the tag has $r^2 \geq 0.8$ with all bin members. It is useful to keep in mind
that allele frequencies must be similar for two SNPs to have a high value of $r^2$. Thus,
it is typical for an LD block defined by $D'$ to be partitioned into multiple $r^2$ bins,
each of which may consist of non-consecutive SNPs, according to the underlying allele
frequencies of the markers in the block.
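The approximate 1/r^2 relationship is simple enough to express directly. The helper below is a minimal sketch of that rule of thumb only, not of the more precise relationships in [24]; the function name `sample_size_at_marker` and the example sample sizes are hypothetical.

```python
import math


def sample_size_at_marker(n_at_true_locus, r2):
    """Approximate sample size needed at a proxy marker to match the power
    obtained with n_at_true_locus samples at the true disease locus."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be in the interval (0, 1]")
    # Inflate by 1/r^2 and round up to a whole number of samples
    return math.ceil(n_at_true_locus / r2)


# Hypothetical study needing 1000 samples if the disease locus were typed
n_half = sample_size_at_marker(1000, 0.5)      # marker with r^2 = 0.5
n_quarter = sample_size_at_marker(1000, 0.25)  # marker with r^2 = 0.25
```

At the commonly used threshold of 0.8, the same rule implies an inflation factor of about 1.25, which is one reason that threshold is regarded as an acceptable trade-off between genotyping cost and power.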
For researchers wishing to select tag SNPs for particular genes or regions of interest,
the HapMap website offers a browser interface in which tag SNPs can be chosen using any
of several methods, including the $r^2$ bin method. Figure 4.1 illustrates the use of the
HapMap browser (http://www.hapmap.org/cgi-perl/gbrowse/hapmap_B35/) to select tags
for the gene CHRNA5. Other software tools for tag SNP selection include SNPtagger
[20], Snagger [25] and Haploview [19] for haplotype tagging, and LDselect [22] and
Tagger [26] for $r^2$ bin tagging. The SNP Annotation and Proxy Search (SNAP) website
[27] is a user-friendly tool that will look up all SNPs tagged by a provided list
of SNPs.