Genetic Mapping of Complex Traits - Genomics: Essential Methods


n The program subsets the dataset fulldata so that only observations for which loc1 is not

missing are used; this ensures that both logistic regression runs are carried out on the same set

of observations. The variable 'pvalue' thus indicates the strength of evidence for association

between case/control status and the single SNP being analyzed.

o In practice, step 2 would be programmed into a loop across all SNPs for analysis. A "by"

statement to merge by the SNP name should be added to the merge step.
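The two notes above can be illustrated with a minimal Python sketch (the program they annotate is not reproduced here, and its language is not assumed). The names `analyze_all_snps`, `rows`, and `test_fn` are hypothetical; `test_fn` stands in for whatever pair of logistic regression fits produces the p-value. The sketch only shows the per-SNP subsetting to non-missing genotypes and the keying of results by SNP name.

```python
from typing import Callable, Dict, List, Optional

# One observation: column name -> value, with None marking a missing genotype.
Row = Dict[str, Optional[float]]


def analyze_all_snps(
    rows: List[Row],
    snp_names: List[str],
    test_fn: Callable[[List[Row], str], float],
) -> Dict[str, Dict[str, float]]:
    """Run a caller-supplied association test once per SNP.

    test_fn is a hypothetical callback that receives the complete-case
    rows and the SNP name and returns a p-value.
    """
    results: Dict[str, Dict[str, float]] = {}
    for snp in snp_names:
        # Keep only observations with a non-missing genotype at this SNP,
        # so every model fitted inside test_fn sees the same rows.
        complete = [r for r in rows if r.get(snp) is not None]
        # Results are keyed by SNP name, which plays the role of the
        # "by" variable when per-SNP outputs are merged.
        results[snp] = {
            "n_used": len(complete),
            "pvalue": test_fn(complete, snp),
        }
    return results
```

Keying the output by SNP name keeps results from different SNPs from being combined by accident when many per-SNP outputs are collected.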

4.2.2 Association methods: family-based samples

As explained in the previous section, a case-control association study design is susceptible

to false positives attributable to population admixture rather than to proximity of the

associated marker to a trait-causing mutation. This problem can be minimized by using

family-based designs. Use of so-called family-based controls was popularized by the

transmission disequilibrium test (TDT) [66]. The premise of the TDT is that, if a locus has an

allele associated with disease, a parent who is heterozygous at the locus will transmit the

associated allele to affected offspring more often than the proportion of 0.5 expected under

the null of no association. In effect, the transmitted 'case' alleles and the non-transmitted

'control' alleles are contributed by a single person (the parent) and thus are expected to be

well matched, given the ancestry of the parent. Thus, the potentially confounding effect of

cryptic population stratification is reduced or eliminated.
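The TDT premise can be made concrete with a minimal sketch for a biallelic marker in parent-offspring trios. It assumes genotypes are coded as the number of copies (0, 1, or 2) of the allele of interest and uses the standard McNemar-type statistic (b - c)^2 / (b + c), referred to a chi-square distribution with 1 degree of freedom. Here b and c are the counts of transmissions and non-transmissions of that allele from heterozygous parents. The function names are hypothetical, and this is an illustration rather than the implementation used by any particular package.

```python
import math
from typing import Iterable, Tuple

# A trio is (father, mother, affected_child), each coded as the number
# of copies (0, 1 or 2) of the allele of interest.
Trio = Tuple[int, int, int]


def count_transmissions(trios: Iterable[Trio]) -> Tuple[int, int]:
    """Count transmissions (b) and non-transmissions (c) of the allele
    of interest from heterozygous parents to affected offspring."""
    b = c = 0
    for father, mother, child in trios:
        n_het = [father, mother].count(1)
        if n_het == 0:
            continue  # homozygous parents carry no information for the TDT
        if n_het == 2:
            # Both parents heterozygous: every copy in the child came
            # from a heterozygous parent.
            from_het = child
        else:
            # The homozygous parent transmits a known allele; subtract it.
            other = mother if father == 1 else father
            from_het = child - other // 2
        if not 0 <= from_het <= n_het:
            raise ValueError(f"Mendelian inconsistency in trio {(father, mother, child)}")
        b += from_het
        c += n_het - from_het
    return b, c


def tdt(b: int, c: int) -> Tuple[float, float]:
    """Return the TDT chi-square statistic and its 1-df p-value."""
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    pvalue = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, pvalue
```

Under the null of no association each heterozygous parent transmits either allele with probability 0.5, so b and c are expected to be equal and the statistic is close to zero.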

The original TDT was geared for analysis of trios consisting of two parents and an

affected offspring. Extensions have allowed the analysis of more extended pedigrees [67].

A unifying framework for family-based association tests (FBATs) was introduced by

Rabinowitz and Laird [68] and Laird et al. [69]. The key feature of their approach is that

it computes the null distribution of marker alleles in offspring conditional on appropriate

features of the data, so that this conditional distribution follows Mendelian segregation

regardless of the phenotype configuration in the family. When parental genotypes are known,

the distribution is computed conditional on all phenotypes and parental genotypes. When

parental genotypes are missing, the distribution is also conditional on the offspring

genotype configuration. The point is that resulting tests of association that compare the observed

pattern of allele segregation with that expected under the conditional null distribution have

a correct Type I error rate and are protected from biases caused by population admixture.

A software toolkit for carrying out a broad range of FBAT-based tests is available at

http://www.biostat.harvard.edu/~fbat/default.html, and new tests based on the core conditional

framework continue to be developed [70-72]. The PLINK package [33] also carries

out basic family-based association tests.
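For the simplest case described above, in which both parental genotypes are known, the conditional null distribution of an offspring genotype is just the Mendelian segregation distribution. The sketch below enumerates it exactly. Genotypes are represented as pairs of allele labels, and the function names are hypothetical. It does not cover the missing-parent case and is not the FBAT software's implementation.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from typing import Dict, Tuple

Genotype = Tuple[str, str]


def offspring_genotype_dist(parent1: Genotype, parent2: Genotype) -> Dict[Genotype, Fraction]:
    """Mendelian distribution of one offspring's genotype given both parents.

    Each parent transmits either of its two alleles with probability 1/2,
    so the four (allele from parent1, allele from parent2) pairs are
    equally likely. Genotypes are returned as sorted, unordered pairs.
    """
    counts = Counter(tuple(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())  # always 4
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def expected_allele_count(dist: Dict[Genotype, Fraction], allele: str) -> Fraction:
    """Expected number of copies of `allele` in the offspring under the null."""
    return sum((prob * genotype.count(allele) for genotype, prob in dist.items()), Fraction(0))
```

A test statistic in this framework compares the observed allele count in offspring with this expectation. Because the expectation depends only on the parental genotypes, it does not depend on population allele frequencies.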


4.2.2.1 Quality control for family data

In addition to some of the safeguards mentioned in the discussion of unrelated case-control

data, additional measures are important for family data (whether for association analysis or

for linkage designs described below). Because analysis methods rely on the reported familial

relationships, these relationships should first be validated by checking that the genotype

data are compatible with them.

The family structures also provide an added means of detecting some genotyping

errors. Several user-friendly and well-documented programs for these tasks exist, such as

PEDSTATS [32] and PREST [74, 75].
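As a minimal illustration of how family structure exposes some genotyping errors, the sketch below flags parent-offspring trios whose genotypes at a single marker are incompatible with Mendelian inheritance. Genotypes are pairs of allele labels, and the function names are hypothetical. This is a simplified per-marker check, not a description of how PEDSTATS or PREST work, and it does not address relationship validation, which draws on data from many markers.

```python
from typing import Iterable, List, Tuple

Genotype = Tuple[str, str]
Trio = Tuple[Genotype, Genotype, Genotype]  # (father, mother, child)


def trio_is_mendelian_consistent(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    """True if the child could have received one allele from each parent."""
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


def flag_inconsistent_trios(trios: Iterable[Trio]) -> List[int]:
    """Return the positions of trios that violate Mendelian inheritance."""
    return [
        i
        for i, (father, mother, child) in enumerate(trios)
        if not trio_is_mendelian_consistent(father, mother, child)
    ]
```

A trio flagged at one marker suggests a genotyping error at that marker, whereas inconsistencies spread across many markers point instead to a misreported relationship.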
