Biomedical Engineering Reference
In-Depth Information
n The program subsets the dataset fulldata so that only observations for which loc1 is not
missing are used; this ensures that both logistic regression runs are carried out on the same set
of observations. The variable 'pvalue' thus indicates the strength of evidence for association
between case/control status and the single SNP being analyzed.
o In practice, step 2 would be programmed into a loop across all SNPs for analysis. A "by"
statement to merge by the SNP name should be added to the merge step.
4.2.2 Association methods: family-based samples
As explained in the previous section, a case-control association study design is susceptible
to false positives attributable to population admixture rather than to proximity of the
associated marker to a trait-causing mutation. This problem can be minimized by using
family-based designs. Use of so-called family-based controls was popularized by the
transmission disequilibrium test (TDT) [66]. The premise of the TDT is that, if a locus has an
allele associated with disease, a parent who is heterozygous at the locus will transmit the
associated allele to affected offspring more often than the proportion of 0.5 expected under
the null hypothesis of no association. In effect, the transmitted 'case' alleles and the
non-transmitted 'control' alleles are contributed by a single person (the parent) and thus are
expected to be well matched, given the ancestry of the parent. Thus, the potentially
confounding effect of cryptic population stratification is reduced or eliminated.
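The counting argument behind the TDT can be sketched in a few lines of Python. This is a minimal illustration, not the procedure from [66] verbatim: it assumes a biallelic marker, complete and Mendelian-consistent trios, and genotypes coded as the number of copies of the candidate allele (0, 1 or 2). The function names are hypothetical. The statistic used is the usual McNemar-type form (b - c)^2 / (b + c), referred to a chi-square distribution with 1 degree of freedom, where b and c count transmissions and non-transmissions of the candidate allele from heterozygous parents.

```python
import math


def count_transmissions(trios):
    """Count transmissions (b) and non-transmissions (c) of the candidate
    allele from heterozygous parents to affected offspring.

    Each trio is (father, mother, child), with genotypes coded as the
    number of copies of the candidate allele (0, 1 or 2). This coding is
    an assumption of the sketch.
    """
    b = c = 0
    for father, mother, child in trios:
        parents = (father, mother)
        n_het = sum(1 for g in parents if g == 1)
        n_hom = sum(1 for g in parents if g == 2)
        # Homozygous carriers always transmit the allele, so whatever is
        # left in the child must have come from heterozygous parents.
        transmitted = child - n_hom
        if not 0 <= transmitted <= n_het:
            raise ValueError(f"Mendelian inconsistency in trio {parents + (child,)}")
        b += transmitted
        c += n_het - transmitted
    return b, c


def tdt_statistic(b, c):
    """McNemar-type TDT statistic, chi-square with 1 df under the null."""
    if b + c == 0:
        raise ValueError("no heterozygous parents, test is uninformative")
    return (b - c) ** 2 / (b + c)


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variate with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


# Illustrative (made-up) trios: (father, mother, affected child)
example_trios = [(1, 0, 1), (1, 1, 2), (1, 2, 1), (0, 0, 0), (1, 1, 1)]
b, c = count_transmissions(example_trios)
stat = tdt_statistic(b, c)
p_value = chi2_1df_pvalue(stat)
```

Trios in which both parents are homozygous, such as the fourth one above, contribute nothing to either count, which is why only heterozygous parents are informative for the test.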
The original TDT was designed for the analysis of trios consisting of two parents and an
affected offspring. Extensions have allowed the analysis of more extended pedigrees [67].
A unifying framework for family-based association tests (FBATs) was introduced by
Rabinowitz and Laird [68] and Laird et al. [69]. The key feature of their approach is that
it computes the null distribution of marker alleles in offspring conditional on appropriate
features of the data, so that this conditional distribution follows Mendelian segregation
regardless of the phenotype configuration in the family. When parental genotypes are known,
the distribution is computed conditional on all phenotypes and parental genotypes. When
parental genotypes are missing, the distribution is conditional also on the offspring
genotype configuration. As a result, tests of association that compare the observed
pattern of allele segregation with that expected under the conditional null distribution have
a correct Type I error rate and are protected from biases caused by population admixture.
A software toolkit for carrying out a broad range of FBAT-based tests is available at
http://www.biostat.harvard.edu/fbat/default.html, and new tests based on the core
conditional framework continue to be developed [70-72]. The PLINK package [33] also carries
out basic family-based association tests.
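To make the conditioning idea concrete, the following Python sketch covers only the simplest case described above, in which both parental genotypes are known. It is not the FBAT software or its interface. It assumes a biallelic marker, additive coding of the offspring genotype (number of copies of the candidate allele), and a statistic of the form U = sum of T * (X - E[X | parents]), standardized by its conditional variance. The function names and the trait coding T are illustrative assumptions.

```python
import math
from fractions import Fraction


def offspring_distribution(father, mother):
    """Mendelian distribution of the offspring allele count (0, 1, 2),
    conditional on parental genotypes coded as allele counts."""
    pf = Fraction(father, 2)  # probability that the father transmits the allele
    pm = Fraction(mother, 2)  # probability that the mother transmits the allele
    return {
        0: (1 - pf) * (1 - pm),
        1: pf * (1 - pm) + (1 - pf) * pm,
        2: pf * pm,
    }


def conditional_z(trios, traits):
    """Standardized score comparing observed offspring genotypes with their
    expectation under Mendelian segregation, given the parents.

    trios  : list of (father, mother, child) allele counts
    traits : one coded trait value per child (e.g. 1 for affected)
    """
    u = Fraction(0)
    v = Fraction(0)
    for (father, mother, child), t in zip(trios, traits):
        dist = offspring_distribution(father, mother)
        mean = sum(x * p for x, p in dist.items())
        var = sum((x - mean) ** 2 * p for x, p in dist.items())
        u += t * (child - mean)
        v += t * t * var
    if v == 0:
        raise ValueError("no informative families (no heterozygous parents)")
    return float(u) / math.sqrt(float(v))


# Illustrative (made-up) affected-offspring trios, all with trait value 1
example_trios = [(1, 0, 1), (1, 1, 2), (1, 2, 1), (0, 0, 0), (1, 1, 1)]
z = conditional_z(example_trios, [1] * len(example_trios))
```

Under these assumptions, with affected-only trios and every trait value set to 1, the squared score reduces to the TDT form (b - c)^2 / (b + c). Handling missing parental genotypes requires the additional conditioning on the offspring genotype configuration, which this sketch does not attempt.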
4.2.2.1 Quality control for family data
Beyond the safeguards mentioned in the discussion of unrelated case-control data, further
measures are important for family data (whether for association analysis or for the linkage
designs described below). Because familial relationships are reported, and because the
analysis methods rely on them, these relationships should first be validated by evaluating
the compatibility of the genotype data with them. The family structures also provide an
added means of detecting some genotyping errors. Several user-friendly and well-documented
programs for these tasks exist, such as PEDSTATS [32] and PREST [74, 75].
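One simple way in which family structure exposes genotyping errors is a Mendelian consistency check on each trio. The Python sketch below is only an illustration of that idea and does not reproduce what PEDSTATS or PREST do. It assumes autosomal markers, complete genotypes given as pairs of allele labels, and hypothetical function names.

```python
def mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can be formed by taking one
    allele from the father and one from the mother.

    Genotypes are unordered pairs of allele labels, e.g. ("A", "G").
    """
    target = sorted(child)
    return any(
        sorted((paternal, maternal)) == target
        for paternal in father
        for maternal in mother
    )


def flag_inconsistent_trios(trios):
    """Return the indices of trios whose genotypes violate Mendelian
    inheritance at this marker."""
    return [
        index
        for index, (father, mother, child) in enumerate(trios)
        if not mendelian_consistent(father, mother, child)
    ]


# Illustrative (made-up) genotypes at one SNP: (father, mother, child)
example_trios = [
    (("A", "G"), ("G", "G"), ("A", "G")),  # consistent
    (("A", "A"), ("A", "A"), ("A", "G")),  # child carries G, absent in both parents
    (("A", "G"), ("A", "G"), ("G", "G")),  # consistent
]
flagged = flag_inconsistent_trios(example_trios)
```

A single inconsistent marker in a family points to a likely genotyping error, whereas inconsistencies spread across many markers in the same family suggest that the reported relationship itself should be re-examined. A check of this kind cannot detect errors that happen to remain Mendelian-consistent.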