Box 4.2 Computational Approaches for Predicting Transcriptional Regulatory Interactions
De novo motif finding:
What it provides:
• Sequence motif(s) over-represented within an input set of DNA sequences
How it works:
• Input sequences are searched, using one of multiple available search algorithms, for over-representation as compared to user-defined background sequences
Advantages:
• Does not require prior hypothesis of which TF DNA-binding sites might be over-represented
Disadvantages:
• Identified sequence motifs might not actually serve a regulatory role
• Identified motifs may not permit unambiguous mapping to a putative DNA-binding protein
• Depends on appropriate definition of background sequences

Scanning genomic sequence for TF binding site sequence matches:
What it provides:
• Locations and identities of candidate TF DNA-binding sites, according to user-defined search parameters, within user-defined input sequences
How it works:
• A user-defined input sequence (e.g., a promoter or candidate CRM) is scored for matches to TF binding site sequences using either a word-based or position weight matrix (PWM) model of the binding sequences [41]
Advantages:
• Can be automated to search many and/or lengthy sequences for matches to binding sites for one or more TFs
• Systematic scan of acceptable DNA-binding site sequences with match scores (depending on choice of algorithm)
• Input DNA sequences can be either unbiased or filtered for likely regulatory regions
Disadvantages:
• Not all TF-DNA-binding site sequence matches will necessarily correspond to functional, regulatory sites
• Depends on quality and depth of TF-DNA-binding site data and corresponding binding model and user-defined scoring threshold

Gene regulatory network inference:
What it provides:
• Varies depending on the algorithm; can provide information on either regulatory DNA motifs or on regulatory proteins and their putative target genes
How it works:
• Lever algorithm identifies which DNA motifs (from a previously compiled set of motifs) or motif combinations are enriched within predicted CRMs associated with user-defined, input gene sets as compared to background genes [64]
• Module Networks algorithm applies a machine learning approach to user-input gene expression data to assign genes to sets of putatively co-regulated genes and to learn which regulatory proteins (from a previously compiled set of putative regulatory proteins) appear to regulate the gene sets [65]
• ARACNe algorithm applies a mutual information approach to identify interacting genes from user-input gene expression data [66]
• Bayesian analysis has been applied to gene expression data [67] and ChIP-chip data [68] to infer GRNs
Advantages:
• Can be automated to search many and/or lengthy sequences for matches to binding sites for one or more TFs
• Input DNA sequences can be either unbiased or filtered for likely regulatory regions
Disadvantages:
• Enriched motifs or motif combinations might not permit unambiguous mapping to a putative DNA-binding protein
• Regulation of a gene set by a protein, as inferred from gene expression data, could be due to either direct or secondary effects
• Some regulatory proteins are not controlled at the level of gene expression, but rather post-transcriptionally
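
The PWM-based scanning approach summarized in this box can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the 4-bp count matrix is hypothetical and does not describe any real TF, the background is uniform, the pseudocount is 1, and the threshold of 4.0 is arbitrary. Only the forward strand is scanned; a practical scan would also score the reverse complement and would typically use curated matrices.

```python
import math

# Hypothetical count matrix: one dict per motif position (illustrative only).
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
# Assumed uniform background; in practice this is a user-defined choice.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def build_pwm(counts, background, pseudocount=1.0):
    """Convert counts to log2-odds scores relative to the background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background[base])
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for forward-strand windows scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


PWM = build_pwm(COUNTS, BACKGROUND)
HITS = scan("TTACGTGGACGTAA", PWM, threshold=4.0)
for start, window, score in HITS:
    print(start, window, round(score, 2))
```

Lowering the threshold in this sketch returns more candidate sites, which illustrates the disadvantage noted in the box: the result depends on the binding model and the user-defined scoring threshold, and a sequence match alone does not establish a functional regulatory site.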
three novel enhancers that confer Shh-like expression [69].
Another example was the testing of 167 highly conserved
non-coding elements that resulted in the identification of 61
enhancers that are active in the brain [70]. Another
strategy has focused on non-coding regions containing
multiple TF-binding sites identified by ChIP (see below and
Box 4.1); seven of 13 regions bound by multiple sequence-specific
cardiac TFs and the transcriptional co-activator
p300 were found to drive cardiac expression in transient
transgenic mouse embryos [71]. Finally, DNase I hypersensitive
(DHS) sites [72-74] and regions detected in
formaldehyde-assisted isolation of regulatory elements
(FAIRE) assays [75] correspond to nucleosome-depleted
regions and have been shown to be effective in predicting
TF-binding locations that may function as enhancers [76].
Extensive data associated with regulatory elements,
including DHS, FAIRE, and ChIP-seq on TFs, histone
modifications, and CBP have been generated by the
ENCODE project [77], the NIH Epigenomics Roadmap
project, and other groups for many commonly studied
human cell lines and cell types [78,79]. Similar data sets
have been generated for cell lines and whole animals by the
modENCODE Consortium for D. melanogaster [80] and C.
elegans [81].