Biology Reference
In-Depth Information
detail, we will first focus on the two types of node: TF
proteins and DNA targets.
eukaryotic genomes have similar numbers of TF-encoding
genes, more complex metazoans such as mammals have
effectively more TFs than relatively simple multicellular
organisms such as C. elegans [10]. A second mechanism of
obtaining functionally distinct TFs that are encoded by the
same gene is by post-translational modifications (e.g.,
phosphorylation) that can alter the activity of the TF. Indeed,
many TFs are the functional endpoint of signaling pathways
that, for instance, instruct differentiation programs during
embryonic development, or in response to environmental
cues. An example is the DAF-16/FOXO TF, which is the
downstream target of the insulin/IGF signaling pathway in
worms, flies and mammals. When the pathway is active,
DAF-16 is phosphorylated and prevented from translocating
into the nucleus to activate relevant target genes [14].
Finally, differential dimerization can also greatly affect the
number of active TF entities in an organism because many
TFs bind DNA as obligatory or facultative dimers. For
instance, bHLH TFs bind their cognate target sequences
only as dimers [15], whereas NHRs can often bind either as
monomers or as dimers [16]. For example, if TF-X and TF-Y
each function as monomers, homodimers and heterodimers,
this is equivalent to five distinct TF entities (X, Y, X-X, Y-Y
and X-Y) [17]. High-throughput, system-level protein-protein
interaction studies have been applied to delineate TF
dimerization. For instance, yeast two-hybrid assays (see
Chapter 3) have been used to identify dimers within the
C. elegans bHLH family [18], and mammalian two-hybrid
assays were applied to large sets of human and mouse TFs
[19]. However, for a complete picture of TF dimerization,
multiple protein-protein interaction assays need to be
applied to all TFs and in a variety of model organisms.
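
The counting argument for TF-X and TF-Y can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: every TF in the list is taken to be able to act as a monomer, as a homodimer, and as a heterodimer with every other TF. The function name enumerate_tf_entities and its keyword arguments are hypothetical and are not taken from the cited studies.

```python
from itertools import combinations


def enumerate_tf_entities(tfs, monomers=True, homodimers=True, heterodimers=True):
    """List distinct TF entities for a set of TFs.

    Assumption (not from the source): every TF can form every allowed
    kind of complex. Real TF families permit only a subset of these.
    """
    entities = []
    if monomers:
        entities += [(tf,) for tf in tfs]          # X, Y
    if homodimers:
        entities += [(tf, tf) for tf in tfs]       # X-X, Y-Y
    if heterodimers:
        entities += list(combinations(tfs, 2))     # X-Y
    return entities


entities = enumerate_tf_entities(["TF-X", "TF-Y"])
print(len(entities))  # 5
```

Under the same assumption, n TFs give n monomers, n homodimers and n(n-1)/2 heterodimers, so the number of possible entities grows much faster than the number of TF-encoding genes. Experimentally determined dimerization data would replace the all-pairs assumption with the pairs actually observed.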
The comprehensive study of TFs is facilitated not only
by the prediction of which genes in a genome encode
putative TFs, but also by the generation of clone resources
that enable their characterization in a variety of experimental
assays (Box 4.1). The Gateway cloning system
[20,21] has provided a versatile method for open reading
frame (ORF) cloning, and large collections of clones
(referred to as 'ORFeomes') are available for a variety of
organisms, including C. elegans and human [22,23]. These
resources form the basis for comprehensive TF clone
collections [24-27] that can be used in functional assays to
test for protein-protein interactions using yeast two-hybrid
assays; to test for DNA binding using yeast one-hybrid
(Y1H) assays and protein-binding microarrays (PBMs); or
to test for in vivo function using RNA interference (RNAi)
(Box 4.1).
GRN Nodes: Transcription Factors
TFs can be grouped into families based on their DNA-
binding domain(s). There are different types of DNA-
binding domains, some of which are lineage specific and
others that are more ubiquitous. Surprisingly, only three
types of DNA-binding domains are found in all kingdoms
of life: cold shock, helix-turn-helix (HTH) type 3, and HTH
psq [4]. Some DNA-binding domains are clearly involved
in direct DNA interactions, including basic helix-loop-
helix (bHLH), basic leucine zipper (bZip) and nuclear
hormone receptor (NHR)-type zinc fingers, whereas other
potential DNA-binding domains, most notably Cys2His2
(C2H2)-type zinc fingers, can also be involved in RNA
binding or mediate protein-protein interactions [5,6]. This
complicates the annotation of TFs and emphasizes the need
for the application of systematic assays to determine, for
individual domains in individual proteins, whether they
mediate nucleic acid or protein binding.
Identifying all the genes in a genome that encode TFs is
not a trivial task because it can be challenging to recognize
protein domains based on sequence alone, and because of
ambiguity in proposed protein functionality, as described
above. Databases such as InterPro, Pfam and SMART [7-9]
can be used to predict which genes in a genome encode TFs,
based on how well their amino acid sequence matches
canonical DNA-binding domains. However, the use of such
tools is limited by how predictive the different domains are
of DNA binding, how accurately proteins are annotated to
possess a particular domain, as well as the quality of the
computational domain model. The accuracy and coverage of
TF predictions can be greatly improved by manually
curating protein sequences to ensure that they indeed
possess a likely DNA-binding domain [10,11]. The final
number of TFs encoded in the genome is likely to change for
most organisms based on either improved curation or
experimental identification of sequence-specific protein-DNA
interactions (see below). Overall, between 5%
and 10% of all protein-coding genes of most organisms are
predicted to encode a TF, illustrating the overall importance
of this class of proteins [10-13].
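
The prediction-then-curation workflow described in this paragraph can be sketched as follows. This is a toy illustration, not the pipeline used by InterPro, Pfam or SMART: the domain labels, the gene names, the example proteome and the functions classify_protein and predicted_tf_fraction are all hypothetical. It assumes that domain annotations are already available for each protein and separates domains that clearly bind DNA from ambiguous ones, such as C2H2-type zinc fingers, that call for manual curation or experimental testing.

```python
# Hypothetical domain categories; these labels are illustrative and are
# not real InterPro, Pfam or SMART accessions.
DIRECT_DNA_BINDING = {"bHLH", "bZip", "NHR zinc finger"}
AMBIGUOUS = {"C2H2 zinc finger"}  # may bind DNA, RNA or protein


def classify_protein(domains):
    """Classify one protein from its list of annotated domains."""
    domains = set(domains)
    if domains & DIRECT_DNA_BINDING:
        return "predicted TF"
    if domains & AMBIGUOUS:
        return "needs curation"
    return "not a TF"


def predicted_tf_fraction(proteome):
    """Fraction of proteins in a {gene: domains} mapping predicted to be TFs."""
    n_tf = sum(1 for d in proteome.values()
               if classify_protein(d) == "predicted TF")
    return n_tf / len(proteome)


# Invented toy annotations, for illustration only.
proteome = {
    "gene-1": ["bHLH"],
    "gene-2": ["C2H2 zinc finger"],
    "gene-3": ["kinase"],
    "gene-4": [],
}

for gene, domains in proteome.items():
    print(gene, classify_protein(domains))
```

In this sketch, proteins in the "needs curation" class are deliberately excluded from the predicted fraction, which mirrors the point that the final TF count shifts as curation and experimental protein-DNA interaction data resolve ambiguous cases.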
The number of TF-encoding genes in a genome is not the
sole determinant of the total number of functional TFs that
occur in an organism. Alternative splicing can result in
multiple TFs encoded by a single gene, and these TFs can
have different functional or biochemical properties. For
instance, some TF genes encode multiple DNA-binding
domains that can be included or excluded in the protein
product through the use of different gene promoters or by
alternative splicing. Alternative splicing increases with
organismal complexity, and therefore, even though many
GRN Nodes: Cis-Regulatory Elements
CREs serve as the DNA-binding sites for sequence-specific
TFs that either activate or repress gene expression. The term