binding DNA. Further, TF binding in the context of the
yeast nucleus to the DNA fragment assayed (which is
integrated into the yeast genome) will depend on the
assembly of nucleosomes in a similar manner as in the
system from which the gene under study was selected. ChIP
assays also have specific limitations: ChIP-grade antibodies
are not available for most TFs; ChIP is challenging to
perform for low-abundance TFs or for TFs expressed in only
a few cells in the organism; and ChIP may detect indirect
DNA-TF interactions due to the cross-linking of larger
protein-DNA complexes such as enhancer-promoter
loops (see Chapter 7). Finally, because of the statistical
thresholds chosen, true but weaker ChIP interactions may
not have been included.
False positives are a much more contentious data
quality issue than false negatives. First, with any assay,
whether high- or low-throughput, there are two types of
false positive: technical false positives that cannot be
reproduced, even with the same assay, and biological false
positives that can be reproduced technically, but that are not
valid biologically. The first type of false positive is obvi-
ously highly detrimental to the quality of the resulting
GRN, and many quality control steps have been introduced
in the different assays to assure that such erroneous inter-
actions are kept to a minimum (see Chapter 3). Examples
for ChIP assays are the use of multiple antibodies in
parallel, or comparing ChIP data in the test sample to data
obtained in a sample in which the TF or its DNA-binding
activity is lost due to a mutation or knockdown by RNAi. In
Y1H assays interactions can be re-tested in fresh yeast
cells, and, where possible, should confer a positive readout
with both reporters. Biological false positives, or true
negatives, are much more challenging to define and there-
fore to identify. Technically, it should be appreciated that
the assay used to validate the regulatory or phenotypic
consequence of a TF-DNA interaction will have its limi-
tations and will therefore not detect all true interactions.
Aside from technical issues, the question is how a biologi-
cally meaningful interaction is defined. Textbook views on
transcriptional regulation (such as Figure 4.1 C) draw a TF-
binding event, link it to a single gene and assume that each
binding event will have a regulatory consequence. It is
becoming clear, however, that it is not that simple, as many
TF-DNA-binding events appear to be silent (e.g.,
[121-123]). For instance, when ChIP data and transcriptome
expression profiling data were compared, only a small
overlap between the genes bound by a TF and the genes that
change in expression in its absence appears to be the rule
rather than the exception [124].
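The comparison described above amounts to intersecting two gene sets. The following is a minimal Python sketch of that bookkeeping, not the procedure used in the cited studies. The gene identifiers are made up for illustration and the helper name `overlap_summary` is hypothetical.

```python
def overlap_summary(bound_genes, responsive_genes):
    """Compare genes bound by a TF (e.g., from ChIP) with genes whose
    expression changes when the TF is lost (e.g., from expression profiling)."""
    bound = set(bound_genes)
    responsive = set(responsive_genes)
    both = bound & responsive
    return {
        "bound_and_responsive": sorted(both),
        # Bound but unchanged: candidate 'silent' binding events.
        "bound_only": sorted(bound - responsive),
        # Changed but not bound: candidate indirect regulatory effects.
        "responsive_only": sorted(responsive - bound),
        "fraction_of_bound_responsive": len(both) / len(bound) if bound else 0.0,
    }


# Hypothetical toy data, for illustration only.
chip_bound = ["geneA", "geneB", "geneC", "geneD", "geneE"]
expression_changed = ["geneB", "geneF", "geneG"]

summary = overlap_summary(chip_bound, expression_changed)
print(summary)
```

In this toy case only one of the five bound genes also changes in expression, which mirrors the small overlap described in the text. Real analyses would additionally depend on the statistical thresholds chosen for each data set.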
There are both conceptual and technical reasons for the
lack of overlap between physical and regulatory interactions
[44]. First, regulatory relationships between TFs and genes
that change in expression upon TF reduction or over-
expression can be indirect. For instance, in linear cascades
(Figure 4.3 C), where one TF affects the expression of
another, reduction of either can affect the expression of
a downstream gene, but only one may directly bind its
regulatory DNA region(s). Practically, it can also be that
both TFs do physically associate with the gene in vivo, but
only one can be detected with any of the physical interaction
assays (Box 4.1). Conversely, several reasons can explain
why not all physical interactions have a measurable regulatory
consequence. For example, a physical interaction may
affect gene expression only under particular conditions, for
instance upon activation of the bound TF in response to an
outside signal. Further, the functional role of a TF may be
masked because of redundancy with other bound TFs. Also,
the binding event may have been attributed to the wrong
gene. This is a more likely issue in larger genomes where
CREs can be located far from TSSs. Taken together, it is
extremely challenging to identify which interactions in
a GRN may never be functional. Overall, it is clear that
there is no single method that will identify all TF-DNA
interactions, and therefore there is a need for the develop-
ment and application of a variety of methods, as well as the
integration of the data obtained with other types of pertinent
information such as gene expression patterns and phenotypes.
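The linear-cascade argument made earlier in this section can be illustrated with a small graph traversal. The sketch below assumes a hypothetical two-step cascade (TF1 binds the gene encoding TF2, and TF2 binds geneX); the node names and the helper functions `direct_targets` and `downstream_genes` are illustrative assumptions, not part of any published pipeline.

```python
from collections import deque


def direct_targets(edges, tf):
    """Genes whose regulatory DNA is directly bound by `tf`."""
    return {target for source, target in edges if source == tf}


def downstream_genes(edges, tf):
    """All genes reachable from `tf` through one or more edges
    (breadth-first search over the directed network)."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
    seen = set()
    queue = deque([tf])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical linear cascade: TF1 -> TF2 -> geneX (physical interactions).
physical_edges = [("TF1", "TF2"), ("TF2", "geneX")]

for tf in ("TF1", "TF2"):
    direct = direct_targets(physical_edges, tf)
    indirect = downstream_genes(physical_edges, tf) - direct
    print(tf, "direct:", sorted(direct), "indirect:", sorted(indirect))
```

Here reducing either TF1 or TF2 could change geneX expression, but only TF2 binds geneX directly. An expression-based assay alone would not distinguish the two cases.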
GRN Structure and Function
GRNs can be analyzed computationally using graph-theo-
retical algorithms and statistics, both for their overall
structure or topology and for their local circuitry (see
Chapter 9). Such analyses can provide insights into the
design principles of gene regulation at different levels,
which cannot be obtained from single-gene studies. The first
level of GRN analysis is usually to determine the number of
edges in which each node in the network participates. This is
referred to as the degree of a node (Figure 4.3 A). Because
GRNs are both bipartite and directional, there are two types
of degree: out-degree and in-degree. The out-degree (k_out) is
defined as the number of genes bound or regulated by
a particular TF, while the in-degree (k_in) is the number of TFs
that bind or regulate a certain gene. To obtain a view of the
overall wiring and hence the topology of a GRN, the degree
for each node in the network is computed and the distribution
of the degrees is plotted on a graph (Figure 4.3 A).
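As a minimal sketch of this computation, the Python snippet below derives the out-degree and in-degree of each node from a list of (TF, gene) edges and then tabulates how many nodes have each degree. The toy network and the function names `degree_tables` and `degree_distribution` are assumptions made for illustration. Nodes with no edges do not appear in an edge list and are therefore not counted here.

```python
from collections import Counter


def degree_tables(edges):
    """Return (out_degree, in_degree) for a list of (TF, gene) edges.

    out_degree[tf] is the number of genes bound or regulated by that TF;
    in_degree[gene] is the number of TFs that bind or regulate that gene.
    Duplicate edges are counted once.
    """
    unique_edges = set(edges)
    out_degree = Counter(tf for tf, _ in unique_edges)
    in_degree = Counter(gene for _, gene in unique_edges)
    return out_degree, in_degree


def degree_distribution(degree_table):
    """Map each observed degree value to the number of nodes that have it."""
    return dict(sorted(Counter(degree_table.values()).items()))


# Hypothetical toy network, for illustration only.
edges = [
    ("TF1", "g1"), ("TF1", "g2"), ("TF1", "g3"), ("TF1", "g4"),
    ("TF2", "g1"), ("TF3", "g2"),
]

k_out, k_in = degree_tables(edges)
print("out-degree distribution:", degree_distribution(k_out))
print("in-degree distribution:", degree_distribution(k_in))
```

Plotting the resulting counts against degree gives the kind of degree-distribution graph described in the text; with real data the plot is often drawn on logarithmic axes.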
Interestingly, most real networks, whether biological or not,
exhibit a scale-free degree distribution [125]. This indicates
that the vast majority of nodes have relatively few connec-
tions, but that a small number of nodes are extremely highly
connected. Such highly connected nodes are referred to as
hubs. In GRNs, the out-degree is best fit by a scale-free
degree distribution. The in-degree, however, is fit better by
an exponential degree distribution [105]. Although the reason
for the difference between these distributions is not yet understood,
it is clear that most TFs bind or regulate small sets of genes,