2.1 The Genome
In gene expression programming (GEP), the genome or chromosome consists of a
linear, symbolic string of fixed length composed of one or more genes. Despite
this fixed length, we will see that GEP chromosomes code for expression trees
with different sizes and shapes.
2.1.1 Open Reading Frames and Genes
The structural organization of GEP genes is better understood in terms of
open reading frames (ORFs). In biology, an ORF, or coding sequence of a
gene, begins with the start codon, continues with the amino acid codons, and
ends at a termination codon. However, a gene is more than the respective
ORF, with sequences upstream of the start codon and sequences downstream
of the stop codon. Although in GEP the start site is always the first position
of a gene, the termination point does not always coincide with the last
position of a gene. It is common for GEP genes to have noncoding regions
downstream of the termination point. (For now we will not consider these
noncoding regions, as they do not interfere with the product of expression.)
Consider, for example, the algebraic expression:
$$\sqrt{(a - b) \times (c + d)} \tag{2.1}$$
It can also be represented as a diagram or expression tree (ET):
          Q
          |
          *
        /   \
       -     +
      / \   / \
     a   b c   d
where “Q” represents the square root function.
This kind of diagram representation is in fact the phenotype of GEP genes,
the genotype being easily inferred from the phenotype as follows:
01234567
Q*-+abcd (2.2)
which is the straightforward reading of the ET from left to right and from top
to bottom.
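The correspondence between expression (2.2) and its ET can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's implementation: the ARITY table, the nested-list tree representation, and the names decode, encode, and evaluate are assumptions introduced here. Decoding fills the tree one level at a time, and encoding reads it back left to right, level by level.

```python
import math
from collections import deque

# Number of arguments each function symbol takes (assumed for this sketch).
# Any symbol not listed here is treated as a terminal with arity 0.
ARITY = {"Q": 1, "*": 2, "+": 2, "-": 2}

def decode(k_expr):
    """Build a nested [symbol, children] tree, filling the ET level by level."""
    symbols = iter(k_expr)
    root = [next(symbols), []]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):
            child = [next(symbols), []]
            node[1].append(child)
            queue.append(child)
    return root

def encode(tree):
    """Read the tree left to right, one level at a time (breadth-first)."""
    out, queue = [], deque([tree])
    while queue:
        symbol, children = queue.popleft()
        out.append(symbol)
        queue.extend(children)
    return "".join(out)

def evaluate(tree, env):
    """Evaluate the tree; env maps terminal symbols to numeric values."""
    symbol, children = tree
    args = [evaluate(child, env) for child in children]
    if symbol == "Q":
        return math.sqrt(args[0])
    if symbol == "*":
        return args[0] * args[1]
    if symbol == "+":
        return args[0] + args[1]
    if symbol == "-":
        return args[0] - args[1]
    return env[symbol]  # terminal: look up its value

tree = decode("Q*-+abcd")
print(encode(tree))                                        # Q*-+abcd
print(evaluate(tree, {"a": 5, "b": 1, "c": 3, "d": 6}))    # 6.0
```

With a = 5, b = 1, c = 3, d = 6 the tree evaluates to the square root of (5 - 1) × (3 + 6) = 36, which is 6. A queue is used because both directions of the mapping visit nodes in level order rather than depth-first.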