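Table 2.12 Sizes of genomic dictionaries

Sizes of the full $k$-mer spaces: $\lvert\Gamma^{6}\rvert = 4^{6} = 4096$; $\lvert\Gamma^{12}\rvert = 4^{12} = 16{,}777{,}216$; $\lvert\Gamma^{18}\rvert = 4^{18} = 68.719476736 \times 10^{9}$; $\lvert\Gamma^{24}\rvert = 4^{24} > 281 \times 10^{12}$.

| Organism genome | $\lvert D_{6}\rvert / 4^{6}$ | $\lvert D_{12}\rvert / 4^{12}$ | $\lvert D_{18}\rvert / 4^{18}$ |
|---|---|---|---|
| Nanoarchaeum equitans | 0.99 | 0.025 | $\approx 7 \times 10^{-6}$ |
| Mycoplasma genitalium | 0.99 | 0.029 | $\approx 8 \times 10^{-6}$ |
| Mycoplasma mycoides | 0.99 | 0.038 | $\approx 14 \times 10^{-6}$ |
| Haemophilus influenzae | 1 | 0.089 | $\approx 26 \times 10^{-6}$ |
| Escherichia coli | 1 | 0.207 | $\approx 67 \times 10^{-6}$ |
| Pseudomonas aeruginosa | 1 | 0.175 | $\approx 90 \times 10^{-6}$ |
| Saccharomyces cerevisiae | 1 | 0.393 | $\approx 169 \times 10^{-6}$ |
| Sorangium cellulosum | 1 | 0.23 | $\approx 185 \times 10^{-6}$ |
| H. sapiens chr. 19 | 1 | 0.639 | $\approx 610 \times 10^{-6}$ |
| Caenorhabditis elegans | 1 | 0.83 | $\approx 1{,}315 \times 10^{-6}$ |
| Drosophila melanogaster | 1 | 0.947 | $\approx 1{,}712 \times 10^{-6}$ |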
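The quantities tabulated in Table 2.12 can be reproduced on a small scale. The following is a minimal sketch, assuming a genome is given as a plain string over the alphabet {A, C, G, T}; the function names `k_dictionary` and `dictionary_coverage` and the toy sequence are hypothetical and chosen only for illustration.

```python
def k_dictionary(genome: str, k: int) -> set:
    """Return D_k(G): the set of distinct length-k substrings of the genome."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}


def dictionary_coverage(genome: str, k: int) -> float:
    """Return |D_k(G)| / 4^k, the fraction of all possible k-mers that occur."""
    return len(k_dictionary(genome, k)) / 4 ** k


# Sizes of the k-mer spaces Gamma^k for the lengths used in the table.
for k in (6, 12, 18, 24):
    print(f"|Gamma^{k}| = 4^{k} = {4 ** k:,}")

# Hypothetical toy sequence (not a real genome), used only to show the ratio.
toy = "ACGTACGGTCA"
print(dictionary_coverage(toy, 2))  # 7 distinct 2-mers out of 16 possible
```

Because a genome of length $\lvert G\rvert$ contains at most $\lvert G\rvert - k + 1$ distinct $k$-mers, the ratio cannot exceed $(\lvert G\rvert - k + 1)/4^{k}$, which is why the last column of the table is so small for every genome listed.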
Table 2.13 Genomic indexes, dictionaries, and tables

| Indexes/Dictionaries/Tables | Notation | Definition |
|---|---|---|
| Genomic Dictionary | $D(G)$ | $\{\alpha \in \Gamma^{*} : \alpha \subset G\}$ |
| $k$-Genomic Dictionary | $D_k(G)$ | $\Gamma^{k} \cap D(G)$ |
| $k$-Genomic Table | $T_k(G)$ | $\alpha \mapsto \alpha(G) : \alpha \in D_k(G)$ |
| $k$-Lexicality | $L_k(G)$ | $\lvert D_k(G)\rvert / \lvert T_k(G)\rvert$ |
| $k$-Dictionary Selectivity | $DS_k(G)$ | $1/(1 + 4^{k}/\lvert G\rvert) - \lvert D_k(G)\rvert / 4^{k}$ |
| Multiplicity-coMultiplicity $k$-distribution | $MC_k(G)$ | $j \mapsto \lvert\{\alpha \in D_k(G) : \alpha(G) = j\}\rvert$ |
| Forbidden $k$-Factors | $F_k(G)$ | $\Gamma^{k} \setminus D_k(G)$ |
| Minimal Forbidden Length | $MF(G)$ | $\min\{k : \Gamma^{k} \not\subseteq D_k(G)\}$ |
| Factor Length Selectivity | $LS(G)$ | $\lg_{4}\lvert G\rvert - (MF(G) + 1)$ |
| Hapaxes | $H(G)$ | $\{\alpha \in D(G) : \alpha(G) = 1\}$ |
| $k$-Hapaxes | $H_k(G)$ | $\Gamma^{k} \cap H(G)$ |
| $k$-Hapax-factor ratio | $HD_k(G)$ | $\lvert H_k(G)\rvert / \lvert D_k(G)\rvert$ |
| Minimal Hapax Length | $MH(G)$ | $\min\{\lvert\alpha\rvert : \alpha \in H(G)\}$ |
| Repeats | $R(G)$ | $\{\alpha \in D(G) : \alpha(G) > 1\}$ |
| $k$-Repeats | $R_k(G)$ | $\Gamma^{k} \cap R(G)$ |
| Maximal Repeat Length | $MR(G)$ | $\max\{\lvert\alpha\rvert : \alpha \in R(G)\}$ |
| Repeat Positions | $RP(G)$ | $\alpha \mapsto pos_G(\alpha) : \alpha \in R(G)$ |
| Length-Multiplicity Repeatability | $LM(G)$ | $j \mapsto \alpha(G) : \lvert\alpha\rvert = j,\ \alpha \in R(G)$ |
| Average $k$-Repeatability | $AR_k(G)$ | $\lvert T_k(G) \setminus H_k(G)\rvert / \lvert R_k(G)\rvert$ |
| $k$-Repeat-factor ratio | $RD_k(G)$ | $\lvert R_k(G)\rvert / \lvert D_k(G)\rvert$ |

Here $\alpha(G)$ denotes the multiplicity of $\alpha$ in $G$, that is, the number of its occurrences.
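Several of the indexes of Table 2.13 can be computed directly from a $k$-mer count. The sketch below is illustrative only and rests on stated assumptions: the genome is a plain string over {A, C, G, T}; $\alpha(G)$ is read as the number of occurrences of $\alpha$ in $G$; and $\lvert T_k(G)\rvert$ is read as the total number of $k$-mer occurrences, $\lvert G\rvert - k + 1$. All function names and the toy sequence are hypothetical.

```python
from collections import Counter


def k_table(genome: str, k: int) -> Counter:
    """T_k(G): map each k-mer occurring in the genome to its multiplicity."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def k_indexes(genome: str, k: int) -> dict:
    """Compute D_k, H_k, R_k and the derived ratios for one value of k."""
    if not 1 <= k <= len(genome):
        raise ValueError("k must satisfy 1 <= k <= len(genome)")
    table = k_table(genome, k)
    dictionary = set(table)                             # D_k(G)
    hapaxes = {w for w, m in table.items() if m == 1}   # H_k(G)
    repeats = dictionary - hapaxes                      # R_k(G)
    occurrences = sum(table.values())                   # |T_k(G)| (assumed)
    return {
        "D_k": dictionary,
        "H_k": hapaxes,
        "R_k": repeats,
        "L_k": len(dictionary) / occurrences,
        "HD_k": len(hapaxes) / len(dictionary),
        "RD_k": len(repeats) / len(dictionary),
        # Average multiplicity of the repeats; undefined when there are none.
        "AR_k": (occurrences - len(hapaxes)) / len(repeats) if repeats else None,
        # MC_k(G): multiplicity j -> number of k-mers with that multiplicity.
        "MC_k": Counter(table.values()),
    }


def minimal_forbidden_length(genome: str, alphabet_size: int = 4) -> int:
    """MF(G): smallest k for which some k-mer over the alphabet is absent.

    Assumes the genome contains only symbols of the alphabet.
    """
    k = 1
    while len(k_table(genome, k)) == alphabet_size ** k:
        k += 1
    return k


def minimal_hapax_length(genome: str):
    """MH(G): smallest k for which some k-mer occurs exactly once."""
    for k in range(1, len(genome) + 1):
        if any(m == 1 for m in k_table(genome, k).values()):
            return k
    return None  # only reached for an empty genome


def maximal_repeat_length(genome: str) -> int:
    """MR(G): largest k for which some k-mer occurs more than once.

    Every substring of a repeat is itself a repeat, so the scan can stop
    at the first length that has no repeated k-mer.
    """
    k = 0
    while any(m > 1 for m in k_table(genome, k + 1).values()):
        k += 1
    return k


toy = "ACGTACGGTCA"  # hypothetical toy sequence, not a real genome
print(k_indexes(toy, 2))
print(minimal_forbidden_length(toy), minimal_hapax_length(toy),
      maximal_repeat_length(toy))
```

These functions rebuild the $k$-mer table for each length, which is adequate for short strings; for real genomes a suffix-array or suffix-tree based approach would typically be preferred.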