Strings and Genomes - Infobiotics: Information in Biotic Systems


cases, there are few elements with maximal multiplicity; indeed, Zipf curves initially slope down steeply.

Selectivity, Lexicality, and Forbidden Words

We call the k-lexical fraction of a genome G the value $|D_k(G)|/4^k$, which is the fraction of different k-mers occurring in G with respect to all the possible ones. Of course, $|T_k(G)|/4^k$ is an upper bound for $|D_k(G)|/4^k$. A better evaluation of such an upper bound is given by the value $1/(1 + 4^k/|G|)$, which approximates $|D_k(G)|/4^k$ for a random sequence of length $|G|$ over the genomic alphabet. In fact, let us assume that G is random. If q is the fraction of k-mers occurring at least once in G, then the fraction of k-mers occurring at least twice in G is $q^2$, and in general the fraction of k-mers occurring at least i times is $q^i$. Therefore, assuming $q < 1$, for a very long genome G its length can be approximated in the following way [25]:

$$|G| = 4^k \left(q + q^2 + \cdots + q^i + \cdots\right) = 4^k \frac{q}{1 - q}.$$

Therefore,

$$4^k q = |G| (1 - q)$$

that is:

$$|G| = q \left(|G| + 4^k\right)$$

which implies:

$$\frac{1}{q} = \frac{|G| + 4^k}{|G|}$$

or equivalently, the fraction of k-mers occurring in a random genome of length $|G|$ (of length appreciably shorter than $4^k$) is:

$$q = \frac{1}{1 + 4^k/|G|}. \qquad (2.5)$$
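As a minimal sketch of the two quantities compared here, the following Python computes the k-lexical fraction of a sequence and the value given by equation (2.5). The function names are illustrative and do not come from the text. The code assumes that T_k(G) is the multiset of all overlapping k-mer occurrences in G and that D_k(G) is the set of its distinct elements.

```python
from collections import Counter


def kmer_multiset(genome: str, k: int) -> Counter:
    """Count every overlapping k-mer occurrence in the sequence.

    Assumption: this plays the role of T_k(G); its keys play the role of D_k(G).
    """
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def lexical_fraction(genome: str, k: int) -> float:
    """|D_k(G)| / 4^k: distinct k-mers found, over all possible k-mers."""
    return len(kmer_multiset(genome, k)) / 4 ** k


def random_estimate(genome_length: int, k: int) -> float:
    """Value of equation (2.5): 1 / (1 + 4^k / |G|)."""
    return 1.0 / (1.0 + 4 ** k / genome_length)


if __name__ == "__main__":
    toy = "acgtacgtaa"  # toy sequence, not real genomic data
    print(lexical_fraction(toy, 2))                 # 5 distinct 2-mers out of 16
    print(round(random_estimate(len(toy), 2), 3))   # 1 / (1 + 16/10)
```

On the toy sequence the distinct 2-mers are ac, cg, gt, ta and aa, so the lexical fraction is 5/16, while equation (2.5) with |G| = 10 gives 1/2.6. For real genomes a dictionary of counts may be too large for memory at high k, so this sketch is meant only to make the definitions concrete.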

The computations of $|D_k(G)|/4^k$ for the genomes of Table 2.11 are in all cases appreciably below this estimate. For example, for H. sapiens chr. 19, $1/(1 + 4^{12}/|G|)$ is equal to 0.791, while $|D_{12}|/4^{12}$ is equal to 0.639. We define for a genome G its k-dictionary selectivity $DS_k(G)$ as the following difference:

$$DS_k(G) = \frac{1}{1 + 4^k/|G|} - \frac{|D_k(G)|}{4^k} \qquad (2.6)$$
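The difference in (2.6) can be sketched in a few lines of Python. The function name and its parameters are hypothetical, chosen here for illustration: it takes the genome length, the number of distinct k-mers observed, and k, under the assumption of a four-letter alphabet.

```python
def dictionary_selectivity(genome_length: int, distinct_kmers: int, k: int) -> float:
    """DS_k(G) as in (2.6): the value 1 / (1 + 4^k / |G|) minus |D_k(G)| / 4^k."""
    possible = 4 ** k  # number of possible k-mers over a four-letter alphabet
    estimate = 1.0 / (1.0 + possible / genome_length)
    lexical_fraction = distinct_kmers / possible
    return estimate - lexical_fraction


if __name__ == "__main__":
    # Toy values (not real data): a sequence of length 10 with 5 distinct 2-mers.
    print(round(dictionary_selectivity(10, 5, 2), 4))
```

With the two figures quoted for H. sapiens chr. 19 at k = 12, the same difference is 0.791 − 0.639 = 0.152.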

Dictionary selectivity very often proves more indicative than the k-empirical entropy $E_k(G)$ of G, which can be defined as $E(T_k(G))$ by applying to $T_k(G)$ the following general definition of the entropy $E(X)$ of a multiset X of size n with m elements of multiplicities $n_1, n_2, \ldots, n_m$:
