Complete and Exact Peptide Sequence Analysis Based on Propositional Logic - Mathematical Approaches to Polymer Sequence Analysis and Related Problems


reported in Table 1.1, considering the sequence written horizontally, the last (rightmost) element would correspond to the symbol multiplying $21^0$, the last-but-one element to the symbol multiplying $21^1$, and so on. Moreover, the first symbol (Gly) in the list of possible components (Table 1.1) would mean number 1, the second (Ala) number 2, and so on. An empty position (no amino acid) would mean number 0. The digit 0 must be reserved for the empty position: if an actual amino acid meant 0, a sequence beginning with that amino acid would correspond to the same number as the same sequence without the initial amino acid, and the correspondence would no longer be one-to-one.

Example 1.5. The sequence Gly-Ser-Gly-Tyr, or, more precisely,

<no amino acid> <no amino acid> Gly Ser Gly Tyr

would then correspond to the number 0 ... 0 1 3 1 20 (or K) in base 21, which in base 10 is

$$20 \cdot 21^0 \,(= 20) + 1 \cdot 21^1 \,(= 21) + 3 \cdot 21^2 \,(= 1323) + 1 \cdot 21^3 \,(= 9261) = 10625.$$
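The correspondence between sequences and natural numbers can be sketched in a few lines of Python. This is only an illustration: the names `encode`, `decode`, `CODES` and `NAMES` are hypothetical, and the code table contains only the four codes stated in the text (Gly = 1, Ala = 2, Ser = 3, Tyr = 20), since Table 1.1 is not reproduced here.

```python
BASE = 21  # 20 possible components plus the digit 0 for "no amino acid"

# Assumption: only the codes given explicitly in the text are listed;
# the remaining entries of Table 1.1 would have to be filled in.
CODES = {"Gly": 1, "Ala": 2, "Ser": 3, "Tyr": 20}
NAMES = {code: name for name, code in CODES.items()}


def encode(sequence):
    """Return the natural number whose base-21 digits are the component codes."""
    number = 0
    for name in sequence:  # the leftmost component is the most significant digit
        number = number * BASE + CODES[name]
    return number


def decode(number):
    """Invert encode(); a digit 0 inside the number is not a valid sequence."""
    names = []
    while number > 0:
        number, digit = divmod(number, BASE)
        if digit == 0:
            raise ValueError("empty position inside a sequence")
        names.append(NAMES[digit])
    return names[::-1]  # digits were extracted from the rightmost one


print(encode(["Gly", "Ser", "Gly", "Tyr"]))  # 10625, as in Example 1.5
print(decode(10625))                         # ['Gly', 'Ser', 'Gly', 'Tyr']
```

Leading empty positions contribute nothing to the number, which is why the two `<no amino acid>` entries of the example do not appear in the code.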

The weights of all sequences up to molecular weight Value are therefore computed off-line and stored in correspondence with the natural numbers, described above, that represent the sequences. This computation may be done efficiently by using smaller solutions to gradually compute larger ones. Note that several sequences may have the same molecular weight; hence, one weight may correspond to more than one natural number, even though one natural number corresponds to only one sequence, and hence to one weight. The natural numbers need not be stored explicitly: they may simply be the indices of an array storing the weights. This constitutes the weights database: given a molecular weight, it allows one to find almost instantaneously all the sequences of components that could produce a portion of normalized peptide having that weight. Value is chosen large enough to cover all the gaps that one may need to sequence in the set of current analyses.
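One way to build such a database is sketched below. It is a minimal illustration under stated assumptions, not the implementation used in the chapter: the function name `build_weights_database` is hypothetical, the array is dense (practical only for small alphabets or small limits), and the demonstration uses just three components with the nominal residue masses of Gly, Ala and Ser (57, 71, 87) instead of the normalized weights of Table 1.1. The weight of each number is obtained from the already computed weight of its prefix, which is the "smaller solution" mentioned above.

```python
from collections import defaultdict


def build_weights_database(masses, max_weight):
    """masses[d - 1] is the weight of the component encoded by digit d.

    Returns (weights, by_weight): weights[n] is the weight of the sequence
    encoded by n (None if n is not a valid encoding or is too heavy), and
    by_weight maps each weight to the list of numbers having that weight.
    """
    base = len(masses) + 1                   # digit 0 means "no amino acid"
    max_len = max_weight // min(masses)      # longest sequence that can fit
    weights = [None] * base ** max_len       # array index = encoded sequence
    weights[0] = 0                           # the empty sequence
    by_weight = defaultdict(list)
    for n in range(1, len(weights)):
        prefix, digit = divmod(n, base)      # sequence = prefix + last component
        if digit == 0 or weights[prefix] is None:
            continue                         # empty position inside, or prefix too heavy
        w = weights[prefix] + masses[digit - 1]  # reuse the smaller solution
        if w <= max_weight:
            weights[n] = w
            by_weight[w].append(n)
    return weights, by_weight


# Illustrative data only: nominal residue masses of Gly, Ala, Ser.
weights, by_weight = build_weights_database([57, 71, 87], max_weight=200)
print(by_weight[128])  # [6, 9]: Gly-Ala (digits 1 2) and Ala-Gly (digits 2 1) in base 4
```

The dictionary `by_weight` plays the role of the lookup by molecular weight; the array `weights` shows that the natural numbers themselves need not be stored, since they are the indices.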

Therefore, for each gap $b_{h+1} - b_h$, the set of all the possible subsequences $S(b_{h+1} - b_h)$ covering that gap is computed in extremely short times by searching the weights database for all natural numbers corresponding to the weight $b_{h+1} - b_h$, and by explicitly generating the subsequences corresponding to such natural numbers.

When all the sets of subsequences $S(b_{h+1} - b_h)$, $h = 0, \ldots, p$, are available, all the possible sequences $S$ of the normalized peptide under the peak interpretation considered can be generated by concatenating such sets in all possible ways, an operation which we denote by $\oplus$, while eliminating the sequences that violate the requirements on the minimum $m_i$ or maximum $M_i$ number of each component:

$$S = S(b_1 - b_0) \oplus S(b_2 - b_1) \oplus \cdots \oplus S(w_0 - c_a - c_0 - b_p)$$
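The concatenation of the gap sets, with the filtering on component counts, could look like the following sketch. It is a plain enumeration over the Cartesian product, chosen for clarity and not necessarily what the authors use; the function name `concatenate`, the parameters `min_count` and `max_count`, and the sample gap sets are all hypothetical. The sample relies on the fact that Gly-Gly and Asn have the same nominal residue mass (114), so both can cover the same gap.

```python
from collections import Counter
from itertools import product


def concatenate(gap_sets, min_count=None, max_count=None):
    """Join one candidate subsequence per gap, in order, in all possible ways.

    gap_sets[h] is the set of subsequences (tuples of component names) that
    cover gap h; min_count / max_count map a component name to its bound.
    """
    min_count = min_count or {}
    max_count = max_count or {}
    sequences = set()
    for choice in product(*gap_sets):        # one subsequence for every gap
        full = tuple(name for sub in choice for name in sub)
        counts = Counter(full)               # missing components count as 0
        if any(counts[c] < m for c, m in min_count.items()):
            continue                         # too few of some component
        if any(counts[c] > m for c, m in max_count.items()):
            continue                         # too many of some component
        sequences.add(full)
    return sequences


# Hypothetical gap sets for three consecutive gaps.
gap_sets = [
    {("Gly", "Ala"), ("Ala", "Gly")},
    {("Ser",)},
    {("Gly", "Gly"), ("Asn",)},
]
print(len(concatenate(gap_sets)))                      # 4
print(sorted(concatenate(gap_sets, max_count={"Gly": 2})))
```

With the bound of at most two Gly, the two candidates ending in Gly-Gly are eliminated and only the two ending in Asn remain.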

Finally, when considering the sets of all the possible sequences $\{S_1, S_2, \ldots, S_r\}$ obtained for all the possible models, indexed $1, 2, \ldots, r$, the complete set of all possible sequences of the normalized peptide is obtained:

$$S = S_1 \cup S_2 \cup \cdots \cup S_r$$
