Search Engines - Bioinformatics Computing

Biomedical Engineering Reference

In-Depth Information

At each step, the number of characters in the search string—in this case, three—advances the search

unless a character in the search text corresponding to the length of the search string is also in the

search string. For example, in line 2 of Figure 4-11 , there is no space in the search string and so the

search advances three characters. However, in line 3, the "n" character in "nth" is also in the search

string, but the n's don't line up. In this way, skipping continues until line 6, where the "rot" in

"protein" contains the "r" character in the search string. A comparison of the first characters is made,

which matches, and so the second characters are compared. This comparison fails, and the search is

again advanced. In line 7, there is again a possible match, and the search string is advanced one

character to verify a match, as in line 8. However, the space at the end of "protein" doesn't match

the "A" in "RNA", and the search position is shifted three places to the right, as in line 10.

This process continues, stopping to check for matches whenever the next three characters contains

an "r", "n", or "a". That is, if the algorithm is examining a character that doesn't occur in the search

string at all, the algorithm moves ahead by the length of the entire search string. If a character does

appear in the search string, the algorithm advances the search marker by the distance between that

character and the right end of the string. By step 18, the search is complete, on a total of 44

characters (including spaces). The number of comparisons per character is approximately 0.4—an

efficient search algorithm.

A second search heuristic makes use of repeating patterns in the search string, such as "hand to

hand, door to door", and attempts to match the first repeating word, according to the algorithm

described for skipping characters. When a match is made, the search string is advanced to the point

that the first occurrence of the repeated word is aligned with the first occurrence of the matching

term in the main text. A comparison is now made for the second occurrence of the search term in the

text. Obviously, the major gain in computational efficiency and performance is obtained by the first

heuristic, and the advantage of the second heuristic is dependent on the appearance of repeating

words in the search pattern.

In addition to running time, another major metric for characterizing search algorithms is the need for

backtracking. Some search algorithms are linear, working efficiently from the beginning of the

sequence to be searched to the end, whereas others move back and forth in the text to be searched

during processing. For example, the skipping search algorithm moves the index ahead by the number

Search WWH ::

Custom Search

Home