Biomedical Engineering Reference
In-Depth Information
At each step, the number of characters in the search string—in this case, three—advances the search
unless a character in the search text corresponding to the length of the search string is also in the
search string. For example, in line 2 of Figure 4-11 , there is no space in the search string and so the
search advances three characters. However, in line 3, the "n" character in "nth" is also in the search
string, but the n's don't line up. In this way, skipping continues until line 6, where the "rot" in
"protein" contains the "r" character in the search string. A comparison of the first characters is made,
which matches, and so the second characters are compared. This comparison fails, and the search is
again advanced. In line 7, there is again a possible match, and the search string is advanced one
character to verify a match, as in line 8. However, the space at the end of "protein" doesn't match
the "A" in "RNA", and the search position is shifted three places to the right, as in line 10.
This process continues, stopping to check for matches whenever the next three characters contains
an "r", "n", or "a". That is, if the algorithm is examining a character that doesn't occur in the search
string at all, the algorithm moves ahead by the length of the entire search string. If a character does
appear in the search string, the algorithm advances the search marker by the distance between that
character and the right end of the string. By step 18, the search is complete, on a total of 44
characters (including spaces). The number of comparisons per character is approximately 0.4—an
efficient search algorithm.
A second search heuristic makes use of repeating patterns in the search string, such as "hand to
hand, door to door", and attempts to match the first repeating word, according to the algorithm
described for skipping characters. When a match is made, the search string is advanced to the point
that the first occurrence of the repeated word is aligned with the first occurrence of the matching
term in the main text. A comparison is now made for the second occurrence of the search term in the
text. Obviously, the major gain in computational efficiency and performance is obtained by the first
heuristic, and the advantage of the second heuristic is dependent on the appearance of repeating
words in the search pattern.
In addition to running time, another major metric for characterizing search algorithms is the need for
backtracking. Some search algorithms are linear, working efficiently from the beginning of the
sequence to be searched to the end, whereas others move back and forth in the text to be searched
during processing. For example, the skipping search algorithm moves the index ahead by the number
Search WWH ::




Custom Search