Database Reference
In-Depth Information
Table 8.3. Edit distance lower bound estima-
tion between a string and a string interval for
dictionary order.
f
u
n
d
0
1
2
3
f
1
0
1
o
2
1
1
2
{ 'o', ... , 'u' }
3
1
2
3
letter set { 'o', ... ,'u' } to represent the string interval. A query letter
will match the third row if and only if that letter is contained in the
respective set of letters. Since the given interval provides information
only on the three first characters of the strings within the interval, the
algorithm stops on the third row and estimates the lower bound on the
edit distance as the smallest value on that row (assuming a best case
scenario where a string exists within the interval matching the sux of
the query exactly). In this case the algorithm returns 1 as the estimated
lower bound between the given query string and the string interval. We
skip the details of the algorithm which can be easily implemented by
modifying Algorithm 2.1.1.
Similarly, we can compute a lower bound edit distance between two
string intervals [ s l ,s u ] and [ r l ,r u ], by combining the prefixes of the
boundary strings from these two intervals.
Notice that it is impossible to compute φ d ( s ) for a given string s ,
given that there is an infinite number of strings in the dictionary order
between any two string. As already mentioned it is not necessary to
instantiate the mapping, since we can eciently verify in linear time the
order of two strings. In practice we store the actual keys inside each B-
tree node instead of the mapping, which of course increases the storage
requirements of this structure (as is usually the case for string B-trees).
8.3.2.2
Gram counting order
The dictionary order collects useful information only on the pre-
fixes of the strings. In many cases, unfortunately, the discriminative
information of the strings is scattered in different positions. This moti-
vates the use of q -grams instead of prefixes to summarize the string
set. In this section, we describe a string order based on counting the
 
Search WWH ::




Custom Search