Database Reference
In-Depth Information
B-tree algorithms can be used to evaluate normalized edit distance,
but not weighted edit distance.
8.5 Related Work
Edit distance based queries have been studied in the past in litera-
ture related to the approximate dictionary lookup problem. Dictionary
lookup simply returns a yes or no answer, if there exists a dictio-
nary string within edit distance θ from a query string. Minsky and
Papert [53] originally formulated the dictionary lookup for Hamming
distance. Yao and Yao [77] presented a solution for Hamming distance
θ = 1. Brodal and Gasieniec [14] presented an improved algorithm in
terms of space and query time. Manber and Wu [51] proposed an algo-
rithm for edit distance with threshold θ = 1. The algorithm generates
all possible strings within edit distance one from each string in the
dataset and indexes those strings using Bloom filters. The index can
then identify whether there exists a string within edit distance one
from any query string. An improved solution for edit distance one was
given by Brodal and Venkatesh [13]. Arslan and Egecioglu [7] pre-
sented a novel algorithm based on tries and arbitrary edit distance
thresholds.
The dictionary lookup problem is fundamentally easier than the
approximate dictionary match problem, and both are easier than the
approximate text match problem, which has received a lot of attention
in the past. Most solutions for the approximate text match problem can
be used to solve the approximate dictionary match problem. Neverthe-
less, due to its simplicity, the dictionary match problem is amenable to
certain optimizations and simplifications that result in faster indexes
and filtering techniques in practice. A survey of related literature on the
oine text matching problem was conducted by Chan et al. [16]. A sur-
vey of the online text matching problem was conducted by Navarro [54].
A comprehensive exposition of all algorithms for online approximate
string matching was conducted by Stephen [62].
The q -gram intersection filter for edit distance was first proposed
by Jokinen and Ukkonen [39]. In the same paper the authors gave the
first algorithm that used q -gram inverted lists to compute edit distance.
Search WWH ::




Custom Search