Database Reference
In-Depth Information
studied by Amir et al. [2] and Lee et al. [42]. More recently, Cormode
and Muthukrishnan [26] introduced a variation of edit distance allow-
ing sub-strings to be moved with smaller cost. Chaudhuri et al. [18]
introduced a weighted edit distance for sets of tokens, based on the
idf weights of tokens. Approximating edit distance has also been well
studied in the literature of computer algorithms. Bar-Yossef et al. [9]
showed that an O ( n 3 / 7 ) approximation of edit distance can be obtained
in linear time. Ostrovsky and Rabani [56] proved that edit distance can
be probabilistically well preserved by Hamming distance after project-
ing the strings onto randomly selected subspaces. Andoni and Onak [3]
extended this idea by reducing both the space and time complexities.
The problem of approximate string matching using set based similar-
ity functions has received wide attention from the information retrieval
and core database communities. A detailed analysis and history of set
based similarity in information retrieval was conducted by Baeza-Yates
and Ribeiro-Neto [8].
Search WWH ::




Custom Search