Database Reference
In-Depth Information
8
Algorithms for Edit Based Similarity
In this section we will cover evaluation of edit based similarity. We will
show that the techniques introduced for set based similarity functions
can be applied to answer unweighted edit distance queries. We will also
present data structures based on tries and B-trees that can be used to
answer unweighted, weighted, and normalized edit distance.
8.1
Inverted Indexes
From the outset it does not appear that edit based similarity is related
to set based similarity, and by extension, to token based inverted
indexes. Although, careful observation reveals that this is not true for
the case of simple, unweighted edit distance (i.e., where all edit opera-
tions have unit cost).
Lemma 8.1(
-gram Intersection Filter for Edit Distance). Let
Λ be the universe of q -grams. Given sequences s = λ 1 ···
q
λ s m and r =
λ 1 ···
λ n , λ i j
Λ, and edit distance threshold θ , then
E ( s, r ) ≤ θ ⇒I ( s, r ) max( m, n ) − q +1 − qθ.
368
 
Search WWH ::




Custom Search