Database Reference
In-Depth Information
distance equal to E ( s ( N ) ,v i )= E ( s ( N ) ,v i− 1 ). This action
corresponds to a diagonal move down the matrix for a match-
ing character (and value d 3 in Algorithm 2.1.1). An addi-
tional sub-action here is that the algorithm needs to consider
all new strings contained in the trie that have lengths in
the interval [
( s ( N ) ,v i ))]. This
action corresponds to inserting at the end of string s ( N ),
θ
s ( N )
s ( N )
|
|
+1 ,
|
|
+( θ
−E
( s ( N ) ,v i ) arbitrary characters — all these strings are
still within edit distance at most θ from v i , given that
E ( s ( N ) ,v i ) ≤ θ .
(3) It considers all child nodes N of N , s.t. σ ( N )
−E
= σ i . These
nodes are added in the active set A i if and only if
E
( s ( N ) ,v i− 1 ) . This action corresponds to a diagonal
move down the matrix for a non-matching character (and
value d 3 in Algorithm 2.1.1). In this case, the algorithm
does not have to consider strings with lengths in the interval
[
( s ( N ) ,v i ))]. The reason is that
all these nodes have been added already by a matching parent
node of N , through step 2 above, if necessary.
After all active nodes in A i− 1 have been processed the algorithm
discards duplicate nodes in A i by preserving only the copy with the
minimum edit distance from v i (this corresponds to the minimum oper-
ation in Algorithm 2.1.1). This step is necessary since an active node
can be processed multiple times as a child of another active node. For
that purpose, we can organize the sets of active nodes as hash tables.
Initially, table A i is empty. As we process nodes in table A i− 1 we probe
set A i , and if the node exists, we update its edit distance if needed, oth-
erwise we insert the node. Notice that the order in which we process
nodes from table A i− 1 is not important. We simply iterate through all
entries in the hash table in hash value order. The full algorithm appears
as Algorithm 8.3.1.
Notice that Algorithm 8.3.1 can be used to compute all answers to
a query string incrementally, as characters are typed by the user one
by one. Every time the user types a new character, we use the already
computed set A |v| to compute a new set A |v| +1 , by continuing from
where the algorithm left off.
s ( N )
s ( N )
|
|
+1 ,
|
|
+( θ
−E
Search WWH ::




Custom Search