Database Reference
In-Depth Information
δ
is small, the dynamic programming algorithm is very ecient.
However, for some applications
If
δ
may need to be large. In this case, we
can speed-up the above computation using random sampling. Given two
sequences
, we compute two subsets
RA
and
RB
by sampling each
sequence. Then we use the dynamic programming algorithm to compute
the
LCSS
on
RA
and
RB
. We can show that, with high probability, the
result of the algorithm over the samples, is a good approximation of the
actual value. We describe this technique in detail in [40].
A
and
B
4.2.
Computing the Similarity Function S
2
We now consider the more complex similarity function
S
2. Here, given
two sequences
A, B
, and constants
δ, ε
, we have to find the translation
f
c
that maximizes the length of the longest common subsequence of
A, f
c
(
B
)
(
LCSS
δ,ε
(
)) over all possible translations.
A one dimensional translation
A, f
c
(
B
f
c
is a function that adds a constant to
all the elements of a 1-dimensional sequence:
f
c
(
x
1
,...,x
m
)=(
x
1
+
c,...,
x
m
+
).
Let the length of sequences
c
A
and
B
be
n
and
m
respectively. Let us
also assume that the translation
f
c
1
is the translation that, when applied
to
,and
it is also the translation that maximizes the length of the longest common
subsequence:
B
, gives a longest common subsequence
LCSS
δ,ε
(
A, f
c
1
(
B
)) =
a
LCSS
δ,ε
(
A, f
c
1
(
B
)) = max
c∈R
LCSS
δ,ε
(
A, f
c
(
B
))
.
The key observation is that, although there is an infinite number of
translations that we can apply to
B
f
c
results to a longest
, each translation
common subsequence between
), and there is a finite set of
possible longest common subsequences. In this section we show that we can
eciently enumerate a finite set of translations, such that this set provably
includes a translation that maximizes the length of the longest common
subsequence of
A
and
A
and
f
c
(
B
f
c
(
B
).
c
, applied to
A translation by
B
can be thought of as a linear transfor-
c
. Such a transformation will allow
mation of the form
f
(
b
i
)=
b
i
+
b
i
to
be matched to all
.
It is instructive to view this as a stabbing problem: Consider the
a
j
for which
|i − j| <δ
,and
a
j
− ε ≤ f
(
b
i
)
≤ a
j
+
ε
O
(
δ
(
n
+
m
)) vertical line segments ((
b
i
,a
j
−ε
)
,
(
b
i
,a
j
+
ε
)), where
|i−j| <δ
(Figure 10).
These line segments are on a two dimensional plane, where on the
x
axis
we put elements of
B
andonthe
y
axis we put elements of
A
. For every