MEDIAN STRINGS: A REVIEW - Data Mining in Time Series Databases

Database Reference

In-Depth Information

n N )in

of length

O

(

n

), both the time and space complexity amounts to

O

(

this case.

Despite of its mathematical elegance the exact algorithm above is

impractical because of the exponential complexity. There have been efforts

to shorten the computation time using heuristics or domain-specific knowl-

edge. Such an approach from [24] assumes that the string of

S

be quite

similar. Under reasonable constraints on the cost function (

c

(

a → ε

)=

c

(

ε → a

)=1and

c

(

a → b

) nonnegative), the generalized median string

p

E S (¯

p

≤ k

k

being a small number. In this case the opti-

mal dynamic programming path must be close to the main diagonal in

the distance table. Therefore only part of the

¯

satisfies

)

with

-dimensional table needs

to be considered. Details of the restricted search can be found in [24]. Its

asymptotic time complexity is

n

nk N ). While this remains exponential,

O

(

k

is

typically much smaller than

, resulting in a substantial speedup compared

to the full search of the original algorithm [21].

We may also use any domain-specific knowledge to limit the search

space. An example is the approach in the context of classifier combination

for handwritten sentence recognition [25]. An ensemble of classifiers provide

multiple classification results of a scanned text. Then, the consensus string

is expected to yield the best overall recognition performance. The input

strings from the individual classifiers are associated with additional infor-

mation of position, i.e. the location of each individual word in a sequence of

handwritten words. Obviously, it is very unlikely that a word at the begin-

ning of a sequence corresponds to a word at the end of another sequence.

More generally, only words at a similar position in the text image are mean-

ingful candidates for being matched to each other. The work [25] makes use

of this observation to exclude a large portion of the full

n

N

-dimensional

search space from consideration.

5.2. Approximate Algorithms

Because of the NP-hardness of generalized median string computation,

efforts have been undertaken to develop approximate approaches which

provide suboptimal solutions in reasonable time. In this section we will

discuss several algorithms of this class.

5.2.1. Greedy Algorithms

The following algorithm was proposed by Casacuberta and de Antoni

[4]. Starting from an empty string, a greedy algorithm constructs an

Data Mining in Time Series Databases

Search WWH ::

Custom Search

Home