on the portions that are similar and we are willing to pay less attention to
regions of great dissimilarity.
Non-metric distances are used nowadays in many domains, such as string (DNA) matching, collaborative filtering (where customers are matched with stored 'prototypical' customers) and retrieval of similar images from databases. Furthermore, psychology research suggests that human similarity judgments are also non-metric.
For this kind of data we need distance functions that can address the
following issues:
Different sampling rates or different speeds. The time series we obtain are not guaranteed to be the outcome of sampling at fixed time intervals. The sensors collecting the data may fail for some period of time, leading to inconsistent sampling rates. Moreover, two time series moving in exactly the same way, but with one moving at twice the speed of the other, will most probably result in a very large Euclidean distance.
Outliers. Noise might be introduced by an anomaly in the sensor collecting the data, or it can be attributed to human 'failure' (e.g. jerky movement during a tracking process). In this case the Euclidean distance will fail completely and yield a very large distance, even though the difference may be confined to only a few points.
Different lengths. Euclidean distance is defined only for time series of equal length. For series of different lengths we have to decide whether to truncate the longer series, pad the shorter one with zeros, and so on. In general, its use becomes complicated and the notion of distance more vague.
Efficiency. The similarity model has to be sufficiently complex to express the user's notion of similarity, yet simple enough to allow efficient computation of the similarity.
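As a minimal illustration of the outlier and length issues, the sketch below computes the plain Euclidean distance on hypothetical toy series (not data from the text). A single spiked sample dominates the distance, and a series traversed at twice the speed, and therefore sampled half as often, cannot be compared at all without truncation or padding. The function name `euclidean` is illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        # Euclidean distance is undefined for series of different lengths.
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy series: s2 equals s1 except for one outlier at index 5.
s1 = [float(i) for i in range(10)]
s2 = list(s1)
s2[5] = 100.0

print(euclidean(s1, s1))  # 0.0
print(euclidean(s1, s2))  # 95.0, caused entirely by the single outlier

# The same path traversed at twice the speed: 5 samples instead of 10.
fast = s1[::2]
try:
    euclidean(s1, fast)
except ValueError as err:
    print(err)  # series must have equal length
```

Nine of the ten samples agree exactly, yet the distance is as large as if the series were unrelated; this is the behavior the issues above describe.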
To cope with these challenges we use the Longest Common Subsequence (LCSS) model. The LCSS is a variation of the edit distance [30,37]. The basic idea is to match two sequences by allowing them to stretch, without rearranging the order of the elements but allowing some elements to be left unmatched.
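A minimal sketch of an LCSS computation for one-dimensional real-valued series follows. Because real values rarely match exactly, it assumes two thresholds that this section does not define: `eps`, the largest difference at which two values count as a match, and `delta`, the largest allowed gap between the indices of matched elements. The names `lcss` and `lcss_similarity`, and the normalization by the length of the shorter series, are illustrative choices, not necessarily the authors' formulation.

```python
def lcss(a, b, eps, delta):
    """Length of the longest common subsequence of two real-valued series.

    Assumed matching rule (hypothetical): a[i] and b[j] match when
    |a[i] - b[j]| < eps and their indices differ by at most delta.
    """
    n, m = len(a), len(b)
    # dp[i][j] = LCSS length of the prefixes a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) < eps and abs(i - j) <= delta:
                # Match: extend the common subsequence of both shorter prefixes.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # No match: leave a[i-1] or b[j-1] unmatched, whichever is better.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def lcss_similarity(a, b, eps, delta):
    """LCSS normalized to [0, 1] by the length of the shorter series."""
    if not a or not b:
        return 0.0
    return lcss(a, b, eps, delta) / min(len(a), len(b))


# Hypothetical toy series: one outlier in s2, and a half-length "fast" copy.
s1 = [float(i) for i in range(10)]
s2 = list(s1)
s2[5] = 100.0
fast = s1[::2]

print(lcss(s1, s2, eps=0.5, delta=2))  # 9
print(lcss_similarity(s1, fast, eps=0.5, delta=5))  # 1.0
```

Here the outlier costs only one unmatched element, and the half-length series matches completely once `delta` is large enough to absorb the stretching. The table-based recurrence runs in O(nm) time.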
A simple extension of the LCSS model is not sufficient because, for example, it cannot deal with parallel movements. We therefore extend it to address such problems: in our similarity model we consider a set of translations and find the translation that yields the optimal solution to the LCSS problem.
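The translation search can be sketched by brute force, assuming one-dimensional series, a hypothetical finite grid of candidate shifts `translations`, and the same assumed `eps`/`delta` matching rule as a plain LCSS. This only illustrates the idea of picking the translation that maximizes the LCSS. It is not necessarily how the authors search the set of translations, and the names `lcss` and `best_translation` are illustrative.

```python
def lcss(a, b, eps, delta):
    """LCSS length; assumed rule: values within eps, indices within delta."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) < eps and abs(i - j) <= delta:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def best_translation(a, b, translations, eps, delta):
    """Return (c, score) for the shift c of b that maximizes LCSS(a, b + c).

    Brute force over a finite candidate set; ties keep the first candidate.
    """
    best_c, best_score = None, -1
    for c in translations:
        shifted = [y + c for y in b]
        score = lcss(a, shifted, eps, delta)
        if score > best_score:
            best_c, best_score = c, score
    return best_c, best_score


# Hypothetical parallel movement: same shape as a, offset by 10.
a = [float(i) for i in range(10)]
b = [x + 10.0 for x in a]

print(lcss(a, b, eps=0.5, delta=2))  # 0
print(best_translation(a, b, range(-20, 21), eps=0.5, delta=2))  # (-10, 10)
```

Without translation the two parallel series share no matching points; shifting `b` by -10 aligns them exactly. Each candidate costs one full LCSS computation, so this exhaustive search is practical only for small candidate sets.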