Providing Case-Based Retrieval as a Decision Support Strategy in Time Dependent Medical Domains - Computational Intelligence in Healthcare: Advanced Methodologies

Information Technology Reference

In-Depth Information

One obvious solution is thus to reduce the time series dimensionality, adopting

a transform that preserves the distance between two time series (or underesti-

mates it: in this case a post-processing step will be required, to filter out the

so-called “false alarms”; the requirement is never to overestimate the distance, so

that no “false dismissals” can exist [17]). Widely used transforms are the Discrete

Fourier Transform (DFT) [2], and the Discrete Wavelet Transform (DWT) [10].

DFT maps time series to the frequency domain. DFT application for dimen-

sionality reduction stems from the observation that, for the majority of real-world

time series, the first (1-3) Fourier coecients carry the most meaningful infor-

mation, and the remaining ones can be safely discarded. Moreover, Parseval's

theorem [38] guarantees that the distance in the frequency domain is the same

as in the time domain, when resorting to any similarity measure that can be ex-

pressed as the Euclidean distance between feature vectors in the feature space.

In particular, resorting only to the first Fourier coecients can underestimate

the real distance, but never overestimates it. The definition of similarity can also

be extended with invariance under a group of transformations, like amplitude

scaling and shift (see [15,41]).

In DFT, comparison is based on low frequency components, where most en-

ergy is presumed to be concentrated on. DFT is good for detecting irregularities

in signals, but it looses the temporal aspects, i.e., when in time the irregularities

occur. An interesting way of detecting these temporal attributes is to transform

the time series into time-frequencies, i.e. to retain both temporal and frequency

aspects of the signals. This can be reached exploiting DWT, since wavelets,

which are basis functions used to represent other functions, are local both in

time and in frequency. DWT can be repeatedly applied to the data: each ap-

plication brings out a higher resolution of the data, while at the same time it

smoothes the remaining data. The output of the DWT consists of the remain-

ing smooth components and of all the accumulated detail components. DWT,

like any orthonormal transform, preserves the Euclidean distance as the DFT

does. Moreover, Chan & Fu [10] have obtained through experiments that the

application of the Haar wavelet outperforms DFT in similarity-based time series

retrieval. The number of wavelet coecients to be kept, although lower than

the original data dimensionality, is higher than in the case of DFT application.

Since R-trees and variants perform well with 5 dimensions at most [7], other in-

dex structures, such as the X-tree [7], and the TV-tree [50], have to be adopted

if resorting to DWT.

A different approach to dimensionality reduction is Piecewise Constant Ap-

proximation (PCA) (see e.g. [22,23]): it consists in dividing a time series into

k segments, and in using their average values as a k -dimensional feature vector

(where obviously k<<n , the original data dimensionality). The best value of k

can also be estimated. PCA is robust to various transformations, such as scaling

and offset translation, and its calculation is claimed to be fast.

The choice of the most cost-effective transformation to apply should be done

on the basis of the application at hand. Examples in the medical domain will be

illustrated in the following subsection.

Computational Intelligence in Healthcare: Advanced Methodologies

Search WWH ::

Custom Search

Home