Equivalently, a matrix multiplication can be applied to the input data in order to
obtain the wavelet coefficients, where the matrix used depends on the given DWT. The
matrix must be orthonormal, meaning that the columns are unit vectors and are mutually
orthogonal, so that the matrix inverse is just its transpose. Although we do not have
room to discuss it here, this property allows the reconstruction of the data from the
smooth and smooth-difference data sets. By factoring the matrix used into a product of
a few sparse matrices, the resulting “fast DWT” algorithm has a complexity of O(n) for
an input vector of length n.
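To make this concrete, the sketch below applies an orthonormal matrix for a single level of the Haar DWT to a length-4 vector; the NumPy code, the matrix name H, and the sample values are illustrative choices, not part of any particular DWT implementation.

```python
import numpy as np

# Orthonormal matrix for a single level of the length-4 Haar DWT
# (illustrative; the exact matrix depends on the chosen DWT).
s = 1.0 / np.sqrt(2.0)
H = np.array([[s,  s,  0,  0],   # "smooth" (averaging) rows
              [0,  0,  s,  s],
              [s, -s,  0,  0],   # "smooth-difference" (detail) rows
              [0,  0,  s, -s]])

x = np.array([2.0, 4.0, 6.0, 8.0])   # hypothetical input vector
coeffs = H @ x                        # wavelet coefficients

# Because H is orthonormal, its inverse is simply its transpose, so the
# original data can be reconstructed exactly from the coefficients.
assert np.allclose(H.T @ coeffs, x)
```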
Wavelet transforms can be applied to multidimensional data such as a data cube. This
is done by first applying the transform to the first dimension, then to the second, and so
on. The computational complexity involved is linear with respect to the number of cells
in the cube. Wavelet transforms give good results on sparse or skewed data and on data
with ordered attributes. Lossy compression by wavelets is reportedly better than JPEG
compression, the current commercial standard. Wavelet transforms have many real-
world applications, including the compression of fingerprint images, computer vision,
analysis of time-series data, and data cleaning.
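The dimension-by-dimension procedure can be sketched on a small two-dimensional array standing in for one slice of a data cube; the code below repeats the Haar matrix H from the previous sketch so that it stands alone, and the array contents are hypothetical.

```python
import numpy as np

s = 1.0 / np.sqrt(2.0)
H = np.array([[s, s, 0, 0], [0, 0, s, s], [s, -s, 0, 0], [0, 0, s, -s]])

cube = np.arange(16.0).reshape(4, 4)   # hypothetical 4x4 slice of a data cube

# Apply the 1-D transform to the first dimension, then to the second.
step1 = np.apply_along_axis(lambda v: H @ v, 0, cube)
step2 = np.apply_along_axis(lambda v: H @ v, 1, step1)

# Each cell is touched a constant number of times per dimension, so the
# cost is linear in the number of cells in the cube.
back = np.apply_along_axis(lambda v: H.T @ v, 1, step2)
back = np.apply_along_axis(lambda v: H.T @ v, 0, back)
assert np.allclose(back, cube)
```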
3.4.3 Principal Components Analysis
In this subsection we provide an intuitive introduction to principal components
analysis as a method of dimensionality reduction. A detailed theoretical explanation is
beyond the scope of this topic. For additional references, please see the bibliographic
notes (Section 3.8) at the end of this chapter.
Suppose that the data to be reduced consist of tuples or data vectors described
by n attributes or dimensions. Principal components analysis (PCA; also called the
Karhunen-Loeve, or K-L, method) searches for k n-dimensional orthogonal vectors that
can best be used to represent the data, where k ≤ n. The original data are thus projected
onto a much smaller space, resulting in dimensionality reduction. Unlike attribute subset
selection (Section 3.4.4), which reduces the attribute set size by retaining a subset of
the initial set of attributes, PCA “combines” the essence of attributes by creating an
alternative, smaller set of variables. The initial data can then be projected onto this
smaller set. PCA often reveals relationships that were not previously suspected and
thereby allows interpretations that would not ordinarily result.
The basic procedure is as follows:
1. The input data are normalized, so that each attribute falls within the same range. This
step helps ensure that attributes with large domains will not dominate attributes with
smaller domains.
2. PCA computes k orthonormal vectors that provide a basis for the normalized input
data. These are unit vectors that each point in a direction perpendicular to the others.
These vectors are referred to as the principal components . The input data are a linear
combination of the principal components.
3. The principal components are sorted in order of decreasing “significance” or
strength. The principal components essentially serve as a new set of axes for the data,
providing important information about variance.
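A minimal sketch of these steps, assuming standardization of each attribute followed by an eigendecomposition of the covariance matrix (the sample data, the choice of k, and the NumPy routines are illustrative assumptions), is shown below.

```python
import numpy as np

# Hypothetical data: 6 tuples described by n = 3 attributes.
X = np.array([[2.5, 2.4, 0.5],
              [0.5, 0.7, 1.1],
              [2.2, 2.9, 0.4],
              [1.9, 2.2, 0.8],
              [3.1, 3.0, 0.2],
              [2.3, 2.7, 0.6]])

# Step 1: normalize so that every attribute falls within a comparable range.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: the eigenvectors of the covariance matrix are orthonormal vectors
# that form a basis for the normalized data (the principal components).
cov = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 3: sort components by decreasing "significance" (eigenvalue = variance
# captured along that axis) and keep the k strongest ones.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
k = 2                                # number of components to retain
projected = Z @ eigvecs[:, :k]       # data expressed in the reduced space
print(projected.shape)               # (6, 2)
```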