quantitative multivariate observations. The method is conceptually founded on the singular value decomposition approach proposed by Eckart and Young (1936) for approximating a matrix with another one of lower rank.
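To make the Eckart and Young idea concrete, the following sketch (hypothetical NumPy code, not part of the original text) builds the best rank-q least-squares approximation of a matrix by truncating its singular value decomposition:

    import numpy as np

    def low_rank_approximation(X, q):
        """Best rank-q approximation of X in the least-squares sense
        (Eckart-Young): keep only the q largest singular values."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return (U[:, :q] * s[:q]) @ Vt[:q, :]

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10, 4))
    X2 = low_rank_approximation(X, 2)
    print(np.linalg.matrix_rank(X2))      # 2
    print(np.linalg.norm(X - X2, "fro"))  # size of the approximation error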
Let us denote by X the generic n × p data matrix, where the general term x_ij (i = 1, …, n; j = 1, …, p) indicates the value assumed by the ith statistical unit for the jth variable. From a geometric point of view, rows and columns of X represent points in the R^p and in the R^n space, respectively. Without loss of generality, we assume that the columns of the data matrix X have been standardized into a matrix Y having the same order as X. PCA allows one to find a q-dimensional subspace (q ≤ p) such that the distances among the n row vectors are approximated in this subspace in a satisfactory manner.
PCA problems allow for more than one formulation. The following section offers only a brief overview of the geometric formulation of PCA and mainly reflects the French approach to data analysis. Readers interested in the PCA formulation problem are referred to, e.g., Härdle and Simar (2003); Jolliffe (2002); Lebart et al. (1984, 1995) and Mardia et al. (1979).
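As a concrete illustration of the standardization step described above, the following sketch (hypothetical NumPy code; the helper name standardize is our own) centers each column of a raw data matrix X and scales it to unit standard deviation, producing the matrix Y of the same order:

    import numpy as np

    def standardize(X):
        """Column-wise standardization: center each variable and scale it
        to unit standard deviation, so Y has the same n x p order as X."""
        return (X - X.mean(axis=0)) / X.std(axis=0)

    rng = np.random.default_rng(1)
    X = rng.normal(loc=5.0, scale=3.0, size=(100, 4))  # n = 100 units, p = 4 variables
    Y = standardize(X)
    print(Y.mean(axis=0).round(6))  # ~0 in every column
    print(Y.std(axis=0).round(6))   # 1 in every column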
4.3.1 Principal Component Analysis
PCA was originally proposed by Hotelling (1933) as a method for determining the major axes of an ellipsoid derived from a multivariate normal distribution. Although a common interpretation of PCA as one specific type of factor analysis is widespread, data analysts have a different view of the method. They use PCA as a technique for describing a dataset without imposing any assumption about its distribution or starting from an underlying statistical model (Benzécri, 1973; Lebart et al., 1984, 1995). In this framework the point of view is geometrically oriented, and PCA aims to identify a subspace through the optimization of a given algebraic criterion. Among the potential criteria useful for fitting a set of n points to a subspace, classical least squares is undoubtedly the most widespread.
The problem, in a nutshell, consists in determining the unknown p × p matrix U = (u_1, u_2, …, u_p) that indicates the maximum-variance directions. The vector ψ_j = Y u_j represents the coordinates of the n row points over the axis u_j (j = 1, …, p). The unknown matrix U is determined by solving the following eigenanalysis problem:

    Y′Y = U Λ U′ .

Notice that the previous equation can alternatively be expressed as

    Y′Y U = U Λ ,

where Λ is a square diagonal matrix of order p having as general element λ_j, and U must satisfy the constraint U′U = I. It is straightforward to see that λ_j and u_j are, respectively, the generic eigenvalue and eigenvector of the square symmetric matrix Y′Y. Notice that λ_1, λ_2, …, λ_p are ranked in decreasing order and U defines an orthonormal basis.
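A minimal numerical sketch of this eigenanalysis (hypothetical NumPy code under the setup above; the variable names are our own) computes U and Λ from Y′Y, ranks the eigenvalues in decreasing order, and derives the coordinates ψ_j = Y u_j:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 4))
    Y = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized data matrix

    # Eigenanalysis of the square symmetric matrix Y'Y
    lam, U = np.linalg.eigh(Y.T @ Y)          # eigh handles symmetric matrices

    # eigh returns eigenvalues in increasing order; rank them decreasingly
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]

    print(np.allclose(U.T @ U, np.eye(4)))               # True: U'U = I
    print(np.allclose(U @ np.diag(lam) @ U.T, Y.T @ Y))  # True: Y'Y = U Lambda U'

    # Coordinates of the n row points over the axes: psi_j = Y u_j
    Psi = Y @ U
    print(Psi.shape)  # (100, 4)

Retaining only the first q columns of U then gives the q-dimensional subspace that approximates the distances among the n row points in the least-squares sense.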