A Note on Transformations
2.2.4
When analyzing data it is common to transform variables to examine them on different
scales. Transformation plays a useful role prior to PP as well. The most common
transformation before PP is to “sphere” the data. Sphering the data means to compute
the principal components of the data and to use the resulting variables instead
of the original variables. The major reason to do this is that we are not interested
in covariance structure, which is adequately captured by PCA. Consequently, we
commonly remove the covariance from the data before running PP and search for other
types of structure in the data. In Fig. . the labels of the variables in some of the plots
(PC , PC , ...) reflect that the data were sphered prior to running the PP guided tour.
Sometimes transformations are performed to relieve the data of outliers or skewness.
When these occur in single variables, they can be detected and addressed before
running PP, but PP is useful for detecting multivariate outliers and nonlinear
dependencies in high-dimensional data.
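
As a minimal sketch of sphering, the example below computes principal component
scores with NumPy and rescales each one to unit variance, which is how sphering is
usually completed so that the covariance becomes the identity. The function name
`sphere` and the simulated data are assumptions made for illustration, and the
sketch assumes the sample covariance matrix has full rank.

```python
import numpy as np

def sphere(X):
    """Return principal component scores rescaled to unit variance.

    Assumes the sample covariance of X has full rank (no zero eigenvalues).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder so PC1 has largest variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                    # principal component scores
    return scores / np.sqrt(eigvals)         # rescale each PC to unit variance

# Simulated correlated data (illustrative only).
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.0, 0.0],
              [1.5, 1.0, 0.0],
              [0.5, 0.3, 0.7]])
X = rng.normal(size=(500, 3)) @ A.T

Z = sphere(X)
sphered_cov = np.cov(Z, rowvar=False)        # should be close to the identity
```

After this step the columns of `Z` are uncorrelated with equal variance, so a PP
index applied to `Z` cannot be driven by covariance structure alone.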
A Note on Scaling
2.2.5
Plots of data are generally constructed by scaling the data using the minimum and
maximum data values to fit the data into a plotting space on a computer screen
window or a sheet of paper. Axes are provided so the viewer can convert the points
back into the original scales.
For high-dimensional data each variable is scaled to a uniform scale using the
minimum and maximum, packing the data into a p-dimensional hyperrectangle. These
scaled data are projected into a plotting space. It might be interesting to think about
scaling the data after a projection is computed, but the effect of this approach is a
discontinuity from one projection frame to the next. It would be like watching a movie
where the camera lens constantly zooms and pans.
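
The scale-then-project order described above can be sketched as follows. This is an
illustration under stated assumptions, not the pipeline of any particular tour
software: the helper names `minmax_scale` and `random_projection_basis` are
hypothetical, the data are simulated, and no variable is assumed to be constant.

```python
import numpy as np

def minmax_scale(X):
    """Scale each column to [0, 1] using its own minimum and maximum.

    Assumes no column is constant (otherwise the range would be zero).
    """
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    return (X - lo) / (hi - lo)

def random_projection_basis(p, d, rng):
    """Return a p x d matrix with orthonormal columns (a projection frame)."""
    Q, _ = np.linalg.qr(rng.normal(size=(p, d)))  # reduced QR, Q is p x d
    return Q

# Simulated data with very different variable scales (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4)) * np.array([1.0, 10.0, 100.0, 0.1])

U = minmax_scale(X)                        # data packed into the unit hypercube
basis = random_projection_basis(4, 2, rng)
frame = U @ basis                          # one 2-D projection frame for plotting
```

Because the scaling is fixed once, before any projection, every frame shares the
same scale: each projected coordinate is bounded in absolute value by sqrt(p), so
one set of axis limits serves the whole sequence of frames. Rescaling each frame to
its own minimum and maximum would instead change the scale from frame to frame.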
The PP guided tour operates on the unscaled data values. (It may also be important
to transform the data by standardizing variables or sphering before running PP, as
discussed in the previous paragraph.) The process of scaling data into a plotting space
is called the data pipeline and is discussed in detail in Buja et al. ( ), Sutherland
et al. ( ), and in a different sense, in Wilkinson ( ) and Pastizzo et al. ( ).
Using Tours with Numerical Methods
2.3
Tours are useful when used along with numerical methods for certain data analyses,
such as dimension reduction and supervised and unsupervised classification. We'll
demonstrate with an example from supervised classification.
In supervised classification we seek to find a rule for predicting the class of new
observations based on training a classifier using known classes. There are many
numerical methods that tackle this problem.