Grand Tours, Projection Pursuit Guided Tours, and Manual Controls - Data Visualization

2.2.4 A Note on Transformations

When analyzing data it is common to transform variables to examine them on different scales. Transformation plays a useful role prior to PP as well. The most common transformation before PP is to “sphere” the data. Sphering the data means to compute the principal components of the data and to use the resulting variables instead of the original variables. The major reason to do this is that we are not interested in covariance structure, which is adequately captured by PCA. Consequently, we commonly remove the covariance from the data before running PP and search for other types of structure in the data. In Fig. . the labels of the variables in some of the plots, PC , PC , ..., reflect that the data were sphered prior to running the PP guided tour. Sometimes transformations are performed to relieve the data of outliers or skewness. When these occur in single variables, they can be detected and addressed before running PP, but PP is useful for detecting multivariate outliers and nonlinear dependencies in high-dimensional data.
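
The sphering step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes NumPy, assumes the sample covariance matrix is full rank, and assumes that each principal component is also rescaled to unit variance (the usual meaning of sphering). The function name `sphere` and the simulated data are hypothetical.

```python
import numpy as np


def sphere(X):
    """Sphere the rows of X via principal components (illustrative sketch).

    Assumption: the sample covariance matrix of X is full rank.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)

    # eigh returns eigenvalues in ascending order; reorder so PC1 comes first.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    scores = centered @ eigvecs        # principal component scores
    return scores / np.sqrt(eigvals)   # rescale each PC to unit variance


# Hypothetical correlated data: 500 observations of 3 variables.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.5, 1.0, 0.0],
                   [0.5, 0.3, 0.2]])
X = rng.normal(size=(500, 3)) @ mixing.T

Z = sphere(X)
sphered_cov = np.cov(Z, rowvar=False)  # identity matrix up to rounding error
```

After this step the covariance of `Z` is the identity matrix, so a PP index applied to `Z` cannot be driven by covariance and has to respond to other kinds of structure.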

2.2.5 A Note on Scaling

Plots of data are generally constructed by scaling the data using the minimum and maximum data values to fit the data into a plotting space, on a computer screen window, or sheet of paper. Axes are provided so the viewer can convert the points into the original scales.

For high-dimensional data each variable is scaled to a uniform scale using the minimum and maximum, packing the data into a p-dimensional hyperrectangle. These scaled data are projected into a plotting space. It might be interesting to think about scaling the data after a projection is computed, but the effect of this approach is a discontinuity from one projection frame to the next. It would be like watching a movie where the camera lens constantly zooms and pans.
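
A minimal sketch of the two orderings, assuming NumPy and a hypothetical data set with no constant columns. The helper names `scale_to_unit_cube` and `random_frame` are illustrative and do not come from any tour software. Scaling once before projecting gives every frame the same fixed plotting window, whereas rescaling after each projection would use a range that changes with the frame.

```python
import numpy as np


def scale_to_unit_cube(X):
    """Min-max scale each variable to [0, 1] (assumes no constant columns)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)


def random_frame(p, d, rng):
    """Return a p x d matrix with orthonormal columns (one projection frame)."""
    q, _ = np.linalg.qr(rng.normal(size=(p, d)))
    return q


# Hypothetical data: four variables on very different scales.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4)) * np.array([1.0, 10.0, 100.0, 1000.0])

U = scale_to_unit_cube(X)
frames = [random_frame(4, 2, rng) for _ in range(3)]
projections = [U @ F for F in frames]

# Scaling before projection: a point in the unit cube has length at most
# sqrt(p), and an orthonormal projection cannot lengthen it, so one window
# of half-width sqrt(p) holds every frame.
bound = np.sqrt(U.shape[1])

# Scaling after projection would instead divide by these per-frame ranges,
# which change from frame to frame (the "zooming camera" effect).
per_frame_ranges = [P.max(axis=0) - P.min(axis=0) for P in projections]
```

Comparing the entries of `per_frame_ranges` shows how much the zoom factor would vary between frames if the scaling were applied after projection.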

The PP guided tour operates on the unscaled data values. (It may also be important to transform the data by standardizing variables or sphering before running PP, as discussed in the previous paragraph.) The process of scaling data into a plotting space is called the data pipeline and is discussed in detail in Buja et al. ( ), Sutherland et al. ( ), and in a different sense, in Wilkinson ( ) and Pastizzo et al. ( ).

2.3 Using Tours with Numerical Methods

Tours are useful when used along with numerical methods for certain data analyses, such as dimension reduction and supervised and unsupervised classification. We'll demonstrate with an example from supervised classification.

In supervised classification we seek to find a rule for predicting the class of new observations based on training a classifier using known classes. There are many numerical methods that tackle this problem.
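
To make "training a rule on known classes" concrete, here is a minimal nearest-centroid classifier, assuming NumPy. It is only one simple illustration of such a numerical method and is not the classifier used in the text's example. The function names and the toy data are hypothetical.

```python
import numpy as np


def fit_centroids(X, y):
    """Training step: compute one mean vector (centroid) per known class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(X_new, classes, centroids):
    """Rule: assign each new observation to the class of its nearest centroid."""
    distances = np.linalg.norm(
        X_new[:, None, :] - centroids[None, :, :], axis=2
    )
    return classes[distances.argmin(axis=1)]


# Hypothetical training data with known classes "a" and "b".
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]])
y_train = np.array(["a", "a", "b", "b"])

classes, centroids = fit_centroids(X_train, y_train)

# New observations whose class is to be predicted.
X_new = np.array([[0.1, 0.0], [1.1, 0.9]])
predicted = predict(X_new, classes, centroids)
```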
