Parallel Coordinates: Visualization, Exploration and Classification of High-Dimensional Data - Data Visualization


Figure . . The manufacturing dataset after “cleanup,” which left about variables

measurements occur frequently, it may be worth adding an automated feature to the
software that detects and identifies the suspect variables.

This brief exposure is just an indication that large (in terms of dimensions, i.e.,
number of variables) datasets can still be usefully explored in ∥-coords.

A different example of EDA on a process control dataset is given in Inselberg
( ), where compound queries turned out to be very useful. This reminds us to
add, to the list of exploration guidelines, arguably the most important one:
test the assumptions, especially the “I am really sure of”s.

14.3 Classification

Although it is fun to undertake this kind of exploration, the level of skill and patience
required tends to discourage some users. It is not surprising then that the most persistent
requests and admonitions have been for tools which, at least partially, automate
the knowledge discovery process (Inselberg and Avidan, ).

Classification is a basic task in data analysis and pattern recognition, and an algorithm
that performs it is named a Classifier (Quinlan, ; Fayad et al., ; Mitchell, ).
The input is a dataset P and a designated subset S. The output is a characterization,
a set of conditions or rules, to distinguish elements of S from all other members
of P, the “global” dataset. The output may also be that there is insufficient information
to provide the desired distinction.
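The input/output contract just described can be illustrated with a deliberately simple sketch. It is not the classifier discussed in this chapter: it assumes, purely for illustration, that the only admissible rule is one closed interval per variable (the bounding box of S), and the names bounding_rule, satisfies and classify are hypothetical. When that rule also captures points outside S, the sketch returns None, standing in for the "insufficient information" outcome.

```python
from typing import Optional, Sequence

Point = Sequence[float]
Rule = list[tuple[float, float]]  # one (low, high) interval per variable


def bounding_rule(S: Sequence[Point]) -> Rule:
    """Smallest axis-aligned box containing every point of S."""
    return [(min(column), max(column)) for column in zip(*S)]


def satisfies(point: Point, rule: Rule) -> bool:
    """True if every coordinate lies inside its interval."""
    return all(low <= x <= high for x, (low, high) in zip(point, rule))


def classify(P: Sequence[Point], S: Sequence[Point]) -> Optional[Rule]:
    """Return interval rules separating S from the rest of P, or None.

    Assumption: only one interval per variable is tried, so None means
    this simple rule family cannot make the distinction, not that no
    rule exists.
    """
    if not S:
        return None
    rule = bounding_rule(S)
    members = {tuple(p) for p in S}
    others = [p for p in P if tuple(p) not in members]
    if any(satisfies(p, rule) for p in others):
        return None  # the box around S also captures points outside S
    return rule


# Toy data: four points in two variables, S is the first two points.
P = [(1.0, 5.0), (2.0, 6.0), (8.0, 1.0), (9.0, 2.0)]
S = [(1.0, 5.0), (2.0, 6.0)]
rule = classify(P, S)
```

Here the returned rule is one interval per variable. A case such as P = [(1, 1), (2, 2), (3, 3)] with S = [(1, 1), (3, 3)] yields None, because the point (2, 2) falls inside the box around S.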

With parallel coordinates, a dataset P with N variables is transformed into a set

of points in N-dimensional space. In this setting, the designated subset S can be de-
