Figure . . The manufacturing dataset after “cleanup,” which left about variables
measurements occur frequently, it may be worth adding an automated feature to the
software that detects and identifies the suspect variables.
This brief exposure is just an indication that large (in terms of dimensions, i.e.,
number of variables) datasets can still be usefully explored in
∥-coords.
A different example of EDA on a process control dataset is given in Inselberg
( ), where compound queries turned out to be very useful. This reminds us to
add, to the list of exploration guidelines, arguably the most important one:
test the assumptions, especially the “I am really sure of”s.
14.3 Classification
Although it is fun to undertake this kind of exploration, the level of skill and patience
required tends to discourage some users. It is not surprising then that the most persistent
requests and admonitions have been for tools which, at least partially, automate
the knowledge discovery process (Inselberg and Avidan, ).
Classification is a basic task in data analysis and pattern recognition, and an algorithm
that performs it is called a classifier (Quinlan, ; Fayad et al., ; Mitchell, ).
The input is a dataset P and a designated subset S. The output is a characterization,
a set of conditions or rules, to distinguish elements of S from all other members
of P, the “global” dataset. The output may also be that there is insufficient information
to provide the desired distinction.
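The input/output contract just described can be illustrated with a deliberately simple sketch. It is not the classifier developed in this chapter: the rule family (one closed interval per variable), the function name classify, and the sample data are all assumptions made for illustration. The function returns a rule satisfied by every element of S and by no other member of P, or None when this rule family cannot provide the distinction.

```python
from typing import Optional

Point = tuple[float, ...]
Rule = list[tuple[float, float]]  # one (low, high) interval per variable


def classify(P: list[Point], S: list[Point]) -> Optional[Rule]:
    """Hypothetical interval-rule classifier (illustration only).

    Returns per-variable intervals that hold for every element of S and
    for no other member of P, or None if such intervals do not exist.
    """
    if not S:
        return None
    n = len(S[0])
    # Tightest interval containing every element of S on each variable.
    rule = [(min(p[i] for p in S), max(p[i] for p in S)) for i in range(n)]
    members = set(S)
    for p in P:
        if p in members:
            continue
        # A non-member inside every interval defeats the rule.
        if all(lo <= x <= hi for x, (lo, hi) in zip(p, rule)):
            return None
    return rule


P = [(1.0, 5.0), (2.0, 6.0), (8.0, 1.0), (9.0, 2.0)]
S = [(1.0, 5.0), (2.0, 6.0)]
print(classify(P, S))  # [(1.0, 2.0), (5.0, 6.0)]
```

Here None plays the role of the “insufficient information” outcome, but only relative to this restricted rule family; a richer family of rules might still separate S from the rest of P.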
With parallel coordinates, a dataset P with N variables is transformed into a set
of points in N-dimensional space. In this setting, the designated subset S can be de-