Parallel Coordinates: Visualization, Exploration and Classification of High-Dimensional Data - Data Visualization

. Vowel recognition data. The data collection process involves digital sampling of speech with acoustic signal processing, followed by recognition of the phonemes, groups of phonemes and words. The goal here is a speaker-independent rule based on ten variables of eleven vowels that occur in various words spoken (recorded and processed) by fifteen British male and female speakers. Deterding ( ) collected this dataset of vowels, which can be found in the CMU benchmark repository on the WWW. There are entries for training and for testing. Three other types of classifiers were also applied to this dataset: neural networks and k-NN by Robinson and Fallside ( ), and decision trees by Shang and Breiman ( ). For the sake of variety, both versions of our classifier were used, and a somewhat different error test procedure was applied. The results are shown in Table . .
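The two testing modes listed in the results table, "Train & Test" and "Cross-validation", can be sketched as follows. This is only a minimal illustration and not the procedure used to obtain the reported results: the 1-nearest-neighbor classifier, the synthetic two-class data, the split sizes and all function names are assumptions made for the example.

```python
import random


def nearest_neighbor_predict(train, x):
    """Return the label of the training point closest to x (1-NN)."""
    closest = min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    return closest[1]


def train_and_test_error(data, n_train):
    """Fit on the first n_train points, report the error on the rest."""
    train, test = data[:n_train], data[n_train:]
    wrong = sum(nearest_neighbor_predict(train, x) != y for x, y in test)
    return wrong / len(test)


def cross_validation_error(data, k):
    """k-fold cross-validation: each fold is held out once for testing."""
    folds = [data[i::k] for i in range(k)]
    wrong = 0
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        wrong += sum(nearest_neighbor_predict(train, x) != y for x, y in held_out)
    return wrong / len(data)


def make_data(n, seed=0):
    """Hypothetical data: two Gaussian clusters in the plane, labels 0 and 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        center = 0.0 if label == 0 else 3.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


data = make_data(200)
tt_error = train_and_test_error(data, 100)
cv_error = cross_validation_error(data, 10)
print(f"train & test error:       {tt_error:.3f}")
print(f"10-fold cross-validation: {cv_error:.3f}")
```

The two estimates generally differ because cross-validation uses every sample for testing exactly once, whereas a single split tests only on the held-out part. Error rates obtained under different modes are therefore not directly comparable.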

. A neural-pulse dataset. This has interesting and unusual features. There are two classes of neurons, whose outputs to stimuli are to be distinguished. They consist of different pulses measured in a monkey's brain (poor thing!). There are samples with variables (the pulses). This dataset was given to me by a very competent group (that of Prof. Coiffman, CS & Math. Depts. at Yale Univ.), who had been working on it but had been unable to obtain a viable rule with the classification methods they used. Remarkably, with NC, convergence is obtained based on only nine of the parameters. The resulting ordering shows a striking separation. In Fig. . , the first pair of variables x , x is plotted as originally given on the left. On the right, the best pair x , x , as chosen by the classifier's ordering, speaks for itself. By the way, to discover this finding manually would require the construction of a scatterplot matrix with pairs, and then careful inspection and comparison of the individual plots. The implementation provides all the next best sections to complete the rule's visualization. The dataset consists of two “pretzel-like” clusters winding closely in -D, one (the complement in this case) enclosing the other. Note that the classifier can actually describe highly complex regions that carve the cavity shown. One can understand why the separation of clusters by hyperplanes or nearest-neighbor techniques can fail badly on such datasets. The rule has an error of %.
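The remark about hyperplanes can be illustrated with a toy example in which one cluster encloses the other. The sketch below is not the NC classifier and does not use the neural-pulse data: the two concentric rings, the grid search over separating lines and the radius-threshold rule are all assumptions chosen to keep the example small.

```python
import math


def ring(radius, n, label):
    """n points evenly spaced on a circle of the given radius."""
    return [((radius * math.cos(2 * math.pi * i / n),
              radius * math.sin(2 * math.pi * i / n)), label) for i in range(n)]


# Hypothetical data: inner cluster (label 0) enclosed by an outer one (label 1).
data = ring(1.0, 60, 0) + ring(3.0, 60, 1)


def line_error(data, angle, offset):
    """Error of the rule 'label 1 iff the projection onto the line normal
    exceeds offset', taking the better of the two orientations."""
    c, s = math.cos(angle), math.sin(angle)
    wrong = sum((x * c + y * s > offset) != bool(label) for (x, y), label in data)
    return min(wrong, len(data) - wrong) / len(data)


def best_line_error(data, n_angles=36, n_offsets=41):
    """Smallest error over a grid of line directions and offsets in [-4, 4]."""
    angles = [2 * math.pi * a / n_angles for a in range(n_angles)]
    offsets = [-4.0 + 8.0 * o / (n_offsets - 1) for o in range(n_offsets)]
    return min(line_error(data, a, o) for a in angles for o in offsets)


def radius_rule_error(data, threshold=2.0):
    """Error of a rule that carves out the enclosed region:
    label 1 iff the point lies farther than threshold from the origin."""
    wrong = sum((math.hypot(x, y) > threshold) != bool(label) for (x, y), label in data)
    return wrong / len(data)


linear_error = best_line_error(data)
enclosing_error = radius_rule_error(data)
print(f"best separating line on the grid: {linear_error:.3f}")
print(f"rule describing the enclosed region: {enclosing_error:.3f}")
```

No line on the searched grid comes close to separating the rings, because every half-plane that contains the inner cluster also contains a large part of the outer one. A rule that describes the enclosed region itself separates them without error.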

Table . . Summary of classification results for the vowel dataset

Rank  Classifier               Testing mode      Test error rate %
      Nested Cavities (NC)     Cross-validation  .
      CART-DB                  Cross-validation  .
      Nested Cavities (NC)     Train & Test      .
      CART                     Cross-validation  .
      k-NN                     Train & Test      .
      RBF                      Train & Test      .
      Multilayer perceptron    Train & Test      .
      Single-layer perceptron  Train & Test      .
