Parallel Coordinates: Visualization, Exploration and Classiication of High-Dimensional Data - Data Visualization

Graphics Reference

In-Depth Information

scribed by ahypersurfacewhichonly includesthe points of S. In practical situations,

the strict enclosure requirement is dropped and some points of S may be omitted

(“false negatives”), while some points of P

S are allowed (“false positives”) in the

hypersurface. he description of such a hypersurface is equivalent to the rule for

identifying, within some acceptable error, the elements of S. Casting the problem

in a geometrical setting allows us to visualize how to approach the classification al-

gorithm. his entails:

. using an e cient “wrapping” (a convex-hull approximation) algorithm to en-

closethepointsof S inahypersurface S containing S,andingeneralsomepoints

of P

−

S too; so S

⊂

S ,

. the points in

S are isolated and the wrapping algorithm is applied to

enclose them, and usually also some points of S , producing a new hypersurface

S with S

(

−

)

. the points in S not included in S

(

−

)

S are then marked for input to the wrapping

algorithm, andanewhypersurface S isproducedcontaining thesepointsaswell

as some other points in P

−

S ,

. the process is repeated, alternately producing upper and lower containment

bounds for S; termination occurs when an error criterion (which can be user-

specified) is satisfied or when convergence is not achieved.

−(

−

)

,resultinginS

⊂(

−

)

It can and does happen that the process does not converge when P does not contain

su cient information to characterize S.ItmayalsohappenthatS is so “porous”

(sponge-like) that an inordinate number of iterations are required. On convergence,

say at step n,thedescriptionofS is provided as:

(

−

)(

−

)ċċċ(

S n −

−

S n

)

( . )

his is the terminating expression resulting from the algorithm that we call Nested

Cavities (abbr. NC).

he user can select a subset of the variables available and restrict the rule genera-

tion to these variables. In certain applications, for example as in process control, not

all of the variables can be controlled, and hence it is useful to have a rule that in-

volves only the accessible (i.e., controllable) variables. An important fringe benefit is

the useful ordering that emerges from dimensionality selection, which is completely

dataset-specific. With the algorithm being display-independent, there is no inherent

limitation on the size and number of variables in the dataset. Summarizing for NC,

an approximate convex-hull boundary is obtained for each cavity,

utilizingpropertiesoftherepresentationofmultidimensionalobjectsin

-coords,

a very low polynomial worst case complexity of O

(

)

is obtained for vari-

able number N and dataset size

; it is worth contrasting this with the oten

unknown, unstated, or very high (even exponential) complexity of other classi-

fiers,

an intriguing prospect, due to the low complexity, is that the rule can be derived

in near real-time, meaning that the classifier could be adapted to changing con-

ditions,

Data Visualization

Search WWH ::

Custom Search

Home