Preview of Java Data Mining 2.0 - Java Data Mining: Strategy, Standard, and Practice

Java Reference

In-Depth Information

Features

A

B

C

Age

Income

HHSize

OwnHome

Employed

0.8

0.4

0.2

0.8

0.5

0.7

0.5

0.2

0.1

0.3

0.9

0.8

0.9

0.5

0.9

Figure 18-6

Heat map example.

using a separate text analysis tool. Once the terms, or words, are

extracted, each word considered individually is often ambiguous or

indeterminate in explaining what the document is about. For exam-

ple, if we look at the word “hike,” this word can be applied to the

outdoors or interest rates. If we see “hike” in combination with

“mountain,” we may categorize the document with “outdoor

sports.” However, if we see “hike” in combination with “interest,”

we may categorize the document with “interest rates.” These phrases

could be provided directly to a classification mining algorithm. The

feature extraction mining function produces a new attribute, or fea-

ture , as a linear combination of the original attributes, such as

feature-A

0.8* 'hike'

0.7* 'interest'

0.01* 'mountain'

...

An interesting visualization of feature extraction results involves

displaying the coefficients in what is called a heat map . A feature heat

map is a matrix, typically with the attributes represented along rows

and the features as columns. An entry in the matrix is colored relative

to the magnitude of the coefficient (e.g., values closer to 1 are colored

red, values closer to 0 are colored blue, with color gradations in

between). Figure 18-6 illustrates a heat map with three features for

five attributes, where values below 0.3 are shaded black, values

between 0.3 and 0.6 are shaded medium gray, and values above 0.6

are shaded light gray. This provides a quick visual assessment of

feature-attribute relationships. Columns may be sorted by value to

group a feature's related attributes more effectively.

0.2* 'word-x'

18.5

Statistics

Even before data mining techniques were available, a first step

toward data analysis was gathering statistics from the data. Data

analysis could compute the average, standard deviation, or variance,

Java Data Mining: Strategy, Standard, and Practice

Search WWH ::

Custom Search

Home