Java Reference
In-Depth Information
Features
A
B
C
Age
Income
HHSize
OwnHome
Employed
0.8
0.4
0.2
0.8
0.5
0.7
0.5
0.2
0.1
0.3
0.9
0.8
0.9
0.5
0.9
Figure 18-6
Heat map example.
using a separate text analysis tool. Once the terms, or words, are
extracted, each word considered individually is often ambiguous or
indeterminate in explaining what the document is about. For exam-
ple, if we look at the word “hike,” this word can be applied to the
outdoors or interest rates. If we see “hike” in combination with
“mountain,” we may categorize the document with “outdoor
sports.” However, if we see “hike” in combination with “interest,”
we may categorize the document with “interest rates.” These phrases
could be provided directly to a classification mining algorithm. The
feature extraction mining function produces a new attribute, or fea-
ture , as a linear combination of the original attributes, such as
feature-A
0.8* 'hike'
0.7* 'interest'
0.01* 'mountain'
...
An interesting visualization of feature extraction results involves
displaying the coefficients in what is called a heat map . A feature heat
map is a matrix, typically with the attributes represented along rows
and the features as columns. An entry in the matrix is colored relative
to the magnitude of the coefficient (e.g., values closer to 1 are colored
red, values closer to 0 are colored blue, with color gradations in
between). Figure 18-6 illustrates a heat map with three features for
five attributes, where values below 0.3 are shaded black, values
between 0.3 and 0.6 are shaded medium gray, and values above 0.6
are shaded light gray. This provides a quick visual assessment of
feature-attribute relationships. Columns may be sorted by value to
group a feature's related attributes more effectively.
0.2* 'word-x'
18.5
Statistics
Even before data mining techniques were available, a first step
toward data analysis was gathering statistics from the data. Data
analysis could compute the average, standard deviation, or variance,
Search WWH ::




Custom Search