Noise Canceling Headset”) with support and confidence.
Since JDM 1.0, users have been able to filter the rules according to
various criteria, as discussed in Chapter 7. However, JDM 1.0 lacked an
interface for applying the model.
In JDM 2.0, association apply can optionally use a rule filter to
constrain the set of rules considered. The apply dataset consists of
cases, each of which contains items to be matched against the
antecedents of the filtered rules. Antecedent matching can be based on
exact match, where the case items must match the rule antecedent
exactly; subset, where the case items can be a subset of the items in
the rule antecedent; or superset, where the case items can be a
superset of the items in the rule antecedent.
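The three matching modes can be illustrated with a small Java sketch. It does not use the JDM API: the class `AntecedentMatcher`, the enum `MatchType`, and the method `matches` are hypothetical names chosen for illustration, items are assumed to be plain strings, and the sample items are made up.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch (not the JDM API) of antecedent matching in association apply. */
public class AntecedentMatcher {

    /** The three matching modes described in the text. */
    public enum MatchType { EXACT, SUBSET, SUPERSET }

    /**
     * Returns true if the items in a case match a rule antecedent under the given mode.
     * EXACT:    the case items equal the antecedent items.
     * SUBSET:   every case item appears in the antecedent.
     * SUPERSET: every antecedent item appears in the case.
     */
    public static boolean matches(Set<String> caseItems, Set<String> antecedent, MatchType type) {
        switch (type) {
            case EXACT:
                return caseItems.equals(antecedent);
            case SUBSET:
                return antecedent.containsAll(caseItems);
            case SUPERSET:
                return caseItems.containsAll(antecedent);
            default:
                throw new IllegalArgumentException("Unknown match type: " + type);
        }
    }

    public static void main(String[] args) {
        // Hypothetical rule antecedent and cases.
        Set<String> antecedent = Set.of("Cell Phone", "Car Charger");
        List<Set<String>> cases = List.of(
                Set.of("Cell Phone", "Car Charger"),                // same items
                Set.of("Cell Phone"),                               // fewer items
                Set.of("Cell Phone", "Car Charger", "Phone Case")); // more items
        for (Set<String> c : cases) {
            for (MatchType t : MatchType.values()) {
                System.out.println(c + " " + t + ": " + matches(c, antecedent, t));
            }
        }
    }
}
```

Using `Set` rather than `List` reflects the assumption that item order and duplicates within a case do not matter for matching.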
The result of association apply is one or more consequent items,
depending on the selection criteria specified. These criteria can
select the single top item with the highest support, confidence, or
lift, or the top n items.
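Selecting consequents from the rules that matched a case can be sketched as ranking those rules by one measure and keeping the first n distinct consequent items. This is an illustrative assumption about the selection logic, not the JDM API: `ConsequentSelector`, `RuleResult`, `Measure`, and `topN` are hypothetical names, and the statistics in `main` are made-up values.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch (not the JDM API): choosing consequent items from matched rules. */
public class ConsequentSelector {

    /** A rule that matched a case, reduced to its consequent item and its statistics. */
    public static final class RuleResult {
        final String consequent;
        final double support;
        final double confidence;
        final double lift;

        public RuleResult(String consequent, double support, double confidence, double lift) {
            this.consequent = consequent;
            this.support = support;
            this.confidence = confidence;
            this.lift = lift;
        }
    }

    /** Ranking measures named in the text. */
    public enum Measure { SUPPORT, CONFIDENCE, LIFT }

    private static double score(RuleResult r, Measure m) {
        switch (m) {
            case SUPPORT:    return r.support;
            case CONFIDENCE: return r.confidence;
            default:         return r.lift;
        }
    }

    /**
     * Returns up to n distinct consequent items, taken from the matched rules in
     * descending order of the chosen measure; n = 1 gives the single top item.
     */
    public static List<String> topN(List<RuleResult> matched, Measure m, int n) {
        return matched.stream()
                .sorted(Comparator.comparingDouble((RuleResult r) -> score(r, m)).reversed())
                .map(r -> r.consequent)
                .distinct() // keeps the first (highest-ranked) occurrence of each item
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<RuleResult> matched = List.of(
                new RuleResult("Headset", 0.10, 0.60, 1.5),
                new RuleResult("Charger", 0.20, 0.40, 1.2),
                new RuleResult("Phone Case", 0.05, 0.80, 2.0));
        System.out.println(topN(matched, Measure.SUPPORT, 1));    // [Charger]
        System.out.println(topN(matched, Measure.CONFIDENCE, 2)); // [Phone Case, Headset]
    }
}
```

The `distinct()` step is a design choice for this sketch: when several rules predict the same item, only its best-ranked occurrence counts toward n.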
18.4 Feature Extraction
JDM 2.0 introduces the mining function feature extraction as an
attribute reduction technique. In contrast to attribute importance,
which ranks each individual attribute so that the top n can be
selected or the bottom n eliminated, feature extraction actually
creates new features, or attributes, as linear combinations of existing
attributes. This smaller set of attributes can not only yield a deeper
understanding of the original attributes but also improve model
quality by presenting algorithms with fewer attributes that have
richer content. In a sense, feature extraction projects a dataset with
high dimensionality (many attributes) onto a smaller number of
dimensions (e.g., two or three dimensions, which enable effective
visualization of complex data).
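The idea of new features as linear combinations of existing attributes can be shown with a small Java sketch. The coefficient matrix is assumed to be given; in practice it would be produced by a feature extraction algorithm, while here the values are made up. `FeatureProjection` and `project` are hypothetical names and not part of the JDM API.

```java
import java.util.Arrays;

/** Hypothetical sketch: new features as linear combinations of existing attributes. */
public class FeatureProjection {

    /**
     * Maps each case (a row of data) to k new feature values, where k = weights.length.
     * weights[j][i] is the coefficient of original attribute i in new feature j,
     * so feature j of case x is the sum over i of weights[j][i] * x[i].
     */
    public static double[][] project(double[][] data, double[][] weights) {
        double[][] features = new double[data.length][weights.length];
        for (int c = 0; c < data.length; c++) {
            for (int j = 0; j < weights.length; j++) {
                if (weights[j].length != data[c].length) {
                    throw new IllegalArgumentException("attribute count mismatch in case " + c);
                }
                double sum = 0.0;
                for (int i = 0; i < data[c].length; i++) {
                    sum += weights[j][i] * data[c][i];
                }
                features[c][j] = sum;
            }
        }
        return features;
    }

    public static void main(String[] args) {
        // Made-up coefficients: four original attributes reduced to two features.
        double[][] weights = {
                {0.5, 0.5, 0.0, 0.0},   // feature 0 blends attributes 0 and 1
                {0.0, 0.0, 0.5, 0.5}    // feature 1 blends attributes 2 and 3
        };
        double[][] data = {
                {1, 3, 0, 0},
                {0, 0, 2, 4}
        };
        for (double[] row : project(data, weights)) {
            System.out.println(Arrays.toString(row));
        }
        // prints [2.0, 0.0] then [0.0, 3.0]
    }
}
```

Each case shrinks from four values to two, which is the dimensionality reduction the text describes; with two features, the result could be plotted directly.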
Feature extraction is particularly useful in domains in which there
are many attributes, perhaps hundreds or thousands, and each
attribute on its own has weak, even ambiguous, predictive power. When
taken in combination, however, these weak predictor attributes produce
meaningful patterns, topics, or themes. This occurs in text mining as
well as with life sciences data such as genomics. Other areas of
application include data compression, data decomposition, and
projection and pattern recognition.
Consider an example from text mining. To classify documents into
categories, we first parse the text to extract important terms or words