agglomerative ones start by considering each example as a cluster and iteratively
merge clusters until a criterion is satisfied. Partitioning-based clustering, with
k-Means as its most representative algorithm, starts with a fixed number k of
clusters and iteratively adds or removes examples to and from them until no further
improvement is achieved, based on the minimization of an intra- and/or inter-cluster
distance measure. As usual when distance measures are involved, numeric data is
preferable, together with the absence of missing values, noise and outliers. Other
well-known examples of clustering algorithms are COBWEB and Self-Organizing Maps.
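
The partitioning loop described above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions, not a reference implementation: the
function name kmeans, the squared Euclidean distance, the random choice of initial
centroids and the max_iter and seed parameters are all choices made for this sketch.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative k-Means on tuples of numbers; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k examples drawn at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each example joins the cluster with the nearest centroid
        # (squared Euclidean distance, an assumption of this sketch).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # no example moved, so no further improvement is possible
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Two visibly separated groups of numeric examples (made-up data).
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(data, k=2)
```

Because every step relies on a distance computation, the sketch only works on
complete numeric vectors, which is why missing values and outliers are a concern.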
• Association Rules: a set of techniques that aim to find association relationships
in the data. The typical application of these algorithms is the analysis of retail
transaction data [1]. For example, the analysis would aim to find the likelihood
that when a customer buys product X, she would also buy product Y. Association rule
algorithms can also be formulated to look for sequential patterns. Because the data
needed for association analysis is usually transaction data, the data volumes are
very large. Also, transactions are expressed by categorical values, so the data must
be discretized. Data transformation and reduction are often needed to perform
high-quality analysis in this DM problem. The Apriori algorithm is the most
emblematic technique for addressing this problem.
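
As a rough illustration of the "buys X, also buys Y" likelihood, the sketch below
estimates it from a list of transactions. The names support and confidence are the
usual terms for these quantities but are not used in the passage above, and the
basket contents are made up for the example. This is not the Apriori algorithm
itself, only the measure that such algorithms evaluate for a candidate rule.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated likelihood of the consequent given the antecedent (rule X -> Y)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Hypothetical retail baskets; each transaction is a set of categorical items.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_strength = confidence(baskets, {"bread"}, {"milk"})  # 2 of 3 bread baskets
```

Items are treated as categorical labels, which is why numeric attributes have to be
discretized before this kind of counting is possible.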
1.3 Supervised Learning
In the DM community, prediction methods are commonly referred to as supervised
learning. Supervised methods attempt to discover the relationship between input
attributes (sometimes called variables or features) and a target attribute (sometimes
referred to as the class). The relationship sought is represented in a structure
called a model. Generally, a model describes and explains experiences hidden in the
data, and it can be used to predict the value of the target attribute when the values
of the input attributes are known. Supervised learning is present in many application
domains, such as finance, medicine and engineering.
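
The idea of a model that maps input attributes to a target attribute can be
illustrated with a very small sketch. It assumes a 1-nearest-neighbour rule purely
for brevity; the text does not prescribe any particular learner, and the function
names train_1nn and predict, as well as the example data, are hypothetical.

```python
def train_1nn(training_set):
    """Build the 'model': for 1-nearest-neighbour it is just the stored examples."""
    return list(training_set)


def predict(model, inputs):
    """Predict the target attribute as the class of the closest stored example."""
    nearest = min(
        model,
        key=lambda example: sum((a - b) ** 2 for a, b in zip(example[0], inputs)),
    )
    return nearest[1]


# Each training example pairs a vector of input attribute values with a class.
training = [((1.0, 1.0), "low"), ((2.0, 1.0), "low"), ((9.0, 8.0), "high")]
model = train_1nn(training)
label = predict(model, (8.0, 9.0))  # class predicted for an unseen example
```

Other learners replace the stored examples with a more compact structure, such as a
tree or a set of rules, but the train-then-predict interface stays the same.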
In a typical supervised learning scenario, a training set is given and the objective
is to form a description that can be used to predict unseen examples. This training
set can be described in a variety of ways. The most common is to describe it by a set
of instances, which is essentially a collection of tuples that may contain duplicates.
Each tuple is described by a vector of attribute values. Each attribute has an
associated domain of values, which is known prior to the learning task. Attributes are
typically one of two types: nominal or categorical (whose values are members of an
unordered set), or numeric (whose values are integers or real numbers, and an order is
assumed). Nominal attributes have a finite cardinality, whereas numeric attribute
domains are delimited by lower and upper bounds. The instance space (the set of
possible examples) is defined as the Cartesian product of all the input attribute
domains. The
 