the identity relation, that is, if two categories are equal. The lack of an
order relation makes it impossible to tell whether one attribute category is
greater than another, or whether one category is closer to another.
category A distinct value of a categorical attribute. Also referred to
as a class.
category set A named collection of related categories.
centroid A cluster centroid is a vector that encodes, for each logical
attribute, either the mean (numerical attributes) or the mode (cate-
gorical attributes) of the cases in the build data assigned to a cluster.
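The centroid definition above can be illustrated with a minimal sketch. The function name `centroid`, the dictionary-based case layout, and the sample attribute names are all assumptions made for illustration; they are not part of JDM or any particular library.

```python
from collections import Counter
from statistics import mean


def centroid(cases, numerical, categorical):
    """Return a dict holding the mean of each numerical attribute
    and the mode of each categorical attribute."""
    result = {}
    for name in numerical:
        result[name] = mean(case[name] for case in cases)
    for name in categorical:
        counts = Counter(case[name] for case in cases)
        # most_common(1) yields the most frequent value; on a tie it
        # returns the value that was encountered first.
        result[name] = counts.most_common(1)[0][0]
    return result


# Hypothetical build-data cases assigned to one cluster.
cluster_cases = [
    {"age": 30, "income": 40000, "gender": "F"},
    {"age": 40, "income": 60000, "gender": "F"},
    {"age": 50, "income": 80000, "gender": "M"},
]

c = centroid(cluster_cases, ["age", "income"], ["gender"])
print(c)
```

For these three cases the centroid holds the mean age and income and the most frequent gender value.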
classification A supervised data mining technique that produces a
model capable of assigning cases to categories. A classification
model requires a categorical target attribute in the build dataset.
One of the JDM mining functions.
cluster A collection of cases that are similar to one another as deter-
mined by a clustering mining function. A cluster can be defined by
its centroid, or by an area determined by an attribute vector space—a
set of attribute value ranges (numerical) and attribute values (cate-
gorical). Predicate rules involving the cluster attributes are often
used to define clusters in a human-understandable way.
clustering An unsupervised data mining technique that, given a set
of cases, each having a set of attributes, and a similarity measure
among them, groups the cases into clusters such that cases
in the same cluster are more similar to one another than to cases in
different clusters. One of the JDM mining functions.
confusion matrix A table of counts of the actual versus predicted
class values. It indicates where the model predicted outcomes
correctly, and where it became confused or made mistakes.
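As a rough sketch of how such a table could be tallied, the snippet below counts (actual, predicted) pairs. The function name `confusion_matrix`, the sample labels, and the layout (rows for actual classes, columns for predicted classes) are assumptions chosen for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs.

    Returns the sorted class labels and a square table in which
    row i is the actual class and column j is the predicted class.
    """
    counts = Counter(zip(actual, predicted))
    labels = sorted(set(actual) | set(predicted))
    table = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, table


# Hypothetical test-data outcomes.
actual = ["yes", "yes", "no", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "no", "yes"]

labels, matrix = confusion_matrix(actual, predicted)

# Correct predictions lie on the diagonal.
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / len(actual)
print(labels, matrix, accuracy)
```

The off-diagonal cells show where the model confused one class with another.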
consequent In an association rule, the right-hand side is called the
consequent. For example, in the rule “If A, then B,” “B” is the
consequent. See also antecedent.
cost matrix A two-dimensional, N × N table that defines the cost
associated with incorrect predictions. A cost matrix is typically used
in classification models, where N is the number of distinct categories
in the target, and the columns (reflecting predicted categories) and
rows (reflecting actual categories) are labeled with target categories.
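One common way to apply such a table, shown here only as a sketch, is to weight each count of actual versus predicted outcomes by the matching cost cell and sum the results. The cost values, the counts, and the function name `total_cost` are hypothetical; rows are actual categories and columns are predicted categories, as in the definition above.

```python
# Target categories labelling both rows (actual) and columns (predicted).
labels = ["no", "yes"]

# Hypothetical counts of actual versus predicted outcomes.
confusion = [
    [2, 1],  # actual "no":  2 predicted "no", 1 predicted "yes"
    [1, 2],  # actual "yes": 1 predicted "no", 2 predicted "yes"
]

# Hypothetical costs: correct predictions cost nothing, and missing
# an actual "yes" is assumed to be five times worse than a false "yes".
cost = [
    [0.0, 1.0],
    [5.0, 0.0],
]


def total_cost(confusion, cost):
    """Sum count * cost over every (actual, predicted) cell."""
    return sum(
        n * c
        for count_row, cost_row in zip(confusion, cost)
        for n, c in zip(count_row, cost_row)
    )


total = total_cost(confusion, cost)
print(total)
```

With these assumed numbers, the single missed "yes" contributes more to the total than the single false "yes", which is the kind of asymmetry a cost matrix is meant to express.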
cross validation A method of evaluating the accuracy of a classifica-
tion or regression model, typically used when there are relatively few
cases to divide between build and test datasets. In cross validation,