Java Reference

In-Depth Information

the identity relation, that is, if two categories are equal. The lack of an order relation makes it impossible to tell if one attribute category is greater than another, or whether one category is closer to another.

category
A distinct value of a categorical attribute. Also referred to as a class.

category set
A named collection of related categories.

centroid
A cluster centroid is a vector that encodes, for each logical attribute, either the mean (numerical attributes) or the mode (categorical attributes) of the cases in the build data assigned to a cluster.
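As a rough illustration of the centroid definition, the sketch below computes one centroid component per attribute: the mean for a numerical attribute and the mode for a categorical one. It uses plain Java collections rather than the JDM API, and the class name, method names, and sample values are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of JDM: computes centroid components
// for the cases assigned to a single cluster.
class CentroidSketch {

    // Mean of a numerical attribute over the cluster's cases.
    static double mean(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Mode (most frequent category) of a categorical attribute.
    // On a tie, the category that reaches the top count first wins.
    static String mode(List<String> values) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String v : values) {
            int c = counts.merge(v, 1, Integer::sum);
            if (c > bestCount) {
                best = v;
                bestCount = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up cases assigned to one cluster.
        List<Double> ages = List.of(30.0, 40.0, 50.0);
        List<String> colors = List.of("red", "blue", "red");
        System.out.println("age=" + mean(ages) + ", color=" + mode(colors));
    }
}
```

The centroid of the cluster is then the vector of these per-attribute values, here (40.0, "red").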

classification
A supervised data mining technique that produces a model capable of classifying cases into categories or assigning cases to categories. A classification model requires a categorical target attribute in the build dataset. One of the JDM mining functions.

cluster
A collection of cases that are similar to one another as determined by a clustering mining function. A cluster can be defined by its centroid, or by an area determined by an attribute vector space, that is, a set of attribute value ranges (numerical) and attribute values (categorical). Predicate rules involving the cluster attributes are often used to define clusters in a human-understandable way.

clustering
An unsupervised data mining technique that, given a set of cases, each having a set of attributes, and a similarity measure among them, groups the cases into clusters such that cases in the same cluster are more similar to one another and cases in different clusters are less similar to one another. One of the JDM mining functions.
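The glossary does not name a particular clustering algorithm or similarity measure. As one possible sketch, the code below assumes numerical attributes only, uses squared Euclidean distance as the (inverse) similarity measure, and assigns each case to the cluster with the nearest centroid, which is the assignment step used by k-means-style algorithms. The class name, method names, and data are hypothetical and do not come from the JDM API.

```java
import java.util.Arrays;

// Hypothetical sketch: assign cases to the nearest of a set of given centroids.
class NearestCentroidSketch {

    // Squared Euclidean distance; a smaller value means "more similar".
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to the given case.
    static int nearestCluster(double[] caseValues, double[][] centroids) {
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.length; k++) {
            double d = squaredDistance(caseValues, centroids[k]);
            if (d < bestDistance) {
                best = k;
                bestDistance = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up centroids and cases with two numerical attributes each.
        double[][] centroids = { {0.0, 0.0}, {10.0, 10.0} };
        double[][] cases = { {1.0, 1.0}, {9.0, 8.0}, {2.0, 0.5} };
        for (double[] c : cases) {
            System.out.println(Arrays.toString(c) + " -> cluster " + nearestCluster(c, centroids));
        }
    }
}
```

A full clustering algorithm would also have to choose and update the centroids; this sketch only shows how a similarity measure turns into cluster membership.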

confusion matrix
A table of counts of the actual versus predicted class values. It indicates where the model correctly predicted outcomes, and where it became confused or made mistakes.
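To make the definition concrete, here is a minimal sketch that counts actual versus predicted class values into a table. It assumes rows are indexed by the actual category and columns by the predicted category; the class name, method names, and sample labels are hypothetical and not part of the JDM API.

```java
import java.util.List;

// Hypothetical sketch: build a confusion matrix from actual and predicted labels.
class ConfusionMatrixSketch {

    // Cell [i][j] counts cases whose actual category is categories.get(i)
    // and whose predicted category is categories.get(j).
    static int[][] build(List<String> categories, List<String> actual, List<String> predicted) {
        int n = categories.size();
        int[][] matrix = new int[n][n];
        for (int i = 0; i < actual.size(); i++) {
            int row = categories.indexOf(actual.get(i));
            int col = categories.indexOf(predicted.get(i));
            matrix[row][col]++;
        }
        return matrix;
    }

    // Fraction of cases on the diagonal, i.e. predicted correctly.
    static double accuracy(int[][] matrix) {
        int correct = 0;
        int total = 0;
        for (int i = 0; i < matrix.length; i++) {
            for (int j = 0; j < matrix[i].length; j++) {
                total += matrix[i][j];
                if (i == j) {
                    correct += matrix[i][j];
                }
            }
        }
        return (double) correct / total;
    }

    public static void main(String[] args) {
        // Made-up labels for five test cases.
        List<String> categories = List.of("yes", "no");
        List<String> actual = List.of("yes", "yes", "no", "no", "yes");
        List<String> predicted = List.of("yes", "no", "no", "yes", "yes");
        int[][] m = build(categories, actual, predicted);
        System.out.println("actual yes: " + m[0][0] + " predicted yes, " + m[0][1] + " predicted no");
        System.out.println("actual no:  " + m[1][0] + " predicted yes, " + m[1][1] + " predicted no");
        System.out.println("accuracy = " + accuracy(m));
    }
}
```

The off-diagonal cells are where the model was "confused": in this made-up data, one actual "yes" was predicted as "no" and one actual "no" as "yes".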

consequent
In an association rule, the right-hand side is called the consequent. For example, in the rule “If A, then B,” “B” is the consequent. See also antecedent.

cost matrix
A two-dimensional, N × N table that defines the cost associated with incorrect predictions. A cost matrix is typically used in classification models, where N is the number of distinct categories in the target, and the columns (reflecting predicted categories) and rows (reflecting actual categories) are labeled with target categories.
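One common way to use such a table, sketched here as an assumption rather than as the JDM API, is to weight each count of actual versus predicted categories by the matching cost entry and sum the result. Both tables use rows for actual categories and columns for predicted categories; the class name, method name, and numbers are hypothetical.

```java
// Hypothetical sketch: total misclassification cost from counts and a cost matrix.
class CostMatrixSketch {

    // counts[i][j]: number of cases with actual category i predicted as category j.
    // cost[i][j]:   cost of predicting category j when the actual category is i.
    static double totalCost(int[][] counts, double[][] cost) {
        double total = 0.0;
        for (int i = 0; i < counts.length; i++) {
            for (int j = 0; j < counts[i].length; j++) {
                total += counts[i][j] * cost[i][j];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up 2 x 2 example (N = 2 target categories).
        int[][] counts = { {2, 1}, {1, 1} };
        // Correct predictions (the diagonal) cost nothing; missing an actual
        // category 0 is assumed to be five times worse than the opposite error.
        double[][] cost = { {0.0, 5.0}, {1.0, 0.0} };
        System.out.println("total cost = " + totalCost(counts, cost));
    }
}
```

With these made-up values the two kinds of error occur equally often, yet one contributes 5.0 to the total and the other 1.0, which is the kind of asymmetry a cost matrix is meant to express.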

cross validation
A method of evaluating the accuracy of a classification or regression model, typically used when there are relatively few cases to divide between build and test datasets. In cross validation,

