Glossary - Java Data Mining: Strategy, Standard, and Practice


multi-record case A representation of physical data that uses

multiple records to store a single case. The data typically has three

columns with roles of sequence id, attribute name, and value.
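
As an illustration of the three-column layout described above, the following is a minimal sketch that groups multi-record rows into one attribute-to-value map per case. The class `MultiRecordPivot`, the record `Row`, and the method `toCases` are hypothetical names introduced only for this example, not part of any API; values are kept as strings for simplicity, and Java 16 or later is assumed for `record`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MultiRecordPivot {

    /** One physical row in multi-record format: (sequence id, attribute name, value). */
    record Row(String sequenceId, String attributeName, String value) {}

    /** Groups rows by sequence id so that each case becomes one attribute-to-value map. */
    static Map<String, Map<String, String>> toCases(List<Row> rows) {
        // LinkedHashMap keeps cases and attributes in the order they were first seen.
        Map<String, Map<String, String>> cases = new LinkedHashMap<>();
        for (Row row : rows) {
            cases.computeIfAbsent(row.sequenceId(), id -> new LinkedHashMap<>())
                 .put(row.attributeName(), row.value());
        }
        return cases;
    }

    public static void main(String[] args) {
        // Illustrative data only: case c1 spans two records, case c2 spans one.
        List<Row> rows = List.of(
            new Row("c1", "age", "34"),
            new Row("c1", "income", "52000"),
            new Row("c2", "age", "51"));
        System.out.println(toCases(rows));
    }
}
```

Because each case lists only the attributes it actually has, this layout suits sparse data where most attributes are absent for most cases.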

multi-target model A type of supervised model that can predict

multiple targets, both categorical (classification) and numerical

(regression). A multi-target model may represent the knowledge

extracted during model building more efficiently, and may be more

efficient to compute, than a separate model for each target.

normalization A transformation that maps numerical values to a

particular numerical range, typically 0 … 1. There are several types

of normalization (e.g., z-score, min-max, and shift-scale).
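
Two of the normalization types named above can be sketched as follows. This is an illustrative sketch, not a library implementation: the class `NormalizationSketch` and its methods are hypothetical names, and the z-score variant assumes the population standard deviation. Note that only min-max confines its output to 0 … 1; z-score output is centered on 0 and is not bounded to that range.

```java
import java.util.Arrays;

final class NormalizationSketch {

    /** Min-max: maps each value linearly so that the minimum becomes 0 and the maximum 1. */
    static double[] minMax(double[] values) {
        double min = Arrays.stream(values).min().orElseThrow();
        double max = Arrays.stream(values).max().orElseThrow();
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant attribute has no spread; map it to 0 rather than divide by zero.
            out[i] = range == 0.0 ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }

    /** Z-score: subtracts the mean and divides by the population standard deviation. */
    static double[] zScore(double[] values) {
        double mean = Arrays.stream(values).average().orElseThrow();
        double variance = Arrays.stream(values)
            .map(v -> (v - mean) * (v - mean))
            .average()
            .orElseThrow();
        double stdDev = Math.sqrt(variance);
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // Same guard as above for an attribute with zero spread.
            out[i] = stdDev == 0.0 ? 0.0 : (values[i] - mean) / stdDev;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] values = {10, 20, 30};
        System.out.println(Arrays.toString(minMax(values)));
        System.out.println(Arrays.toString(zScore(values)));
    }
}
```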

numerical attribute An attribute whose values are numbers. The

numeric value can be either an integer or a real number. See also

categorical attribute and ordinal attribute.

OLAP Online Analytical Processing.

ordinal attribute An attribute similar to a categorical attribute,

except that an order is defined on its discrete categorical values.

For example, temperature may take the discrete values high, medium,

and low, which are ordered as high > medium > low.

Ordinal attributes define a total order relation on the categories.

For example, if x, y, and z are ranked 5, 6, and 7, we can tell that

x < y < z, but not whether (z - y) < (y - x).

Consider the ordinal attribute speed that takes the following

ranked categories: STATIONARY, SLOW, FAST, VERY FAST, where

rank(STATIONARY) = 1, rank(SLOW) = 2, rank(FAST) = 3, and

rank(VERY FAST) = 4. We can tell that SLOW represents a smaller

speed value than FAST. However, it is not possible to tell whether,

for example, the difference between two adjacent values is the same:

is the difference between SLOW and FAST equal to, smaller than, or

greater than the difference between FAST and VERY FAST?
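
The speed example can be sketched as a Java enum whose constants carry the ranks 1 through 4. This is a minimal illustration under stated assumptions: `OrdinalSketch`, `Speed`, and `isSlowerThan` are hypothetical names, and the sketch deliberately exposes only order comparison, since subtracting ranks would suggest a distance that an ordinal attribute does not define.

```java
final class OrdinalSketch {

    /** Ranked categories of the ordinal attribute speed, with ranks 1 to 4. */
    enum Speed {
        STATIONARY(1), SLOW(2), FAST(3), VERY_FAST(4);

        private final int rank;

        Speed(int rank) {
            this.rank = rank;
        }

        int rank() {
            return rank;
        }
    }

    /** True when category a is ranked below category b. */
    static boolean isSlowerThan(Speed a, Speed b) {
        return a.rank() < b.rank();
    }

    public static void main(String[] args) {
        // Order comparisons are meaningful for an ordinal attribute.
        System.out.println(isSlowerThan(Speed.SLOW, Speed.FAST));
        System.out.println(isSlowerThan(Speed.VERY_FAST, Speed.FAST));
        // Rank differences are intentionally not computed: equal rank gaps
        // do not imply equal differences in actual speed.
    }
}
```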

outlier A data value that does not come (or is thought not to come)

from the typical population of data. Outliers are values that fall

outside the boundaries that enclose most other values in the data.

This can apply to values of an attribute, or of entire cases.

outlier treatment The approach to replacing outliers in numerical

data attributes. There are several techniques including specifying

explicit boundaries, percentages in the tails of the distribution, and

number of standard deviations, such that values outside the valid

range are replaced either by null values or edge values.
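
The following sketch illustrates one combination of the techniques listed above: deriving a valid range from a number of standard deviations, then replacing values outside it with either edge values or nulls. The class `OutlierTreatmentSketch` and its methods are hypothetical names chosen for this example, and the use of the population standard deviation is an assumption.

```java
import java.util.Arrays;

final class OutlierTreatmentSketch {

    /** Returns {lower, upper} as the mean minus and plus k population standard deviations. */
    static double[] stdDevBounds(double[] values, double k) {
        double mean = Arrays.stream(values).average().orElseThrow();
        double variance = Arrays.stream(values)
            .map(v -> (v - mean) * (v - mean))
            .average()
            .orElseThrow();
        double stdDev = Math.sqrt(variance);
        return new double[] {mean - k * stdDev, mean + k * stdDev};
    }

    /** Replaces each value outside [lower, upper] with the nearest edge value. */
    static double[] clipToEdges(double[] values, double lower, double upper) {
        return Arrays.stream(values)
            .map(v -> Math.max(lower, Math.min(upper, v)))
            .toArray();
    }

    /** Replaces each value outside [lower, upper] with null. */
    static Double[] replaceWithNull(double[] values, double lower, double upper) {
        Double[] out = new Double[values.length];
        for (int i = 0; i < values.length; i++) {
            if (values[i] < lower || values[i] > upper) {
                out[i] = null;
            } else {
                out[i] = values[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative data only.
        double[] values = {2, 4, 4, 4, 5, 5, 7, 9};
        double[] bounds = stdDevBounds(values, 1.0);
        System.out.println(Arrays.toString(clipToEdges(values, bounds[0], bounds[1])));
        System.out.println(Arrays.toString(replaceWithNull(values, bounds[0], bounds[1])));
    }
}
```

Explicit boundaries can be passed straight to `clipToEdges` or `replaceWithNull` without calling `stdDevBounds`; a percentile-based range would need a separate helper that is not shown here.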
