multi-record case A representation of physical data that uses
multiple records to store a single case. The data typically has three
columns with roles of sequence id, attribute name, and value.
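As an illustration, the sketch below groups multi-record rows into one attribute map per case. The class name MultiRecordPivot, the Row type, and the sample attribute names are hypothetical and chosen only for this example; they are not part of any data mining API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class MultiRecordPivot {

    /** One physical record: (sequence id, attribute name, value). */
    static final class Row {
        final int sequenceId;
        final String attributeName;
        final String value;

        Row(int sequenceId, String attributeName, String value) {
            this.sequenceId = sequenceId;
            this.attributeName = attributeName;
            this.value = value;
        }
    }

    /** Groups records by sequence id so that each case becomes one name -> value map. */
    static Map<Integer, Map<String, String>> toCases(List<Row> rows) {
        Map<Integer, Map<String, String>> cases = new LinkedHashMap<>();
        for (Row row : rows) {
            cases.computeIfAbsent(row.sequenceId, id -> new LinkedHashMap<>())
                 .put(row.attributeName, row.value);
        }
        return cases;
    }

    public static void main(String[] args) {
        // Case 1 spans two records; case 2 has only one attribute recorded.
        List<Row> rows = Arrays.asList(
            new Row(1, "age", "34"),
            new Row(1, "income", "52000"),
            new Row(2, "age", "51"));
        System.out.println(toCases(rows));
    }
}
```

A case that has no record for an attribute simply lacks that key, which is one reason this layout suits sparse data.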
multi-target model A type of supervised model that can predict
multiple targets, both categorical (classification) and numerical
(regression). A multi-target model may be more efficient at
representing the knowledge extracted during model building, and more
efficient to compute.
normalization A transformation that maps numerical values to a
particular numerical range, typically 0 … 1. There are several types
of normalization (e.g., z-score, min-max, and shift-scale).
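The following sketch shows one way these normalization types relate. It assumes that shift-scale means (value - shift) / scale, and expresses min-max and z-score as special cases of it; the class and method names are hypothetical. Note that only min-max guarantees a 0 to 1 range, while z-score rescales values to zero mean and unit standard deviation.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class Normalization {

    /** Shift-scale (assumed form): (v - shift) / scale for every value. */
    static double[] shiftScale(double[] values, double shift, double scale) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // Guard against division by zero (e.g., when all values are identical).
            out[i] = scale == 0.0 ? 0.0 : (values[i] - shift) / scale;
        }
        return out;
    }

    /** Min-max: shift by the minimum, scale by (max - min), giving values in 0..1. */
    static double[] minMax(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return shiftScale(values, min, max - min);
    }

    /** z-score: shift by the mean, scale by the population standard deviation. */
    static double[] zScore(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;
        double squaredDeviations = 0.0;
        for (double v : values) {
            squaredDeviations += (v - mean) * (v - mean);
        }
        double stdDev = Math.sqrt(squaredDeviations / values.length);
        return shiftScale(values, mean, stdDev);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(minMax(new double[] {10, 20, 30})));
        System.out.println(Arrays.toString(zScore(new double[] {2, 4, 4, 4, 5, 5, 7, 9})));
    }
}
```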
numerical attribute An attribute whose values are numbers. The
numeric value can be either an integer or a real number. See also
categorical attribute and ordinal attribute.
OLAP Online Analytical Processing.
ordinal attribute An attribute similar to a categorical attribute,
except that an order is defined on the discrete categorical values.
For example, temperature with the discrete values high, medium, and
low has the order high > medium > low.
Ordinal attributes define a total order relation on the categories.
For example, if x, y, and z are ranked 5, 6, and 7, we can tell that
x < y < z, but not whether (z - y) < (y - x).
Consider the ordinal attribute speed that takes the following
ranked categories: STATIONARY, SLOW, FAST, VERY FAST, where
rank(STATIONARY) = 1, rank(SLOW) = 2, rank(FAST) = 3, and
rank(VERY FAST) = 4. We can tell that SLOW represents a smaller
speed value than FAST. However, it is not possible to tell whether
the difference between two adjacent values is the same. For example,
is the difference between SLOW and FAST equal to, smaller than, or
greater than the difference between FAST and VERY FAST?
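A minimal sketch of the speed example as a Java enum follows. The ranks mirror the ones given above; the class OrdinalDemo and its methods are hypothetical names used only for illustration. The code deliberately exposes comparison and sorting but no distance between categories, since the ranks carry order only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, for illustration only.
final class OrdinalDemo {

    /** Ordinal attribute "speed": categories with an explicit rank. */
    enum Speed {
        STATIONARY(1), SLOW(2), FAST(3), VERY_FAST(4);

        private final int rank;

        Speed(int rank) {
            this.rank = rank;
        }

        int rank() {
            return rank;
        }

        /** Order comparison is meaningful for an ordinal attribute. */
        boolean isSlowerThan(Speed other) {
            return rank < other.rank;
        }
    }

    /** Returns a copy of the values sorted by rank, lowest first. */
    static List<Speed> sortByRank(List<Speed> values) {
        List<Speed> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparingInt(Speed::rank));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(Speed.SLOW.isSlowerThan(Speed.FAST));
        System.out.println(sortByRank(Arrays.asList(Speed.FAST, Speed.STATIONARY, Speed.SLOW)));
        // Subtracting ranks (e.g., FAST - SLOW) would compile, but the result
        // says nothing about how far apart the underlying speeds are.
    }
}
```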
outlier A data value that does not come (or is thought not to come)
from the typical population of data. Outliers are values that fall
outside the boundaries that enclose most other values in the data.
This can apply to values of an attribute, or of entire cases.
outlier treatment The approach to replacing outliers in numerical
data attributes. There are several techniques for defining the valid
range, including explicit boundaries, percentages in the tails of the
distribution, and a number of standard deviations. Values outside the
valid range are replaced by either null values or edge values.
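The sketch below illustrates two of these techniques under stated assumptions: the valid range is given either as explicit boundaries or as the mean plus or minus k population standard deviations, and out-of-range values are replaced by the nearest edge value or by null. The class and method names are hypothetical, and the tail-percentage technique is not shown.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class OutlierTreatment {

    /** Replaces values outside [lower, upper] with the nearest edge value. */
    static double[] clipToEdges(double[] values, double lower, double upper) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = Math.max(lower, Math.min(upper, values[i]));
        }
        return out;
    }

    /** Replaces values outside [lower, upper] with null. */
    static Double[] nullOutside(double[] values, double lower, double upper) {
        Double[] out = new Double[values.length];
        for (int i = 0; i < values.length; i++) {
            if (values[i] < lower || values[i] > upper) {
                out[i] = null;
            } else {
                out[i] = values[i];
            }
        }
        return out;
    }

    /** Returns {mean - k * stdDev, mean + k * stdDev} using the population standard deviation. */
    static double[] stdDevBounds(double[] values, double k) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;
        double squaredDeviations = 0.0;
        for (double v : values) {
            squaredDeviations += (v - mean) * (v - mean);
        }
        double stdDev = Math.sqrt(squaredDeviations / values.length);
        return new double[] {mean - k * stdDev, mean + k * stdDev};
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        double[] bounds = stdDevBounds(data, 1.0);
        System.out.println(Arrays.toString(clipToEdges(data, bounds[0], bounds[1])));
        System.out.println(Arrays.toString(nullOutside(data, bounds[0], bounds[1])));
    }
}
```

Replacing with edge values keeps every case usable, while replacing with null leaves the decision to whatever missing-value handling is applied later.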