Glossary - Java Data Mining: Strategy, Standard, and Practice


test The data mining operation that determines the accuracy of a model. This is typically performed by using held-aside (test) data identical in form to the build data, scoring that test data, and comparing the actual target value with the predicted target value. Testing is only applicable for supervised models. In JDM, test is performed using a test task.
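The comparison of actual and predicted target values can be illustrated with a minimal sketch in plain Java. It does not use the JDM API; the class name `TestAccuracySketch`, the method `accuracy`, and the sample values are hypothetical and chosen only for illustration. Accuracy is taken here as the fraction of test cases whose predicted target equals the actual target.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of JDM: compares actual and predicted targets.
public class TestAccuracySketch {

    /** Returns the fraction of cases whose predicted target equals the actual target. */
    public static double accuracy(List<String> actual, List<String> predicted) {
        if (actual.size() != predicted.size()) {
            throw new IllegalArgumentException("actual and predicted must have the same length");
        }
        if (actual.isEmpty()) {
            throw new IllegalArgumentException("test data must not be empty");
        }
        int correct = 0;
        for (int i = 0; i < actual.size(); i++) {
            if (actual.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return (double) correct / actual.size();
    }

    public static void main(String[] args) {
        // Actual targets from the held-aside test data (made-up values).
        List<String> actual = Arrays.asList("yes", "no", "yes", "no");
        // Targets predicted by scoring the same test data with the model (made-up values).
        List<String> predicted = Arrays.asList("yes", "no", "no", "no");
        System.out.println(accuracy(actual, predicted)); // 3 of 4 predictions match
    }
}
```

A JDM test task would typically report richer results than a single accuracy figure; this sketch covers only the basic comparison step described in the definition.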

test data The input data used for testing a model.

test task A task that, when executed, produces test results for supervised models.

text mining A data mining technique for extracting patterns and insights out of unstructured text data. Text mining goes beyond the notion of search in that previously unknown information can be discovered through the use of data mining algorithms.

time series A data mining technique that supports the analysis of time series data. A series of values X(t) is recorded according to some function of time and is thus ordered by an index describing the time (t) at which each value was recorded.

training The step in the model building process that produces a possibly nonoptimized form of the model. For example, a tree algorithm may produce a full tree during training, but may require an evaluation phase to effectively select the best subtree. See build.

training data See build data.

transformation A function applied to data resulting in a new form or representation of the data. Binning and normalization are examples of data transformations. See also binning, explode, and normalization.
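The two transformations named above can be sketched in plain Java. This is an illustration under stated assumptions, not the JDM transformation interface: the class `TransformationSketch` and its methods are hypothetical, normalization is assumed to be min-max scaling to the range 0 to 1, and binning is assumed to be equal-width binning.

```java
// Hypothetical helper, not part of JDM: two simple numeric transformations.
public class TransformationSketch {

    /** Min-max normalization: rescales values linearly so the minimum maps to 0 and the maximum to 1. */
    public static double[] minMaxNormalize(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant column has no spread; map every value to 0.
            out[i] = (range == 0) ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }

    /** Equal-width binning: assigns each value a bin index from 0 to binCount - 1. */
    public static int[] equalWidthBins(double[] values, int binCount) {
        if (binCount < 1) {
            throw new IllegalArgumentException("binCount must be at least 1");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / binCount;
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            int bin = (width == 0) ? 0 : (int) ((values[i] - min) / width);
            // The maximum value would otherwise land one past the last bin.
            out[i] = Math.min(bin, binCount - 1);
        }
        return out;
    }
}
```

For the made-up input `{10, 20, 30, 50}`, normalization yields `{0.0, 0.25, 0.5, 1.0}` and binning into four bins yields the indices `{0, 1, 2, 3}`.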

trend In time series, this is typically considered to be a long-term change in the mean level of a series. What constitutes “long-term” depends on the sampling rate of the time series. See also time series.
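One common way to expose a change in the mean level is to smooth the series with a moving average. The sketch below is a plain Java illustration and is not drawn from the JDM API; the class `TrendSketch`, the method `movingAverage`, and the `window` parameter are hypothetical. The window length plays the role of "long-term" and would be chosen relative to the sampling rate.

```java
// Hypothetical helper, not part of JDM: smooths a series to expose its trend.
public class TrendSketch {

    /**
     * Returns the simple moving average of the series.
     * Element k of the result is the mean of series[k] .. series[k + window - 1].
     */
    public static double[] movingAverage(double[] series, int window) {
        if (window < 1 || window > series.length) {
            throw new IllegalArgumentException("window must be between 1 and the series length");
        }
        double[] out = new double[series.length - window + 1];
        double sum = 0.0;
        for (int i = 0; i < series.length; i++) {
            sum += series[i];
            if (i >= window) {
                // Drop the value that just left the window.
                sum -= series[i - window];
            }
            if (i >= window - 1) {
                out[i - window + 1] = sum / window;
            }
        }
        return out;
    }
}
```

For the made-up series `{1, 3, 2, 4, 3, 5}` with a window of 2, the result is `{2.0, 2.5, 3.0, 3.5, 4.0}`: the short-term oscillation is averaged out and the rising mean level becomes visible.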

UML Unified Modeling Language.

URI Uniform Resource Identifier.

unstructured data Data that represents complex content, often with an inherent structure. Examples of unstructured data include text, images, audio, and video. See also structured data.

unsupervised learning The process of building data mining models without the guidance (supervision) of a known, correct result. In supervised learning, this correct result is provided in the target attribute. Unsupervised learning has no such target attribute. Clustering and association are examples of unsupervised learning.
