adding it to the labeled pool decreases in i. Thus, the calculation of the
potential contribution of each instance in the new batch depends on the
other instances that are selected for this batch.
Rokach et al. (2008) introduce Pessimistic Active Learning (PAL). PAL
employs a novel pessimistic measure, which relies on confidence intervals
and is used to balance the exploration/exploitation trade-off. In order to
acquire an initial sample of labeled data, PAL applies orthogonal arrays
of fractional factorial design. PAL is particularly advantageous when the
estimation confidence intervals are relatively large. Once the decision tree
obtains sufficient evidence for each leaf, the relative advantage of PAL is
diminished.
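To illustrate why a confidence-interval-based pessimistic estimate matters most when evidence is scarce, the following is a minimal sketch in Python. It is not the measure defined by Rokach et al. (2008); it assumes, purely for illustration, that the pessimistic estimate of a leaf is the lower end of the standard Wilson score interval for a binomial proportion. The function name and the leaf counts are hypothetical.

```python
import math


def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion.

    Used here as a stand-in "pessimistic" estimate of a leaf's positive
    rate (an assumption for illustration, not PAL's actual measure).
    z = 1.96 corresponds to a confidence level of roughly 95%.
    """
    if n == 0:
        return 0.0  # no evidence at all: take the most pessimistic value
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = p_hat + z * z / (2.0 * n)
    margin = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n))
    return (centre - margin) / denom


# Two hypothetical leaves with the same observed positive rate (80%)
# but very different amounts of labeled evidence.
small_leaf = wilson_lower_bound(successes=8, n=10)
large_leaf = wilson_lower_bound(successes=800, n=1000)

print(f"pessimistic estimate, 10 labeled instances:   {small_leaf:.3f}")
print(f"pessimistic estimate, 1000 labeled instances: {large_leaf:.3f}")
```

With few labeled instances the interval is wide, so the pessimistic estimate falls well below the observed rate. As labels accumulate, the bound approaches the observed rate and a pessimistic estimate behaves much like an ordinary one, which is consistent with the diminishing advantage described above.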
12.6 Proactive Data Mining
Data mining algorithms are used as part of the broader process of
knowledge discovery. The role of the data mining algorithm in this process
is to extract patterns hidden in a dataset. The extracted patterns are then
evaluated and deployed. The objectives of the evaluation and deployment
phases include decisions regarding the interest of the patterns and the way
they should be used.
While data mining algorithms, particularly those dedicated to supervised
learning, extract patterns almost automatically (often with the user
making only minor parameter settings), humans typically evaluate and
deploy the patterns manually. With regard to the algorithms, the best practice
in data mining is to focus on description and prediction rather than on
action. That is to say, the algorithms operate as passive “observers” of
the underlying dataset while analyzing a phenomenon. These algorithms
neither affect nor recommend ways of affecting the real world. The
algorithms only report their findings to the user. As a result, if the
user chooses not to act in response to the findings, then nothing will
change. The responsibility for action is in the hands of humans. This
responsibility is often too complex to be handled manually, and the
data mining literature often stops short of assisting humans in meeting this
responsibility.
Consider the following scenario. In marketing and customer relationship
management (CRM), data mining is often used for predicting customer
lifetime value (LTV). Customer LTV is defined as the net present value of
the sum of the profits that a company will gain from a certain customer,
starting from a certain point in time and continuing through the remaining