implementation and parallel algorithms together with the
data spread. However, this depends on the quality of the
cluster software implementation.
Application execution performance: Applications may
utilize data mining results in batch (e.g., predicting in a
single run which customers will respond to a campaign and
storing the results) or in real time (e.g., predicting a
customer's response to an offer while speaking to a
customer service representative, based on new customer
profile information).
IT administration tools: Most large organizations use tools
to monitor software and hardware usage. Backup and recovery
are also a concern. In-database mining tools often leverage
the backup and recovery mechanisms already in place for the
database. Independent-server tools either provide their own
support in this area, or IT must rely on OS and file
system-oriented tools.
Impacts on Computing Hardware
How much computing power will be required to mine data is often
hard to determine. The time it takes to build a single model depends
on many factors: the amount of data including both the number of
cases and number of attributes, the complexity of the data itself, the
choice of algorithm and settings, and the internal scalability of the
algorithms as implemented in the DME (e.g., how these implementations
may use multiple CPUs in parallel).
By complexity of the data, we mean the number of distinct values
present in each categorical attribute, as well as the richness of
patterns found in the data. For example, for an algorithm like k-means
clustering, which accepts numerical data for its computations, each
categorical attribute is exploded into indicator attributes. A binary
attribute, one with two values, becomes two attributes, whereas
an attribute containing a value for each of the 50 U.S. states
becomes 50 attributes. In the case of association rules, two datasets
with the same number of transactions, number of products, and
items per basket can take radically different execution times depending
on the co-occurrences found in the data.
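The expansion of a categorical attribute into indicator attributes can be
illustrated with a minimal Java sketch. It is not taken from any particular
DME; the class name IndicatorExpansion, its methods, and the "attribute=value"
naming of the generated columns are assumptions made for illustration only.
It shows that the number of generated attributes equals the number of
distinct values observed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, for illustration only (not part of any DME or API).
class IndicatorExpansion {

    // One indicator attribute per distinct value, in sorted order.
    static List<String> indicatorNames(String attribute, List<String> values) {
        List<String> names = new ArrayList<>();
        for (String v : new TreeSet<>(values)) {
            names.add(attribute + "=" + v);
        }
        return names;
    }

    // Encodes one value as a 0/1 vector aligned with the indicator names.
    // A value that was never observed yields an all-zero vector.
    static int[] encode(String attribute, String value, List<String> names) {
        int[] row = new int[names.size()];
        int idx = names.indexOf(attribute + "=" + value);
        if (idx >= 0) {
            row[idx] = 1;
        }
        return row;
    }

    public static void main(String[] args) {
        // A binary attribute becomes two indicator attributes.
        List<String> gender = List.of("F", "M", "F", "M");
        List<String> names = indicatorNames("gender", gender);
        System.out.println(names);                                        // [gender=F, gender=M]
        System.out.println(Arrays.toString(encode("gender", "M", names))); // [0, 1]
    }
}
```

An attribute with 50 distinct values would produce 50 such columns under this
scheme, which is why the attribute count seen by the algorithm can be much
larger than the attribute count of the source table.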
There is another dimension to data mining computing power
requirements: the volume of mining activities. This volume includes
the number and type of models built within a given time window, the
number and type of models used to score datasets, and the real-time