[Figure: curves for apriori, partition, eclat and hybrid+2; x-axis: minsupp in % (2 to 0.125); y-axis: 16 to 512]
Fig. 4. Performance benchmark
the mining algorithms the achieved runtime - and actually 1 million transactions
is still a moderate database size - would allow true interactivity in an iterative
KDD process as introduced in Section 3. On larger databases and with lower
minsupp values, run times of up to hours are easily reached.
5 Integration with Relational Database Systems
In Section 3 we learned that data mining - running the mining algorithm - is
only one of the steps in a KDD process. In this context it becomes clear that
algorithmic details are important but of course the integration of the mining
algorithm with the other KDD phases must also be considered. Interactivity
suffers tremendously when proceeding from one KDD phase to the next is not smooth
but implies annoying user interference [13].
5.1 Common Situation
Flat files or even binary encoded datasets are common in research and devel-
opment environments but we rarely found them in business units. So, today in
real-world applications we can expect the data to reside in a database system.
For the mining algorithms this implies that a proper integration with relational
database systems is one of the key features.
The natural way to store transactions, as they were introduced in Section 2,
in a relational table is in (id, item)-tuple form, cf. Table 3. Each transaction
(vehicle) is represented by one or more rows in the table.
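
The (id, item)-tuple layout can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for a relational database system; the table name, the helper functions and the vehicle items are hypothetical and are not taken from Table 3.

```python
import sqlite3


def load_transactions(conn, transactions):
    """Store each transaction in (id, item)-tuple form: one row per item."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions ("
        "id INTEGER, item TEXT, PRIMARY KEY (id, item))"
    )
    rows = [(tid, item) for tid, items in transactions.items() for item in items]
    conn.executemany("INSERT INTO transactions (id, item) VALUES (?, ?)", rows)
    conn.commit()


def item_support(conn):
    """Return {item: fraction of transactions that contain the item}."""
    total = conn.execute(
        "SELECT COUNT(DISTINCT id) FROM transactions"
    ).fetchone()[0]
    if total == 0:
        return {}
    cur = conn.execute("SELECT item, COUNT(*) FROM transactions GROUP BY item")
    return {item: count / total for item, count in cur}


# Hypothetical example data: three vehicles with invented equipment items.
conn = sqlite3.connect(":memory:")
vehicles = {
    1: ["AirCon", "ABS"],
    2: ["ABS"],
    3: ["AirCon", "ABS", "Sunroof"],
}
load_transactions(conn, vehicles)
support = item_support(conn)
```

In this sketch, vehicle 3 occupies three rows of the table, and the support of a single item follows from one GROUP BY query, which can then be compared against a chosen minsupp threshold.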
But of course a database will typically not be restricted to such “transactional
data”. For instance, in our example there will also be tables that hold attributes