In order to overcome this trade-off, Domingos and Hulten (2000)
proposed VFDT, a decision-tree learning method aimed at learning
online from high-volume data streams by sub-sampling the
data stream generated by a stationary process. This method uses constant
time per example and constant memory and can incorporate tens of
thousands of examples per second using off-the-shelf hardware. This method
learns from each example as it is seen, with no need to store it.
As a result, it is possible to mine online data sources directly.
The sample size is determined in VFDT from distribution-free Hoeffding
bounds to guarantee that its output is asymptotically nearly identical to
that of a conventional learner.
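The Hoeffding bound mentioned above can be made concrete with a short sketch. The following is a minimal illustration (not VFDT itself) of how such a bound yields a sample-size-dependent split test: split on the best attribute only once its observed advantage over the runner-up exceeds the bound. The function names are illustrative.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Distribution-free Hoeffding bound: with probability at least
    1 - delta, the true mean of a random variable with range
    `value_range` lies within epsilon of its average over n
    independent observations."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def can_split(gain_best, gain_second, value_range, delta, n):
    """VFDT-style split decision: the observed gain advantage of the
    best attribute must exceed epsilon before committing to a split."""
    return (gain_best - gain_second) > hoeffding_bound(value_range, delta, n)
```

Because the bound shrinks as n grows, a small gain advantage that is inconclusive after a hundred examples can justify a split after a few thousand, which is how the learner's output approaches that of a conventional batch learner.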
11.7.3
The Concept Drift Challenge
The second problem, centered around concept drift, is also addressed by
incremental learning methods that have been adapted to work effectively
with continuous, time-changing data streams. Black and Hickey (1999)
identified several important sub-tasks involved in handling drift within
incremental learning methods. The two most fundamental sub-tasks are
identifying that drift is occurring and updating classification rules in the
light of this drift.
Time-windowing is one of the best-known and most widely used approaches
for dealing with these tasks. The basic concept of this approach is the
repeated application of a learning algorithm to a sliding window, which
contains a certain amount (either constant or changing) of examples. As
new examples arrive, they are placed into the beginning of the window. A
corresponding number of examples are removed from the end of the window.
The latest model is the one used for future prediction of incoming instances
until concept drift is detected. At this point the learner is reapplied on the
last window of instances and a new model is built.
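The sliding-window scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions: `fit_model` stands in for any batch learner (e.g. a decision-tree inducer), and drift detection is assumed to be signalled externally; the class name and interface are hypothetical.

```python
from collections import deque

class SlidingWindowLearner:
    """Keep the latest `window_size` examples; when drift is signalled,
    rebuild the model from the current window only."""

    def __init__(self, window_size, fit_model):
        # deque with maxlen drops the oldest example automatically
        # when a new one arrives, mirroring the sliding window.
        self.window = deque(maxlen=window_size)
        self.fit_model = fit_model
        self.model = None

    def add_example(self, example, drift_detected=False):
        self.window.append(example)
        if self.model is None or drift_detected:
            # Reapply the learner on the last window of instances.
            self.model = self.fit_model(list(self.window))
        return self.model
```

Between drift signals the latest model keeps serving predictions unchanged; only a detected drift triggers relearning, which keeps per-example cost low.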
FLORA, the time-windowing approach developed by Widmer and
Kubat (1996), comprises a family of incremental algorithms for learning
in the presence of drift. This method uses a currently trusted window of
examples as well as stored, old concept hypothesis description sets which
are reactivated if they seem to be valid again. The first realization of
this framework is FLORA2, which
maintains a dynamically adjustable window of the latest training examples.
The method of adjusting the size of the window is known as window
adjustment heuristic (WAH). Whenever a concept drift is suspected as a