In order to overcome this trade-off, Domingos and Hulten (2000)
proposed VFDT, a decision-tree learning method aimed at learning
online from high-volume data streams by sub-sampling the
data stream generated by a stationary process. This method uses constant
time per example and constant memory and can incorporate tens of
thousands of examples per second using off-the-shelf hardware. This method
learns from each example as it is seen, with no need to store it.
As a result, it is possible to mine online data sources directly.
The sample size is determined in VFDT from distribution-free Hoeffding
bounds to guarantee that its output is asymptotically nearly identical to
that of a conventional learner.
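The Hoeffding bound mentioned above can be made concrete with a short sketch. The following is a minimal illustration (not VFDT itself) of how such a bound yields a sample-size-dependent split test: split on the best attribute only once its observed advantage over the runner-up exceeds the bound. The function names are illustrative.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Distribution-free Hoeffding bound: with probability at least
    1 - delta, the true mean of a random variable with range
    `value_range` lies within epsilon of its average over n
    independent observations."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def can_split(gain_best, gain_second, value_range, delta, n):
    """VFDT-style split decision: the observed gain advantage of the
    best attribute must exceed epsilon before committing to a split."""
    return (gain_best - gain_second) > hoeffding_bound(value_range, delta, n)
```

Because the bound shrinks as n grows, a small gain advantage that is inconclusive after a hundred examples can justify a split after a few thousand, which is how the learner's output approaches that of a conventional batch learner.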
11.7.3
The Concept Drift Challenge
The second problem, centered around concept drift, is also addressed by
incremental learning methods that have been adapted to work effectively
with continuous, time-changing data streams. Black and Hickey (1999)
identified several important sub-tasks involved in handling drift within
incremental learning methods. The two most fundamental sub-tasks are
identifying that drift is occurring and updating classification rules in the
light of this drift.
Time-windowing is one of the best-known and most widely used approaches
for dealing with these tasks. The basic concept of this approach is the
repeated application of a learning algorithm to a sliding window, which
contains a certain amount (either constant or changing) of examples. As
new examples arrive, they are placed into the beginning of the window. A
corresponding number of examples are removed from the end of the window.
The latest model is the one used for future prediction of incoming instances
until concept drift is detected. At this point the learner is reapplied on the
last window of instances and a new model is built.
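The sliding-window scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions: `fit_model` stands in for any batch learner (e.g. a decision-tree inducer), and drift detection is assumed to be signalled externally; the class name and interface are hypothetical.

```python
from collections import deque

class SlidingWindowLearner:
    """Keep the latest `window_size` examples; when drift is signalled,
    rebuild the model from the current window only."""

    def __init__(self, window_size, fit_model):
        # deque with maxlen drops the oldest example automatically
        # when a new one arrives, mirroring the sliding window.
        self.window = deque(maxlen=window_size)
        self.fit_model = fit_model
        self.model = None

    def add_example(self, example, drift_detected=False):
        self.window.append(example)
        if self.model is None or drift_detected:
            # Reapply the learner on the last window of instances.
            self.model = self.fit_model(list(self.window))
        return self.model
```

Between drift signals the latest model keeps serving predictions unchanged; only a detected drift triggers relearning, which keeps per-example cost low.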
FLORA, the time-windowing approach developed by Widmer and
Kubat (1996), comprises a family of incremental algorithms for learning
in the presence of drift. This method uses a currently trusted window of
examples as well as stored, old concept hypothesis description sets which
are reactivated if they seem to be valid again. The first realization of
this framework is FLORA2, which
maintains a dynamically adjustable window of the latest training examples.
The method of adjusting the size of the window is known as window
adjustment heuristic (WAH). Whenever a concept drift is suspected as a