result of a drop in predictive accuracy, the size of the window is decreased
by disposing of the oldest examples. The window size is left unchanged if
the concept appears to be stable. If concept drift remains uncertain, none of
the examples are forgotten, and thus the window size is gradually increased
until a stable concept description can be formed. As long as the rate of
concept drift remains relatively low, this window-adjustment strategy can
detect radical changes in the underlying concept efficiently.
FLORA3 stores concepts for later use and reassesses their utility when
a context change is perceived. FLORA4 is designed to be exceptionally
robust with respect to noise in the training data, since in incremental
learning it is very difficult to distinguish between slight irregularities due to noise
and actual concept drift. Both of these algorithms serve as an extension of
FLORA2.
According to Hulten et al. (2001), as long as the window size is small
relative to the rate of concept drift, the time-windowing procedure assures
availability of a model reflecting the current concept generating the data.
However, if the window is too small, there may be insufficient examples
to satisfactorily learn the concept. Furthermore, the computational cost of
reapplying a learner may be prohibitively high, especially if examples arrive
at a rapid rate and the concept changes quickly.
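The cost concern can be made concrete with a toy sketch of naive time-windowing, in which the model is relearned from scratch on every new example. The "model" here is merely the majority class over the window, a hypothetical stand-in for a full learner; the point is that rebuilding touches every windowed example each step.

```python
from collections import deque, Counter

def naive_window_learner(stream, window_size):
    """Naive time-windowing: relearn the model from scratch on every
    new example. Relearning scans the whole window each time, so the
    total work grows as O(stream_length * window_size)."""
    window = deque(maxlen=window_size)
    models = []
    for x, y in stream:
        window.append((x, y))
        # Full rebuild on each arrival -- the cost CVFDT avoids.
        counts = Counter(label for _, label in window)
        models.append(counts.most_common(1)[0][0])
    return models
```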
To meet these challenges, Domingos and Hulten (2001) proposed an
efficient algorithm for mining decision trees from continuously changing
data streams. Called CVFDT (concept-adapting very fast decision trees
learner), the algorithm is based on the ultra-fast VFDT decision tree
learner. CVFDT, a VFDT extension, maintains VFDT's speed and
accuracy advantages but adds the ability to detect and respond to changes
in the example-generating process.
Like other systems with this capability, CVFDT works by keeping its
model consistent with a sliding window of examples. However, it does not
need to learn a new model from scratch every time a new example arrives;
instead, it updates the sufficient statistics at its nodes by incrementing
the counts corresponding to the new example and by decrementing the
counts corresponding to the oldest example in the window (which now
must be forgotten). This will statistically have no effect if the underlying
concept is stationary. If the concept is changing, however, some splits
that previously passed the Hoeffding test will no longer do so because an
alternative attribute now has higher gain. In this case, CVFDT begins to
grow an alternative sub-tree with the new best attribute at its root. When
this alternate sub-tree becomes more accurate on new data than the old
one, CVFDT replaces the old sub-tree with the alternate.
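The count bookkeeping at a node, together with the Hoeffding bound that governs split decisions, can be sketched as follows. This is a minimal illustration of the sufficient-statistics update only; split-gain computation and alternate sub-tree growth are omitted, and the class and function names are inventions for the sketch.

```python
import math
from collections import defaultdict

class NodeStats:
    """Sufficient statistics at a node: counts of
    (attribute, value, class) triples over the current window."""
    def __init__(self):
        self.counts = defaultdict(int)
        self.n = 0

    def add(self, example, label):
        # Increment counts for the newly arrived example.
        for attr, value in example.items():
            self.counts[(attr, value, label)] += 1
        self.n += 1

    def forget(self, example, label):
        # Decrement counts for the oldest example sliding out of the
        # window; if the concept is stationary this has no statistical
        # effect on the split decisions.
        for attr, value in example.items():
            self.counts[(attr, value, label)] -= 1
        self.n -= 1

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta, the observed mean
    of n samples lies within epsilon of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))
```

A split that passed the test when `gain_best - gain_second > hoeffding_bound(...)` may fail it after enough increments and decrements shift the counts, which is the trigger for growing an alternative sub-tree.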