ated. The VFDT method has also been extended to the case of evolving
data streams. This framework is referred to as CVFDT [48], and it
runs VFDT over fixed sliding windows in order to always have the most
up-to-date classifier. Jin and Agrawal [50] have extended the VFDT
algorithm in order to process numerical attributes and reduce the sample
size, which is calculated using the Hoeffding bound. Since this approach
reduces the sample size, it improves efficiency and lowers space requirements
for a given level of accuracy.
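To make the role of the Hoeffding bound concrete, the following minimal sketch computes the standard bound and the sample size it implies. It shows only the textbook form of the bound, not the tighter sample-size calculation of [50]; the function and parameter names are illustrative assumptions.

```python
import math

def hoeffding_epsilon(value_range, delta, n):
    """Standard Hoeffding bound: with probability at least 1 - delta, the
    true mean of a variable with the given range lies within epsilon of
    the mean observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def samples_needed(value_range, delta, epsilon):
    """Smallest n for which the Hoeffding bound is at most epsilon."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))

# Illustrative (assumed) settings: range 1.0, delta = 1e-7, tolerance 0.1.
n = samples_needed(1.0, 1e-7, 0.1)
print(n, hoeffding_epsilon(1.0, 1e-7, n))
```

Because the required sample size grows with the inverse square of epsilon, any method that can tolerate a slightly larger epsilon, or derive a tighter bound, needs noticeably fewer records before it can commit to a split.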
On Demand Classification: While most stream classification meth-
ods are focussed on a training stream, the on demand method is focussed
on the case when both the training and the testing streams evolve over
time. In the on demand classification method [13], we create class-
specific micro-clusters from the underlying data. For an incoming record
in the test stream, the class label of the closest micro-cluster is used in
order to determine the class label of the test instance. In order to
handle the problem of stream evolution, the micro-clusters from a specific
time horizon are used for the classification process. A key issue in this
method is the choice of horizon which should be used in order to obtain
the best classification accuracy. In order to determine the best horizon,
a portion of the training stream is separated out and the accuracy is
tested over this portion with different horizons. The optimal horizon is
then used in order to classify the test instance.
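The horizon selection described above can be sketched as follows. This is a simplified illustration and not the implementation of [13]: it assumes each micro-cluster is a dictionary with hypothetical `centroid`, `label`, and `time` fields, and it selects the micro-clusters of a horizon by timestamp.

```python
import math

def clusters_in_horizon(clusters, now, horizon):
    """Keep only the micro-clusters whose timestamp falls within the horizon."""
    return [c for c in clusters if now - c["time"] <= horizon]

def nearest_label(point, clusters):
    """Return the class label of the micro-cluster with the closest centroid."""
    closest = min(clusters, key=lambda c: math.dist(point, c["centroid"]))
    return closest["label"]

def best_horizon(clusters, holdout, now, horizons):
    """Pick the horizon with the highest accuracy on a held-out portion
    of the training stream, given as (point, label) pairs."""
    def accuracy(horizon):
        recent = clusters_in_horizon(clusters, now, horizon)
        if not recent:
            return 0.0
        hits = sum(nearest_label(x, recent) == y for x, y in holdout)
        return hits / len(holdout)
    return max(horizons, key=accuracy)

def classify(point, clusters, holdout, now, horizons):
    """Classify a test instance using the micro-clusters of the best horizon."""
    horizon = best_horizon(clusters, holdout, now, horizons)
    return nearest_label(point, clusters_in_horizon(clusters, now, horizon))
```

When the stream has drifted, a short horizon excludes stale micro-clusters and tends to score higher on the held-out portion; when the stream is stable, a longer horizon uses more data and tends to win instead.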
Ensemble-based Classification: This technique [74] uses an ensem-
ble of classification methods such as C4.5, RIPPER and naive Bayes in
order to increase the accuracy of the predicted output. The broad idea
is that a data stream may evolve over time, and a different classifier may
work best for a given time period. Therefore, the use of an ensemble
method provides robustness in the concept-drifting case.
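One simple way to realize this idea is to weight each ensemble member by its accuracy on the most recent labeled chunk of the stream and then take a weighted vote. The sketch below assumes this weighting scheme for illustration; it is not necessarily the exact scheme of [74], and the classifiers are plain callables standing in for trained models such as C4.5, RIPPER, or naive Bayes.

```python
from collections import defaultdict

def recent_accuracy(classifier, chunk):
    """Fraction of the latest labeled chunk, given as (record, label)
    pairs, that the classifier predicts correctly."""
    hits = sum(classifier(x) == y for x, y in chunk)
    return hits / len(chunk)

def weighted_vote(classifiers, weights, x):
    """Predict the label with the largest total weight across the ensemble."""
    scores = defaultdict(float)
    for classifier, weight in zip(classifiers, weights):
        scores[classifier(x)] += weight
    return max(scores, key=scores.get)

def ensemble_predict(classifiers, chunk, x):
    """Re-weight the ensemble on the latest chunk, then classify x."""
    weights = [recent_accuracy(c, chunk) for c in classifiers]
    return weighted_vote(classifiers, weights, x)
```

Recomputing the weights on every new chunk lets the classifier that currently fits the stream dominate the vote, while members that fit an outdated concept fade without being discarded.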
Compression-based Methods: An interesting method for real-time
classification of streaming sensor data with the use of compression tech-
niques has been proposed in [57]. In this approach, time-series bitmaps,
which can be updated in constant time, are used as efficient classifiers.
Because they can be updated in constant time, these classifiers
are very efficient in practice. The effectiveness of this approach has been
illustrated on a number of insect-tracking data sets.
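The constant-time update can be illustrated with a small sketch. It assumes a bitmap is a table of frequencies of adjacent symbol pairs over a sliding window, where each value is discretized against fixed breakpoints; the class name, the pair length of two, and the squared-difference distance are assumptions made for illustration and may differ from the method of [57].

```python
from collections import Counter, deque

class SlidingBitmap:
    """Frequencies of adjacent symbol pairs over the last `window` values."""

    def __init__(self, breakpoints, window):
        self.breakpoints = breakpoints  # fixed discretization thresholds
        self.window = window
        self.symbols = deque()
        self.counts = Counter()

    def _symbol(self, value):
        # Symbol index = number of breakpoints the value exceeds.
        return sum(value > b for b in self.breakpoints)

    def update(self, value):
        """Add one value; the work per update does not depend on stream length."""
        symbol = self._symbol(value)
        if self.symbols:
            self.counts[(self.symbols[-1], symbol)] += 1
        self.symbols.append(symbol)
        if len(self.symbols) > self.window:
            oldest = self.symbols.popleft()
            self.counts[(oldest, self.symbols[0])] -= 1

    def frequencies(self):
        total = max(len(self.symbols) - 1, 1)
        return {pair: n / total for pair, n in self.counts.items() if n > 0}

def bitmap_distance(f, g):
    """Sum of squared differences between two frequency tables."""
    return sum((f.get(k, 0.0) - g.get(k, 0.0)) ** 2 for k in set(f) | set(g))

def classify(bitmap, references):
    """Return the label of the reference bitmap closest to the current one."""
    current = bitmap.frequencies()
    return min(references, key=lambda label: bitmap_distance(current, references[label]))
```

Here `references` is a hypothetical dictionary mapping each class label to a frequency table built from labeled data. Each update touches at most two counters, which is why the per-record cost stays constant as the stream grows.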
In the context of sensor networks, data streams may often have a
significant level of errors and uncertainty. Data uncertainty brings a
number of unique challenges with it in terms of the determination of
the important features to be used for the classification process. In this
context, a number of algorithms have been proposed for classification of
uncertain data streams [14, 15]. In particular, the method discussed in