[Figure: performance curves plotted against the data chunk index (10 to 100) for hypothesis-set capacities |H| = 20, 40, and 60, among others; the vertical axis spans roughly 0.86 to 1.]
Figure 7.13 Performance comparison of hypotheses removal.
approximately half the total number of data chunks received over time, which
is impractical because the number of data chunks is usually unknown in real-world
applications. Another method is to assign each hypothesis a factor that decays
from the moment the hypothesis is created. When the factor falls below a threshold η,
the corresponding hypothesis is removed from H; this behaves much like
a queue with dynamic capacity. The challenge raised by this method is how to
determine η, which can hardly be set by cross-validation in an online
learning scenario.
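To make the decay-based removal concrete, the following is a minimal Python sketch; the class, the multiplicative decay schedule, and all parameter names are illustrative assumptions rather than REA's actual bookkeeping.

```python
class DecayingHypothesisPool:
    """Hypothesis set H in which each member carries a factor that
    decays once per data chunk; a hypothesis is removed when its
    factor falls below the threshold eta. Illustrative only: the
    multiplicative decay schedule is an assumption."""

    def __init__(self, eta=0.1, decay=0.9):
        self.eta = eta      # removal threshold (hard to tune online)
        self.decay = decay  # per-chunk decay rate (assumed form)
        self.pool = []      # entries of the form [hypothesis, factor]

    def add(self, hypothesis):
        # A newly created hypothesis starts with factor 1.0.
        self.pool.append([hypothesis, 1.0])

    def on_new_chunk(self):
        # Decay every factor, then drop expired hypotheses.
        for entry in self.pool:
            entry[1] *= self.decay
        self.pool = [e for e in self.pool if e[1] >= self.eta]

    def hypotheses(self):
        return [h for h, _ in self.pool]
```

Because every hypothesis decays at the same rate, the pool acts as a first-in, first-out queue whose effective capacity is set implicitly by η and the decay rate; this is precisely why η is awkward to choose, since cross-validating it would require replaying the stream.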
7.4.4 Complexity
Time and space complexity are of critical importance when designing real-time
online learning algorithms. An algorithm is expected to learn from data
streams as quickly as possible so that it can keep pace with a stream
that may arrive at very high speed. It is also desirable that the algorithm not
occupy significantly large memory, out of concern for scalability. From
a high-level view, the time and space complexity of REA should be
related to (i) the difference between the post-balanced ratio f and the imbalanced ratio
γ, that is, f − γ; (ii) the parameter k of the k-nearest neighbor rule; and (iii) the capacity
of the hypothesis set H, that is, |H|.
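As a rough illustration of these dependencies, here is a back-of-the-envelope cost model in Python; the functional form, the deliberately naive k-NN cost, and all names are assumptions for exposition, not measurements of REA:

```python
def per_chunk_cost(n_chunk, n_archive, f, gamma, k, h_size):
    """Hypothetical per-chunk cost estimate for an REA-style learner.
    n_chunk:   examples in the incoming chunk
    n_archive: accumulated minority examples available for reuse
    f, gamma:  post-balanced and imbalanced ratios (f >= gamma)
    k:         neighborhood size for k-nearest neighbor selection
    h_size:    current capacity of the hypothesis set, |H|
    """
    # (i) f - gamma decides how many archived minority examples
    #     must be selected to rebalance the incoming chunk.
    n_to_add = max(0, int((f - gamma) * n_chunk))
    # (ii) a naive neighbor search (k passes of minimum extraction
    #      over the archive, per query point) scales with k.
    knn_cost = n_chunk * k * n_archive
    # (iii) evaluating and weighting every stored hypothesis on the
    #       chunk scales with |H|.
    vote_cost = h_size * n_chunk
    return n_to_add, knn_cost + vote_cost
```

Under such a model, memory grows with the accumulated minority archive and with |H|, which is one reason bounding |H|, as compared in Figure 7.13, matters over long streams.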
To quantify the time and space complexity of the algorithms introduced in Section
7.3, their time and space consumption for learning from SEA (40 chunks), ELEC
(40 chunks), and SHP (100 chunks) are recorded in Tables 7.3-7.5, respec-
tively. The hardware configuration is an Intel Core i5 processor with 4 GB of random
access memory (RAM).