1.4.4 Resource Constraints
A key research challenge [3, 15] in distributed stream mining systems arises from the need to cope effectively with system overload, due to limited system resources (e.g. CPU, memory, I/O bandwidth, etc.), while providing the desired application performance. Specifically, each classifier incurs a large computational cost (proportional to the data rate), which limits the rate at which the application can handle input data. This issue is all the more topical in a technological environment where low-power devices such as smartphones are increasingly widespread.
Although some systems like TelegraphCQ [6] provide dynamic adaptation to available resources, they do not factor in application knowledge to achieve the best resource-to-accuracy tradeoffs. System S [2] makes an attempt in this direction by additionally providing hooks for applications to determine current resource usage, so that they can adapt suitably to the available resources. As the number of stream processing applications grows rapidly, this issue must be addressed systematically. Additionally, in this chapter we do not consider multiple queries from different users, which makes the presented approach unsuitable for the complex, large-scale, and diverse semantic knowledge extraction tasks required by various users and applications.
2 Proposed Systematic Framework for Stream Mining Systems
2.1 Query Process Modeled as Classifier Chain
Stream data analysis applications pose queries on data that require multiple concepts
to be identified. More specifically, a query q is answered as a conjunction of a set of N classifiers C(q) = {C_1, ..., C_N}, each associated with a concept to be identified (e.g. Fig. 4 shows a stream mining system where the concepts to be identified are sports categories).
In this chapter, we focus on binary classifiers: each binary classifier C_i labels input data into two classes, H_i (considered without loss of generality as the class of interest) and its complement H̄_i. The objective is to extract the data belonging to the intersection ∩_{i=1}^{N} H_i.
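As a purely illustrative sketch of this conjunctive semantics, the snippet below models each binary classifier as a boolean predicate and keeps only the items labelled positive by every classifier, i.e. the items in the intersection of all H_i. The classifier functions and item fields are hypothetical, not part of the framework described here.

```python
# Minimal sketch of answering a query as a conjunction of binary classifiers.
# Each classifier is modelled as a predicate returning True if the item
# belongs to its class of interest H_i. All names below are illustrative.
from typing import Callable, Iterable, List

Classifier = Callable[[dict], bool]

def answer_query(classifiers: List[Classifier], items: Iterable[dict]) -> List[dict]:
    """Keep the items labelled positive by every classifier, i.e. the intersection of all H_i."""
    return [x for x in items if all(c(x) for c in classifiers)]

# Hypothetical query: items about team sports played outdoors.
is_sports   = lambda x: x.get("category") == "sports"
is_team     = lambda x: x.get("team_based", False)
is_outdoors = lambda x: x.get("outdoors", False)

stream = [
    {"category": "sports", "team_based": True,  "outdoors": True},   # satisfies the query
    {"category": "news",   "team_based": False, "outdoors": False},  # filtered out
]
print(answer_query([is_sports, is_team, is_outdoors], stream))
```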
Partitioning the problem into this ensemble of classifiers and filtering data successively (i.e. discarding data that is not labelled as belonging to the class of interest) makes it possible to control the amount of resources consumed by each classifier in the ensemble. Indeed, only data labelled as belonging to H_i is forwarded, while data labelled as belonging to H̄_i is dropped. Hence, a classifier only has to process a subset of the data processed by the previous classifier. This justifies using a chain topology of classifiers, where the output of one classifier C_{i-1} feeds the input of classifier C_i, and so on, as shown in Fig. 8.
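The resource argument can be made concrete with a small sketch of the chain topology, again using hypothetical classifier functions: each stage forwards only the items it labels as H_i, so every downstream classifier processes a (typically much smaller) subset of the stream.

```python
# Sketch of a classifier chain: classifier C_i receives only the data that
# C_{i-1} labelled as belonging to its class of interest. The per-stage item
# counts illustrate the resource savings of successive filtering.
from typing import Callable, Iterable, List

Classifier = Callable[[dict], bool]  # True means the item is labelled H_i

def run_chain(chain: List[Classifier], stream: Iterable[dict]) -> List[dict]:
    data = list(stream)
    for i, classify in enumerate(chain, start=1):
        print(f"C_{i} processes {len(data)} items")
        data = [x for x in data if classify(x)]  # forward H_i, drop the rest
    return data  # items labelled positive by every classifier in the chain

# Toy stream and classifiers (hypothetical): the load shrinks at each stage.
stream = [{"sports": True, "team": True},
          {"sports": True, "team": False},
          {"sports": False, "team": True}]
chain = [lambda x: x["sports"], lambda x: x["team"]]
print(run_chain(chain, stream))  # C_1 sees 3 items, C_2 only 2
```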
 