1.4.4 Resource Constraints
A key research challenge [3, 15] in distributed stream mining systems arises from the need to cope effectively with system overload, due to limited system resources (e.g. CPU, memory, I/O bandwidth, etc.), while providing the desired application performance. Specifically, each classifier incurs a large computational cost (proportional to the data rate), which limits the rate at which the application can handle input data. This issue is all the more topical in a technological environment where low-power devices such as smartphones are increasingly widespread.
Although some systems like TelegraphCQ [6] provide dynamic adaptation to available resources, they do not factor in application knowledge to achieve the best resource-to-accuracy tradeoffs. System S [2] makes an attempt in this direction by additionally providing hooks for applications to determine current resource usage, so that they can adapt suitably to the available resources. As the number of stream processing applications grows rapidly, this issue must be addressed systematically. Additionally, in this chapter we do not consider multiple queries from different users, which makes the presented approach unsuitable for the complex, large-scale, and diverse semantic knowledge extraction tasks required by various users and applications.
2 Proposed Systematic Framework for Stream Mining Systems
2.1 Query Process Modeled as Classifier Chain
Stream data analysis applications pose queries on data that require multiple concepts
to be identified. More specifically, a query q is answered as a conjunction of a set of N classifiers C(q) = {C_1, ..., C_N}, each associated with a concept to be identified (e.g. Fig. 4 shows a stream mining system where the concepts to be identified are sports categories).
In this chapter, we focus on binary classifiers: each binary classifier C_i labels input data into two classes, H_i (considered without loss of generality as the class of interest) and its complement H̄_i. The objective is to extract the data belonging to the intersection ∩_{i=1}^{N} H_i.
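As a purely illustrative sketch of this conjunctive semantics, the snippet below models each binary classifier as a boolean predicate and keeps only the items labelled positive by every classifier, i.e. the items in the intersection of all H_i. The classifier functions and item fields are hypothetical, not part of the framework described here.

```python
# Minimal sketch of answering a query as a conjunction of binary classifiers.
# Each classifier is modelled as a predicate returning True if the item
# belongs to its class of interest H_i. All names below are illustrative.
from typing import Callable, Iterable, List

Classifier = Callable[[dict], bool]

def answer_query(classifiers: List[Classifier], items: Iterable[dict]) -> List[dict]:
    """Keep the items labelled positive by every classifier, i.e. the intersection of all H_i."""
    return [x for x in items if all(c(x) for c in classifiers)]

# Hypothetical query: items about team sports played outdoors.
is_sports   = lambda x: x.get("category") == "sports"
is_team     = lambda x: x.get("team_based", False)
is_outdoors = lambda x: x.get("outdoors", False)

stream = [
    {"category": "sports", "team_based": True,  "outdoors": True},   # satisfies the query
    {"category": "news",   "team_based": False, "outdoors": False},  # filtered out
]
print(answer_query([is_sports, is_team, is_outdoors], stream))
```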
Partitioning the problem into this ensemble of classifiers and filtering data successively (i.e. discarding data that is not labelled as belonging to the class of interest) makes it possible to control the amount of resources consumed by each classifier in the ensemble. Indeed, only data labelled as belonging to H_i is forwarded, while data labelled as belonging to H̄_i is dropped. Hence, a classifier only has to process a subset of the data processed by the previous classifier. This justifies using a chain topology of classifiers, where the output of one classifier C_{i-1} feeds the input of classifier C_i, and so on, as shown in Fig. 8.
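The resource argument can be made concrete with a small sketch of the chain topology, again using hypothetical classifier functions: each stage forwards only the items it labels as H_i, so every downstream classifier processes a (typically much smaller) subset of the stream.

```python
# Sketch of a classifier chain: classifier C_i receives only the data that
# C_{i-1} labelled as belonging to its class of interest. The per-stage item
# counts illustrate the resource savings of successive filtering.
from typing import Callable, Iterable, List

Classifier = Callable[[dict], bool]  # True means the item is labelled H_i

def run_chain(chain: List[Classifier], stream: Iterable[dict]) -> List[dict]:
    data = list(stream)
    for i, classify in enumerate(chain, start=1):
        print(f"C_{i} processes {len(data)} items")
        data = [x for x in data if classify(x)]  # forward H_i, drop the rest
    return data  # items labelled positive by every classifier in the chain

# Toy stream and classifiers (hypothetical): the load shrinks at each stage.
stream = [{"sports": True, "team": True},
          {"sports": True, "team": False},
          {"sports": False, "team": True}]
chain = [lambda x: x["sports"], lambda x: x["team"]]
print(run_chain(chain, stream))  # C_1 sees 3 items, C_2 only 2
```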
 