features, e.g. sports categories. In this example, the “Team Sports” classifier is used
to filter the incoming data into two sets, thereby shedding a significant volume of
data before passing it to the downstream classifiers (negatively identified team sports
data is forwarded to the “Winter” classifier, while the remaining data is not further
analyzed). Deploying a network of classifiers in this manner enables successive
identification of multiple features in the data and provides significant advantages in
terms of deployment costs. Indeed, decomposing complex jobs into a network of
operators enhances scalability and reliability, and allows cost-performance tradeoffs
to be made. As a consequence, fewer computing resources are required, because
data is dynamically filtered as it flows through the classifier network. For instance, it has
been shown that using classifiers operating in series with the same model (boosting [29])
or classifiers operating in parallel with multiple models (bagging [16]) can result in
improved classification performance.
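To make this filtering cascade concrete, the following minimal Python sketch chains two binary classifiers in the way described above; the predicates, tags, and routing rule are illustrative assumptions, not part of any actual system.

```python
# Sketch of a two-stage binary classifier cascade (tags and predicates are
# illustrative assumptions).

def team_sports_classifier(item):
    """Hypothetical binary classifier: is this item about a team sport?"""
    return "team" in item.get("tags", [])

def winter_classifier(item):
    """Hypothetical binary classifier: is this item about a winter sport?"""
    return "winter" in item.get("tags", [])

def cascade(stream):
    """Route items through the cascade, shedding data as early as possible."""
    for item in stream:
        if team_sports_classifier(item):
            yield ("team_sport", item)    # positively identified, no further analysis
        elif winter_classifier(item):
            yield ("winter_sport", item)  # forwarded to the downstream classifier
        # all other items are shed, saving downstream resources

incoming = [
    {"id": 1, "tags": ["team", "soccer"]},
    {"id": 2, "tags": ["winter", "ski"]},
    {"id": 3, "tags": ["chess"]},
]
print(list(cascade(incoming)))  # items 1 and 2 are labeled, item 3 is shed
```

Because the first classifier discards a large share of the stream, the downstream classifier only sees the data that still needs analysis, which is the source of the cost savings discussed above.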
In this chapter, we will focus on mining applications that are built using a
topology of low-complexity binary classifiers, each mapped to a specific concept
of interest. A binary classifier performs feature extraction and classification leading
to a yes/no answer. However, this does not limit the generality of our solutions,
as any M-ary classifier can be decomposed into a chain of binary classifiers.
Importantly, our focus will not be on the design of the operators or classifiers, for which
many solutions already exist; instead, we will focus on configuring¹ the networks of
distributed processing nodes, while trading off the processing accuracy against the
available processing resources or the incurred processing delays. See Fig. 4b.
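To illustrate how an M-ary decision can be decomposed into binary ones, here is a minimal sketch (class labels and predicates are illustrative assumptions): each stage in the chain answers a single yes/no question, and the first positive answer determines the class.

```python
# Sketch: an M-ary classification implemented as a chain of binary classifiers.
# Each stage performs one yes/no test; later stages are only reached when
# earlier ones answer "no". Labels and predicates are illustrative assumptions.

from typing import Callable, Iterable, Tuple

BinaryClassifier = Tuple[str, Callable[[dict], bool]]

def classify_chain(item: dict, chain: Iterable[BinaryClassifier]) -> str:
    for label, is_match in chain:
        if is_match(item):   # feature extraction + binary decision
            return label
    return "other"           # none of the M-1 concepts matched

chain = [
    ("team_sport",   lambda x: "team"   in x["tags"]),
    ("winter_sport", lambda x: "winter" in x["tags"]),
    ("water_sport",  lambda x: "water"  in x["tags"]),
]

print(classify_chain({"tags": ["winter", "ski"]}, chain))  # -> "winter_sport"
```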
1.2.2 Changing Paradigm
Historically, mining applications were mostly used to find facts in data at rest.
They relied on static databases and data warehouses, which were submitted to
queries in order to extract valuable information from the raw data.
Recently, there has been a paradigm change in knowledge extraction: data is no
longer considered static but rather an inflowing stream, on which queries and
analyses are computed dynamically, in real time. For example, in Healthcare Monitoring,
data (i.e., biometric measurements) is automatically analyzed through a batch of
queries, such as “Verify that the calcium concentration is in the correct interval”,
“Verify that the blood pressure is not too high”, etc. Rather than applying a single
query to the data, the continuous stream of medical data is by default pushed through
a predefined set of queries. This makes it possible to detect any abnormal situation
and react accordingly. See Fig. 5.
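A minimal sketch of this push-based model is given below; the measurement fields and threshold values are illustrative assumptions. Each “query” is simply a predicate that every incoming measurement is pushed through, rather than an ad hoc query issued against a stored table.

```python
# Sketch: a continuous stream of biometric measurements pushed through a
# predefined set of queries (field names and thresholds are illustrative
# assumptions).

QUERIES = {
    "calcium_in_range":  lambda m: 2.1 <= m["calcium_mmol_l"] <= 2.6,
    "blood_pressure_ok": lambda m: m["systolic_mmhg"] < 140,
}

def monitor(measurement_stream):
    """Apply every query to every measurement and emit an alert on violation."""
    for m in measurement_stream:
        for name, query in QUERIES.items():
            if not query(m):
                yield {"alert": name, "measurement": m}

stream = [
    {"calcium_mmol_l": 2.3, "systolic_mmhg": 125},  # passes all queries
    {"calcium_mmol_l": 2.9, "systolic_mmhg": 150},  # violates both queries
]
for alert in monitor(stream):
    print(alert)
```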
Interestingly, stream mining could lead to performing automatic actions in
response to a specific measurement. For example, a higher dose of painkillers could
be administered when the calcium concentration becomes too high, thus enabling
real-time control. See Fig. 6.
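Extending the same idea, a sketch of such an automatic, closed-loop response is shown below; the threshold, the dosing rule, and the actuator function are purely illustrative assumptions (not medical guidance).

```python
# Sketch: triggering an automatic action when a measurement crosses a
# threshold (threshold and action are illustrative assumptions).

CALCIUM_HIGH = 2.6  # hypothetical upper bound, mmol/L

def administer_painkiller(measurement):
    # Placeholder for the actuator that would adjust the dose in real time.
    print(f"adjusting dose, calcium={measurement['calcium_mmol_l']}")

def on_measurement(measurement):
    if measurement["calcium_mmol_l"] > CALCIUM_HIGH:
        administer_painkiller(measurement)  # automatic, real-time control

on_measurement({"calcium_mmol_l": 2.9})     # crosses the threshold, triggers the action
```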
¹ As we will discuss later, there are two types of configuration choices we must make: the
topological ordering of the classifiers and the local operating points at each classifier.
 