$$\underset{x \in [0,1]^N}{\text{maximize}} \quad U(x) = g_N(x) - K\, t_N(x) - \sum_{i=1}^{N} \rho_i\, t_{i-1}(x) \tag{4}$$

$$\text{subject to} \quad 0 \le x \le 1 \quad \text{and} \quad M r \le r^{\max}$$
2.3 Operating Point Selection
corresponding literature. The solutions proposed involve iterative optimization techniques such as Sequential Quadratic Programming (SQP), which is based on gradient descent and models the nonlinear optimization problem as an approximate quadratic programming subproblem at each iteration, ultimately converging to a locally optimal solution.
A significant body of work discusses how to adapt stream mining under resource constraints. These approaches are based on load-shedding algorithms, which determine when, where, what, and how much data to discard given the observed data characteristics and the desired quality of service. However, the assumption behind naive load-shedding does not hold for more complex data classification tasks, where results depend on the dynamic statistical properties of the data, the semantic relationships between the concepts of interest, the underlying features extracted, the selected classification algorithms, etc.
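As a minimal illustration of the load-shedding idea described above (not the scheme any particular cited work uses), the sketch below drops a random fraction of incoming items whenever the observed arrival rate exceeds a processing capacity; `CAPACITY`, `shed_fraction`, and `filter_stream` are hypothetical names introduced here.

```python
# Minimal load-shedding sketch: discard a fraction of incoming items
# when the observed arrival rate exceeds the processing capacity.
import random

random.seed(1)
CAPACITY = 100.0  # items/sec the system can process (assumed value)

def shed_fraction(arrival_rate):
    """Fraction of data to discard so the retained load fits capacity."""
    return max(0.0, 1.0 - CAPACITY / arrival_rate)

def filter_stream(items, arrival_rate):
    """Randomly drop each item with the computed shedding probability."""
    p_drop = shed_fraction(arrival_rate)
    return [x for x in items if random.random() >= p_drop]

kept = filter_stream(range(1000), arrival_rate=250.0)
print(len(kept), "of 1000 items kept; drop probability =",
      round(shed_fraction(250.0), 2))
```

Note that this decides only *how much* data to process; it is blind to the semantic and statistical factors listed above, which is exactly the limitation the text points out.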
Instead of deciding on what fraction of the data to process, as in load-shedding based approaches, we propose to determine how the available data should be processed given the underlying resource allocation. Hence, we allow individual classifiers in the ensemble to select different operating points in order to maximize the global end-to-end performance while meeting system resource constraints. This simultaneous configuration of multiple classifiers makes our problem significantly more complex than traditional NOPs, all the more due to the non-concavity of the utility function.
In our setting, selecting the operating point for a fixed order
σ
f
can be done by
applying the SQP algorithm to the Lagrangian function of the optimization problem:
$$L(x, \nu_1, \nu_2) = U(x) - \nu_1^{T}(x - 1) + \nu_2^{T} x.$$
Because of the gradient-descent nature of the SQP algorithm, convergence to the global maximum cannot be guaranteed; the solution may only be locally optimal. However, the SQP algorithm can be initialized with multiple starting configurations in order to find a better local optimum (or even the global optimum). Since the number and size of the local optima depend on the shapes of the DET curves of the individual classifiers, a rigorous bound on the probability to find