where:
$\hat{e}_{M_{K-1},K-1}$ is the validation error rate of model $M_{K-1}$ measured on the $D_{K-1}$ set of records (the standard validation error of the model).
$\hat{e}_{M_{K-1},K}$ is the validation error rate of the aggregated model $M_{K-1}$ on the set of records $d_K$.
To detect a significant difference between the two error rates, the
following statistic is tested (two-sided hypothesis):
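$$|\hat{d}| = \left|\hat{e}_{M_{K-1},K} - \hat{e}_{M_{K-1},K-1}\right|, \qquad \hat{\sigma}_d = \sqrt{\frac{\hat{e}_{M_{K-1},K}\,(1 - \hat{e}_{M_{K-1},K})}{n_K} + \frac{\hat{e}_{M_{K-1},K-1}\,(1 - \hat{e}_{M_{K-1},K-1})}{n_{K-1(val)}}}$$

If $|\hat{d}| \ge z_{(1-\alpha/2)} \cdot \hat{\sigma}_d$, then reject $H_0$ and conclude that a change has occurred in period $K$.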
Here $n_{K-1(val)} = |D_{K-1(val)}|$ is the number of records selected for
validation from periods $1, \ldots, K-1$, and $n_K = |d_K|$ is the number of
records in period $K$.
The foundations for the above way of hypothesis testing when comparing
error rates of classification algorithms can be found in [29].
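
The test above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `error_rate_change`, its parameter names, and the default `alpha=0.05` are assumptions introduced here, and the handling of a zero standard error is an added guard that the text does not discuss.

```python
from math import sqrt
from statistics import NormalDist


def error_rate_change(e_prev, n_prev_val, e_curr, n_curr, alpha=0.05):
    """Two-sided test for a difference between two validation error rates.

    e_prev     -- error rate of M_{K-1} on its own validation records
    n_prev_val -- number of those validation records, n_{K-1(val)}
    e_curr     -- error rate of M_{K-1} on the records of period K
    n_curr     -- number of records in period K, n_K
    alpha      -- significance level (hypothetical default, not from the text)

    Returns (d, sigma_d, change_detected).
    """
    # Absolute difference between the two error rates.
    d = abs(e_curr - e_prev)
    # Standard error of the difference of two independent proportions.
    sigma_d = sqrt(e_curr * (1 - e_curr) / n_curr
                   + e_prev * (1 - e_prev) / n_prev_val)
    if sigma_d == 0:
        # Assumption: with no sampling variance, any nonzero gap is a change.
        return d, sigma_d, d > 0
    # Upper (1 - alpha/2) quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return d, sigma_d, d >= z * sigma_d


if __name__ == "__main__":
    # Made-up numbers: 10% error on 1000 validation records, 20% on 500 new ones.
    print(error_rate_change(0.10, 1000, 0.20, 500))
```

With these made-up inputs the gap of 0.10 is well above the rejection threshold, so the sketch reports a change; a gap of 0.01 under the same sample sizes would not be flagged.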
Detecting changes in variable distributions. The second statistical
test is Pearson's chi-square statistic for comparing multinomial variables
(see [28]). This test examines whether a sample of a variable's values is
drawn from the probability distribution taken to be the true distribution
of that variable. In the change detection procedure, the test is used to
check whether the distribution of an input or a target variable has changed
significantly in a statistical sense. Again, since massive data streams are
usually involved in building an incremental model, it is safe to assume that
the stationary distribution of any variable in a given incremental model can
be accurately estimated from the previous $K-1$ periods.
The following null hypothesis is tested for every variable of interest $X$:
$H_0$: the distribution of variable $X$ is stationary (time-invariant).
$H_1$: otherwise.
The decision is based on the following formula:

$$X_p^2 = n_K \cdot \sum_{i=1}^{j} \frac{\left(x_{iK}/n_K - x_{iK-1}/n_{K-1}\right)^2}{x_{iK-1}/n_{K-1}} \qquad (2)$$
where:
$n_K$ is the number of records in the $K$th period.
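
As a rough illustration of formula (2), the statistic can be computed from two vectors of category counts. The sketch below rests on assumptions not stated in this passage: that $x_{iK}$ is the number of records in period $K$ taking the $i$th value of the variable, that $x_{iK-1}$ is the corresponding count over the previous $K-1$ periods, and that $n_K$ and $n_{K-1}$ are the respective totals. The function name `pearson_change_statistic` is hypothetical, and comparing the result against a chi-square critical value is left out.

```python
def pearson_change_statistic(counts_prev, counts_curr):
    """Pearson's chi-square statistic in the form of formula (2).

    counts_prev -- assumed counts x_{iK-1} per category over the previous periods
    counts_curr -- assumed counts x_{iK} per category in period K
    """
    if len(counts_prev) != len(counts_curr):
        raise ValueError("both count vectors must cover the same categories")
    n_prev = sum(counts_prev)
    n_curr = sum(counts_curr)
    if n_prev == 0 or n_curr == 0:
        raise ValueError("each period must contain at least one record")

    total = 0.0
    for x_prev, x_curr in zip(counts_prev, counts_curr):
        p_prev = x_prev / n_prev  # proportion estimated from previous periods
        p_curr = x_curr / n_curr  # proportion observed in period K
        if p_prev == 0:
            # The formula divides by the previous proportion of each category.
            raise ValueError("every category needs a nonzero previous count")
        total += (p_curr - p_prev) ** 2 / p_prev
    return n_curr * total


if __name__ == "__main__":
    # Made-up counts for a three-valued variable.
    print(pearson_change_statistic([500, 300, 200], [20, 30, 50]))
```

When the proportions in period $K$ match those of the previous periods the statistic is zero, and it grows as the two distributions diverge.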