error rate of a given incremental model is accurately estimated by the previous $K-1$
periods. Therefore, a change in the rules ($R$) is detected during period $K$ if the
validation error rate of the model $M_{K-1}$ (the model based on $D_{K-1}$) on the database
$D_{K-1}$ is significantly different from the validation error rate of the model $M_{K-1}$
over $d_K$.
The parameter of interest for the statistical hypothesis test is thus the
true validation error rate, and the null and alternative hypotheses are as follows:
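$$
\begin{aligned}
H_0 &: \hat{e}_{M_{K-1},K} = e_{\mathrm{Val}} = \hat{e}_{M_{K-1},K-1}, \\
H_1 &: \hat{e}_{M_{K-1},K} \neq e_{\mathrm{Val}} \neq \hat{e}_{M_{K-1},K-1}
\end{aligned}
\tag{4.2}
$$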
where $\hat{e}_{M_{K-1},K-1}$ is the validation error rate of the set of records $D_{K-1}$ on model $M_{K-1}$
(the standard validation error of the model); $\hat{e}_{M_{K-1},K}$ is the validation error rate of
the set of records $d_K$ on the aggregated model $M_{K-1}$; and $e_{\mathrm{Val}}$ is the true
validation error rate of the incremental model, based on $K-1$ periods.
To detect a significant difference between the two error rates (see also Mitchell
[29]), we test the hypotheses in Eq. (4.2). The test compares two independent
proportions, using the normal approximation to their sampling distribution.
The decision is based on the following equations (two-sided test):
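$$
\begin{aligned}
\hat{\sigma}_d^2 &= \frac{\hat{e}_{M_{K-1},K}\,\bigl(1-\hat{e}_{M_{K-1},K}\bigr)}{n_K}
 + \frac{\hat{e}_{M_{K-1},K-1}\,\bigl(1-\hat{e}_{M_{K-1},K-1}\bigr)}{n_{K-1(val)}}, \\
\hat{d} &= \hat{e}_{M_{K-1},K} - \hat{e}_{M_{K-1},K-1}
\end{aligned}
\tag{4.3}
$$

If $|\hat{d}| \ge z_{1-\alpha/2}\sqrt{\hat{\sigma}_d^2}$, then do not accept $H_0$.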
where $\hat{e}_{M_{K-1},K-1}$ is the validation error rate of the set of records $D_{K-1}$ on model $M_{K-1}$
(the standard observed validation error of the model); $n_{K-1(val)} = |D_{K-1(val)}|$ is
the number of records selected for validation from periods $1, \ldots, K-1$;
$\hat{e}_{M_{K-1},K}$ is the observed validation error rate of the set of records $d_K$ on the aggregated
model $M_{K-1}$; $n_K = |d_K|$ is the number of records in period $K$; and $z_{1-\alpha/2}$ is the
$(1-\alpha/2)$ quantile of the standard normal distribution for significance level $\alpha$.
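The test in Eq. (4.3) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `error_rate_change_test`, its parameter names, and the default significance level of 0.05 are assumptions introduced here, and the error rates and record counts in the usage lines are made-up values.

```python
import math
from statistics import NormalDist


def error_rate_change_test(e_prev, n_prev_val, e_curr, n_curr, alpha=0.05):
    """Two-sided test for a change between two validation error rates.

    e_prev:     observed validation error rate of M_{K-1} on D_{K-1}
    n_prev_val: number of validation records taken from periods 1..K-1
    e_curr:     observed error rate of M_{K-1} on the period-K records d_K
    n_curr:     number of records in period K
    alpha:      significance level (assumed default, not given in the text)

    Returns (d, threshold, change_detected).
    """
    # Estimated variance of the difference between two independent proportions.
    var_d = (e_curr * (1 - e_curr) / n_curr
             + e_prev * (1 - e_prev) / n_prev_val)
    # Absolute difference between the two observed error rates.
    d = abs(e_curr - e_prev)
    # (1 - alpha/2) quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    threshold = z * math.sqrt(var_d)
    # H0 is not accepted when the difference reaches the threshold.
    return d, threshold, d >= threshold


# Illustrative values only: 10% error on 1000 validation records,
# 15% error on 500 records of the new period.
d, threshold, changed = error_rate_change_test(0.10, 1000, 0.15, 500)
print(f"d={d:.4f}, threshold={threshold:.4f}, change detected: {changed}")
```

With these illustrative inputs the difference of 0.05 exceeds the threshold of roughly 0.036, so the sketch reports a change. Raising `n_curr` or `n_prev_val` shrinks the variance estimate and therefore the threshold, which is why the same difference can be significant on large periods and insignificant on small ones.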
4.2.3.2 Statistical Hypothesis for the Distribution Change Detection
The objective of the second estimator is to test whether the population of a
variable (target or candidate) has changed significantly in a
statistical sense. For this purpose, we use Pearson's estimator for testing matching