$n_{K-1}$ is the number of records in periods $1, \ldots, K-1$.
$x_{iK}$ is the number of records in the $i$th class of variable $X$ in the $K$th period.
$x_{iK-1}$ is the number of records in the $i$th class of variable $X$ in periods $1, \ldots, K-1$.
If $X_p > \chi^2_{1-\alpha}(j-1)$, where $j$ is the number of classes of the tested variable, then the null hypothesis that the distribution of variable $X$ in period $K$ has remained stationary, as in the previous periods, is rejected.
The explanation of Pearson's statistical hypothesis testing is provided in [28].
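The rejection rule can be illustrated with a minimal Python sketch. It assumes that $X_p$ takes the usual Pearson goodness-of-fit form, with the class proportions from periods $1, \ldots, K-1$ used as the expected proportions for period $K$; the function names `chi2_sf` and `pearson_change` are hypothetical and chosen only for this illustration. Instead of looking up the critical value $\chi^2_{1-\alpha}(j-1)$, the sketch compares 1 - p-value with $1-\alpha$, which is an equivalent decision.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with integer df >= x), stdlib only."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^i / i!.
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: erfc term plus a finite sum over half-integer powers.
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def pearson_change(prev_counts, curr_counts, alpha=0.05):
    """Test whether period K follows the class distribution of periods 1..K-1.

    prev_counts[i] plays the role of x_{iK-1}, curr_counts[i] of x_{iK}.
    Returns (statistic, 1 - p-value, reject_null).
    """
    if len(prev_counts) != len(curr_counts):
        raise ValueError("both periods must use the same classes")
    if len(prev_counts) < 2:
        raise ValueError("at least two classes are needed (j - 1 >= 1)")
    if min(prev_counts) <= 0:
        raise ValueError("every class needs a positive count in periods 1..K-1")
    j = len(prev_counts)            # number of classes of the tested variable
    n_prev = sum(prev_counts)       # n_{K-1}
    n_curr = sum(curr_counts)       # n_K
    statistic = 0.0
    for x_prev, x_curr in zip(prev_counts, curr_counts):
        expected = n_curr * x_prev / n_prev
        statistic += (x_curr - expected) ** 2 / expected
    one_minus_p = 1.0 - chi2_sf(statistic, j - 1)
    return statistic, one_minus_p, one_minus_p > 1.0 - alpha


if __name__ == "__main__":
    # Illustrative counts only: three classes, proportions shifted in period K.
    print(pearson_change([500, 300, 200], [20, 30, 50], alpha=0.05))
```

The closed-form tail probability is used only to keep the sketch free of third-party dependencies; a statistics library would normally supply it.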
2.4. Methodology
This section describes the algorithmic usage of the previous estimators:
Inputs:
• $G$ is the DM algorithm used for constructing the classification model (e.g., C4.5 or IFN).
• $M$ is the classification model constructed by the DM algorithm (e.g., a decision tree).
• $V$ is the validation method in use (e.g., 5-fold cross-validation).
• $K$ is the cumulative number of periods in a data stream.
• $\alpha$ is the desired significance level for the change detection procedure (the probability of a false alarm when no actual change is present).
Outputs:
• $CD(\alpha)$ is the error-based change detection estimator (1 - p-value).
• $XP(\alpha)$ is Pearson's chi-square estimator of distribution change (1 - p-value).
2.5. Change Detection Procedure
Stage 1:
For periods $1, \ldots, K-1$, build the model $M_{K-1}$ using the DM algorithm $G$.
Define the data set $D_{K-1(val)}$.
Count the number of records $n_{K-1} = |D_{K-1(val)}|$.
Calculate the validation error rate $\hat{e}_{M_{K-1},K-1}$ according to the validation method $V$.
Calculate $x_{iK-1}$, $n_{K-1}$ for every input and target variable existing in periods $1, \ldots, K-1$.
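Stage 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the records of periods $1, \ldots, K-1$ are taken as the validation set $D_{K-1(val)}$, the validation method $V$ is assumed to be k-fold cross-validation, and a trivial majority-class learner stands in for the DM algorithm $G$ (the source names C4.5 and IFN as examples, which are not implemented here). All function names are hypothetical.

```python
from collections import Counter


def majority_class_learner(train_rows, target):
    """Hypothetical stand-in for G: always predicts the most common target value."""
    most_common = Counter(row[target] for row in train_rows).most_common(1)[0][0]
    return lambda row: most_common


def cv_error_rate(rows, target, learner, folds=5):
    """Validation error rate under k-fold cross-validation (assumed choice of V)."""
    errors = 0
    for f in range(folds):
        test = rows[f::folds]  # every folds-th record, starting at index f
        train = [r for i, r in enumerate(rows) if i % folds != f]
        model = learner(train, target)
        errors += sum(model(r) != r[target] for r in test)
    return errors / len(rows)


def stage_one(rows, target, learner, folds=5):
    """Return the model M_{K-1}, n_{K-1}, the validation error, and class counts.

    rows holds the records of periods 1..K-1, each as a dict of
    variable name -> class label (input variables and the target alike).
    """
    n_prev = len(rows)                                   # n_{K-1}
    error = cv_error_rate(rows, target, learner, folds)  # validation error rate
    # x_{iK-1}: records per class i, for every input and target variable.
    class_counts = {var: Counter(r[var] for r in rows) for var in rows[0]}
    model = learner(rows, target)                        # M_{K-1}
    return model, n_prev, error, class_counts


if __name__ == "__main__":
    # Illustrative records only: one input variable "x" and a target "y".
    data = [{"x": "u" if i % 2 == 0 else "v",
             "y": "b" if i in (0, 5) else "a"} for i in range(10)]
    model, n_prev, error, counts = stage_one(data, "y", majority_class_learner)
    print(n_prev, error, dict(counts["y"]))
```

The per-variable class counts returned here are the quantities a Pearson test for period $K$ would later compare against.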