a within a window of size l. Then the J-measure is defined as
$f_J(a, b) = p(a)\, CE_l(a\,F\,b)$, where $CE_l(a\,F\,b)$ denotes the cross-entropy of a and b (b follows a within a window of size l) and is defined as
$$CE_l(a\,F\,b) = p_l(a\,F\,b) \log \frac{p_l(a\,F\,b)}{p(b)} + \bigl(1 - p_l(a\,F\,b)\bigr) \log \frac{1 - p_l(a\,F\,b)}{1 - p(b)}$$
The J-measure of b follows a for trace acaebfh using a window of size l = 4
is $f_J(a, b) = 0.147$.
Though local features are defined at a trace level, it is easy to lift them to the
level of an entire event log.
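The worked value for trace acaebfh can be reproduced with a short sketch. The text does not spell out how the probabilities are estimated, so the following are assumptions chosen because they yield the quoted 0.147: p(x) is the relative frequency of activity x in the trace, p_l(a F b) is the fraction of occurrences of a that have a b inside the l consecutive positions starting at that a, and the logarithm is base 2. The function name `j_measure` is illustrative only.

```python
import math


def j_measure(trace, a, b, l):
    """J-measure f_J(a, b) = p(a) * CE_l(a F b) for one trace (sketch, assumes a != b)."""
    n = len(trace)
    # Assumption: p(x) is the relative frequency of activity x in the trace.
    p_a = trace.count(a) / n
    p_b = trace.count(b) / n

    a_positions = [i for i, x in enumerate(trace) if x == a]
    if not a_positions:
        return 0.0  # a never occurs, so p(a) = 0

    # Assumption: the window of size l starts at the occurrence of a,
    # so b must appear in one of the next l - 1 positions.
    hits = sum(1 for i in a_positions if b in trace[i + 1:i + l])
    p_f = hits / len(a_positions)

    def term(p, q):
        # Convention: 0 * log(0 / q) = 0. Base-2 logarithm is an assumption.
        return 0.0 if p == 0 else p * math.log2(p / q)

    cross_entropy = term(p_f, p_b) + term(1 - p_f, 1 - p_b)
    return p_a * cross_entropy


print(round(j_measure("acaebfh", "a", "b", 4), 3))  # 0.147
```

Here p(a) = 2/7, p(b) = 1/7 and p_l(a F b) = 1/2, since only the second a has a b inside its window.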
4.3 Statistical Hypothesis Tests to Detect Drifts
One can consider an event log $\mathcal{L}$ as a time series of traces (traces ordered by
their arrival time). Fig. 2 depicts such a perspective on an event log along with
change points. An event log can be split into sub-logs of s traces each. We can
consider either overlapping or non-overlapping windows when creating such sub-
logs. Fig. 2 depicts the scenario where two subsequent sub-logs do not overlap.
In this case, we have $k = \lceil n/s \rceil$ sub-logs for n traces. One can estimate the
feature values for each trace separately (local features) or cumulatively over a
subset of traces (local and global features) and generate a dataset defined by a
matrix/vector of feature values over a sub-log/trace. For example, the relation
count feature type will generate a dataset $\mathcal{D}$ of size $k \times 3|\Sigma|$ when either the
follows or the precedes relation counts of all activities are considered over $\mathcal{L}$. Instead,
if the follows/precedes relation count of an individual activity is considered in
isolation, it generates a dataset of size $k \times 3$ for $\mathcal{L}$. The J-measure generates a
scalar value for each trace (sub-log) when an activity pair is considered, thereby
generating a vector of size $n \times 1$ or $k \times 1$ (depending on whether it is measured
over traces or sub-logs) over $\mathcal{L}$. If all activity pairs are considered, then a dataset
of size $n \times |\Sigma|^2$ or $k \times |\Sigma|^2$ is generated.
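To make the dataset dimensions concrete, the sketch below builds one row per trace and one column per ordered activity pair, which gives the n × |Σ|² layout described above. The helper names `build_dataset` and `follows_indicator` are hypothetical, and the indicator is only a stand-in for a real pairwise feature such as the J-measure; any function taking (trace, a, b) could be plugged in.

```python
from itertools import product


def build_dataset(units, alphabet, feature):
    """Return (pairs, rows): one row per unit (trace or sub-log),
    one column per ordered activity pair (a, b)."""
    pairs = list(product(sorted(alphabet), repeat=2))
    rows = [[feature(u, a, b) for a, b in pairs] for u in units]
    return pairs, rows


def follows_indicator(trace, a, b):
    # Stand-in feature (an assumption, not from the text):
    # 1.0 if some b occurs after the first a in the trace, else 0.0.
    first_a = trace.find(a)
    return 1.0 if first_a != -1 and b in trace[first_a + 1:] else 0.0


log = ["acaebfh", "abch", "acbh"]          # toy event log, n = 3 traces
alphabet = set("".join(log))               # 6 distinct activities
pairs, dataset = build_dataset(log, alphabet, follows_indicator)
print(len(dataset), len(dataset[0]))       # 3 36
```

Passing sub-logs instead of traces as `units` would give the k × |Σ|² variant, provided the feature function is defined over a sub-log.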
[Figure: the traces t_1, t_2, ..., t_n of the event log in arrival order, grouped into consecutive sub-logs L_1, L_2, ..., L_k of s traces each, with change points marked.]
Fig. 2. An event log and change points
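The splitting of an event log into sub-logs can be sketched as follows. This is a minimal illustration under stated assumptions: the log is a plain list of traces already sorted by arrival time, a shorter final sub-log is kept rather than dropped, and `split_log` with its `step` parameter are hypothetical names introduced here.

```python
import math


def split_log(traces, s, step=None):
    """Split traces (in arrival order) into sub-logs of up to s traces.

    step=None means non-overlapping sub-logs (step == s); a step smaller
    than s produces overlapping windows. The last sub-logs may be shorter
    than s when the traces run out.
    """
    step = s if step is None else step
    return [traces[i:i + s] for i in range(0, len(traces), step)]


traces = ["t%d" % i for i in range(1, 11)]     # t1 ... t10, n = 10

non_overlapping = split_log(traces, s=3)
print(len(non_overlapping))                     # 4, i.e. ceil(10 / 3)
print(non_overlapping[-1])                      # ['t10']

overlapping = split_log(traces, s=4, step=2)
print(overlapping[1])                           # ['t3', 't4', 't5', 't6']
```

With non-overlapping windows the number of sub-logs equals ceil(n/s); with overlapping windows it depends on the chosen step.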
We believe that there should be a characteristic difference in the manifestation
of feature values in the traces (sub-logs) before and after the change points,
with the difference being more pronounced at the boundaries. The goal of handling
concept drift in process mining is then to detect the change points and the nature
of changes given an event log. We propose the use of statistical hypothesis testing
to discover these change points. Hypothesis testing is a procedure in which