$a$ within a window of size $l$. Then the $J$-measure is defined as

$$f_J(a, b) = p(a)\, CE_l(a\,F\,b)$$

where $CE_l(a\,F\,b)$ denotes the cross-entropy of $a$ and $b$ ($b$ follows $a$ within a window of size $l$) and is defined as

$$CE_l(a\,F\,b) = p_l(a\,F\,b) \log\frac{p_l(a\,F\,b)}{p(b)} + \bigl(1 - p_l(a\,F\,b)\bigr) \log\frac{1 - p_l(a\,F\,b)}{1 - p(b)}$$

The $J$-measure of $b$ follows $a$ for trace acaebfh using a window of size $l = 4$ is $f_J(a, b) = 0.147$.
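The worked value for trace acaebfh can be checked with a minimal Python sketch. It rests on several assumptions that the text does not state explicitly: probabilities are estimated as relative frequencies within the single trace, the logarithm is base 2, and a window of size $l$ starts at the occurrence of $a$ itself, so $b$ must appear among the next $l - 1$ events. These conventions were chosen because they reproduce the reported 0.147. The function names `j_measure` and `_kl_term` are illustrative only.

```python
from math import log2


def _kl_term(x: float, y: float) -> float:
    """Return x * log2(x / y), using the convention 0 * log(0) = 0."""
    if x == 0:
        return 0.0
    if y == 0:
        return float("inf")  # degenerate case: the divergence is unbounded
    return x * log2(x / y)


def j_measure(trace: str, a: str, b: str, l: int) -> float:
    """Window J-measure of the rule 'b follows a' for one trace.

    Assumptions (not fixed by the text): relative-frequency estimates,
    base-2 logarithm, and a window that begins at the position of a.
    """
    n = len(trace)
    a_positions = [i for i, event in enumerate(trace) if event == a]
    if not a_positions:
        return 0.0  # a never occurs, so p(a) = 0

    p_a = len(a_positions) / n
    p_b = trace.count(b) / n

    # Fraction of occurrences of a that are followed by b inside the window.
    hits = sum(1 for i in a_positions if b in trace[i + 1 : i + l])
    p_ab = hits / len(a_positions)

    cross_entropy = _kl_term(p_ab, p_b) + _kl_term(1 - p_ab, 1 - p_b)
    return p_a * cross_entropy


print(round(j_measure("acaebfh", "a", "b", 4), 3))  # 0.147
```

Here $p(a) = 2/7$, $p(b) = 1/7$, and only the second occurrence of $a$ is followed by $b$ within the window, so $p_l(a\,F\,b) = 1/2$.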
Though local features are defined at a trace level, it is easy to lift them to the
level of an entire event log.
4.3 Statistical Hypothesis Tests to Detect Drifts
One can consider an event log $L$ as a time series of traces (traces ordered on their arrival time). Fig. 2 depicts such a perspective on an event log along with change points. An event log can be split into sub-logs of $s$ traces each. We can consider either overlapping or non-overlapping windows when creating such sub-logs. Fig. 2 depicts the scenario where two subsequent sub-logs do not overlap. In this case, we have $k = \lceil n/s \rceil$ sub-logs for $n$ traces. One can estimate the feature values for each trace separately (local features) or cumulatively over a subset of traces (local and global features) and generate a dataset defined by a matrix/vector of feature values over a sub-log/trace. For example, the relation count feature type will generate a dataset $D$ of size $k \times 3|\Sigma|$ when either the follows or the precedes relation counts of all activities are considered over $L$. Instead, if the follows/precedes relation count of an individual activity is considered in isolation, it generates a dataset of size $k \times 3$. The $J$-measure generates a scalar value for each trace (sub-log) when an activity pair is considered, thereby generating a vector of size $n \times 1$ or $k \times 1$ (depending on whether it is measured over traces or sub-logs) over $L$. If all activity pairs are considered, then a dataset of size $n \times |\Sigma|^2$ or $k \times |\Sigma|^2$ is generated.
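The construction of such a dataset can be sketched as follows for the non-overlapping case. The feature used here, a per-activity event count, is a hypothetical stand-in chosen for brevity and is not one of the feature types discussed in the text; any function that maps a sub-log to a fixed-length vector could be substituted. The names `split_into_sublogs`, `activity_frequencies`, and `build_dataset` are likewise illustrative.

```python
from collections import Counter
from math import ceil
from typing import List, Sequence


def split_into_sublogs(log: Sequence[str], s: int) -> List[Sequence[str]]:
    """Split a time-ordered log into non-overlapping sub-logs of s traces.

    The last sub-log may hold fewer than s traces when s does not divide n.
    """
    if s <= 0:
        raise ValueError("sub-log size s must be positive")
    return [log[i : i + s] for i in range(0, len(log), s)]


def activity_frequencies(sublog: Sequence[str], alphabet: Sequence[str]) -> List[int]:
    """Hypothetical stand-in feature: number of events per activity."""
    counts = Counter(event for trace in sublog for event in trace)
    return [counts[activity] for activity in alphabet]


def build_dataset(log: Sequence[str], s: int, alphabet: Sequence[str]) -> List[List[int]]:
    """Return one feature row per sub-log, i.e. a k x |alphabet| matrix."""
    return [activity_frequencies(sub, alphabet) for sub in split_into_sublogs(log, s)]


# Toy log: each trace is a string with one character per event.
log = ["abc", "abc", "acb", "abd", "abd", "adb", "abd"]
alphabet = sorted({event for trace in log for event in trace})
s = 3

D = build_dataset(log, s, alphabet)
k = ceil(len(log) / s)
print(len(D) == k, len(D), len(D[0]))  # True 3 4
```

With $n = 7$ traces and $s = 3$, the sketch yields $k = 3$ rows, each with one column per activity in the alphabet.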
[Figure: the event log shown as a sequence of traces $t_1, t_2, \ldots, t_n$, split into consecutive sub-logs $L_1, L_2, \ldots, L_k$ of $s$ traces each, with the change points marked.]
Fig. 2. An event log and change points
We believe that there should be a characteristic difference in the manifestation of feature values in the traces (sub-logs) before and after the change points, with the difference being more pronounced at the boundaries. The goal of handling concept drift in process mining is then to detect the change points and the nature of the changes given an event log. We propose the use of statistical hypothesis testing to discover these change points. Hypothesis testing is a procedure in which