Handling Concept Drift in Process Mining - Advanced Information Systems Engineering


a within a window of size $l$. Then the J-measure is defined as $f_J(a,b) = p(a)\,CE_l(aFb)$, where $CE_l(aFb)$ denotes the cross-entropy of $a$ and $b$ ($b$ follows $a$ within a window of size $l$) and is defined as

$$CE_l(aFb) = p_l(aFb)\log\frac{p_l(aFb)}{p(b)} + \bigl(1 - p_l(aFb)\bigr)\log\frac{1 - p_l(aFb)}{1 - p(b)}$$

The J-measure of $b$ following $a$ in the trace $acaebfh$, using a window of size $l = 4$, is $f_J(a,b) = 0.147$.
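The worked value can be reproduced with a short Python sketch. The definition of $p_l(aFb)$ precedes this excerpt, so the sketch assumes that $p(x)$ is the relative frequency of activity $x$ in the trace, that $p_l(aFb)$ is the fraction of occurrences of $a$ whose window of size $l$ (the occurrence of $a$ plus the next $l-1$ events) contains $b$, and that logarithms are base 2. Under these assumptions the example gives $p(a) = 2/7$, $p(b) = 1/7$ and $p_4(aFb) = 1/2$, hence $f_J(a,b) \approx 0.147$. The function name `j_measure` is hypothetical.

```python
import math


def j_measure(trace, a, b, l):
    """Per-trace J-measure f_J(a, b) = p(a) * CE_l(aFb).

    Assumed definitions (hypothetical, chosen to reproduce the worked example):
      p(x)      relative frequency of activity x in the trace
      p_l(aFb)  fraction of occurrences of a whose window of size l
                (a itself plus the next l - 1 events) contains b
      log       base 2
    """
    n = len(trace)
    positions = [i for i, x in enumerate(trace) if x == a]
    if n == 0 or not positions:
        return 0.0  # a never occurs, so p(a) = 0
    p_a = len(positions) / n
    p_b = trace.count(b) / n
    if p_b in (0.0, 1.0):
        # b absent (CE is 0) or every event is b (second term undefined)
        return 0.0

    # Count occurrences of a that have b inside their window.
    hits = sum(1 for i in positions if b in trace[i + 1:i + l])
    p_l = hits / len(positions)

    def term(p, q):
        # p * log2(p / q), with the usual convention 0 * log 0 = 0
        return 0.0 if p == 0 else p * math.log2(p / q)

    cross_entropy = term(p_l, p_b) + term(1 - p_l, 1 - p_b)
    return p_a * cross_entropy


print(round(j_measure("acaebfh", "a", "b", 4), 3))  # → 0.147
```

If the window were instead taken as the $l$ events strictly after $a$, the first occurrence of $a$ would also reach $b$ and the value would no longer be 0.147, which is why the sketch counts $a$ itself as part of the window.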

Although local features are defined at the trace level, they can easily be lifted to the level of an entire event log.

4.3 Statistical Hypothesis Tests to Detect Drifts

One can consider an event log as a time series of traces (traces ordered by their arrival time). Fig. 2 depicts such a perspective on an event log along with change points. An event log can be split into sub-logs of $s$ traces each. We can consider either overlapping or non-overlapping windows when creating such sub-logs. Fig. 2 depicts the scenario where two subsequent sub-logs do not overlap. In this case, we have $k = \lceil n/s \rceil$ sub-logs for $n$ traces. One can estimate the feature values for each trace separately (local features) or cumulatively over a subset of traces (local and global features) and generate a dataset defined by a matrix/vector of feature values over a sub-log/trace. For example, the relation count feature type generates a dataset with $k$ rows (one per sub-log), whose columns cover all activities, when the follows/precedes relation counts of all activities are considered over the sub-logs. If instead the follows/precedes relation count of an individual activity is considered in isolation, it generates a dataset with $k$ rows whose columns cover that activity alone. The J-measure generates a scalar value for each trace (sub-log) when an activity pair is considered, thereby generating a vector of size $n \times 1$ or $k \times 1$ (depending on whether it is measured over traces or sub-logs). If all activity pairs are considered, then a dataset of size $n \times |\mathcal{A}|^2$ or $k \times |\mathcal{A}|^2$ is generated, where $\mathcal{A}$ is the set of activities.
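A minimal Python sketch of how the sub-logs and the J-measure datasets described above might be assembled. It assumes traces are sequences of single-character activity labels ordered by arrival time, and it uses an assumed per-trace J-measure: relative frequencies for $p(\cdot)$, $p_l(aFb)$ as the share of occurrences of $a$ whose window of size $l$ (including $a$) contains $b$, and base-2 logarithms. Lifting the J-measure to a sub-log by averaging its per-trace values is an illustrative assumption, not a rule taken from the text; the helper names (`split_into_sublogs`, `pair_matrix`, `sublog_matrix`) and the `step` parameter for overlapping windows are hypothetical.

```python
import math
from itertools import product


def j_measure(trace, a, b, l):
    """Assumed per-trace J-measure p(a) * CE_l(aFb), log base 2."""
    n = len(trace)
    positions = [i for i, x in enumerate(trace) if x == a]
    p_b = trace.count(b) / n if n else 0.0
    if not positions or p_b in (0.0, 1.0):
        return 0.0  # a absent, b absent, or degenerate all-b trace
    p_l = sum(b in trace[i + 1:i + l] for i in positions) / len(positions)

    def term(p, q):
        return 0.0 if p == 0 else p * math.log2(p / q)  # 0 * log 0 = 0

    return len(positions) / n * (term(p_l, p_b) + term(1 - p_l, 1 - p_b))


def split_into_sublogs(traces, s, step=None):
    """Split time-ordered traces into sub-logs of s traces each.

    step=None: non-overlapping windows, k = ceil(n / s) sub-logs (the last
    one may be shorter). step < s: overlapping sliding windows, full
    windows only. `step` is a hypothetical parameter.
    """
    if step is None:
        return [traces[i:i + s] for i in range(0, len(traces), s)]
    return [traces[i:i + s] for i in range(0, len(traces) - s + 1, step)]


def pair_matrix(traces, activities, l):
    """n x |A|^2 dataset: one row per trace, one column per ordered pair."""
    pairs = list(product(activities, repeat=2))
    return [[j_measure(t, a, b, l) for a, b in pairs] for t in traces]


def sublog_matrix(sublogs, activities, l):
    """k x |A|^2 dataset: each row averages the per-trace rows of one
    sub-log (the averaging rule is an assumption for illustration)."""
    rows = []
    for sub in sublogs:
        per_trace = pair_matrix(sub, activities, l)
        rows.append([sum(col) / len(col) for col in zip(*per_trace)])
    return rows


traces = ["acaebfh", "abcd", "acbd", "adcb", "abdc"]  # toy log, n = 5
activities = sorted(set("".join(traces)))             # |A| = 7
sublogs = split_into_sublogs(traces, s=2)             # k = ceil(5 / 2) = 3
D = sublog_matrix(sublogs, activities, l=4)           # k x |A|^2
v = [j_measure(t, "a", "b", 4) for t in traces]       # n x 1, one pair
print(len(sublogs), len(D), len(D[0]), len(v))        # → 3 3 49 5
```

Averaging is only one way to move from trace-level to sub-log-level values; estimating the probabilities over the pooled events of each sub-log would be an equally plausible alternative.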

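[Figure: traces $t_1, t_2, \ldots, t_n$ in arrival order, grouped into sub-logs $L_1$ ($t_1 \ldots t_s$), $L_2$ ($t_{s+1} \ldots t_{2s}$), ..., $L_k$ (ending at $t_n$), with change points marked]

Fig. 2. An event log and change points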

We believe that there should be a characteristic difference in the manifestation of feature values in the traces (sub-logs) before and after the change points, with the difference being more pronounced at the boundaries. The goal of handling concept drift in process mining is then to detect the change points and the nature of the changes, given an event log. We propose the use of statistical hypothesis testing to discover these change points. Hypothesis testing is a procedure in which
