Handling Concept Drift in Process Mining - Advanced Information Systems Engineering

dimensionality makes the computational complexity intractable for most real

life logs. On the other hand, changes being typically concentrated in a small

region of a process makes it unnecessary to consider all features. There is a

need for dimensionality reduction techniques that can efficiently select the

most appropriate features.

- Holistic Approaches: In this paper, we discussed ideas on change detection

and localization in the context of sudden drifts and with respect to the

control-flow perspective of a process. However, as mentioned in Section 3, data

and resource perspectives are also equally important. So are the contexts of

gradual, recurring and incremental drifts. Features and techniques that can

enable the detection of changes in these other perspectives need to be

discovered. Furthermore, there could be instances where more than one perspective

(e.g., both control-flow and resource) changes simultaneously. Hybrid approaches

considering all aspects of change holistically need to be developed.

- Techniques for Drift Detection: In this paper, we explored just the Hotelling

T² test to deal with multi-variate data. In addition, we have dealt with

multiple features by considering univariate hypothesis tests on each feature

separately and averaging the test results over all features. Further

investigation needs to be done on hypothesis tests devised naturally for multi-variate

data. Also, determining an appropriate size of the window for hypothesis

tests is nontrivial; this mandates further study on understanding the

influence of window size on the results. Alternatives to hypothesis testing that

can uncover drifts and diagnose the changes are a welcome addition to the

repertoire of techniques for handling concept drifts in process mining.

- Sample Complexity: Sample complexity refers to the number of traces (size

of the event log) needed to detect, localize, and characterize changes within

acceptable error bounds. This should be sensitive to the nature of changes,

their influence and manifestation in traces, and the feature space and

algorithms used for detecting drifts. On a broader note, the topic of sample

complexity is relevant to all facets of process mining and has hardly been addressed.

For example, it would be interesting to know the lower bound on the number

of traces required to discover a process model with a desired fitness.
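
The univariate strategy mentioned under "Techniques for Drift Detection" (one hypothesis test per feature on two adjacent windows, with the results averaged over all features) can be illustrated with a minimal sketch. This is not the ProM implementation. The choice of a two-sample Kolmogorov-Smirnov test, the asymptotic p-value approximation, the function names, the window length, and the toy feature values are all assumptions made for illustration only.

```python
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the
    empirical distribution functions of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic p-value from the Kolmogorov distribution, with the usual
    small-sample correction of the effective sample size."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        # The alternating series converges poorly here; the p-value is ~1.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))


def averaged_pvalues(rows, window):
    """For each split point t, compare rows[t-window:t] with rows[t:t+window]
    feature by feature and average the per-feature p-values."""
    n_features = len(rows[0])
    scores = {}
    for t in range(window, len(rows) - window + 1):
        left, right = rows[t - window:t], rows[t:t + window]
        pvals = []
        for f in range(n_features):
            a = [row[f] for row in left]
            b = [row[f] for row in right]
            pvals.append(ks_pvalue(ks_statistic(a, b), len(a), len(b)))
        scores[t] = sum(pvals) / n_features
    return scores


# Hypothetical toy log: one feature vector per trace. Feature 0 shifts
# suddenly at trace 100; feature 1 never changes.
WINDOW = 35
rows = [(i % 5 + (10 if i >= 100 else 0), i % 7) for i in range(200)]

scores = averaged_pvalues(rows, WINDOW)
best_t = min(scores, key=scores.get)  # split point with the lowest score
print(best_t, scores[best_t])
```

The window length of 35 is chosen only so that the periodic toy features are identically distributed in any two drift-free windows. Because the second feature never changes, its p-value stays at 1, so the averaged score cannot drop below 0.5 even when the first feature shifts completely. This is one way to see how averaging per-feature results can dilute a change that is concentrated in a few features.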

7 Conclusions

This paper introduced the topic of concept drift in process mining, i.e., analyzing

process changes based on event logs. We proposed feature sets and techniques

to effectively detect the changes in event logs and identify the regions of change

in a process. The approach has been implemented in ProM and evaluated using

synthetic data. This is a first step toward dealing with changes in

process monitoring and analysis efforts. We considered changes only with

respect to the control-flow perspective manifested as sudden drifts. However,

there is much to be done on various other perspectives mentioned in this paper.

Moreover, to further validate the approach we plan to conduct extensive case

studies based on real-life event logs.

Acknowledgments. R.P.J.C. Bose and W.M.P. van der Aalst are grateful to

Philips Healthcare for funding the research in process mining.
