dimensionality makes the computational complexity intractable for most
real-life logs. On the other hand, because changes are typically concentrated
in a small region of a process, it is unnecessary to consider all features.
There is a need for dimensionality reduction techniques that can efficiently
select the most appropriate features.
- Holistic Approaches: In this paper, we discussed ideas on change detection
and localization in the context of sudden drifts with respect to the
control-flow perspective of a process. However, as mentioned in Section 3, the
data and resource perspectives are equally important, as are gradual,
recurring, and incremental drifts. Features and techniques that enable the
detection of changes in these other perspectives need to be discovered.
Furthermore, there could be instances where more than one perspective
(e.g., both control-flow and resource) changes simultaneously. Hybrid
approaches that consider all aspects of change holistically need to be developed.
- Techniques for Drift Detection: In this paper, we explored only the Hotelling
T² test to deal with multivariate data. In addition, we dealt with multiple
features by applying univariate hypothesis tests to each feature separately
and averaging the test results over all features. Further investigation is
needed into hypothesis tests designed specifically for multivariate data.
Also, determining an appropriate window size for hypothesis tests is
nontrivial; this calls for further study of the influence of window size on
the results. Alternatives to hypothesis testing that can uncover drifts and
diagnose the changes would be a welcome addition to the repertoire of
techniques for handling concept drift in process mining.
- Sample Complexity: Sample complexity refers to the number of traces (size
of the event log) needed to detect, localize, and characterize changes within
acceptable error bounds. This number depends on the nature of the changes,
their influence and manifestation in traces, and the feature space and
algorithms used for detecting drifts. On a broader note, the topic of sample
complexity is relevant to all facets of process mining and has hardly been
addressed. For example, it would be interesting to know the lower bound on the
number of traces required to discover a process model with a desired fitness.
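To make the per-feature testing and window-size discussion above concrete, the
following is a minimal sketch, not the implementation evaluated in this paper.
It assumes each trace has already been turned into a numeric feature vector,
and it uses the two-sample Kolmogorov-Smirnov statistic as the univariate
test, which is a choice made here for illustration only. The function names
(ks_statistic, averaged_drift_scores) and the synthetic data are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in set(a_sorted) | set(b_sorted):
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def averaged_drift_scores(feature_rows, window):
    """Slide two adjacent windows of `window` traces over the feature rows.

    For every split point, run the univariate test on each feature
    separately and average the statistics over all features.
    Returns a list of (split_index, averaged_score) pairs.
    """
    n_features = len(feature_rows[0])
    scores = []
    for split in range(window, len(feature_rows) - window + 1):
        left = feature_rows[split - window:split]
        right = feature_rows[split:split + window]
        per_feature = [
            ks_statistic([row[j] for row in left], [row[j] for row in right])
            for j in range(n_features)
        ]
        scores.append((split, sum(per_feature) / n_features))
    return scores


# Synthetic input (an assumption for illustration): 40 traces with two
# features. Feature 0 shifts abruptly after trace 20; feature 1 never changes.
rows = [(i % 3, 1) for i in range(20)] + [(i % 3 + 10, 1) for i in range(20)]

scores = averaged_drift_scores(rows, window=10)
best_split, best_score = max(scores, key=lambda s: s[1])
print((best_split, best_score))  # (20, 0.5) for this synthetic input
```

The averaged score peaks at the true change point, but it reaches only 0.5
because the unchanged feature dilutes the result. This illustrates why
averaging over all features can mask localized changes, and rerunning the
sketch with a different window value shows how the choice of window size
affects the scores.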
7 Conclusions
This paper introduced the topic of concept drift in process mining, i.e., analyzing
process changes based on event logs. We proposed feature sets and techniques
to effectively detect the changes in event logs and identify the regions of change
in a process. The approach has been implemented in ProM and evaluated using
synthetic data. This is a first step toward dealing with changes in process
monitoring and analysis efforts. We considered changes only with
respect to the control-flow perspective manifested as sudden drifts. However,
there is much to be done on various other perspectives mentioned in this paper.
Moreover, to further validate the approach we plan to conduct extensive case
studies based on real-life event logs.
Acknowledgments. R.P.J.C. Bose and W.M.P. van der Aalst are grateful to
Philips Healthcare for funding the research in process mining.
 