Step 1 : Use existing domain knowledge for an initial classification of process
instances based on contextual properties that are known to affect process
behavior.
For each partition, separately apply the following three steps.
Step 2 : Establish the behavioral similarity of the process instances.
(a) Form path similarity categories by applying a clustering algorithm to
the path data of the instances. The number of path-similarity clusters is
selected according to a goodness-of-fit criterion, such as the Akaike
Information Criterion (AIC). The clustering algorithm can be applied
several times, yielding a series of clustering results with an increasing
number of clusters. Finally, the best cluster set is selected as the one
that attains the first minimum of the ratio of AIC changes.
(b) Categorize termination states into a small number of categories based
on a set of predefined rules. The aim is to achieve a coarse-grained
categorization with a clear distinction between categories.
(c) Combine path similarity categories with termination state categories into
behavioral similarity categories.
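
Steps 2(a) and 2(c) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the AIC values for each candidate number of clusters have already been computed by some clustering procedure, and it adopts one possible reading of "ratio of AIC changes": each change between consecutive cluster counts divided by the first change. The function names, the dictionary inputs, and the fallback when no local minimum exists are all hypothetical.

```python
def select_cluster_count(aic_by_k):
    """Return the cluster count at the first local minimum of the AIC-change ratio.

    aic_by_k maps a number of clusters k to the AIC of that clustering.
    Assumption: ratio(k) = (AIC(k) - AIC(k_prev)) / (first AIC change).
    """
    ks = sorted(aic_by_k)
    if len(ks) < 2:
        raise ValueError("need AIC values for at least two cluster counts")
    # Change in AIC between consecutive candidate cluster counts.
    changes = [aic_by_k[k] - aic_by_k[k_prev] for k_prev, k in zip(ks, ks[1:])]
    base = changes[0]
    if base == 0:
        raise ValueError("first AIC change is zero; ratio is undefined")
    # ratios[i] describes the step that ends at ks[i + 1].
    ratios = [change / base for change in changes]
    for i in range(1, len(ratios) - 1):
        if ratios[i] < ratios[i - 1] and ratios[i] <= ratios[i + 1]:
            return ks[i + 1]
    # Hypothetical fallback: no interior local minimum, take the global one.
    return ks[ratios.index(min(ratios)) + 1]


def combine_categories(path_cluster_of, termination_of):
    """Step 2(c): pair each instance's path cluster with its termination category."""
    return {
        instance_id: (path_cluster_of[instance_id], termination_of[instance_id])
        for instance_id in path_cluster_of
    }


# Illustrative AIC values only; they are not taken from the source.
aic = {1: 1000.0, 2: 700.0, 3: 550.0, 4: 540.0, 5: 520.0, 6: 515.0}
best_k = select_cluster_count(aic)

behavior = combine_categories(
    {"i1": 0, "i2": 0, "i3": 1},
    {"i1": "success", "i2": "failure", "i3": "success"},
)
```

With this pairing, two instances that follow similar paths but end in different termination categories land in different behavioral similarity categories, which is the distinction condition (b) of Step 4 relies on.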
Step 3 : Establish the contextual properties that affect behavior. This is
accomplished by training a decision tree, using the context data as inputs
and the behavioral categories as the dependent variable (the label). The
objective of using the decision tree is to discover the contextual semantics
behind each behavioral category. We use a modified Chi-squared Automatic
Interaction Detection (CHAID) decision tree growing algorithm to construct
the decision tree that represents the context groups and their relationships.
CHAID tries to split the context data of the process instances into nodes that
contain instances whose dependent variable values (namely, their behavioral
similarity categories) are the same. Each path from the root node to a leaf
node in the decision tree represents a different combination of contextual
properties. Each leaf node contains a certain distribution of instances among
behavioral categories, allowing the identification of the most probable
category for that leaf.
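
The core quantity behind a CHAID-style split can be sketched as follows. This is not the modified CHAID algorithm referred to above, only a minimal illustration of the Pearson chi-squared statistic that measures how strongly one categorical contextual property is associated with the behavioral category. A full CHAID implementation compares adjusted p-values across candidate properties and merges attribute values, which this sketch omits. The function names and the sample data are hypothetical.

```python
from collections import Counter


def chi_square_statistic(attribute_values, categories):
    """Pearson chi-squared statistic and degrees of freedom for one candidate split.

    attribute_values[i] is the value of a contextual property for instance i,
    categories[i] is the behavioral similarity category of the same instance.
    """
    n = len(categories)
    row_totals = Counter(attribute_values)
    col_totals = Counter(categories)
    observed = Counter(zip(attribute_values, categories))
    statistic = 0.0
    for value, row_total in row_totals.items():
        for category, col_total in col_totals.items():
            expected = row_total * col_total / n
            statistic += (observed[(value, category)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


def leaf_summary(categories):
    """Distribution of behavioral categories in a leaf and its most probable category."""
    counts = Counter(categories)
    total = sum(counts.values())
    distribution = {category: count / total for category, count in counts.items()}
    most_probable = counts.most_common(1)[0][0]
    return distribution, most_probable


# Hypothetical context data: customer type and region for eight instances.
behavior = ["A", "A", "A", "B", "B", "B", "B", "A"]
customer = ["gold", "gold", "gold", "gold", "basic", "basic", "basic", "basic"]
region = ["n", "s", "n", "s", "n", "s", "n", "s"]

customer_stat, customer_dof = chi_square_statistic(customer, behavior)
region_stat, region_dof = chi_square_statistic(region, behavior)
distribution, most_probable = leaf_summary(behavior[:4])  # the "gold" leaf
```

In this made-up data the customer type is associated with the behavioral category while the region is not, so a chi-squared based tree would prefer to split on customer type first.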
Step 4 : Form the context groups. Based on Postulates 1 and 2, join tree paths
into context groups if the following two conditions are satisfied:
(a) The hypothesis that the process instances in their leaves are of the same
population (considering their behavioral similarity categories) cannot be
rejected.
(b) If their leaves include behavioral categories that stand for similar paths
but different termination states, the hypothesis that termination states for
similar paths in both leaves are of the same population cannot be rejected.
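
The "same population" check in conditions (a) and (b) can be sketched with a chi-squared test of homogeneity. The source does not name the statistical test or the significance level, so both the choice of test and the threshold `alpha = 0.05` are assumptions made for illustration, as are the function names and the sample counts. The p-value uses the closed-form chi-squared survival function for integer degrees of freedom, so only the standard library is needed.

```python
import math


def chi2_sf(x, dof):
    """Survival function P(X > x) of a chi-squared variable with integer dof >= 1."""
    half = x / 2.0
    if dof % 2 == 0:
        terms = sum(half ** i / math.factorial(i) for i in range(dof // 2))
        return math.exp(-half) * terms
    # Odd dof: start from the dof = 1 case and add the recurrence terms.
    result = math.erfc(math.sqrt(half))
    for i in range(1, (dof - 1) // 2 + 1):
        result += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return result


def same_population(counts_a, counts_b, alpha=0.05):
    """Return True if homogeneity of the two leaves cannot be rejected.

    counts_a and counts_b map a category to the number of instances in each leaf.
    Assumption: Pearson chi-squared test at significance level alpha.
    """
    categories = [c for c in set(counts_a) | set(counts_b)
                  if counts_a.get(c, 0) + counts_b.get(c, 0) > 0]
    dof = len(categories) - 1
    if dof < 1:
        return True  # a single shared category: nothing to distinguish
    total_a = sum(counts_a.get(c, 0) for c in categories)
    total_b = sum(counts_b.get(c, 0) for c in categories)
    n = total_a + total_b
    statistic = 0.0
    for c in categories:
        col_total = counts_a.get(c, 0) + counts_b.get(c, 0)
        for observed, row_total in ((counts_a.get(c, 0), total_a),
                                    (counts_b.get(c, 0), total_b)):
            expected = row_total * col_total / n
            statistic += (observed - expected) ** 2 / expected
    return chi2_sf(statistic, dof) > alpha


# Hypothetical leaves: counts of instances per behavioral category.
leaf_1 = {"A": 30, "B": 10}
leaf_2 = {"A": 28, "B": 12}
leaf_3 = {"A": 5, "B": 35}

can_join_1_2 = same_population(leaf_1, leaf_2)
can_join_1_3 = same_population(leaf_1, leaf_3)
```

For condition (b), the same function could be applied to counts of termination state categories restricted to instances that share a path similarity category. With small expected counts a chi-squared approximation is unreliable, so an exact test may be more appropriate in practice.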