Table 1. Properties of business process models used in the validation
         Nodes  Activities  Role  Responsible role  IT system  Data object
Average  15.5   6.3         2.1   0.76              1.5        0.76
Minimum  5      1           0     0                 0          0
Maximum  48     20          5     2                 7          17
activities intersect, i.e., contain the same flow elements. Table 1 outlines the
relevant properties of the process models. In the existing repository, the models are
hierarchically organized using a subprocess relation. Within the model set, we
have identified 8 subprocess hierarchies. Each hierarchy contains a root process
model refined with subprocesses, allowing for several levels of refinement.
To formally validate how well the designed activity aggregation approximates
the behavior of modelers clustering a set of activities into the same subprocess,
we selected the following approach. For each pair of activities that belong to
the same process hierarchy, we evaluated two values in the process model
collection: diff and dist. Here, diff describes the human abstraction style: it
indicates whether the modelers placed the two activities in the same subprocess
or not. The value of dist represents the vector space distance between
the two activities in accordance with our approach. To discover whether the two
approaches yield similar results, we study the correlation between the two
variables. A strong correlation between the two variables implies that dist is a
good distance measure for the clustering algorithm. In this case, the inclusion
of two activities within the same subprocess is mirrored by a close positioning
of the corresponding vectors in the vector space. Given the nature of the
observed variables, we employ Spearman's rank correlation coefficient.
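The pairwise evaluation described above can be illustrated with a minimal Python sketch. It is not the implementation used in the validation. The activity names, subprocess labels, and vectors are invented for illustration. The encoding of diff (0 for the same subprocess, 1 otherwise) and the use of Euclidean distance for dist are assumptions, since the text only speaks of a vector space distance. Spearman's coefficient is computed as the Pearson correlation of average ranks, which handles the many ties in diff.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if a variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def pairwise_diff_dist(activities):
    """Compute diff and dist for every pair of activities in one hierarchy."""
    diffs, dists = [], []
    for (_, (sub_a, vec_a)), (_, (sub_b, vec_b)) in combinations(activities.items(), 2):
        # Assumed encoding: 0 = same subprocess, 1 = different subprocesses.
        diffs.append(0 if sub_a == sub_b else 1)
        # Assumed metric: Euclidean distance between the activity vectors.
        dists.append(sqrt(sum((x - y) ** 2 for x, y in zip(vec_a, vec_b))))
    return diffs, dists


# Hypothetical activities: name -> (subprocess label, activity vector).
activities = {
    "a1": ("S1", (1.0, 0.0)),
    "a2": ("S1", (0.9, 0.1)),
    "a3": ("S2", (0.0, 1.0)),
    "a4": ("S2", (0.1, 0.8)),
}

diffs, dists = pairwise_diff_dist(activities)
rho = spearman(diffs, dists)
print(f"Spearman correlation between diff and dist: {rho:.3f}")
```

With this encoding, a strongly positive coefficient means that activities placed in different subprocesses tend to lie farther apart in the vector space.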
In the following, we first investigate the human abstraction style in the model
collection as a whole. Then, we verify the results by organizing a K-fold cross
validation. We partition the model sample into 4 subsamples, i.e., k = 4, and
perform four tests. In each test, three subsamples are used to discover vector W,
while the fourth subsample is used to evaluate the correlation values between the
diff and dist measures in different vector spaces. This gives a more reliable
insight into whether the human abstraction style can be mimicked than using
the whole process model collection for both the discovery and the evaluation
of this correlation.
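The four-test scheme can be sketched in Python as follows. This is only an illustration of the partitioning logic, with hypothetical model names and a hypothetical helper `k_fold_splits`. How the models were actually assigned to subsamples is not stated in the text, so the seeded random shuffle here is an assumption, and the discovery and evaluation steps themselves are left out.

```python
import random


def k_fold_splits(items, k=4, seed=0):
    """Yield (discovery, evaluation) splits, one per fold.

    Assumption: items are assigned to folds by a seeded random shuffle.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled items round-robin into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for held_out in range(k):
        evaluation = folds[held_out]
        discovery = [
            item
            for index, fold in enumerate(folds)
            if index != held_out
            for item in fold
        ]
        yield discovery, evaluation


# Hypothetical model identifiers standing in for the process model sample.
models = [f"model_{i}" for i in range(12)]

for test_no, (discovery, evaluation) in enumerate(k_fold_splits(models, k=4), start=1):
    # In each test, three subsamples serve discovery and the fourth serves evaluation.
    print(f"test {test_no}: {len(discovery)} models for discovery, "
          f"{len(evaluation)} for evaluation")
```

Each model appears in exactly one evaluation subsample, so no test evaluates the correlation on models that were also used for discovery.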
3.2 Validation Results
Table 2 outlines the validation's results. The columns in the table correspond
to distance measures. While the first six columns correspond to distances in
homogeneous spaces, the last three columns reflect distance measures that take
into account multiple activity properties. All three of these distance measures
make use of the activity property types in columns 1-6. The distance dist_h is
measured in a heterogeneous vector space, where dimensions are activity property
values of the types listed in columns 1-6. The distance measure dist_avg is the
average value of
 