- $\sigma \subseteq M \times M$ is a subprocess relation refining a process model
  with subprocess models, such that $\forall m_i, m_j \in M$, where
  $i, j = 1, 2, \ldots, n$ and $i \neq j$, if $(m_i, m_j) \in \sigma^+$ then
  $(m_j, m_i) \notin \sigma^+$, where $\sigma^+$ is a transitive reflexive closure
  of $\sigma$.
Definition 3 explicitly enumerates the model collection activities and property
value types. The relation $\sigma$ formalizes the subprocess relation that exists
between models. Note that according to the definition, $\sigma$ enables only a
process model hierarchy without loops. Without loss of generality, in the
remainder of this paper we discuss abstraction of process models within a process
model collection. Indeed, a process model $m_i$ can be seen as a trivial process
model collection $c = (\{m_i\}, A_i, P_i, \emptyset)$.
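The loop-free condition on $\sigma$ can be checked mechanically. The following is
a minimal sketch and not part of the original definition. It assumes that models
are identified by strings and that $\sigma$ is given as a set of
(model, subprocess model) pairs; the function names `transitive_closure` and
`is_hierarchy` are hypothetical. Reflexive pairs are left out of the closure
because the condition only concerns distinct models ($i \neq j$).

```python
def transitive_closure(sigma):
    """Return the transitive closure of a relation given as a set of pairs."""
    closure = set(sigma)
    changed = True
    while changed:
        changed = False
        # Join every pair (a, b) with every pair (b, d) until nothing new appears.
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def is_hierarchy(sigma):
    """True if no two distinct models reach each other through sigma (no loops)."""
    closure = transitive_closure(sigma)
    return not any(a != b and (b, a) in closure for (a, b) in closure)
```

For instance, a chain in which m1 refines to m2 and m2 refines to m3 passes the
check, while adding a pair from m3 back to m1 would violate it.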
2.2 Activity Aggregation as Cluster Analysis Problem
In this paper we interpret activity aggregation as a problem of cluster analysis.
Consider process model $m_i = (A_i, G_i, F_i, P_i, props_i)$ from process model
collection $c = (M, A, P, \sigma)$. The set of objects to be clustered is the set
of activities $A_i$. The objects are clustered according to a distance measure:
objects that are "close" to each other according to this measure are put
together. The distance between objects is evaluated through analysis of activity
property values $P$. The outcomes of cluster analysis, activity clusters,
correspond to coarse-grained activities of the abstract process model. While
cluster analysis provides a large variety of algorithms, e.g., see [29], we focus
on one algorithm that suits the business process model abstraction use case in
focus.
In the considered scenario, the user demands control over the number of
activities in the abstract process model. For example, a popular practical
guideline is that five to seven activities are displayed on each level in the
process model [30]. Given a fixed number, e.g., 6, the clustering algorithm has
to ensure that the number of clusters equals the number requested by the user. We
turn to the k-means clustering algorithm, as it is simple to implement and
typically exhibits good performance [16]. K-means clustering partitions an
activity set into $k$ clusters. The algorithm assigns an activity to the cluster
whose centroid is closest to this activity. To evaluate the distance between
activities, we analyze activity property values $P$. We foresee a number of
alternative activity distance measures and elaborate on them in the next section.
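The assignment step described above can be sketched as follows. This is an
illustration under stated assumptions, not the implementation used in this paper:
it assumes that each activity is already represented as a numeric vector, uses
Euclidean distance as a stand-in for the distance measures discussed in the next
section, and seeds the centroids with the first k activities so that the result
is deterministic. The function name `k_means` and the activity identifiers are
hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def k_means(vectors, k, max_iter=100):
    """Cluster activities; vectors maps activity name -> vector.

    Returns a dict mapping each activity name to a cluster index in 0..k-1.
    """
    names = list(vectors)
    # Assumption: deterministic seeding with the first k activity vectors.
    centroids = [vectors[name] for name in names[:k]]
    assignment = {}
    for _ in range(max_iter):
        # Assign every activity to the cluster whose centroid is closest.
        new_assignment = {
            name: min(range(k), key=lambda c: euclidean(vectors[name], centroids[c]))
            for name in names
        }
        if new_assignment == assignment:
            break  # converged: no activity changed its cluster
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [vectors[n] for n in names if assignment[n] == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean(members)
    return assignment
```

Note that this sketch yields at most k clusters; a cluster can end up empty for
unfavorable seeds, so a production implementation would need a reseeding
strategy to guarantee exactly the number of clusters the user requested.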
2.3 Activity Distance Measures
To introduce the distance measure among activities we represent activities as
vectors in a vector space. Such an approach is inspired by the vector space
model, an algebraic model widely used in information retrieval [28]. The space
dimensions correspond to activity property values $P$ and the vector space can be
captured as a vector $(p_1, \ldots, p_{|P|})$, where $p_j \in P$ for
$j = 1, \ldots, |P|$. Consider an example set of property values
$P = \{\text{FA data}, \text{QA data}, \text{Raw data}\}$ and the corresponding
vector space presented in Fig. 2. A vector $v_a$ representing an activity
$a \in A_i$ in process model $m_i = (A_i, G_i, F_i, P_i, props_i)$ is constructed
as follows. If activity $a$