A New Similarity Search Approach on Process Models - Process-Aware Systems


3.1 Only Consider Semantic Correspondences

In this approach, we compare each process model pair in terms of the semantic similarity of their labels. We determine the degree of semantic similarity of each label pair using WordNet [7][8], a lexical database that can be used to measure the relatedness of concepts.
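The label comparison can be illustrated with a minimal sketch. The text does not say which WordNet measure is used or how word scores are combined into a label score, so everything here is an assumption: the Wu-Palmer measure (via NLTK's WordNet interface, imported lazily because it is an optional dependency), the best-match averaging in `label_similarity`, and the toy `exact_match` fallback are all hypothetical choices made for illustration.

```python
from typing import Callable

WordSim = Callable[[str, str], float]


def exact_match(w1: str, w2: str) -> float:
    """Toy stand-in for a lexical measure: 1.0 for identical words, else 0.0."""
    return 1.0 if w1 == w2 else 0.0


def wordnet_wup(w1: str, w2: str) -> float:
    """Best Wu-Palmer score over all synset pairs (assumed measure).

    Requires NLTK and its WordNet corpus to be installed.
    """
    from nltk.corpus import wordnet as wn  # lazy import: optional dependency

    scores = [
        s1.wup_similarity(s2) or 0.0
        for s1 in wn.synsets(w1)
        for s2 in wn.synsets(w2)
    ]
    return max(scores, default=0.0)


def label_similarity(a: str, b: str, word_sim: WordSim = exact_match) -> float:
    """Average, over the words of both labels, of each word's best match
    in the other label (an assumed aggregation, not taken from the text)."""
    wa, wb = a.lower().split(), b.lower().split()
    if not wa or not wb:
        return 0.0
    best_a = [max(word_sim(x, y) for y in wb) for x in wa]
    best_b = [max(word_sim(y, x) for x in wa) for y in wb]
    return (sum(best_a) + sum(best_b)) / (len(wa) + len(wb))
```

Passing `word_sim=wordnet_wup` swaps the toy measure for a WordNet-based one. A label pair would then count as a correspondence when its score exceeds some threshold, which is likewise left open here.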

After comparing each label pair, we obtain a collection of correspondence sets, in which nodes are similar to each other with respect to the semantic similarity of their labels. Note that this approach supports 1-to-N and N-to-M correspondences. For example, for the process model pair (b) and (c) in Figure 3, we identify five correspondence sets (in each set, the former node set is a subset of the nodes in process (b) and the latter node set is a subset of the nodes in process (c)). The node names follow the legend of Figure 3:

Set 1: ({C}, {A})
Set 2: ({D}, {D})
Set 3: ({E, G, H}, {H, K})
Set 4: ({J, K, L}, {L})
Set 5: ({M}, {M})

The patterns in each correspondence set are the candidate answers to our similarity search query. For example, {J, K, L} in process (b) and {L} in process (c) form one such pair of patterns. Moreover, we can expand a retrieved pattern by merging patterns when two correspondence sets are adjacent to each other, such as Set 4 and Set 5. As a result, the final similar model patterns from processes (b) and (c) are {J, K, L, M} and {L, M}.
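The merging of adjacent correspondence sets can be sketched as follows. The five sets are those listed in the text, but the edge lists are hypothetical, since Figure 3 is not reproduced here: only a single L-to-M edge is assumed in each model. The helper names `adjacent` and `merge_adjacent`, and the reading that two sets must be adjacent in both models before they are merged, are also assumptions made for illustration.

```python
def adjacent(s, t, edges):
    """True if some edge links a node in s with a node in t, in either direction."""
    return any((u in s and v in t) or (u in t and v in s) for u, v in edges)


def merge_adjacent(sets, edges_b, edges_c):
    """Merge correspondence sets pairwise until no two are adjacent in both models."""
    merged = list(sets)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                (b1, c1), (b2, c2) = merged[i], merged[j]
                if adjacent(b1, b2, edges_b) and adjacent(c1, c2, edges_c):
                    merged[i] = (b1 | b2, c1 | c2)
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


# Correspondence sets 1 to 5 from the text: (nodes in process (b), nodes in process (c)).
sets = [
    (frozenset("C"), frozenset("A")),
    (frozenset("D"), frozenset("D")),
    (frozenset("EGH"), frozenset("HK")),
    (frozenset("JKL"), frozenset("L")),
    (frozenset("M"), frozenset("M")),
]
# Hypothetical edges: only the link that makes Set 4 and Set 5 adjacent is modelled.
edges_b = {("L", "M")}
edges_c = {("L", "M")}

result = merge_adjacent(sets, edges_b, edges_c)
```

Under these assumed edges, Set 4 and Set 5 collapse into the single pair ({J, K, L, M}, {L, M}), while Sets 1 to 3 are left unchanged.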

3.2 Add Topological Consideration with Adjacency Matrix

Looking at the straightforward approach introduced above (Approach A), we notice that its retrieved result depends strictly on the adjacency of the correspondence sets. This limits its usefulness in real-life applications. For example, given Process (d) as a target model, we should find {E, G, H, I, J, K, L, M} in Process (b) as its similar pattern. However, Approach A does not return this as a valid answer.

In addition, comparing only node labels has a limitation: it ignores the ordering of activities and is therefore more likely to produce a scattered distribution of correspondences. In both cases the retrieval quality is affected.

To address the first problem, we need to take the topology into consideration and differentiate between cases such as Set 1 → Set 2 and Set 2 → Set 1. In this paper, we use an adjacency matrix [9] to represent the structural properties of the process models under study, where a 1 in row i, column j means there is an edge from node i to node j, and a 0 means there is no such edge.
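As a minimal sketch of this representation, the following builds an adjacency matrix from a directed edge list and uses it to check whether one node set has a direct edge into another. The node names, the edge list, and the helper `has_edge_between` are hypothetical and chosen only for illustration. They are not taken from the process models in the figures.

```python
def adjacency_matrix(nodes, edges):
    """Return (matrix, index) where matrix[i][j] == 1 iff there is an edge i -> j."""
    index = {name: k for k, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return matrix, index


def has_edge_between(matrix, index, sources, targets):
    """True if any node in `sources` has a direct edge to any node in `targets`."""
    return any(matrix[index[s]][index[t]] == 1 for s in sources for t in targets)


# Hypothetical fragment of a process model: J -> K -> L -> M.
nodes = ["J", "K", "L", "M"]
edges = [("J", "K"), ("K", "L"), ("L", "M")]
matrix, index = adjacency_matrix(nodes, edges)

forward = has_edge_between(matrix, index, {"J", "K", "L"}, {"M"})   # L -> M exists
backward = has_edge_between(matrix, index, {"M"}, {"J", "K", "L"})  # no edge leaves M
```

Because the matrix is not symmetric, the lookup is direction-sensitive: an edge from one node set into another does not imply an edge in the opposite direction.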

When we consider nodes from different correspondence sets, we look up the adjacency matrices of the process model pair. In this approach, we relax the strict adjacency requirement between the correspondence sets as follows:
