and edges within the generated models. We can see that the transformation procedure tends to produce models which are on average 9-15% larger than what a human would create. This can be partially explained by noise and meta sentences which were not filtered appropriately. On the other hand, humans tend to abstract during the process of modeling, so the generated model often retains details from the text that a human modeler would have abstracted away. The results are highly encouraging, as our approach is able to correctly recreate 77% of the model on average. At the level of individual models, a similarity of up to 96% can be reached, which means that only minor corrections by a human modeler are required.
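To give an intuition of what such a similarity value expresses, the following sketch compares the element labels of a generated model with those of a human-created reference model using a Dice-style overlap. This is an illustration only: the evaluation relies on its own similarity metric, and the model labels used here are invented for the example.

def label_similarity(generated, reference):
    """Dice-style overlap of element labels between two process models."""
    g, r = set(generated), set(reference)
    if not g and not r:
        return 1.0
    return 2 * len(g & r) / (len(g) + len(r))

# Invented example: the generated model contains one extra Activity
# caused by an unfiltered meta sentence.
generated = {"Check application", "Configure system", "Report result", "Record data elements"}
reference = {"Check application", "Configure system", "Report result"}
print(round(label_similarity(generated, reference), 2))  # 0.86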
During the detailed analysis we identified different sources of failure that resulted in decreased metric values: noise, differing levels of abstraction, and processing problems within our system. Noise includes sentences or phrases that are not part of the process description, as for instance "This object consists of data elements such as the customer's name and address and the assigned power gauge." While such information can be important for understanding a process, it leads to unwanted Activities within the generated model. To tackle this problem, further filtering mechanisms are required.
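For illustration, such a filter could be realized as a simple pattern-based classifier that discards sentences matching typical meta-sentence cues before the transformation starts. The patterns below are invented for this sketch and would have to be derived from a corpus analysis; our prototype does not use this exact mechanism.

import re

# Illustrative cues that often indicate descriptive meta text rather than
# process steps (example patterns only, not the filter rules of our system).
META_PATTERNS = [
    r"\bconsists of data elements\b",
    r"\bthis (object|document|section) (describes|contains|consists)\b",
]

def is_meta_sentence(sentence):
    """Return True if the sentence looks like meta text to be filtered out."""
    lowered = sentence.lower()
    return any(re.search(pattern, lowered) for pattern in META_PATTERNS)

sentences = [
    "The clerk checks the application.",
    "This object consists of data elements such as the customer's name and address.",
]
# Only the first sentence survives the filter and is passed on to model generation.
print([s for s in sentences if not is_meta_sentence(s)])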
Low similarity also results from differences in the level of granularity. To address this problem, we could apply automated abstraction techniques such as [34] to the generated model.
Finally, the employed natural language processing components failed at certain points during the analysis. In some cases, the Stanford Parser did not classify verbs correctly. For instance, the parser classified "the second activity checks and configures" as a noun phrase, such that the verbs "check" and "configure" could not be extracted into Actions. Furthermore, important verbs related to business processes, such as "report", are not contained in FrameNet. As a consequence, no message flow is created between report activities and a Black Box Pool. We expect this problem to be solved in the future as the FrameNet database grows.
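Whether a verb is covered at all can be tested up front. The sketch below uses NLTK's FrameNet reader for such a check; it is not the FrameNet component employed in our implementation, and which verbs are covered depends on the installed FrameNet release.

from nltk.corpus import framenet as fn  # requires nltk.download('framenet_v17')

def framenet_covers_verb(lemma):
    """Return True if FrameNet lists a verbal lexical unit for the lemma."""
    # Lexical unit names have the form 'send.v'; fn.lus() filters them by regex.
    return len(fn.lus(r'(?i)^%s\.v$' % lemma)) > 0

for verb in ('send', 'check', 'report'):
    print(verb, framenet_covers_verb(verb))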
With WordNet, for instance, there is a problem with times such as "2:00 pm", where "pm", as an abbreviation for "Prime Minister", is classified as an Actor. To solve this problem, reliable word sense disambiguation has to be conducted. Nevertheless, overall good results were achieved by using WordNet as a general-purpose ontology.
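A lightweight sense check of the kind mentioned above is sketched below using NLTK's WordNet interface and its classic Lesk implementation; this is not the component used in our system, and the example sentence is invented for illustration.

from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')
from nltk.wsd import lesk              # classic Lesk word sense disambiguation

context = "The customer is informed by 2:00 pm at the latest".split()

# Inspect the WordNet senses of 'pm' (e.g. the 'Prime Minister' reading
# next to the time-of-day reading).
for synset in wn.synsets('pm'):
    print(synset.name(), '-', synset.definition())

# Lesk selects the sense whose gloss shares the most words with the context.
# A check along these lines could veto the 'Prime Minister' reading before
# 'pm' is promoted to an Actor in the generated model.
print(lesk(context, 'pm'))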
5 Related Work
Recently, there has been increasing interest in the derivation of conceptual models from text. This research is mainly conducted by six different groups.
Two approaches generate UML models. The Klagenfurt Conceptual Pre-design
Model and a corresponding tool are used to parse German text and fill instances
of a generic meta-model [35]. The stored information can be transformed to UML
activity diagrams and class diagrams [18]. The transformation from text to the
meta-model requires the user to make decisions about the relevant parts of a
sentence. In contrast, the approach described in [36] is fully automated. It uses use-case descriptions in a format called RUCM to generate activity diagrams and class diagrams [17]. Yet, the system is not able to parse free text.
The RUCM input is required to be in a restricted format allowing only 26 types