Supporting Agile Software Development by Natural Language Processing - Trustworthy Eternal Systems via Evolving Software, Data and Knowledge


[Figure: user stories and artifacts are the inputs to a linking step and an information aggregation step]

Fig. 5. Overview of the proposed approach

step, we classify user stories by their status (to be implemented/not yet started, in progress, completed) based on the artifacts found. This gives the product owner a better understanding of the project's current status at the user-story level.
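The status classification could, for example, depend on how many tasks of a story already have linked artifacts. The Python sketch below assumes a hypothetical rule that is not taken from the approach itself: a story with no linked artifacts is not yet started, a story whose every task has at least one linked artifact is completed, and anything in between is in progress. `Status`, `classify_story`, and the task and artifact identifiers are illustrative names only.

```python
from enum import Enum


class Status(Enum):
    NOT_STARTED = "to be implemented / not yet started"
    IN_PROGRESS = "in progress"
    COMPLETED = "completed"


def classify_story(task_ids, links):
    """Classify one user story from the artifacts linked to its tasks.

    task_ids: identifiers of the story's tasks, e.g. ["T1", "T2"] (hypothetical).
    links: dict mapping a task id to the list of artifact ids linked to it.

    Hypothetical rule, not the paper's: no task has an artifact -> NOT_STARTED;
    every task has at least one artifact -> COMPLETED; otherwise IN_PROGRESS.
    """
    covered = [task for task in task_ids if links.get(task)]
    if not covered:
        return Status.NOT_STARTED
    if len(covered) == len(task_ids):
        return Status.COMPLETED
    return Status.IN_PROGRESS


if __name__ == "__main__":
    tasks = ["T1", "T2"]
    print(classify_story(tasks, {}).name)                    # NOT_STARTED
    print(classify_story(tasks, {"T1": ["commit-1"]}).name)  # IN_PROGRESS
    print(classify_story(tasks, {"T1": ["commit-1"], "T2": ["FancyCaseTest"]}).name)  # COMPLETED
```

A rule this crude treats any linked artifact as evidence of progress; a stricter variant might, for instance, count a task as done only once a test artifact is linked to it.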

In the first linking step (cf. Figure 6), the information contained in the development artifacts is analyzed to discover which artifacts belong to the realization of which user story. For instance, a code comment or a commit message can refer to the implementation of the fancy case method of the example user story in Figure 2, which allows it to be linked to the first task of that user story. Additionally, the comments of a JUnit test can reference parts of the user story, so that the test case can be associated with the second task of this user story. Artifacts that are tightly linked to the code, such as code comments or commit messages, can be augmented with information derived from bug reports or the development wiki. Other sources of information that are less structured and more distant from the code (as shown in Figure 6) might also be exploited, such as instant messaging (IM) within the company network or social network posts.
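A naive baseline for this linking step is sketched below in Python under stated assumptions: identifiers such as a hypothetical `fancyCase` are split into words, a few stop words are removed, and an artifact is assigned to the task with which it shares the most words. The task texts in `TASKS`, the stop-word list, and the function names are invented for illustration and are not the wording of the user story in Figure 2.

```python
import re

# Hypothetical tasks of an example user story (not the wording of Figure 2).
TASKS = {
    "T1": "Implement a method that converts a title to fancy case",
    "T2": "Verify that titles with numbers are converted correctly",
}

# Small illustrative stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "to", "of", "for", "with", "that", "is", "are", "and"}


def tokenize(text):
    """Split text into lowercase words, also breaking camelCase, PascalCase and snake_case."""
    words = []
    for part in re.split(r"[_\W]+", text):
        # Acronym before a capitalized word ("HTMLParser" -> "HTML"), capitalized or
        # lowercase word, all-caps run, or digit run.
        words += re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part)
    return [word.lower() for word in words]


def content_words(text):
    """Distinct tokens of the text without stop words."""
    return set(tokenize(text)) - STOPWORDS


def link_to_task(artifact_text, tasks=TASKS):
    """Return the id of the task sharing the most content words with the artifact, or None."""
    artifact_words = content_words(artifact_text)
    best_id, best_overlap = None, 0
    for task_id, task_text in tasks.items():
        overlap = len(artifact_words & content_words(task_text))
        if overlap > best_overlap:  # ties keep the earlier task
            best_id, best_overlap = task_id, overlap
    return best_id


if __name__ == "__main__":
    print(link_to_task("Add fancyCase() implementation"))                            # T1
    print(link_to_task("// Verifies that titles containing numbers are converted"))  # T2
```

Without identifier splitting, `fancyCase` would remain a single token and would not match the words "fancy case" in the task text. Plain word overlap ignores word frequency and inflection ("verifies" does not match "verify"), so it is only a starting point.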

To make the linking step technically more concrete from an NLP perspective, we need to reason about (i) possible instance representations of the artifacts and the user stories, and (ii) possible learning mechanisms able to identify similar objects.
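As a concrete instance of the bag-of-words option described in the following paragraph, the Python sketch below represents each text as unweighted word counts and ranks user stories for an artifact by cosine similarity. The story texts in `STORIES` and all function names are hypothetical; a weighting scheme such as TF-IDF could replace the raw counts, and source code would first need its identifiers split into words.

```python
import math
import re
from collections import Counter

# Hypothetical user stories; not taken from the paper.
STORIES = {
    "US1": "As an editor I want titles converted to fancy case so that headlines look consistent",
    "US2": "As a reader I want to search articles by author",
}


def bag_of_words(text):
    """Unweighted bag-of-words: how often each lowercase word occurs in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[word] for word, count in u.items())  # missing words count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_stories(artifact_text, stories=STORIES):
    """Return (story_id, similarity) pairs, most similar first."""
    artifact = bag_of_words(artifact_text)
    scores = [(sid, cosine_similarity(artifact, bag_of_words(text))) for sid, text in stories.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for story_id, score in rank_stories("convert headline titles to fancy case"):
        print(story_id, round(score, 3))
```

Using a `Counter` as a sparse vector avoids building a fixed vocabulary up front: words absent from one text simply contribute zero to the dot product.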

For the instance representation, a first attempt might consist of applying information retrieval [10] techniques: representing the information contained in the artifact or user story as a simple bag-of-words model in the vector space (i.e., counting how often each word appears in the artifact or user story, possibly weighted). If we also want to link actual source code to user stories, it will also be necessary to identify source code identifiers and split them into actual words [9]. The similarity between these unstructured objects (vectors) can then be calculated from the angle between the feature vectors in the vector space (e.g., their cosine similarity). Alternatively, deep natural language processing might be applied to obtain structured objects. For instance, the example user story could be represented as shown in Figure 7, where natural language parsing and argument classification have been applied. This representation could be further enriched with other NLP tools, such as a semantic role labeler, a named entity recognizer, or distributional semantic techniques. Then, machine learning algorithms able to deal with
