This list is by no means exhaustive, but it shows the possible artifacts that
can be exploited for the proposed approach. As we have seen, the artifacts
contain various levels of textual information (from short descriptions to entire
Wiki pages), thereby presenting an interesting challenge for NLP, which we will
introduce next.
2.2 Natural Language Processing
Natural Language Processing (NLP) [7] is an interdisciplinary field at the intersection of
computer science, artificial intelligence, machine learning and linguistics, concerned
with the study of computational approaches to understanding and/or producing
human (natural) language. Building systems that can do so is a difficult task,
given the intrinsic properties of natural language. One of the major challenges
for NLP is the ambiguity of language, exemplified in the following sentence:
The product owner gave her user stories. Humans usually have no trouble identifying
the intended meaning (that the product owner gave some user stories
to 'her', presumably a software developer), while a computer typically identifies
many possible readings. For example, an alternative reading is that the product
owner gives some kind of stories to 'her user', thus identifying 'her' as a possessive
pronoun and splitting the compound noun 'user story'. Ambiguity pertains
to all levels of linguistic processing: for instance, structural ambiguity (whether
'her' attaches to the verb or the noun) or word-level ambiguity (whether 'her' is a
personal or a possessive pronoun).
While early approaches to NLP were mainly symbolic and rule-based, the
field has changed dramatically since the development of annotated corpora (text
collections), the introduction of machine learning and the associated growth
and availability of computational power, leading to data-driven statistical ap-
proaches for learning. Current research largely focuses on the use of data-driven
approaches to learn from annotated (supervised learning), partially labeled data
(semi-supervised) or unlabeled data (unsupervised learning/clustering).
NLP tasks include, among others: part-of-speech (POS) tagging
(determining the part of speech, or word class, of each word in a sentence),
named entity recognition (NER; determining which items in a text
refer to entities such as proper names, locations, or geopolitical entities), parsing (extracting
the syntactic structure of natural language sentences), relation extraction (RE;
identifying relationships between entities in text, e.g. who is working for whom),
semantic role labeling (SRL, sometimes also called shallow semantic parsing;
detecting the semantic arguments associated with the predicate or verb
of a sentence and classifying them into their specific roles, e.g. agent, patient),
machine translation (automatic translation between texts in different languages)
and sentiment analysis (also known as opinion mining; extracting subjective
information from text, e.g. opinion statements or overall polarity).
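To give a flavor of the last task in this list, the following is a minimal lexicon-based sentiment sketch. The word lists and the scoring rule are illustrative assumptions of ours (real sentiment systems use much larger lexicons or learned models); the sketch only shows the "overall polarity" idea in its simplest form.

```python
# Hypothetical polarity lexicons -- tiny, for illustration only.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def polarity(text):
    """Return a naive polarity score: >0 positive, <0 negative, 0 neutral."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(polarity("I love this excellent tool"))                    # 2
print(polarity("The new build is great but the docs are poor"))  # 0
```

Even this crude counting scheme illustrates why sentiment analysis is hard: the second sentence mixes praise and criticism, and a bag-of-words score cannot tell which aspect each opinion targets.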
We here propose to use NLP to analyze the natural language-based artifacts
created during the software development process. For instance, natural language
parsing is the task of uncovering the syntactic structure of natural language
sentences, which is represented in the form of trees. For example, if we apply a