This list is by no means exhaustive, but it shows the possible artifacts that
can be exploited for the proposed approach. As we have seen, the artifacts
contain various levels of textual information (from short descriptions to entire
Wiki pages), thereby presenting an interesting challenge for NLP, which we will
introduce next.
2.2 Natural Language Processing
Natural Language Processing (NLP) [7] is an interdisciplinary field at the intersection of
computer science, artificial intelligence, machine learning and linguistics, concerned
with the study of computational approaches to understanding and/or producing
human (natural) language. Building systems that can do so is a difficult task,
given the intrinsic properties of natural language. One of the major challenges
for NLP is the ambiguity of language, exemplified in the following sentence:
The product owner gave her user stories. Humans usually have no trouble identifying
the intended meaning (that the product owner gave some user stories
to 'her', presumably a software developer), while a computer typically identifies
many possible readings. For example, an alternative reading is that the product
owner gives some kind of stories to 'her user', thus identifying 'her' as a possessive
pronoun and splitting the compound noun 'user story'. Ambiguity pertains
to all levels of linguistic processing: for instance, structural ambiguity (whether
'her' attaches to the verb or the noun) or word-level ambiguity (whether 'her' is a
personal or a possessive pronoun).
While early approaches to NLP were mainly symbolic and rule-based, the
field has changed dramatically since the development of annotated corpora (text
collections), the introduction of machine learning and the associated growth
and availability of computational power, leading to data-driven statistical ap-
proaches for learning. Current research largely focuses on the use of data-driven
approaches to learn from annotated (supervised learning), partially labeled data
(semi-supervised) or unlabeled data (unsupervised learning/clustering).
NLP tasks include, among others: part-of-speech (POS) tagging
(determining the part of speech, or word class, of each word in a sentence),
named entity recognition (NER; determining which items in a text
refer to entities such as proper names, locations, or geopolitical entities), parsing (extracting
the syntactic structure of natural language sentences), relation extraction (RE;
identifying relationships between entities in text, e.g. who is working for whom),
semantic role labeling (SRL, sometimes also called shallow semantic parsing;
detecting the semantic arguments associated with the predicate or verb
of a sentence and classifying them into their specific roles, e.g. agent, patient),
machine translation (automatic translation between texts in different languages)
and sentiment analysis (also known as opinion mining; extracting subjective
information from text, e.g. opinion statements or overall polarity).
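To give a flavor of the last task in this list, the following is a minimal lexicon-based sentiment sketch. The word lists and the scoring rule are illustrative assumptions of ours (real sentiment systems use much larger lexicons or learned models); the sketch only shows the "overall polarity" idea in its simplest form.

```python
# Hypothetical polarity lexicons -- tiny, for illustration only.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def polarity(text):
    """Return a naive polarity score: >0 positive, <0 negative, 0 neutral."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(polarity("I love this excellent tool"))                    # 2
print(polarity("The new build is great but the docs are poor"))  # 0
```

Even this crude counting scheme illustrates why sentiment analysis is hard: the second sentence mixes praise and criticism, and a bag-of-words score cannot tell which aspect each opinion targets.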
We here propose to use NLP to analyze the natural language-based artifacts
created during the software development process. For instance, natural language
parsing is the task of uncovering the syntactic structure of natural language
sentences, which is represented in the form of trees. For example, if we apply a