system equipment. Once the template is given, the text fragments containing relevant information to fill the template slots (i.e. specific values associated with the attributes of a certain Ship instance) need to be identified in the text. The recognition of textual information of interest results from pattern matching against extraction rules. Finally, in a third phase, whenever the information of interest is identified in the text, it is mapped into the appropriate (e.g. Ship) template slot. This chain is not trivial, and contemporary IE systems 1 are
usually integrated with large scale knowledge bases, determining all the lexical,
syntactic and semantic constraints needed for a correct interpretation of usually
domain-specific texts. Unfortunately, the manual development of these resources
is a time-consuming task that is often highly error-prone due to the subjectivity
and intrinsic vagueness that affect the semantic modeling process. The knowledge acquisition task is therefore often approached through the use of Machine Learning algorithms that automatically learn the domain-specific information from annotated
data [9]. Statistical learning methods [10] assume that lexical or grammatical as-
pects of training data are the basic features for modeling the different inferences.
They are then generalized into predictive patterns composing the final induced
model. A statistical language processor is assumed to be able to locate specific
instances of a template type (e.g. Ship ) and their slot information in an incom-
ing text. The resulting instantiated template can be employed to populate an
existing knowledge base whose semantic schema corresponds (or can be mapped)
to the template structure. Moreover, reasoning over the extracted information,
e.g. identifying relations or dependencies with respect to previous requirements,
can then be performed more effectively. For example, retrieval of previously developed components that
respond properly to new requirements could be realized as a form of reasoning.
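The pattern-matching phase described above can be sketched as follows. This is a minimal illustration only: the extraction rules, slot names, and the Ship template structure are invented for this example, not taken from any particular IE system.

```python
import re

# Illustrative extraction rules for a hypothetical "Ship" template.
# Each rule maps a template slot to a pattern whose named group
# captures the slot value; real IE systems use far richer rules.
EXTRACTION_RULES = {
    "name":   re.compile(r"[Tt]he vessel (?P<value>[A-Z][\w-]+)"),
    "length": re.compile(r"(?P<value>\d+(?:\.\d+)?)\s*(?:m|meters) long"),
    "crew":   re.compile(r"crew of (?P<value>\d+)"),
}

def fill_template(text: str) -> dict:
    """Match each extraction rule against the text and fill the
    corresponding template slot with the first captured value."""
    template = {slot: None for slot in EXTRACTION_RULES}
    for slot, pattern in EXTRACTION_RULES.items():
        match = pattern.search(text)
        if match:
            template[slot] = match.group("value")
    return template

report = "The vessel Aurora, 120 meters long, sails with a crew of 85."
print(fill_template(report))
# → {'name': 'Aurora', 'length': '120', 'crew': '85'}
```

The instantiated template returned here is exactly the kind of structure that can then populate a knowledge base whose schema mirrors the template.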
3 Machine Learning for Requirement Analysis
In Requirement Analysis, NLP applications such as Information Extraction can be very useful in supporting analysts to perform this task practically and in a cost-effective way. Statistical NLP approaches provide domain-specific
models of target interpretation tasks by acquiring and generalizing linguistic ob-
servations. Several Statistical Machine Learning paradigms have been defined to
provide robust models that easily adapt across different (and possibly specific)
domains. These techniques are the basis of our proposed approach, and we discuss them hereafter. The problem is normally treated as a Statistical Classification problem: the goal is to identify, for new data whose sub-population is unknown (the test data), the sub-population to which they belong, on the basis of a training set of observations whose sub-population is known (the training data). In this scenario we may be interested, for example, in inducing a template slot for a candidate text. The Support Vector Machine (SVM), as discussed in [11] and [12], represents one of the best-known learning paradigms for classification, based on Statistical Learning Theory. Given
training instances, each one associated with a class and a set of “features”, i.e.
1 OpenCalais: http://viewer.opencalais.com/
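The classification setting described above can be sketched with a small linear SVM trained by sub-gradient descent (a Pegasos-style update). The feature vectors and labels are invented toy data for illustration; a real system would derive them from lexical and grammatical features of annotated text.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient training of a linear SVM
    (no bias term, for brevity). X: feature vectors, y: labels in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying learning rate
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink weights (regularization), then add the example
            # only if it violates the margin constraint.
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Toy 2-D training data (invented), e.g. counts of two indicative
# lexical features per text fragment; +1 / -1 mark the two classes.
X = [[2.0, 0.1], [1.8, 0.3], [0.2, 1.9], [0.1, 2.2]]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print(predict(w, [2.1, 0.2]), predict(w, [0.3, 2.0]))  # → 1 -1
```

In the template-slot scenario, each candidate text fragment would be such a feature vector, and the learned separator decides whether the fragment fills a given slot.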