In this evaluation, we considered a set of 290 requisites and 1,474 functionalities of the Combat Management System (CMS). The mean number of functionalities for each requisite r_i was 5, with a standard deviation of about 3.5. For each r_i, the set F_i specifies all functionalities realizing r_i: these are the gold standard, i.e. the set of texts expected to be retrieved by the analyst querying by r_i. As they are short texts, individual r_i as well as f_j are modeled according to the Comprehensive model, i.e. the BoW + N-POS + N-Words vector representation, as it achieves the best results in the RA discussed in Section 4.1.
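The source does not spell out the similarity function sim over these vector representations; a common choice for sparse BoW-style vectors is cosine similarity. The following is a minimal sketch under that assumption, with toy feature names (word and POS n-gram keys) that are purely illustrative:

```python
from math import sqrt

def cosine_sim(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse feature vectors
    (feature name -> weight), e.g. BoW + N-POS + N-Words features.
    Assumed form of sim; the paper does not define it explicitly here."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy requisite/functionality vectors (feature names are hypothetical)
r = {"track": 1.0, "target": 1.0, "NN_VB": 0.5}
f = {"track": 1.0, "radar": 1.0, "NN_VB": 0.5}
print(round(cosine_sim(r, f), 3))  # → 0.556
```

Any vector-space similarity with the same signature would slot into the re-ranking strategies described next.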
The information acquired during the RA phase is exploited here to define a Re-ranking phase: the ranking provided by the semantic similarity function is adjusted to filter out all those functionalities that do not share the same characterization as the target requirement r_i, i.e. the same type and capability, as discussed in Section 4.1. Four different retrieval strategies are applied, giving rise to four IR systems:
- NoFilter: for each r_i, the most similar f_j are retrieved and ranked according to sim; no filter is applied.
- Type: the ranking provided by sim is split into two lists: the first, ranked higher, is made of functionalities sharing the same type as r_i; the second includes the remaining f_j whose type is different. In this way, functionalities f_j of the same type as r_i are always ranked before the other ones.
- Capability: the two lists are created as before with respect to the capability assigned to the target r_i, so that functionalities with the same capability as r_i are ranked first;
- Type+Capability: the ranking provided by sim is modified as before, according to the sharing of both the type and the capability of r_i.
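The three filtered strategies all amount to a stable two-list partition of the sim-ordered ranking. A minimal sketch, assuming each functionality is annotated with `type` and `capability` fields (field names are illustrative, not from the source):

```python
def rerank(ranked, target, match_type=True, match_capability=True):
    """Stable two-list re-ranking: functionalities sharing the target
    requisite's type (and/or capability) are moved ahead of the rest,
    preserving the original sim-based order within each list.
    `ranked` is assumed already sorted by decreasing sim."""
    def same_characterization(f):
        ok = True
        if match_type:
            ok = ok and f["type"] == target["type"]
        if match_capability:
            ok = ok and f["capability"] == target["capability"]
        return ok
    first = [f for f in ranked if same_characterization(f)]
    second = [f for f in ranked if not same_characterization(f)]
    return first + second

# Toy example: Type strategy only (capability ignored)
ranked = [
    {"id": "f1", "type": "A", "capability": "x"},
    {"id": "f2", "type": "B", "capability": "x"},
    {"id": "f3", "type": "A", "capability": "y"},
]
target = {"type": "A", "capability": "x"}
print([f["id"] for f in rerank(ranked, target, match_capability=False)])
# → ['f1', 'f3', 'f2']
```

With both flags enabled this reproduces the Type+Capability strategy; with both disabled it degenerates to NoFilter.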
The different strategies are evaluated according to standard IR evaluation metrics: Precision (P), Recall (R), F-measure (F1) and Mean Average Precision (MAP).
Precision is expressed as P = tp / (tp + fp), where tp is the number of relevant functionalities retrieved, and fp is the number of non-relevant functionalities retrieved. Recall is expressed as R = tp / (tp + fn), where fn is the number of relevant functionalities not retrieved. While Precision estimates the capacity to retrieve correct functionalities, Recall is more interesting in this scenario, as it measures the system's capacity to retrieve all existing functionalities; in many cases, it is more important to retrieve all existing software than to spend more time reading useless documentation. F-measure considers both aspects, as it is estimated as the harmonic mean of Precision and Recall:

F1 = (2 · P · R) / (P + R)
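These three metrics follow directly from the definitions above; a minimal sketch over sets of functionality identifiers (the identifiers themselves are illustrative):

```python
def precision_recall_f1(retrieved, relevant):
    """P, R and F1 from the set of retrieved functionalities and the
    gold-standard set F_i of relevant ones, per the formulas above."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)   # relevant and retrieved
    fp = len(retrieved - relevant)   # retrieved but not relevant
    fn = len(relevant - retrieved)   # relevant but missed
    p = tp / (tp + fp) if retrieved else 0.0
    r = tp / (tp + fn) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Toy example: 4 retrieved, 3 relevant, 2 in common
p, r, f1 = precision_recall_f1({"f1", "f2", "f3", "f4"}, {"f1", "f2", "f5"})
print(round(p, 3), round(r, 3), round(f1, 3))  # → 0.5 0.667 0.571
```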
Finally, MAP provides a single accuracy measure across different recall levels. MAP is based on the oracle given by RF = {(r_i, F_i)}, i.e. the pairs of a requisite r_i and its functionality set F_i. Every requisite r_i also corresponds to a ranked list of retrieved functionalities, ordered according to the similarity function sim. Let F_i be the list of retrieved functionalities f_j from the top result (i.e. f_1, ranked as the closest by the system) down to the f_k at the position where all members of the functionality set F_i result returned. In this way, the
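MAP as described, i.e. averaging each requisite's Average Precision over the oracle RF, can be sketched as follows (a standard formulation, assumed to match the paper's usage; data structures are illustrative):

```python
def average_precision(ranked, relevant):
    """Average Precision for one requisite r_i: mean of the precision
    values at each rank where a gold-standard functionality appears."""
    relevant = set(relevant)
    hits, precisions = 0, []
    for k, f in enumerate(ranked, start=1):
        if f in relevant:
            hits += 1
            precisions.append(hits / k)  # precision at rank k
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(oracle, rankings):
    """MAP over the oracle RF: `oracle` maps each requisite r_i to its
    gold set F_i; `rankings` maps r_i to its sim-ordered result list."""
    return sum(average_precision(rankings[r], F)
               for r, F in oracle.items()) / len(oracle)

# Toy example: gold items found at ranks 1 and 3
print(round(average_precision(["a", "b", "c", "d"], {"a", "c"}), 3))
# → 0.833
```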