used for the matching process. Other modes of expression, mainly graphical, are
proposed in [LES 07] and [PAL 10a]. The system retrieves document fragments
containing spatially or temporally relevant noun phrases with respect to the query.
2.4.5.1. Spatial IR
We have tested the models of similarity measure discussed in section 2.3.6 and
composed, in an empirical way, a derived ad hoc model adapted to the corpus. This
spatial IR process passes through the following steps:
1) Interpretation of the query. The query is interpreted using the same process flow
as for indexing. SFs are detected and then symbolic and numeric interpretations are
calculated.
2) Calculation of the set of results. Let $Set_{req}$ be the set of SFs annotated in
the query and $Set_{doc}$ the set of SFs annotated in the document. We have
$Set_{req} = \{SF_{req}\}$ and $Set_{doc} = \{SF_{doc}\}$. Then, we calculate
$Set_{res}$, which is the set of SFs of $Set_{doc}$ whose spatial representation has a
non-empty overlap with that of at least one SF of $Set_{req}$:
$$
Set_{res} = \{\, SF_{doc} \in Set_{doc} \mid \exists\, SF_{req} \in Set_{req}
\text{ such that } representation(SF_{doc}) \cap representation(SF_{req}) \neq \emptyset \,\}
$$
The result of the query contains the set of document fragments to which the SFs of
$Set_{res}$ belong.
3) Calculation of the relevance score of each document fragment in the set of
results. We use the characteristics detailed in Figure 2.3 to measure the similarity
between a document fragment $D_f$ and a query $Q$ [SAL 07a].
$$
\mathrm{Similarity}(D_f, Q) = \frac{\mathrm{Precision}(D_f, Q) + \mathrm{Overlapping}(D_f, Q)}{2 + \mathrm{Distance}(D_f, Q)} \tag{2.7}
$$
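The filtering of step 2 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the system's implementation: each spatial representation is reduced to an axis-aligned bounding box, "not empty" is taken to mean an overlap of non-zero area, and the names `Box`, `overlaps` and `result_set` are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box standing in for an SF's spatial representation
    (an assumption of this sketch; real representations may be arbitrary geometries)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlaps(a: Box, b: Box) -> bool:
    """True when the two boxes share a region of non-zero area.
    Boxes that only touch along an edge or corner are treated as disjoint here."""
    return (min(a.xmax, b.xmax) > max(a.xmin, b.xmin)
            and min(a.ymax, b.ymax) > max(a.ymin, b.ymin))


def result_set(set_doc: dict, set_req: dict) -> set:
    """Names of the document SFs whose box overlaps at least one query SF."""
    return {name for name, box in set_doc.items()
            if any(overlaps(box, q) for q in set_req.values())}


# Made-up coordinates, for illustration only.
set_req = {"query_area": Box(0, 0, 10, 10)}
set_doc = {
    "sf_a": Box(5, 5, 15, 15),    # partially covers the query area
    "sf_b": Box(20, 20, 30, 30),  # disjoint from the query area
}
print(result_set(set_doc, set_req))
```

With these made-up boxes only `sf_a` is retained, so only the document fragment containing it would be passed on to the scoring of step 3.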
Equation [2.7] uses notions of precision, overlapping and distance. The precision
score [2.8] evaluates the relevance of a document, in other words whether the
overlapping surface O (Figure 2.3: overlapping area, considered to be relevant)
occupies a large part of the surface of document fragment $D_f$. In a similar way, the
overlapping score [2.8] evaluates the proportion of the surface of query Q occupied by
the overlapping area O. The higher this score, the more significant the corresponding
document is. Finally, the distance score [2.8] evaluates the distance between the
centroid of query Q and the centroid of the overlapping area O. The closer these
centroids are to each other, the more relevant the document fragment $D_f$ is.
$$
\mathrm{Precision}(D_f, Q) = \frac{O}{D_f}, \qquad
\mathrm{Overlapping}(D_f, Q) = \frac{O}{Q}, \qquad
\mathrm{Distance}(D_f, Q) = \frac{d}{D} \tag{2.8}
$$
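As an illustration of how equations [2.7] and [2.8] combine, the following Python sketch computes the similarity of one document fragment to a query. It rests on several assumptions that are not part of the text: surfaces are approximated by axis-aligned bounding boxes, d is taken as the Euclidean distance between the centroids of Q and O, and the normalizing term D, which is not defined in this passage, is simply supplied by the caller as `big_d`. All names are hypothetical.

```python
import math
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box used as a simplified surface (sketch assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        # An inverted box (empty intersection) yields an area of zero.
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)

    def centroid(self) -> tuple:
        return ((self.xmin + self.xmax) / 2, (self.ymin + self.ymax) / 2)


def intersection(a: Box, b: Box) -> Box:
    """Overlapping area O of two boxes (may be inverted when they are disjoint)."""
    return Box(max(a.xmin, b.xmin), max(a.ymin, b.ymin),
               min(a.xmax, b.xmax), min(a.ymax, b.ymax))


def similarity(doc: Box, query: Box, big_d: float) -> float:
    """Equation [2.7]; `big_d` is the normalizing term D, chosen by the caller."""
    o = intersection(doc, query)
    if o.area() == 0:
        return 0.0  # assumption: fragments without overlap score zero
    precision = o.area() / doc.area()      # O / D_f
    overlapping = o.area() / query.area()  # O / Q
    d = math.dist(query.centroid(), o.centroid())
    distance = d / big_d                   # d / D
    return (precision + overlapping) / (2 + distance)


# Made-up coordinates, for illustration only.
query = Box(0, 0, 10, 10)
print(similarity(Box(0, 0, 10, 10), query, big_d=10.0))  # fragment identical to the query
print(similarity(Box(5, 5, 15, 15), query, big_d=10.0))  # partial overlap
```

A fragment whose surface coincides with the query has precision and overlapping both equal to 1 and a distance of 0, so it reaches the maximum score of 1; any partial overlap lowers the numerator and raises the denominator.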