
the document i). The score of a retrieved document is, for example, determined by calculating the scalar product [BAE 99, GOK 09].
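As an illustration of scalar-product scoring, the following minimal Python sketch represents the query and each document unit as sparse vectors keyed by tile identifier, then ranks the document units by decreasing score. The tile identifiers, the weights, and the function names `scalar_product` and `rank` are hypothetical. No normalization (such as the cosine measure) is applied.

```python
# Minimal sketch: score document units by the scalar product between a query
# vector and each document vector, both stored sparsely as {tile_id: weight}.
# Tile IDs, weights and function names are hypothetical illustration values.


def scalar_product(query: dict[str, float], doc: dict[str, float]) -> float:
    """Sum of w_q(t) * w_d(t) over the tiles present in both vectors."""
    # Iterate over the smaller vector; tiles absent from either one contribute 0.
    small, large = (query, doc) if len(query) <= len(doc) else (doc, query)
    return sum(w * large[t] for t, w in small.items() if t in large)


def rank(query: dict[str, float],
         docs: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (document id, score) pairs sorted by decreasing score."""
    scores = [(doc_id, scalar_product(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "D1": {"T1": 0.5, "T2": 0.2},
        "D2": {"T2": 0.7, "T3": 0.1},
    }
    query = {"T2": 1.0, "T3": 1.0}
    print(rank(query, docs))
```

Here D2 ranks first because it shares two tiles with the query, while D1 shares only T2.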

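Tile frequency (TF)

$$W(t,D_u) = \mathrm{TF}(t,D_u) = \frac{\mathrm{freq}(t,D_u)}{\sum_{i=1}^{n} \mathrm{freq}(t_i)}$$

TF·IDF

$$W(t,D_u) = \mathrm{TF}(t,D_u) \cdot \mathrm{IDF}(t) \quad \text{with} \quad \mathrm{IDF}(t) = \log \frac{N_{D_u}}{N_{D_u}^{t}}$$

Okapi BM25

$$W(t,D_u) = \frac{(k_1 + 1)\,\mathrm{TF}(t,D_u)}{K + \mathrm{TF}(t,D_u)} \quad \text{with} \quad K = k_1 \left[ (1 - b) + b\,\frac{n}{\mathit{advl}} \right]$$

TFp

$$W(t,D_u) = \mathrm{TF}_p(t,D_u) = \frac{\mathrm{freqP}(t,D_u)}{\sum_{i=1}^{n} \mathrm{freq}(t_i)}$$

$\mathrm{freq}(t,D_u)$: frequency of the tile $t$ in the document unit $D_u$

$\mathrm{freqP}(t,D_u)$: continuous frequency of the tile $t$ in the document unit $D_u$

$n$: number of tiles in the document unit $D_u$

$\sum_{i=1}^{n} \mathrm{freq}(t_i)$: cumulated number of occurrences of tiles in the document unit $D_u$

$N_{D_u}^{t}$: number of document units related to the tile $t$

$N_{D_u}$: number of document units

Constants: $k_1 = 1.2$, $b = 0.75$, $\mathit{advl} = 900$

Table 3.3. Weighting formulas applied to the standardized indexes for a tile $t$ and a document unit $D_u$ (taken from [PAL 10d])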
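The four formulas of Table 3.3 can be transcribed directly, as in the Python sketch below. It assumes the natural logarithm for IDF, because the table does not state a base. It takes the continuous frequency freqP(t,Du) as a precomputed input, because its computation is not detailed here. The function names are hypothetical.

```python
import math

# Constants listed with Table 3.3.
K1 = 1.2
B = 0.75
ADVL = 900


def tf(freq_t: float, total_freq: float) -> float:
    """TF(t, Du) = freq(t, Du) / sum_i freq(t_i)."""
    return freq_t / total_freq


def idf(n_du: int, n_du_t: int) -> float:
    """IDF(t) = log(N_Du / N_Du^t); natural logarithm assumed."""
    return math.log(n_du / n_du_t)


def tf_idf(freq_t: float, total_freq: float, n_du: int, n_du_t: int) -> float:
    """W(t, Du) = TF(t, Du) * IDF(t)."""
    return tf(freq_t, total_freq) * idf(n_du, n_du_t)


def bm25(freq_t: float, total_freq: float, n_tiles: int,
         k1: float = K1, b: float = B, advl: float = ADVL) -> float:
    """W(t, Du) = (k1 + 1) * TF / (K + TF), K = k1 * [(1 - b) + b * n / advl]."""
    t = tf(freq_t, total_freq)
    k = k1 * ((1 - b) + b * n_tiles / advl)
    return (k1 + 1) * t / (k + t)


def tfp(freq_p_t: float, total_freq: float) -> float:
    """TFp(t, Du) = freqP(t, Du) / sum_i freq(t_i); freqP is supplied by the caller."""
    return freq_p_t / total_freq
```

For example, with 300 occurrences of t among 1200 tile occurrences in a document unit of n = 900 tiles, TF = 0.25. Because n equals advl, K reduces to k1 = 1.2, and the BM25 weight is 0.55 / 1.45, or about 0.379.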

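$$
\begin{array}{c|cccc}
 & T_1 & T_2 & \cdots & T_t \\
\hline
D_1 & w_{11} & w_{21} & \cdots & w_{t1} \\
D_2 & w_{12} & w_{22} & \cdots & w_{t2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
D_n & w_{1n} & w_{2n} & \cdots & w_{tn}
\end{array}
$$

where $w_{ij}$ is the weight of the tile $T_i$ in the document unit $D_j$.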

Table 3.4. Vectorial model: document-tile matrix
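The following minimal Python sketch shows one way to fill the document-tile matrix of Table 3.4 from raw tile counts. It uses the TF·IDF weighting of Table 3.3, assuming the natural logarithm. Rows follow the document units and columns follow the tiles, both sorted by identifier. The input format and the function name `document_tile_matrix` are assumptions for illustration.

```python
import math


def document_tile_matrix(counts: dict[str, dict[str, int]]):
    """Build (docs, tiles, matrix) with matrix[j][i] = TF.IDF weight of tile i in doc j.

    `counts` maps each document unit to {tile_id: number of occurrences}
    (hypothetical input format).
    """
    docs = sorted(counts)
    tiles = sorted({t for c in counts.values() for t in c})
    n_du = len(docs)  # N_Du: number of document units
    # N_Du^t: number of document units in which each tile occurs.
    n_du_t = {t: sum(1 for d in docs if counts[d].get(t, 0) > 0) for t in tiles}
    matrix = []
    for d in docs:
        total = sum(counts[d].values())  # cumulated tile occurrences in the unit
        row = []
        for t in tiles:
            f = counts[d].get(t, 0)
            # Absent tiles get weight 0; this also avoids dividing by zero.
            w = (f / total) * math.log(n_du / n_du_t[t]) if f else 0.0
            row.append(w)
        matrix.append(row)
    return docs, tiles, matrix


if __name__ == "__main__":
    sample = {"D1": {"T1": 2, "T2": 2}, "D2": {"T2": 1, "T3": 3}}
    docs, tiles, m = document_tile_matrix(sample)
    for d, row in zip(docs, m):
        print(d, [round(w, 3) for w in row])
```

A tile that occurs in every document unit (T2 in the sample) gets IDF = log 1 = 0, so its column contributes nothing to scalar-product scores. This is the intended behaviour of TF·IDF, not an error.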

Given that the information can be represented at different levels of generalization, the proposed multi-level tiling allows us to use the index of tiles best suited to the range of the user's query.
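The text does not specify how the tiling level is chosen. Purely as an illustration, the following Python sketch assumes a hypothetical quadtree-like hierarchy in which level 0 is one square tile and each finer level halves the tile side. It picks the finest level at which the query's bounding box overlaps no more than a fixed number of tiles. The rule, the threshold `max_tiles`, and all names are assumptions, not the authors' selection rule.

```python
import math

# Hypothetical sketch of choosing which tiling level's index to query.
# Assumptions: square grids anchored at `origin`; level L has tile side
# extent / 2**L; a cell merely touched on its boundary counts as overlapped.


def tiles_overlapped(bbox, tile_size, origin=(0.0, 0.0)):
    """Number of grid cells of side `tile_size` hit by bbox = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bbox
    ox, oy = origin
    nx = math.floor((xmax - ox) / tile_size) - math.floor((xmin - ox) / tile_size) + 1
    ny = math.floor((ymax - oy) / tile_size) - math.floor((ymin - oy) / tile_size) + 1
    return nx * ny


def choose_level(bbox, extent, n_levels, max_tiles=16):
    """Finest level whose grid covers the query with at most `max_tiles` tiles."""
    best = 0
    for level in range(n_levels):
        tile_size = extent / (2 ** level)
        if tiles_overlapped(bbox, tile_size) <= max_tiles:
            best = level  # a finer level is still within the tile budget
        else:
            break  # finer levels can only overlap more tiles
    return best


if __name__ == "__main__":
    print(choose_level((100, 100, 150, 150), extent=1024, n_levels=10))
```

A small query thus lands on a fine level, and a query covering most of the extent falls back to a coarse one. This matches the intuition that the index should follow the range of the query.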

Several tests described in [PAL 10a] will be discussed in section 3.5. These will mainly allow us to verify that the loss of precision due to tiling does not degrade the
