Binary frequency counts the initial representations of the information (objects) that intersect the tile (an object can cover all or part of several tiles). Thus, each representation that intersects a tile increments the frequency of that tile by 1. Proportional frequency is based on the overlap ratio between an object and a tile: the frequency is thus incremented by a value between 0 and 1. Table 3.2 details the formulas used to calculate these two types of frequencies.

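Binary frequency: $\mathrm{freq}(T_i) = \sum_{j=1}^{p} \mathrm{freq}(T_i, O_j)$

Proportional frequency: $\mathrm{freqP}(T_i) = \sum_{j=1}^{p} \mathrm{freq}(T_i, O_j) \cdot \frac{\mathrm{Surf}(T_i, O_j)}{\mathrm{Surf}(T_i)} \cdot \frac{1}{\mathrm{NbTiles}(O_j)}$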

Table 3.2. Formulas for calculating the frequency of a tile $T_i$. $p$: the number of objects in the initial index; $\mathrm{freq}(T_i, O_j)$: frequency of the object $O_j$ in the tile $T_i$ (intersection); $\mathrm{Surf}(T_i, O_j)$: surface of the object $O_j$ in the tile $T_i$; $\mathrm{Surf}(T_i)$: surface of the tile $T_i$; $\mathrm{NbTiles}(O_j)$: number of tiles in intersection with the object $O_j$. Taken from [PAL 10d]
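The two frequencies can be illustrated with a minimal sketch. It assumes that tiles and objects are axis-aligned rectangles (the `Rect` class is a hypothetical stand-in for real geometries) and reads the proportional formula of Table 3.2 as the overlap ratio Surf(T_i, O_j) / Surf(T_i) multiplied by 1 / NbTiles(O_j). The function names are illustrative and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle: hypothetical footprint of a tile or an object."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)

    def overlap_area(self, other: "Rect") -> float:
        # Width and height of the intersection; non-positive means no overlap.
        w = min(self.xmax, other.xmax) - max(self.xmin, other.xmin)
        h = min(self.ymax, other.ymax) - max(self.ymin, other.ymin)
        return w * h if w > 0 and h > 0 else 0.0


def binary_frequency(tile: Rect, objects: list[Rect]) -> int:
    """freq(T_i): each object intersecting the tile adds 1."""
    return sum(1 for obj in objects if tile.overlap_area(obj) > 0)


def proportional_frequency(tile: Rect, objects: list[Rect], tiles: list[Rect]) -> float:
    """freqP(T_i): each intersecting object adds a value between 0 and 1."""
    total = 0.0
    for obj in objects:
        shared = tile.overlap_area(obj)  # Surf(T_i, O_j)
        if shared == 0:
            continue  # freq(T_i, O_j) = 0, so the term vanishes
        nb_tiles = sum(1 for t in tiles if t.overlap_area(obj) > 0)  # NbTiles(O_j)
        total += (shared / tile.area()) * (1.0 / nb_tiles)
    return total


# Two adjacent 2x2 tiles and two objects (illustrative data only).
tiles = [Rect(0, 0, 2, 2), Rect(2, 0, 4, 2)]
objects = [Rect(1, 0, 3, 2),   # straddles both tiles
           Rect(0, 0, 1, 1)]   # lies entirely inside the first tile

for tile in tiles:
    print(binary_frequency(tile, objects), proportional_frequency(tile, objects, tiles))
```

In this example the straddling object counts fully for both tiles under binary frequency, whereas under proportional frequency its contribution to each tile is reduced both by its partial overlap and by the number of tiles it intersects.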

We then use these frequencies to calculate the weight associated with each tile invoked in a given document unit. We use four weighting formulas. TF, TF·IDF and OkapiBM25 [MAN 08b] are applied to the weighting of tiles from binary frequency calculations. $\mathrm{TF}_p$, an adaptation of TF, is applied to the weighting of tiles from proportional frequency calculations.

Table 3.3 presents these different formulas. Note that the frequencies calculated for TF and $\mathrm{TF}_p$ are normalized: the weight of a tile in a document unit is divided by the total number of tiles invoked in this same unit.
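As a rough illustration of this normalization only (Table 3.3 itself is not reproduced here), the sketch below divides each tile frequency by the number of tiles invoked in the document unit. It assumes that "invoked" means "has a non-zero frequency in the unit"; the function name `normalized_tf` and the tile identifiers are hypothetical.

```python
def normalized_tf(frequencies: dict[str, float]) -> dict[str, float]:
    """Normalize the tile frequencies of one document unit.

    `frequencies` maps a tile identifier to its binary frequency (giving TF)
    or to its proportional frequency (giving TF_p).
    Assumption: the divisor is the count of tiles with a non-zero frequency.
    """
    invoked = [tile for tile, freq in frequencies.items() if freq > 0]
    if not invoked:
        return {}
    return {tile: frequencies[tile] / len(invoked) for tile in invoked}


# Binary frequencies of one document unit (illustrative values).
print(normalized_tf({"T1": 2, "T2": 1, "T3": 0}))
# Proportional frequencies of the same unit (illustrative values).
print(normalized_tf({"T1": 0.5, "T2": 0.25}))
```

The same function serves both cases because only the input frequencies differ between TF and TF_p under this reading.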

These standardized indexes contain, for each tile, a list of tuples mainly composed of their weight (TF, TF·IDF, OkapiBM25 and $\mathrm{TF}_p$), the identifier of the corresponding document and paragraph (document unit).
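One possible in-memory layout for such an index is sketched below: a mapping from each tile to a list of tuples holding the document identifier, the paragraph identifier and the four weights. The field names, the identifiers and the weight values are placeholders chosen for illustration, not data from the source.

```python
from collections import defaultdict
from typing import NamedTuple


class Posting(NamedTuple):
    """One tuple attached to a tile (hypothetical field names)."""
    doc_id: str
    paragraph_id: int
    tf: float
    tf_idf: float
    bm25: float
    tf_p: float


def build_tile_index(entries: list[tuple[str, Posting]]) -> dict[str, list[Posting]]:
    """Group (tile_id, posting) pairs into a tile -> list of postings mapping."""
    index: dict[str, list[Posting]] = defaultdict(list)
    for tile_id, posting in entries:
        index[tile_id].append(posting)
    return dict(index)


# Placeholder weights, for illustration only.
entries = [
    ("T1", Posting("doc1", 3, tf=1.0, tf_idf=0.7, bm25=1.2, tf_p=0.25)),
    ("T1", Posting("doc2", 1, tf=0.5, tf_idf=0.35, bm25=0.8, tf_p=0.1)),
    ("T2", Posting("doc1", 3, tf=0.5, tf_idf=0.9, bm25=1.0, tf_p=0.125)),
]
index = build_tile_index(entries)

# All document units that invoke tile T1.
print([(p.doc_id, p.paragraph_id) for p in index["T1"]])
```

Keying the structure by tile mirrors an inverted index, so the document units relevant to a queried tile can be read directly from one list.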

It is now possible to apply IR models that can use such generalized indexes composed of tiles.

3.4.2. Spatial and temporal IR applied to tiling: PIV²

The vector model of Salton [SAL 71, SAL 75], well established in IR, gives good results [BAE 99]. We apply it to spatial and temporal tiles: this consists of representing the set of tiles describing a document as a first vector and the set of tiles corresponding to a query as a second vector, then comparing these two vectors. The document repository is thus described by a matrix, as shown in Table 3.4 (D corresponds to a document, T to a tile and $w_{ij}$ to the weight of the tile j for
