Binary frequency consists of counting the number of initial representations of the
information (objects) that intersect the tile (an object can cover all or part of
several tiles). Thus, each representation intersecting a tile increments the
frequency of that tile by 1. Proportional frequency is based on the overlap ratio
between an object and a tile: the frequency is thus incremented by a value between 0
and 1. Table 3.2 details the formulas used to calculate these two types of
frequencies.
Binary frequency: $freq(T_i) = \sum_{j=1}^{p} freq(T_i, O_j)$
Proportional frequency: $freqP(T_i) = \sum_{j=1}^{p} freq(T_i, O_j) \cdot \frac{Surf(T_i, O_j)}{Surf(T_i)} \cdot \frac{1}{NbTiles(O_j)}$
Table 3.2. Formulas for calculating the frequency of a tile $T_i$. $p$: number of objects
in the initial index; $freq(T_i, O_j)$: frequency of the object $O_j$ in the tile $T_i$ (intersection);
$Surf(T_i, O_j)$: surface of the object $O_j$ in the tile $T_i$; $Surf(T_i)$: surface of the tile $T_i$;
$NbTiles(O_j)$: number of tiles intersecting the object $O_j$. Taken from [PAL 10d]
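As a minimal sketch of the two frequencies in Table 3.2, the Python code below assumes that tiles and objects are axis-aligned rectangles with positive area (so that surfaces are easy to compute), and reads the proportional increment as the product Surf(T_i,O_j)/Surf(T_i) × 1/NbTiles(O_j). The `Rect` type and all function names are hypothetical; a real implementation would work on actual geometries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical axis-aligned rectangle standing in for a tile or an object."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)


def intersection_area(a: Rect, b: Rect) -> float:
    """Surf(T_i, O_j): area of the overlap (0.0 if disjoint or only touching)."""
    w = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    h = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return max(0.0, w) * max(0.0, h)


def binary_frequency(tile: Rect, objects: list[Rect]) -> int:
    """freq(T_i): each object overlapping the tile adds exactly 1."""
    return sum(1 for obj in objects if intersection_area(tile, obj) > 0)


def proportional_frequency(tile: Rect, tiles: list[Rect], objects: list[Rect]) -> float:
    """freqP(T_i): each overlapping object adds a value between 0 and 1,
    Surf(T_i, O_j) / Surf(T_i) * 1 / NbTiles(O_j) (assumed reading)."""
    total = 0.0
    for obj in objects:
        overlap = intersection_area(tile, obj)
        if overlap == 0:
            continue  # freq(T_i, O_j) = 0, so the object contributes nothing
        nb_tiles = sum(1 for t in tiles if intersection_area(t, obj) > 0)
        total += (overlap / tile.area()) * (1.0 / nb_tiles)
    return total
```

With two unit tiles side by side and one object straddling both halves equally, the binary scheme counts that object once in each tile, whereas the proportional scheme adds only 0.25 to each.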
We then use these frequencies to calculate the weight associated with each tile
invoked in a given document unit. We use four weighting formulas. TF, TF·IDF and
Okapi BM25 [MAN 08b] are applied to tile weighting based on binary frequency
calculations. TF_p, an adaptation of TF, is applied to tile weighting based on
proportional frequency calculations.
Table 3.3 presents these different formulas. Note that the frequencies calculated
for TF and TF_p are normalized: the weight of a tile in a document unit is divided
by the total number of tiles invoked in this same unit.
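Table 3.3 itself is not reproduced in this excerpt, so the following Python sketch uses the common textbook forms of these weights as an assumption, not necessarily the exact variants of Table 3.3. Each document unit is a hypothetical dict mapping a tile identifier to its frequency; "total number of tiles" is read as the number of distinct tiles in the unit, and `k1 = 1.2`, `b = 0.75` are conventional BM25 defaults rather than values taken from the source.

```python
import math


def tf(unit: dict[str, float]) -> dict[str, float]:
    """Normalized TF: each tile frequency divided by the number of distinct
    tiles invoked in the unit. Feeding proportional frequencies gives TF_p."""
    n_tiles = len(unit)
    return {tile: f / n_tiles for tile, f in unit.items()}


def doc_freq(units: list[dict[str, float]]) -> dict[str, int]:
    """Number of document units in which each tile appears."""
    df: dict[str, int] = {}
    for unit in units:
        for tile in unit:
            df[tile] = df.get(tile, 0) + 1
    return df


def tf_idf(units: list[dict[str, float]]) -> list[dict[str, float]]:
    """Normalized TF multiplied by the classic IDF, log(N / df)."""
    n_units = len(units)
    df = doc_freq(units)
    return [
        {tile: w * math.log(n_units / df[tile]) for tile, w in tf(unit).items()}
        for unit in units
    ]


def okapi_bm25(units: list[dict[str, float]], k1: float = 1.2,
               b: float = 0.75) -> list[dict[str, float]]:
    """Textbook Okapi BM25 with IDF = log(N / df); the unit length is taken
    as the sum of its binary tile frequencies (an assumption)."""
    n_units = len(units)
    df = doc_freq(units)
    lengths = [sum(unit.values()) for unit in units]
    avg_len = sum(lengths) / n_units
    weights = []
    for unit, length in zip(units, lengths):
        # Length normalization: longer-than-average units are penalized.
        norm = k1 * ((1 - b) + b * length / avg_len)
        weights.append({
            tile: math.log(n_units / df[tile]) * f * (k1 + 1) / (f + norm)
            for tile, f in unit.items()
        })
    return weights
```

Under this IDF, a tile invoked in every unit gets a TF·IDF and BM25 weight of 0, while normalized TF still distinguishes it within each unit.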
These normalized indexes contain, for each tile, a list of tuples mainly composed
of its weight (TF, TF·IDF, Okapi BM25 or TF_p) and the identifiers of the
corresponding document and paragraph (document unit).
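One possible in-memory layout for such an index is sketched below in Python. It assumes one index per weighting scheme and identifies each document unit by a (document, paragraph) pair; `Posting` and `build_tile_index` are hypothetical names, and sorting each tile's postings by decreasing weight is a design choice rather than something the text specifies.

```python
from collections import defaultdict
from typing import NamedTuple


class Posting(NamedTuple):
    """One tuple of a tile's list: weight plus document unit identifiers."""
    weight: float
    doc_id: str
    paragraph_id: int


def build_tile_index(
    unit_weights: dict[tuple[str, int], dict[str, float]],
) -> dict[str, list[Posting]]:
    """Invert {(doc_id, paragraph_id): {tile_id: weight}} into
    {tile_id: [Posting, ...]}, postings sorted by decreasing weight."""
    index: dict[str, list[Posting]] = defaultdict(list)
    for (doc_id, paragraph_id), weights in unit_weights.items():
        for tile_id, weight in weights.items():
            index[tile_id].append(Posting(weight, doc_id, paragraph_id))
    for postings in index.values():
        postings.sort(key=lambda p: p.weight, reverse=True)
    return dict(index)
```

Keeping the highest weights first lets a query over a tile read its most relevant paragraphs without scanning the whole list.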
It is now possible to apply IR models that can exploit such generalized
indexes composed of tiles.
3.4.2. Spatial and temporal IR applied to tiling: PIV 2
Salton's vector space model [SAL 71, SAL 75], well proven in IR, gives good
results [BAE 99]. We apply it to spatial and temporal tiles: this consists of representing
the set of tiles describing a document as a first vector and the set of
tiles corresponding to a query as a second vector, then comparing these two
vectors. The document collection is thus described by a matrix, as shown in Table 3.4
($D$ corresponds to a document, $T$ to a tile and $w_{ij}$ to the weight of the tile $j$ for
