Table 10.4 Features for the “Gov2” corpus

ID   Feature description
1    $\sum_{t_i \in q \cap d} TF(t_i, d)$ in body
2    $\sum_{t_i \in q \cap d} TF(t_i, d)$ in anchor
3    $\sum_{t_i \in q \cap d} TF(t_i, d)$ in title
4    $\sum_{t_i \in q \cap d} TF(t_i, d)$ in URL
5    $\sum_{t_i \in q \cap d} TF(t_i, d)$ in the whole document
6    $\sum_{t_i \in q} IDF(t_i)$ in body
7    $\sum_{t_i \in q} IDF(t_i)$ in anchor
8    $\sum_{t_i \in q} IDF(t_i)$ in title
9    $\sum_{t_i \in q} IDF(t_i)$ in URL
10   $\sum_{t_i \in q} IDF(t_i)$ in the whole document
11   $\sum_{t_i \in q \cap d} TF(t_i, d) \cdot IDF(t_i)$ in body
12   $\sum_{t_i \in q \cap d} TF(t_i, d) \cdot IDF(t_i)$ in anchor
13   $\sum_{t_i \in q \cap d} TF(t_i, d) \cdot IDF(t_i)$ in title
14   $\sum_{t_i \in q \cap d} TF(t_i, d) \cdot IDF(t_i)$ in URL
15   $\sum_{t_i \in q \cap d} TF(t_i, d) \cdot IDF(t_i)$ in the whole document
16   LEN(d) of body
17   LEN(d) of anchor
18   LEN(d) of title
19   LEN(d) of URL
20   LEN(d) of the whole document
21   BM25 of body
22   BM25 of anchor
23   BM25 of title
24   BM25 of URL
25   BM25 of the whole document
26   LMIR.ABS of body
27   LMIR.ABS of anchor
28   LMIR.ABS of title
29   LMIR.ABS of URL
30   LMIR.ABS of the whole document
31   LMIR.DIR of body
32   LMIR.DIR of anchor
33   LMIR.DIR of title
34   LMIR.DIR of URL
35   LMIR.DIR of the whole document
36   LMIR.JM of body
37   LMIR.JM of anchor
38   LMIR.JM of title
39   LMIR.JM of URL
40   LMIR.JM of the whole document
41   PageRank
42   Inlink number
43   Outlink number
44   Number of slashes in URL
45   Length of URL
46   Number of child pages
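
The first twenty rows of the table apply the same four computations (summed TF, summed IDF, summed TF · IDF, and length) to each of the five document streams. As a rough illustration, the Python sketch below computes those four values for a single stream. The function name `field_features` and its parameters `doc_freq` and `num_docs` are hypothetical. The IDF form log(N / df), the token-count definition of LEN(d), and the de-duplication of query terms are assumptions made for this sketch, not necessarily the definitions used to build the corpus.

```python
import math
from collections import Counter


def field_features(query_terms, field_tokens, doc_freq, num_docs):
    """Return (sum TF, sum IDF, sum TF*IDF, LEN) for one stream of one document.

    query_terms  : list of query tokens
    field_tokens : list of tokens in the stream (body, anchor, title, ...)
    doc_freq     : hypothetical dict mapping term -> document frequency
    num_docs     : hypothetical collection size N
    """
    tf = Counter(field_tokens)

    def idf(term):
        # Assumed IDF form: log(N / df); terms with df == 0 contribute 0.
        df = doc_freq.get(term, 0)
        return math.log(num_docs / df) if df > 0 else 0.0

    # Assumption: each distinct query term is counted once.
    terms = list(dict.fromkeys(query_terms))
    # Query terms that also occur in this stream (q ∩ d).
    matched = [t for t in terms if t in tf]

    sum_tf = sum(tf[t] for t in matched)
    sum_idf = sum(idf(t) for t in terms)
    sum_tfidf = sum(tf[t] * idf(t) for t in matched)
    length = len(field_tokens)  # LEN(d) taken as the token count
    return sum_tf, sum_idf, sum_tfidf, length


# Example: a two-term query scored against a three-token title.
features = field_features(
    ["solar", "energy"],
    ["solar", "power", "solar"],
    {"solar": 10, "energy": 100},
    1000,
)
```

Calling the function once per stream (body, anchor, title, URL, whole document) would yield values analogous to features 1 to 20. BM25, the LMIR variants, and the link-based features 41 to 46 need additional statistics and are not covered here.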