Even when it is common to all texts, a concept with a high semantic weight should not have a null $idf$ value. To avoid this distortion, constants were added to the $idf$ formula so that it behaves much like the original but never yields a null result, as can be seen in (3).
$$idf_k = \log\left(\frac{N + 2}{n_k + 1}\right) \qquad (3)$$
The new constants in (3) do not cause a significant difference in the cases that yield non-null results in (2). This follows from the expected magnitudes of $N$ and $n_k$, which correspond, respectively, to a large number of documents in the collection and to any frequency of the considered concept.
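The effect of the constants can be checked numerically. The sketch below assumes that (2) is the classic form log(N / n_k) and that the logarithm is natural; neither is stated in this section, and the collection size is a hypothetical value chosen only for illustration.

```python
import math


def idf_classic(N, n_k):
    # Assumed form of (2): the classic inverse document frequency.
    return math.log(N / n_k)


def idf_smoothed(N, n_k):
    # Equation (3): the added constants keep the result above zero
    # whenever n_k <= N.
    return math.log((N + 2) / (n_k + 1))


N = 10_000  # hypothetical collection size
for n_k in (10, 1_000, N):
    print(n_k, idf_classic(N, n_k), idf_smoothed(N, n_k))
```

Under these assumptions, a concept present in every document gets a zero classic value but a small positive smoothed value. The two formulas differ most for very rare concepts.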
4.5 Normalized Frequency
For each concept $k$ of each document equivalent (class family, $d_j$), the number of occurrences of the concept, $freq_{k,j}$, is counted. All frequencies are normalized to a value between 0 and 1 by dividing by $maxword_j$, the greatest frequency among all concepts (terms) in the document equivalent. This calculation is not modified and remains the same as in the original model, (4).
$$f_{k,j} = \frac{freq_{k,j}}{maxword_j} \qquad (4)$$
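As a minimal sketch of (4), a document equivalent can be treated as a plain list of concept names. The function name and the sample concepts are hypothetical and are not taken from the original model.

```python
from collections import Counter


def normalized_frequencies(concepts):
    # freq_{k,j}: number of occurrences of each concept in the document.
    freq = Counter(concepts)
    if not freq:
        return {}
    # maxword_j: the greatest frequency among all concepts in the document.
    maxword = max(freq.values())
    return {k: count / maxword for k, count in freq.items()}


# Hypothetical document equivalent (a class family reduced to concept names).
doc_j = ["Order", "Customer", "Order", "Invoice", "Order", "Customer"]
f_j = normalized_frequencies(doc_j)
print(f_j)
```

The most frequent concept always receives the value 1, and every other concept receives a value in the interval (0, 1].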
The same rationale used for each document equivalent $d_j$ is applied to calculate the normalized frequency of each concept over all document equivalents $D$ (the entire ontology). This normalization is the same as in the original model, (5).
$$f_{k,D} = \frac{freq_{k,D}}{maxword_D} \qquad (5)$$
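One possible reading of (5) is sketched below. It assumes that $freq_{k,D}$ is the total number of occurrences of a concept summed over all document equivalents and that $maxword_D$ is the greatest such total. The text does not spell this out, so the aggregation rule, the function name, and the sample data are all assumptions.

```python
from collections import Counter


def collection_frequencies(documents):
    # freq_{k,D}: occurrences of each concept summed over every document
    # equivalent (assumed aggregation rule).
    freq_D = Counter()
    for concepts in documents:
        freq_D.update(concepts)
    if not freq_D:
        return {}
    # maxword_D: the greatest total frequency in the whole collection.
    maxword_D = max(freq_D.values())
    return {k: count / maxword_D for k, count in freq_D.items()}


# Hypothetical ontology with three document equivalents.
documents = [
    ["Order", "Customer", "Order"],
    ["Invoice", "Order"],
    ["Customer", "Invoice", "Invoice"],
]
f_D = collection_frequencies(documents)
print(f_D)
```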
4.6 Concept Weight in Documents
The classic vector model considers the normalized frequency $f_{k,j}$ and the inverse document frequency $idf_k$ in order to obtain the term weight regarding a specific document, as can be observed in (6).
$$w_{k,j} = f_{k,j} \cdot idf_k \qquad (6)$$
For the semantic approach, the semantic weight $sw_k$ was included as a multiplying factor, (7), so that it influences the resulting value.
$$w_{k,j} = f_{k,j} \cdot idf_k \cdot sw_k \qquad (7)$$
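The two weights in (6) and (7) can be sketched with a single function, assuming the $idf$ of (3) with a natural logarithm. The function name, the default `sw_k=1.0` used to recover (6), and the sample values are illustrative assumptions; how semantic weights are actually assigned is not covered in this section.

```python
import math


def concept_weight(f_kj, N, n_k, sw_k=1.0):
    # idf_k as in (3); the logarithm base is an assumption.
    idf_k = math.log((N + 2) / (n_k + 1))
    # With sw_k = 1.0 this reduces to (6); any other value gives (7).
    return f_kj * idf_k * sw_k


# Hypothetical values: 100 document equivalents, concept found in 9 of them.
classic = concept_weight(0.5, N=100, n_k=9)
semantic = concept_weight(0.5, N=100, n_k=9, sw_k=2.0)
print(classic, semantic)
```

Because the semantic weight is a plain multiplier, doubling $sw_k$ doubles the weight of the concept in the document.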
4.7 Concept Weight for Queries
In the original model, to obtain the term weight for a query, the normalized frequency needs to be calculated over the entire document set, not just over a single document. An extra factor can also be inserted in this calculation; it appears in (8) with the value 0.5.
$$w_{k,q} = (0.5 + 0.5 \cdot f_{k,D}) \cdot idf_k \qquad (8)$$
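A minimal sketch of (8) follows, again assuming the $idf$ of (3) with a natural logarithm. The function name and the sample values are hypothetical.

```python
import math


def query_weight(f_kD, N, n_k):
    # idf_k as in (3); the logarithm base is an assumption.
    idf_k = math.log((N + 2) / (n_k + 1))
    # Equation (8): the 0.5 factor keeps the frequency term in [0.5, 1].
    return (0.5 + 0.5 * f_kD) * idf_k


# Hypothetical values: 100 document equivalents, concept found in 9 of them.
low = query_weight(0.0, N=100, n_k=9)
high = query_weight(1.0, N=100, n_k=9)
print(low, high)
```

With the 0.5 factor, a concept whose collection-wide normalized frequency is close to zero still keeps half of its $idf$ value in the query weight.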