Even when it is common to all texts, a concept with high semantic weight should
not have a null idf value. To avoid this distortion, constants were added to the
idf formula so that its behavior stays close to the original without allowing a
null result, as can be seen in (3).
$$ \mathit{idf}_k = \log\left(\frac{N + 2}{n_k + 1}\right) \tag{3} $$
The new constants in (3) do not make a significant difference in the cases where
(2) yields a non-null result. This follows from the expected magnitudes of $N$ and
$n_k$: $N$ corresponds to a large number of documents in the collection, and $n_k$
to any frequency of the concept considered.
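As a rough illustration of the effect of the constants, the sketch below compares (3) with an unadjusted idf. It assumes that (2) has the classic form log(N / n_k), that N is the number of documents and n_k the number of documents containing concept k, and that the logarithm is natural. The function names and the collection size are hypothetical.

```python
import math


def idf_classic(n_docs: int, n_k: int) -> float:
    """Assumed form of (2): log(N / n_k). Not taken from the text."""
    return math.log(n_docs / n_k)


def idf_adjusted(n_docs: int, n_k: int) -> float:
    """Formula (3): log((N + 2) / (n_k + 1))."""
    return math.log((n_docs + 2) / (n_k + 1))


if __name__ == "__main__":
    n_docs = 10_000  # illustrative collection size, an assumption
    for n_k in (100, 5_000, 10_000):
        print(n_k, idf_classic(n_docs, n_k), idf_adjusted(n_docs, n_k))
```

When n_k equals N, the assumed classic form returns exactly zero, whereas (3) returns a small positive value. For the other cases the two results stay close.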
4.5 Normalized Frequency
For each concept $k$ of each document equivalent (class family, $d_j$), the number
of occurrences of the concept, $\mathit{freq}_{k,j}$, is counted. All frequencies
are normalized to a value between 0 and 1 by dividing them by $\mathit{maxword}_j$,
the greatest frequency among all concepts (terms) in the document equivalent. This
calculation is unchanged from the original model, (4).
$$ f_{k,j} = \frac{\mathit{freq}_{k,j}}{\mathit{maxword}_j} \tag{4} $$
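The normalization in (4) can be sketched as follows. This is a minimal example that assumes a document equivalent is already available as a flat list of concept labels; the function name and the sample concepts are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def normalized_frequencies(concepts: Iterable[str]) -> Dict[str, float]:
    """Apply (4): divide each concept count by the largest count in the document."""
    freq = Counter(concepts)          # freq_{k,j} for every concept k
    if not freq:
        return {}                     # empty document equivalent: nothing to normalize
    maxword = max(freq.values())      # maxword_j
    return {concept: count / maxword for concept, count in freq.items()}
```

For a document equivalent containing "car" three times, "wheel" twice and "engine" once, the most frequent concept gets 1.0 and the others get 2/3 and 1/3.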
The same rationale is used to calculate the normalized frequency of each concept
over all document equivalents $D$ (the entire ontology), rather than within a
single document equivalent $d_j$. This normalization is also the same as in the
original model, (5).
$$ f_{k,D} = \frac{\mathit{freq}_{k,D}}{\mathit{maxword}_D} \tag{5} $$
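A collection-level version of the same normalization might look like the sketch below. It assumes that freq_{k,D} is the total number of occurrences of concept k across all document equivalents and that maxword_D is the largest such total; the text does not spell out either definition, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def collection_frequencies(documents: Iterable[List[str]]) -> Dict[str, float]:
    """Apply (5) under the assumption that counts are summed over all documents."""
    total: Counter = Counter()
    for concepts in documents:
        total.update(concepts)        # accumulate freq_{k,D}
    if not total:
        return {}
    maxword_d = max(total.values())   # maxword_D (assumed: largest summed count)
    return {concept: count / maxword_d for concept, count in total.items()}
```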
4.6 Concept Weight in Documents
The classic vector model combines the normalized frequency $f_{k,j}$ and the
inverse document frequency $\mathit{idf}_k$ to obtain the weight of a term with
respect to a specific document, as can be observed in (6).
$$ w_{k,j} = f_{k,j} \cdot \mathit{idf}_k \tag{6} $$
For the semantic approach, the semantic weight $\mathit{sw}_k$ was included as a
multiplying factor, (7), so that it influences the resulting value.
$$ w_{k,j} = f_{k,j} \cdot \mathit{idf}_k \cdot \mathit{sw}_k \tag{7} $$
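The difference between (6) and (7) can be illustrated with a short sketch. It assumes the normalized frequencies, idf values and semantic weights are already available as dictionaries keyed by concept; the function name and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional


def document_weights(
    f_j: Dict[str, float],
    idf: Dict[str, float],
    sw: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Return w_{k,j} per concept: (6) when sw is None, (7) otherwise."""
    weights = {}
    for concept, f_kj in f_j.items():
        w = f_kj * idf[concept]       # classic weight, formula (6)
        if sw is not None:
            w *= sw[concept]          # semantic weight as multiplying factor, formula (7)
        weights[concept] = w
    return weights


if __name__ == "__main__":
    f_j = {"car": 1.0, "wheel": 0.5}  # illustrative values, not from the text
    idf = {"car": 0.5, "wheel": 2.0}
    sw = {"car": 2.0, "wheel": 1.0}
    print(document_weights(f_j, idf))
    print(document_weights(f_j, idf, sw))
```

In this made-up example the very common concept "car" has a low classic weight, and its semantic weight raises it to the same level as "wheel".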
4.7 Concept Weight for Queries
In the original model, obtaining the term weight for a query requires the
normalized frequency to be calculated over the entire document set, not just over
a single document. An extra factor may also be inserted in this calculation; it
appears in (8) with the value 0.5.
$$ w_{k,q} = (0.5 + 0.5 \cdot f_{k,D}) \cdot \mathit{idf}_k \tag{8} $$
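A minimal sketch of (8) follows, with the factor 0.5 hard-coded as in the formula. The function name is hypothetical, and the inputs are assumed to have been computed beforehand.

```python
def query_weight(f_kd: float, idf_k: float) -> float:
    """Formula (8): w_{k,q} = (0.5 + 0.5 * f_{k,D}) * idf_k."""
    return (0.5 + 0.5 * f_kd) * idf_k
```

Because the normalized frequency lies between 0 and 1, the resulting weight ranges from half of the idf value to the full idf value, so a rare concept in the collection still contributes to the query vector.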