That factor was originally inserted because the result would otherwise usually be
a low number. The second 0.5 factor halves the normalized frequency $f_{k,j}$, and a
constant 0.5 is added to it. This increases the value of a term that appears more
than once in a query.
In the semantic scenario there is no real reason to keep a constant value in the
term weight formula, because the semantic concepts are obtained from the ontology
structure. Keyword repetition inside the ontology does not mean that a concept is
more relevant in the domain of interest. What really matters is how concepts appear
in the ontology hierarchy and how they relate to one another.
The semantic weight should carry more weight than the other factors. Thus, the 0.5
factor was replaced by a new factor based on the semantic weight, as can be observed
in (9).
$$w_{k,q} = \bigl(sw_k + (1 - sw_k) \cdot f_{i,D}\bigr) \cdot idf_k \cdot sw_k \qquad (9)$$
This modification lets the semantic weight outweigh both the normalized frequency
$f_{i,D_j}$ and the inverse document frequency $idf_k$, computed over the document
set, when the relevance value is high.
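The effect of (9) can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: the function names are hypothetical, the semantic weight is assumed to lie in [0, 1], and the classic weight is written as (0.5 + 0.5 f) * idf, following the description of the two 0.5 factors above.

```python
def classic_term_weight(f: float, idf_k: float) -> float:
    """Classic query term weight with the two constant 0.5 factors."""
    return (0.5 + 0.5 * f) * idf_k


def semantic_term_weight(sw_k: float, f: float, idf_k: float) -> float:
    """Query term weight as written in (9).

    sw_k  : semantic weight of concept k (assumed here to be in [0, 1])
    f     : normalized frequency of the term
    idf_k : inverse document frequency of the term
    """
    if not 0.0 <= sw_k <= 1.0:
        raise ValueError("this sketch assumes 0 <= sw_k <= 1")
    return (sw_k + (1.0 - sw_k) * f) * idf_k * sw_k


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the original text.
    f, idf_k = 0.4, 2.0
    for sw_k in (0.0, 0.5, 1.0):
        print(sw_k, semantic_term_weight(sw_k, f, idf_k))
    print("classic:", classic_term_weight(f, idf_k))
```

Under these assumptions, a semantic weight of 1 reduces the weight to the inverse document frequency alone, and a semantic weight of 0 removes the term from the query vector altogether, whatever its frequency.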
4.8 Similarity between a Document and a Query
To determine the similarity between two documents or between a query and a
document, the classic vector model uses the fact that, when the angle between two
vectors is very small, the cosine of that angle approaches one. The formula used to
calculate the cosine can be seen in (10).
$$sim(d_j, q_{cf}) = \frac{\vec{d_j} \cdot \vec{q_{cf}}}{|\vec{d_j}| \times |\vec{q_{cf}}|} \qquad (10)$$
For the semantic approach the same formula is used. Because a concept vector is
created for each class family and also for each topic part, this similarity
calculation must be applied several times (the number of class families times the
number of topic parts).
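The repeated application of (10) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the original system: concept vectors are assumed to be equal-length lists of weights, the dictionary keys are hypothetical identifiers, and a zero vector is given similarity 0 by convention.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(d: Vector, q: Vector) -> float:
    """Cosine of the angle between two weight vectors, as in (10)."""
    dot = sum(a * b for a, b in zip(d, q))
    norm_d = math.sqrt(sum(a * a for a in d))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0  # convention chosen for this sketch, not from the text
    return dot / (norm_d * norm_q)


def similarity_table(
    topic_parts: Dict[str, Vector],
    class_families: Dict[str, Vector],
) -> Dict[Tuple[str, str], float]:
    """One similarity per (topic part, class family) pair."""
    return {
        (j, cf): cosine_similarity(d, q)
        for j, d in topic_parts.items()
        for cf, q in class_families.items()
    }


if __name__ == "__main__":
    # Illustrative vectors only.
    parts = {"d1": [1.0, 0.0, 2.0], "d2": [0.0, 3.0, 0.0]}
    families = {"cfA": [1.0, 0.0, 1.0], "cfB": [0.0, 1.0, 0.0]}
    for pair, value in similarity_table(parts, families).items():
        print(pair, round(value, 3))
```

The table has one entry per pair, so its size equals the number of topic parts times the number of class families.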
4.9 Considering Object Properties
Object properties establish dependency relations between classes or instances.
For each document equivalent $d_j$, each object property produces a new weight
factor, defined as the product of the two similarities involved, i.e., the
similarity between $d_j$ and the first class family $q_{cf_1}$ and the similarity
between $d_j$ and the second class family $q_{cf_2}$, as shown in (11).
$$w_{objP(cf_1, cf_2),\,j} = sim(d_j, q_{cf_1}) \cdot sim(d_j, q_{cf_2}) \qquad (11)$$
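As a small illustration of (11), the Python sketch below multiplies two precomputed similarities. The lookup structure (a dictionary keyed by document and class family identifiers) and all names are assumptions made for the example, not part of the original description.

```python
from typing import Dict, Tuple


def object_property_weight(
    sim: Dict[Tuple[str, str], float],
    j: str,
    cf1: str,
    cf2: str,
) -> float:
    """Weight of an object property linking cf1 and cf2 for document j, as in (11)."""
    return sim[(j, cf1)] * sim[(j, cf2)]


if __name__ == "__main__":
    # Illustrative similarity values only.
    sim = {("d1", "cfA"): 0.8, ("d1", "cfB"): 0.5}
    print(object_property_weight(sim, "d1", "cfA", "cfB"))
```

Because the two similarities are multiplied, the weight is high only when the document is similar to both class families joined by the property; a zero similarity to either one yields a zero weight.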
4.10 Final Ranking
The final ranking defines which topics are more relevant with respect to the class
families in the ontology. This is done by normalizing the factors to obtain the final
relevance grade for each topic part. Arithmetic averages are obtained for similarity