Retrieving Wiki Content Using an Ontology - Mining and Analyzing Social Networks


That factor was originally inserted because the result would otherwise usually be a low number. The second occurrence of the 0.5 factor halves the normalized frequency $f_{k,j}$, and a constant half is added to it. This increases the weight of a term that appears more than once in a query.

In the semantic scenario there is no real reason to keep a constant value in the term weight formula, because the semantic concepts are obtained from the ontology structure. Keyword repetition inside the ontology does not mean that a concept is more relevant in the domain of interest. What really matters is how concepts appear in the ontology hierarchy and how they relate to one another.

The semantic weight should count for more than the other factors. Thus, the 0.5 factor was replaced by a new factor based on the semantic weight, as can be observed in (9).

$$w_{k,q} = \left(sw_k + (1 - sw_k) \cdot f_{i,D_j}\right) \cdot idf_k \cdot sw_k \tag{9}$$

This modification allows the semantic weight to outweigh both the normalized frequency $f_{i,D_j}$ and the inverse document frequency $idf_k$, computed over the document set, when the relevance value is high.
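
As a rough illustration of (9), the sketch below compares a query weight that uses the constant 0.5 with the semantic variant. The function names and the sample values are assumptions made for this example, and the "classic" form is only inferred from the description of the two 0.5 factors above; it is not taken from the original system. The normalized frequency is assumed to lie in [0, 1].

```python
def classic_query_weight(f, idf):
    """Query term weight with the constant 0.5 (form assumed from the text)."""
    return (0.5 + 0.5 * f) * idf


def semantic_query_weight(sw, f, idf):
    """Query term weight following (9): sw takes the place of the 0.5 constant
    and also multiplies the whole product."""
    return (sw + (1.0 - sw) * f) * idf * sw


# Hypothetical values: normalized frequency 0.25 and idf 2.0.
f, idf = 0.25, 2.0
print("classic:", classic_query_weight(f, idf))
for sw in (0.25, 0.5, 1.0):
    print("sw =", sw, "->", semantic_query_weight(sw, f, idf))
```

With a semantic weight of 1 the frequency term drops out entirely, while a low semantic weight reduces the result twice: once inside the parentheses and once as the trailing multiplier.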

4.8 Similarity between a Document and a Query

To determine the similarity between two documents or between a query and a document, the classic vector model uses the fact that, when the angle between two vectors is very small, the cosine of that angle approaches one. The formula used to calculate the cosine can be seen in (10).

$$\mathrm{sim}(d_j, q_{cf}) = \frac{\vec{d_j} \cdot \vec{q}_{cf}}{|\vec{d_j}| \times |\vec{q}_{cf}|} \tag{10}$$

For the semantic approach the same formula is used. Because a concept vector is created for each class family and also for each topic part, this similarity calculation has to be applied several times (the number of class families times the number of topic parts).
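
The repeated application of the cosine formula can be sketched as follows. This is a minimal illustration, not the original implementation: the function names, the dictionary layout, and the sample vectors are assumptions, and returning 0 for an all-zero vector is a convention chosen here.

```python
import math


def cosine_similarity(d, q):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(x * y for x, y in zip(d, q))
    norm_d = math.sqrt(sum(x * x for x in d))
    norm_q = math.sqrt(sum(y * y for y in q))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_d * norm_q)


def similarity_table(topic_parts, class_families):
    """One similarity value per (topic part, class family) pair."""
    return {
        (part, family): cosine_similarity(d, q)
        for part, d in topic_parts.items()
        for family, q in class_families.items()
    }


# Hypothetical concept vectors over three concepts.
topic_parts = {"part_1": [1.0, 0.0, 0.0], "part_2": [3.0, 4.0, 0.0]}
class_families = {"cf_1": [1.0, 0.0, 0.0], "cf_2": [0.0, 1.0, 0.0]}
table = similarity_table(topic_parts, class_families)
print(len(table))  # two topic parts times two class families
```

The table holds one entry per pair, which matches the count given above: the number of class families times the number of topic parts.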

4.9 Considering Object Properties

Object properties establish dependency relations between classes or instances. Each of these relations produces, for each document equivalent $d_j$, a new weight factor equal to the product of the two similarities involved, i.e., the similarity between $d_j$ and the first class family $q_{cf_1}$ and the similarity between $d_j$ and the second class family $q_{cf_2}$, as shown in (11).

$$w_{objP(cf_1, cf_2),\, j} = \mathrm{sim}(d_j, q_{cf_1}) \cdot \mathrm{sim}(d_j, q_{cf_2}) \tag{11}$$
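
A small sketch of (11) for a single document follows. The function name, the class family names, and the similarity values are hypothetical and serve only to show the multiplication; the similarities are assumed to have been computed beforehand.

```python
def object_property_weights(similarities, object_properties):
    """Apply (11) for one document d_j.

    similarities: dict mapping a class family name to sim(d_j, q_cf)
    object_properties: iterable of (cf_1, cf_2) pairs linked by a property
    """
    return {
        (cf_1, cf_2): similarities[cf_1] * similarities[cf_2]
        for cf_1, cf_2 in object_properties
    }


# Hypothetical similarities of one document to three class families.
sims = {"Author": 0.8, "Publication": 0.5, "Venue": 0.0}
props = [("Author", "Publication"), ("Publication", "Venue")]
weights = object_property_weights(sims, props)
print(weights)
```

Because the two similarities are multiplied, the weight is high only when the document is similar to both class families, and it is zero as soon as one of the similarities is zero.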

4.10 Final Ranking

The final ranking defines which topics are more relevant with regard to the class families in the ontology. This is done through a normalization of factors to obtain the final relevance grade for each topic part. Arithmetic averages are obtained for similarity
