Retrieving Wiki Content Using an Ontology - Mining and Analyzing Social Networks

A topic in a wiki corresponds to a document. It is composed of an article and a discussion. For relevance, one calculation is done for the article and another for the corresponding discussion. The higher of the two relevance grades is taken as the topic relevance.
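The combination rule is a maximum over the two parts of a topic. A minimal Python sketch, assuming each part's relevance grade has already been computed as a number; the function name `topic_relevance` is hypothetical:

```python
def topic_relevance(article_grade: float, discussion_grade: float) -> float:
    """Relevance of a wiki topic: the higher of its article and discussion grades."""
    return max(article_grade, discussion_grade)


# A topic whose discussion matches the query better than its article.
print(topic_relevance(0.42, 0.71))  # → 0.71
```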

To decide whether a part of a topic is relevant, the information retrieval vector model is used. Given a set of documents and a query for retrieving the most relevant ones, each document is represented as a vector. Each vector element corresponds to a distinct term in the document set over which the query is performed. If a term occurs in a specific document, its value in the corresponding vector is non-zero. There are different ways of computing term values, also known as term weights. If n is the number of terms in the vectors of a given document set, each vector can be seen as a point in an n-dimensional space. A vector is defined for the query in the same way, as if it were a document. The similarity between a document and the query can then be measured by how close their corresponding points in n-space are, commonly computed as the cosine of the angle between the two vectors.
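The vector model can be sketched with TF-IDF term weights (term frequency times inverse document frequency) and cosine similarity. TF-IDF and cosine are one common choice and are assumed here; the text does not say which weighting scheme or similarity measure the tool uses. Documents and the query are assumed to be already split into lowercase tokens, and the function names are illustrative:

```python
import math
from collections import Counter


def idf_weights(docs):
    """Inverse document frequency for every term in the document set."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def to_vector(tokens, vocab, idf):
    """Map a token list to a TF-IDF vector; terms outside the vocabulary are ignored."""
    tf = Counter(tokens)
    return [tf[term] * idf[term] for term in vocab]


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = [
    "ontology classes and instances".split(),
    "wiki article and discussion".split(),
    "wiki ontology relevance".split(),
]
idf = idf_weights(docs)
vocab = sorted(idf)  # one vector element per distinct term in the document set
doc_vectors = [to_vector(d, vocab, idf) for d in docs]
query = to_vector("ontology relevance".split(), vocab, idf)  # the query as a document

# Document indices ordered from most to least similar to the query.
ranking = sorted(range(len(docs)), key=lambda i: cosine(doc_vectors[i], query), reverse=True)
print(ranking)  # → [2, 0, 1]
```

Cosine similarity is used rather than raw Euclidean distance so that long documents are not penalized merely for having larger vectors.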

The information retrieval vector model is essentially a classification model that can handle large volumes of data [6].

The mathematical formulas of the classic vector model were modified to take into account the semantic nature of the ontology elements, for example a concept that is related to other concepts. The relevance of several associated semantic terms should be weighted higher than the relevance of an isolated one.
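The modified formulas themselves are not given here. The sketch below only illustrates the stated principle, that a concept whose related terms also appear in a document should score higher than an isolated one; the multiplicative rule, the function name and the `boost` value of 0.5 are all hypothetical, not the authors' formula:

```python
def concept_relevance(base_weight, related_terms, doc_terms, boost=0.5):
    """Hypothetical semantic adjustment: raise a concept's weight by `boost`
    (as a fraction of the base weight) for each related term found in the document."""
    found = sum(1 for term in related_terms if term in doc_terms)
    return base_weight * (1 + boost * found)


doc_terms = {"wiki", "article", "discussion"}
# Two of the three related terms co-occur, so the weight doubles.
print(concept_relevance(1.0, ["article", "discussion", "page"], doc_terms))  # → 2.0
# An isolated concept keeps its base weight.
print(concept_relevance(1.0, ["template"], doc_terms))  # → 1.0
```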

3 Ontology Definition

A tool was designed to retrieve the recent and relevant content of a wiki; the wiki's location must be supplied together with the ontology to be used.

Using OWL annotation and object properties, synonyms, related verbs, and relevance weighting adjustments can be attached to each ontology class so that relevance grades can be calculated.
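Once read from the OWL file, the extra information attached to a class can be held in a simple record and expanded into the terms used for matching. A minimal sketch; the class name, field names and the use of the weight adjustment as a multiplier are assumptions, since the excerpt does not show how the tool stores them, and parsing the OWL file itself is omitted:

```python
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    """Hypothetical in-memory view of one ontology class and its annotations."""
    name: str
    synonyms: list = field(default_factory=list)
    related_verbs: list = field(default_factory=list)
    weight_adjustment: float = 1.0  # assumed multiplier on the class's relevance grade

    def search_terms(self):
        """All terms that count as a mention of this class when matching text."""
        return [self.name, *self.synonyms, *self.related_verbs]


wiki = OntologyClass("Wiki", synonyms=["WikiSite"], related_verbs=["edit"], weight_adjustment=1.2)
print(wiki.search_terms())  # → ['Wiki', 'WikiSite', 'edit']
```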

3.1 Classes and Instances

The Protégé editor allows the creation and maintenance of classes, subclasses and

instances in an ontology. Class names should be keywords that reference main

concepts in the domain of interest.

Relevance grades for a given class are calculated according to the depth of the class in the ontology hierarchy. For instance, in a three-level class hierarchy, a first-level class receives a relevance weight of 0.33, a second-level class receives 0.66, and the leaf classes in the hierarchy tree receive 1.
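The quoted values match a class's level divided by the number of levels (1/3, 2/3 and 3/3; the 0.66 above is 2/3 truncated rather than rounded). A sketch under that assumption, with the hierarchy given as a hypothetical child-to-parent map; it also assumes every leaf sits at the deepest level, which is what gives leaves a weight of 1:

```python
def class_level(cls, parent):
    """Level of a class in the hierarchy: 1 for a top-level class, +1 per ancestor."""
    level = 1
    while cls in parent:
        cls = parent[cls]
        level += 1
    return level


def depth_weight(level, max_depth):
    """Assumed relevance weight: level divided by the number of levels."""
    if not 1 <= level <= max_depth:
        raise ValueError("level must be between 1 and max_depth")
    return level / max_depth


# Hypothetical three-level hierarchy: Wiki > Topic > Article.
parent = {"Topic": "Wiki", "Article": "Topic"}
weights = {c: round(depth_weight(class_level(c, parent), 3), 2) for c in ("Wiki", "Topic", "Article")}
print(weights)  # → {'Wiki': 0.33, 'Topic': 0.67, 'Article': 1.0}
```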

When querying classes, the developed tool recognizes compound words written either with underscores or in CamelBackCase (camel case) syntax.
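Compound class names can be split into their component words before matching. A minimal sketch using a regular expression; the tool's exact splitting rules are not given, so the handling of acronyms and digits here, and the function name, are assumptions:

```python
import re


def split_compound(name):
    """Split a class name written with underscores or in CamelCase into lowercase words."""
    words = []
    for part in name.split("_"):
        # Acronym runs (capitals not followed by lowercase), capitalized or
        # lowercase words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


print(split_compound("WikiDiscussionPage"))  # → ['wiki', 'discussion', 'page']
print(split_compound("social_network"))      # → ['social', 'network']
print(split_compound("HTMLParser"))          # → ['html', 'parser']
```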

3.2 Annotation Properties

Three annotation properties were chosen to extend the semantic meaning of

classes or instances in an OWL ontology. Their meanings are: