8.3.1.1 Boolean models
The Boolean model is the simplest retrieval model, based on Boolean algebra
and set theory. The concept is simple and intuitive. The Boolean model has
two main drawbacks: 1) users may have difficulty expressing their information
needs as Boolean expressions; and 2) the retrieval system can hardly rank
documents, since a document is predicted to be either relevant or non-relevant
without any notion of degree of relevance. Nevertheless, the Boolean model is
widely used in commercial search engines because of its simplicity and
efficiency. Using relevance feedback from the user to refine a Boolean query
is not straightforward, so the Boolean model was extended for this purpose (34).
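To make the set-theoretic view concrete, the following is a minimal sketch of Boolean retrieval over an inverted index. The function names (build_inverted_index, boolean_and, boolean_or, boolean_not) and the toy documents are illustrative assumptions, not taken from any particular system. Note that each query returns an unordered set of matching documents, which illustrates the second drawback above: there is no degree of relevance to rank by.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Documents containing every given term (set intersection)."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

def boolean_or(index, *terms):
    """Documents containing at least one given term (set union)."""
    return set().union(*(index.get(t, set()) for t in terms))

def boolean_not(index, all_ids, term):
    """Documents that do not contain the term (set difference)."""
    return set(all_ids) - index.get(term, set())

# Toy collection (hypothetical data for illustration only).
docs = {
    1: "boolean retrieval model",
    2: "vector space model",
    3: "probabilistic retrieval model",
}
index = build_inverted_index(docs)

# Query: retrieval AND model
both = boolean_and(index, "retrieval", "model")
# Query: model AND NOT vector
not_vector = boolean_and(index, "model") & boolean_not(index, docs, "vector")
```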
8.3.1.2 Vector space models
The vector space model is a widely implemented IR model, most famously built
into the SMART system (52). It represents documents and user queries in a
high-dimensional space indexed by “indexing terms,” and assumes that the
relevance of a document can be measured by the similarity between it and
the query in that space (51). In the vector space framework, relevance
feedback is used either to reformulate a query vector so that it is closer to
the relevant documents, or for query expansion, so that additional terms from
the relevant documents are added to the original query. The most famous
algorithm is the Rocchio algorithm (50), which represents a user query as
a linear combination of the original query vector, the centroid of the
relevant documents, and the centroid of the non-relevant documents.
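The linear combination described above can be sketched as follows. This is a minimal illustration under stated assumptions: vectors are plain Python lists of term weights, similarity is cosine similarity (one common choice; the text does not fix a particular measure), and the coefficients alpha, beta, and gamma are illustrative defaults rather than values prescribed by the source. Negative weights in the result are left unclipped here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Reformulated query: alpha*q + beta*centroid(rel) - gamma*centroid(non_rel).

    alpha, beta, gamma are assumed, illustrative weights.
    """
    dim = len(query)
    rel_c = centroid(relevant, dim)
    non_c = centroid(non_relevant, dim)
    return [alpha * q + beta * r - gamma * n
            for q, r, n in zip(query, rel_c, non_c)]

# Hypothetical three-term vocabulary and feedback judgments.
query = [1.0, 0.0, 0.0]
relevant = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
non_relevant = [[0.0, 0.0, 1.0]]

new_query = rocchio(query, relevant, non_relevant)
```

In this toy example the reformulated query gains weight on the second term, which occurs only in the relevant documents, so its cosine similarity to those documents increases relative to the original query.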
A major criticism of the vector space model is that its performance depends
heavily on the representation, while the choice of representation is heuristic,
because the vector space model itself provides no theoretical framework
for selecting key terms or setting term weights.
8.3.1.3 Probabilistic models
Probabilistic models, such as the Binary Independence Model (BIM) (44),
provide direct guidance on term weighting and term selection based on
probability theory. In these probabilistic models, the probability that a
document d is relevant to a user query q is modelled explicitly (43) (44) (23).
Using relevance feedback to improve parameter estimation in probabilistic
models is straightforward given the definition of the models, because they
presuppose relevance information.
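As a sketch of how relevance feedback can feed parameter estimation in a BIM-style model, the code below computes a log-odds term weight from feedback counts and scores a document by summing the weights of the query terms it contains. The specific estimator (adding 0.5 to each count, a form usually associated with Robertson and Sparck Jones) is an assumption chosen for illustration; the text above does not specify one. The names rsj_weight and bim_score are hypothetical.

```python
import math

def rsj_weight(r, n, R, N):
    """Log-odds weight of a term, estimated from relevance feedback.

    r: known relevant documents that contain the term
    n: documents in the collection that contain the term
    R: known relevant documents
    N: documents in the collection
    The 0.5 smoothing constants are an assumed, commonly used choice.
    """
    p = (r + 0.5) / (R + 1.0)          # P(term present | relevant)
    u = (n - r + 0.5) / (N - R + 1.0)  # P(term present | non-relevant)
    return math.log(p * (1.0 - u) / (u * (1.0 - p)))

def bim_score(query_terms, doc_terms, weights):
    """Sum the weights of query terms that occur in the document."""
    return sum(weights[t] for t in query_terms
               if t in doc_terms and t in weights)

# Hypothetical counts: 100 documents, 10 judged relevant.
weights = {
    "retrieval": rsj_weight(r=8, n=10, R=10, N=100),  # mostly in relevant docs
    "the": rsj_weight(r=0, n=50, R=10, N=100),        # only in non-relevant docs
}
score = bim_score(["retrieval", "the"], {"retrieval", "model"}, weights)
```

With no feedback at all (r = 0 and R = 0) this estimator reduces to log((N - n + 0.5) / (n + 0.5)), an inverse-document-frequency-like weight, and each new relevance judgment simply updates the counts.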
In recent decades, many researchers have proposed IR models that are more
general and that also explain existing IR models. For example, inference
networks have been successfully implemented in the well-known INQUERY
retrieval system (57). Bayesian networks extend the view of inference
networks. Both models represent documents and queries using acyclic graphs.
Unfortunately, neither model provides a sound theoretical framework to