user vector). For example, given the threshold α = 0.4, if the frequency of term j in
document i is equal to or greater than α, we consider that term j exists in document i;
otherwise term j does not exist in document i. The result is a Boolean
vector, so user U is modeled as a Boolean document vector. Such a vector is also
called a Boolean user vector: U = (1, 0, 0, 0). According to table III.3.19, we have
Table 16. Boolean document vector (or Boolean user vector)
Term                    Existence
computer                1
programming language    0
algorithm               0
derivative              0
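As a minimal sketch of the thresholding step, the following Python snippet turns a term-frequency vector into a Boolean vector with α = 0.4. The frequency values and the helper name to_boolean_vector are hypothetical (table III.3.19 is not reproduced here); the values were chosen only so that the result matches U = (1, 0, 0, 0).

```python
# Hypothetical normalized term frequencies for user U (illustrative values only;
# the actual figures are those of table III.3.19, which is not shown here).
TERMS = ["computer", "programming language", "algorithm", "derivative"]
frequencies = [0.5, 0.3, 0.2, 0.0]

ALPHA = 0.4  # threshold from the text


def to_boolean_vector(freqs, alpha):
    """Return 1 where the frequency is >= alpha, otherwise 0."""
    return tuple(1 if f >= alpha else 0 for f in freqs)


U = to_boolean_vector(frequencies, ALPHA)
for term, bit in zip(TERMS, U):
    print(f"{term}: {bit}")
```

A frequency exactly equal to α counts as existence, which follows the "equal to or greater than" rule stated above.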
The Boolean user vector is considered a document, and the classes of such a
document are considered “user interests”. Now document (Boolean user vector)
U becomes a data tuple that is fed to the trained neural network in figure III.3.10. The
class of document U is determined by checking the value of the output unit of the
trained neural network. Supposing this output value is 0, we can infer that
document U belongs to class “computer science”. So the interest of user U is
“computer science”.
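The inference step can be illustrated with a deliberately simplified stand-in for the trained network: a single sigmoid output unit whose value is rounded to 0 or 1. The weights, the bias, and the label for output 1 are assumptions made for this sketch; they are not the parameters of the network in figure III.3.10, whose structure may differ.

```python
import math

# Hypothetical weights and bias standing in for the trained network of
# figure III.3.10; they are NOT the values from the source.
WEIGHTS = [-2.0, -1.0, 1.0, 2.0]
BIAS = 0.5

# Output 0 -> "computer science" follows the text; the label for output 1
# is a placeholder, since the text here does not name the other class.
CLASS_LABELS = {0: "computer science", 1: "other class"}


def classify(vector):
    """Single sigmoid output unit, rounded to 0 or 1."""
    z = sum(w * x for w, x in zip(WEIGHTS, vector)) + BIAS
    activation = 1.0 / (1.0 + math.exp(-z))
    return 1 if activation >= 0.5 else 0


U = (1, 0, 0, 0)                      # Boolean user vector from Table 16
output = classify(U)                  # value of the output unit
user_interest = CLASS_LABELS[output]  # class of document U = user interest
print(output, user_interest)
```

With these assumed parameters the output unit yields 0 for U, so the sketch reproduces the inference described above: the interest of user U is “computer science”.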
6 Evaluation
Our approach includes the following steps:
1. Documents are represented as vectors.
2. Documents are classified by using a decision tree, a support vector machine, or a
   neural network.
3. The user's access history is mined to find maximum frequent itemsets. Each
   itemset is considered an interesting document.
4. The classifiers are applied to the interesting documents in order to find their suitable
   classes. These classes are user interests.
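The mining step above can be sketched by brute force. The snippet assumes that "maximum frequent itemsets" means frequent itemsets with no frequent proper superset; the sessions, the minimum support of 0.5, and the function names are hypothetical and serve only as an illustration. A real system would use an algorithm such as Apriori rather than enumerating every subset.

```python
from itertools import combinations

# Hypothetical access history: each session is the set of terms the user accessed.
sessions = [
    {"computer", "algorithm"},
    {"computer", "algorithm", "derivative"},
    {"computer", "programming language"},
    {"computer", "algorithm"},
]
MIN_SUPPORT = 0.5  # hypothetical minimum support (fraction of sessions)


def support(itemset, sessions):
    """Fraction of sessions that contain every item of the itemset."""
    return sum(itemset <= s for s in sessions) / len(sessions)


def maximal_frequent_itemsets(sessions, min_support):
    """Enumerate all itemsets, keep the frequent ones, then keep only
    those that have no frequent proper superset."""
    items = sorted(set().union(*sessions))
    frequent = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if support(frozenset(c), sessions) >= min_support
    ]
    return [f for f in frequent if not any(f < g for g in frequent)]


result = maximal_frequent_itemsets(sessions, MIN_SUPPORT)
print([sorted(itemset) for itemset in result])
```

In this toy history the only itemset that survives is {algorithm, computer}; under the approach described above, it would then be treated as an interesting document and passed to a classifier.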
Two new points of view are inferred from these steps:
- The series of accesses in a user's history is modeled as documents, so the
  user is indirectly referred to as a document.
- User interests are the classes to which such documents belong.
The technique for constructing the vector model that represents documents is not
important to this approach. There are several text segmentation algorithms for
specifying all terms in documents. From these terms, it is easy to build document
vectors by computing term frequency and inverse document frequency. However,
the document classification techniques concerned, such as SVM, decision tree,
and neural network, strongly influence this approach. SVM and neural