Table 12 The maximum frequent itemset that the user searches
No.  Itemset
1    computer, programming language, algorithm, derivative
We propose a new point of view: “the maximum frequent itemsets are considered
as documents, and the classes of such documents are considered as user
interests”. Such documents may be called interesting documents, and the classes
to which they belong are the user's interests. In other words, discovering a
user's interests amounts to classifying interesting documents. Suppose we have a
set of classes C = {computer science, math}, a set of terms T = {computer,
programming language, algorithm, derivative}, and the set of classification
rules in table 6. Each maximum frequent itemset that the user searches is
modeled as a document vector (a so-called interesting document vector or user
interest vector) whose elements are the supports of its member items. Note that
the supports of these items are shown in table 8.
Table 13 Interesting document vector
No.  Vector
1    (computer=4, programming language=2, algorithm=2, derivative=2)
Table 14 Normalized interesting document vector
No.  Vector
1    (computer=0.4, programming language=0.2, algorithm=0.2, derivative=0.2)
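The step from Table 13 to Table 14 can be sketched as follows. The text does not spell out the normalization formula, so this sketch assumes that each support is divided by the sum of all supports in the vector (4 + 2 + 2 + 2 = 10). That assumption reproduces the values in Table 14. The function name `normalize` is hypothetical.

```python
# Minimal sketch, assuming normalization divides each support by the sum of
# all supports in the vector (this reproduces Table 14 from Table 13).

def normalize(vector: dict[str, float]) -> dict[str, float]:
    """Scale the supports of an interesting document vector so they sum to 1."""
    total = sum(vector.values())
    if total == 0:
        # Avoid division by zero for an empty or all-zero vector.
        return {term: 0.0 for term in vector}
    return {term: value / total for term, value in vector.items()}


# Interesting document vector from Table 13 (supports of the itemset members).
interest_vector = {
    "computer": 4,
    "programming language": 2,
    "algorithm": 2,
    "derivative": 2,
}

normalized = normalize(interest_vector)
print(normalized)
```

Dividing by the total keeps the relative weight of each term. For example, "computer" accounts for 4 of the 10 support units, which gives 0.4.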
Table 15 Nominal interesting document vector
No.  Vector
1    (computer=medium, programming language=medium, algorithm=medium,
     derivative=medium)
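The conversion from Table 14 to Table 15 maps each normalized value to a nominal level. The actual cut points belong to the classification scheme defined earlier in the text and are not reproduced in this section. The levels "low" and "high" and the thresholds `LOW_MAX = 0.1` and `MEDIUM_MAX = 0.5` below are hypothetical. They were chosen only so that 0.2 and 0.4 both fall into "medium", as Table 15 shows.

```python
# Hypothetical cut points (NOT taken from the source): chosen only so that
# the normalized values 0.2 and 0.4 both map to "medium", as in Table 15.
LOW_MAX = 0.1     # values below this are "low"
MEDIUM_MAX = 0.5  # values in [LOW_MAX, MEDIUM_MAX) are "medium"; the rest "high"


def to_nominal(value: float) -> str:
    """Map one normalized support to a nominal level (hypothetical levels)."""
    if value < LOW_MAX:
        return "low"
    if value < MEDIUM_MAX:
        return "medium"
    return "high"


def discretize(vector: dict[str, float]) -> dict[str, str]:
    """Turn a normalized interesting document vector into a nominal one."""
    return {term: to_nominal(value) for term, value in vector.items()}


# Normalized vector from Table 14.
normalized = {
    "computer": 0.4,
    "programming language": 0.2,
    "algorithm": 0.2,
    "derivative": 0.2,
}

nominal = discretize(normalized)
print(nominal)
```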
Documents can be classified with an SVM, a decision tree, or a neural network.
Here we use a decision tree as the sample classifier for convenience, because we
intend to reuse the classification rules from section III; the SVM approach
would instead require determining the weight vector W*. However, the SVM
approach is more powerful than a decision tree for document classification when
the training data is huge.
Applying classification rule 2, the interesting document belongs to the class
computer science, because the frequencies of “derivative” and “computer” are
both medium. So we can state that user U has only one interest: computer
science.
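The rule-based classification above can be sketched as follows. Only rule 2 is encoded, using the conditions the text states: derivative = medium and computer = medium lead to computer science. The other rules of table 6 are not reproduced here. The `(conditions, class)` rule representation, the first-match strategy, and the name `classify` are assumptions made for illustration. The user's interests are collected as the set of classes of all interesting documents, following the point of view proposed above.

```python
from typing import Optional

# Each rule: (required nominal value per term, class label).
# Only rule 2 as described in the text is encoded; the remaining rules of
# table 6 are omitted, so this list is deliberately incomplete.
RULES = [
    ({"derivative": "medium", "computer": "medium"}, "computer science"),
]


def classify(nominal_vector: dict[str, str],
             rules: list[tuple[dict[str, str], str]]) -> Optional[str]:
    """Return the class of the first rule whose conditions all match, else None."""
    for conditions, label in rules:
        if all(nominal_vector.get(term) == value for term, value in conditions.items()):
            return label
    return None


# Interesting documents of user U: here only the nominal vector from Table 15.
documents = [
    {"computer": "medium", "programming language": "medium",
     "algorithm": "medium", "derivative": "medium"},
]

# The user's interests are the classes the interesting documents belong to.
interests = {label for doc in documents if (label := classify(doc, RULES)) is not None}
print(interests)
```

A set is used so that several interesting documents falling into the same class yield a single interest.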
Note that when a neural network is used for document classification, the
interesting document vector is specified as a Boolean document vector (or Boolean