Discovering User Interests by Document Classification - Mining and Analyzing Social Networks

user vector). For example, given a threshold α = 0.4, if the frequency of term j in document i is greater than or equal to α, we consider that term j exists in document i; otherwise, term j does not exist in document i. The result is a Boolean vector, so user U is modeled as a Boolean document vector, also called a Boolean user vector: U = (1, 0, 0, 0). According to table III.3.19, we have

Table 16 Boolean document vector (or Boolean user vector)

Term                  Existence
computer              1
programming language  0
algorithm             0
derivative            0
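The thresholding rule can be sketched in a few lines of Python. This is only an illustration: the function name `to_boolean_vector` is hypothetical, and the frequencies are made-up values chosen to reproduce U = (1, 0, 0, 0); they are not the values of table III.3.19.

```python
def to_boolean_vector(frequencies, alpha=0.4):
    """Return 1 for each term whose frequency is >= alpha, otherwise 0."""
    return [1 if f >= alpha else 0 for f in frequencies]


terms = ["computer", "programming language", "algorithm", "derivative"]

# Hypothetical term frequencies for user U (assumed, not taken from the source).
frequencies = [0.5, 0.3, 0.1, 0.1]

user_vector = to_boolean_vector(frequencies, alpha=0.4)

# Pair each term with its existence flag, as in Table 16.
for term, exists in zip(terms, user_vector):
    print(f"{term:22s}{exists}")
```

A frequency exactly equal to α counts as existence, which matches the "equal or greater than" condition stated above.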

The Boolean user vector is treated as a document, and the classes of that document are treated as the user's interests. The document (Boolean user vector) U then becomes a data tuple that is fed to the trained neural network in figure III.3.10. The class of document U is read from the value of the output unit of the trained network. Suppose this output value is 0; we can then infer that document U belongs to the class “computer science”, so the interest of user U is “computer science”.

6 Evaluation

Our approach includes the following steps:

− Documents are represented as vectors.

− Documents are classified using a decision tree, a support vector machine or a neural network.

− The user's access history is mined to find maximum frequent itemsets. Each itemset is considered an interesting document.

− The classifiers are applied to the interesting documents to find their suitable classes. These classes are the user interests.
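The mining step in the list above can be illustrated with a brute-force search for maximal frequent itemsets. This is a minimal sketch, not the algorithm used in the chapter: the function name, the representation of the access history as a list of term sets (one per session), and the `min_support` value are all assumptions made for the example. Exhaustive enumeration is exponential in the number of terms and is only suitable for tiny inputs.

```python
from itertools import combinations


def maximal_frequent_itemsets(sessions, min_support):
    """Brute-force search for maximal frequent itemsets.

    sessions: list of sets of terms, one set per access session (assumed format).
    min_support: minimum number of sessions that must contain an itemset.
    """
    items = sorted(set().union(*sessions))
    frequent = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for s in sessions if itemset <= s)
            if support >= min_support:
                frequent.append(itemset)
    # An itemset is maximal if no frequent itemset is a proper superset of it.
    return [a for a in frequent if not any(a < b for b in frequent)]


# Hypothetical access history: the terms seen in each of four sessions.
sessions = [
    {"computer", "algorithm"},
    {"computer", "algorithm", "derivative"},
    {"computer", "programming language"},
    {"computer", "algorithm"},
]

interesting = maximal_frequent_itemsets(sessions, min_support=3)
print(interesting)
```

Each returned itemset would then be treated as an interesting document, converted to a vector, and passed to a trained classifier; that last step is omitted here because it depends on the classifier chosen.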

Two new points of view follow from these steps:

− The series of accesses in a user's history is modeled as documents, so the user is indirectly treated as a document.

− User interests are the classes to which such documents belong.

The technique used to construct the vector model for representing documents is not important to this approach. There are text segmentation algorithms for identifying all terms in documents; from these terms, it is easy to build document vectors by computing term frequency and inverse document frequency. However, the document classification techniques concerned, such as SVM, decision tree and neural network, strongly influence this approach. SVM and neural
