Discovering User Interests by Document Classification - Mining and Analyzing Social Networks

Discovering User Interests by Document Classification

Loc Nguyen

User interest is one of the personal traits that attract researchers' attention in user modeling and user profiling. User interest competes with user knowledge to be the most important characteristic in the user model. Adaptive systems need to know user interests so that they can provide adaptation to users. For example, adaptive learning systems tailor learning materials (lessons, examples, exercises, tests…) to user interests. I propose a new approach for discovering user interests based on document classification. The basic idea is to consider user interests as classes of documents, so the process of classifying documents is also the process of discovering user interests. There are two new points of view:

− The series of accesses in a user's history is modeled as a document, so the user is referred to indirectly as a “document”.

− User interests are the classes to which such documents belong.

Our approach includes the following four steps:

1. Documents in the training corpus are represented according to the vector model. Each element of a vector is the product of the term frequency and the inverse document frequency. However, the inverse document frequency can be dropped from each element for convenience.

2. The training corpus is classified by applying a decision tree, a support vector machine, or a neural network. Classification rules (or weight vectors W*) are drawn from the decision tree (or support vector machine) and are used as classifiers.

3. The user's access history is mined to find maximal frequent itemsets. Each itemset is considered an interesting document, and its member items are considered terms. Such interesting documents are modeled as vectors.

4. The classifiers (see step 2) are applied to these interesting documents in order to choose the classes most suitable to them. Such classes are the user interests.
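
Step 1 can be illustrated with a minimal sketch, assuming documents are already tokenized into lists of terms and using the common idf variant log(N / df); the chapter does not fix the exact formula, and the names `tfidf_vectors` and `use_idf` are hypothetical. Setting `use_idf=False` corresponds to dropping the inverse document frequency "for convenience".

```python
import math
from collections import Counter

def tfidf_vectors(docs, use_idf=True):
    """Represent each tokenized document as a vector over a shared vocabulary.

    tf(t, d) = raw count of term t in document d
    idf(t)   = log(N / df(t)), where N is the number of documents and df(t)
               the number of documents containing t (one common variant, assumed here)
    With use_idf=False each element is just the term frequency.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = []
        for term in vocab:
            weight = tf[term]
            if use_idf:
                weight *= math.log(n_docs / df[term])
            vec.append(weight)
        vectors.append(vec)
    return vocab, vectors

# Toy training corpus (made-up terms, for illustration only).
corpus = [
    ["java", "class", "object", "class"],
    ["java", "thread", "lock"],
    ["sql", "table", "join", "table"],
]
vocab, vecs = tfidf_vectors(corpus)
vocab_tf, vecs_tf = tfidf_vectors(corpus, use_idf=False)
print(vocab)
print(vecs_tf[0])
```

With the idf factor kept, a term that occurs in every document gets weight 0, which is why tf-idf down-weights uninformative terms.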
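
For step 2, the chapter does not say how the support vector machine is trained. The sketch below swaps in one specific technique, the Pegasos stochastic sub-gradient method for a linear SVM without a bias term, trained one-vs-rest so that each class gets its own weight vector W*. The function names, the regularization value `lam`, and the toy data are assumptions for illustration.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pegasos_train(xs, ys, lam=0.01, iters=2000, seed=0):
    """Binary linear SVM (labels +1/-1, no bias term) trained with Pegasos,
    i.e. stochastic sub-gradient descent on the regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(xs))
        eta = 1.0 / (lam * t)
        margin = ys[i] * dot(w, xs[i])  # margin under the current w
        # Shrink w: sub-gradient step for the regularization term.
        w = [(1.0 - eta * lam) * wj for wj in w]
        # If the example violates the margin, move w toward it.
        if margin < 1.0:
            w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w

def train_one_vs_rest(xs, labels, **kwargs):
    """One weight vector per class c, trained as 'c versus all other classes'."""
    return {
        c: pegasos_train(xs, [1 if lab == c else -1 for lab in labels], **kwargs)
        for c in sorted(set(labels))
    }

def classify(weights, x):
    """Return the class whose weight vector gives x the highest score."""
    return max(weights, key=lambda c: dot(weights[c], x))

# Toy document vectors (hypothetical), one dominant term per class.
xs = [[2, 0, 0], [3, 0, 0], [0, 2, 0], [0, 1, 0], [0, 0, 4], [0, 0, 1]]
labels = ["programming", "programming", "networking", "networking",
          "databases", "databases"]
W = train_one_vs_rest(xs, labels)
print(classify(W, [1, 0, 0]))
```

If a bias term is needed, a common trick is to append a constant feature 1 to every vector. A decision tree would instead yield if-then rules over term weights; that alternative is not shown here.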
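
The chapter does not specify which algorithm mines the access history in step 3. One possible sketch, assuming each session in the history is a set of accessed item IDs, uses a simple Apriori-style level-wise search and then keeps only the maximal itemsets (those with no frequent proper superset). The session data, item names, and `min_support` value are made up for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for every itemset contained in at
    least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent = []
    k = 1
    while level:
        # Count support: number of transactions that contain the candidate.
        counted = [c for c in level
                   if sum(c <= t for t in transactions) >= min_support]
        frequent.extend(counted)
        # Join frequent k-itemsets into (k+1)-candidates and keep only those
        # whose k-subsets are all frequent (Apriori pruning).
        current = set(counted)
        candidates = {a | b for a in counted for b in counted if len(a | b) == k + 1}
        level = [c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k))]
        k += 1
    return frequent

def maximal_itemsets(frequent):
    """Keep the frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]

# Hypothetical access history: each session lists the items a user visited.
history = [
    {"java_intro", "java_oop", "java_threads"},
    {"java_intro", "java_oop"},
    {"java_intro", "java_oop", "sql_basics"},
    {"sql_basics", "sql_joins"},
    {"sql_basics", "sql_joins", "java_intro"},
]
maximal = maximal_itemsets(frequent_itemsets(history, min_support=2))
for itemset in maximal:
    print(sorted(itemset))
```

Each printed itemset plays the role of an "interesting document" whose items are its terms.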
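
Step 4 can be sketched as follows, assuming linear classifiers (one weight vector per class, as a support vector machine would yield) and reading "most suitable" as the highest-scoring class for each interesting document. Because every item of an itemset occurs once, its term-frequency vector with the idf factor dropped is simply 0/1 over the vocabulary. The vocabulary, class names, and weight values are hypothetical placeholders, not results from the chapter.

```python
def to_vector(itemset, vocabulary):
    """0/1 term-frequency vector of an interesting document (idf dropped)."""
    return [1 if term in itemset else 0 for term in vocabulary]

def best_class(weights, vector):
    """Class whose weight vector gives the highest dot-product score."""
    return max(weights, key=lambda c: sum(w * x for w, x in zip(weights[c], vector)))

def discover_interests(interesting_docs, vocabulary, weights):
    """User interests = the set of classes chosen for the interesting documents."""
    return sorted({best_class(weights, to_vector(doc, vocabulary))
                   for doc in interesting_docs})

vocabulary = ["java_intro", "java_oop", "sql_basics", "sql_joins"]
# Hypothetical weight vectors standing in for the classifiers of step 2.
weights = {
    "java": [1.0, 1.5, -0.5, -1.0],
    "databases": [-0.5, -1.0, 1.0, 1.5],
}
docs = [{"java_intro", "java_oop"}, {"sql_basics", "sql_joins"}]
print(discover_interests(docs, vocabulary, weights))
```

Taking only the single best class per document is a design choice; a threshold on the score would allow one interesting document to contribute several interests.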

This approach is based on document classification, but it also relates to information retrieval in the way documents are represented. Hence, section 1 discusses the vector model for representing documents. Support vector machines, decision trees, and neural networks for document classification are covered in sections 2, 3, and 4.
