Adaptive Information Filtering - Text Mining: Classification, Clustering, and Applications

Database Reference

In-Depth Information

documents follow an exponential distribution. Training data can be used to

estimate the model parameters, and the threshold can be found by optimizing

the expected utility under the estimated model (7). However, an adaptive

filtering system only receives feedback for documents delivered/rated by the

user; thus model estimation techniques based on random sampling assumption

usually lead to biased estimation and should be avoided (72).

8.3.2.2

Filtering as text classification

Text classification is another well studied area. A typical classification

system learns a classifier from a labeled training dataset, and then classifies

unlabeled testing documents into different classes. A popular approach is to

treat filtering as a text classification task by defining two classes: relevant

vs. non-relevant. The filtering system learns a user profile as a classifier

and delivers a document to the user if the classifier thinks it is relevant or

the probability of relevance is high. The state of the art text classification

algorithms, such as support vector machines (SVM), K nearest neighbors (K-

NN), neural networks, logistic regression, and Winnow, have been used to

solve this binary classification task (32) (13) (46) (64) (71)(54) (38) (61) (30)

(55).

Instead of minimizing classification error, an adaptive filtering system needs

to optimize the standard evaluation measure, such as a user utility. For ex-

ample, in order to optimize the utility measure T 11 U =2 R +

N + (Equation

8.4), a filtering system usually delivers a document to the user if the prob-

ability of relevance is above 67% (45). Some machine learning approaches,

such as logistic regression or neural networks, estimate the probability of rel-

evance directly, which makes it easier to make the binary decision of whether

to deliver a document.

Many standard text classification algorithms do not work well for a new

user, which usually means no or few training data points. Some new ap-

proaches have been developed for initialization. For example, researchers have

found that retrieval techniques, such as Rocchio, work well at the early stage

of filtering when the system has very few training data. Statistical text classi-

fication techniques, such as logistic regression, work well at the later stage of

filtering when the system has accumulated enough training data. Techniques

have been developed to combine different algorithms, and their results are

promising (71). Yet another example discussed in the following section is to

initialize the profile of a new user based on training data from existing users.

It is worth mentioning that when adapting a text classification technique

to the adaptive filtering task, one needs to pay attention that the classes are

extremely unbalanced, because most documents are not relevant. The fact

that the training data are not sampled randomly is also a problem that has

not been well studied.

−

Text Mining: Classification, Clustering, and Applications

Search WWH ::

Custom Search

Home