3.3.2 Gibbs Sampling
Gibbs sampling (GS) (12) is widely regarded as one of the most accurate
approximate inference procedures. It was originally proposed in (10) in the
context of image restoration. Unfortunately, it is also very slow, and a common
issue when implementing GS is determining when the procedure has converged.
Even though there are tests that can help one determine
convergence, they are usually too expensive or complicated to implement.
Researchers in collective classification (29; 22; 24) have developed a version
of Gibbs sampling that is easy to implement and faster than traditional GS.
The basic idea behind this algorithm is to assume, just like in the case of
ICA, that we have access to a local classifier f that can sample for the best
label estimate for Y_i given all the values for the nodes in N_i. We keep doing
this repeatedly for a fixed number of iterations (a period known as “burn-in”).
After that, not only do we sample labels for each Y_i ∈ Y, but we also maintain
count statistics of how many times we sampled label l for node Y_i. After
collecting a predefined number of such samples, we output the best label
assignment for node Y_i by choosing the label that was assigned to Y_i the
maximum number of times while collecting samples. The pseudo-code for GS is
shown in Algorithm 2. For all the experiments that we report later, we set
burn-in to 200 iterations and collected 800 samples.
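To make the procedure concrete, the following is a minimal sketch of this Gibbs
sampling variant, not the reference implementation of Algorithm 2. It assumes
that observed and currently sampled labels live in one dictionary and that the
local classifier f exposes a hypothetical predict_proba(attributes,
neighbor_labels) interface returning a probability for each label; both of
these names are assumptions made for the example.

    import random
    from collections import Counter, defaultdict

    def gibbs_sampling(nodes, neighbors, features, local_classifier, labels,
                       burn_in=200, num_samples=800, seed=0):
        # nodes            : ids of the nodes with unobserved labels
        # neighbors        : dict mapping a node id to the ids of its neighbors (N_i)
        # features         : dict mapping a node id to its local attribute vector
        # local_classifier : hypothetical object whose predict_proba(attrs, nbr_labels)
        #                    returns a dict {label: probability}
        # labels           : dict of current label assignments; observed labels stay
        #                    fixed because only the nodes in `nodes` are resampled
        rng = random.Random(seed)
        counts = defaultdict(Counter)

        def sample_label(n):
            # Sample a label for Y_i from the local classifier's distribution,
            # conditioned on the current labels of the nodes in N_i.
            probs = local_classifier.predict_proba(
                features[n], [labels[m] for m in neighbors[n]])
            candidates = list(probs)
            return rng.choices(candidates,
                               weights=[probs[c] for c in candidates], k=1)[0]

        # Burn-in: resample every unobserved node repeatedly, discarding the draws.
        for _ in range(burn_in):
            for n in nodes:
                labels[n] = sample_label(n)

        # Sampling phase: record how often each label is drawn for each node.
        for _ in range(num_samples):
            for n in nodes:
                labels[n] = sample_label(n)
                counts[n][labels[n]] += 1

        # Output the label that was sampled most often for each node.
        return {n: counts[n].most_common(1)[0][0] for n in nodes}

With burn_in=200 and num_samples=800, this matches the settings reported for
the experiments mentioned above.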
3.3.3 Local Classifiers and Further Optimizations
One of the benefits of both ICA and GS is the fact that it is fairly simple to
make use of any local classifier. Classifiers that have been used include naïve
Bayes (7; 28), logistic regression (21), decision trees (14), and weighted-vote
(22). There is some evidence to indicate that discriminatively trained
local classifiers such as logistic regression tend to produce higher accuracies
than others; this is consistent with results in other areas.
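As an illustration of how easily an off-the-shelf learner can play the role of
the local classifier, the sketch below concatenates a node's own attributes
with counts of each class among its neighbors' current labels and trains a
logistic regression model with scikit-learn. The helper name
relational_features and the data layout are assumptions made for the example,
not details of the cited systems.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def relational_features(attrs, neighbor_labels, label_set):
        # One simple aggregation: append to the node's own attribute vector the
        # count of each class among its neighbors' current labels. Proportions or
        # the neighborhood mode ("weighted-vote") are common alternatives.
        counts = [sum(1 for l in neighbor_labels if l == c) for c in label_set]
        return np.concatenate([np.asarray(attrs, dtype=float),
                               np.asarray(counts, dtype=float)])

    # Training on fully labeled nodes (attrs, nbrs, labels, train_nodes and
    # label_set are assumed to be available in the layout used above):
    # X = np.vstack([relational_features(attrs[n], [labels[m] for m in nbrs[n]],
    #                                    label_set) for n in train_nodes])
    # y = np.array([labels[n] for n in train_nodes])
    # f = LogisticRegression(max_iter=1000).fit(X, y)
    # f.predict_proba(...) can then serve as the local classifier inside ICA or GS.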
Other aspects of ICA that have been investigated include the ordering strategy,
i.e., the order in which nodes are visited for relabeling in each ICA iteration.
There is some evidence to suggest that ICA is fairly robust to a number of
simple ordering strategies, such as random ordering, visiting nodes in ascending
order of the diversity of their neighborhood class labels, and relabeling nodes
in descending order of label confidence (11). However, there
is also some evidence that certain modifications to the basic ICA procedure
tend to produce improved classification accuracies. For instance, both (28)
and (24) propose a strategy in which only a subset of the unobserved variables
is used as input for feature construction. More specifically, in each iteration,
they choose the top-k most confident predicted labels and use only those
unobserved variables in the following iteration's predictions, thus ignoring
the less confident predicted labels. In each subsequent iteration they increase
the value of k so that in the last iteration all nodes are used for prediction.
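A rough sketch of this "cautious" schedule is given below, reusing the
hypothetical predict_proba(attributes, neighbor_labels) interface from the
Gibbs sampling sketch. The linear growth of k across iterations is one
reasonable choice for illustration, not necessarily the schedule used in (28)
or (24).

    def cautious_ica(nodes, neighbors, features, local_classifier,
                     observed_labels, num_iterations=10):
        # ICA variant that, in each iteration, commits only the k most confident
        # predicted labels and feeds them into the next iteration's features;
        # k grows linearly so that the final iteration uses every node.
        committed = {}
        for it in range(1, num_iterations + 1):
            predictions = {}
            for n in nodes:
                # Only observed labels and committed predictions contribute to
                # the relational features; less confident labels are ignored.
                nbr_labels = [observed_labels.get(m, committed.get(m))
                              for m in neighbors[n]]
                nbr_labels = [l for l in nbr_labels if l is not None]
                probs = local_classifier.predict_proba(features[n], nbr_labels)
                best = max(probs, key=probs.get)
                predictions[n] = (best, probs[best])
            # Keep only the top-k most confident predictions for the next round.
            k = max(1, (it * len(nodes)) // num_iterations)
            ranked = sorted(predictions.items(),
                            key=lambda kv: kv[1][1], reverse=True)
            committed = {n: lab for n, (lab, _) in ranked[:k]}
        return committed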