Collective Classification for Text Classification - Text Mining: Classification, Clustering, and Applications - page 63


TABLE 3.3: Accuracy results for the CiteSeer dataset. CC algorithms significantly outperformed their CO counterparts except for ICA-NB and GS-NB for matched cross-validation. CO and CC algorithms based on LR outperformed the NB versions, but the differences were not significant. ICA-NB outperformed GS-NB significantly for SS, but the rest of the differences between LR versions of ICA and GS, LBP and MF were not significant.

Algorithm    SS        RS        M
CO-NB        0.7427    0.7487    0.7646
ICA-NB       0.7540    0.7683    0.7752
GS-NB        0.7596    0.7680    0.7737
CO-LR        0.7334    0.7321    0.7532
ICA-LR       0.7629    0.7732    0.7812
GS-LR        0.7574    0.7699    0.7843
LBP          0.7663    0.7759    0.7843
MF           0.7657    0.7732    0.7888
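The size of the CC-over-CO improvement can be read off the table with a little arithmetic. The sketch below is only an illustration: the accuracies are copied from Table 3.3, while the dictionary layout and the helper name gain_over_baseline are assumptions made for this example. It says nothing about statistical significance, which the caption reports separately.

```python
# Accuracies copied from Table 3.3 (CiteSeer); tuple order is (SS, RS, M).
COLUMNS = ("SS", "RS", "M")
ACCURACY = {
    "CO-NB": (0.7427, 0.7487, 0.7646),
    "ICA-NB": (0.7540, 0.7683, 0.7752),
    "GS-NB": (0.7596, 0.7680, 0.7737),
    "CO-LR": (0.7334, 0.7321, 0.7532),
    "ICA-LR": (0.7629, 0.7732, 0.7812),
    "GS-LR": (0.7574, 0.7699, 0.7843),
    "LBP": (0.7663, 0.7759, 0.7843),
    "MF": (0.7657, 0.7732, 0.7888),
}


def gain_over_baseline(algorithm, baseline):
    """Hypothetical helper: per-column accuracy difference, rounded to 4 places."""
    return tuple(
        round(a - b, 4) for a, b in zip(ACCURACY[algorithm], ACCURACY[baseline])
    )


# Compare each ICA/GS variant with the content-only (CO) model of the same family.
PAIRS = [
    ("ICA-NB", "CO-NB"),
    ("GS-NB", "CO-NB"),
    ("ICA-LR", "CO-LR"),
    ("GS-LR", "CO-LR"),
]

for cc, co in PAIRS:
    gains = gain_over_baseline(cc, co)
    print(f"{cc} vs {co}:", dict(zip(COLUMNS, gains)))
```

LBP and MF are left out of the comparison because the table lists no content-only counterpart for them.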

used and we did not have to tune the initializations for these two algorithms. They were the easiest to train and test among all the collective classification algorithms evaluated.

Third, ICA and GS produced very similar results for almost all experiments. However, ICA is a much faster algorithm than GS. In our largest dataset, CiteSeer, for example, ICA-NB took 14 minutes to run while GS-NB took over 3 hours. The large difference arises because ICA converges in just a few iterations, whereas GS has to go through significantly more iterations per run: an initial burn-in stage (200 iterations), followed by a large number of iterations to get a sufficiently large sample (800 iterations).
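The gap in iteration counts can be made concrete with a minimal sketch. Everything below is an assumption for illustration: the six-node toy graph, the neighbor-counting stand-in for a local classifier, and the function names are hypothetical, and this is not the implementation used in the experiments. Only the 200 burn-in and 800 sampling iterations come from the text.

```python
import random
from collections import Counter

# Hypothetical toy graph: adjacency lists over six nodes, two class labels.
GRAPH = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
LABELS = ("A", "B")
OBSERVED = {0: "A", 5: "B"}  # nodes whose labels are known and stay fixed


def local_distribution(node, labels):
    """Stand-in local classifier: smoothed frequencies of neighbor labels."""
    counts = Counter(labels[n] for n in GRAPH[node])
    total = sum(counts[label] + 1 for label in LABELS)
    return {label: (counts[label] + 1) / total for label in LABELS}


def initial_labels(rng):
    """Random starting labels, with observed nodes set to their known labels."""
    labels = {n: rng.choice(LABELS) for n in GRAPH}
    labels.update(OBSERVED)
    return labels


def ica(rng, max_iters=100):
    """ICA-style loop: take the most likely label, stop once nothing changes."""
    labels = initial_labels(rng)
    for iteration in range(1, max_iters + 1):
        changed = False
        for node in GRAPH:
            if node in OBSERVED:
                continue
            dist = local_distribution(node, labels)
            best = max(LABELS, key=lambda label: dist[label])
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            return labels, iteration
    return labels, max_iters


def gibbs(rng, burn_in=200, samples=800):
    """GS-style loop: sample labels, discard burn-in sweeps, tally the rest."""
    labels = initial_labels(rng)
    tallies = {n: Counter() for n in GRAPH if n not in OBSERVED}
    for sweep in range(burn_in + samples):
        for node in tallies:
            dist = local_distribution(node, labels)
            weights = [dist[label] for label in LABELS]
            labels[node] = rng.choices(LABELS, weights=weights)[0]
            if sweep >= burn_in:
                tallies[node][labels[node]] += 1
    final = dict(OBSERVED)
    final.update({n: c.most_common(1)[0][0] for n, c in tallies.items()})
    return final, burn_in + samples


rng = random.Random(0)
ica_labels, ica_iters = ica(rng)
gs_labels, gs_sweeps = gibbs(rng)
print("ICA passes over the graph:", ica_iters)
print("GS passes over the graph:", gs_sweeps)
```

In this sketch the ICA loop exits as soon as a full pass changes no label, whereas the GS loop always runs all 1,000 sweeps regardless of how quickly the labels settle.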

3.7 Related Work

Even though collective classification has gained attention only in the past five to seven years, the general problem of inference for structured data has received attention for a considerably longer period of time from various research communities, including computer vision, spatial statistics, and natural language processing. In this section, we attempt to describe some of the work that is most closely related to the work described in this article; however, due to the widespread interest in collective classification, our list is sure to be incomplete.

One of the earliest principled approximate inference algorithms, relaxation labeling (13), was developed by researchers in computer vision in the context of
