TABLE 3.2: Accuracy results for the Cora dataset. CC algorithms significantly
outperformed their CO counterparts, and LR versions significantly outperformed
NB versions. ICA-NB outperformed GS-NB for SS and M; the remaining differences
between ICA and GS (both NB and LR versions) were not significant. Even though
MF outperformed ICA-LR, GS-LR, and LBP, the differences were not statistically
significant.
Algorithm    SS        RS        M
CO-NB        0.7285    0.7776    0.7476
ICA-NB       0.8054    0.8478    0.8271
GS-NB        0.7613    0.8404    0.8154
CO-LR        0.7356    0.7695    0.7393
ICA-LR       0.8457    0.8796    0.8589
GS-LR        0.8495    0.8810    0.8617
LBP          0.8554    0.8766    0.8575
MF           0.8555    0.8836    0.8631
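The caption of Table 3.2 reports which differences are statistically significant but does not show the test itself. The snippet below is a minimal, hypothetical sketch of how such a claim could be checked, applying SciPy's paired t-test to per-split accuracies of two algorithms; the numbers and variable names are illustrative assumptions, not values or code from this chapter.

    # Hypothetical per-split accuracies for two algorithms on the same test splits.
    # A paired t-test asks whether the mean difference is reliably non-zero.
    from scipy.stats import ttest_rel

    acc_mf  = [0.861, 0.849, 0.858, 0.852, 0.860]   # illustrative numbers only
    acc_lbp = [0.857, 0.848, 0.855, 0.851, 0.856]

    t_stat, p_value = ttest_rel(acc_mf, acc_lbp)
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")   # small p => significant difference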
Finally, we take a look at the numbers in the column labeled M. First, we
remind the reader that even though we compare results only on instances
appearing in at least one test set under both sampling strategies (SS and
RS), different training data could potentially have been used for each test
instance, so the comparison can be questioned. Nonetheless, we expected the
matched cross-validation results (M) to outperform the SS results simply
because each instance had more labeled data around it under RS splitting.
The differences were not large (around 1% or 2%); however, they were
significant. These results tell us that the evaluation strategy can have a
big impact on the final results, and care must be taken when designing an
experimental setup for evaluating CC algorithms on network data (9).
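To make the matched comparison concrete, the sketch below shows one way to restrict accuracy to the matched instances, i.e., nodes that appear in at least one test set under both sampling strategies. The function and argument names are assumptions introduced for illustration, not code from the chapter.

    from typing import Dict, Hashable, Set

    def matched_accuracy(predictions: Dict[Hashable, str],
                         true_labels: Dict[Hashable, str],
                         ss_test_nodes: Set[Hashable],
                         rs_test_nodes: Set[Hashable]) -> float:
        """Accuracy over nodes tested under both sampling strategies (SS and RS)."""
        matched = ss_test_nodes & rs_test_nodes
        if not matched:
            return float("nan")
        correct = sum(predictions[n] == true_labels[n] for n in matched)
        return correct / len(matched)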
3.6.3 Practical Issues
In this section, we discuss some of the practical issues to consider when
applying the various CC algorithms. First, although MF and LBP performed
consistently better than ICA and GS, they were also the most difficult to work
with in both learning and inference. Choosing initial weights that allow
training to converge is non-trivial; most of the time, we had to initialize
the weights with those obtained from ICA in order to get the algorithms to
converge at all. Thus, MF and LBP had an unfair advantage in the above
experiments. Of the two, we had the most trouble with MF, which either failed
to converge or, when it did converge, did not reach the global optimum. Our
difficulty with MF and LBP is consistent with previous work (39; 27; 43) and
should be taken into consideration when choosing to apply these algorithms.
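The warm-start trick described above can be sketched as follows, assuming a scikit-learn-style logistic regression for the ICA local classifier; the relational trainer invoked at the end is a hypothetical placeholder, not an API from the chapter.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def ica_warm_start(X_local: np.ndarray, y: np.ndarray) -> np.ndarray:
        """Fit the content-only (ICA-style) classifier and return its weights,
        to be used as the starting point when training the MF/LBP model."""
        clf = LogisticRegression(max_iter=1000).fit(X_local, y)
        return np.hstack([clf.coef_.ravel(), clf.intercept_])

    # theta0 = ica_warm_start(X_local, y)                    # local weights from ICA
    # theta  = train_pairwise_markov_network(theta0, graph)  # hypothetical MF/LBP trainer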
Second, ICA and GS parameter initializations worked for all datasets we
 