number of Web pages from the astronomy true class is about ten times the number of Web pages from
the agriculture true class. The second dataset, DS2, contains 664 Web pages from 4 true classes. The
third dataset, DS3, includes 1215 Web pages from 12 true classes. To show the performance
on a more diverse dataset, we produce the fourth dataset, DS4, which consists of 2524 Web pages from
24 true classes. After we remove stop words and reduce the dimensionality (Yao, 2004), the
final dimension for each dataset is listed in Table 1.
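The preprocessing mentioned above (stop-word removal followed by dimensionality reduction) can be illustrated with a minimal sketch. It is not the procedure of Yao (2004), whose details are not given here. The stop-word list, the document-frequency threshold `min_df`, and the helper names `tokenize`, `build_vocabulary`, and `vectorize` are all assumptions made for illustration, and document-frequency pruning stands in for whatever reduction technique was actually used.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "for"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return [w.strip(".,;:!?()\"'") for w in text.lower().split()]


def build_vocabulary(pages, min_df=2):
    """Keep non-stop-word terms that occur in at least min_df pages.

    Assumption: document-frequency pruning is used here as a simple
    stand-in for the dimensionality reduction cited in the text.
    """
    df = Counter()
    for page in pages:
        # Count each term at most once per page (document frequency).
        terms = {t for t in tokenize(page) if t and t not in STOP_WORDS}
        df.update(terms)
    return sorted(t for t, n in df.items() if n >= min_df)


def vectorize(page, vocabulary):
    """Map a page to a term-count vector over the reduced vocabulary."""
    counts = Counter(tokenize(page))
    return [counts[t] for t in vocabulary]


# Toy pages standing in for Web pages from two true classes.
pages = [
    "The orbit of a planet in the solar system",
    "Planet formation and the solar nebula",
    "Soil and crop rotation for the farm",
]
vocab = build_vocabulary(pages, min_df=2)
vectors = [vectorize(p, vocab) for p in pages]
print(len(vocab), vocab)
```

In this sketch, the length of `vocab` plays the role of the "final dimension" reported per dataset. A real pipeline would use a full stop-word list and a threshold tuned to the corpus.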
Discovery of a Constant Factor
In this section, we outline our experiments for the discovery of a constant factor that characterizes the
Web domain and makes our clustering algorithm applicable for clustering Web pages. For all experi-
Figure 6. The impact of avgInter on clustering performances for four representative Web page datasets
[Plots not reproduced. Panels (a-1), (a-2): dataset DS1 (the number of true classes is 2). Panels (b-1), (b-2): dataset DS2 (the number of true classes is 4). Panels (c-1), (c-2): dataset DS3 (the number of true classes is 12). Panels (d-1), (d-2): dataset DS4 (the number of true classes is 24). Axis label in every panel: avgInter.]