Databases Reference
In-Depth Information
Table 1. Analyzed system statistics

FAQ DB name                Fed. Gov. Dept.   Consumer electronics   Consumer software
Dates                      3/18-6/07         5/08-6/07              3/28-6/15
# of documents             657               2,418                  368
# Words in docs            38,426            231,089                24,461
# Unique words in docs     4,008             7,231                  3,951
# of searches              779,578           5,701                  106,006
# Unique search words      83,105            2,545                  15,946
Table 2. Clustering of 1,000 queries

Level   # clusters   Avg. self-similarity   Min. self-similarity

Federal Government Department
1       17           0.777                  0.057
2       33           0.355                  0.117
3       128          0.542                  0.173
4       207          0.642                  0.246
5       165          0.720                  0.343
6       60           0.770                  0.463

Consumer Electronics Manufacturer
1       35           0.465                  0.038
2       117          0.459                  0.121
3       256          0.655                  0.269
4       173          0.748                  0.333
5       55           0.762                  0.462
6       7            0.813                  0.502

Consumer Software Producer
1       24           0.524                  0.094
2       62           0.526                  0.160
3       135          0.570                  0.251
4       187          0.679                  0.338
5       138          0.744                  0.417
6       48           0.816                  0.527

Table 2 includes statistics for the final concept hierarchies produced by the
algorithm. Note that when accumulating the maximum self-similarity, values
of 1.0 were ignored; these similarity values were produced for any single-query
clusters.
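
The per-level summary described above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function name `summarize_levels`, the input format (one `(level, self_similarity)` pair per cluster), and the choice to count single-query clusters in the cluster total while excluding their 1.0 values from the similarity statistics are all assumptions made for the example.

```python
from collections import defaultdict


def summarize_levels(clusters):
    """Summarize self-similarity per hierarchy level.

    `clusters` is an iterable of (level, self_similarity) pairs, one per
    cluster (a hypothetical input format chosen for this sketch).
    """
    by_level = defaultdict(list)
    for level, similarity in clusters:
        by_level[level].append(similarity)

    summary = {}
    for level in sorted(by_level):
        sims = by_level[level]
        # Assumption: a self-similarity of exactly 1.0 marks a single-query
        # cluster, so it is left out of the similarity statistics.
        kept = [s for s in sims if s < 1.0]
        summary[level] = {
            "clusters": len(sims),
            "avg": sum(kept) / len(kept) if kept else None,
            "min": min(kept) if kept else None,
        }
    return summary
```

A level that contains only single-query clusters yields `None` for both statistics instead of a misleading 1.0.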
As expected, cluster cohesiveness increased with depth in the tree. The slight
anomalies at the top level were due to the smaller number of clusters at that
level and to the handling of single-query clusters, whose unity self-similarity
values had an adverse effect on the results.
In addition to the statistics, the resulting hierarchies were subjectively
evaluated. The results were deemed very useful; the algorithm indeed clusters
 