Fig. 6. LSS system over the search engine Google
to evaluate the clustering performance [3] based on the human expert's
decisions. More than one hundred queries related to medicine were submitted
from our system to cluster the results returned from PubMed and Google,
respectively. More than two hundred thousand Web pages or snippets were
returned. In general, the average entropy is around 0.14 ± 0.06 for PubMed
and around 0.27 ± 0.08 for Google. The PubMed value is lower because human
experts have defined meta-data for each item of medical literature in PubMed.
Without these meta-data, the average entropy becomes 0.21 ± 0.09. From these
results, we can reasonably conclude that the CONCEPTs organized by LSS come
close to a precise semantic concept clustering of Web pages.
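To make the entropy figures above easier to interpret, the following is a minimal sketch of one common way to score a clustering against expert labels. It is an assumption for illustration, not necessarily the measure defined in [3]: the normalization by the logarithm of the number of classes, the size-weighted averaging, and the names `cluster_entropy`, `average_entropy`, and the sample labels are all hypothetical.

```python
import math
from collections import Counter


def cluster_entropy(labels, num_classes):
    """Entropy of the expert labels within one cluster, normalized to [0, 1].

    Assumption: normalization by log(num_classes); the source does not
    state which normalization (if any) was used.
    """
    if not labels or num_classes < 2:
        return 0.0
    total = len(labels)
    h = sum((n / total) * math.log(total / n) for n in Counter(labels).values())
    return h / math.log(num_classes)


def average_entropy(clusters, num_classes):
    """Size-weighted average of the per-cluster entropies."""
    total = sum(len(c) for c in clusters)
    if total == 0:
        return 0.0
    return sum(len(c) / total * cluster_entropy(c, num_classes) for c in clusters)


# Hypothetical expert labels for the pages placed in three clusters.
clusters = [
    ["cardiology"] * 4,
    ["oncology"] * 3 + ["cardiology"],
    ["neurology", "neurology"],
]
score = average_entropy(clusters, num_classes=3)
print(f"average entropy: {score:.3f}")
```

Under this definition a score of 0 means every cluster contains pages of a single expert category, and larger values mean the clusters mix categories more heavily.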
6 Conclusion
Polysemy, phrases and term dependency are the limitations of search
technology [12]. A single term is not able to identify a latent concept in a
document; for instance, the term “Network” associated with the term
“Computer”, “Traffic”, or “Neural” denotes different concepts. Discriminating
term associations is without doubt a concrete way to distinguish one category
from the others. A group of solid term associations can clearly identify a
concept. The term associations (frequently co-occurring terms) of a given
collection of Web pages form a simplicial complex. The complex can be
decomposed into connected components at various levels (in skeletons of
various levels). We believe each such connected component properly identifies
a concept in a collection of Web pages.
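The decomposition described above can be illustrated with a minimal sketch restricted to the lowest non-trivial level, the 1-skeleton, where term associations are pairs of terms. This is an assumption-laden illustration, not the LSS implementation: the support threshold `min_support`, the function names `term_associations` and `connected_components`, and the sample documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def term_associations(docs, min_support):
    """Return term pairs that co-occur in at least min_support documents."""
    counts = Counter()
    for terms in docs:
        # Count each unordered pair once per document.
        counts.update(combinations(sorted(set(terms)), 2))
    return {pair for pair, n in counts.items() if n >= min_support}


def connected_components(edges):
    """Group terms that are linked by associations, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for term in parent:
        groups.setdefault(find(term), set()).add(term)
    return list(groups.values())


# Hypothetical documents, each reduced to its set of terms.
docs = [
    ["computer", "network", "router"],
    ["computer", "network", "protocol"],
    ["gene", "protein", "expression"],
    ["gene", "protein", "sequence"],
]
pairs = term_associations(docs, min_support=2)
for component in connected_components(pairs):
    print(sorted(component))
```

In this 1-skeleton sketch, a shared term such as “network” would link otherwise distinct groups into one component. Separating them would require a stricter notion of connectivity on higher-dimensional simplices, which the sketch does not attempt.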