Latent Semantic Space for Web Clustering
I-Jen Chiang 1,3, Tsau Young ('T. Y.') Lin 2, Hsiang-Chun Tsai 3,
Jau-Min Wong 3, and Xiaohua Hu 4
1 Graduate Institute of Medical Informatics, Taipei Medical University,
205, Wu-Hsien Street, Taipei, Taiwan, ROC
ijchiang@tmu.edu.tw
2 Department of Computer Science, San Jose State University, One Washington
Square, San Jose, CA 95192-0249, USA
tylin@cs.sjsu.edu
3 Graduate Institute of Biomedical Engineering, National Taiwan University, No.1,
Sec. 1, Jen-Ai Road, Taipei, Taiwan, ROC
4 College of Information Science and Technology, Drexel University, Philadelphia,
PA 19104, USA
thu@cis.drexel.edu
Summary. Organizing a huge number of Web pages into topics, according to
their relevance, is an efficient and effective method for information retrieval. Latent
Semantic Space (LSS), which takes the form of a geometric structure in
combinatorial topology, has been proposed for unstructured document clustering. Given
a set of Web pages, the associations among frequently co-occurring terms
in them naturally form a CONCEPT, which is represented as a set of connected
components of the simplicial complexes. Based on these concepts, Web pages can be
clustered into meaningful categories.
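
The pipeline outlined in the Summary can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the support threshold `min_support`, the cap `max_size` on association size, the rule that assigns a page to the concept sharing the most terms with it, and the toy pages are all hypothetical. Treating two associations as connected whenever they share a term is also a simplification of the simplicial-complex structure.

```python
from collections import Counter
from itertools import combinations


def frequent_associations(docs, min_support=2, max_size=3):
    """Return term sets (simplices) co-occurring in at least min_support pages.

    min_support and max_size are illustrative parameters, not from the paper.
    """
    counts = Counter()
    for terms in docs:
        for size in range(2, max_size + 1):
            for combo in combinations(sorted(terms), size):
                counts[combo] += 1
    return [set(combo) for combo, n in counts.items() if n >= min_support]


def connected_components(simplices):
    """Group terms (vertices) linked through shared simplices via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for simplex in simplices:
        first, *rest = sorted(simplex)
        root = find(first)
        for term in rest:
            parent[find(term)] = root

    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    return list(groups.values())


def cluster_pages(docs, concepts):
    """Assign each page to the concept it overlaps most (assumed rule).

    Pages that share no term with any concept get the label None.
    """
    labels = []
    for terms in docs:
        best, best_overlap = None, 0
        for i, concept in enumerate(concepts):
            overlap = len(terms & concept)
            if overlap > best_overlap:
                best, best_overlap = i, overlap
        labels.append(best)
    return labels


# Toy pages, each reduced to a set of key terms (invented data).
docs = [
    {"wall", "street", "stock"},
    {"wall", "street", "market"},
    {"gene", "protein", "cell"},
    {"gene", "protein", "dna"},
]
simplices = frequent_associations(docs)
concepts = connected_components(simplices)
labels = cluster_pages(docs, concepts)
```

On this toy input, only the pairs that occur in two pages survive the support threshold, so the first two pages fall under one concept and the last two under another.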
1 Introduction
To handle documents adequately, a methodology to represent or to reveal
their latent semantics is needed. To date, no universally accepted effective
methodology has been discovered. In a previous paper [15], we pictured the
latent semantics geometrically and called it the Latent Semantic Space (LSS) of
the given set of documents. We take the key terms as vertices and visualize the
term associations (frequently co-occurring terms) as a simplicial complex in LSS.
Our thesis has been: a maximal connected component represents a CONCEPT
in the LSS of a collection of documents. However, in [15] we did not explore
the full thesis; we considered only the PRIMITIVE CONCEPTs of the highest
dimension. Technically, we considered only the maximal connected components
of the skeleton of the highest layer. In this paper, we explore the full notion