Graph Model for Pattern Recognition in Text - Mining and Analyzing Social Networks

Information Technology Reference

In-Depth Information

R (

D a )

·

D b )

s ab =

| R (

|·|R (

D a )

D b )

|

R ( D i )isthe i -th row of the matrix

M (

where

D

), namely, the normalized

signature vector of the document D i .

Complexity: O ( n 2 m 2 ).

3.6 Clustering

The final stage is to use Quasi-Clique Merge algorithm ( QCM [14]) to cluster

all documents. We suppose h is the level number of the hierarchical system.

Then, by the estimation in [14], the number of iterations is bounded by

O ( hn 2 log ( n )). Note that, for an input set of n documents, the number of

hierarchical levels is log( n ) in average. Thus, the complexity of QCM is

O ([ nlog ( n )] 2 ).

3.7 Total Complexity

By summing up all steps, the total complexity is

O ( t + m 2 φ 2 + nm 2 + n 2 m 2 +[ nlog ( n )] 2 ),

where

is the number of the distinct alphabets appearing in the key words,

t is the average length of the documents, φ is the average appearance of a

keyword in a document, m is the total number of keywords, n is the total

number of documents.

Since we compare lots of documents, so φ (the average appearance of

a keyword in a document) is much smaller than n (the total number of

documents), and t is usually less than n 2 m 2 . Thus, the complexity is further

simplified as O ( n 2 m 2 +[ nlog ( n )] 2 ).

|

Σ

|

4 Experimental Results

In order to evaluate the effectiveness of our algorithm, we will compare the

results of our method with the usual keyword frequency method. We calculate

the similarity between every pair of documents the following two different

ways:

KF method: only use keyword frequency information.

KFP method: use keyword frequency and structure information, which is

based on the weighted directed multigraph model described in this paper.

In the following analysis we will show that the KFP method is superior to

the KF method.

Mining and Analyzing Social Networks

Search WWH ::

Custom Search

Home