2.2.4 Create a Signature Vector to Represent the Input Email
Now we create a signature vector that represents an input email, using both the frequency information of the keywords and the information in the simplified weighted graph.
1. We use $f_i$ to denote the frequency of the keyword $K_i$ in the document, and let $F(D) = [f_1, f_2, \ldots, f_m]$ denote the frequency vector of the document $D$.
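2. We use the adjacency matrix to represent the simplified weighted directed graph $G_s$, where $w^s_{ij}$ is the weight of the arc from $K_i$ to $K_j$, and $w^s_{ij} = 0$ if there is no arc from $K_i$ to $K_j$:

$$
W(D) = \begin{bmatrix}
w^s_{11} & w^s_{12} & \cdots & w^s_{1m} \\
w^s_{21} & w^s_{22} & \cdots & w^s_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
w^s_{m1} & w^s_{m2} & \cdots & w^s_{mm}
\end{bmatrix}.
$$

Then we rewrite it row by row as a vector of length $m \times m$:

$$
W(D) = [w^s_{11}, w^s_{12}, \ldots, w^s_{1m}, w^s_{21}, w^s_{22}, \ldots, w^s_{2m}, \ldots, w^s_{mm}].
$$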
Let $R(D) = [F(D), W(D)]$ be the concatenation of the two vectors, which therefore has $m + m^2$ entries. The vector $R(D)$ contains not only the frequency information of the keywords but also the structural information of the document, and it serves as the signature vector of the document.
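The construction of R(D) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the email is assumed to be already tokenized and normalized the same way as the keywords, and the arc weights of G_s (built in an earlier step not shown here) are assumed to be given as a dictionary mapping a 0-based pair (i, j) to w^s_ij. The function names `frequency_vector`, `weight_vector`, and `signature_vector` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def frequency_vector(tokens: Sequence[str], keywords: Sequence[str]) -> List[int]:
    """F(D) = [f_1, ..., f_m]: number of occurrences of each keyword K_i."""
    counts = Counter(tokens)  # assumes tokens are normalized like the keywords
    return [counts[k] for k in keywords]


def weight_vector(arcs: Dict[Tuple[int, int], float], m: int) -> List[float]:
    """Flatten the m x m adjacency matrix of G_s row by row.

    arcs maps a 0-based pair (i, j) to the weight of the arc K_i -> K_j;
    a missing pair means there is no arc, so the entry is 0.
    """
    return [float(arcs.get((i, j), 0.0)) for i in range(m) for j in range(m)]


def signature_vector(
    tokens: Sequence[str],
    keywords: Sequence[str],
    arcs: Dict[Tuple[int, int], float],
) -> List[float]:
    """R(D) = [F(D), W(D)], a vector of length m + m*m."""
    m = len(keywords)
    freq = [float(f) for f in frequency_vector(tokens, keywords)]
    return freq + weight_vector(arcs, m)


# Toy usage with two hypothetical keywords and a single arc K_1 -> K_2.
print(signature_vector(["a", "b", "b", "c"], ["a", "b"], {(0, 1): 0.5}))
# [1.0, 2.0, 0.0, 0.5, 0.0, 0.0]
```

Storing only the existing arcs in a dictionary mirrors the convention that absent arcs have weight 0; the dense row-major flattening then reproduces the ordering w^s_11, w^s_12, ..., w^s_mm.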
Again, for the example given in Figure 1 (with $m = 4$ keywords), we have
$$
F(D) = [1, 2, 2, 2],
$$

$$
W(D) = \begin{bmatrix}
0 & 0.0995 & 0.0705 & 0.0459 \\
0 & 0.0200 & 0.3848 & 0.5668 \\
0 & 0.0227 & 0.0204 & 0.0884 \\
0 & 0.0345 & 0.3627 & 0.0323
\end{bmatrix},
$$

$$
R(D) = [1, 2, 2, 2, 0, 0.0995, 0.0705, 0.0459, 0, 0.0200, 0.3848, 0.5668, 0, 0.0227, 0.0204, 0.0884, 0, 0.0345, 0.3627, 0.0323].
$$
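As a quick check of the worked example, the snippet below copies the values of F(D) and W(D) for Figure 1 and flattens W(D) row by row before concatenating. The variable names are illustrative only; the point is that the result has m + m^2 = 4 + 16 = 20 entries in the same order as R(D) above.

```python
# Values copied from the Figure 1 example: m = 4 keywords.
F = [1, 2, 2, 2]
W = [
    [0, 0.0995, 0.0705, 0.0459],
    [0, 0.0200, 0.3848, 0.5668],
    [0, 0.0227, 0.0204, 0.0884],
    [0, 0.0345, 0.3627, 0.0323],
]

# Row-major flattening: w_11, w_12, ..., w_1m, w_21, ..., w_mm.
W_flat = [w for row in W for w in row]
R = F + W_flat

m = len(F)
assert len(R) == m + m * m  # 4 + 16 = 20 entries
print(R)
```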
2.3 Details of Step 2
2.3.1 Find Signature Vectors for All Documents
By repeating the process of Step 1, we create signature vectors for all documents.
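Applying Step 1 to a whole collection can be sketched as a loop that produces one signature vector per document. This sketch rests on two assumptions not stated in this section: every document is described with the same ordered keyword list K_1, ..., K_m (so that all vectors have length m + m^2 and their positions line up), and each document's arc weights w^s_ij have already been computed and are passed in as a dictionary keyed by 0-based (i, j). The function name `signature_vectors` and the input layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation of G_s: 0-based (i, j) -> weight of arc K_i -> K_j.
Arcs = Dict[Tuple[int, int], float]


def signature_vectors(
    corpus: Sequence[Tuple[Sequence[str], Arcs]],
    keywords: Sequence[str],
) -> List[List[float]]:
    """Build R(D) = [F(D), W(D)] for every (tokens, arcs) pair in the corpus.

    Assumes one keyword list shared by all documents, so every returned
    vector has length m + m*m with entries in the same order.
    """
    m = len(keywords)
    vectors = []
    for tokens, arcs in corpus:
        counts = Counter(tokens)
        freq = [float(counts[k]) for k in keywords]  # F(D)
        # W(D) flattened row by row; missing arcs contribute 0.
        weights = [float(arcs.get((i, j), 0.0)) for i in range(m) for j in range(m)]
        vectors.append(freq + weights)
    return vectors


# Toy corpus of two documents over the hypothetical keywords "a" and "b".
corpus = [(["a", "b"], {(0, 1): 0.3}), (["b", "b"], {})]
print(signature_vectors(corpus, ["a", "b"]))
# [[1.0, 1.0, 0.0, 0.3, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0, 0.0, 0.0]]
```

Returning a list of equal-length vectors makes it straightforward to stack them into a matrix for whatever comparison or classification step follows.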