2.2.4 Create a Signature Vector to Represent the Input Email
We now create a signature vector that represents an input email, combining the
frequency information of the keywords with the simplified weighted graph information.
1. We use $f_i$ to denote the frequency of the keyword $K_i$ in the document.
Let $F(D) = [f_1, f_2, \ldots, f_m]$ denote the frequency vector of the document $D$.
2. We use the adjacency matrix to represent the simplified weighted directed graph $G_s$.
Let $w^s_{ij} = 0$ if there is no arc from the vertices $K_i$ to $K_j$.
$$
W(D) =
\begin{bmatrix}
w^s_{11} & w^s_{12} & \cdots & w^s_{1m} \\
w^s_{21} & w^s_{22} & \cdots & w^s_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
w^s_{m1} & w^s_{m2} & \cdots & w^s_{mm}
\end{bmatrix}.
$$
Then we rewrite it, row by row, as a vector with $m \times m$ entries:
$$
W(D) = [w^s_{11}, w^s_{12}, \ldots, w^s_{1m}, w^s_{21}, w^s_{22}, \ldots, w^s_{2m}, \ldots, w^s_{mm}].
$$
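As a reading aid, the row-by-row ordering above fixes where each weight lands in the vector. The indexing formula below is not stated in the original text; it is simply what that ordering implies, with positions counted from 1:

```latex
\[
  \operatorname{pos}\bigl(w^s_{ij}\bigr) = (i-1)\,m + j,
  \qquad 1 \le i, j \le m,
\]
\[
  \text{so the vector form of } W(D) \text{ has } m^2 \text{ entries.}
\]
```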
Let $R(D) = [F(D), W(D)]$. The vector $R(D)$ contains not only the frequency
information of the keywords, but also the structure information of the document.
It is used as the signature vector of the document.
Again, corresponding to the given example (Figure 1), we have
$$
F(D) = [1, 2, 2, 2],
$$
$$
W(D) =
\begin{bmatrix}
0 & 0.0995 & 0.0705 & 0.0459 \\
0 & 0.0200 & 0.3848 & 0.5668 \\
0 & 0.0227 & 0.0204 & 0.0884 \\
0 & 0.0345 & 0.3627 & 0.0323
\end{bmatrix},
$$
$$
R(D) = [1, 2, 2, 2, 0, 0.0995, 0.0705, 0.0459, 0, 0.0200, 0.3848, 0.5668, 0, 0.0227, 0.0204, 0.0884, 0, 0.0345, 0.3627, 0.0323].
$$
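The construction of the signature vector can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `signature_vector` and the list-of-lists matrix representation are assumptions made for the example, and the input values are the ones from the example above. It assumes the frequency vector and the weight matrix have already been computed.

```python
def signature_vector(freq, weights):
    """Return R(D) = [F(D), W(D)], with W(D) flattened row by row.

    `freq` is the frequency vector F(D) of length m; `weights` is the
    m x m adjacency matrix of the simplified weighted directed graph.
    (Hypothetical helper: the name and types are illustrative only.)
    """
    m = len(freq)
    if len(weights) != m or any(len(row) != m for row in weights):
        raise ValueError("W(D) must be an m x m matrix matching F(D)")
    # Row-major flattening: w_11, ..., w_1m, w_21, ..., w_mm
    flat = [w for row in weights for w in row]
    return list(freq) + flat


# Values taken from the worked example (m = 4 keywords).
F = [1, 2, 2, 2]
W = [
    [0, 0.0995, 0.0705, 0.0459],
    [0, 0.0200, 0.3848, 0.5668],
    [0, 0.0227, 0.0204, 0.0884],
    [0, 0.0345, 0.3627, 0.0323],
]

R = signature_vector(F, W)
print(len(R))  # m + m*m entries
print(R)
```

With $m = 4$ keywords the result has $m + m^2 = 20$ entries, and the weight $w^s_{ij}$ sits at zero-based index `m + (i - 1) * m + (j - 1)` of `R`.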
2.3 Details of the Step 2
2.3.1 Find Signature Vectors for All Documents
Repeating the process of Step 1, we create signature vectors for all
documents.