Information Technology Reference
In-Depth Information
where $f(A_i)$ indicates the benefit of attribute i, $g(A_i)$ indicates the bias of attribute i, and $h(A_i)$ indicates the cost of attribute i, with $1 \le i \le k$.
(1) The following formula defines $f(A_i)$, where $U_I$ represents useful information, $N_I$ represents useless information, and $T_I$ represents total information, $T_I = U_I + N_I$.
\[
f(A_i) = \frac{U_I}{T_I} = 1 - \frac{N_I}{T_I}
\]
Let:
\[
\Delta I = H(T_I) - H(N_I)
\]
Then:
\[
-\Delta I = H(N_I) - H(T_I) = \log_2(N_I) - \log_2(T_I) = \log_2\left(\frac{N_I}{T_I}\right)
\]
\[
2^{-\Delta I} = \frac{N_I}{T_I} = 1 - \frac{U_I}{T_I} = 1 - f(A_i)
\]
\[
f(A_i) = 1 - 2^{-\Delta I}
\]
Taking $\Delta I$ to be the information gain:
\[
\Delta I = Gain(A, E) = I(E) - Ent(A, E)
\]
\[
I(E) = -\sum_{j=1}^{k} P_j \log_2 P_j, \qquad P_j = \frac{|E \cap C_j|}{|E|}
\]
\[
Ent(A, E) = \sum_{i=1}^{v} \frac{|E_i|}{|E|} \, I(E_i)
\]
so that:
\[
f(A_i) = 1 - 2^{-Gain(A_i, E)}
\]
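The benefit term can be illustrated with a short Python sketch. It assumes the standard ID3 reading of the formulas above: $I(E)$ is the class entropy of the example set, $Ent(A, E)$ is the size-weighted entropy of the subsets produced by splitting on attribute A, and $f(A_i) = 1 - 2^{-Gain}$. The function names (`class_entropy`, `partition_entropy`, `gain`, `benefit`) and the toy data are illustrative and do not come from the source.

```python
import math
from collections import Counter


def class_entropy(labels):
    """I(E): entropy, in bits, of the class labels of example set E."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def partition_entropy(values, labels):
    """Ent(A, E): size-weighted entropy of the subsets E_i induced by A."""
    total = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    return sum(len(s) / total * class_entropy(s) for s in subsets.values())


def gain(values, labels):
    """Gain(A, E) = I(E) - Ent(A, E)."""
    return class_entropy(labels) - partition_entropy(values, labels)


def benefit(values, labels):
    """f(A_i) = 1 - 2^(-Gain(A_i, E)); 0 when the attribute is uninformative."""
    return 1.0 - 2.0 ** (-gain(values, labels))


# Toy data (hypothetical): four examples, two classes.
labels = ["yes", "yes", "no", "no"]
perfect_split = ["a", "a", "b", "b"]   # separates the classes completely
useless_split = ["a", "b", "a", "b"]   # tells us nothing about the class

print(benefit(perfect_split, labels))  # gain of 1 bit gives f = 0.5
print(benefit(useless_split, labels))  # gain of 0 bits gives f = 0.0
```

Under this reading, $f$ grows from 0 toward 1 as the gain increases, so it rescales the gain into a bounded benefit score.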
(2) $g(A_i)$ is defined as:
\[
g(A_i) = C_i^{b}
\]
(3) $h(A_i)$ is defined as:
\[
h(A_i) = C_i^{c} + 1
\]
From these definitions we obtain the expression:
\[
ASF(A_i) = \frac{\left(1 - 2^{-Gain(A_i, E)}\right) C_i^{b}}{C_i^{c} + 1}
\]
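As a rough numeric illustration of the ASF expression, the following Python sketch evaluates it for given gain, bias, and cost values. The function name `asf` and its parameter names are assumptions made for this example. It also assumes that the superscripts b and c are labels for the bias and cost values of attribute i rather than exponents, and that costs are non-negative, so the `+ 1` keeps the denominator positive.

```python
def asf(gain, bias, cost):
    """ASF(A_i) = (1 - 2^(-Gain)) * C_b / (C_c + 1).

    gain: information gain of the attribute, in bits
    bias: bias value of the attribute (read here as C_i^b)
    cost: cost value of the attribute (read here as C_i^c, assumed >= 0)
    """
    benefit = 1.0 - 2.0 ** (-gain)
    return benefit * bias / (cost + 1.0)


print(asf(gain=1.0, bias=2.0, cost=3.0))  # 0.5 * 2 / 4 = 0.25
print(asf(gain=1.0, bias=1.0, cost=0.0))  # a free attribute keeps its full benefit: 0.5
print(asf(gain=0.0, bias=5.0, cost=0.0))  # zero gain gives 0.0 whatever the bias
```

With this form, a higher bias raises an attribute's score and a higher cost lowers it, while an attribute with zero gain always scores zero.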
Our decision tree generating algorithm, GSD, is a modified version of Quinlan's
C4.5 algorithm. The modification has two aspects: input and attribute
selection. For input, GSD uses the preprocessing algorithm PPD to produce the
training example subset at the specified concept level. For attribute selection,
it replaces the attribute selection criterion of C4.5 with the attribute
selecting function ASF.
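The attribute selection step can be sketched as follows. This is only an illustration of choosing the split attribute by the highest ASF score; it is not the authors' GSD implementation, and it does not cover PPD. The function name `select_attribute`, the input layout, and the attribute names in the example are all hypothetical.

```python
from typing import Dict, Tuple


def select_attribute(candidates: Dict[str, Tuple[float, float, float]]) -> str:
    """Return the candidate attribute with the highest ASF score.

    candidates maps an attribute name to a (gain, bias, cost) triple, where
    gain is the information gain in bits and cost is assumed non-negative.
    """
    def score(name: str) -> float:
        gain, bias, cost = candidates[name]
        # ASF = (1 - 2^(-gain)) * bias / (cost + 1)
        return (1.0 - 2.0 ** (-gain)) * bias / (cost + 1.0)

    return max(candidates, key=score)


# Hypothetical candidates: "blood_test" has the larger gain but is costly.
candidates = {
    "outlook": (0.9, 1.0, 0.0),
    "blood_test": (1.0, 1.0, 4.0),
}
print(select_attribute(candidates))  # "outlook": about 0.46 versus 0.10
```

In this example the cheaper attribute wins even though its gain is slightly lower, which is the kind of trade-off a cost-aware selection function allows.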