In addition, to inhibit redundant division of groups, the fitness f is
multiplied by a penalty, $\gamma^{G-1}$ ($\gamma > 1$), where G is the number
of groups. This method has the following features:
• Most conventional research has focused on either clustering alone or rule
extraction alone from data that have already been classified. We use data
containing multiple classes, and perform clustering of the data and rule
extraction in each class simultaneously.
• The number of classes in the training set can be acquired. Moreover, the ratio
of the number of agents in each group corresponds to the ratio of the
appearance of each class, so we can estimate the probability of appearance of
each class. We can also regard a group with few agents as noise.
• For test data, we can obtain candidate answers and each candidate's reliability.
Recognizing that the data contain multiple classes might trigger the discovery
of a useful attribute for clustering. This enables us to perform accurate
prediction.
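The penalty and the agent-ratio reading of the groups can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the penalty factor has the form γ^(G−1) with γ > 1 and that a smaller f is better (as the word "penalty" suggests). The names `penalized_fitness` and `group_proportions` and the default `gamma=1.1` are hypothetical.

```python
from collections import Counter


def penalized_fitness(f, num_groups, gamma=1.1):
    """Multiply the raw fitness f by gamma**(G - 1).

    Assumes gamma > 1 and that smaller fitness is better, so every
    additional group makes the fitness worse by a factor of gamma.
    The default gamma is an arbitrary illustrative value.
    """
    if gamma <= 1:
        raise ValueError("gamma must be greater than 1")
    if num_groups < 1:
        raise ValueError("there must be at least one group")
    return f * gamma ** (num_groups - 1)


def group_proportions(agent_groups):
    """Return the share of agents assigned to each group label.

    agent_groups holds one group label per agent. The shares can be
    read as estimates of how often each class appears.
    """
    counts = Counter(agent_groups)
    total = len(agent_groups)
    return {group: n / total for group, n in counts.items()}


# One group: no penalty. Three groups with gamma = 2: factor 2**2 = 4.
print(penalized_fitness(2.0, 1))
print(penalized_fitness(2.0, 3, gamma=2.0))
print(group_proportions([0, 0, 0, 1]))
```

With a single group the factor is 1, so the fitness is unchanged; each further division of a group has to improve f by more than a factor of γ to be worthwhile.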
7.3.4 Extracting Rules from a Medical Database
We used hepatobiliary disorder data as our experimental data. The data consist of
the results of biochemical tests for four hepatobiliary disorders and the gender of
the patient. Table 7.7 shows some example values of the biochemical tests
conducted for each disorder. The tests are: GOT (glutamic oxaloacetic
transaminase), GPT (glutamic pyruvic transaminase), LDH (lactate dehydrogenase),
GGT (gamma glutamyl transpeptidase), BUN (blood urea nitrogen), MCV (mean
corpuscular volume of red blood cells), MCH (mean corpuscular hemoglobin),
Tbil (total bilirubin), and CRT (creatinine). The disorders are “alcoholic liver
damage,” “primary hepatoma,” “liver cirrhosis,” and “cholelithiasis.” We have
536 patient records with some incorrect diagnostic data. The training data set
consists of 322 randomly chosen records; the remaining records are used for
testing.
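The split described above can be sketched as follows. The text does not say how the random choice was made, so the shuffle-based helper `split_records`, the `seed` parameter, and the placeholder record IDs are assumptions for illustration only.

```python
import random


def split_records(records, n_train, seed=None):
    """Randomly choose n_train records for training; the rest are for testing."""
    if not 0 <= n_train <= len(records):
        raise ValueError("n_train must be between 0 and len(records)")
    rng = random.Random(seed)   # local generator, so the split is reproducible
    shuffled = list(records)    # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder IDs standing in for the 536 patient records.
records = list(range(536))
train, test = split_records(records, 322, seed=0)
print(len(train), len(test))  # 322 214
```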
As discussed earlier, medical information, such as the results of biochemical
tests and the chief complaint, is often ambiguous. We cannot clearly distinguish
between normal and pathological values, and biochemical test values cannot be
precisely evaluated by using crisp sets. We therefore set up three cutoff values
for each item, listed in Table 7.8, so that each biochemical item has four
levels. The functions and terminals in GP are as follows. The function symbols
are {AND, =, >, <}, and each function has two arguments. The terminal symbols
are the nine biochemical test items and the discrete values (0, 1, 2, 3).
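To make the four-level encoding and the function and terminal sets concrete, here is a minimal Python sketch. It is not the authors' implementation. The cutoff numbers in `CUTOFFS` are made up for illustration (the real ones are in Table 7.8), the tuple-based rule representation and the names `discretize` and `evaluate` are hypothetical, and the sketch assumes a value equal to a cutoff falls into the higher level and that rule trees are well-formed.

```python
import operator
from bisect import bisect_right

# HYPOTHETICAL cutoffs, for illustration only; the real values are in Table 7.8.
CUTOFFS = {"GOT": (40, 100, 200), "GPT": (40, 100, 200)}

# Comparison functions from the function set {AND, =, >, <}.
COMPARATORS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def discretize(value, cutoffs):
    """Map a raw test value to a level 0..3 using three ascending cutoffs."""
    return bisect_right(cutoffs, value)


def evaluate(node, levels):
    """Evaluate a rule tree against a dict of discretized test levels.

    A tree is either a terminal (a test-item name or a discrete value 0..3)
    or a tuple (function, left, right), since every function takes two
    arguments.
    """
    if isinstance(node, tuple):
        op, left, right = node
        if op == "AND":
            return bool(evaluate(left, levels)) and bool(evaluate(right, levels))
        return COMPARATORS[op](evaluate(left, levels), evaluate(right, levels))
    if isinstance(node, str):
        return levels[node]  # terminal: level of a biochemical test item
    return node              # terminal: discrete value 0, 1, 2, or 3


# Raw GOT and GPT values of example (a) in Table 7.7.
raw = {"GOT": 108, "GPT": 114}
levels = {item: discretize(raw[item], CUTOFFS[item]) for item in CUTOFFS}

# Example rule: (GOT > 1) AND (GPT = 2)
rule = ("AND", (">", "GOT", 1), ("=", "GPT", 2))
print(levels, evaluate(rule, levels))
```

Working on levels rather than raw values keeps the terminal set small: a rule only ever compares a test item with one of the four discrete values.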
Table 7.7. Example of biochemical tests for four hepatobiliary disorders: (a) alcoholic
liver damage, (b) primary hepatoma, (c) liver cirrhosis, and (d) cholelithiasis.
Disorder  Gender  GOT  GPT   LDH  GGT    BUN   MCV   MCH  Tbil  CRT
a)        Male    108  114   344  176  114.8  99.7  33.1   0.6  0.9
b)        Male    354  104  1047  265   21.4  95.9  33.5   4.8  1.0
c)        Male     38   17   489   23   19.2  89.8  30.4   0.7  1.0
d)        Male     21   11   318   40   12.1  88.6  29.4   0.7  0.6
 