Table 2. Generality relationships between rules

More Specific         More General
------------------    ------------------------
most specific rule    combined rule
most specific rule    random most general rule
most specific rule    initial rule
combined rule         random most general rule

different combinations of boundaries from the most specific rule. Fig. 1(d) shows
the combined rule, formed from the conjunction of all most general rules. The
generality relationships between these rules are presented in Table 2.
Note that it could not be guaranteed that either rule in any of these pairs was
strictly more general or more specific than the other, as it was possible for the
most specific and random most general rules to be identical (in which case the
set of most general rules would contain only a single rule, and the initial and
combined rules would both also be identical to the most specific and random most
general rules). It was also possible for the initial rule to equal the most
specific rule even when there were multiple most general rules. Finally, it was
possible for no generality relationship to hold between an initial rule and the
combined rule or the random most general rule developed therefrom.
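
The relationships described above can be illustrated with a minimal Python
sketch. It assumes, purely for illustration, that a rule is a conjunction of
closed interval conditions on numeric attributes, stored as a dict, and that all
intervals are non-empty. The representation, the function names and the example
rules are hypothetical and are not taken from the text.

```python
from math import inf

# Assumption: a rule maps attribute name -> (low, high) closed interval.
# An attribute that does not appear in the dict is unconstrained.

def bounds(rule, attr):
    """Interval the rule places on attr, or (-inf, inf) if unconstrained."""
    return rule.get(attr, (-inf, inf))

def at_least_as_general(general, specific):
    """True if every instance covered by `specific` is covered by `general`."""
    for attr in general:
        g_lo, g_hi = bounds(general, attr)
        s_lo, s_hi = bounds(specific, attr)
        if s_lo < g_lo or s_hi > g_hi:
            return False
    return True

def conjoin(rules):
    """Conjunction of rules: intersect the intervals attribute by attribute."""
    combined = {}
    for rule in rules:
        for attr, (lo, hi) in rule.items():
            c_lo, c_hi = bounds(combined, attr)
            combined[attr] = (max(c_lo, lo), min(c_hi, hi))
    return combined

def relationship(a, b):
    """Classify the generality relationship between rules a and b."""
    a_covers_b = at_least_as_general(a, b)
    b_covers_a = at_least_as_general(b, a)
    if a_covers_b and b_covers_a:
        return "identical"
    if a_covers_b:
        return "first is more general"
    if b_covers_a:
        return "second is more general"
    return "no relationship"

# Hypothetical example rules (made up for this sketch).
most_specific = {"x": (2, 4), "y": (1, 3)}
most_general = [{"x": (2, inf)}, {"y": (-inf, 3)}]
combined = conjoin(most_general)
other = {"x": (2, inf), "y": (0, 5)}

print(relationship(combined, most_specific))
print(relationship(most_general[0], combined))
print(relationship(other, most_general[1]))
```

The "identical" and "no relationship" outcomes correspond to the two caveats in
the preceding paragraph: a pair of rules may coincide, or neither rule may cover
the other.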
We wished to evaluate whether the predicted effects held between the rules of
differing levels of generality so formed. It was not appropriate to use the normal
machine learning experimental method of averaging over multiple runs for each
of several data sets, as our prediction is not about relationships between average
outcomes, but rather relationships between specific outcomes. Further, it would
not be appropriate to perform multiple runs on each of several data sets and
then compare the relative frequencies with which the predicted effects held and
did not hold, as this would violate the assumption of independence between
observations relied on by most statistical tools for assessing such outcomes. Rather,
we applied the process once only to each of the following 50 data sets from the
UCI repository [11]:
abalone, anneal, audiology, imports-85, balance-scale, breast-cancer,
breast-cancer-wisconsin, bupa, chess, cleveland, crx, dermatology, dis,
echocardiogram, german, glass, heart, hepatitis, horse-colic,
house-votes-84, hungarian, allhypo, ionosphere, iris, kr-vs-kp,
labor-negotiations, lenses, long-beach-va, lung-cancer, lymphography,
new-thyroid, optdigits, page-blocks, pendigits, pima-indians-diabetes,
post-operative, promoters, primary-tumor, sat, segmentation, shuttle,
sick, sonar, soybean-large, splice, switzerland, tic-tac-toe, vehicle,
waveform, wine.
These were all appropriate data sets from the repository to which we had ready
access and to which we were able to apply the combination of software tools
employed in the research. Note that there is no averaging of results. Statistical
analysis of the outcomes over the large number of data sets is used to compensate
for random effects in individual results due to the use of a single run.
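
One way to analyse single-run outcomes across many data sets is a sign test
over per-data-set wins and losses. The sketch below is only an illustration:
this passage does not say which statistical test was used, so the choice of a
two-sided exact binomial sign test, the function name and the handling of ties
are all assumptions.

```python
from math import comb

def sign_test_p(wins, losses):
    """Two-sided exact binomial sign test p-value.

    Assumption: each data set contributes one independent outcome, counted
    as a win if the predicted effect held and a loss if it did not. Ties
    are assumed to have been dropped before calling this function.
    """
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Probability of k or fewer successes in n fair coin flips.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the smaller tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)

# Illustrative counts only, not results from the study.
print(sign_test_p(10, 0))
print(sign_test_p(5, 5))
```

Because each data set supplies exactly one outcome, the counts passed to such a
test satisfy the independence assumption that repeated runs on the same data set
would violate.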