MLEM2 Rule Induction Algorithms:
With and Without Merging Intervals
Jerzy W. Grzymala-Busse
Department of Electrical Engineering and Computer Science, University of Kansas,
Lawrence, KS 66045-7621, USA
and
Institute of Computer Science, Polish Academy of Sciences, 01-237 Warsaw, Poland
jerzy@ku.edu
Summary. The MLEM2 algorithm is a rule induction algorithm in which rule
induction, discretization, and handling missing attribute values are all conducted
simultaneously. In this paper two versions of the MLEM2 algorithm are compared.
The first version of MLEM2 induces rules that may contain two conditions with
the same numerical attribute and different intervals. The second version of MLEM2
induces rules with merged conditions associated with numerical attributes, i.e., all
conditions are related to different attributes. For completeness, experiments on the
original LEM2 algorithm with discretization as a preprocessing step are also included. The
performance, in terms of accuracy, for all three algorithms is approximately the same
(for any two of them the difference in performance is not statistically significant).
1 Introduction
The algorithm MLEM2 (Modified Learning from Examples Module, version 2)
[7] is a component of the LERS (Learning from Examples based on Rough
Sets) data mining system. Rough set theory was introduced in [13]; see
also [14]. MLEM2 is based on LEM2 (Learning from Examples Module, version 2).
LEM2 requires a preprocessing step called discretization, a conversion of
numerical values into intervals. Additionally, LEM2 requires preprocessing to
handle missing attribute values before the main process of rule induction.
In MLEM2, on the other hand, all three processes (rule induction, discretization,
and handling missing attribute values) are conducted at the same time,
i.e., the MLEM2 module induces rule sets directly from data with numerical
attributes and missing attribute values. Recently, a new version of MLEM2
was implemented that merges conditions with intervals (for simplicity, we
will call it MLEM2 with merging intervals).
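To illustrate what merging intervals means for a single rule, the following is a minimal sketch and not the LERS implementation. It assumes a hypothetical representation in which each condition is a triple (attribute, low, high), and it assumes that merging two conditions on the same numerical attribute amounts to intersecting their intervals, since all conditions of a rule must hold simultaneously. The function name and the attribute names are chosen for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical representation (an assumption, not the LERS data structure):
# a condition is (attribute, low, high), read as "attribute lies in low..high".
Condition = Tuple[str, float, float]


def merge_interval_conditions(conditions: List[Condition]) -> List[Condition]:
    """Intersect all intervals that refer to the same numerical attribute."""
    merged: Dict[str, Tuple[float, float]] = {}
    for attribute, low, high in conditions:
        if attribute in merged:
            cur_low, cur_high = merged[attribute]
            # Both conditions must hold at once, so keep only the overlap.
            merged[attribute] = (max(cur_low, low), min(cur_high, high))
        else:
            merged[attribute] = (low, high)
    # Dictionaries preserve insertion order, so attributes keep the order
    # in which they first appeared in the rule.
    return [(a, lo, hi) for a, (lo, hi) in merged.items()]


# A rule with two conditions on the same numerical attribute "Age".
rule = [("Age", 20.0, 45.0), ("Weight", 60.0, 90.0), ("Age", 30.0, 60.0)]
merged_rule = merge_interval_conditions(rule)
print(merged_rule)
```

In this sketch the two conditions on Age collapse into the single interval 30..45, so every condition of the merged rule refers to a different attribute.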
The data mining system LERS uses a number of discretization algorithms.
The simplest method to discretize a numerical attribute is partitioning its