conditions on predictor attributes that describes and distinguishes different
values in the goal attribute. The goal attribute is also known as the class label.
2.3.2.1. Representation
Classification rules are normally represented by IF-THEN rules with the
following format:
IF (cond_1) AND ... AND (cond_m) THEN class = c_i
This type of rule contains two parts. The rule antecedent (the IF part)
contains a conjunction of m conditions on predictor attributes (i.e., cond_i),
and the rule consequent (the THEN part) contains a prediction about
the value of a goal attribute (i.e., c_i). Each cond_i is a predicate of the form
attri_i operator value_ij, where attri_i denotes the i-th attribute in the
predictor attribute set, value_ij is the j-th value of attribute i, and
operator is a comparison operator (e.g., =, ≠, >, ≥, <, ≤ for continuous
attributes; = and ≠ for nominal or boolean attributes).
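The IF-THEN format above can be sketched as a small data structure. This is only an illustrative sketch, not a representation taken from the text: the class names `Condition` and `Rule`, the attribute `failed_logins`, and the class label `suspicious` are all assumptions made for the example.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping, Optional, Tuple

# Comparison operators allowed in a condition (ASCII spellings of =, ≠, >, ≥, <, ≤).
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class Condition:
    """One predicate of the form: attribute operator value."""
    attribute: str
    op: str
    value: Any

    def holds(self, record: Mapping[str, Any]) -> bool:
        return OPERATORS[self.op](record[self.attribute], self.value)


@dataclass(frozen=True)
class Rule:
    """IF cond_1 AND ... AND cond_m THEN class = predicted_class."""
    antecedent: Tuple[Condition, ...]
    predicted_class: Any

    def covers(self, record: Mapping[str, Any]) -> bool:
        # The antecedent is a conjunction: every condition must hold.
        return all(cond.holds(record) for cond in self.antecedent)

    def predict(self, record: Mapping[str, Any]) -> Optional[Any]:
        # Returns None when the rule does not apply to the record.
        return self.predicted_class if self.covers(record) else None


# Hypothetical rule:
# IF (login_time = "midnight") AND (failed_logins > 3) THEN class = "suspicious"
example_rule = Rule(
    antecedent=(
        Condition("login_time", "=", "midnight"),
        Condition("failed_logins", ">", 3),
    ),
    predicted_class="suspicious",
)
```

A record that satisfies every condition receives the rule's class; a record that fails any condition is simply not covered by the rule.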
Encoding such a complicated rule structure in a GA is not straightforward, since
GAs use fixed-length binary strings for representation. Therefore, the rule
format is normally simplified by considering only “=” as the operator.
In this case, given n attributes (see Fig. 2.2), there will be n
genes in the chromosome, where the first n − 1 genes represent the values of
the n − 1 attributes in the IF part, and the last gene represents the value
of the goal attribute. Two ways are available to decide the number of
nucleotides for each gene. In the first, if a given attribute attri_i can take k
discrete values, then its gene has k nucleotides. For example, if
the value of the attribute “login time” can be “morning”, “noon”, “afternoon”,
“evening”, or “midnight”, then the gene for this attribute has five bits.
If the gene has the value “01001”, it represents the
condition (login time = “noon” OR “midnight”). This type
of representation can encode more than one value for an attribute
at the same time, but will suffer performance issues if an attribute has
hundreds of values. The second way to decide the number of nucleotides is to
use the equation $len = \lceil \log_2 n \rceil$ [26,27], where n is the number of values of an
attribute. In this case there are three bits for the attribute “login time”.
The binary string “010” here means (login time = “afternoon”).
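The two gene-length choices can be compared with a short sketch. It assumes that values are numbered from zero in the order they are listed, which is consistent with “010” meaning “afternoon” in the example; the function names are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Optional, Sequence

LOGIN_TIME_VALUES = ["morning", "noon", "afternoon", "evening", "midnight"]


def encode_value_set(values: Sequence[str], chosen: Iterable[str]) -> str:
    """First scheme: one bit per possible value (k bits for k values)."""
    chosen_set = set(chosen)
    return "".join("1" if v in chosen_set else "0" for v in values)


def decode_value_set(values: Sequence[str], gene: str) -> List[str]:
    """Return every value whose bit is set; several values mean an OR condition."""
    if len(gene) != len(values):
        raise ValueError("gene length must equal the number of values")
    return [v for v, bit in zip(values, gene) if bit == "1"]


def index_gene_length(n: int) -> int:
    """Second scheme: ceil(log2(n)) bits, computed with integer arithmetic."""
    if n < 1:
        raise ValueError("an attribute needs at least one value")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1; keep at least one bit.
    return max(1, (n - 1).bit_length())


def encode_index(values: Sequence[str], value: str) -> str:
    """Encode a single value as its zero-based index in binary."""
    width = index_gene_length(len(values))
    return format(values.index(value), f"0{width}b")


def decode_index(values: Sequence[str], gene: str) -> Optional[str]:
    """Decode an index gene; codes beyond the last value map to None."""
    idx = int(gene, 2)
    return values[idx] if idx < len(values) else None


print(encode_value_set(LOGIN_TIME_VALUES, ["noon", "midnight"]))  # 01001
print(index_gene_length(len(LOGIN_TIME_VALUES)))                  # 3
print(encode_index(LOGIN_TIME_VALUES, "afternoon"))               # 010
```

The compact index gene holds exactly one value at a time, and with five values and three bits the codes “101”, “110”, and “111” correspond to no value, so a GA using this scheme needs some policy for them. Returning `None` here is just one possible choice.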
With the development of GAs, other encoding schemes were conceived
and applied, such as hexadecimal strings [28,29], real-number vectors [30,31], or a
vector mixing real numbers and characters [32,33]. Sometimes, a special