% Given a query Q as training data (as before), find a new
% atom A that can be established from an induced rule.
learned_atom(Q,A) :-
background(K), induce(Q,R), est([R|K],[A]),
\+ est(K,[A]), \+ member(A,Q).
The learned_atom predicate uses induce to generate a rule as before, then looks for an atom A that is not an element of the training data Q, and that can be established given that rule, but not without the rule. The result is the following:
?- learned_atom([four_legged(fido),four_legged(kitty),
not(four_legged(daffy)),
not(four_legged(tree17))], A).
A = four_legged(spot)
;
A = four_legged(rover)
;
A = four_legged(kelly)
In this case, enough is learned from the training examples to generate new examples:
Spot, Rover, and Kelly are four-legged.
Unfortunately, this simple version of induce looks for ever longer rules to explain the observations it is given and never actually fails. So if learned_atom is asked to classify Donald or Huey as four-legged given the same training examples, instead of failing as it should, it would simply run forever.
There are ways of dealing with this issue. The easiest perhaps is to rewrite induce so that it eventually gives up and fails. For example, the length of the desired rule could be limited. But a better approach is to commit to the first induced rule found by the program. In other words, if a rule R has been found that accounts for the training data Q, the testing should be done with R, with no backtracking to look for other rules.
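The first idea, limiting the length of the desired rule, can be sketched as a wrapper. The induce/3 used here, which takes a bound on the number of body literals, is hypothetical; the induce predicate developed earlier would need to be modified to respect such a bound:

```prolog
% Hypothetical sketch: try rule bodies of at most N literals,
% for N = 1, 2, 3, and then give up, so that the search
% eventually fails instead of running forever.
bounded_induce(Q,R) :-
    between(1,3,N),       % small bound on body length (assumed)
    induce(Q,R,N).        % hypothetical length-bounded variant
```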
Prolog provides a built-in predicate called once that does this: if induce(Q,R) in the body of learned_atom is replaced by once(induce(Q,R)), the program commits to the first rule found, and learned_atom will then fail as it should, for instance, on classifying the test examples Donald and Huey as four-legged.
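This replacement can be written out directly. The clause below is the same learned_atom program with the single change described, where background/1, induce/2, and est/2 are the predicates developed earlier:

```prolog
% Commit to the first rule induced from Q; then test as before.
learned_atom(Q,A) :-
    background(K), once(induce(Q,R)), est([R|K],[A]),
    \+ est(K,[A]), \+ member(A,Q).
```

With this version, asking learned_atom to classify Donald as four-legged fails in finite time, since no rule other than the first one found is ever considered.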
In performing such classification, the bulk of the work involves finding a rule that
works well. When there are a number of predicates to choose from and a number
of variables that are needed, it can be very time-consuming to find a rule that does
the job properly. This makes it especially important to have training examples that
include near-misses, individuals that do not have the desired property but are very
similar to those that do. With near-misses, it is much easier to locate the negative
literals that need to be included in the body of the induced rule.
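For illustration only (the background facts here are assumptions, not taken from the text): suppose Daffy appears in the background as an animal that is also a bird. The positive examples alone could be covered by an overly general rule, and it is the near-miss Daffy, similar to Fido and Kitty in every respect but one, that forces a negative literal into the body:

```prolog
% Assumed background facts, for illustration only.
animal(fido).   animal(kitty).   animal(daffy).
bird(daffy).

% Too general: this rule wrongly covers the near-miss daffy.
%   four_legged(X) :- animal(X).
% The near-miss forces a negative literal into the body:
%   four_legged(X) :- animal(X), \+ bird(X).
```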