importantly, in quality. In the tests performed, all rules were first induced from the
learning samples; in each studied case this set was then limited by imposing hard
constraints on the type of rules and their support [37, 38].
The inferred rules can be certain (exact), possible, or approximate, and only the
first type indicates recognition results without additional processing. In the research
conducted, only certain rules were taken into account.
To reduce the required processing time, a very strict rule was also applied to
ambiguous decisions (cases where no rule matched, or where multiple rules matched but
with contradicting decisions): instead of resolving the matter by voting, weighting,
or both, all ambiguous decisions were always treated as incorrect.
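The strict treatment of ambiguous decisions can be illustrated with a minimal sketch.
It is not the implementation used in the research: the rule representation (a list of
threshold conditions paired with a decision label) and the names matches,
classify_strict and strict_accuracy are assumptions made for illustration, and real
DRSA rules carry richer dominance-based semantics than shown here.

```python
import operator
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical rule format: ([(attribute, ">=" or "<=", threshold), ...], decision)
Condition = Tuple[str, str, float]
Rule = Tuple[List[Condition], str]

_OPS = {">=": operator.ge, "<=": operator.le}


def matches(rule: Rule, sample: Dict[str, float]) -> bool:
    """A rule matches when every one of its conditions holds for the sample."""
    conditions, _ = rule
    return all(_OPS[op](sample[attr], thr) for attr, op, thr in conditions)


def classify_strict(rules: Sequence[Rule], sample: Dict[str, float]) -> Optional[str]:
    """Return a decision only when all matching rules agree on it.

    None marks an ambiguous case: no rule matched, or the matching rules
    point to contradicting decisions. No voting or weighting is attempted.
    """
    decisions = {rule[1] for rule in rules if matches(rule, sample)}
    if len(decisions) == 1:
        return next(iter(decisions))
    return None


def strict_accuracy(rules: Sequence[Rule],
                    samples: Sequence[Dict[str, float]],
                    labels: Sequence[str]) -> float:
    """Fraction of correct decisions; ambiguous cases count as incorrect."""
    correct = sum(1 for s, y in zip(samples, labels)
                  if classify_strict(rules, s) == y)
    return correct / len(labels)
```

Because an ambiguous case yields None, which never equals a class label, it is
counted as an error without any further processing, which keeps evaluation cheap.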
3.4.3 Search Parameters
Another search parameter that must be decided before starting is the point in the
input feature space from which the algorithm begins its execution. Given the
dimensionality of this search space and all the candidate subsets of features that
can be found, the initial set is chosen in one of three ways: as empty, with variables
then added to it in the forward direction; as the entire set of attributes, from which
elements are removed in the backward direction; or as some other subset, in which case
features can be both added and removed, searching in both directions.
An exhaustive search, with evaluation of all possible candidate subsets, is rarely
executed, as it is typically too time-consuming; usually only a part of these subsets
is tested. We can stop the search when maximal performance is obtained (although we
cannot then be sure that the maximum is global rather than local), or we can make the
search complete with respect to the search path, that is, end it upon reaching the
point in the space opposite to the starting one. The latter approach was used in the
tests performed.
The search procedures started with the entire set of considered stylometric features,
and then sequential backward selection was executed by removing one variable at a
time until no attribute was left. To evaluate a subset of features, two separate
groups of samples were involved: one for the induction phase and the other for testing.
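A search that is complete with respect to its path can be sketched as follows. This
is only an illustration under stated assumptions: evaluate is a hypothetical callback
that would induce a classifier on one group of samples and return its performance on
the other, and the greedy choice of which feature to drop (the one whose removal gives
the best score) is an assumption, since the criterion is not fixed in the text above.

```python
from typing import Callable, List, Sequence, Tuple

Subset = Tuple[str, ...]


def backward_elimination(
    features: Sequence[str],
    evaluate: Callable[[Subset], float],
) -> List[Tuple[Subset, float]]:
    """Remove one feature at a time and record every subset on the path.

    The search does not stop at the first drop in performance; it follows
    the whole path from the full set down to a single feature. Removing
    that last feature would leave an empty set with nothing to evaluate.
    """
    current: Subset = tuple(features)
    path = [(current, evaluate(current))]
    while len(current) > 1:
        # Score every subset obtained by dropping exactly one feature.
        scored = []
        for drop in current:
            candidate = tuple(f for f in current if f != drop)
            scored.append((evaluate(candidate), candidate))
        # Keep the best-scoring reduced subset (first one wins on ties).
        best_score, current = max(scored, key=lambda pair: pair[0])
        path.append((current, best_score))
    return path


def best_on_path(path: List[Tuple[Subset, float]]) -> Tuple[Subset, float]:
    """Pick the best subset only after the whole path has been traversed."""
    return max(path, key=lambda pair: pair[1])
```

Recording the whole path costs more evaluations than stopping at the first maximum,
but it makes it possible to see afterwards whether that maximum was only local along
the path.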
3.5 Feature Evaluation by Ranking
Within the research presented in this chapter, a two-step framework was implemented.
The first step encompassed evaluating the relevance of characteristic features by
obtaining their ranking.
The rankings of features were calculated by the Relief algorithm [43] and by
employing embedded DRSA processing. The Relief algorithm establishes through
iterative calculations how well individual attributes discern the defined decision
classes, whereas in rough processing (described in detail in [42]) the ranks of
variables depend on their occurrences in relative reducts, to which weights are
assigned.
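The idea behind Relief-style ranking can be shown with a simplified sketch. It is an
illustration only, not the exact variant used in the research: it visits every sample
once instead of drawing samples at random, assumes numeric features, and measures
distance as a sum of range-normalised differences. The names relief_weights and
ranking are hypothetical.

```python
from typing import List, Sequence


def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Estimate how well each feature separates the decision classes."""
    n, d = len(X), len(X[0])
    # Normalise by each feature's range so that every difference lies in [0, 1].
    spans = []
    for j in range(d):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(j: int, a: Sequence[float], b: Sequence[float]) -> float:
        return abs(a[j] - b[j]) / spans[j]

    def distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no neighbour of one kind, so nothing to learn here
        near_hit = min(hits, key=lambda k: distance(X[i], X[k]))
        near_miss = min(misses, key=lambda k: distance(X[i], X[k]))
        for j in range(d):
            # Reward features that differ on the nearest miss,
            # penalise features that differ on the nearest hit.
            weights[j] += (diff(j, X[i], X[near_miss])
                           - diff(j, X[i], X[near_hit])) / n
    return weights


def ranking(weights: Sequence[float]) -> List[int]:
    """Feature indices ordered from most to least relevant."""
    return sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
```

A feature receives a high weight when it takes similar values on close samples of the
same class and different values on close samples of other classes, which is what
"discerning the decision classes" amounts to in this scheme.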
 