3.6 Feature Evaluation by Backward Reduction
In the second phase of experiments, the considered stylometric features were
evaluated by observing the performance of classification systems under sequential
backward elimination that followed the previously established ranking. The
procedure thus combined filter, wrapper, and embedded approaches.
First, ANN and DRSA classifiers were constructed for the entire set of studied
features. The trained network correctly recognised a median of 83.33% of samples,
while the all rules on examples algorithm, with a hard constraint requiring a
minimal rule support of 41, classified 76.67% of instances without any ambiguity.
These two results were then treated as reference points for the sequential
backward elimination.
For neural networks, reduction of variables corresponded to decreasing the number
of inputs (and also the number of neurons in the two hidden layers) and repeating
the complete training procedure for each modified topology.
DRSA processing could be conducted in the same way, that is, by eliminating
attributes and constructing a new classifier through induction of new decision
rules. However, generating all rules in each case would be very time-consuming:
for the entire set of variables the algorithm comprised 46,191 constituent rules.
With a lower number of features we can expect this number to decrease, yet we can
also expect that at least some of these rules would be the same.
Therefore, instead of generating the same rules over and over again, another
approach is employed: attribute reduction is applied to the all rules on examples
algorithm already induced for the entire set of features. When a variable is
reduced, all rules with conditions on this variable are discarded from the set
(regardless of other conditions they may include in the premise part), and a new,
reduced algorithm is constructed. Such a process executes much faster than repeated
induction of rules.
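The rule-discarding step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the experiments: the `Rule` class, its fields, and the function names `drop_attribute` and `backward_reduction` are assumptions introduced here, and a rule's premise is modelled simply as a tuple of (attribute, relation, threshold) conditions.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple

# Hypothetical representation of one decision rule (not taken from the source).
# Each condition is (attribute name, relation, threshold), e.g. ("attr_a", ">=", 0.02).
@dataclass(frozen=True)
class Rule:
    conditions: Tuple[Tuple[str, str, float], ...]
    decision: str
    support: int

def drop_attribute(rules: Sequence[Rule], attribute: str) -> List[Rule]:
    """Keep only rules whose premise has no condition on `attribute`.

    A rule is discarded as soon as it mentions the attribute, regardless of
    any other conditions in its premise.
    """
    return [
        rule for rule in rules
        if all(name != attribute for name, _, _ in rule.conditions)
    ]

def backward_reduction(
    rules: Sequence[Rule], elimination_order: Sequence[str]
) -> Iterator[Tuple[str, List[Rule]]]:
    """Yield (removed attribute, remaining rules) after each elimination step."""
    remaining = list(rules)
    for attribute in elimination_order:
        remaining = drop_attribute(remaining, attribute)
        yield attribute, remaining
```

Each step only filters the rule set induced once for all features, which is why it is much cheaper than inducing rules again. The reduced rule set would then be evaluated on the test samples in the same way as the full one.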
The experiments were organised in two series, depending on the ranking of
characteristic features that controlled backward reduction for both types of
classifiers employed in the stylometric task of authorship attribution.
3.6.1 Relief Ranking
The first group of tests focused on elimination of characteristic features
for ANN and DRSA classifiers, following the ranking of attributes returned
by the Relief algorithm (see Table 3.1).
The performance of the connectionist classifier for a decreasing number of input
nodes is plotted in Fig. 3.1. The graph in Fig. 3.1a displays reduction of variables
in decreasing order of rank, starting with the highest-ranked variables and
proceeding to lower and lower ranks. Elimination of features in the reversed order,
in which the lowest-ranked variables are removed first, is shown in Fig. 3.1b.
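The two elimination orders can be derived from a single ranking, as in the minimal sketch below. The function names and the feature names in the usage note are hypothetical placeholders, not the attributes of Table 3.1; the ranking is assumed to be a list ordered from the highest-ranked feature to the lowest.

```python
from typing import List, Sequence, Tuple

def elimination_orders(ranking: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return (highest-ranked first, lowest-ranked first) elimination orders.

    `ranking` is assumed to list features from highest to lowest rank.
    """
    highest_first = list(ranking)
    lowest_first = list(reversed(ranking))
    return highest_first, lowest_first

def remaining_feature_sets(order: Sequence[str]) -> List[Tuple[str, List[str]]]:
    """For each step, pair the removed feature with the features still in use."""
    remaining = list(order)
    steps = []
    for feature in order:
        remaining = [f for f in remaining if f != feature]
        steps.append((feature, list(remaining)))
    return steps
```

For a placeholder ranking `["f1", "f2", "f3"]`, the first order removes `f1`, then `f2`, then `f3`, while the second removes `f3` first. At each step a classifier would be rebuilt (ANN) or pruned (DRSA) for the remaining features and its accuracy recorded.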
 