In forward selection the initial dimensionality of the inducers is low, and it
gradually increases as more variables are considered. Yet in such a limited
context, without the presence of other features and without observing the
interactions among them, any conclusions regarding the importance and relevance
of attributes can be unreliable and misleading [24].
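
The greedy procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used in the chapter's experiments: the feature names f1 to f4 and the function toy_accuracy are hypothetical stand-ins for real stylometric markers and for the training and evaluation of an actual classifier.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# Hypothetical feature names, for illustration only.
FEATURES = ["f1", "f2", "f3", "f4"]


def toy_accuracy(subset: FrozenSet[str]) -> float:
    """Assumed stand-in for training and evaluating a classifier on `subset`."""
    gains = {"f1": 0.15, "f2": 0.10, "f3": 0.02, "f4": 0.0}
    acc = 0.5 + sum(gains[f] for f in sorted(subset))
    if {"f3", "f4"} <= subset:
        # f3 and f4 are strongly informative only when both are present.
        acc += 0.2
    return acc


def forward_selection(
    features: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> List[Tuple[str, float]]:
    """Greedy sequential forward selection.

    Starts from the empty set and, at each step, adds the single feature
    that gives the highest score. Returns the features in the order they
    were added, each paired with the score obtained after adding it.
    """
    selected: List[str] = []
    remaining = list(features)
    history: List[Tuple[str, float]] = []
    while remaining:
        best = max(remaining, key=lambda f: evaluate(frozenset(selected + [f])))
        selected.append(best)
        remaining.remove(best)
        history.append((best, evaluate(frozenset(selected))))
    return history


for name, acc in forward_selection(FEATURES, toy_accuracy):
    print(f"added {name}: accuracy {acc:.2f}")
```

In this toy setting f4 contributes nothing on its own and is therefore added last, even though together with f3 it produces the largest single gain. This is the kind of misleading early conclusion that a limited context can cause.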
For rule classifiers, low dimensionality means quick induction of rules and
relatively short decision algorithms with few constituent rules. This seems to be
an advantage; however, with not enough data to mine, the constructed rules tend to
be approximate rather than certain and do not necessarily help in classification
[33]. On the other hand, when a classification system is of a connectionist type,
the learning stage is much more problematic when there are only a few inputs to
induce knowledge from. Artificial neural networks with an insufficient number of
input nodes can have significant trouble converging, and training them takes much
more time, as more runs are needed to learn anything from the training facts [11].
In backward reduction, the induction process starts with a high number of
attributes. The computational costs of inferring decision rules are then much
higher and, depending on the induction algorithm, may even be infeasible. If the
task is still manageable, however, studying features in a much wider context can
bring additional information, resulting in better performance of the classifier.
Also, connectionist classification systems with more inputs converge faster,
because with many neurons and interconnections there is simply more room for
adjusting the weights so as to minimise the error on the output.
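
Backward reduction can be sketched in the same greedy style. Again, this is a minimal illustration under stated assumptions rather than the chapter's actual procedure: the feature names f1 to f4 and toy_accuracy are hypothetical, with toy_accuracy standing in for inducing and testing a real classifier on each candidate subset.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# Hypothetical feature names, for illustration only.
FEATURES = ["f1", "f2", "f3", "f4"]


def toy_accuracy(subset: FrozenSet[str]) -> float:
    """Assumed stand-in for training and evaluating a classifier on `subset`."""
    gains = {"f1": 0.15, "f2": 0.10, "f3": 0.02, "f4": 0.0}
    acc = 0.5 + sum(gains[f] for f in sorted(subset))
    if {"f3", "f4"} <= subset:
        # f3 and f4 are strongly informative only when both are present.
        acc += 0.2
    return acc


def backward_elimination(
    features: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> List[Tuple[str, float]]:
    """Greedy sequential backward elimination.

    Starts from the full set and, at each step, removes the feature whose
    absence leaves the best-scoring subset. Returns the features in the
    order they were removed, each paired with the score after its removal.
    """
    current = list(features)
    history: List[Tuple[str, float]] = []
    while current:
        drop = max(
            current,
            key=lambda f: evaluate(frozenset(x for x in current if x != f)),
        )
        current.remove(drop)
        history.append((drop, evaluate(frozenset(current))))
    return history


for name, acc in backward_elimination(FEATURES, toy_accuracy):
    print(f"removed {name}: accuracy {acc:.2f}")
```

Because every candidate is judged in the presence of all other remaining features, the interaction between f3 and f4 is visible from the first step, and both are retained longer than f1 and f2. The price is that each early evaluation involves the largest feature subsets, which is where the higher computational cost comes from.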
The chapter compares the two approaches to sequential selection on a binary
classification task of authorship attribution with balanced data [31, 32]. The
characteristic features are textual markers of lexical and syntactic type, which
enable the definition and recognition of writing styles [25]. The sequential
selection procedures serve as a means of assigning weights to variables,
depending on how their presence or absence in a considered feature subset
influences the predictive accuracy of the classification system.
The text is organised as follows. Section 5.2 contains a brief introduction to
approaches to variable selection, the classifiers employed, and stylometric
analysis. Section 5.3 describes the framework for the conducted experiments.
Sections 5.4 and 5.5 show results from tests focused on forward and backward
selection procedures, respectively. Concluding remarks are included in Sect. 5.6.
5.2 Background
The research presented in this chapter combines three issues: approaches to
feature selection, connectionist and rule-based classifiers employed in pattern
recognition, and stylometric processing of texts as the application domain. They
are briefly described in this section.
 