about James Joyce as the author. A painting with a group of posed ballet dancers
upon a stage we would associate with Degas, and water lilies in a pond with Monet.
Hearing rich classical organ music we could try to guess Bach as the composer. In
each of these examples we have a chance of correct recognition based on
some characteristic features the authors are famous for. Our brains recognise lily
flowers or organ tunes, yet to make other people or machines capable of the same we
need to explain these specific elements, which means describing and expressing them
in understandable and precise terms.
Characterising things is a natural part of life; some excel at it while others are
less skilled. Yet anybody can make basic distinctions, especially with some support
system. How these characteristics bear on the problems we need to tackle and the
tasks waiting to be solved is partly intuitive and partly learned from observations,
experiments, and the conclusions drawn from them. Some pointers are rather
straightforward, while others are indirect or convoluted.
According to a dictionary definition, a feature is a distinctive attribute or aspect
of something, and the word is used as a synonym for characteristic, quality, or
property [29, 38]. With this meaning it is employed in general language descriptions
but also in more confined areas of technical sciences and computer technologies, in
particular in the domain of data mining and pattern recognition [24, 30, 39].
For automatic recognition and classification [11, 27], all objects of the universe of
discourse need to be perceived through the information carried by their
characteristics. When this information is incomplete or uncertain, the predictive
accuracy of the constructed systems could be unsatisfactory or falsified, making
observations and conclusions unreliable. This holds whether the systems induce
knowledge from available data in a supervised or unsupervised manner [28], and
whether they rely on statistics-oriented calculations [8, 19] or heuristic algorithms.
The performance of any inducer depends on the raw input data on which the inferred
knowledge is based [21], the attributes exploited, the data mining approach or
methodology applied, and also the general dimensionality of the problem [40].
Contemporary computer technologies with their high computational capabilities aid in
processing, but for huge data sets and very high numbers of variables the process,
even if feasible, can still take a lot of time and effort and require unnecessarily
or impractically large storage.
Typically the primary goal is to achieve the maximal classification accuracy, but
we also need to take into account practical aspects of the obtained solutions and
consider trade-offs, such as accepting some loss in performance in exchange for a
much shorter time, less processing, lower complexity, or a smaller system structure.
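The trade-off described above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a method taken from the cited literature: the toy data set, the 1-nearest-neighbour classifier, the leave-one-out estimate, the exhaustive search over subsets, and the names loo_accuracy and smallest_adequate_subset are all hypothetical choices made for this example. The idea is to accept the smallest attribute subset whose estimated accuracy lies within a chosen tolerance of the best accuracy found.

```python
from itertools import combinations

# Toy data set, made up purely for illustration: (feature vector, class label).
# Feature 0 separates the classes well; features 1 and 2 carry little signal.
DATA = [
    ((1.0, 0.2, 5.0), "a"),
    ((1.2, 0.9, 1.0), "a"),
    ((0.9, 0.4, 3.0), "a"),
    ((1.1, 0.7, 4.0), "a"),
    ((3.0, 0.5, 4.5), "b"),
    ((3.2, 0.8, 1.5), "b"),
    ((2.9, 0.3, 3.5), "b"),
    ((3.1, 1.0, 2.0), "b"),
]


def loo_accuracy(data, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the feature indices listed in `subset`."""
    correct = 0
    for i, (x, label) in enumerate(data):
        best_dist, best_label = None, None
        for j, (y, other_label) in enumerate(data):
            if i == j:
                continue  # the held-out sample may not vote for itself
            dist = sum((x[k] - y[k]) ** 2 for k in subset)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, other_label
        if best_label == label:
            correct += 1
    return correct / len(data)


def smallest_adequate_subset(data, n_features, tolerance):
    """Return (subset, its accuracy, best accuracy over all subsets).

    Every non-empty subset is scored, which is feasible only for a
    handful of features; the chosen subset is the smallest one whose
    accuracy is within `tolerance` of the best, ties broken by accuracy.
    """
    scores = {
        subset: loo_accuracy(data, subset)
        for size in range(1, n_features + 1)
        for subset in combinations(range(n_features), size)
    }
    best = max(scores.values())
    adequate = [s for s, acc in scores.items() if acc >= best - tolerance]
    chosen = min(adequate, key=lambda s: (len(s), -scores[s]))
    return chosen, scores[chosen], best
```

Calling smallest_adequate_subset(DATA, 3, tolerance=0.05) on this toy data selects the single attribute with index 0, which here happens to reach the best accuracy on its own. A larger tolerance expresses a greater willingness to trade accuracy for a smaller system. For realistic numbers of variables the exhaustive search would have to be replaced by a heuristic search.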
Feature selection is an explicit part of most knowledge mining approaches: some
attributes are chosen over others when the set of characteristic features is formed
in the first place [10, 18]. Here the choice can be supported by expert knowledge.
Once some subset of variables is available and is used to construct a rule
classifier, the rule induction algorithm makes particular choices of conditions for
all constituent rules, either usual or inhibitory. In a similar manner, in decision
tree construction specific attributes are to be checked at its nodes, and artificial
neural networks through their