Graphics Reference
In-Depth Information
universal instance space is defined as a cartesian product of all input attribute domain
and the target attribute domain.
The two basic and classical problems that belong to the supervised learning cat-
egory are classification and regression. In classification, the domain of the target
attribute is finite and categorical. That is, there are a finite number of classes or cate-
gories to predict a sample and they are known by the learning algorithm. A classifier
must assign a class to a unseen example when it is trained by a set of training data.
The nature of classification is to discriminate examples from others, attaining as a
main application a reliable prediction: once we have a model that fits the past data,
if the future is similar to the past, then we can make correct predictions for new
instances. However, when the target attribute is formed by infinite values, such as in
the case of predicting a real number between a certain interval, we are referring to
regression problems. Hence, the supervised learning approach here has to fit a model
to learn the output target attribute as a function of input attributes. Obviously, the
regression problem present more difficulties than the classification problem and the
required computation resources and the complexity of the model are higher.
There is another type of supervised learning that involves time data. Time series
analysis is concerned with making predictions in time. Typical applications include
analysis of stock prices, market trends and sales forecasting. Due to the time depen-
dence of the data, the data preprocessing for time series data is different from the
main theme of this topic. Nevertheless, some basic procedures may be of interest
and will be also applicable in this field.
1.4 Unsupervised Learning
We have seen that in supervised learning, the aim is to obtain a mapping from the
input to an output whose correct and definite values are provided by a supervisor. In
unsupervised learning, there is no such supervisor and only input data is available.
Thus, the aim is now to find regularities, irregularities, relationships, similarities and
associations in the input. With unsupervised learning, it is possible to learn larger
and more complex models than with supervised learning. This is because in super-
vised learning one is trying to find the connection between two sets of observations.
The difficulty of the learning task increases exponentially with the number of steps
between the two sets and that is why supervised learning cannot, in practice, learn
models with deep hierarchies.
Apart from the two well-known problems that belong to the unsupervised learning
family, clustering and association rules, there are other related problems that can fit
into this category:
 
Search WWH ::




Custom Search