2.5 Machine Learning
ML is the area of study concerned with the design, development and evaluation of
systems capable of learning from data. In many common situations where we need, for
instance, to complete a particular task or to make a prediction regarding a given
issue, it is possible to find solutions by inspecting and analyzing previous
observations whose characteristics are similar to those of the problem at hand. In
other words, ML systems are capable of predicting future actions based on past
experiences (Bishop 2006; Murphy 2012).
Data are the input of the learning process, and their representation is fundamental
to the performance of ML systems. They must describe each specific situation in a
meaningful way so that future data can be predicted well. The property that allows a
system to correctly predict unseen samples is known as generalization, and it is
highly desirable in any learning machine as it is directly related to its performance.
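The idea of generalization can be illustrated with a minimal sketch. The data, the labelling rule and both toy models below are assumptions made up for this illustration; they do not come from the cited references. One model only memorizes its training pairs, the other learns a simple threshold, and both are then scored on samples they have never seen.

```python
# Toy data (hypothetical): the label is 1 for values above 5, otherwise 0.
train = [(1, 0), (2, 0), (4, 0), (6, 1), (8, 1), (9, 1)]
unseen = [(0, 0), (3, 0), (7, 1), (10, 1)]


def fit_memorizer(samples):
    """Store every training pair; answer 0 for anything not seen before."""
    table = dict(samples)
    return lambda x: table.get(x, 0)


def fit_threshold(samples):
    """Place a decision threshold midway between the two class means."""
    zeros = [x for x, y in samples if y == 0]
    ones = [x for x, y in samples if y == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x >= threshold else 0


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in samples) / len(samples)


memorizer = fit_memorizer(train)
threshold_model = fit_threshold(train)

train_acc_memorizer = accuracy(memorizer, train)
unseen_acc_memorizer = accuracy(memorizer, unseen)
train_acc_threshold = accuracy(threshold_model, train)
unseen_acc_threshold = accuracy(threshold_model, unseen)

print("memorizer:", train_acc_memorizer, unseen_acc_memorizer)
print("threshold:", train_acc_threshold, unseen_acc_threshold)
```

Both models fit the training samples perfectly, but only the threshold model keeps its accuracy on the unseen samples, which is what generalization refers to.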
2.5.1 Taxonomy of ML Algorithms
ML algorithms have been categorized according to the type of input used for training
and the expected outcome. In this section, we describe the most relevant categories.
2.5.1.1 Supervised Learning
In this type of learning, input data are usually composed of a pair of elements, namely
the input vector (x) together with its target (y) (Bishop 2006). This can be better
clarified with an example: assume a system that learns handwritten digits from 0
to 9. The input vectors would be the set of images of all the digits (usually several
samples of each one) and the target vector the actual labels that correspond to each
sample.
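The pairing of input vectors and targets can be sketched in a few lines. This is only an illustration under stated assumptions: the 3x3 bitmaps are invented, they cover just the digits 0 and 1 rather than all ten, and the 1-nearest-neighbour rule with a Hamming distance is one simple choice of learner, not a method taken from the cited references.

```python
# Each training pair is (x, y): x is a flattened 3x3 binary image, y its label.
# The bitmaps are made up for illustration and cover only the digits 0 and 1.
training_pairs = [
    ((1, 1, 1,
      1, 0, 1,
      1, 1, 1), 0),
    ((1, 1, 1,
      1, 0, 1,
      0, 1, 1), 0),   # a "0" with one missing pixel
    ((0, 1, 0,
      0, 1, 0,
      0, 1, 0), 1),
    ((1, 1, 0,
      0, 1, 0,
      0, 1, 0), 1),   # a "1" with a small serif
]


def hamming_distance(a, b):
    """Number of pixels in which two images differ."""
    return sum(p != q for p, q in zip(a, b))


def predict(x, pairs):
    """Return the target of the training input closest to x (1-nearest neighbour)."""
    _, nearest_target = min(pairs, key=lambda pair: hamming_distance(x, pair[0]))
    return nearest_target


# Two unseen samples: a noisy "0" and a noisy "1".
noisy_zero = (1, 1, 1,
              1, 0, 1,
              1, 1, 0)
noisy_one = (0, 1, 0,
             0, 1, 0,
             0, 1, 1)

predicted_zero = predict(noisy_zero, training_pairs)
predicted_one = predict(noisy_one, training_pairs)
print(predicted_zero, predicted_one)
```

Several samples per digit are stored, as in the example described above, so that a new image can match whichever variant it resembles most.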
If the desired output of the system is categorical (only a finite set of discrete
classes is considered), then it is a classification problem, as in the example
presented above. Otherwise, if the outputs are continuous variables, as in temperature
forecasting or stock market prediction, then it is a regression problem.
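For contrast with the classification case, the following sketch fits a regression model whose output is a continuous value. The hourly temperature readings are hypothetical, and ordinary least squares on a single input variable is assumed here only because it is one of the simplest regression methods.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical readings: hours since sunrise (input) and temperature in
# degrees Celsius (continuous target).
hours = [0, 1, 2, 3]
temperatures = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(hours, temperatures)

# The prediction for an unseen input is a real number, not a class label.
forecast = slope * 4 + intercept
print(slope, intercept, forecast)
```

Unlike the classifier, which can only answer with one of a fixed set of labels, this model can return any real value along the fitted line.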
This type of algorithm is the most commonly used in ML and it is also the one used
in our research. However, it is not suited to all kinds of problems. In fact, one of
its disadvantages is that in some applications it is not always possible to have target
information for all the available input samples. Other techniques, such as
unsupervised and semi-supervised learning, can cope with these situations and are
described below.
When the learning is performed gradually, for instance, by adding one new sample
and its target at a time to the model, we refer to it as Online Machine Learning
(Shalev-Shwartz 2011). This supervised approach has the advantage of making the model