Let us take an example. An agricultural scientist is working on a new crop that she
has developed. As a trial, the seed is planted at various altitudes and the yield is
measured at each. Plotting yield against altitude reveals the relationship between
the two parameters, which makes it possible to predict the yield at any other
altitude. The data usually does not fit a line perfectly, but once a line has been
fitted and its equation noted (along with the errors, of course), the equation can
stand in for the raw data. This technique is called regression.
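The idea can be sketched in a few lines of Python. This is only an illustration: it
assumes the relationship is roughly linear and uses an ordinary least-squares fit,
which the text does not prescribe. The altitude and yield figures, and the names
fit_line and rms_error, are hypothetical and chosen for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rms_error(xs, ys, slope, intercept):
    """Root-mean-square distance between the data and the fitted line."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return (sum(r * r for r in residuals) / len(residuals)) ** 0.5


# Hypothetical trial data (made up for illustration, not from the text).
altitudes_m = [200, 400, 600, 800, 1000]
yields_t_per_ha = [5.1, 4.8, 4.1, 3.9, 3.2]

slope, intercept = fit_line(altitudes_m, yields_t_per_ha)
error = rms_error(altitudes_m, yields_t_per_ha, slope, intercept)

# The equation plus its error now summarizes the data, so the yield at an
# altitude that was never planted can be estimated from the line alone.
predicted = slope * 700 + intercept
print(f"yield = {slope:.5f} * altitude + {intercept:.3f} (rms error {error:.3f})")
print(f"estimated yield at 700 m: {predicted:.2f}")
```

Only the slope, the intercept, and the error need to be kept to make further
predictions, which is what allows the raw measurements to be set aside.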
Clustering
Most of the time, the data analyst is simply given some data and is expected to
unearth interesting patterns that may help in deriving intelligence. The main
difference between this task and classification is that, in a classification problem,
the business user specifies what he or she is looking for (a good customer or a bad
customer, a success or a failure, and so on), whereas in clustering no such target
is specified.
Let us consider the same example as in the Classification section. In clustering,
the patterns that group the customers are identified without any target in mind
or any prior classification. Running a classification with a specific model on the
same data always gives the same results, whereas clustering results may differ from
run to run (for example, depending on how the initial centroids are picked). One
example modeling method for clustering is K-means clustering. You may learn more
about K-means in the following section.
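To make the role of the initial centroids concrete, here is a minimal K-means sketch
in plain Python. It is an illustration under stated assumptions, not a reference
implementation: the initial centroids are drawn at random from the data points, the
customer figures are invented, and the names kmeans and squared_distance are
hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, max_iter=100):
    """Basic K-means: returns (centroids, labels) for the given seed."""
    rng = random.Random(seed)
    # Assumption: the initial centroids are k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels


# Hypothetical customers as (monthly spend, visits per month); made-up values.
customers = [(20, 1), (25, 2), (30, 1), (200, 8), (220, 9), (210, 7), (90, 4), (100, 5)]

# Different seeds pick different starting centroids, so the grouping (or at
# least the numbering of the groups) may change from one run to the next.
for seed in (0, 1, 2):
    centroids, labels = kmeans(customers, k=3, seed=seed)
    print(seed, labels)
```

Fixing the seed makes a run repeatable. No target labels are supplied anywhere in
the code: the groups emerge only from the distances between the points.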