Advanced Analytics – Paradigms, Tools, and Techniques - Getting Started with Greenplum for Big Data Analytics

Let us take an example. An agricultural scientist is working on a new crop that she has developed. As a trial, the seed was planted at various altitudes and the yield was measured at each one. Plotting yield against altitude reveals the relationship between the two parameters and makes it possible to predict the yield at any other altitude. The data usually does not fit a line perfectly, but once the line is fit and its equation is noted (along with the errors, of course), the original data is no longer needed to make predictions. This technique is called regression.
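The idea can be sketched in a few lines of Python. This is only an illustration using ordinary least squares on a single predictor; the altitude and yield figures are invented for the example, and the helper names fit_line and predict are hypothetical rather than part of any library.

```python
import math

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Predict y for a new x using only the fitted equation."""
    return slope * x + intercept

# Hypothetical trial data: altitude in metres, yield in tonnes per hectare.
altitudes = [0, 500, 1000, 1500, 2000]
yields = [5.1, 4.6, 4.2, 3.4, 3.0]

slope, intercept = fit_line(altitudes, yields)

# The "errors" noted alongside the equation: root-mean-square residual.
residuals = [y - predict(slope, intercept, x) for x, y in zip(altitudes, yields)]
rms_error = math.sqrt(sum(r ** 2 for r in residuals) / len(residuals))

# Once slope, intercept, and rms_error are recorded, a prediction at an
# altitude that was never planted needs only the equation, not the data.
estimate_at_1200 = predict(slope, intercept, 1200)
print(slope, intercept, rms_error, estimate_at_1200)
```

Here the fitted slope and intercept stand in for the five data points: the prediction at 1,200 metres uses nothing else.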

Clustering

Most of the time, the data analyst is simply given some data and is expected to unearth interesting patterns that may help in deriving intelligence. The main difference between this task and classification is that, in a classification problem, the business user specifies what he/she is looking for (a good customer or a bad customer, a success or a failure, and so on).

Let us consider the same example as in the Classification section. In clustering, the patterns used to group the customers are identified without any target in mind or any prior classification. Running a classification with a specific model always gives the same results, whereas the results of clustering may differ from run to run (for example, depending on how the initial centroids are picked). An example modeling method for clustering is K-means clustering. You can learn more about K-means in the following section.
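The dependence on the initial centroids can be seen in a small sketch. The code below is a minimal, plain-Python version of the usual K-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its members); the four points and both sets of starting centroids are made up for illustration, and the function name kmeans is hypothetical.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Minimal K-means for 2-D points with explicitly given starting centroids."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(
                range(len(centroids)),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids

# Hypothetical data: four points at the corners of a wide rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Same data, same k, two different choices of starting centroids.
labels_a, centroids_a = kmeans(points, [(0, 0), (4, 0)])
labels_b, centroids_b = kmeans(points, [(0, 0), (0, 1)])

print(labels_a, centroids_a)
print(labels_b, centroids_b)
```

With the first starting choice the points are split into a left pair and a right pair; with the second they are split into a bottom pair and a top pair. Both runs converge, yet they report different clusters for the same data.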
