5) In Figure 4-4, we have our correlation coefficients in a matrix. Correlation coefficients
are relatively easy to decipher. Each one is simply a measure of the strength of the
relationship between one pair of attributes in the data set. Because we have six
attributes in this data set, our matrix is six columns wide by six rows tall. Where
an attribute intersects with itself, the correlation coefficient is '1', because everything
compared to itself has a perfectly matched relationship. Other pairs of attributes will
generally have a correlation coefficient of less than one. To complicate matters a bit, correlation
coefficients can also be negative, so all correlation coefficients fall
somewhere between -1 and 1, inclusive. We can see that this is the case in Figure 4-4, and so we can
now move on to the CRISP-DM step of…
EVALUATION
All correlation coefficients between 0 and 1 represent positive correlations, while all coefficients
between 0 and -1 are negative correlations. While this may seem straightforward, there is an
important distinction to be made when interpreting the matrix's values. This distinction has to do
with the direction of movement between the two attributes being analyzed. Let's consider the
relationship between the Heating_Oil consumption attribute and the Insulation rating level
attribute. The coefficient there, as seen in our matrix in Figure 4-4, is 0.736. This is a positive
number, and therefore a positive correlation. But what does that mean? A positive correlation
means that as one attribute's value rises, the other attribute's value also rises. But a positive
correlation also means that as one attribute's value falls, the other's also falls. Data analysts
sometimes make the mistake of thinking that a negative correlation exists if an attribute's values are
decreasing, but if its corresponding attribute's values are also decreasing, the correlation is still a
positive one. This is illustrated in Figure 4-5.
[Figure 4-5 panels: "Heating Oil use rises / Insulation rating also rises" and "Heating Oil use falls / Insulation rating also falls". Whenever both attribute values move in the same direction, the correlation is positive.]
Figure 4-5. Illustration of positive correlations.
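The ideas above can be sketched in a few lines of Python. This is a minimal illustration, not RapidMiner's implementation. It assumes the coefficients are Pearson's r, computed by hand, and it uses made-up values for five hypothetical records. Only Heating_Oil and Insulation come from the text; attr_3 through attr_6 are hypothetical placeholders for the other four attributes, so the numbers will not match Figure 4-4.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Build a k-by-k matrix of coefficients for k named attribute columns."""
    names = list(columns)
    matrix = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    return names, matrix

def direction(r):
    """Label the sign of a coefficient: attributes move together or in opposite directions."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"

# Hypothetical values for five records; NOT the book's data set.
# attr_3 .. attr_6 are placeholder names for the other four attributes.
data = {
    "Heating_Oil": [150, 210, 180, 260, 120],
    "Insulation": [4, 7, 6, 9, 3],
    "attr_3": [70, 40, 55, 30, 80],
    "attr_4": [2, 5, 3, 4, 1],
    "attr_5": [30, 45, 60, 25, 50],
    "attr_6": [1200, 2100, 1800, 2500, 1100],
}

names, matrix = correlation_matrix(data)
for name, row in zip(names, matrix):
    # Six attributes give a six-by-six matrix with 1.000 on the diagonal.
    print(f"{name:>12}", " ".join(f"{v:6.3f}" for v in row))

# Direction: order the paired records so both attributes fall.
pairs = sorted(zip(data["Heating_Oil"], data["Insulation"]), reverse=True)
oil_falling = [oil for oil, _ in pairs]
ins_falling = [ins for _, ins in pairs]
print(direction(pearson(oil_falling, ins_falling)))              # both fall -> positive
print(direction(pearson(oil_falling[::-1], ins_falling[::-1])))  # both rise -> positive
print(direction(pearson(oil_falling[::-1], ins_falling)))        # opposite -> negative
```

Reversing the order of the records does not change r, because each coefficient depends only on how the paired values deviate from their means together. That is why "both fall" and "both rise" read as the same positive correlation, while opposite movement gives a negative one.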