Evaluation of Results
Deciding which algorithm to apply to a data set depends on the type of data (interval
or categorical, paired vs. unpaired) being analyzed and whether or not the data is
normally distributed. Interpreting the results of the analysis relies on an understanding
of the null hypothesis, p-values, and the concept of statistical
significance.
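To make the link between the null hypothesis and a p-value concrete, here is a minimal sketch of a permutation test for two unpaired groups. It is an illustration only, not a method prescribed by the text: the function name `permutation_p_value`, its parameters, and the sample numbers are all hypothetical. Under the null hypothesis the group labels carry no information, so the sketch shuffles the labels many times and counts how often a difference in means at least as large as the observed one appears by chance.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Hypothetical helper for illustration; assumes two unpaired samples
    of numeric (interval) data.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        # Null hypothesis: labels are interchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            hits += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up numbers: two clearly separated groups.
before = [1, 2, 3, 4, 5, 6, 7, 8]
after = [11, 12, 13, 14, 15, 16, 17, 18]
p = permutation_p_value(before, after)
print("significant at the .05 level:", p < 0.05)
```

A small p-value here means that label shuffling rarely reproduces a gap this large, which is what "statistically significant" expresses. The test makes no normality assumption, which is one reason it is used as the illustration.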
By constructing a histogram or frequency curve, you can see
whether the data follows a normal distribution. You can also draw box plots to
determine whether there are outliers in the data sets. Techniques such as principal component
analysis will also help you determine which attribute or set of attributes most influences
the spread of the data and why.
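The histogram and box-plot checks described above can be sketched without a plotting library. The example below is a rough illustration under stated assumptions: the helper names `histogram_counts` and `iqr_outliers` and the sample values are hypothetical, and the 1.5 x IQR fence is the conventional box-plot whisker rule rather than anything specified in the text. Principal component analysis is not covered here.

```python
from statistics import quantiles


def histogram_counts(values, bins=5):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values match
    counts = [0] * bins
    for v in values:
        # The maximum value would index one past the end, so clamp it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def iqr_outliers(values, k=1.5):
    """Return values outside the box-plot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up sample with one suspicious value.
sample = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 40]

for count in histogram_counts(sample):
    print("#" * count)          # crude text histogram, one row per bin

print(iqr_outliers(sample))     # → [40]
```

A strongly lopsided histogram or a long list of flagged values suggests that the data is not normally distributed, which in turn affects which algorithm is appropriate.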
The following are a few high-level tips to consider:
Identify the dependent variable. What are you trying to predict?
Identify the independent variables, or the predictors of the
dependent variable.
Find the statistically significant relationships between independent
variables and the dependent variable. The usual standard for
statistical significance is less than a 5 percent chance that a
relationship this strong would be observed by coincidence if no
real relationship existed. Look for one of the following indicators:
p (should be .05 or less), Z score, significance level, or the use of
asterisks (**) to indicate significance at the .05 level or less. For p
and significance levels, lower numbers are better, since the number is
the probability of a relationship this strong arising by random
coincidence. For a Z score the reverse holds: the larger its absolute
value, the smaller the corresponding p (roughly 1.96 or more for
two-sided significance at the .05 level).
Now that you know the statistically significant independent
variables, check the direction of the relationship. You are looking
for a number that will be called a coefficient, or beta, or b.
If you see numbers in parentheses, ignore them. These are
usually standard errors, which are used to calculate p. Since
you already have p, you don't need them. Look for the
number without parentheses: that is the coefficient you want.
In most analyses, if this number is positive, then the
relationship between the independent variable and
dependent variable is direct. Increases in the independent
variable increase the value of the dependent variable. If
the number is negative, then the relationship is inverse.
Increasing the independent variable decreases the value of
the dependent variable.
In analyses of duration (how long a campaign runs, how long
before you see churn indications, how long a product stays
at number one on a best-seller list, etc.), the coefficient often
describes an effect on the hazard rate. The hazard rate is the
likelihood that some process stops (i.e., product drops from
 