3.3 Statistical Methods for Evaluation
Visualization is useful for data exploration and presentation, but statistics is crucial
because it is used throughout the entire Data Analytics Lifecycle. Statistical
techniques are used during the initial data exploration and data preparation, model
building, evaluation of the final models, and assessment of how the new models
improve the situation when deployed in the field. In particular, statistics can help
answer the following questions for data analytics:
• Model Building and Planning
    What are the best input variables for the model?
    Can the model predict the outcome given the input?
• Model Evaluation
    Is the model accurate?
    Does the model perform better than an obvious guess?
    Does the model perform better than another candidate model?
• Model Deployment
    Is the prediction sound?
    Does the model have the desired effect (such as reducing the cost)?
This section discusses some statistical tools that can help answer these questions.
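As a small illustration of the Model Evaluation question "Does the model perform better than an obvious guess?", the sketch below compares a model's accuracy with that of a majority-class baseline. The labels, the predictions, and the helper names `accuracy` and `majority_baseline` are hypothetical and made up for this example. Taking "obvious guess" to mean "always predict the most common label" is an assumption, not something the text specifies.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n):
    """Assumed 'obvious guess': always predict the most common training label."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return [most_common_label] * n


# Hypothetical labels, invented for illustration only.
y_train = [0, 0, 0, 1, 0, 1, 0, 0]
y_test = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
model_pred = [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]  # output of some candidate model

baseline_acc = accuracy(y_test, majority_baseline(y_train, len(y_test)))
model_acc = accuracy(y_test, model_pred)

print(f"baseline accuracy: {baseline_acc:.2f}")
print(f"model accuracy:    {model_acc:.2f}")
```

A gap between the two numbers on a sample this small may be due to chance alone, which is why a formal test of significance is usually needed before concluding that the model is better.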
3.3.1 Hypothesis Testing
When comparing populations, for example by evaluating the difference between the
means of two samples of data (Figure 3.22), a common technique for assessing the
difference, or the significance of the difference, is hypothesis testing.
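A minimal sketch of a test statistic for the difference of two sample means is shown below. It uses Welch's t-test, which is one common choice and is assumed here because the text has not yet named a specific test. The sample values and the helper name `welch_t` are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # variance() is the sample variance (divides by n - 1); dividing it by n
    # gives the estimated variance of each sample mean.
    var_mean_a = variance(sample_a) / n_a
    var_mean_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(var_mean_a + var_mean_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_mean_a + var_mean_b) ** 2 / (
        var_mean_a ** 2 / (n_a - 1) + var_mean_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical measurements, invented for illustration only.
sample_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2, 5.5]
sample_b = [4.2, 4.8, 4.5, 5.0, 4.4, 4.7, 4.1]

t_stat, df = welch_t(sample_a, sample_b)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

A t statistic that is large in absolute value suggests that the observed difference in means is unlikely under the assumption that the two population means are equal. Converting it to a p-value requires the t distribution, which the Python standard library does not provide; if SciPy is available, `scipy.stats.ttest_ind(sample_a, sample_b, equal_var=False)` performs the same test and also returns the p-value.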