Chapter 2
Data Visualization and Data Summary
Introduction
Data visualization is an important concept in data mining. It enables you to understand the data, and to
begin to formulate questions that you need to answer in order to make reasonable decisions. In traditional
statistics, models and hypothesis tests provide proof; visualization is used to accompany the model in
an attempt to explain it. In the data mining approach, visualization may provide essential information
about the patterns in the data.
Patients in a clinic or geographic area are very heterogeneous. As a result, patient factors generally
will not follow a normal distribution. Typically, the distribution will have a heavy tail, since every
patient population includes extreme patients who need extraordinary care; there will be more
patients who need considerable care than patients whose treatment can be discontinued early. Thus,
contrary to the normality assumption, distributions of patient populations will not be symmetric.
Great care must therefore be used when considering a model that assumes normality, or even symmetry.
We will look at alternative methods of analysis and visualization that do not depend upon the
assumption of normality.
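
To make the point about heavy tails concrete, the following is a minimal sketch in Python rather than an analysis of real patient data. It simulates a hypothetical patient factor, length of stay in days, from a lognormal distribution. The choice of distribution, its parameters, and the names `stays` and `sample_skewness` are all assumptions made for illustration only. The sketch compares the mean with the median and quartiles, which do not rely on normality.

```python
import random
import statistics


def sample_skewness(values):
    """Moment-based skewness: the average cubed z-score of the values."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Fixed seed so the simulated sample is reproducible.
rng = random.Random(0)

# Hypothetical lengths of stay (days); the lognormal shape and its
# parameters are illustrative assumptions, not estimates from any data set.
stays = [rng.lognormvariate(1.0, 0.9) for _ in range(5000)]

mean_stay = statistics.fmean(stays)
median_stay = statistics.median(stays)
q1, q2, q3 = statistics.quantiles(stays, n=4)  # quartile cut points
skew = sample_skewness(stays)

print(f"mean   = {mean_stay:.2f} days")
print(f"median = {median_stay:.2f} days")
print(f"Q1, Q3 = {q1:.2f}, {q3:.2f} days")
print(f"skewness = {skew:.2f}")
```

In a sample like this, the few extreme values pull the mean above the median and the upper quartile sits farther from the median than the lower quartile does. That is the asymmetry described above, and it is why summaries based on the median and quartiles are often more informative for such data than the mean and standard deviation.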