further in Chapter 3.) Another example is skewness, for instance when the majority of the
data is heavily shifted toward one value or toward one end of a continuum.
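The text mentions skewness only in passing. The sketch below is not from the source. It computes the sample skewness as the Fisher-Pearson coefficient g1 = m3 / m2^(3/2), where mk = (1/n) * sum((x_i - mean)^k) is the k-th central moment. Positive values indicate a long right tail, where most values sit toward the low end. Negative values indicate a long left tail. The function name `sample_skewness` is hypothetical. Some statistical packages report an adjusted version of this coefficient, which gives somewhat different numbers for small samples.

```python
from statistics import fmean


def sample_skewness(values):
    """Fisher-Pearson coefficient of skewness g1 = m3 / m2**1.5.

    Hypothetical helper for illustration; uses the biased (1/n) central moments.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # all values identical: no spread, so no skew
    return m3 / m2 ** 1.5


symmetric = sample_skewness([1, 2, 3, 4, 5])                   # 0.0
right_tail = sample_skewness([20, 22, 25, 24, 23, 21, 150])    # positive
left_tail = sample_skewness([1, 50, 52, 53, 55])               # negative
```

A single large value such as 150 is enough to produce a clearly positive coefficient. This makes the measure a quick first signal that a variable may need a transformation (for example, a log) before further analysis.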
Shneiderman [9] is well known for his mantra for visual data analysis of “overview
first, zoom and filter, then details-on-demand.” This is a pragmatic approach to
visual data analysis. The user first finds areas of interest in an overview, then zooms
and filters to obtain more detailed information about a particular area of the data,
and finally retrieves the detailed records behind that area. This approach provides a
high-level view of the data and a great deal of information about a given dataset in a
relatively short period of time.
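Shneiderman's mantra maps naturally onto three operations on a dataset. The following is a minimal, text-only sketch under stated assumptions. The records, the field names (`region`, `amount`), and the helpers `overview`, `zoom_and_filter`, and `details_on_demand` are all hypothetical. A real visualization tool would render each step graphically rather than return Python objects.

```python
from collections import Counter

# Hypothetical sales records; field names are illustrative only.
records = [
    {"id": 1, "region": "East", "amount": 120.0},
    {"id": 2, "region": "West", "amount": 80.0},
    {"id": 3, "region": "East", "amount": 950.0},
    {"id": 4, "region": "South", "amount": 60.0},
    {"id": 5, "region": "East", "amount": 40.0},
    {"id": 6, "region": "West", "amount": 700.0},
]


def overview(rows, key):
    """Overview: aggregate counts per category to spot areas of interest."""
    return Counter(row[key] for row in rows)


def zoom_and_filter(rows, key, value, min_amount):
    """Zoom and filter: keep one category and drop amounts below a threshold."""
    return [row for row in rows if row[key] == value and row["amount"] >= min_amount]


def details_on_demand(rows, record_id):
    """Details on demand: return the full record for one id, or None if absent."""
    return next((row for row in rows if row["id"] == record_id), None)


counts = overview(records, "region")                   # East appears most often
subset = zoom_and_filter(records, "region", "East", 100.0)
detail = details_on_demand(subset, 3)                  # the full record with id 3
```

Each step narrows the data handed to the next one. This ordering is what lets an analyst move from a high-level picture to individual records without scanning the whole dataset.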
When pursuing this approach with a data visualization tool or statistical package,
the following guidelines and considerations are recommended.
• Review data to ensure that calculations remained consistent within
columns or across tables for a given data field. For instance, did the way
customer lifetime value is calculated change partway through data collection?
Or, if working with financial data, did the interest calculation switch from
simple to compound at the end of the year?
• Does the data distribution stay consistent across the entire dataset? If not,
what kinds of actions should be taken to address this problem?
• Assess the granularity of the data, the range of values, and the level of
aggregation of the data.
• Does the data represent the population of interest? For marketing data, if
the project is focused on targeting customers of child-rearing age, does the
data represent that, or is it full of senior citizens and teenagers?
• For time-related variables, are the measurements daily, weekly, or monthly?
Is that good enough? Is time measured in seconds everywhere, or is it in
milliseconds in some places? Determine the level of granularity of the data
needed for the analysis, and assess whether the current granularity of the
timestamps meets that need.
• Is the data standardized/normalized? Are the scales consistent? If not,
how consistent or irregular is the data?
• For geospatial datasets, are state or country abbreviations consistent
across the data? Are personal names normalized? Are units of measure
consistent (English versus metric)?
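The checklist above is phrased as questions, but several items can be turned into quick programmatic checks. The sketch below is an illustration only and does not come from the source. Every function name, field layout, alias table, and threshold in it is a hypothetical choice, and each one would need to be adapted to the actual data.

```python
import math
from statistics import fmean


def interest_method(principal, rate, years, recorded, tol=0.01):
    """Calculation consistency: report which formula reproduces the recorded interest.

    Hypothetical check; assumes annual compounding. When years == 1 the two
    formulas coincide, so such rows are reported as 'simple'.
    """
    simple = principal * rate * years
    compound = principal * ((1 + rate) ** years - 1)
    if math.isclose(recorded, simple, abs_tol=tol):
        return "simple"
    if math.isclose(recorded, compound, abs_tol=tol):
        return "compound"
    return "unknown"


def mean_shift(values, split):
    """Distribution consistency: difference in means after vs. before index `split`.

    A crude drift signal, not a formal statistical test.
    """
    return fmean(values[split:]) - fmean(values[:split])


def profile(values):
    """Range and granularity: minimum, maximum, and number of distinct values."""
    return {"min": min(values), "max": max(values), "distinct": len(set(values))}


def likely_millisecond_timestamps(epoch_values, threshold=1e11):
    """Timestamp units: flag Unix times that look like milliseconds.

    Heuristic assumption: present-day Unix times in seconds are about 1e9,
    while times in milliseconds are about 1e12.
    """
    return [v for v in epoch_values if v > threshold]


# Hypothetical alias table mapping spelling variants to two-letter codes.
STATE_ALIASES = {
    "ca": "CA", "calif.": "CA", "california": "CA",
    "ny": "NY", "n.y.": "NY", "new york": "NY",
}


def normalize_states(values):
    """Abbreviation consistency: map variants to codes; None marks unrecognized values."""
    return [STATE_ALIASES.get(v.strip().lower()) for v in values]


methods = [interest_method(*row) for row in [(1000.0, 0.05, 2, 100.0),
                                             (1000.0, 0.05, 2, 102.5)]]
shift = mean_shift([10, 11, 9, 10, 30, 31, 29, 30], split=4)
summary = profile([3, 3, 5, 9])
ms_values = likely_millisecond_timestamps([1_700_000_000, 1_700_000_000_123])
states = normalize_states(["CA", "Calif.", "california", "NY", "Texas"])
```

A mix of "simple" and "compound" results in one column, a large mean shift at a known cut-over date, or `None` entries after normalization are the kinds of findings that would prompt the follow-up questions the checklist raises.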