Table 7.3 (continued)
Field name
Percentage of incoming calls in peak hours
Percentage of incoming calls in non-peak hours
Percentage of incoming calls in work days
Percentage of incoming calls in non-work days
Days with usage:
Monthly average number of days with any outgoing usage
Monthly average number of days with any incoming usage
Average call duration:
Average duration of outgoing voice calls (in minutes)
Average duration of incoming voice calls (in minutes)
Other usage fields - profiling fields
Contract information fields - profiling fields (for instance, tenure, rate plan,
acquisition channel, payment method, handset category, etc.)
Customer information fields - profiling fields (customer demographics)
Data Audit
Before beginning any data mining project, it is necessary to perform a health
check on the data to be mined. Initial data exploration may involve looking
for missing data, checking for inconsistencies, identifying outliers, and
examining the field distributions with basic descriptive statistics and charts
such as bar charts and histograms. IBM SPSS Modeler offers a tool called Data
Audit (Figure 7.2) that performs all these preliminary explorations and allows
users to understand the data and spot potential abnormalities.
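For readers working outside Modeler, the kind of check described above can be approximated in a few lines of Python. The sketch below is not the Data Audit tool itself: the function name audit_field, the sample values, and the choice of the 1.5 x IQR rule for flagging outliers are all assumptions made for illustration.

```python
import statistics


def audit_field(values):
    """Summarise one numeric field; None marks a missing value."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # Quartiles of the non-missing values (inclusive interpolation).
    q1, _median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    return {
        "count": len(present),
        "missing": missing,
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "stdev": statistics.stdev(present),
        # Assumed rule: flag values outside the 1.5 x IQR fences.
        "outliers": [v for v in present if v < low or v > high],
    }


# Hypothetical sample: average duration of outgoing voice calls (in minutes).
avg_out_duration = [2.1, 1.8, None, 2.5, 3.0, 2.2, 45.0, 1.9, None, 2.4]
report = audit_field(avg_out_duration)

for name, value in report.items():
    print(f"{name}: {value}")
```

Running the same function over every input field gives a rough per-field audit table; any field with many missing values or flagged outliers deserves a closer look before modeling.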
As clustering algorithms are very sensitive to extreme values, we should
thoroughly examine the validity of the input data before beginning the
model training. A common pitfall, for instance, is the inclusion of irrelevant
populations, like members of staff or business customers, in the residential
customer base, when the objective is to segment consumer customers. These
misplaced records may behave very differently from the population of
interest, resulting in outlier values that may mislead the analysis. Instead of
discovering the mistakes too late, it is always preferable to perform exhaustive
preliminary checks on the data in advance.
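One such preliminary check is to separate the population of interest from misplaced records before any model is trained. The following is a minimal sketch under stated assumptions: the field names customer_type and is_staff and the sample records are hypothetical, and a real customer base may encode these attributes differently.

```python
# Hypothetical record layout: the field names below are illustrative only.
customers = [
    {"id": 1, "customer_type": "residential", "is_staff": False},
    {"id": 2, "customer_type": "business", "is_staff": False},
    {"id": 3, "customer_type": "residential", "is_staff": True},
    {"id": 4, "customer_type": "residential", "is_staff": False},
]


def split_population(records):
    """Return (kept, excluded): residential non-staff records vs. the rest."""
    kept, excluded = [], []
    for rec in records:
        if rec["customer_type"] == "residential" and not rec["is_staff"]:
            kept.append(rec)
        else:
            excluded.append(rec)
    return kept, excluded


segment_input, excluded = split_population(customers)
print("kept:", [r["id"] for r in segment_input])
print("excluded:", [r["id"] for r in excluded])
```

Keeping the excluded records in a separate list, rather than silently dropping them, makes it easy to review how many were removed and why.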