Database Reference
In-Depth Information
proposals should only be regarded as a general framework that delineates the
minimum information needed to support some of the most common data mining
activities. The proposals include a set of tables and a set of informative input fields
and derived indicators that have proved useful in real mining applications.
It should be noted, though, that the proposed data marts mostly focus on data
from internal data sources. Data from external data sources such as market survey
data are not covered in detail. Similarly, web data (data logged during a visit to the
organization's web site) and unstructured text data (such as comments, requests,
complaints recorded as free text during a customer's interaction with a call center
agent) are also not covered in these proposals.
THE TIME FRAME COVERED BY THE MINING
DATA MART
In general, a mining data mart should incorporate current and past data on all
crucial customer attributes to permit their monitoring over time.
Clustering models used for behavioral segmentation are less demanding in
terms of past data compared to supervised/predictive models. Although typically
6-12 months of past data are sufficient for segmentation models, the data marts
proposed here cover a period of two years to also empower predictive modeling.
Behavioral segmentation models are based only on the current or, to state
it more precisely, on the most recent view of the customer and require a simple
snapshot of this view, as shown in Figure 4.1 below. However, since the objective
is to identify a segmentation solution founded on consistent and not on random
behavioral patterns, the included data should cover a sufficient time period of at
least six months.
Predictive models such as cross-selling and churn models, on the other hand,
require the modeling dataset to be split into different time periods. To identify
data patterns associated with the occurrence of an event, the model should analyze
the customer profile before the occurrence of the event. Therefore, analysts should
focus on a past moment and analyze the customer view before the purchase of an
add-on product or before churning to a competitor.
Let us consider, for example, a typical churnmodel. During the model training
phase, the model dataset should be split to cover the following periods:
1. Historical period: Used for building the customer view in a past time period,
before the occurrence of the event. It refers to the distant past and only
predictors (input attributes) are used to build the customer view.
2. Latency period: Reserved for taking into account the time needed to collect
all necessary information to score new cases, predict future churners, and
execute the relevant campaigns.
Search WWH ::




Custom Search