instance customers, to include in the campaign. The concept of lift is
explored in more detail in Section 7.1.6.
The database of 1 million customers and prospects includes
demographic data such as age, income, marital status, and house-
hold size. There is also previous customer purchase data for three
related products: Tads, Zads, and Fads, which indicate whether the
customer purchased these items or not. The target attribute, that is,
the attribute to be predicted, is called response. The response
attribute contains a “1” if the customer responded to the Gizmos
campaign, and “0” if not.
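
To make the shape of this data concrete, the following is a minimal Java sketch of a single case (row) in the mining dataset, together with a helper that computes the overall response rate. The class, record, and field names are assumptions chosen for illustration; the text does not give the actual column names or types.

```java
import java.util.List;

// Illustrative sketch only: names and types are assumptions, not the book's schema.
final class GizmosCases {

    /** One customer case: demographics, prior purchase flags, and the target. */
    record CustomerCase(int age, double income, String maritalStatus, int householdSize,
                        boolean boughtTads, boolean boughtZads, boolean boughtFads,
                        int response) {
        CustomerCase {
            // The target attribute is binary: 1 = responded to the campaign, 0 = did not.
            if (response != 0 && response != 1) {
                throw new IllegalArgumentException("response must be 0 or 1");
            }
        }
    }

    /** Fraction of cases whose response is 1; returns 0.0 for an empty list. */
    static double responseRate(List<CustomerCase> cases) {
        if (cases.isEmpty()) {
            return 0.0;
        }
        long responders = cases.stream().filter(c -> c.response() == 1).count();
        return (double) responders / cases.size();
    }
}
```

Knowing the overall response rate is a useful baseline when a model's lift is evaluated later.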
6.2 Data Understanding
DMWhizz's database administrators (DBAs) obtain the data and
provide it to the data miner to begin data exploration. Three tables
are obtained: CUSTOMER, PURCHASES, and PRODUCT. As illustrated
in Figure 6-1, a customer can have many purchases, and a
product can be purchased many times.
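
The relationships among the three tables can be sketched in Java as follows. This is a hedged illustration, not DMWhizz's actual schema: the key fields customerId and productId, and the idea that each PURCHASES row carries both keys, are assumptions consistent with the one-to-many relationships described above.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch only: key names and columns are hypothetical.
final class RetailSchema {

    record Customer(long customerId, String name) {}

    record Product(long productId, String name) {}

    /** Each purchase row references exactly one customer and one product. */
    record Purchase(long customerId, long productId) {}

    /** A customer can have many purchases: count purchase rows per customer key. */
    static Map<Long, Long> purchasesPerCustomer(List<Purchase> purchases) {
        return purchases.stream()
                .collect(Collectors.groupingBy(Purchase::customerId, Collectors.counting()));
    }

    /** A product can be purchased many times: count purchase rows per product key. */
    static Map<Long, Long> purchasesPerProduct(List<Purchase> purchases) {
        return purchases.stream()
                .collect(Collectors.groupingBy(Purchase::productId, Collectors.counting()));
    }
}
```

Counting purchase rows per key in this way is one simple check, during data exploration, that the tables join as expected.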
The CUSTOMER table contains many attributes, some of which
will not be useful for mining. For example, the customer's name is
not useful in predicting who will respond in general. A rule such as
“all people named Smith buy Gizmos” likely does not hold, or even
if it did hold in the build dataset, it will not generalize to other
datasets or the broader customer base. Similarly, the attribute street
address typically will not generalize (e.g., “people on Prospect Street
buy Gizmos”). However, attributes like city, state, or zip code may
prove useful; perhaps customers in Boston find Gizmos more
appealing than customers in San Francisco. Age is the registered
customer's age, computed from the birth date relative to
the current year. Gender, income, education level, occupation, years at
residence, and household size are as reported on the latest warranty
registration card returned by the customer, or were purchased
[Figure 6-1: Entity-Relationship diagram of tables. Customer has Purchases; each Purchase corresponds to a Product.]