DMWhiz has clean data due to a data warehousing initiative they
undertook the previous quarter. However, data can be acquired for
specific tables without requiring the creation of a data warehouse.
Clean data not only simplifies the data mining effort, but can also
yield superior results.
Being satisfied with the quality of the data, the next step for
data preparation involves joining the data tables into a single table
that can be used for mining. In the PURCHASES table, the data is
represented in transactional format. This means that each item
purchased is captured in a separate row or record in the table. As
noted above, there is an identifier that associates a set of pur-
chases to a given transaction (purchase instance), and another
identifier that identifies the customer. We are interested in trans-
forming this into a table where a record indicates whether a
customer previously purchased a Tads, Zads, or Fads. Because
JDM 1.1 does not address specific transformations, an easy way to
accomplish this is to use the database query language SQL [SQL
2003] via JDBC [JDBC 2006], or SQLJ [SQLJ 2003], which enables
writing SQL statements in Java code as though they were ordinary
embedded SQL, translating such statements into JDBC invoca-
tions. Examples of executing SQL through JDBC are given in
Chapters 12 and 13.
We can define a view to provide the data in the desired format:
CREATE VIEW PURCHASES_v AS
max(decode(product_name, 'Tads', 1, 0)) AS purchased_tads, 1
max(decode(product_name, 'Zads', 1, 0)) AS purchased_zads,
max(decode(product_name, 'Fads', 1, 0)) AS purchased_fads
FROM PURCHASES AS pu, PRODUCTS AS pr
WHERE pu.product_id pr.product_id
GROUP BY customer_id;
The standard SQL syntax for decode can be expressed as
WHEN 'Tads' THEN 1
We use the Oracle SQL decode to simplify the example.