Getting Started - Java Data Mining: Strategy, Standard, and Practice

Java Reference

In-Depth Information

6.3

Data Preparation

DMWhiz has clean data due to a data warehousing initiative they

undertook the previous quarter. However, data can be acquired for

specific tables without requiring the creation of a data warehouse.

Clean data not only simplifies the data mining effort, but can also

yield superior results.

Being satisfied with the quality of the data, the next step for

data preparation involves joining the data tables into a single table

that can be used for mining. In the PURCHASES table, the data is

represented in transactional format. This means that each item

purchased is captured in a separate row or record in the table. As

noted above, there is an identifier that associates a set of pur-

chases to a given transaction (purchase instance), and another

identifier that identifies the customer. We are interested in trans-

forming this into a table where a record indicates whether a

customer previously purchased a Tads, Zads, or Fads. Because

JDM 1.1 does not address specific transformations, an easy way to

accomplish this is to use the database query language SQL [SQL

2003] via JDBC [JDBC 2006], or SQLJ [SQLJ 2003], which enables

writing SQL statements in Java code as though they were ordinary

embedded SQL, translating such statements into JDBC invoca-

tions. Examples of executing SQL through JDBC are given in

Chapters 12 and 13.

We can define a view to provide the data in the desired format:

CREATE VIEW PURCHASES_v AS

SELECT customer_id,

max(decode(product_name, 'Tads', 1, 0)) AS purchased_tads, 1

max(decode(product_name, 'Zads', 1, 0)) AS purchased_zads,

max(decode(product_name, 'Fads', 1, 0)) AS purchased_fads

FROM PURCHASES AS pu, PRODUCTS AS pr

WHERE pu.product_id pr.product_id

GROUP BY customer_id;

1

The standard SQL syntax for decode can be expressed as

CASE product_name

WHEN 'Tads' THEN 1

ELSE 0

END

We use the Oracle SQL decode to simplify the example.

Search WWH ::

Custom Search

Home