Database Reference
In-Depth Information
In SQL this is expressed as: update
<
table
>
set
<
field
> = <
sub query or
>
Restructure: Pivots data and generates multiple fields based on the values
of another field. Restructure is usually followed by aggregation. For instance,
restructure can be applied when we have groups of related records (e.g., multiple
records denoting the transaction amount and the transaction channel for each
customer) that we want to rearrange so that the original multiple records are
pivoted into a single record with multiple separate fields: that is, one record for
each customer with multiple fields, denoting the relative sum for each distinct
transaction channel. In SQL this is expressed as: case when
formula
<
formula
>
then
end
Derive/compute: Modifies data values and constructs new fields from existing
ones, based on calculations, logical conditions, and functions. During lengthy
data mining projects, it is common to perform several derivations, such as flags,
ratios, percentages, etc.
<
value1
>
else
<
value2
>
As a simplified example of a typical data processing procedure required for
the construction of the MCIF, let us consider the transaction data in Table 4.12
for two bank customers. The data denote the volume of transactions (Amount_
Transactions) per channel, in a format similar to the data mart's ''M_Transactions''
table presented earlier. The objective is to turn these data into informative
indicators of the channel utilization for each customer.
This information is detailed and informative, yet not immediately suitable for
mining purposes. It has to go through extensive data preparation before revealing
the transactional profile of each customer. The data preparation steps necessary in
order to summarize this information at a customer level for the needs of the MCIF
include:
1. Restructure and aggregation of the transactional data at a customer
level: The multiple records per customer (one record per year/month, account
ID, transaction code, and transaction channel) are pivoted and aggregated so
that each customer is represented by a single record and a set of relevant fields
which indicate the total volume for each transaction channel. We assume here
that no differentiation in terms of product code or time period is necessary for
the needs of our analysis. The intermediate table which is produced through
restructure and aggregation is given in Table 4.13.
The IBM SPSS Modeler Restructure node used for pivoting the data is
displayed in Figure 4.2.
2. Derive new fields to enrich the original information: After the data have
been pivoted and aggregated at a customer level, the next step is to enrich the
Search WWH ::




Custom Search