The first method collects the sampling parameters: the data table
name, the names of the columns to be used as mining attributes,
the name of the customer identifier column, and the percentage of
records to be kept in the sampled dataset. Some sanity checks can be
performed when collecting parameters, for example, verifying that
the input table and the named columns exist. These checks are marked
as “TODO” items in the code, which reads as follows:
public void defineSamplingParameters(String iInputTableName,
                                     Collection iColumnNames,
                                     String iIdentifierColumnName,
                                     double iPercentage,
                                     String iOutputTableName) throws SQLException {
    // Store the sampling parameters in the corresponding member fields.
    mInputTableName = iInputTableName;
    mColumnNames = iColumnNames;
    mPercentage = iPercentage;
    mIdentifierColumnName = iIdentifierColumnName;
    mOutputTableName = iOutputTableName;
    //TODO: add test about presence of the input table.
    //TODO: add tests about the existence of columns
    //TODO: add test about absence of the output table.
    // Record that the optimizer is now in the sample-selection state.
    mCurrentState = CampaignOptimizerState.SAMPLE_SELECTING;
}
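The “TODO” checks above could be sketched with the standard JDBC metadata API. This is only one possible approach, not the book's implementation: the class name SamplingParameterChecks and its three helper methods are hypothetical, and the sketch assumes the database stores unquoted identifiers in a single case, so column names are compared case-insensitively.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class; none of these names come from the original text.
class SamplingParameterChecks {

    // Returns true if the JDBC metadata lists a table matching this name in any
    // catalog or schema. The name is treated as a LIKE pattern by JDBC, and some
    // databases require it in the case in which it is stored (assumption).
    static boolean tableExists(Connection connection, String tableName) throws SQLException {
        DatabaseMetaData metaData = connection.getMetaData();
        try (ResultSet tables = metaData.getTables(null, null, tableName, new String[] {"TABLE"})) {
            return tables.next();
        }
    }

    // Reads the column names of a table from the metadata, upper-cased for comparison.
    static Set<String> existingColumns(Connection connection, String tableName) throws SQLException {
        Set<String> names = new HashSet<>();
        try (ResultSet columns = connection.getMetaData().getColumns(null, null, tableName, "%")) {
            while (columns.next()) {
                names.add(columns.getString("COLUMN_NAME").toUpperCase(Locale.ROOT));
            }
        }
        return names;
    }

    // Pure helper: returns the requested columns that are absent from the table.
    static List<String> missingColumns(Set<String> existingUpperCase, Collection<String> requested) {
        List<String> missing = new ArrayList<>();
        for (String name : requested) {
            if (!existingUpperCase.contains(name.toUpperCase(Locale.ROOT))) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

With helpers like these, the first “TODO” would fail when tableExists returns false for the input table, the second when missingColumns returns a non-empty list, and the third when tableExists returns true for the output table.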
The next method uses the JDBC connection to generate an SQL
statement that produces the sample. In the following code, we assume
that the customer identifier is an integer, so that the SQL code can be
used on any database. When the remainder of dividing the customer
identifier by 100 is below the sample percentage provided, we add that
record to the sample. This approach has limitations, but it can work
reasonably well if the last two digits of the customer identifiers are
uniformly distributed. Most commercial databases now provide specific
SQL extensions to perform random sampling, such as the SAMPLE
clause for Oracle,¹ but these extensions are not yet standardized.
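The remainder rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the book's own method: the class ModuloSampleSketch, its method names, and the CUSTOMERS table are hypothetical, the database is assumed to support the MOD function (some products use the % operator instead), and customer identifiers are assumed to be non-negative.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; the names below do not come from the original text.
class ModuloSampleSketch {

    // Builds a SELECT that keeps rows whose identifier, modulo 100, is below the percentage.
    // Table and column names are concatenated directly, so they must come from a trusted source.
    static String buildSampleQuery(String tableName, List<String> columnNames,
                                   String idColumn, double percentage) {
        if (percentage < 0.0 || percentage > 100.0) {
            throw new IllegalArgumentException("percentage must be between 0 and 100: " + percentage);
        }
        return "SELECT " + String.join(", ", columnNames)
                + " FROM " + tableName
                + " WHERE MOD(" + idColumn + ", 100) < " + percentage;
    }

    // Applies the same rule in memory, assuming a non-negative identifier.
    static boolean inSample(long customerId, double percentage) {
        return (customerId % 100) < percentage;
    }

    public static void main(String[] args) {
        System.out.println(buildSampleQuery(
                "CUSTOMERS", Arrays.asList("CUSTOMER_ID", "AGE"), "CUSTOMER_ID", 20.0));

        // Count how many of the identifiers 0..999 the rule keeps at 20 percent.
        long kept = 0;
        for (long id = 0; id < 1000; id++) {
            if (inSample(id, 20.0)) {
                kept++;
            }
        }
        System.out.println(kept + " of 1000 identifiers kept");
    }
}
```

For consecutive identifiers 0 through 999 and a percentage of 20, the rule keeps exactly 200 records. With real identifiers the fraction depends on how evenly their last two digits are spread, which is the limitation noted in the text.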
The output table is generated with a column called RESPONSE
filled with NULL values. This column will later be filled by the IT
department with actual customer responses obtained from the
starter campaign; these will be used as the target values to build a
¹ The Oracle SAMPLE clause can be used as follows:
SELECT * FROM tablename SAMPLE (20) SEED (4)
In this example, a sample containing about 20 percent of the records in
“tablename” will be returned. A “seed” value can be provided to ensure either the
same sample is returned or a different sample is returned upon subsequent