Mining Functions and Algorithms - Java Data Mining: Strategy, Standard, and Practice

Figure 4-5

Characterization of data used for Association: single-record case. [Figure: one row per case (X1, X2, ..., Xm), keyed by Transaction Id as the case identifier, with predictor attribute columns Milk, Bread, Eggs, Bananas, Cereal, ... holding 1/0 values.]

Figure 4-6

Characterization of data used for Association: multirecord case. [Figure: one row per item, with columns Transaction Id, Attribute Name, and Value; the rows that share a Transaction Id (Case 1, Case 2, ...) together form one case.]
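The relationship between the two layouts in Figures 4-5 and 4-6 can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class `AssociationFormats`, the helper `toMultiRecord`, and the basket contents are all hypothetical and are not part of the JDM API or taken from the figures. The sketch walks a single-record table (one row per transaction, one column per item) and emits one multirecord row for every nonzero entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the JDM (JSR-73) API.
class AssociationFormats {

    /** One row of the multirecord layout: (Transaction Id, Attribute Name, Value). */
    static final class ItemRow {
        final int transactionId;
        final String attributeName;
        final int value;

        ItemRow(int transactionId, String attributeName, int value) {
            this.transactionId = transactionId;
            this.attributeName = attributeName;
            this.value = value;
        }

        @Override
        public String toString() {
            return transactionId + "\t" + attributeName + "\t" + value;
        }
    }

    /**
     * Converts a single-record table into multirecord rows.
     * dense[r][c] is the value of item c in transaction r; 0 means "not purchased".
     */
    static List<ItemRow> toMultiRecord(int[] transactionIds, String[] items, int[][] dense) {
        List<ItemRow> rows = new ArrayList<>();
        for (int r = 0; r < dense.length; r++) {
            for (int c = 0; c < items.length; c++) {
                if (dense[r][c] != 0) {
                    // Only items present in the transaction get a row.
                    rows.add(new ItemRow(transactionIds[r], items[c], dense[r][c]));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up baskets, chosen only to exercise the conversion.
        String[] items = {"Milk", "Bread", "Eggs", "Bananas", "Cereal"};
        int[] transactionIds = {1, 2};
        int[][] dense = {
            {1, 1, 1, 0, 1},
            {0, 1, 0, 1, 0}
        };

        List<ItemRow> rows = toMultiRecord(transactionIds, items, dense);
        System.out.println("TransactionId\tAttributeName\tValue");
        for (ItemRow row : rows) {
            System.out.println(row);
        }
        System.out.println("Single-record cells stored: " + dense.length * items.length);
        System.out.println("Multirecord rows stored: " + rows.size());
    }
}
```

The zero entries simply disappear in the multirecord output, so the number of rows tracks the number of items actually purchased rather than the size of the product catalog.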

The choice of data format depends on how the data was originally maintained, perhaps in the data warehouse. More importantly, it depends on the ability of the algorithm to handle a particular format. Multirecord case format is considered a sparse representation of the data, since it contains only the items of interest. Single-record case format is considered a dense representation, since it contains all the information, including an entry for each item absent from a transaction. As you may expect, sparse representations can be more space efficient, depending on the data. Consider a grocery store that sells 10,000 different products and where customers purchase, on average, 10 products at a time. If we maintained the data in single
