Transaction Id   Milk   Bread   Eggs   Bananas   Cereal   ...
1                1      1       1      0         1
2                0      1       0      1         1
3                0      0       0      1         1
4                1      0       0      0         0

Case Identifier: Transaction Id. Predictor Attributes: X1, X2, ..., Xm
(Milk, Bread, Eggs, ...). Each row is one case.
Figure 4-5
Characterization of data used for Association—single
record case.
Transaction Id   Attribute Name   Value
1                Milk             1        (Case 1)
1                Bread            1
1                Eggs             1
1                Cereal           1
2                Bread            1        (Case 2)
2                Bananas          1
2                Cereal           1
Figure 4-6
Characterization of data used for Association—
multirecord case.
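The relationship between the two formats can be sketched in a few lines
of Java. This is only an illustration under stated assumptions: the
names CaseFormats, ItemRow, and toMultiRecord are hypothetical and do
not belong to any data mining API. The sketch takes single record cases
(one row of 0/1 flags per transaction) and emits one multirecord row
for each item that is actually present.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of any data mining API.
final class CaseFormats {

    // One row of the multirecord table: transaction id, attribute name, value.
    static final class ItemRow {
        final int transactionId;
        final String attributeName;
        final int value;

        ItemRow(int transactionId, String attributeName, int value) {
            this.transactionId = transactionId;
            this.attributeName = attributeName;
            this.value = value;
        }
    }

    // Converts single record cases into multirecord rows.
    // flags[t][a] is the value of attribute a in transaction t;
    // attributes with a value of 0 produce no row.
    static List<ItemRow> toMultiRecord(String[] attributeNames,
                                       int[] transactionIds,
                                       int[][] flags) {
        List<ItemRow> rows = new ArrayList<>();
        for (int t = 0; t < transactionIds.length; t++) {
            for (int a = 0; a < attributeNames.length; a++) {
                if (flags[t][a] != 0) {
                    rows.add(new ItemRow(transactionIds[t], attributeNames[a], flags[t][a]));
                }
            }
        }
        return rows;
    }
}
```

Applied to transactions 1 and 2 of Figure 4-5, this would yield the
seven rows shown in Figure 4-6: ten stored flags become seven rows
because the three zero entries are dropped.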
The choice of data format depends on how the data was originally
maintained, perhaps in a data warehouse. More importantly, it depends
on which formats the algorithm can handle. Multirecord case format is
considered a sparse representation of the data, since it contains only
the items present in each transaction. Single record case format is
considered a dense representation, since it stores a value for every
attribute in every case, including the zeros for items not purchased.
As you may expect, sparse representations can be more space efficient,
depending on the data. Consider a grocery store that sells 10,000
different products and where customers purchase on average 10 products
at a time. If we maintained the data in single