Database Reference
In-Depth Information
Number of subslices
1000/day
7000/week
DATENO
ITEMNO
......
763
764
Slice
(7 days)
0.7% × 1B
= 7M
Day
1M
......
764
1002
......
1002
Subslice
Avg 10 rows
(1M/100K)
......
764
......
......
2890
764
......
......
2890
764
......
764
765
......
770
Figure 9.7 Fact table scans.
Before accessing the fact table, the intermediate table would have been sorted
by the fat index columns. This is a relatively fast operation, assuming there is
enough room for the work files in memory.
With the given filter factors and the assumed number of qualifying fact
table rows, the local response time for the star join will probably be a few
minutes, as long as the optimizer chooses the same access path. This estimate
includes accessing the dimension tables as well as building and reading the two
intermediate tables.
If there are only four dimension tables, we are reasonably well prepared for
any star join with four fat indexes on the fact table, each one starting with the
foreign key of a dimension table, as shown in Figure 9.8.
The four fat indexes on a billion-row table would, however, be quite large.
The total disk space requirement for these four indexes would probably be hun-
dreds of gigabytes, perhaps
4
×
1
.
5
×
1
,
000
,
000
,
000
×
40 bytes
=
240 GB
Assuming a monthly rent for disk space of $50 per gigabyte, the cost of the
four indexes would then be U.S.$12,000 per month. In actual fact, a fat index
on a table is often larger than the fact table itself for two reasons:
1. Tables are normally compressed, whereas indexes are not.
2. As the inserts to a fact table normally go at the end, a fact table may not
need any distributed free space. The inserts to all indexes other than the
clustering index (often starting with a time stamp) are random. To avoid
frequent index reorganization, which could take several hours with one
billion rows in the table even with current hardware, most of the indexes
Search WWH ::




Custom Search