Database Reference
In-Depth Information
resultset. This is data where the number of distinct items in the set of rows divided by the number of rows is a small
number (near zero). For example, a
GENDER
column might take on the values
M
,
F
, and
NULL
. If you have a table with
20,000 employee records in it, then you would find that 3/20000 = 0.00015. Likewise, 100,000 unique values out of
10,000,000 results in a ratio of 0.01—again, very small. These columns would be candidates for bitmap indexes.
They probably would
not
be candidates for a having B*Tree indexes, as each of the values would tend to retrieve an
extremely large percentage of the table. B*Tree indexes should be selective in general, as outlined earlier. Bitmap
indexes should not be selective—on the contrary, they should be very
unselective
in general.
Bitmap indexes are extremely useful in environments where you have lots of ad hoc queries, especially
queries that reference many columns in an ad hoc fashion or produce aggregations such as
COUNT
. For example,
suppose you have a large table with three columns:
GENDER
,
LOCATION
, and
AGE_GROUP
. In this table,
GENDER
has
a value of
M
or
F
,
LOCATION
can take on the values
1
through
50
, and
AGE_GROUP
is a code representing
18 and
under
,
19-25
,
26-30
,
31-40
, and
41 and over
. You have to support a large number of ad hoc queries that take the
following form:
select count(*)
from t
where gender = 'M'
and location in ( 1, 10, 30 )
and age_group = '41 and over';
select *
from t
where ( ( gender = 'M' and location = 20 )
or ( gender = 'F' and location = 22 ))
and age_group = '18 and under';
select count(*) from t where location in (11,20,30);
select count(*) from t where age_group = '41 and over' and gender = 'F';
You would find that a conventional B*Tree indexing scheme would fail you. If you wanted to use an index to get
the answer, you would need at least three and up to six combinations of possible B*Tree indexes to access the data
via the index. Since any of the three columns or any subset of the three columns may appear, you would need large
concatenated B*Tree indexes on the following:
GENDER
,
LOCATION
,
AGE_GROUP
: For queries that used all three, or
GENDER
with
LOCATION
,
or
GENDER
alone
•
LOCATION
,
AGE_GROUP
: For queries that used
LOCATION
and
AGE_GROUP
or
LOCATION
alone
•
AGE_GROUP
,
GENDER
: For queries that used
AGE_GROUP
with
GENDER
or
AGE_GROUP
alone
•
To reduce the amount of data being searched, other permutations might be reasonable as well in order to
decrease the size of the index structure being scanned. This is ignoring the fact that a B*Tree index on such low
cardinality data is not a good idea.
Here is where the bitmap index comes into play. With three small bitmap indexes, one on each of the
individual columns, you will be able to satisfy all of the previous predicates efficiently. Oracle will simply use
the functions AND, OR, and NOT, with the bitmaps of the three indexes together, to find the solution set for any
predicate that references any set of these three columns. It will take the resulting merged bitmap, convert the 1s
into rowids if necessary, and access the data (if you are just counting rows that match the criteria, Oracle will just