cell is poor for prediction. A better solution is probably to drill down on the query cell
to a more specific one (i.e., asking more specific queries). Second, a small sample size
can cause a large confidence interval. When there are very few samples, the corresponding
t_c is large because there are few degrees of freedom. This in turn could cause a large
confidence interval. Intuitively, this makes sense. Suppose one is trying to figure out the
average income of people in the United States. Just asking two or three people does not
give much confidence in the returned response.
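To make the effect of sample size concrete, the following minimal sketch computes a 95% t-based confidence interval for a mean, assuming the usual form (sample mean plus or minus t_c times the estimated standard error). The income figures are made up for illustration, the function name confidence_interval is hypothetical, and the small lookup table holds standard two-sided 95% critical t-values for only a few degrees of freedom.

```python
import math
import statistics

# Standard two-sided 95% critical t-values, keyed by degrees of freedom.
# Only a handful of entries are listed; this is not a full t-table.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 9: 2.262, 29: 2.045}


def confidence_interval(samples):
    """Return (mean, half_width) of a 95% t-based interval for the mean.

    Assumes the interval has the form mean +/- t_c * s / sqrt(n),
    where s is the sample standard deviation.
    """
    n = len(samples)
    df = n - 1  # degrees of freedom
    if df not in T_CRIT_95:
        raise ValueError(f"no critical value tabulated for df={df}")
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(n)
    return mean, T_CRIT_95[df] * std_err


# Hypothetical incomes (in thousands) from three and from ten respondents.
few = [40.0, 55.0, 70.0]
more = [40.0, 55.0, 70.0, 45.0, 60.0, 50.0, 65.0, 52.0, 58.0, 55.0]

mean_few, half_few = confidence_interval(few)
mean_more, half_more = confidence_interval(more)
print(f"n=3:  {mean_few:.1f} +/- {half_few:.2f}")
print(f"n=10: {mean_more:.1f} +/- {half_more:.2f}")
```

With three samples, both the critical value (4.303 at 2 degrees of freedom) and the standard error are large, so the interval is far wider than with ten samples, even though the two sample means agree.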
The best way to solve this small sample size problem is to get more data. Fortunately,
there is usually an abundance of additional data available in the cube. The data do not
match the query cell exactly; however, we can consider data from cells that are “close
by.” There are two ways to incorporate such data to enhance the reliability of the query
answer: (1) intracuboid query expansion, where we consider nearby cells within the same
cuboid, and (2) intercuboid query expansion, where we consider more general versions
(from parent cuboids) of the query cell. Let's see how this works, starting with
intracuboid query expansion.
Method 1. Intracuboid query expansion. Here, we expand the sample size by including
nearby cells in the same cuboid as the queried cell, as shown in Figure 5.15(a). We just
have to be careful that the new samples serve to increase the confidence in the answer
without changing the query's semantics.
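As a rough sketch of the idea, the code below pools samples from cells that neighbor the query cell within one cuboid. Everything here is an assumption made for illustration: the cube is a plain dictionary keyed by (age, occupation), the sample values are invented, the helper names are hypothetical, and expanding along age within a radius of one year is an arbitrary choice rather than a recommendation about which dimension to expand.

```python
import math
import statistics

# Hypothetical samples of a measure (e.g., income in thousands),
# stored per cell of an age-occupation cuboid.
cube = {
    (30, "teacher"): [48.0, 52.0],
    (29, "teacher"): [47.0, 50.0, 53.0],
    (31, "teacher"): [49.0, 51.0, 54.0],
    (30, "engineer"): [90.0, 95.0, 99.0],
}


def expand_along_age(cube, query, radius=1):
    """Pool samples from cells in the same cuboid whose occupation matches
    the query cell and whose age lies within `radius` of the query age."""
    age, occupation = query
    pooled = []
    for (cell_age, cell_occupation), samples in cube.items():
        if cell_occupation == occupation and abs(cell_age - age) <= radius:
            pooled.extend(samples)
    return pooled


def standard_error(samples):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


query = (30, "teacher")
original = cube[query]
expanded = expand_along_age(cube, query)

print(len(original), round(standard_error(original), 3))
print(len(expanded), round(standard_error(expanded), 3))
```

In this toy data the neighboring age cells hold values similar to the query cell, so pooling raises the sample count from two to eight and shrinks the standard error. Had the neighbors differed markedly (as the engineer cell does), pooling them would have changed what the query means, which is exactly the risk the text warns about.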
So, the first question is “Which dimensions should be expanded?” The best candidates
should be the dimensions that are uncorrelated or weakly correlated with the measure
[Figure: panel (a) shows the age-occupation cuboid; panel (b) shows the age cuboid, the occupation cuboid, and the age-occupation cuboid.]
Figure 5.15 Query expansion within sampling cube: Given small data samples, both methods use strategies
to boost the reliability of query answers by considering additional data cell values.
(a) Intracuboid expansion considers nearby cells in the same cuboid as the queried cell.
(b) Intercuboid expansion considers more general cells from parent cuboids.
 