Data Warehousing and Online Analytical Processing - Data Mining: Concepts and Techniques


(b) The RFID data may contain lots of redundant information. Discuss a method

that maximally reduces redundancy during data registration in the RFID data

warehouse.

(c) The RFID data may contain lots of noise such as missing registration and misread

IDs. Discuss a method that effectively cleans up the noisy data in the RFID data

warehouse.

(d) You may want to perform online analytical processing to determine how many TV

sets were shipped from the LA seaport to BestBuy in Champaign, IL, by month,

brand, and price range. Outline how this could be done efficiently if you were to

store such RFID data in the warehouse.

(e) If a customer returns a jug of milk and complains that it has spoiled before its expiration

date, discuss how you can investigate such a case in the warehouse to find out

what the problem is, either in shipping or in storage.
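As a rough illustration of the result that part (d) asks for, the following Python sketch counts TV shipments between two locations, grouped by month, brand, and price range. The record layout, the field names, the sample rows, and the `price_range` bucket width are all hypothetical. The sketch shows only the shape of the aggregation; it does not address the efficient storage design the exercise asks you to outline.

```python
from collections import Counter

# Hypothetical, hand-made shipment records; every field name is an assumption.
LA = "LA seaport"
STORE = "BestBuy, Champaign, IL"
shipments = [
    {"product": "TV", "brand": "Sony", "price": 450, "origin": LA, "destination": STORE, "month": "2024-01"},
    {"product": "TV", "brand": "Sony", "price": 480, "origin": LA, "destination": STORE, "month": "2024-01"},
    {"product": "TV", "brand": "LG", "price": 900, "origin": LA, "destination": STORE, "month": "2024-02"},
    {"product": "TV", "brand": "Sony", "price": 450, "origin": LA, "destination": "Other store", "month": "2024-01"},
    {"product": "Laptop", "brand": "LG", "price": 900, "origin": LA, "destination": STORE, "month": "2024-02"},
]


def price_range(price, width=500):
    """Map a price to a fixed-width bucket label such as '0-499'."""
    low = (price // width) * width
    return f"{low}-{low + width - 1}"


def tv_counts(records, origin, destination):
    """Count TV shipments on one route, grouped by (month, brand, price range)."""
    counts = Counter()
    for r in records:
        if r["product"] == "TV" and r["origin"] == origin and r["destination"] == destination:
            counts[(r["month"], r["brand"], price_range(r["price"]))] += 1
    return counts


counts = tv_counts(shipments, LA, STORE)
for key in sorted(counts):
    print(key, counts[key])
```

In a warehouse setting, the same grouping would more likely be answered from precomputed aggregates than by scanning raw RFID readings.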

4.12 In many applications, new data sets are incrementally added to the existing large

data sets. Thus, an important consideration is whether a measure can be computed

efficiently in an incremental manner. Use count, standard deviation, and median as

examples to show that a distributive or algebraic measure facilitates efficient incremental

computation, whereas a holistic measure does not.
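The contrast in Exercise 4.12 can be sketched in Python under a few assumptions: the class name `Summary` and the sample values are illustrative, and the standard deviation is the population form. Count is distributive (partial counts simply add), standard deviation is algebraic (it can be derived from the bounded summary of count, sum, and sum of squares), and median is holistic (the medians of the parts do not determine the median of the whole).

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class Summary:
    """Fixed-size summary from which count and standard deviation follow."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    @classmethod
    def of(cls, values):
        return cls(len(values), sum(values), sum(v * v for v in values))

    def merge(self, other):
        # Merging two partitions only needs their summaries, not their rows.
        return Summary(self.n + other.n,
                       self.total + other.total,
                       self.total_sq + other.total_sq)

    def std(self):
        # Population standard deviation: sqrt(E[x^2] - E[x]^2).
        if self.n == 0:
            raise ValueError("standard deviation of an empty summary")
        mean = self.total / self.n
        return math.sqrt(max(self.total_sq / self.n - mean * mean, 0.0))


old = [2.0, 4.0, 4.0, 4.0]   # existing data set
new = [5.0, 5.0, 7.0, 9.0]   # newly added data set

merged = Summary.of(old).merge(Summary.of(new))
print("count:", merged.n)
print("std from summaries:", merged.std())
print("std from all rows: ", statistics.pstdev(old + new))

# Median: the per-partition medians are not enough to recover the overall one.
print("medians of parts:", statistics.median(old), statistics.median(new))
print("median of whole: ", statistics.median(old + new))
```

Note that the sum-of-squares formula can lose precision on large or tightly clustered values; it is used here only because it makes the algebraic structure easy to see.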

4.13 Suppose that we need to record three measures in a data cube: min(), average(), and

median(). Design an efficient computation and storage method for each measure given

that the cube allows data to be deleted incrementally (i.e., in small portions at a time)

from the cube.
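One possible starting point for Exercise 4.13, offered as a sketch rather than a reference answer, is shown below in Python. The class `Cell` and its method names are hypothetical. The sketch assumes each cube cell can afford to keep its values in sorted order: average() then needs only a running sum and the count, while min() and median() stay correct after a deletion because the remaining values are still available. A more space-efficient design might keep only the few smallest values for min() and fall back to recomputation when they are exhausted.

```python
import bisect


class Cell:
    """One cube cell that supports incremental insertion and deletion."""

    def __init__(self, values=()):
        self.sorted_values = sorted(values)
        self.total = sum(self.sorted_values)

    def insert(self, v):
        bisect.insort(self.sorted_values, v)
        self.total += v

    def delete(self, v):
        # Locate one occurrence of v by binary search and remove it.
        i = bisect.bisect_left(self.sorted_values, v)
        if i == len(self.sorted_values) or self.sorted_values[i] != v:
            raise KeyError(v)
        self.sorted_values.pop(i)
        self.total -= v

    def minimum(self):
        return self.sorted_values[0]

    def average(self):
        # Algebraic: derived from the maintained (sum, count) pair.
        return self.total / len(self.sorted_values)

    def median(self):
        n = len(self.sorted_values)
        mid = n // 2
        if n % 2:
            return self.sorted_values[mid]
        return (self.sorted_values[mid - 1] + self.sorted_values[mid]) / 2


cell = Cell([3, 1, 4, 1, 5])
cell.delete(1)  # one copy of the minimum is deleted; another copy remains
print(cell.minimum(), cell.average(), cell.median())
```

The point of the design is that deleting the current minimum is exactly the case where a stored min() value alone is not enough, so some extra state must be kept or the cell must be rescanned.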

4.14 In data warehouse technology, a multidimensional view can be implemented by

a relational database technique (ROLAP), by a multidimensional database technique

(MOLAP), or by a hybrid database technique (HOLAP).

(a) Briefly describe each implementation technique.

(b) For each technique, explain how each of the following functions may be implemented:

i. The generation of a data warehouse (including aggregation)

ii. Roll-up

iii. Drill-down

iv. Incremental updating

(c) Which implementation techniques do you prefer, and why?

4.15 Suppose that a data warehouse contains 20 dimensions, each with about five levels of

granularity.

(a) Users are mainly interested in four particular dimensions, each having three frequently

accessed levels for rolling up and drilling down. How would you design a

data cube structure to support this preference efficiently?

(b) At times, a user may want to drill through the cube to the raw data for one or two

particular dimensions. How would you support this feature?
