Conclusion - Data Warehouse Systems: Design and Implementation



All Products can be represented by the tuples (p1, All Products, 1, 1), (p2, Appliance, 2, 1), and (p3, TV, 3, 2). The last tuple tells, for instance, that the TV keyword belongs to the hierarchy level 3 and its parent is in level 2. Dimension City would be analogous and could contain, for example, a tuple (c4, Brussels, 3, 2). The fact table is composed of the keys from the dimensions Product and City, the identifier of a document containing a combination of a product and a city, and the number of times that this combination appears in the document. For example, a tuple in the fact table can be (p3, c4, d1, 3), indicating that the combination of keywords TV and Brussels appears three times in document d1. Over this structure, OLAP operations can be performed as usual.
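The structure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple layout (key, keyword, level, parent level) follows the text, while the second fact row, the document d2, and the helper name `occurrences_by_city` are hypothetical additions made for the example.

```python
from collections import defaultdict

# Dimension tuples as in the text: (key, keyword, level, parent_level).
product_dim = [
    ("p1", "All Products", 1, 1),
    ("p2", "Appliance", 2, 1),
    ("p3", "TV", 3, 2),
]
city_dim = [
    ("c4", "Brussels", 3, 2),
]

# Fact tuples: (product_key, city_key, document_id, occurrences).
# Only the first row comes from the text; the second is hypothetical.
facts = [
    ("p3", "c4", "d1", 3),
    ("p3", "c4", "d2", 1),  # assumed extra row for illustration
]

def occurrences_by_city(facts, city_dim):
    """Aggregate away the document: total occurrences per (product, city)."""
    city_name = {key: name for key, name, _, _ in city_dim}
    totals = defaultdict(int)
    for product_key, city_key, _doc, count in facts:
        totals[(product_key, city_name[city_key])] += count
    return dict(totals)

totals = occurrences_by_city(facts, city_dim)
print(totals)
```

Summing the occurrence count over documents is one of the simplest aggregations this fact table supports; rolling up along the Product or City hierarchy would additionally need the parent key of each dimension member, which the tuples shown in the text do not include.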

In [117], Lin et al. present the notion of Text Cube, a data cube for textual data, while in [39] Ding et al. study the problem of keyword-based top-k search in text cubes, that is, given a keyword query, find the top-k most relevant cells in a text cube. The text cube contains both structural information (i.e., conventional dimensions) and textual information. Thus, a text cube is a traditional OLAP data cube extended to summarize and navigate structured and unstructured text data. A cell in the text cube aggregates a set of documents that contain a combination of keywords and attribute values on the cube dimensions. For example, suppose we want to analyze reviews of television models. We can design a text cube with schema (Brand, Model, Price, Review), where the first three attributes are dimensions and the last one is the measure representing the review documents. Consider three cells in the cube, namely, c1: (Sony, S1, 400, {d1}), c2: (Sony, S2, 800, {d2}), and c3: (Panasonic, P1, 400, {d3}). Also, assume that documents d1, d2, and d3 contain the keywords {light, cheap, modern}, {expensive, modern}, and {cheap, durable}, respectively. If a user wants to find out the cells in the cube that are most relevant to the keywords cheap and durable, the answer will be c3 since the review includes the two terms.
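The television example can be sketched as follows. The scoring used here (the number of query keywords found in a cell's documents) is a deliberately simple stand-in chosen for illustration; it is not claimed to be the relevance measure studied by Ding et al. The names `cells`, `doc_keywords`, and `top_k_cells` are hypothetical.

```python
# Cells of the text cube: name -> (brand, model, price, review documents).
cells = {
    "c1": ("Sony", "S1", 400, {"d1"}),
    "c2": ("Sony", "S2", 800, {"d2"}),
    "c3": ("Panasonic", "P1", 400, {"d3"}),
}

# Keywords contained in each review document, as given in the text.
doc_keywords = {
    "d1": {"light", "cheap", "modern"},
    "d2": {"expensive", "modern"},
    "d3": {"cheap", "durable"},
}

def top_k_cells(query, k=1):
    """Rank cells by how many query keywords occur in their documents."""
    scores = {}
    for name, (_brand, _model, _price, docs) in cells.items():
        # Union of the keywords of all documents aggregated in this cell.
        found = set().union(*(doc_keywords[d] for d in docs))
        scores[name] = len(found & set(query))
    # Sort by descending score, then by cell name for a stable order.
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:k]

best = top_k_cells({"cheap", "durable"})
print(best)
```

With this toy scoring, c3 matches both query keywords, c1 matches only cheap, and c2 matches neither, which agrees with the answer given in the text.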

Cells can also be aggregated. For example, cells c1 and c3 above have as parent cell (∗, ∗, 400, {d1, d3}), which contains the reviews aggregated by price. Aggregated cells can also be included in the answer to a query by analyzing the keywords present in the union of the documents.
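A minimal sketch of this aggregation, assuming the three cells and their documents from the television example: grouping by price while replacing Brand and Model with ∗ yields parent cells whose measure is the union of the children's documents. The function name `aggregate_by_price` is hypothetical.

```python
from collections import defaultdict

# Base cells: (brand, model, price, documents), as in the example.
base_cells = [
    ("Sony", "S1", 400, {"d1"}),
    ("Sony", "S2", 800, {"d2"}),
    ("Panasonic", "P1", 400, {"d3"}),
]
doc_keywords = {
    "d1": {"light", "cheap", "modern"},
    "d2": {"expensive", "modern"},
    "d3": {"cheap", "durable"},
}

def aggregate_by_price(cells):
    """Roll up Brand and Model to '*', unioning the review documents."""
    docs_by_price = defaultdict(set)
    for _brand, _model, price, docs in cells:
        docs_by_price[price] |= docs
    return {("*", "*", price): docs for price, docs in docs_by_price.items()}

parents = aggregate_by_price(base_cells)

# Keywords of an aggregated cell: union over the keywords of its documents.
keywords_400 = set().union(*(doc_keywords[d] for d in parents[("*", "*", 400)]))
print(sorted(parents[("*", "*", 400)]), sorted(keywords_400))
```

The aggregated cell for price 400 gathers the documents of the two base cells sharing that price, so a keyword query can be evaluated against it in the same way as against a base cell.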

There are other approaches along similar lines, like the ones of Zhang et al. [237, 238], where the authors introduce the notion of Topic Cube, which combines OLAP with a probabilistic topic model. We omit the description of these proposals here.

15.4 Multimedia Data Warehouses

New and complex kinds of data are posing new challenges to data analysis. For example, we would like to perform OLAP operations over image or music data and, in general, over multimedia data. For this, multimedia
