All Products can be represented by the tuples (p1, All Products, 1, 1),
(p2, Appliance, 2, 1), and (p3, TV, 3, 2). The last tuple tells, for instance,
that the TV keyword belongs to the hierarchy level 3 and its parent is in
level 2. Dimension City would be analogous and could contain, for example,
a tuple (c4, Brussels, 3, 2). The fact table is composed of the keys from
the dimensions Product and City, the identifier of a document containing
a combination of a product and a city, and the number of times that this
combination appears in the document. For example, a tuple in the fact table
can be (p3, c4, d1, 3), indicating that the combination of keywords TV and
Brussels appears three times in document d1. Over this structure, OLAP
operations can be performed as usual.
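As an illustration, the structure above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the dictionary layout, the function name total_by_keyword_pair, and the second fact row (for a document d2) are hypothetical and do not come from the proposal being described. The aggregation shown simply sums the counts over all documents for each combination of product and city keywords.

```python
from collections import defaultdict

# Dimension tuples from the text, stored as key -> (keyword, level, parent level).
product = {
    "p1": ("All Products", 1, 1),
    "p2": ("Appliance", 2, 1),
    "p3": ("TV", 3, 2),
}
city = {"c4": ("Brussels", 3, 2)}

# Fact tuples: (product key, city key, document id, number of occurrences).
facts = [
    ("p3", "c4", "d1", 3),  # tuple given in the text
    ("p3", "c4", "d2", 1),  # hypothetical extra row, added for illustration
]


def total_by_keyword_pair(facts, product, city):
    """Sum the occurrence counts over all documents per (product, city) pair."""
    totals = defaultdict(int)
    for p_key, c_key, _doc, count in facts:
        pair = (product[p_key][0], city[c_key][0])
        totals[pair] += count
    return dict(totals)


result = total_by_keyword_pair(facts, product, city)
print(result)
```

Aggregating away the document identifier in this way corresponds to summarizing the fact table over one of its components; other OLAP operations could be sketched along the same lines.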
In [117], Lin et al. present the notion of Text Cube, a data cube for
textual data, while in [39] Ding et al. study the problem of keyword-based
top-k search in text cubes, that is, given a keyword query, find the top-k
most relevant cells in a text cube. The text cube contains both structural
information (i.e., conventional dimensions) and textual information. Thus,
a text cube is a traditional OLAP data cube extended to summarize and
navigate structured and unstructured text data. A cell in the text cube
aggregates a set of documents that contain a combination of keywords
and attribute values on the cube dimensions. For example, suppose we
want to analyze reviews of television models. We can design a text cube
with schema (Brand, Model, Price, Review), where the first three attributes
are dimensions and the last one is the measure representing the review
documents. Consider three cells in the cube, namely, c1: (Sony, S1, 400, {d1}),
c2: (Sony, S2, 800, {d2}), and c3: (Panasonic, P1, 400, {d3}). Also, assume
that documents d1, d2, and d3 contain the keywords {light, cheap, modern},
{expensive, modern}, and {cheap, durable}, respectively. If a user wants to
find out the cells in the cube that are most relevant to the keywords cheap
and durable, the answer will be c3 since the review includes the two terms.
Cells can also be aggregated. For example, cells c1 and c3 above have as
parent cell (∗, ∗, 400, {d1, d3}), which contains the reviews aggregated by
price. Aggregated cells can also be included in the answer to a query by
analyzing the keywords present in the union of the documents.
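The television example can be sketched in Python as follows. This is a simplified illustration, not the method of [117] or [39]: it treats a cell as relevant only when the union of its documents contains every query keyword, whereas an actual top-k search would rank cells by a relevance score. The names cell_terms, matching_cells, and aggregate_by_price are hypothetical.

```python
# Cells of the text cube: name -> (brand, model, price, set of review documents).
cells = {
    "c1": ("Sony", "S1", 400, {"d1"}),
    "c2": ("Sony", "S2", 800, {"d2"}),
    "c3": ("Panasonic", "P1", 400, {"d3"}),
}

# Keywords contained in each review document.
keywords = {
    "d1": {"light", "cheap", "modern"},
    "d2": {"expensive", "modern"},
    "d3": {"cheap", "durable"},
}


def cell_terms(docs):
    """Union of the keywords of all documents aggregated in a cell."""
    return set().union(*(keywords[d] for d in docs))


def matching_cells(cells, query):
    """Names of the cells whose documents together contain every query keyword."""
    return sorted(
        name for name, (*_, docs) in cells.items() if query <= cell_terms(docs)
    )


def aggregate_by_price(cells):
    """Build the parent cells (*, *, price, docs) by merging cells of equal price."""
    docs_by_price = {}
    for _brand, _model, price, docs in cells.values():
        docs_by_price.setdefault(price, set()).update(docs)
    return {price: ("*", "*", price, docs) for price, docs in docs_by_price.items()}


query = {"cheap", "durable"}
base_answer = matching_cells(cells, query)
parents = aggregate_by_price(cells)
parent_answer = matching_cells(parents, query)
print(base_answer, parent_answer)
```

Under this simplification, the parent cell for price 400 also satisfies the query, because the keywords are looked up in the union of the documents of its child cells.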
There are other approaches along similar lines, like the ones of Zhang
et al. [237, 238], where the authors introduce the notion of Topic Cube, which
combines OLAP with a probabilistic topic model. We omit the description of
these proposals here.
15.4 Multimedia Data Warehouses
New and complex kinds of data are posing new challenges to data analysis.
For example, we would like to perform OLAP operations over image or
music data and, in general, over multimedia data. For this, multimedia