Database Reference
Table 7. Raw data (R)

Id    Age   Price
t1    22    95000
t2    19    99000
t3    24    120000
Table 8. Grid-cells mapping

Cell  Age   Price        Id
c1    old   0.3/cheap    t1
c2    old   reasonable   t1, t2, t3
For the sake of simplicity, we have only reported the linguistic labels (intent) and the row Ids (extent)
that point to tuples described by those linguistic labels.
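
The mapping from raw tuples to grid cells can be sketched as follows. This is only an illustrative sketch, not the SAINTETIQ mapping service itself: the tuples of Table 7 are taken as (22, 95000), (19, 99000) and (24, 120000), and the trapezoidal membership functions and their breakpoints are hypothetical, chosen solely so that the result agrees with Table 8. The names trapezoid, PARTITIONS and map_to_cells are likewise assumptions made for this example.

```python
from collections import defaultdict
from itertools import product


def trapezoid(a, b, c, d):
    """Return a trapezoidal membership function: 1 on [b, c], 0 outside (a, d)."""
    def mu(x):
        if b <= x <= c:
            return 1.0
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu


# Hypothetical fuzzy partitions (breakpoints are assumptions, not from the source).
PARTITIONS = {
    "age": {"old": trapezoid(12, 18, 100, 100)},
    "price": {
        "cheap": trapezoid(0, 0, 88000, 98000),
        "reasonable": trapezoid(80000, 90000, 130000, 150000),
    },
}

# Raw data R, as read from Table 7.
RAW = {
    "t1": {"age": 22, "price": 95000},
    "t2": {"age": 19, "price": 99000},
    "t3": {"age": 24, "price": 120000},
}


def map_to_cells(raw, partitions):
    """Map each tuple to every cell (label combination) it satisfies with degree > 0."""
    cells = defaultdict(list)
    for tid, row in raw.items():
        per_attr = []
        for attr, labels in partitions.items():
            graded = [(label, mu(row[attr])) for label, mu in labels.items()]
            per_attr.append([(label, deg) for label, deg in graded if deg > 0])
        for combo in product(*per_attr):
            intent = tuple(label for label, _ in combo)
            degrees = tuple(deg for _, deg in combo)
            cells[intent].append((tid, degrees))  # extent entry with its degrees
    return dict(cells)


cells = map_to_cells(RAW, PARTITIONS)
for intent, extent in cells.items():
    print(intent, extent)
```

Under these assumed partitions, t1 falls into two cells (it is "cheap" only to degree 0.3 but fully "reasonable"), whereas t2 and t3 fall only into the (old, reasonable) cell, which is the situation Table 8 reports.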
3.1.2 Discussion about SAINTETIQ
Cluster analysis is one of the most useful tasks in the data mining process (Maimon & Rokach, 2005) for
discovering groups and identifying interesting distributions and patterns in the underlying data. The
clustering problem is about partitioning a given data set into groups (clusters) such that the data points
in a cluster are more similar to each other than to points in different clusters.
Many clustering methods have been proposed to date (Berkhin, 2006), among them grid-based clustering
methods such as STING (Wang, Yang & Muntz, 1997), BANG (Schikuta & Erhart, 1998) and
WaveCluster (Sheikholeslami, Chatterjee & Zhang, 2000). Grid-based clustering methods first
partition the data by applying a multidimensional grid structure to the feature space. Second, statistical
information (e.g., min, max, mean, standard deviation, distribution) is collected for all the database
records located in each individual grid cell, and clustering is performed on the populated cells to form
clusters. These methods have proved to be valuable tools for analyzing the structural information of
very large databases. One of their most appealing features is their excellent runtime behavior: their
processing time depends only on the number of populated cells L, which is usually much smaller than the
number of database records n (L << n) (Berkhin, 2006). More specifically, the time complexity T_SEQ of
the SAINTETIQ process, and in particular of its summarization service (SEQ), can be expressed as follows
(the mapping service is not discussed further, as it is a straightforward rewriting process):
$$T_{SEQ} = k_{SEQ} \cdot L \cdot \log(L) \in O(L \cdot \log(L))$$
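
The two generic grid-based steps described in this section (partition the feature space with a grid, then collect statistics per populated cell) can be sketched as follows. This is a minimal illustration under stated assumptions, not the SAINTETIQ summarization service: it uses a crisp, uniform grid, and the function name grid_cell_stats, the parameter cell_size and the sample points are hypothetical.

```python
import math


def grid_cell_stats(points, cell_size):
    """Bucket points into grid cells; keep count, min, max and mean per cell."""
    cells = {}
    for point in points:
        # Cell coordinates: index of the grid interval along each dimension.
        key = tuple(math.floor(x / cell_size) for x in point)
        stats = cells.get(key)
        if stats is None:
            cells[key] = {
                "count": 1,
                "min": list(point),
                "max": list(point),
                "sum": list(point),
            }
            continue
        stats["count"] += 1
        for i, x in enumerate(point):
            stats["min"][i] = min(stats["min"][i], x)
            stats["max"][i] = max(stats["max"][i], x)
            stats["sum"][i] += x
    for stats in cells.values():
        stats["mean"] = [s / stats["count"] for s in stats["sum"]]
    return cells


# Hypothetical sample data: two dense regions in a 2-D feature space.
points = [(0.1, 0.2), (0.4, 0.9), (0.3, 0.3), (5.2, 5.1), (5.9, 5.5)]
cells = grid_cell_stats(points, cell_size=1.0)
n, L = len(points), len(cells)
print(f"n = {n} records, L = {L} populated cells")
```

Only the single pass that fills the cells touches all n records; any later clustering step works on the L populated cells and their statistics, which is why the processing time of such methods is governed by L rather than n.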
Figure 16. Example of SAINTETIQ hierarchy