Database Reference
In-Depth Information
Figure 6. Algorithms' performance for the queries clustering
is the number of vector elements in <attribute, value> pairs set of Δ (note that, each query in the query
history is translated into vector representations), m is the number of input queries, and l is the number
of true underlying clusters. We set '0' and '1' at random on the n elements and we then generate l ran-
dom queries by sampling at random the space of all possible permutations of n elements. These initial
queries form the centers around which we build each one of the clusters. The task of the algorithms is
to rediscover the clustering model used for the data generation. Given a cluster center, each query from
the same cluster is generated by adding to the center a specified amount of noise of a specific type. We
consider two types of noise: swaps and shifts . The swap means that '0' and '1' elements from the initial
order are picked and their positions in the order are exchanged. For the shifts we pick a random element
and we move it to a new position, either earlier (or later) in the order. All elements that are between the
new and the old positions of the element are shifted on position down (or up). The amount of noise is
the number of swaps or shifts we make.
We experiment with datasets generated for the following parameters: n = 300, m = 600, l = {4, 8, 16,
32}, noise = {2, 4, 8,…, 128} for swaps. Figure 6 shows the performance of the algorithms as a function
of the amount of noise. The y axis is the ratio: F ( A )/ F ( INP ), for A = {Greedy, Greedy-Refine}, where
the Greedy- Refine algorithm is proposed in this chapter (Algorithm 3), while the Greedy algorithm
is proposed in [2]. We compare them here since they all aim at solving the same problem (clustering
problem). The F ( A ) is the total cost of the solution provided by algorithm A when the distance showed
in Equation (3) is used as a distance measure between queries. The F ( INP ) corresponds to the cost of
the clustering structure (Equation 4) used in the data generation process.
From Figure 6 we can see that: Greedy- Refine algorithm performs greatly better than Greedy algo-
rithm. The reason is that: the Greedy- Refine is executed on the queries which were arranged according
Search WWH ::




Custom Search