Data distribution
You can describe the pattern of the data you expect in your application, and how it is
spread across partition keys. For each column, you choose the following:
• Size: This is the statistical distribution of the size of data in a column. For
  example, for an e-mail address column, I would like 3 to 15 characters with a normal
  distribution. So, the mean e-mail address length would be 9-10 characters, which
  seems reasonable. The default value is UNIFORM(4..8).
• Population: This is the set of unique column values, and how they are distributed.
  For example, for a city column, I would opt for 20,000 unique values. Assuming that
  most of the records belong to just a few cities (for example, 20 percent of cities
  account for 80 percent of records), we want some sort of diminishing distribution.
  Therefore, we would choose an exponential distribution across rows for the city
  column. The default value is UNIFORM(1..100B).
• Cluster: If you are using a composite key, you will probably have more than one
  record for a given partition key. The cluster attribute defines how the cluster size
  varies, for example, whether you want a fixed number of rows for each partition key
  or some variation. The default value is FIXED(1).
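The 80/20 reasoning behind the city example can be checked with a little arithmetic. The following is only a sketch in plain Python, not the stress tool's own sampler: the function name top_share, the scale parameter, and the redraw-style truncation are all assumptions made for illustration. It picks the exponential scale at which the first 20 percent of 20,000 values should receive about 80 percent of the draws, then estimates that share by sampling.

```python
import math
import random

def top_share(num_values, top_fraction, scale, samples=100_000, seed=42):
    """Estimate the share of draws that land in the first top_fraction of values.

    Draws are exponential with the given scale, shifted to start at 1, and
    redrawn when they exceed num_values (a simple truncation assumed here).
    """
    rng = random.Random(seed)
    cutoff = num_values * top_fraction
    hits = 0
    for _ in range(samples):
        while True:
            value = 1 + int(rng.expovariate(1.0 / scale))
            if value <= num_values:
                break
        if value <= cutoff:
            hits += 1
    return hits / samples

NUM_CITIES = 20_000  # unique city values, as in the example above

# Solve 1 - exp(-cutoff / scale) = 0.8 for scale: scale = cutoff / ln(5)
scale = NUM_CITIES * 0.2 / math.log(5)
share = top_share(NUM_CITIES, 0.2, scale)
print(f"scale={scale:.0f}, share of records in top 20% of cities={share:.2f}")
```

How the stress tool itself derives the rate of its exponential distribution from the range is not covered here, so treat the scale above as a property of this sketch only.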
The stress tool provides six types of statistical distributions. They are as follows:
• FIXED(value): This distribution always returns the value specified by the argument.
• GAUSSIAN(min..max, mean, standard_deviation): Normal distribution over [min, max]
  with the given mean and standard_deviation.
• GAUSSIAN(min..max, standard_deviation_range): Gaussian distribution over [min, max]
  with the mean at (min+max)/2 and the standard deviation as
  (mean-min)/standard_deviation_range.
• UNIFORM(min..max): Uniform distribution over [min, max].
• EXP(min..max): Exponential distribution over the range [min, max].
• EXTREME(min..max, shape): Weibull distribution over the range [min, max].
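To make the second GAUSSIAN form concrete, here is a minimal Python sketch that applies the two formulas above to the earlier e-mail example (3 to 15 characters). The helper names gaussian_params and sample_gaussian, the standard_deviation_range value of 3, and the redraw-until-in-range behaviour are assumptions for illustration; they are not taken from the stress tool.

```python
import random

def gaussian_params(lo, hi, stdev_range):
    """Return (mean, stdev) for GAUSSIAN(min..max, standard_deviation_range)."""
    mean = (lo + hi) / 2
    stdev = (mean - lo) / stdev_range
    return mean, stdev

def sample_gaussian(lo, hi, stdev_range, rng):
    """Draw one integer in [lo, hi], redrawing values that fall outside."""
    mean, stdev = gaussian_params(lo, hi, stdev_range)
    while True:
        value = round(rng.gauss(mean, stdev))
        if lo <= value <= hi:
            return value

# E-mail length between 3 and 15 characters; stdev_range=3 is an assumed value.
mean, stdev = gaussian_params(3, 15, 3)
print(mean, stdev)  # 9.0 2.0

rng = random.Random(7)
lengths = [sample_gaussian(3, 15, 3, rng) for _ in range(1000)]
print(min(lengths), max(lengths))
```

With a standard_deviation_range of 3, the minimum and maximum sit three standard deviations from the mean, so very few draws need to be discarded.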