9 {32,41.05}
10 {32,0}
Using the MADlib function kmeans_random(), the following SQL query
identifies six clusters within the provided dataset. The key input values
are described in the comment block that precedes the query.
/*
K-means analysis
cust_age_sales - SQL table containing the input data
coordinates - the column in the SQL table that contains the
data points
customer_id - the column in the SQL table that contains the
identifier for each point
km_coord - the table to store each point and its assigned
cluster
km_centers - the SQL table to store the centers of each
cluster
l2norm - specifies that the Euclidean distance formula is
used
25 - the maximum number of iterations
0.001 - a convergence criterion
False (twice) - turn off two optional settings
6 - build six clusters
*/
SELECT madlib.kmeans_random('cust_age_sales', 'coordinates',
                            'customer_id', 'km_coord', 'km_centers',
                            'l2norm', 25, 0.001, False, False, 6);
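The SQL call delegates the entire computation to MADlib. To make its parameters concrete, here is a minimal Python sketch of k-means with random initialization (Lloyd's algorithm) that mirrors them: the number of clusters k, the Euclidean (l2norm) distance, a cap of 25 iterations, and a 0.001 convergence threshold. The sketch assumes the threshold is the fraction of points that change cluster between iterations. MADlib's internal definition of convergence, its initialization details, and its handling of empty clusters may differ. The function name and its parameters are hypothetical and borrowed only for illustration.

```python
import math
import random


def kmeans_random(points, k, max_iter=25, conv_threshold=0.001, seed=None):
    """Sketch of Lloyd's k-means with random initial centroids (hypothetical API).

    points: list of equal-length tuples of numbers.
    Returns (assignments, centroids); assignments[i] is a 0-based cluster index.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Random initialization: pick k distinct input points as starting centers.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    assignments = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the nearest centroid (L2 distance).
        reassigned = 0
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            if nearest != assignments[i]:
                reassigned += 1
                assignments[i] = nearest
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
        # Assumed convergence rule: stop when few points changed cluster.
        if reassigned / len(points) < conv_threshold:
            break
    return assignments, centroids
```

For example, `kmeans_random([(32, 14.98), (32, 51.48), (33, 151.89), (27, 88.28), (31, 4.85)], k=2, seed=1)` clusters a few of the (age, sales) points from this section. Because the starting centroids are chosen at random, different seeds can yield different clusterings, which is also why repeated runs of a random-initialization k-means may disagree. Note that this sketch numbers clusters from 0, while the cid values in the MADlib output start at 1.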
SELECT *
FROM km_coord
ORDER BY pid
LIMIT 10;
pid  coords             cid
1    {1,1}:{32,14.98}   6
2    {1,1}:{32,51.48}   1
3    {1,1}:{33,151.89}  4
4    {1,1}:{27,88.28}   1
5    {1,1}:{31,4.85}    6
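The coords column is printed in what appears to be MADlib's sparse-vector (svec) text notation. As we understand it, the first brace list holds run lengths and the second holds values, so {1,1}:{32,14.98} stands for the dense point (32, 14.98). The small decoder below is a hypothetical helper, written under that assumption, that expands the notation into a plain list.

```python
def parse_svec(text):
    """Expand '{counts}:{values}' run-length notation into a dense list.

    Assumes each value is repeated the number of times given by the
    count in the same position (hypothetical helper, not a MADlib API).
    """
    counts_part, values_part = text.strip().split(":")
    counts = [int(c) for c in counts_part.strip("{}").split(",")]
    values = [float(v) for v in values_part.strip("{}").split(",")]
    if len(counts) != len(values):
        raise ValueError("counts and values must have the same length")
    dense = []
    for count, value in zip(counts, values):
        dense.extend([value] * count)  # repeat each value 'count' times
    return dense
```

With this decoding, the row for pid 3 describes a customer aged 33 with sales of 151.89, assigned to cluster 4.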