Converting Between CSV and ARFF
Weka uses ARFF as its file format. ARFF is essentially CSV with a header that
describes the columns. We'll use two convenient command-line tools to convert between
CSV and ARFF, namely csv2arff (see Example 9-1) and arff2csv (see Example 9-2).
Example 9-1. Convert CSV to ARFF (csv2arff)
#!/usr/bin/env bash
weka core.converters.CSVLoader /dev/stdin
Example 9-2. Convert ARFF to CSV (arff2csv)
#!/usr/bin/env bash
weka core.converters.CSVSaver -i /dev/stdin
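To make the relationship between the two formats concrete, here is a minimal sketch that does not need Weka. The sample data, the relation name, and both function names (arff_sample, arff_to_csv_sketch) are made up for illustration. The awk script only handles simple, unquoted attribute names, so it is not a substitute for arff2csv.

```shell
# A tiny hand-written ARFF document (hypothetical data). The header lines
# describe the columns; the rows after @data are plain comma-separated values.
arff_sample() {
cat <<'EOF'
@relation wine_sample

@attribute alcohol numeric
@attribute type {red,white}

@data
9.4,red
10.1,white
EOF
}

# Rough illustration only: rebuild a CSV header from the @attribute lines
# and pass the data rows through unchanged.
arff_to_csv_sketch() {
  awk '
    tolower($1) == "@attribute" { names = (names == "" ? $2 : names "," $2); next }
    tolower($1) == "@data"      { print names; in_data = 1; next }
    in_data && NF               { print }
  '
}

arff_sample | arff_to_csv_sketch
```

The output is an ordinary CSV file whose header row is built from the attribute names. The attribute types (numeric, or a set of nominal values) are the additional column information that CSV cannot express.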
Comparing Three Clustering Algorithms
To cluster data with Weka, we need yet another custom command-line tool. The
AddCluster class assigns data points to the learned clusters. Unfortunately, this
class does not accept data from standard input, not even when we specify
-i /dev/stdin, because it expects a file with the .arff extension. We consider
this to be bad design. The source code of weka-cluster is:
#!/usr/bin/env bash
ALGO="$@"
# Temporary file with the .arff extension that AddCluster insists on
IN=$(mktemp --tmpdir weka-cluster-XXXXXXXX).arff
finish () {
    rm -f "$IN"
}
}
trap finish EXIT
csv2arff > $IN
weka filters.unsupervised.attribute.AddCluster -W "weka.${ALGO}" -i $IN \
-o /dev/stdout | arff2csv
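The pattern weka-cluster relies on is to spool standard input to a temporary file with the required extension, run the tool on that file, and clean up on exit. The sketch below shows that pattern without needing Weka. The names picky_tool and run_on_stdin are hypothetical; picky_tool is only a stand-in for a program that rejects paths not ending in .arff.

```shell
# Hypothetical stand-in for a tool that only accepts a path ending in .arff.
picky_tool() {
  case "$1" in
    *.arff) cat "$1" ;;
    *) echo "unsupported extension: $1" >&2; return 1 ;;
  esac
}

# Wrapper: copy stdin into a temporary directory under a file name with the
# expected extension, run the tool on it, and remove the directory on exit.
# The function body is a subshell, so the EXIT trap fires when it finishes.
run_on_stdin() (
  tmpdir=$(mktemp -d) || exit 1
  trap 'rm -rf "$tmpdir"' EXIT
  cat > "$tmpdir/input.arff"
  picky_tool "$tmpdir/input.arff"
)

printf 'a,b\n1,2\n' | run_on_stdin
```

One design difference from the script above: mktemp creates the file it names, so appending .arff to that name leaves the original, extension-less file behind in the temporary directory. Creating a temporary directory and removing it as a whole avoids that leftover.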
Now we can apply the EM clustering algorithm and save the assignment as follows:
$ cd data
$ < wine-both-scaled.csv csvcut -C quality,type |
> weka-cluster clusterers.EM -N 5 |
> csvcut -c cluster > data/wine-both-cluster-em.csv
Use the scaled data set, and don't use the features quality and type for the
clustering.
Apply the algorithm using weka-cluster.