Introducing Weka
You may ask, surely there are better command-line tools for clustering? And you are
right. One reason we include Weka in this chapter is to show you how you can work
around imperfections by building additional command-line tools. As you spend
more time on the command line and try out other command-line tools, chances are
that you will come across one that seems very promising at first but does not work as
you expected. A common imperfection, for example, is that the command-line tool
does not handle standard input or standard output correctly. In the next section, we'll
point out Weka's imperfections and demonstrate how we work around them.
Taming Weka on the Command Line
Weka can be invoked from the command line, but it's definitely not straightforward
or user friendly. Weka is programmed in Java, which means that you have to run
java, specify the location of the weka.jar file, and specify the individual class you
want to call. For example, Weka has a class called MexicanHat, which generates a toy
data set. To generate 10 data points using this class, you would run:
$ java -cp ~/bin/weka.jar weka.datagenerators.classifiers.regression.MexicanHat \
> -n 10 | fold
%
% Commandline
%
% weka.datagenerators.classifiers.regression.MexicanHat -r weka.datagenerators.c
lassifiers.regression.MexicanHat-S_1_-n_10_-A_1.0_-R_-10..10_-N_0.0_-V_1.0 -S 1
-n 10 -A 1.0 -R -10..10 -N 0.0 -V 1.0
%
@relation weka.datagenerators.classifiers.regression.MexicanHat-S_1_-n_10_-A_1.0
_-R_-10..10_-N_0.0_-V_1.0
@attribute x numeric
@attribute y numeric
@data
4.617564,-0.215591
-1.798384,0.541716
-5.845703,-0.072474
-3.345659,-0.060572
9.355118,0.00744
-9.877656,-0.044298
9.274096,0.016186
8.797308,0.066736
8.943898,0.051718
8.741643,0.072209
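Before we dive into those workarounds, here is a minimal sketch of the kind of helper
we have in mind: a small Bash script that hides the java -cp boilerplate. The script
name weka and the assumption that weka.jar lives in ~/bin are ours, purely for
illustration:
$ cat << 'EOF' > ~/bin/weka
#!/usr/bin/env bash
# Minimal wrapper: pass the Weka class name and its options straight to java,
# so that we don't have to repeat the classpath every time.
java -cp ~/bin/weka.jar "$@"
EOF
$ chmod +x ~/bin/weka
$ weka weka.datagenerators.classifiers.regression.MexicanHat -n 10 | fold
With such a wrapper in place, the long invocation above reduces to just the class name
and its options. It does not yet address the standard input and output issues, which is
what we turn to next.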