Don't worry about the output of this command; we'll discuss that later. At this
moment, we're concerned with the usage of Weka. There are three things to note
here:
• We need to run java, which is counterintuitive.
• The JAR file contains over 2,000 classes, and only about 300 of those can be used
from the command line directly. How do we know which ones?
• We need to specify the entire namespace of the class:
weka.datagenerators.classifiers.regression.MexicanHat. How are we supposed to
remember that?
Does this mean that we're going to give up on Weka? Of course not! Weka contains a
lot of useful functionality and we're going to tackle these three issues in the next three
subsections.
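The second point is easy to check for ourselves, because a JAR file is an ordinary ZIP archive whose entries can be listed. The following is a minimal sketch, not the approach developed later: jar_class_names is a hypothetical helper name, and the usage line assumes that the JDK's jar tool is installed and that weka.jar lives in a directory stored in a variable called WEKAPATH.

```shell
#!/usr/bin/env bash

# Read JAR entry paths (one per line) from standard input and print the
# corresponding fully qualified class names. Entries that are not .class
# files are dropped, as are inner classes (names containing a '$').
jar_class_names() {
  grep '\.class$' | grep -v '\$' | sed -e 's/\.class$//' -e 's|/|.|g'
}

# Hypothetical usage (assumes the jar tool and a WEKAPATH variable):
#   jar tf "${WEKAPATH}/weka.jar" | jar_class_names | wc -l
```

Counting the lines this produces gives the number of top-level classes in the archive. It says nothing yet about which of them can be run from the command line.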
An improved command-line tool for Weka
To address the first issue, save the following snippet as a new file called weka, make it
executable, and move it to a directory that's on your PATH:
#!/usr/bin/env bash
java -Xmx1024M -cp ${WEKAPATH}/weka.jar "weka.$@"
Subsequently, add the following line to your ~/.bashrc file so that weka can be called
from anywhere:
export WEKAPATH=/home/vagrant/repos/weka
We can now call the previous example with:
$ weka datagenerators.classifiers.regression.MexicanHat -n 10
Now that's already an improvement!
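The wrapper above fails with a fairly cryptic Java error when WEKAPATH is unset or weka.jar is missing. As a sketch of a more defensive variant, the same call can be wrapped in a few checks. It is written here as a shell function, which is an assumption made for illustration (the snippet above is a standalone script), and the error messages and the usage line are invented for this example rather than taken from Weka.

```shell
#!/usr/bin/env bash

# Sketch of a more defensive wrapper around the same java call.
# The checks and messages are illustrative; they are not part of Weka.
weka() {
  # Refuse to run when WEKAPATH is unset or empty.
  if [ -z "${WEKAPATH:-}" ]; then
    echo "weka: WEKAPATH is not set" >&2
    return 1
  fi

  # Refuse to run when the JAR file is not where we expect it.
  local jar="${WEKAPATH}/weka.jar"
  if [ ! -f "$jar" ]; then
    echo "weka: cannot find ${jar}" >&2
    return 1
  fi

  # A class name (without the leading "weka.") is required.
  if [ "$#" -eq 0 ]; then
    echo "usage: weka <class> [options]" >&2
    return 1
  fi

  # Prefix the first argument with "weka." and pass the rest through as is.
  local class="weka.$1"
  shift
  java -Xmx1024M -cp "$jar" "$class" "$@"
}
```

Calling weka datagenerators.classifiers.regression.MexicanHat -n 10 behaves as before when everything is in place. Quoting the path to the JAR file also keeps the call working when WEKAPATH contains spaces.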
Usable Weka classes
As mentioned, the file weka.jar contains over 2,000 classes. Many of them cannot be
used from the command line directly. We consider a class usable from the command
line when it provides us with a help message if we invoke it with the -h option. For
example:
$ weka datagenerators.classifiers.regression.MexicanHat -h
Data Generator options:
-h
Prints this help.
-o <file>
The name of the output file, otherwise the generated data is