There's more…
• The Incanter documentation at http://clojuredocs.org/incanter/incanter.som/som-batch-train
  has more information about the som/som-batch-train function and its parameters
• Tom Germano has a more in-depth discussion of SOMs at
  http://davis.wpi.edu/~matt/courses/soms/
Classifying data with decision trees
One way to classify documents is to follow a hierarchical tree of rules, finally placing
an instance into a bucket. This is essentially what decision trees do. Although they can
work with any type of data, they are especially helpful in classifying nominal variables
(discrete categories of data such as the species attribute of the Iris dataset), where
statistics designed for working with numerical data, such as K-Means clustering, don't
work as well.
Decision trees have another handy feature. Unlike many types of data mining, where the
analysis is something of a black box, decision trees are very intelligible. We can easily
examine them and readily tell how and why they classify our data the way they do.
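To make the idea of a hierarchical tree of rules concrete, here is a minimal sketch in plain Clojure. It does not use Weka, and the tree is written by hand rather than learned from data: the attribute names, values, and rules are hypothetical and chosen only for illustration. The classify function walks the tree to a leaf, and explain returns the tests that were followed, which is what makes this kind of model easy to inspect.

```clojure
;; A hand-written toy tree. The attributes and rules are hypothetical;
;; they were not learned from the mushroom dataset.
(def toy-tree
  {:attribute :odor
   :branches  {:foul   :poisonous
               :almond :edible
               :none   {:attribute :spore-print-color
                        :branches  {:green :poisonous
                                    :white :edible}}}})

(defn classify
  "Follows the branch matching the instance's value for each tested
  attribute until a leaf is reached. Returns nil if the tree has no
  branch for one of the instance's values."
  [tree instance]
  (if (map? tree)
    (recur (get-in tree [:branches (get instance (:attribute tree))])
           instance)
    tree))

(defn explain
  "Returns the [attribute value] tests followed while classifying instance."
  [tree instance]
  (loop [node tree, path []]
    (if (map? node)
      (let [attr  (:attribute node)
            value (get instance attr)]
        (recur (get-in node [:branches value])
               (conj path [attr value])))
      path)))

(classify toy-tree {:odor :none, :spore-print-color :green})
;; => :poisonous
(explain toy-tree {:odor :none, :spore-print-color :green})
;; => [[:odor :none] [:spore-print-color :green]]
```

A learned tree has the same shape; the difference is that the algorithm chooses which attribute to test at each node.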
In this recipe, we'll look at a dataset of mushrooms and create a decision tree to tell us
whether a mushroom instance is edible or poisonous.
Getting ready
First, we'll need to use the dependencies that we specified in the project.clj file in the
Loading CSV and ARFF files into Weka recipe.
We'll also need these imports in our script or REPL:
(import [weka.classifiers.trees J48])
(require '[clojure.java.io :as io])
For data, we'll use one of the UCI datasets that Weka provides. You can download this set
from http://www.cs.waikato.ac.nz/ml/weka/datasets.html, or more directly
from http://www.ericrochester.com/clj-data-analysis/data/UCI/mushroom.arff.
We can load the data file using the load-arff function from the Loading CSV and
ARFF files into Weka recipe.
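The load-arff function itself is defined in that earlier recipe and is not reproduced here. As a rough sketch of what such a helper might look like, the following uses Weka's ArffLoader class. The name load-arff-sketch and the local file path are assumptions made for illustration, and Weka must be on the classpath.

```clojure
(import [weka.core.converters ArffLoader]
        [java.io File])

;; Hypothetical stand-in for the load-arff helper from the earlier recipe.
(defn load-arff-sketch
  "Reads an ARFF file from disk and returns a weka.core.Instances object."
  [filename]
  (.getDataSet
    (doto (ArffLoader.)
      (.setFile (File. filename)))))

(comment
  ;; The path is an assumption about where the downloaded file was saved.
  (def shrooms (load-arff-sketch "data/UCI/mushroom.arff"))
  ;; Quick sanity checks on what was loaded.
  (.numInstances shrooms)
  (.numAttributes shrooms))
```

The usage lines sit inside a comment form so that they can be evaluated one at a time at the REPL once the file has been downloaded.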
 