There's more…
The Incanter documentation for incanter.som/som-batch-train has more information about the som/som-batch-train function and its parameters.
Classifying data with decision trees
One way to classify documents is to follow a hierarchical tree of rules, finally placing
an instance into a bucket. This is essentially what decision trees do. Although they can
work with any type of data, they are especially helpful in classifying nominal variables
(discrete categories of data such as the species attribute of the Iris dataset), where
statistics designed for working with numerical data, such as K-Means clustering, don't
work as well.
Decision trees have another handy feature. Unlike many types of data mining where the
analysis is somewhat of a black box, decision trees are very intelligible. We can easily
examine them and readily tell how and why they classify our data the way they do.
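To make the idea of a hierarchical tree of rules concrete, here is a minimal sketch in plain Clojure. It is not how Weka represents or learns trees. The tree is written by hand, and the attribute names and values (:odor, :spore-print-color, and so on) are invented for illustration only. It shows why such a model is easy to inspect: the whole classifier is a nested map that can be read top to bottom.

```clojure
;; A hand-written toy tree, NOT the output of any learner.
;; Attribute names and values are hypothetical.
(def toy-tree
  {:attribute :odor
   :branches  {:foul   :poisonous
               :almond :edible
               :none   {:attribute :spore-print-color
                        :branches  {:green :poisonous
                                    :white :edible}}}})

(defn classify
  "Walks the tree from the root, following the branch that matches the
  instance's value for each tested attribute, until it reaches a leaf.
  Returns nil when the tree has no branch for the instance's value."
  [tree instance]
  (if (map? tree)
    ;; Inner node: look up the tested attribute and descend.
    (let [value (get instance (:attribute tree))]
      (recur (get-in tree [:branches value]) instance))
    ;; Leaf (or nil): this is the predicted class.
    tree))

(classify toy-tree {:odor :none :spore-print-color :green})
;; => :poisonous
```

Each path from the root to a leaf reads as one rule, for example "if odor is none and spore print color is green, then poisonous".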
In this recipe, we'll look at a dataset of mushrooms and create a decision tree to tell us
whether a mushroom instance is edible or poisonous.
Getting ready
First, we'll need to use the dependencies that we specified in the project.clj file in the Loading CSV and ARFF files into Weka recipe.
We'll also need these imports in our script or REPL:
(import [weka.classifiers.trees J48])
(require '[clojure.java.io :as io])
For data, we'll use one of the UCI datasets that Weka provides. You can download this set
ARFF files into Weka
recipe.
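As a rough sketch of what loading an ARFF file involves, the following uses Weka's ArffLoader directly. The function name load-arff-sketch is hypothetical and is not the loader defined in the earlier recipe, which may take different options. Treating the last attribute as the class is also an assumption and should be checked against the file actually used.

```clojure
(import [weka.core.converters ArffLoader])
(require '[clojure.java.io :as io])

(defn load-arff-sketch
  "Hypothetical helper: reads an ARFF file into a weka.core.Instances
  object and marks the last attribute as the class to predict."
  [filename]
  (let [data (.getDataSet
               (doto (ArffLoader.)
                 (.setFile (io/file filename))))]
    ;; Assumption: the class attribute is the last column.
    (.setClassIndex data (dec (.numAttributes data)))
    data))
```

A classifier such as J48 needs the class index to be set before it can be trained, which is why the sketch sets it right after loading.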