Clustering, Classifying, and Working with Weka - Clojure Data Analysis

There's more…

- The Incanter documentation at http://clojuredocs.org/incanter/incanter.som/som-batch-train
  has more information about the som/som-batch-train function and its parameters
- Tom Germano has a more in-depth discussion of SOMs at
  http://davis.wpi.edu/~matt/courses/soms/

Classifying data with decision trees

One way to classify documents is to follow a hierarchical tree of rules, finally placing
an instance into a bucket. This is essentially what decision trees do. Although they can
work with any type of data, they are especially helpful in classifying nominal variables
(discrete categories of data such as the species attribute of the Iris dataset), where
statistics designed for working with numerical data, such as K-Means clustering, don't
work as well.
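To make the idea of "a hierarchical tree of rules" concrete, here is a minimal sketch in plain Clojure. The tree is written by hand rather than learned, and the attribute names and values (:odor, :spore-print-color, and so on) are hypothetical, chosen only to resemble nominal mushroom data; this is not how Weka represents a tree internally.

```clojure
;; A hand-written toy tree. Inner nodes are maps that test one nominal
;; attribute; leaves are class labels. All names here are illustrative.
(def toy-tree
  {:attribute :odor
   :branches  {:foul   :poisonous
               :almond :edible
               :none   {:attribute :spore-print-color
                        :branches  {:green :poisonous
                                    :white :edible}}}})

(defn classify
  "Walks the tree from the root, following the branch that matches the
  instance's value for each tested attribute. Returns the leaf label,
  or nil when the instance has a value the tree has no branch for."
  [tree instance]
  (if (map? tree)
    (let [{:keys [attribute branches]} tree]
      (recur (get branches (get instance attribute)) instance))
    tree))

(classify toy-tree {:odor :none, :spore-print-color :green})
```

Each instance follows exactly one path from the root to a leaf, and the leaf it lands on is the bucket.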

Decision trees have another handy feature. Unlike many types of data mining where the

analysis is somewhat of a black box, decision trees are very intelligible. We can easily

examine them and readily tell how and why they classify our data the way they do.
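One way to see why trees are easy to inspect is to flatten a tree into one rule per root-to-leaf path. The sketch below does this for a small hand-written tree; the tree, its attribute names, and the tree->rules helper are all hypothetical and serve only as an illustration.

```clojure
(require '[clojure.string :as string])

;; A hand-written toy tree with illustrative attribute names.
(def toy-tree
  {:attribute :odor
   :branches  {:foul   :poisonous
               :almond :edible
               :none   {:attribute :spore-print-color
                        :branches  {:green :poisonous
                                    :white :edible}}}})

(defn tree->rules
  "Returns one [conditions label] pair per root-to-leaf path, where
  conditions is a vector of [attribute value] tests along that path."
  ([tree] (tree->rules tree []))
  ([tree conditions]
   (if (map? tree)
     (mapcat (fn [[value subtree]]
               (tree->rules subtree
                            (conj conditions [(:attribute tree) value])))
             (:branches tree))
     [[conditions tree]])))

;; Print each rule as a readable IF ... THEN line.
(doseq [[conditions label] (tree->rules toy-tree)]
  (println "IF"
           (string/join " AND "
                        (map (fn [[attr value]]
                               (str (name attr) " = " (name value)))
                             conditions))
           "THEN" (name label)))
```

Every classification can be traced back to one of these lines, which is the sense in which a decision tree explains its own output.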

In this recipe, we'll look at a dataset of mushrooms and create a decision tree to tell us

whether a mushroom instance is edible or poisonous.

Getting ready

First, we'll need to use the dependencies that we specified in the project.clj file in the
Loading CSV and ARFF files into Weka recipe.

We'll also need these imports in our script or REPL:

(import [weka.classifiers.trees J48])

(require '[clojure.java.io :as io])

For data, we'll use one of the UCI datasets that Weka provides. You can download this set

from http://www.cs.waikato.ac.nz/ml/weka/datasets.html , or more directly

arff . We can load the data file using the load-arff function from the Loading CSV and
ARFF files into Weka recipe.
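The load-arff function itself is defined in that earlier recipe and is not shown here. As a rough sketch of what such a helper might look like, the following uses Weka's ArffLoader class. The file path is hypothetical, and setting the class index to the last attribute is an assumption about how the mushroom file is laid out, so check both against your own copy of the data.

```clojure
(import [weka.core.converters ArffLoader])
(require '[clojure.java.io :as io])

(defn load-arff
  "Reads an ARFF file and returns a weka.core.Instances object.
  A sketch only; the original recipe's version may differ."
  [filename]
  (.getDataSet
    (doto (ArffLoader.)
      (.setFile (io/file filename)))))

;; Hypothetical location of the downloaded file.
(def shrooms (load-arff "data/UCI/mushroom.arff"))

;; Assumption: the class attribute (edible or poisonous) is the last
;; column. A classifier needs to know which attribute to predict.
(.setClassIndex shrooms (dec (.numAttributes shrooms)))
```

This needs the Weka jar on the classpath, which is what the project.clj dependencies mentioned above provide.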
