We'll also use the defanalysis macro from the Discovering groups of data using K-Means
clustering in Weka recipe.
As a bonus, if you have Graphviz installed ( http://www.graphviz.org/ ), you can use it to
generate a graph of the decision tree. Wikipedia lists other programs that can display DOT or
GV files at http://en.wikipedia.org/wiki/DOT_language#Layout_programs .
How to do it…
We'll build a wrapper for the J48 class. This is Weka's implementation of the C4.5 algorithm
for building decision trees:
1. First, we create the wrapper function for this algorithm:
   (defanalysis
     j48 J48 buildClassifier
     [["-U" pruned true :flag-false]
      ["-C" confidence 0.25]
      ["-M" min-instances 2]
      ["-R" reduced-error false :flag-true]
      ["-N" folds 3 :predicate reduced-error]
      ["-B" binary-only false :flag-true]
      ["-S" subtree-raising true :flag-false]
      ["-L" clean true :flag-false]
      ["-A" smoothing true :flag-true]
      ["-J" mdl-correction true :flag-false]
      ["-Q" seed 1 random-seed]])
2. We can use this function to create a decision tree of the mushroom data, but
before that, we have to load the file and tell Weka which field contains the
classification for each instance. In this case, it's the last field, which tells
whether the mushroom is poisonous or edible:
   (def shrooms
     (doto (load-arff "data/UCI/mushroom.arff")
       (.setClassIndex 22)))
   (def d-tree (j48 shrooms :pruned true))
3. The decision tree's graph method returns Graphviz DOT data, so we can write that
data to a file and generate an image from it:
   (with-open [w (io/writer "decision-tree.gv")]
     (.write w (.graph d-tree)))
4. Now, from the command line, process decision-tree.gv with dot. The -O flag makes
dot name the output after the input file, so this writes decision-tree.gv.png. If
you're using another program to process the Graphviz file, substitute that here:
   $ dot -O -Tpng decision-tree.gv
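
For readers who want to see the steps above without the defanalysis macro, here is a minimal sketch that calls Weka's J48 class directly. It assumes Weka is on the classpath. It also assumes that the generated j48 function amounts to setting options and calling buildClassifier, which this section doesn't confirm. The names load-with-class-last, build-j48, and run are hypothetical helpers introduced for illustration. They are not part of Weka or of this recipe.

```clojure
(ns decision-tree-sketch
  (:import [weka.classifiers.trees J48]
           [weka.core Instances]
           [weka.core.converters ConverterUtils$DataSource]))

;; Hypothetical helper: load a data file and mark the last attribute as the
;; class. Computing the index from numAttributes avoids hard-coding it
;; (22 for the mushroom data used in this recipe).
(defn load-with-class-last
  ^Instances [^String filename]
  (let [^Instances data (ConverterUtils$DataSource/read filename)]
    (.setClassIndex data (dec (.numAttributes data)))
    data))

;; Hypothetical helper: build a J48 tree from a sequence of option strings.
;; Assumption: this is roughly what the macro-generated wrapper does.
(defn build-j48
  ^J48 [^Instances data options]
  (doto (J48.)
    (.setOptions (into-array String options))
    (.buildClassifier data)))

;; Hypothetical driver: build the tree, print a short summary, and write the
;; DOT graph. spit is used here as a shorthand for with-open plus a writer.
(defn run
  [arff-path gv-path]
  (let [data (load-with-class-last arff-path)
        tree (build-j48 data ["-C" "0.25" "-M" "2"])]
    (println "Leaves:" (.measureNumLeaves tree)
             "Tree size:" (.measureTreeSize tree))
    (println (str tree))
    (spit gv-path (.graph tree))
    tree))
```

Calling (run "data/UCI/mushroom.arff" "decision-tree.gv") should produce the same kind of decision-tree.gv file as step 3, which dot can then process as in step 4. The printed text form of the tree is a quick way to check the result when Graphviz isn't installed.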
 