## report the measures of model fitness
print(fit$importance)
print(fit)
print(table(iris_test$species, predict(fit, iris_test, type = "class")))

## visualize results
plot(fit, log = "y", main = "Random Forest")
varImpPlot(fit)
MDSplot(fit, iris_full$species)

## export PMML + test data
out <- iris_full
out$predict <- predict(fit, out, type = "class")
dat_folder <- './data'
tsv <- paste(dat_folder, "iris.rf.tsv", sep = "/")
write.table(out, file = tsv, quote = FALSE, sep = "\t", row.names = FALSE)
saveXML(pmml(fit), file = paste(dat_folder, "iris.rf.xml", sep = "/"))
The R script loads the required packages, along with the Iris data set. It splits the data
set into two parts, iris_train and iris_test, then trains a Random Forest model on the
iris_train part, using the petal and sepal measures to predict species.
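Those loading, splitting, and training steps precede the snippet above. A minimal sketch of what they might look like: the randomForest, pmml, and XML packages are implied by the calls above, and the lowercase column names are implied by iris_test$species, but the exact names, the 100/50 split, and the ntree and proximity settings here are illustrative assumptions rather than the original script.
## load required packages; saveXML comes from the XML package
library(randomForest)
library(pmml)
library(XML)

## rename the Iris columns (assumed names) and split into train/test sets
data(iris)
iris_full <- iris
colnames(iris_full) <- c("sepal_length", "sepal_width", "petal_length", "petal_width", "species")
idx <- sample(nrow(iris_full), 100)
iris_train <- iris_full[idx, ]
iris_test <- iris_full[-idx, ]

## train a Random Forest classifier; proximity = TRUE supplies the
## proximity matrix that MDSplot uses later
fit <- randomForest(species ~ ., data = iris_train, proximity = TRUE, ntree = 50)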
The results of this model creation get evaluated and visualized in a few different ways.
First we have a few printed reports about the fitness of the model. One well-known
aspect of the Iris data set is that the “setosa” species is relatively easy to predict, whereas
the other two species have overlap, which confuses predictive models. We see that in
the results, but overall there is an estimated 5% error rate:
OOB estimate of error rate: 5%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         32          0         0  0.00000000
versicolor      0         26         2  0.07142857
virginica       0          3        37  0.07500000
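The same OOB confusion matrix printed above is also stored on the fitted model, so the overall error rate can be recomputed directly; a small sketch using the randomForest object's confusion component:
## OOB confusion matrix, including the per-class error column
fit$confusion

## overall OOB error rate (about 0.05 for the run shown above)
cm <- fit$confusion[, -ncol(fit$confusion)]  ## drop the class.error column
1 - sum(diag(cm)) / sum(cm)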
The chart in Figure 6-7 shows error rate versus the number of trees. One of the parameters
for training an RF model is the number of trees in the forest. As that
parameter approaches 50 trees, the decrease in error levels out.
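The values behind that chart are kept in the fitted model's err.rate component, so the leveling-out point can also be checked numerically; a brief sketch:
## err.rate holds cumulative OOB and per-class error rates, one row per tree
dim(fit$err.rate)

## OOB error after the final tree, and the fewest trees that reach the minimum
tail(fit$err.rate[, "OOB"], 1)
which.min(fit$err.rate[, "OOB"])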