The working directory contains a CSV file (sample1.csv). The file has a header row, followed by 14 rows of training data and one unlabeled test record. The attributes are Age, Income, JobSatisfaction, and Desire. The output variable is Enrolls, and its value is either Yes or No. The full content of the CSV file is shown next.
Age,Income,JobSatisfaction,Desire,Enrolls
<=30,High,No,Fair,No
<=30,High,No,Excellent,No
31 to 40,High,No,Fair,Yes
>40,Medium,No,Fair,Yes
>40,Low,Yes,Fair,Yes
>40,Low,Yes,Excellent,No
31 to 40,Low,Yes,Excellent,Yes
<=30,Medium,No,Fair,No
<=30,Low,Yes,Fair,Yes
>40,Medium,Yes,Fair,Yes
<=30,Medium,Yes,Excellent,Yes
31 to 40,Medium,No,Excellent,Yes
31 to 40,High,Yes,Fair,Yes
>40,Medium,No,Excellent,No
<=30,Medium,Yes,Fair,
The last record of the CSV is used later for illustrative purposes as a test case. Therefore, it does not include a value for the output variable Enrolls, which should be predicted using the naïve Bayes classifier built from the training set.
Execute the following R code to read data from the CSV file.
# read the data into a table from the file
sample <- read.table("sample1.csv",header=TRUE,sep=",")
# define the data frames for the NB classifier
traindata <- as.data.frame(sample[1:14,])
testdata <- as.data.frame(sample[15,])
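As a minimal sketch of the same step, the code below reads an inline copy of the CSV content, so it runs without sample1.csv on disk, and then checks the split. It assumes read.csv with stringsAsFactors = TRUE and na.strings = "", which are choices made for this sketch rather than part of the original code. The first stores each attribute as a factor (the default has been FALSE since R 4.0.0), and the second turns the empty Enrolls field of the test record into NA. The names csv_text and enroll_data are hypothetical.

```r
# Hypothetical inline copy of sample1.csv, so the sketch needs no file on disk
csv_text <- "Age,Income,JobSatisfaction,Desire,Enrolls
<=30,High,No,Fair,No
<=30,High,No,Excellent,No
31 to 40,High,No,Fair,Yes
>40,Medium,No,Fair,Yes
>40,Low,Yes,Fair,Yes
>40,Low,Yes,Excellent,No
31 to 40,Low,Yes,Excellent,Yes
<=30,Medium,No,Fair,No
<=30,Low,Yes,Fair,Yes
>40,Medium,Yes,Fair,Yes
<=30,Medium,Yes,Excellent,Yes
31 to 40,Medium,No,Excellent,Yes
31 to 40,High,Yes,Fair,Yes
>40,Medium,No,Excellent,No
<=30,Medium,Yes,Fair,"

# Assumption: read the columns as factors and treat empty fields as NA
enroll_data <- read.csv(text = csv_text,
                        stringsAsFactors = TRUE,
                        na.strings = "")

# Same split as above: 14 training rows, 1 test row
traindata <- enroll_data[1:14, ]
testdata  <- enroll_data[15, ]

# Inspect the structure and the class counts of the training set
str(traindata)
table(traindata$Enrolls)   # 5 No, 9 Yes

# The test record has no label yet
is.na(testdata$Enrolls)    # TRUE
```

Without na.strings = "", the missing Enrolls value is read as an empty string, which would appear as an extra empty factor level in both data frames.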
Two data frame objects called traindata and testdata are created for the naïve Bayes classifier. Enter traindata and testdata to display the data frames. The two data frames are printed on the screen as follows.