Advanced Analytical Theory and Methods: Classification - Data Science and Big Data Analytics

The working directory contains a CSV file (sample1.csv). The file has a header row, followed by 14 rows of training data and one final record reserved as a test case. The attributes are Age, Income, JobSatisfaction, and Desire. The output variable is Enrolls, and its value is either Yes or No. The full content of the CSV file is shown next.

Age,Income,JobSatisfaction,Desire,Enrolls
<=30,High,No,Fair,No
<=30,High,No,Excellent,No
31 to 40,High,No,Fair,Yes
>40,Medium,No,Fair,Yes
>40,Low,Yes,Fair,Yes
>40,Low,Yes,Excellent,No
31 to 40,Low,Yes,Excellent,Yes
<=30,Medium,No,Fair,No
<=30,Low,Yes,Fair,Yes
>40,Medium,Yes,Fair,Yes
<=30,Medium,Yes,Excellent,Yes
31 to 40,Medium,No,Excellent,Yes
31 to 40,High,Yes,Fair,Yes
>40,Medium,No,Excellent,No
<=30,Medium,Yes,Fair,

The last record of the CSV file is used later as a test case for illustration. It therefore does not include a value for the output variable Enrolls, which is to be predicted using the naïve Bayes classifier built from the training set.
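To make concrete what predicting Enrolls for that last record involves, the following is a minimal hand-computed sketch in base R, not the classifier built later in the text. It assumes an inline copy of the CSV content (so it runs without the file), uses no Laplace smoothing, and introduces the illustrative names csv_text, records, predictors, and nb_score, none of which appear in the original.

```r
# Hypothetical inline copy of sample1.csv so the sketch is self-contained
csv_text <- "Age,Income,JobSatisfaction,Desire,Enrolls
<=30,High,No,Fair,No
<=30,High,No,Excellent,No
31 to 40,High,No,Fair,Yes
>40,Medium,No,Fair,Yes
>40,Low,Yes,Fair,Yes
>40,Low,Yes,Excellent,No
31 to 40,Low,Yes,Excellent,Yes
<=30,Medium,No,Fair,No
<=30,Low,Yes,Fair,Yes
>40,Medium,Yes,Fair,Yes
<=30,Medium,Yes,Excellent,Yes
31 to 40,Medium,No,Excellent,Yes
31 to 40,High,Yes,Fair,Yes
>40,Medium,No,Excellent,No
<=30,Medium,Yes,Fair,"

records <- read.table(text = csv_text, header = TRUE, sep = ",",
                      stringsAsFactors = FALSE)
train <- records[1:14, ]   # labeled training rows
test  <- records[15, ]     # unlabeled test record
predictors <- c("Age", "Income", "JobSatisfaction", "Desire")

# Unnormalized naive Bayes score for one class: the class prior times the
# product of per-attribute conditional probabilities (assumption: no smoothing)
nb_score <- function(cls) {
  in_class <- train[train$Enrolls == cls, ]
  prior <- nrow(in_class) / nrow(train)
  cond <- sapply(predictors, function(a) mean(in_class[[a]] == test[[a]]))
  prior * prod(cond)
}

scores <- sapply(c(Yes = "Yes", No = "No"), nb_score)
scores                      # one unnormalized score per class
names(which.max(scores))    # class with the larger score
```

Under these assumptions the scores come out to roughly 0.0282 for Yes and 0.0069 for No, so the sketch would label the test record Yes. The scores are unnormalized; dividing each by their sum would give posterior probabilities.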

Execute the following R code to read data from the CSV file.

# read the data into a table from the file
sample <- read.table("sample1.csv", header=TRUE, sep=",")
# define the data frames for the NB classifier:
# rows 1-14 are the training set, row 15 is the unlabeled test case
traindata <- as.data.frame(sample[1:14,])
testdata <- as.data.frame(sample[15,])

Two data frame objects called traindata and testdata are created for the naïve Bayes classifier. Enter traindata and testdata to display the data frames.

The two data frames are printed on the screen as follows.
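Beyond printing the data frames, a quick sanity check of the split might look like the sketch below. It is an illustration under stated assumptions: the CSV content is embedded inline as csv_text so the code runs without the file, the full table is named sample_df (a hypothetical name chosen to avoid masking R's built-in sample function), and stringsAsFactors is set explicitly because its default differs across R versions.

```r
# Hypothetical inline copy of sample1.csv so the sketch is self-contained
csv_text <- "Age,Income,JobSatisfaction,Desire,Enrolls
<=30,High,No,Fair,No
<=30,High,No,Excellent,No
31 to 40,High,No,Fair,Yes
>40,Medium,No,Fair,Yes
>40,Low,Yes,Fair,Yes
>40,Low,Yes,Excellent,No
31 to 40,Low,Yes,Excellent,Yes
<=30,Medium,No,Fair,No
<=30,Low,Yes,Fair,Yes
>40,Medium,Yes,Fair,Yes
<=30,Medium,Yes,Excellent,Yes
31 to 40,Medium,No,Excellent,Yes
31 to 40,High,Yes,Fair,Yes
>40,Medium,No,Excellent,No
<=30,Medium,Yes,Fair,"

sample_df <- read.table(text = csv_text, header = TRUE, sep = ",",
                        stringsAsFactors = FALSE)
traindata <- sample_df[1:14, ]
testdata  <- sample_df[15, ]

dim(traindata)             # rows and columns of the training set
dim(testdata)              # rows and columns of the test set
table(traindata$Enrolls)   # class counts among the labeled rows

# The test record should carry no label (empty string or NA)
label <- testdata$Enrolls
unlabeled <- is.na(label) | label == ""
unlabeled
```

Checking the dimensions and the empty label confirms that the 14 labeled rows and the single unlabeled record ended up in the intended data frames.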
