Advanced Analytical Theory and Methods: Classification - Data Science and Big Data Analytics

Exercises

1. For a binary classification, describe the possible values of entropy. Under what conditions does entropy reach its minimum and maximum values?

2. In a decision tree, how does the algorithm pick the attributes for splitting?

3. John went to see the doctor about a severe headache. The doctor selected John at random to have a blood test for swine flu, which is suspected to affect 1 in 5,000 people in this country. The test is 99% accurate, in the sense that the probability of a false positive is 1%. The probability of a false negative is zero. John's test came back positive. What is the probability that John has swine flu?

4. Which classifier is considered computationally efficient for high-dimensional problems? Why?

5. A data science team is working on a classification problem in which the dataset contains many correlated variables, and most of them are categorical variables. Which classifier should the team consider using? Why?

6. A data science team is working on a classification problem in which the dataset contains many correlated variables, and most of them are continuous. The team wants the model to output the probabilities in addition to the class labels. Which classifier should the team consider using? Why?

7. Consider the following confusion matrix:

                        Predicted Class
                        Good     Bad     Total
Actual Class   Good      671      29       700
               Bad        38     262       300
               Total     709     291      1000

What are the true positive rate, false positive rate, and false negative rate?
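The following Python sketch is not part of the original exercises. It is offered only as a way to check hand calculations for Exercises 1, 3, and 7. The function names (binary_entropy, posterior_positive, confusion_rates) are hypothetical. The sketch assumes that entropy is measured in bits (base-2 logarithm) and that "Good" is treated as the positive class in Exercise 7. If "Bad" is taken as the positive class instead, the cell counts passed to confusion_rates must be swapped accordingly.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a binary variable with P(class 1) = p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # By convention 0 * log2(0) is taken as 0, so zero-probability terms are skipped.
    return sum(-q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)


def posterior_positive(prior, false_positive_rate, false_negative_rate):
    """P(condition | positive test), computed with Bayes' theorem."""
    sensitivity = 1.0 - false_negative_rate           # P(positive | condition)
    true_pos = prior * sensitivity                     # P(positive and condition)
    false_pos = (1.0 - prior) * false_positive_rate    # P(positive and no condition)
    return true_pos / (true_pos + false_pos)


def confusion_rates(tp, fn, fp, tn):
    """Return (TPR, FPR, FNR) from the four cell counts of a confusion matrix."""
    tpr = tp / (tp + fn)   # actual positives predicted positive
    fpr = fp / (fp + tn)   # actual negatives predicted positive
    fnr = fn / (tp + fn)   # actual positives predicted negative
    return tpr, fpr, fnr


# Exercise 1: evaluate entropy at a few class proportions (assumed sample points).
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, binary_entropy(p))

# Exercise 3: prior of 1 in 5,000, 1% false positives, no false negatives.
print(posterior_positive(1 / 5000, 0.01, 0.0))

# Exercise 7: cell counts from the matrix, with "Good" assumed to be positive.
print(confusion_rates(tp=671, fn=29, fp=38, tn=262))
```

Only the four cell counts are used in the last call, so the result does not depend on the row or column totals printed in the table.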
