Exercises
1. For a binary classification problem, describe the possible values of entropy.
Under what conditions does entropy reach its minimum and maximum values?
2. In a decision tree, how does the algorithm pick the attributes for splitting?
3. John went to see the doctor about a severe headache. The doctor selected
John at random to have a blood test for swine flu, which is suspected to
affect 1 in 5,000 people in this country. The test is 99% accurate, in the
sense that the probability of a false positive is 1%. The probability of a false
negative is zero. John's test came back positive. What is the probability that
John has swine flu?
4. Which classifier is considered computationally efficient for
high-dimensional problems? Why?
5. A data science team is working on a classification problem in which the
dataset contains many correlated variables, and most of them are
categorical variables. Which classifier should the team consider using?
Why?
6. A data science team is working on a classification problem in which the
dataset contains many correlated variables, and most of them are
continuous. The team wants the model to output the probabilities in
addition to the class labels. Which classifier should the team consider
using? Why?
7. Consider the following confusion matrix:
                        Predicted Class
                        Good     Bad     Total
   Actual Class  Good    671      29       700
                 Bad      38     262       300
                 Total   709     291      1000
What are the true positive rate, false positive rate, and false negative rate?
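
The definitions behind exercise 7 can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed solution method: it assumes that "Good" is treated as the positive class (the exercise does not say which class is positive), and the function name `rates` and its parameter names are hypothetical.

```python
def rates(tp, fn, fp, tn):
    """Return (TPR, FPR, FNR) from the four cells of a 2x2 confusion matrix."""
    tpr = tp / (tp + fn)  # true positive rate: actual positives predicted positive
    fpr = fp / (fp + tn)  # false positive rate: actual negatives predicted positive
    fnr = fn / (tp + fn)  # false negative rate: actual positives predicted negative
    return tpr, fpr, fnr


# Assumption: "Good" is the positive class, so
#   TP = actual Good predicted Good, FN = actual Good predicted Bad,
#   FP = actual Bad predicted Good,  TN = actual Bad predicted Bad.
tpr, fpr, fnr = rates(tp=671, fn=29, fp=38, tn=262)
print(f"TPR={tpr:.4f}  FPR={fpr:.4f}  FNR={fnr:.4f}")
```

Because TPR and FNR share the same denominator (all actual positives), they always sum to 1. If "Bad" were taken as the positive class instead, the four cell counts would swap roles and the rates would change accordingly.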