But we always have more than one way of doing this translation: more than one possible model, more than one associated metric, and possibly more than one optimization. So the science in data science is, given raw data, constraints, and a problem statement, how to navigate that maze and make the best choices. Every design choice you make can be formulated as a hypothesis, which you then subject to rigorous testing and experimentation to either validate or refute.
This process, whereby one formulates a well-defined hypothesis and then tests it, might rise to the level of a science in certain cases. Specifically, the scientific method is adopted in data science as follows:
• Hold on to your existing best performer.
• Once you have a new idea to prototype, set up an experiment in which the two best models compete.
• Rinse and repeat (while not overfitting).
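The loop above can be sketched in code. This is a minimal illustration only: the one-feature toy dataset, the two threshold "models," and the accuracy metric are all assumptions made for the sketch, not anything from the text; a real experiment would use proper models and careful validation.

```python
import random

random.seed(42)

# Toy labeled data: feature x in [0, 1], true label 1 when x > 0.5,
# with 10% label noise.  All values here are illustrative.
def make_row():
    x = random.random()
    y = int(x > 0.5)
    if random.random() < 0.1:
        y = 1 - y  # noise: flip the label
    return x, y

rows = [make_row() for _ in range(300)]
train, holdout = rows[:200], rows[200:]

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

# Champion: the existing best performer.
champion = lambda x: int(x > 0.5)
# Challenger: the new idea to prototype.
challenger = lambda x: int(x > 0.4)

# The experiment: both models compete on the same held-out data;
# keep whichever wins, then rinse and repeat with the next idea.
best = max([champion, challenger], key=lambda m: accuracy(m, holdout))
```

Holding out data the models never trained on is what keeps the "rinse and repeat" step from quietly overfitting.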
Classifiers
This section focuses on the process of choosing a classifier. Classification involves mapping your data points into a finite set of labels or the probability of a given label or labels. We've already seen some examples of classification algorithms, such as Naive Bayes and k-nearest neighbors (k-NN), in the previous chapters. Table 5-1 shows a few examples of when you'd want to use classification:
Table 5-1. Classifier example questions and answers

Question                                      Answer
"Will someone click on this ad?"              0 or 1 (no or yes)
"What number is this (image recognition)?"    0, 1, 2, etc.
"What is this news article about?"            "Sports"
"Is this spam?"                               0 or 1
"Is this pill good for headaches?"            0 or 1
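To make "mapping data points into labels, or the probability of a label" concrete, here is a tiny k-NN classifier in the spirit of the earlier chapters. The points, labels, and choice of k are illustrative assumptions, not data from the text.

```python
import math
from collections import Counter

# Toy training set: 2-D points with labels.  All values are made up
# purely for illustration.
points = [((1.0, 1.0), "spam"), ((1.2, 0.9), "spam"),
          ((4.0, 4.2), "ham"),  ((3.8, 4.0), "ham")]

def knn_predict(query, k=3):
    """Map a data point to a label: majority vote of the k nearest."""
    nearest = sorted(points, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def knn_proba(query, label, k=3):
    """Or map it to the probability of a given label: the fraction
    of the k nearest neighbors carrying that label."""
    nearest = sorted(points, key=lambda p: math.dist(p[0], query))[:k]
    return sum(lab == label for _, lab in nearest) / k
```

The two functions show the two outputs the text describes: a hard label, or a probability for a label.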
From now on, we'll talk about binary classification only (0 or 1). In this chapter we focus on logistic regression, but there are other classification algorithms available, including decision trees (which we'll cover in Chapter 7), random forests (Chapter 7), and support vector machines.
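As a preview of the binary (0 or 1) setting, here is a minimal from-scratch sketch of fitting a logistic regression by gradient descent on the log-loss. The one-feature data, learning rate, and iteration count are illustrative assumptions, not the chapter's method verbatim.

```python
import math

# Toy 1-D data: label flips from 0 to 1 somewhere between x=1.5 and x=2.0.
X = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
y = [0,   0,   0,   0,   1,   1,   1,   1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Fit weight w and bias b by gradient descent on the mean log-loss.
w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    grad_w = sum((sigmoid(w * x + b) - t) * x for x, t in zip(X, y)) / len(X)
    grad_b = sum((sigmoid(w * x + b) - t) for x, t in zip(X, y)) / len(X)
    w -= lr * grad_w
    b -= lr * grad_b

# The model outputs a probability; thresholding at 0.5 gives the 0/1 label.
proba = lambda x: sigmoid(w * x + b)
predict = lambda x: int(proba(x) > 0.5)
```

Note that the model's raw output is a probability, which ties back to the earlier point that classifiers can emit either labels or label probabilities.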