But how do you build a model?
How do you have any clue whatsoever what functional form the data
should take? Truth is, it's part art and part science. And sadly, this is
where you'll find the least guidance in textbooks, in spite of the fact
that it's the key to the whole thing. After all, this is the part of the
modeling process where you have to make a lot of assumptions about
the underlying structure of reality, and we should have standards as
to how we make those choices and how we explain them. But we don't
have global standards, so we make them up as we go along, and
hopefully in a thoughtful way.
We're admitting this here: where to start is not obvious. If it were, we'd
know the meaning of life. However, we will do our best to demonstrate
for you throughout the topic how it's done.
One place to start is exploratory data analysis (EDA), which we will
cover in a later section. This entails making plots and building
intuition for your particular dataset. EDA helps a lot, as do trial
and error and iteration.
To be honest, until you've done it a lot, it seems very mysterious. The
best thing to do is start simply and then build in complexity. Do the
dumbest thing you can think of first. It's probably not that dumb.
For example, you can (and should) plot histograms and look at
scatterplots to start getting a feel for the data. Then you just try writing
something down, even if it's wrong first (it will probably be wrong first,
but that doesn't matter).
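
As a rough sketch of that first look, the snippet below bins a small
made-up sample into a text histogram and computes a correlation as a
crude stand-in for eyeballing a scatterplot. The data, the helper names
text_histogram and pearson, and the bin count are all illustrative
assumptions, not anything the text prescribes; in practice you would
probably reach for a plotting library instead.

```python
import math
import random


def text_histogram(values, bins=5):
    """Return (bin_start, bin_end, count) tuples for equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c)
            for i, c in enumerate(counts)]


def pearson(xs, ys):
    """Pearson correlation; a one-number summary of a scatterplot's trend."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up data for illustration only: y is roughly 2 * x plus noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + random.gauss(0, 1.0) for x in xs]

hist = text_histogram(xs)
for start, end, count in hist:
    print(f"{start:5.1f} to {end:5.1f} | {'#' * (count // 2)}")
print(f"correlation(x, y) = {pearson(xs, ys):.2f}")
```

A lopsided histogram or a correlation near zero would be a hint that
the first thing you write down needs rethinking, which is the point of
looking before modeling.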
So try writing down a linear function (more on that in the next
chapter). When you write it down, you force yourself to think: does this
make any sense? If not, why? What would make more sense? You start
simply and keep building it up in complexity, making assumptions,
and writing your assumptions down. You can use full-blown sentences
if it helps (e.g., “I assume that my users naturally cluster into about
five groups because when I hear the sales rep talk about them, she has
about five different types of people she talks about”), then take your
words and try to express them as equations and code.
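
To make the "write it down" step concrete, here is a minimal sketch
that states an assumption as a sentence, expresses it as the equation
y = a + b * x, fits it, and then asks whether the result makes sense by
looking at the residuals. The data are made up, fit_line is a
hypothetical helper, and ordinary least squares is just one reasonable
way to fit the line, not a method the text specifies.

```python
# Assumption, written down first: "y grows by roughly the same amount
# for every unit of x", i.e. y = a + b * x + noise.


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return a, b


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"y = {a:.2f} + {b:.2f} * x")
# Does this make any sense? Large or patterned residuals suggest the
# linear assumption is too simple and the model needs another term.
print("largest absolute residual:", max(abs(r) for r in residuals))
```

If the residuals look like noise, the simple model may be enough for
now; if they curve or fan out, that is the cue to revise the
assumption and write down something slightly more complex.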
Remember, it's always good to start simply. There is a trade-off in
modeling between simple and accurate. Simple models may be easier
to interpret and understand. Oftentimes the crude, simple model gets
you 90% of the way there and only takes a few hours to build and fit,