out as the commonalities across all those approaches. As we talked about
earlier, experience with different models, different data sets, and wrinkles with
data sets is hugely important.
Gutierrez: How does someone develop the skill to know how to choose
the right technique to apply to a problem?
Smallwood: It's about trying a lot of the different techniques and learning
some of the common pitfalls that you would come across with the different
techniques. I think there's also a lot to be said for working in a collaborative
environment where you can show your approach to someone else and
hear their feedback on questions like: Why was that a good idea? Why was
that not a good idea?
It's hard to know if you're working in isolation on a model. You would have a
hard time knowing whether you built the right kind of model or not, because
the model will output something regardless of whether you modeled it correctly.
If you're cocky or full of ego, you'll just believe you did the right thing and not
stop to think about whether you actually did the right thing. It comes back
to being egoless and open-minded. So I think it's really hard to learn how to
choose the right technique to apply to a problem without getting feedback
from multiple people in the space who have experience as well. The more
people whom you can get feedback from over time, the better. I really think
that's a great way to progress.
Gutierrez: What advice is helpful for people moving into the field?
Smallwood: I would say to always bite the bullet with regard to understanding
the basics of the data first before you do anything else, even though it's
not sexy and not as fun. In other words, put effort into understanding how the
data is captured, understand exactly how each data field is defined, and understand
when data is missing. If the data is missing, does that mean something in
and of itself? Is it missing only in certain situations? These little, teeny nuanced
data gotchas will really get you. They really will.
You can use the most sophisticated algorithm under the sun, but it's the same
old junk-in-junk-out thing. You cannot turn a blind eye to the raw data, no
matter how excited you are to get to the fun part of the modeling. Dot your
i's, cross your t's, and check everything you can about the underlying data
before you go down the path of developing a model.
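As an editorial illustration of the missing-data questions raised above (is a field missing only in certain situations?), here is a minimal Python sketch. It is not from the interview: the function name `missing_rate_by_group`, the field names `device` and `rating`, and the records are all hypothetical, and treating `None` or an empty string as "missing" is an assumption that would need to match how the real data is captured.

```python
from collections import defaultdict

def missing_rate_by_group(rows, field, group_field):
    """Return {group value: fraction of rows where `field` is missing}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        group = row.get(group_field)
        totals[group] += 1
        # Assumption: an absent key, None, or an empty string counts as missing.
        if row.get(field) in (None, ""):
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}

# Hypothetical records: `rating` is only ever missing for the "tv" device.
rows = [
    {"device": "tv", "rating": None},
    {"device": "tv", "rating": 4},
    {"device": "phone", "rating": 5},
    {"device": "phone", "rating": 3},
]

rates = missing_rate_by_group(rows, "rating", "device")
print(rates)
```

If the missing rate differs sharply between groups, as it does in this toy data, the missingness probably carries information of its own and is worth investigating before any modeling starts.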
Another thing I've learned over time is that a mix of algorithms is almost
always better than one single algorithm in the context of a system, because
different techniques exploit different aspects of the patterns in the data, especially
in complex large data sets. So while you can take one particular algorithm and
iterate and iterate to make it better, I have almost always seen that a combination
of algorithms tends to do better than just one algorithm.
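
As an editorial illustration of combining algorithms, the sketch below averages the predictions of two models and compares mean absolute error. It is not from the interview and is only one simple way to blend models. The numbers are made up and hand-picked so that the two models' errors partly cancel, so this shows the mechanism and is not evidence of how much a blend helps on real data.

```python
def mean_absolute_error(truth, predictions):
    """Average of |truth - prediction| over all examples."""
    return sum(abs(t - p) for t, p in zip(truth, predictions)) / len(truth)

def blend(*prediction_lists):
    """Average the predictions of several models, example by example."""
    return [sum(values) / len(values) for values in zip(*prediction_lists)]

# Hypothetical targets and predictions from two hypothetical models
# whose errors tend to point in opposite directions.
truth = [10, 20, 30, 40]
model_a = [12, 18, 33, 37]
model_b = [9, 23, 28, 42]

blended = blend(model_a, model_b)

mae_a = mean_absolute_error(truth, model_a)
mae_b = mean_absolute_error(truth, model_b)
mae_blend = mean_absolute_error(truth, blended)
print(mae_a, mae_b, mae_blend)
```

A plain average can never have a larger mean absolute error than the average of the individual models' errors. Whether it beats the best single model depends on how differently the models err, which is the point made above about different techniques exploiting different aspects of the data.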
 