In the world of data science, being somewhat egoless is really important.
There are some industries where your role allows you to have a big ego, and
that works great for having a big impact. However, in data science, you get
humbled over and over again as you try to do things. You think you've come
up with some brilliant idea, and then you build some model or you build some
metric, even just a simple metric, and you look at the results and you're like,
“Oh, that's terrible.” It's so disappointing. Or the reverse can happen too. You
do some dinky little thing that you hardly put any thought into and it turns
out to have a massively positive impact. And so being somewhat egoless in
this field is important.
If you do have an ego, not only will you be disappointed all the time, but you
also will not be open-minded enough about what the data can say to you,
because you'll be too stuck in your one mindset. You have to be really flexible
with your mindset, thinking in tiny, tiny details, and then all the way up to
super-elevated levels about the forces that are going on across the data.
You also have to be open-minded about different techniques. Let's say you
build a regression model and you find that one signal you added to it really
made your regression model successful. Then you try a totally different technique
and that signal was diminished to nothing in the other technique. You
have to be open-minded enough to potentially throw away the signal. You
can't get too attached to that signal, because you may have just learned that
the signal you got all excited about wasn't that important. This kind of
open-minded, flexible, egoless attitude is important.
Gutierrez: Is an egoless attitude something that can be taught?
Smallwood: I think that's something that people can learn, but I don't know
that you can teach it person to person. I think people learn it more from
experiences.
Gutierrez: What should someone starting out try to understand deeply?
Smallwood: I'm a big believer in understanding probability distributions.
Understanding all the different types of distributions and what those
characteristics look like in your data really goes a long way toward understanding
how to build different types of models. If you only know the
normal distribution, you're not going to be nearly as effective as if you know
Poisson distributions and all the other different kinds of distributions.
Knowing and understanding the distributions really helps guide how you
think about modeling things.
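To make the point about distributions concrete, here is a minimal sketch. It is an illustration, not something from the interview. It assumes a hypothetical count metric (events per user per day) with a mean rate of 2 and compares a Poisson model with a normal model that has the same mean and variance. The names `poisson_pmf`, `dispersion_index`, and `lam`, and the sample counts, are all made up for the example.

```python
import math
from statistics import NormalDist


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson variable with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def dispersion_index(counts) -> float:
    """Sample variance divided by sample mean.

    Values near 1 are consistent with Poisson-like counts; values far
    from 1 suggest some other distribution may fit better.
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


lam = 2.0  # hypothetical average number of events per user per day

# A normal model matched to the Poisson's mean and variance (both equal lam).
normal = NormalDist(mu=lam, sigma=math.sqrt(lam))

# Probability the normal model assigns to impossible negative counts
# (everything below -0.5, using a continuity correction).
mass_below_zero = normal.cdf(-0.5)

# Probability of observing exactly zero events under each model.
p_zero_poisson = poisson_pmf(0, lam)
p_zero_normal = normal.cdf(0.5) - normal.cdf(-0.5)

print(f"normal mass on negative counts: {mass_below_zero:.3f}")
print(f"P(0 events): Poisson {p_zero_poisson:.3f}, normal {p_zero_normal:.3f}")
print(f"dispersion of made-up sample: {dispersion_index([0, 1, 1, 2, 2, 3, 5]):.2f}")
```

With a small rate like this, the normal model puts some probability on counts that cannot occur and gives a different probability of seeing zero events than the Poisson model does. That is the kind of mismatch that knowing more than one distribution helps you notice.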
Also important is studying a variety of techniques: clustering techniques,
regression techniques, tree-based techniques, and others. Try to get experience
with a gamut of different kinds of techniques, because then over time
you realize there are subtle similarities across them. Sometimes you can learn
about a problem by coming at it from all of those angles and see what comes
 