learning them, they are almost independent fields. For example, there is the
communication aspect: How do you communicate your findings effectively
and how do you persuade people to take them on? A great deal of that has
to do with effective writing and effective data visualization. That is one very
important skill that I think is just going to continue to improve with
visualization software.
There is also the track of machine learning techniques, which is constantly
evolving. There have been a lot of techniques that have become more widely
accepted over the years. That said, every day people are coming up with new
and exciting tweaks or innovations on those techniques. Especially on more
complicated problems, such as text analytics, the techniques are continuously
evolving, so it is important to keep up-to-date with the most advanced
machine learning techniques that can be applied and what implementations
are better than others.
There is also the data management track, which deals with the entire back end
that goes into storing and later retrieving the data. I did not fully comprehend
this track when I was dealing with smaller sets of data, such as in baseball or
in the actuarial fields. This was because the amount of data that we used was
relatively manageable on one machine or a couple of machines. But now as we
are starting to deal with bigger and bigger sets of data, having to deal with all
the back end of storing huge quantities of data, being able to access the data
later, and then being able to run algorithms on it is definitely a big challenge.
I think the parallelization of traditional machine learning algorithms is still
something we are struggling with, especially with the most efficient ways to
do it. I'm excited to see how it continues to develop.
Gutierrez: How do you learn new skills in the various tracks?
Hu: I think both speaking at and attending conferences are fantastic ways to
keep up-to-date. I chaired one of the days at the Predictive Analytics Innovation
Summit Chicago 2013 conference, and I got a chance to chat briefly with each
of the speakers. Hearing what they have learned and what they are on the
forefront of is always very exciting because you cannot keep track of
everything at once. There is not enough time for that. I wrote down so many things
that I learned and heard about from other speakers that I am very excited to
incorporate into our workflow.
Outside of conferences, meetups, books, and discussion groups are good ways
to stay up-to-date as well. There are a lot of meetups in New York City. It is
hard to keep track of what is going on in all of them, but anytime there is an
interesting talk or book that I hear about, I definitely try to attend or read it.
There are a lot of discussion groups that lead to a lot of free-flowing
discussions. I think the New York City data science community is pretty tight.
For example, I was at a data dive for the DataKind organization this past
weekend and I ran into a lot of people that I have run into in the last
 