Gutierrez: What tools have you chosen to use at Planet OS, and how do
they compare to the tools you were using at Skype?
Karpištšenko: The scale of the problems and the requirements are very
different because the Industrial Internet and the Consumer Internet are quite
distinct things. For more information about the Industrial Internet, look to
General Electric, which is talking about it a lot these days. At Skype, I was
focused very much on the Consumer Internet. The tools we used were things like
Greenplum, R, Python, and network analysis tools such as Gephi. We chose them
depending on what problem we were working on. If it was fraud detection or
marketing campaign optimization, we would use different tools compared to when
it was traffic-shape detection on the network, social network analysis, or
social network recommendations.
As I moved into Planet OS, the key to our success, in my mind, was our
productivity and our ability to iterate fast. I knew from past experience that
Python had a very strong, growing community and a lot of data scientists. I had
worked with Java quite a bit, so I was certain that Java was not the right choice
just yet. So we decided to use Python. And as we worked, we had to look for
different storage solutions, data warehousing, and analysis solutions. These
we picked based on what was going to scale well, as well as what was going
to perform well for us in the next six months and still be a viable solution in
two years. That said, some portion of my tool choices has also been
exploratory, to try promising new methods and to determine whether there is
a problem set where these methods are applicable. I find
that this applied way of working with methods and tools is the best way to
learn. I've done it this way for so long and I've ended up working with so many
different technologies that it would take a few pages to list all the tools I've
seen and used. Of course, these come and go, so they're highly situational and
context-dependent.
Gutierrez: What lessons have you learned as you've gone through this
tool-and-method exploration?
Karpištšenko: The core lesson is that there is no silver bullet. Initially, when
you start in the data science or software business, you think that a new
language or a new framework or a method will solve everything. For example,
when I was looking into how to do better feature selection, feature
engineering, and better model building without my team or me being too
involved in it, I looked for automated ways of doing that.
Brute-force analytics looked like
the best way to do it. Well, it works fine, but you need a considerable amount
of computing resources for that to work, and unfortunately, you don't always
have enough resources available. The resource and time constraints that you
have will tell you which tool to use.
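
As a rough illustration of why brute-force feature selection gets expensive,
the following Python sketch scores every non-empty subset of a feature list.
The helper names, the toy weights, and the scoring function are all
hypothetical and do not come from the interview. A real setup would train and
validate a model for each subset, which is where the computing cost arises.

```python
from itertools import combinations


def count_feature_subsets(n_features):
    """Number of non-empty feature subsets an exhaustive search must score."""
    return 2 ** n_features - 1


def brute_force_select(features, score):
    """Score every non-empty subset and return (best_subset, best_score).

    Hypothetical helper: `score` stands in for training and validating a model.
    """
    best_subset, best_score = None, float("-inf")
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            current = score(subset)
            if current > best_score:
                best_subset, best_score = subset, current
    return best_subset, best_score


# Toy stand-in for a model evaluation (assumption, for illustration only):
# each feature has a made-up usefulness, and each extra feature costs 0.5.
TOY_WEIGHTS = {"a": 3.0, "b": 1.0, "c": -2.0}


def toy_score(subset):
    return sum(TOY_WEIGHTS[name] for name in subset) - 0.5 * len(subset)


if __name__ == "__main__":
    print(brute_force_select(["a", "b", "c"], toy_score))
    # The number of subsets doubles with every added feature.
    for n in (10, 20, 30):
        print(n, count_feature_subsets(n))
```

Because the number of subsets doubles with each added feature, an exhaustive
search that is cheap at 10 features is already over a million evaluations at
20, which is one way the resource and time constraints end up choosing the
tool.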