The longer you wait to figure out whether what you are doing is correct or
not, or helpful or not, the more time you could potentially spend doing the
wrong thing.
Gutierrez: So having data, a success metric, and a model is not enough—you
also need impactful feedback.
Hu: Precisely. Again, I think that impactful feedback is a very overlooked
thing and is something I have really learned to look for over time. You really
need that feedback from whomever you are providing insights to or
interacting with, because it is easy to get burrowed in your cave.
It is easy to move towards providing this beautiful mathematical insight or
applying this really sophisticated algorithm, and then later to realize you're
providing something that does not have any real impact. This is especially
important because a lot of the machine learning techniques do not lend
themselves to interpretability. I would say a large percentage of the time, you
would rather have an easily interpretable algorithm and results than a slightly
more accurate one.
Gutierrez: What technologies or techniques do you see as the future of data
science?
Hu: So definitely, the number one thing is natural language processing. Everyone
thinks it is interesting, everyone cares about it, and everyone thinks there is
a lot of potential there. Yet, no one has really done it effectively. I believe in it
and its future. I work on NLP projects whenever I can, both at work and in my
free time. When—not if—we solve the problem of understanding sentiment
and being able to extract meaning from large bodies of text, I think that will
really change the reach of data science in basically any field.
Gutierrez: What nonwork data sets have you worked with recently?
Hu: As I mentioned before, one of the big nonwork projects that I have been
working on recently is DataKind. Their mission is essentially “data science for
good,” so they connect data technology people with NGOs that have good
data or interesting data, and then we work together to provide insights for
them. It is a great mission. One of the projects that I worked on recently was
with a non-profit that focuses on trying to catch child predators with data
from online message boards. It is a great cause and this project involved a lot
of text processing. What is remarkable is that so little has been done with
this data from NGOs at this point that DataKind can really help.
What is powerful is that we were able to provide simple insights like, “These
are the topics that people are discussing in your data and this is how you
can identify whom you can target from that.” We can then provide this code
to the organization and give them an easy way to run it over
time. So this idea of being able to apply a clustering algorithm fairly quickly—
something that you can do in a couple of hours and that is reproducible—
is very powerful. The lesson is that relatively quick, simple things can really
 