On the other hand, I think you have a lot of people who have been working in
industry for a long time, who maybe don't have as deep a technical knowledge
in a certain area but have a better idea about how to work in teams and the
industry, as well as what it's like to have a product built on top of their work.
I think, in general, it's very hard to hire people who are a complete package,
who know what to do and how to do it. It's very challenging, so for the hiring
we do, we kind of take bets on a bit of everything, or mixing those together,
or looking at the people who just have excitement and enthusiasm and who
will learn what they don't know. I think probably going forward, this kind of
career is going to be very much one of not being afraid to keep learning a huge
amount. So that kind of aptitude and attitude is really important.
Gutierrez: What specific tools or techniques do you use?
Heineike: We use Python extensively to do computations. Python is a really
nice language, which is relatively easy to learn and quite elegant to work with.
Within the data science work, there's a lot of natural language processing,
which there are toolkits for, and we end up writing quite a bit of our own
code, too, to make sure it does exactly what we want it to do. We worry
about entity extraction, tokenization, and normalization. We worry about
different ways of doing dimensionality reduction. We worry about all kinds of
issues that come up with text.
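As an illustration of the tokenization and normalization steps mentioned above, here is a minimal sketch using only the Python standard library. It is not the code described in the interview. The function names, the regular expression, and the choice of NFKC plus case folding are all assumptions made for the example.

```python
import re
import unicodedata

# Assumed token pattern: runs of letters/digits, optionally followed by
# an apostrophe and more letters/digits (so "don't" stays one token).
TOKEN_RE = re.compile(r"[^\W_]+(?:'[^\W_]+)?")


def normalize(text):
    """Fold Unicode compatibility forms and case so variants compare equal."""
    return unicodedata.normalize("NFKC", text).casefold()


def tokenize(text):
    """Normalize the text, then split it into word tokens."""
    return TOKEN_RE.findall(normalize(text))


if __name__ == "__main__":
    print(tokenize("Don't PANIC, World!"))
```

Real text raises many more cases than this handles (hyphenation, URLs, languages without spaces between words), which is one reason teams often end up writing their own code on top of existing toolkits.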
As for the network work we do, I think the network science space is interesting
because it's a much smaller community. Probably fewer people know about
that. There's been a lot of very cool work done over the last 20 years. Graph
theory's been going on for ages, but it's been much more recently that people
have actually had really large network data sets where they've been able to
study the structure of the network and what it means. There's very active
research into how to identify an interesting node in a network, how to find a
community within a network, or what properties of networks are meaningful.
So that's a really fun community to keep interacting with and an important
source of new techniques for us.
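To make the two questions above concrete (which nodes are interesting, and which nodes belong together), here is a small sketch in plain Python. It swaps in deliberately simple techniques: degree centrality as a measure of an interesting node, and connected components as a crude stand-in for community detection. Neither is claimed to be the method used in the interview, and the function names and sample edges are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def connected_components(graph):
    """Group nodes that can reach each other (a crude proxy for communities)."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(component)
    return components


if __name__ == "__main__":
    # Hypothetical sample data: a small hub plus a separate pair.
    edges = [("a", "b"), ("a", "c"), ("a", "d"), ("e", "f")]
    graph = build_graph(edges)
    centrality = degree_centrality(graph)
    print(max(centrality, key=centrality.get))
    print(connected_components(graph))
```

Research-grade community detection looks for densely connected groups inside a single connected graph, which this sketch does not attempt.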
One thing that's maybe a little surprising is that we've found some of the closest
parallels to what we do are actually being done in bioinformatics. For example,
Patsy Babbitt at UCSF [University of California, San Francisco] has a lab that's
running analysis of proteins, where they look at large numbers of proteins,
compare them all to each other, use network visualizations to examine them,
and then, through analyzing those proteins at scale, find leads for what science
should be done. Their results allow them to tell other scientists, “Probably one
of these proteins will be doing something interesting,” or “Maybe you should
go and look at this,” or “This protein might tell us about the evolutionary
history of these proteins because it bridges them,” or “This result is actually
very surprising.” They're able to give context to decisions about what science
 