Amy Heineike - Data Scientists at Work

On the other hand, I think you have a lot of people who have been working in

industry for a long time, who maybe don't have as deep a technical knowledge

in a certain area but have a better idea about how to work in teams and the

industry, as well as what it's like to have a product built on top of their work.

I think, in general, it's very hard to hire people who are a complete package,

who know what to do and how to do it. It's very challenging, so for the hiring

we do, we kind of take bets on a bit of everything, or mixing those together,

or looking at the people who just have excitement and enthusiasm and who

will learn what they don't know. I think probably going forward, this kind of

career is going to be very much one of not being afraid to keep learning a huge

amount. So that kind of aptitude and attitude is really important.

Gutierrez: What specific tools or techniques do you use?

Heineike: We use Python extensively to do computations. Python is a really

nice language, which is relatively easy to learn and quite elegant to work with.

Within the data science work, there's a lot of natural language processing,

which there are toolkits for, and we end up writing quite a bit of our own

code, too, to make sure it does exactly what we want it to do. We worry

about entity extraction, tokenization, and normalization. We worry about

different ways of doing dimensionality reduction. We worry about all kinds of

issues that come up with text.
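
[Editor's illustration: a minimal sketch of the tokenization and normalization steps mentioned above, using only the Python standard library. It is not the code described in the interview. The function names, the token pattern, and the choice of NFKC plus case folding are all assumptions made for the example.]

```python
import re
import unicodedata

# Hypothetical token pattern: runs of lowercase letters or digits,
# optionally joined by internal apostrophes (e.g. "don't").
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")


def normalize(text):
    """Fold Unicode compatibility forms and case so variants compare equal."""
    return unicodedata.normalize("NFKC", text).casefold()


def tokenize(text):
    """Normalize the text, then split it into simple word tokens."""
    return TOKEN_RE.findall(normalize(text))


tokens = tokenize("Python's toolkits: Don't PANIC!")
print(tokens)
```

A pattern this simple drops non-ASCII words and punctuation entirely, which is one reason teams often end up writing their own tokenizer for their particular text.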

As for the network work we do, I think the network science space is interesting

because it's a much smaller community. Probably fewer people know about

that. There's been a lot of very cool work done over the last 20 years. Graph

theory's been going on for ages, but it's been much more recently that people

have actually had really large network data sets where they've been able to

study the structure of the network and what it means. There's very active

research into how to identify an interesting node in a network, how to find a

community within a network, or what properties of networks are meaningful.

So that's a really fun community to keep interacting with and an important

source of new techniques for us.
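
[Editor's illustration: a minimal sketch of two basic network measures related to the questions above, using only the Python standard library. Degree centrality is one simple way to flag a potentially interesting node, and connected components are a crude stand-in for community detection. Neither is claimed to be the technique used in the work discussed; the edge list and function names are hypothetical.]

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def connected_components(graph):
    """Group nodes that can reach each other, via breadth-first search."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components


# Hypothetical toy network: one hub ("a") and one separate pair.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("e", "f")]
graph = build_graph(edges)
centrality = degree_centrality(graph)
components = connected_components(graph)
most_connected = max(centrality, key=centrality.get)
```

On real data, richer measures (betweenness, modularity-based communities) are usually needed, since raw degree mostly rewards nodes that are simply popular.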

One thing that's maybe a little surprising is that we've found some of the closest

parallels to what we do are actually being done in bioinformatics. For example,

Patsy Babbitt at UCSF [University of California, San Francisco] has a lab that's

running analysis of proteins, where they look at large numbers of proteins,

compare them all to each other, use network visualizations to examine them,

and then, through analyzing those proteins at scale, find leads for what science

should be done. Their results allow them to tell other scientists, “Probably one

of these proteins will be doing something interesting,” or “Maybe you should

go and look at this,” or “This protein might tell us about the evolutionary

history of these proteins because it bridges them,” or “This result is actually

very surprising.” They're able to give context to decisions about what science
