multithreaded computing, and MapReduce patterns. Being able to do that in the
smallest amount of code possible is essential. Our code base is too big. It needs
to get smaller. I am very excited about that.
Gutierrez: Outside of programming languages, what other tools or processes
are you excited about?
Lenaghan: I am very excited about real-time processing and real-time
computation systems like Storm, even though Storm is not exactly nascent.
Real-time processing and computation affects us in a few places of our data/product
pipeline. It affects us at the beginning of the pipeline where we are ingesting,
processing, and analyzing ad-request logs, as well as writing the results to
Amazon's S3 service. It also affects us at the end of the pipeline, where we do
a lot of batch processing to build audiences and serve ads.
This is especially relevant in the environment in which we serve ads because
it is a high-QPS [Queries Per Second], low-latency environment. I would like
to move a lot of our batch processing on the back end closer to real time.
That would mean we can find problems much earlier. We are now moving
towards that.
Gutierrez: What does the future look like to you?
Lenaghan: A welcome trend we have seen more of in the last year and a
half has been the consolidation of programming libraries and packages. The
big push towards further consolidation of—and abstraction away from—
packages and libraries is fantastic. It definitely allows more people to do
interesting work without having to spend years and years in a PhD program to
understand which algorithms you can apply a stochastic gradient descent to
and with what convergence.
Along similar lines, people and startups are starting to try to democratize data
science and analytics. I am all in favor of this move, as well. While having
these better tools will make our lives easier, it will never obviate the need for somebody
to make and use these tools. You will always need data scientists, even with
these consolidated and democratized tools. Just because people have access to
statistical tools like R, Stata, SAS, and others, it does not mean that everybody
can all of a sudden run statistical analyses correctly. While you can run the
statistical analyses more easily with these tools, you still have to know whether
you are running the right thing or even interpreting the analyses correctly.
Real-time is also very exciting to me. As we discussed earlier, a lot of our
business is built on the tons and tons of data exhaust of mobile phones and
devices, so being able to make it actionable as quickly as possible really is
very much the future. We are looking at a lot of interesting technologies that
are making that possible, such as Storm, which I already mentioned, and tools that
make it much easier to shard databases, so we can have horizontally scalable
databases that we can actually run relational queries against. I am very excited
 