• The computation is not expressible in SQL.
• It is too slow or expensive when expressed as a SQL query.
• It requires specialized functions that are not supported.
If you extract your data from the service, you are then free to run your
computation using a framework that supports the transformation you
require. The MapReduce family of data processing frameworks is especially
well suited to transformations of large datasets. Hadoop is the most popular
implementation of this computation model, but it is not the only one. The
AppEngine platform also supports the MapReduce model of computation,
which can be used to transform BigQuery tables. This section covers using
this framework to augment BigQuery.
Before diving into the nuts and bolts of using AppEngine MapReduce, it
is useful to have a well-defined use case in mind. Compared to running a
query within BigQuery, the AppEngine framework is going to appear rather
cumbersome. This is to be expected because it is a more general-purpose
computing framework. However, it does warrant a motivating example that
justifies the additional complexity.
In our sample application in Chapter 8, “Putting It Together,” we
captured logs from the phone that included geolocation information
describing the position of the phone. The logs record the latitude, longitude,
and ZIP (postal) code that most closely correspond to the coordinates. ZIP
codes prove handy for joining log records with other geographic
information.
Joining tables based on geographic information using latitude and longitude
is actually challenging in (BigQuery) SQL because a simple equality join is
not feasible. Equality joins work when the exact value in one table matches
the exact value in another table. If you have latitude and longitude points,
you rarely will have two points in different tables that match exactly, and
typically you're more interested in proximity than exact overlap.
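To make the contrast concrete, here is a minimal Python sketch of why an equality test on coordinates fails where a proximity test succeeds. The two sample points, the 0.1 km threshold, and the helper name haversine_km are all assumptions chosen for illustration; none of them come from the sample application.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/long points."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# Hypothetical points: one from a phone log, one from another table.
log_point = (47.606209, -122.332071)
other_point = (47.606300, -122.332000)

# An equality join compares exact values, so these two rows never match.
exact_match = log_point == other_point

# A proximity test matches them, but it is not a simple equality predicate.
distance_km = haversine_km(*log_point, *other_point)
is_near = distance_km < 0.1  # assumed threshold of 100 meters
```

The points are only a few meters apart, yet an equality comparison treats them as unrelated. That is the situation an equality join on raw coordinates runs into.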
However, if you can bucket data into sufficiently small regions, such as a
ZIP code, then you can use a straightforward equality join. Suppose that
in the first iteration of our application we neglected to include the ZIP
code in our log records. We will use this scenario to drive the examples in
this section.
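The bucketing idea can be sketched in a few lines of plain Python, independent of any MapReduce framework. This is a sketch under stated assumptions, not the method used later in this section: the centroid coordinates, the area labels, and the nearest-centroid rule are illustrative only. Real ZIP code boundaries are irregular, so choosing the nearest centroid is a simplification.

```python
import math

# Hypothetical ZIP code centroids (latitude, longitude); illustrative values.
ZIP_CENTROIDS = {
    "98101": (47.6101, -122.3344),
    "98109": (47.6339, -122.3476),
    "98033": (47.6769, -122.1927),
}

# Hypothetical second table, keyed by ZIP code.
ZIP_INFO = {"98101": "area A", "98109": "area B", "98033": "area C"}

def nearest_zip(lat, lon, centroids):
    """Return the ZIP whose centroid is closest to (lat, lon).

    Uses a flat-Earth approximation (longitude scaled by cos(latitude)),
    which is adequate only for points that are close together.
    """
    scale = math.cos(math.radians(lat))

    def squared_distance(zip_code):
        c_lat, c_lon = centroids[zip_code]
        dlat = c_lat - lat
        dlon = (c_lon - lon) * scale
        return dlat * dlat + dlon * dlon

    return min(centroids, key=squared_distance)

# Hypothetical log records that lack a ZIP code.
logs = [
    {"id": 1, "lat": 47.6062, "lon": -122.3321},
    {"id": 2, "lat": 47.6770, "lon": -122.2000},
]

# Step 1: bucket each record into a ZIP code.
# Step 2: join on the ZIP code with a plain equality lookup.
joined = []
for record in logs:
    zip_code = nearest_zip(record["lat"], record["lon"], ZIP_CENTROIDS)
    joined.append(dict(record, zip=zip_code, area=ZIP_INFO[zip_code]))
```

Once every record carries a bucket key, the join reduces to an exact-match lookup, which is the kind of join SQL handles well.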