4.3.4 Case study: storing analytical information in Bigtable
In Google's Bigtable paper, the authors describe how Bigtable is used to store website usage information for Google Analytics. The Google Analytics service allows you to track who's visiting your website. Every time a user clicks on a web page, the hit is stored in a single row-column entry that has the URL and a timestamp as the row ID. The row IDs are constructed so that all page hits for a specific user session are stored together.
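To make the session-ordering idea concrete, here is a minimal Python sketch. It assumes a hypothetical row-key layout of `site#session_start#hit_time` (the exact key format Google Analytics uses isn't given here) and simulates Bigtable's lexicographically sorted rows with a sorted Python dict. The names `make_row_key` and `scan_prefix` are illustrative only.

```python
from datetime import datetime

def make_row_key(site: str, session_start: datetime, hit_time: datetime) -> str:
    """Hypothetical row key: site, then session start, then hit time.

    Zero-padded timestamps make lexicographic order (the order in which
    Bigtable sorts row keys) match chronological order.
    """
    fmt = "%Y%m%d%H%M%S"
    return f"{site}#{session_start.strftime(fmt)}#{hit_time.strftime(fmt)}"

def scan_prefix(table: dict, prefix: str) -> list:
    """Return (row_key, value) pairs whose key starts with prefix, in key order."""
    return [(k, v) for k, v in table.items() if k.startswith(prefix)]

# Hits arrive out of order: (site, session start, hit time, URL).
hits = [
    ("example.com", datetime(2024, 5, 1, 9, 0, 0), datetime(2024, 5, 1, 9, 2, 10), "/pricing"),
    ("example.com", datetime(2024, 5, 1, 8, 30, 0), datetime(2024, 5, 1, 8, 31, 5), "/home"),
    ("example.com", datetime(2024, 5, 1, 9, 0, 0), datetime(2024, 5, 1, 9, 0, 1), "/home"),
    ("example.com", datetime(2024, 5, 1, 8, 30, 0), datetime(2024, 5, 1, 8, 35, 0), "/docs"),
]

# Simulate Bigtable's sorted row order with a dict built from sorted keys.
table = dict(sorted((make_row_key(site, start, t), url) for site, start, t, url in hits))

# All hits from the 08:30 session come back together, in time order.
session_hits = scan_prefix(table, "example.com#20240501083000#")
```

Because the session start precedes the hit time in the key, a prefix scan over one session returns that session's hits contiguously, which is the property the row-key design is meant to provide.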
As you can guess, viewing a detailed log of all the individual hits on your site would be a long process. Google Analytics makes it simple by summarizing the data at regular intervals (such as once a day) and creating reports that allow you to see the total number of visits and most popular pages that were requested on any given day.
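The daily summarization step can be sketched as a simple aggregation. This minimal Python example assumes raw hits arrive as hypothetical `(day, session_id, url)` tuples and counts a "visit" as a distinct session per day; it illustrates the idea and is not how Google Analytics actually computes its reports.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical raw hits: (day, session_id, url)
raw_hits = [
    (date(2024, 5, 1), "s1", "/home"),
    (date(2024, 5, 1), "s1", "/docs"),
    (date(2024, 5, 1), "s2", "/home"),
    (date(2024, 5, 2), "s3", "/pricing"),
    (date(2024, 5, 2), "s3", "/home"),
    (date(2024, 5, 2), "s4", "/pricing"),
]

def summarize_by_day(hits, top_n=1):
    """Return {day: {"visits": distinct sessions, "top_pages": [(url, hits), ...]}}."""
    sessions = defaultdict(set)        # day -> set of session IDs seen that day
    page_counts = defaultdict(Counter) # day -> hit count per URL
    for day, session_id, url in hits:
        sessions[day].add(session_id)
        page_counts[day][url] += 1
    return {
        day: {"visits": len(sessions[day]), "top_pages": page_counts[day].most_common(top_n)}
        for day in sorted(sessions)
    }

summary = summarize_by_day(raw_hits)
```

The report then reads from the small `summary` structure instead of rescanning every individual hit.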
Google Analytics is a good example of a large database that scales linearly as the number of users increases. As each transaction occurs, new hit data is immediately added to the tables, even while a report is running. Like other logging-type applications, Google Analytics data is generally written once and never updated. This means that once the data is extracted and summarized, the original data can be compressed and moved to an intermediate store until it is archived.
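The write-once lifecycle (append raw hits, summarize them, then compress the originals for an intermediate store) might look like this in miniature. `WriteOnceLog`, `compress_batch`, and `decompress_batch` are hypothetical names, and zlib-compressed JSON merely stands in for whatever compression a real Bigtable deployment would use.

```python
import json
import zlib

class WriteOnceLog:
    """Hypothetical append-only log: hits are added but never updated."""

    def __init__(self):
        self._records = []

    def append(self, record: dict) -> None:
        # Store a copy so later changes by the caller can't alter logged data.
        self._records.append(dict(record))

    def snapshot(self) -> list:
        return [dict(r) for r in self._records]

def compress_batch(records: list) -> bytes:
    """Serialize a batch of raw records to JSON and compress it with zlib."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return zlib.compress(payload, 9)

def decompress_batch(blob: bytes) -> list:
    return json.loads(zlib.decompress(blob).decode("utf-8"))

log = WriteOnceLog()
for i in range(500):
    log.append({"url": "/home" if i % 3 else "/docs", "ts": 1714550400 + i})

raw = log.snapshot()

# 1. Summarize the raw hits first...
page_totals = {}
for r in raw:
    page_totals[r["url"]] = page_totals.get(r["url"], 0) + 1

# 2. ...then compress the originals for the intermediate store.
archived = compress_batch(raw)
```

Since the raw records never change after they are written, the compressed batch never needs to be rewritten; it only has to be decompressed if someone wants to re-run a summary.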
This pattern of storing write-once data is the same one we discussed in the data warehouse and business intelligence section in chapter 3. In that section, we looked at sales fact tables and how business intelligence/data warehouse (BI/DW) problems can be solved cost-effectively with Bigtable implementations. Once the data from event logs is summarized, tools such as pivot tables can use the aggregated data. The events can be web hits, sales transactions, or output from any type of event-monitoring system. The last step is to use an external tool to generate the summary reports.
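A pivot table over summarized events is essentially a regrouping of `(row, column, value)` facts. The sketch below assumes hypothetical daily summary rows and pivots them into a day-by-event-type grid, summing any duplicate cells; in practice a spreadsheet or BI tool would normally perform this step.

```python
from collections import defaultdict

# Hypothetical pre-summarized facts: (day, event_type, count)
daily_summary = [
    ("2024-05-01", "page_view", 120),
    ("2024-05-01", "sale", 4),
    ("2024-05-02", "page_view", 95),
    ("2024-05-02", "sale", 7),
    ("2024-05-02", "page_view", 5),  # a second partial summary for the same cell
]

def pivot(rows):
    """Pivot (row_key, column_key, value) triples into a nested dict, summing duplicates."""
    cells = defaultdict(lambda: defaultdict(int))
    for row_key, col_key, value in rows:
        cells[row_key][col_key] += value
    return {row: dict(cols) for row, cols in cells.items()}

def format_pivot(table, corner="day", width=12):
    """Render the pivoted dict as a fixed-width text grid; missing cells show 0."""
    columns = sorted({col for cols in table.values() for col in cols})
    lines = [corner.ljust(width) + "".join(col.rjust(width) for col in columns)]
    for row in sorted(table):
        lines.append(row.ljust(width) + "".join(str(table[row].get(col, 0)).rjust(width) for col in columns))
    return "\n".join(lines)

report = pivot(daily_summary)
print(format_pivot(report))
```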
If you use HBase as your Bigtable store, you'll need to store the results in the Hadoop Distributed File System (HDFS) and use a reporting tool such as Hadoop Hive to generate the summary reports. Hive's query language looks similar to SQL in many ways, but you may also need to write a MapReduce job to move data into and out of HBase.
4.3.5 Case study: Google Maps stores geographic information in Bigtable
Another example of using Bigtable to store large amounts of information is in the area of geographic information systems (GIS). GIS applications such as Google Maps store geographic points on Earth, the moon, or other planets by identifying each location with its longitude and latitude coordinates. The system allows users to travel around the globe and zoom in and out of places using a 3D-like graphical interface.
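The text doesn't say how Google Maps keys its rows, but one common way to make latitude/longitude pairs work well as sorted row keys is a geohash, which interleaves longitude and latitude bits so that nearby points tend to share a key prefix. The Python sketch below implements standard geohash encoding; using it as a Bigtable row-key prefix (the hypothetical `make_geo_row_key`) is an assumption for illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (omits a, i, l, o)

def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # value is in the upper half: emit 1
            rng[0] = mid
        else:
            bits <<= 1              # value is in the lower half: emit 0
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:          # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

def make_geo_row_key(lat: float, lon: float, item_id: str) -> str:
    """Hypothetical row key: geohash first, so nearby items sort near each other."""
    return f"{geohash_encode(lat, lon)}#{item_id}"

key = make_geo_row_key(36.0566, -112.1251, "photo-001")
```

Nearby points usually, but not always, share prefixes: two points on opposite sides of a cell boundary can have quite different hashes, so real proximity queries typically also check neighboring cells.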
When viewing the satellite maps, you can choose to display map layers or points of interest within a specific region of the map. For example, if you post vacation photos from your trip to the Grand Canyon on the web, you can identify each photo's location. Later, when your neighbor, who heard about your awesome vacation, searches for images of the Grand Canyon, they'll see your photo as well as other photos taken in the same general location.
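Finding other photos "with the same general location" comes down to a proximity query. This minimal sketch filters a small in-memory list by great-circle (haversine) distance, assuming a spherical Earth with a 6,371 km mean radius. The photo names and coordinates (approximate points near the Grand Canyon's South Rim) are made up for illustration, and a real system would narrow candidates with a spatial index before computing distances.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats Earth as a sphere

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical photos: (name, latitude, longitude)
photos = [
    ("your_photo.jpg", 36.0566, -112.1251),
    ("neighbor_rim.jpg", 36.0617, -112.1078),
    ("paris.jpg", 48.8584, 2.2945),
]

def photos_near(photos, lat, lon, radius_km):
    """Names of photos within radius_km of (lat, lon), sorted alphabetically."""
    return sorted(
        name for name, plat, plon in photos
        if haversine_km(lat, lon, plat, plon) <= radius_km
    )

# Search around an approximate Grand Canyon Village location.
nearby = photos_near(photos, 36.0544, -112.1401, 25.0)
```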