6
Building a Data Dashboard with Google BigQuery
The media success of Apache Hadoop has been both a blessing and a curse for those
who have an interest in learning about new distributed data technologies. The hype,
although well-deserved, has led to an emphasis on Hadoop as the be-all and end-all
solution for anything related to “big data.”
In reality, applications built to handle large-scale data problems often require a
collection of different technologies, each optimized for a particular use case.
Nonrelational databases are useful for managing data at scale, especially when the
number of reads greatly exceeds the number of writes. MapReduce frameworks are useful
for transforming data from one form into another. Along with these use cases, analysts
need to be able to ask questions about the entire set of data, ideally using an iterative
process. Neither MapReduce frameworks nor nonrelational databases are ideal for
running queries over huge datasets quickly. To handle aggregate queries, we
need to look at another potential solution: an analytical database.
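To make the term "aggregate query" concrete, here is a minimal sketch of one. It uses Python's built-in sqlite3 module as a small local stand-in, not BigQuery itself, and the table name, column names, and sample rows are hypothetical. The point is only the shape of the question: a summary computed over every row, grouped and ranked.

```python
import sqlite3


def top_pages(rows, limit=3):
    """Return (page, total_views) pairs, highest total first.

    `rows` is an iterable of (page, views) tuples. The `pageviews`
    table and its columns are made up for this illustration.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE pageviews (page TEXT, views INTEGER)")
        conn.executemany("INSERT INTO pageviews VALUES (?, ?)", rows)
        # The aggregate query: scan the whole table, sum per page, rank.
        cursor = conn.execute(
            "SELECT page, SUM(views) AS total "
            "FROM pageviews GROUP BY page "
            "ORDER BY total DESC, page ASC LIMIT ?",
            (limit,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


sample = [
    ("/home", 120),
    ("/about", 30),
    ("/home", 80),
    ("/docs", 95),
    ("/about", 10),
]
print(top_pages(sample))
```

On a handful of rows any database answers this instantly. The difficulty the chapter describes arises when the same kind of question has to scan a very large dataset and still return quickly enough for iterative analysis.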
This chapter focuses on the concept of analytical databases, and in particular on Google
BigQuery, a technology that is very different from, and often complementary to, many
of the other technologies covered in the rest of this book. BigQuery, which is a hosted
service accessed through an API, allows developers to run queries over large datasets
and obtain results very quickly. We'll take a look at how this technology can be
useful for quickly building online data dashboards. We'll also take a look at how to make
practical decisions about when to use tools like BigQuery versus MapReduce for
similar use cases.
Analytical Databases
The process of asking questions about large amounts of data requires many steps. Data
collection itself takes effort; as we've seen in previous chapters, the process of
storing and sharing lots of data can be a great challenge. Transformations, processing,
and normalization must take place before we can start analyzing our datasets. Once
 
 
 
 