attributes—for example, a person field might contain a list of records that define a
playlist.
Each time a query is run, BigQuery creates a new table to hold the results. If you
explicitly name the table, it will persist indefinitely (otherwise, the table is considered
“temporary” and is only stored for a short time).
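As a minimal sketch of the difference between a named and a temporary result table, the Python below builds the JSON body for a query job as the BigQuery REST API accepts it (the jobs.insert method). The helper name build_query_job and the project, dataset, and table names are hypothetical, and the body is only constructed here, not sent. When destinationTable is left out, the results go to a temporary table.

```python
import json
from typing import Optional


def build_query_job(sql: str, project: str,
                    dataset: Optional[str] = None,
                    table: Optional[str] = None) -> dict:
    """Build a request body for a BigQuery query job.

    If both dataset and table are given, the results are written to that
    named table. Otherwise no destination is set and the results go to a
    temporary table managed by BigQuery.
    """
    query_config = {"query": sql}
    if dataset and table:
        query_config["destinationTable"] = {
            "projectId": project,
            "datasetId": dataset,
            "tableId": table,
        }
        # Overwrite the named table if it already exists. This choice is an
        # assumption of the sketch; WRITE_APPEND and WRITE_EMPTY also exist.
        query_config["writeDisposition"] = "WRITE_TRUNCATE"
    return {"configuration": {"query": query_config}}


# Hypothetical project, dataset, and table names.
sql = "SELECT word FROM my_dataset.my_table LIMIT 10"
temporary = build_query_job(sql, "my-project")
named = build_query_job(sql, "my-project", "reports", "top_words")
print(json.dumps(named, indent=2))
```

The two bodies differ only in the destinationTable and writeDisposition entries, which is what makes naming the result table an explicit, per-query decision.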
Building a Custom Big Data Dashboard
In Chapter 4, “Strategies for Dealing with Data Silos,” we looked at strategies for
breaking down data silos. One of these strategies is to empower employees to ask
their own questions about your organization's data. Unless you are very lucky, not
everyone in your organization is a data scientist; therefore, it is sometimes
necessary to provide or build tools that help less technical users access analytical
information. When datasets become very large, common analysis tools such as R and Excel
may not be able to handle the job easily, forcing your IT department to spend time
either processing data or managing distributed data processing infrastructure. Every
organization is different; big or small, each has specific data-reporting needs. This
can make general-purpose organizational analytics tools hard to provide; inevitably,
many customizations are required. Furthermore, organizations rely on a variety of
software for data reporting, including Excel, R, and Tableau, often in conjunction
with custom-written code. When trying to evaluate what to build or buy (see Chapter 13,
“When to Build, When to Buy, When to Outsource”), it pays to understand which tools
members of your organization are already familiar with; however, large organizations
commonly find that they need custom data-reporting tools.
The Dremel-based infrastructure that powers BigQuery is something that data
developers will never access directly; every BigQuery feature is made available via a
programmatic interface. Although a BigQuery command-line tool (called bq) and a
BigQuery Web UI are available, both are applications that make calls to the BigQuery
API service. Providing data analytics as a service enables a kind of development
flexibility not found in most distributed data software. Although hosted and completely
managed Hadoop services are becoming more available, an out-of-the-box Hadoop
installation requires your organization to manage hardware and software updates
and hire experienced staff to keep it all running. Hosted systems free up developers
to concentrate on building applications without having to spend quite so much time
managing infrastructure. There are, of course, trade-offs and challenges that arise
when incorporating a data analysis API into your application-development process. To
illustrate the process of developing our own large-scale data analysis tool, we will take
a look at how to build a simple data dashboard using the BigQuery API.
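To make the idea of calling the API from your own application concrete, here is a minimal Python sketch of the two pieces a dashboard back end would need: an HTTP request for the REST API's synchronous query method, and a helper that flattens the JSON response into rows a chart or table widget could consume. The function names, the project ID, the placeholder token, and the sample response are all hypothetical; a real call needs a valid OAuth 2.0 access token, and the request is only built here, not sent.

```python
import json
import urllib.request

# Base URL of the BigQuery REST API (v2).
API_ROOT = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project: str, sql: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs a query."""
    body = json.dumps({"query": sql, "useLegacySql": False}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_ROOT}/projects/{project}/queries",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def rows_to_table(response: dict) -> list:
    """Flatten a query response into one plain dict per result row."""
    names = [field["name"] for field in response["schema"]["fields"]]
    return [
        dict(zip(names, (cell["v"] for cell in row["f"])))
        for row in response.get("rows", [])
    ]


# Hypothetical project ID and placeholder token.
request = build_query_request("my-project", "SELECT 1", "PLACEHOLDER_TOKEN")

# Hand-written sample in the shape of a query response; not real data.
sample_response = {
    "schema": {"fields": [{"name": "word", "type": "STRING"},
                          {"name": "total", "type": "INTEGER"}]},
    "rows": [{"f": [{"v": "alpha"}, {"v": "3"}]},
             {"f": [{"v": "beta"}, {"v": "5"}]}],
}
table = rows_to_table(sample_response)
print(table)
```

Cell values arrive as strings in the JSON response, so a dashboard would typically convert them using the type given for each field in the schema before plotting.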
When it comes to processing data in batches, Hadoop is an excellent choice; however,
like many other distributed systems, it inherently requires some amount of
infrastructure management. Hadoop's APIs are designed for building MapReduce-based
applications, but connecting these applications to a browser-based client requires
another layer of framework code. In contrast, the BigQuery API makes it a good