Building a Data Dashboard with Google BigQuery - Data Just Right: Introduction to Large-Scale Data and Analytics


attributes—for example, a person field might contain a list of records that define a

playlist.
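As a rough illustration of such nested data, the sketch below describes a schema in the JSON style BigQuery uses for table definitions, with a repeated RECORD field holding a playlist. The field names (name, playlist, track, artist) and the sample row are hypothetical and chosen only for this example.

```python
# Hypothetical schema: the field names are illustrative, not from the text.
person_schema = [
    {"name": "name", "type": "STRING", "mode": "REQUIRED"},
    {
        "name": "playlist",
        "type": "RECORD",      # a nested record...
        "mode": "REPEATED",    # ...that may occur many times per row
        "fields": [
            {"name": "track", "type": "STRING", "mode": "NULLABLE"},
            {"name": "artist", "type": "STRING", "mode": "NULLABLE"},
        ],
    },
]

# One made-up row that conforms to the schema above.
person_row = {
    "name": "Ada",
    "playlist": [
        {"track": "Song A", "artist": "Artist 1"},
        {"track": "Song B", "artist": "Artist 2"},
    ],
}


def repeated_fields(schema):
    """Return the names of top-level fields that hold lists of values."""
    return [field["name"] for field in schema if field.get("mode") == "REPEATED"]
```

Here a single person row carries its whole playlist inline, rather than referring to rows in a separate table.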

Each time a query is run, BigQuery creates a new table to hold the results. If you

explicitly name the table, it will persist indefinitely (otherwise, the table is considered

“temporary” and is only stored for a short time).
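The difference between a named and a temporary results table can be sketched as a request body for the API's job-insertion method. This is a minimal sketch that only builds the body and does not send it; the helper name build_query_job and the project, dataset, and table names are assumptions made for illustration.

```python
def build_query_job(sql, project_id, dataset_id=None, table_id=None):
    """Build a query-job request body (a plain dict; nothing is sent).

    With no dataset/table given, no destination is set and the results
    go to a temporary table. With both given, the results are written
    to the named table, which persists.
    """
    query_config = {"query": sql}
    if dataset_id and table_id:
        query_config["destinationTable"] = {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        }
        # Illustrative choice: overwrite the named table if it exists.
        query_config["writeDisposition"] = "WRITE_TRUNCATE"
    return {"configuration": {"query": query_config}}


# Hypothetical names, used only to show the two cases.
temporary_job = build_query_job("SELECT 1", "my-project")
named_job = build_query_job("SELECT 1", "my-project", "reports", "daily_summary")
```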

In Chapter 4, “Strategies for Dealing with Data Silos,” we took a look at strategies for

breaking down data silos. One of these strategies is to empower employees to ask

their own questions about your organization's data. Unless you are very lucky, it's likely

that not everyone in your organization is a data scientist; therefore, it is sometimes

necessary to provide or build tools to help less technical users access analytical information.

When datasets become very large, common analysis tools such as R and Excel may not

be able to easily handle the job, forcing your IT department to spend time either

processing data or managing distributed data processing infrastructure. Every organization

is different; big or small, each has specific data-reporting needs. This can sometimes

make generalized organizational analytics tools hard products to provide; inevitably,

many customizations are required. Furthermore, organizations rely on a variety of

software for data reporting, including Excel, R, and Tableau, often in conjunction with

custom-written code. When trying to evaluate what to build or buy (see Chapter 13,

“When to Build, When to Buy, When to Outsource”), it pays to understand which

tools members of your organization are already familiar with; however, a common

pattern in large organizations is that there is often a need for custom data-reporting tools.

The Dremel-based infrastructure that powers BigQuery is something that data

developers will never access directly; every BigQuery feature is made available via a

programmatic interface. Although a BigQuery command-line tool (called bq) and a

BigQuery Web UI are available, both are applications that make calls to the BigQuery

API service. Providing data analytics as a service enables a kind of development

flexibility not found in most distributed data software. Although hosted and completely

managed Hadoop services are becoming more available, an out-of-the-box Hadoop

installation requires your organization to manage hardware and software updates

and hire experienced staff to keep it all running. Hosted systems free up developers

to concentrate on building applications without having to spend quite so much time

managing infrastructure. There are, of course, trade-offs and challenges that arise

when incorporating a data analysis API into your application-development process. To

illustrate the process of developing our own large-scale data analysis tool, we will take

a look at how to build a simple data dashboard using the BigQuery API.
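To make the idea of an API-driven dashboard more concrete, here is a minimal sketch of the two pieces such an application needs: building an authenticated HTTP request for a query, and flattening the row format the API returns into records a front end can render. The function names, the project ID, and the access token are placeholders assumed for illustration; the request is constructed but never sent, and obtaining an OAuth token is out of scope here.

```python
import json
import urllib.request

BIGQUERY_BASE = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project_id, sql, access_token):
    """Build (but do not send) a POST request that runs a query."""
    body = json.dumps({"query": sql, "useLegacySql": False}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BIGQUERY_BASE}/projects/{project_id}/queries",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


def rows_to_dicts(response):
    """Flatten {"f": [{"v": ...}]} rows into one dict per row.

    Column names come from the schema in the response; cell values are
    passed through unchanged (they typically arrive as strings).
    """
    names = [field["name"] for field in response["schema"]["fields"]]
    return [
        dict(zip(names, (cell["v"] for cell in row["f"])))
        for row in response.get("rows", [])
    ]


# A made-up response in the shape the API returns, for demonstration only.
sample_response = {
    "schema": {"fields": [{"name": "word"}, {"name": "total"}]},
    "rows": [{"f": [{"v": "data"}, {"v": "42"}]}],
}
records = rows_to_dicts(sample_response)
```

A dashboard back end could pass the resulting records straight to a browser as JSON, which is the kind of glue code a hosted API keeps small.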

When it comes to processing data in batches, Hadoop is an excellent choice;

however, like many other distributed systems, it inherently requires some amount of

infrastructure management. Hadoop's APIs are designed for building MapReduce-based

applications, but connecting these applications to a browser-based client requires

another layer of framework code. In contrast, the BigQuery API makes it a good
