Building a Data Dashboard with Google BigQuery - Data Just Right: Introduction to Large-Scale Data and Analytics


// Draw the Google Charts Table

var table = new google.visualization.Table(document.getElementById('results'));

table.draw(data, {showRowNumber: true});

}

</script>

<body>

<h2>BigQuery + JavaScript Example</h2>

<button id="auth_button" onclick="auth();">Authorize</button>

<hr>

<button id="query_button" onclick="runQuery();">Run Query</button>

<hr>

<div id="results"></div>

</body>

The Hadoop project was inspired by the design concepts described in Google's publicly available MapReduce and Google File System research papers. Similarly, ideas from Google's paper on the technology behind Dremel have formed the basis of several open-source projects that aim to speed up query processing over large datasets. One of these projects is Cloudera's open-source Impala, which aims to provide fast, interactive queries over data stored in HDFS or HBase. Like Dremel, Impala uses a columnar data format for structuring data at rest. Impala also skips Hadoop's MapReduce framework entirely, using an in-memory processing engine to ensure that queries run fast.
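
To make the idea of a columnar layout concrete, the following is a minimal sketch in plain JavaScript. It is not Impala's or Dremel's actual storage format; the sample records, the field names, and the helper functions `toColumns` and `sumColumn` are all hypothetical and exist only for illustration. It shows why an aggregate over one field only needs to read that field's values when the data is stored column by column.

```javascript
// Hypothetical sample records (row-oriented): each object holds every field
// of one record. Field names and values are illustrative only.
const rows = [
  { word: 'king', corpus: 'hamlet', count: 12 },
  { word: 'queen', corpus: 'hamlet', count: 7 },
  { word: 'king', corpus: 'macbeth', count: 30 }
];

// Convert row-oriented records into a column-oriented layout:
// one array per field, with values kept in record order.
// Assumes every record has the same set of fields.
function toColumns(records) {
  const columns = {};
  for (const record of records) {
    for (const [field, value] of Object.entries(record)) {
      if (!(field in columns)) {
        columns[field] = [];
      }
      columns[field].push(value);
    }
  }
  return columns;
}

// Sum a single numeric column. Only that column's array is read;
// the other fields are never touched.
function sumColumn(columns, field) {
  return columns[field].reduce((total, value) => total + value, 0);
}

const columns = toColumns(rows);
const total = sumColumn(columns, 'count');
console.log(columns);
console.log(total);
```

In a real columnar store the per-field arrays would live in separate, often compressed, regions on disk, so a query that aggregates one field can skip the bytes belonging to every other field.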

There are other projects that use in-memory data objects to produce very fast query results over large datasets. The AMPLab Shark project is a data warehouse system based on Apache Hive. Like Impala, Shark is able to use data from existing HDFS and HBase sources. Shark uses a completely in-memory model that returns query results many times faster than Hive in some benchmarks (see Chapter 5, “Using Hadoop, Hive, and Shark to Ask Questions About Large Datasets,” for more information on Shark and the related project, Spark).

With so many potential choices for analytical databases, it can be difficult to decide which tool is the best fit for a particular challenge. One potential benefit of Impala and Shark is faster queries over data that is already in a Hadoop environment. In contrast, tools such as Google BigQuery are excellent choices for projects that lack existing distributed infrastructure.

One of the most interesting things about Google BigQuery is that it is a pointer toward what other data processing tools may look like in the future. There are many
