// Draw the Google Charts Table
var table =
    new google.visualization.Table(document.getElementById('results'));
table.draw(data, {showRowNumber: true});
}
</script>
<body>
<h2>BigQuery + JavaScript Example</h2>
<button id="auth_button" onclick="auth();">Authorize</button>
<hr>
<div><input id="query" type="text"></div>
<button id="query_button" onclick="runQuery();">Run Query</button>
<hr>
<div id="results"></div>
</body>
The Future of Analytical Query Engines
The Hadoop project was inspired by the design concepts described in Google's publicly
available MapReduce and Google File System research papers. Similarly, ideas
from Google's paper on the technology behind Dremel have formed the basis of several
open-source projects that aim to speed up query processing over large datasets.
One of these projects is Cloudera's open-source Impala, which aims to provide
fast, interactive queries over data stored in HDFS or HBase. Like Dremel, Impala
uses a columnar data format for structuring data at rest. Impala also skips Hadoop's
MapReduce framework entirely, using an in-memory processing engine to ensure that
queries run fast.
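The benefit of a columnar format can be sketched in a few lines of JavaScript. This is only an in-memory illustration of the layout idea, not the on-disk format that Dremel or Impala actually uses; the sample records and the helper names toColumns and sumColumn are hypothetical.

```javascript
// Hypothetical sample records in a row-oriented layout:
// every field of a record is stored together.
const rows = [
  { word: 'brave', corpus: 'tempest', count: 3 },
  { word: 'new', corpus: 'tempest', count: 5 },
  { word: 'world', corpus: 'tempest', count: 7 }
];

// Convert to a column-oriented layout: one array per field.
// Assumes every record has the same set of fields.
function toColumns(records) {
  const columns = {};
  for (const record of records) {
    for (const [field, value] of Object.entries(record)) {
      if (!columns[field]) {
        columns[field] = [];
      }
      columns[field].push(value);
    }
  }
  return columns;
}

// An aggregate over one field touches only that field's array,
// which is why analytical scans favor columnar storage.
function sumColumn(columns, field) {
  return columns[field].reduce((total, value) => total + value, 0);
}

const columns = toColumns(rows);
console.log(sumColumn(columns, 'count')); // 15
```

A query that aggregates a single field reads one array here and never touches the word or corpus values. Real columnar engines add compression and encoding on top of this basic idea.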
There are other projects that use in-memory data objects to produce very fast query
results over large datasets. The AMPLab Shark project is a data warehouse system
based on Apache Hive. Like Impala, Shark is able to use data from existing HDFS and
HBase sources. Shark uses a completely in-memory model that returns query results
many times faster than Hive in some benchmarks (see Chapter 5, “Using Hadoop,
Hive, and Shark to Ask Questions About Large Datasets,” for more information on
Shark and the related project, Spark).
With so many potential choices for analytical databases, it can be difficult to
decide which tool is the best fit for a particular challenge. One potential benefit
of Impala and Shark is faster queries over data that is already
in a Hadoop environment. In contrast, tools such as Google BigQuery are excellent
choices for projects that lack existing distributed infrastructure.
One of the most interesting things about Google BigQuery is that it is a pointer
toward what other data processing tools may look like in the future. There are many
 
 