Dremel supports an alternative query mode called materialize where the
shards write out the results directly and in parallel. This, of course, means
that the entire query must be parallelizable. Top-level ORDER BY operations
may be slow (because a sort must be computed globally), and queries may
end up taking substantially longer to complete because writing data to
permanent storage is slower than returning it over the network. Despite
these limitations, materialize queries are extremely powerful—they can
write out hundreds of gigabytes in only a few seconds.
For most queries, however, the user who issued them cares only about the
first few results. In the BigQuery Web UI, for example, few people fetch
the second page of results, let alone the last one. If you run a SELECT *
query to look at the first few rows of a table and you write out 10 TB and
wait 5 minutes, you'll be wasting a lot of your time and Google's processing
resources.
To prevent you from accidentally generating massive results, materialize
queries are not the default. If you want a massive result set, you need to set
the flag allowLargeResults in the query and specify a destination table.
The destination table is required because of how large results tend to be
used: they are too large for a human to process, so they are usually
processed by a machine. Often you want to run another query
against your large result table, or maybe you want to export the results to
Google Cloud Storage so that you can process them on an external system.
In these cases, it is useful to have a table with a name to refer to as the source
of those subsequent operations.
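As a rough illustration of the two requirements above, the following Python sketch builds the body of a query job request with allowLargeResults set and a destination table named. It assumes the field layout of the BigQuery REST API's query job configuration (configuration.query with query, allowLargeResults, and destinationTable); the helper name build_materialize_query_job and the project, dataset, and table names are hypothetical, and the sketch only constructs the request body without sending it.

```python
import json


def build_materialize_query_job(project_id, dataset_id, table_id, sql):
    """Return a query job request body that asks for large results.

    Hypothetical helper for illustration: it only builds a dictionary
    and does not contact BigQuery.
    """
    return {
        "configuration": {
            "query": {
                "query": sql,
                # Opt in to large results; this is not the default.
                "allowLargeResults": True,
                # Large results must be written to a named table so that
                # later queries or exports can refer to them.
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }


if __name__ == "__main__":
    # All identifiers below are placeholders, not real resources.
    job = build_materialize_query_job(
        "my-project", "my_dataset", "large_results",
        "SELECT * FROM [my_dataset.big_table]",
    )
    print(json.dumps(job, indent=2))
```

Naming the destination table up front is what lets a follow-up query or an export job use the result as its source.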
Finally, writing out large results is expensive. Although BigQuery doesn't
charge you for normal query results, it does charge you for the storage used
for anything with a destination table. If you write out a 1 TB table, you have
to pay the cost of storing that table.
Architecture Comparisons
To understand BigQuery, it can be helpful to put it in context with other
architectures you may already be familiar with. If you're coming from a
background where you've used a lot of relational databases, or you use
Hadoop to process your data, you might have certain assumptions about
what is going to be fast, what is going to be slow, or what is going to be
impossible. BigQuery's architecture creates a different set of things that are