FROM [bigquery-samples:wikipedia-benchmark<size>] AS wiki1
JOIN EACH (
  SELECT title, MAX(views) AS max_views
  FROM [bigquery-samples:wikipedia-benchmark<size>]
  GROUP EACH BY title
) AS wiki2
ON wiki1.title = wiki2.title
GROUP EACH BY wiki1.year, wiki1.month, wiki1.day
The timing of these queries can be seen as the Self Join line in Figure 2.1.
They take longer than the simple queries, but still increase slowly until
about 100 million rows. Increasing from 100 million to a billion rows takes
about double the time—still faster than linear but a significant slowdown.
You can see, however, that the line ends at the 1 billion row point; this
is because the query against the larger 10 billion row table failed with an
Insufficient Resources error. Chapter 9 gives much more information about
which queries will work well, which won't, and why. That chapter also
provides some pointers for what to do when you hit errors like this one.
Cloud Storage System
In addition to being a way to run queries over your data, BigQuery is also
a place to store your structured data in the cloud. Although this aspect of
BigQuery grew out of necessity—if your data didn't live in Google's cloud,
then you couldn't query it—it has grown into a significant and useful
subsystem.
Your data is replicated to multiple geographically distinct locations for
improved availability and durability. If a Google datacenter in Atlanta gets
shut down because of a hurricane, that shouldn't cause a hiccup in your
ability to access your data. Data is also replicated within a cluster, so your
data should be virtually immune to data loss due to hardware failure. Of
course, the BigQuery service may not have perfect uptime, and if your data
is important, you should make sure it is backed up. You can back up your
tables by exporting them to Google Cloud Storage for safekeeping, or you
can run a table copy job in BigQuery to save a snapshot.
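As a concrete illustration of both backup options, here is a minimal sketch using the google-cloud-bigquery Python client library (an assumption on our part, not the API described in this excerpt); the project, dataset, table, and Cloud Storage bucket names are hypothetical placeholders:

  # Sketch of both backup options: export to Cloud Storage, or copy to a
  # snapshot table within BigQuery. All resource names below are hypothetical.
  from google.cloud import bigquery

  client = bigquery.Client(project="my-project")
  source_table = "my-project.my_dataset.my_table"

  # Option 1: export the table to Google Cloud Storage for safekeeping.
  extract_job = client.extract_table(
      source_table,
      "gs://my-backup-bucket/my_table-*.json",
      job_config=bigquery.ExtractJobConfig(
          destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
      ),
  )
  extract_job.result()  # block until the export job completes

  # Option 2: run a table copy job to keep a snapshot inside BigQuery.
  copy_job = client.copy_table(
      source_table, "my-project.my_dataset.my_table_backup"
  )
  copy_job.result()  # block until the copy job completes

The wildcard in the destination URI allows the export to be split across multiple files, which BigQuery requires when exporting large tables.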