Database Reference
In-Depth Information
and sorts by the number of views. Here is the query that was used (where
<size> ranged from 1k to 10B):
SELECT language, SUM(views) AS views
FROM [bigquery-samples:wikipedia-benchmark<size>]
WHERE REGEXP_MATCH(title, "G.*o.*o.*g")
GROUP BY language
ORDER BY views DESC
We used a reasonably complex query because we didn't want it to be
something that could be done with a simple filter or index, and we wanted to
make sure the query engine wouldn't be able to use any tricks to skip data.
From the chart, you can see that each order of magnitude increase in the
table size roughly corresponds to an additional 1 second of query time (the
x-axis is plotted in log scale). The final point on the right corresponds to a
10 billion-row table (10,677,046,566 rows, to be exact). Querying this table
scans almost one-half a terabyte of data (446 GB).
This query would be extremely slow on a relational database; there isn't a
way to precompute the regular expression results, so a relational database
would have to do a table scan. In BigQuery, the query over the largest table
took less than 8 seconds, on average, to read half a terabyte and perform 10
billion regular expressions.
We mentioned that there is a size after which you'll start seeing linear
performance. To show this, we added another table, this one with more than
100 billion rows, and re-ran the same query. This query processed 4.4 TB
and took 69 seconds on average, which is about ten times as long as the
query with one tenth the data took. While we didn't go further than the 100
billion row table in our tests, there are a number of customers who routinely
query over tens or hundreds of terabytes at a time.
You should not assume that all queries will run this quickly, however. JOIN
queries or queries that produce a lot of results may run much more slowly.
Some queries will run out of memory; others may hit other limits in the
system. For example, in order to test JOIN performance, we ran the
following self-join against the same tables from the previous query:
SELECT wiki1.year, wiki1.month, wiki1.day,
sum(wiki2.max_views)
Search WWH ::




Custom Search