-- Note: the source excerpt begins mid-query; the lines above the object_* sums
-- (HOUR(time) and the bucket_* aliases) are an assumed reconstruction.
SELECT
  HOUR(time) hour,
  SUM(get_bucket) bucket_gets,
  SUM(put_bucket) bucket_puts,
  SUM(head_object) object_heads,
  SUM(get_object) object_gets
FROM (
  SELECT
    USEC_TO_TIMESTAMP(time_micros) time,
    IF(cs_operation = 'GET_Bucket', 1, 0) get_bucket,
    IF(cs_operation = 'PUT_Bucket', 1, 0) put_bucket,
    IF(cs_operation = 'HEAD_Object', 1, 0) head_object,
    IF(cs_operation = 'GET_Object', 1, 0) get_object
  FROM [bigquery-e2e:ch14.gcs_usage])
WHERE
  time >= '2014-02-19 00:00:00' AND
  time < '2014-02-20 00:00:00'
GROUP BY 1
ORDER BY 1
This query computes hourly counts of the different types of GCS operations for a
single day. The USEC_TO_TIMESTAMP conversion is required because we used the
reference schema, which defines time_micros as an integer field rather than a
TIMESTAMP.
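As a concrete illustration of the conversion, the start of the day the query filters
on corresponds to the following microsecond value (a minimal example; the day_start
alias is arbitrary):
SELECT USEC_TO_TIMESTAMP(1392768000000000) day_start
-- 1392768000000000 microseconds since the epoch is 2014-02-19 00:00:00 UTC.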
Keep in mind that your data has been copied into BigQuery, so you have two
copies of your logs. Depending on how you intend to use this data, you may
want to retain only a single copy. Because your GCS logs are regular files in
GCS, you are charged for the storage they consume.
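If you decide to keep only the BigQuery copy, one way to avoid paying for both is to
let GCS expire the raw log files automatically with an object lifecycle rule. The
sketch below is a minimal example: the bucket name is hypothetical and the 30-day
retention window is arbitrary.
$ cat lifecycle.json
{
  "rule": [
    {"action": {"type": "Delete"}, "condition": {"age": 30}}
  ]
}
$ gsutil lifecycle set lifecycle.json gs://my-gcs-log-bucket  # hypothetical bucket
Alternatively, you can simply remove log files with gsutil rm once they have been
loaded into BigQuery.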
Summary
This chapter presented a handful of Google products that make large
volumes of data more useful to their customers by exposing the data through
BigQuery. BigQuery is useful in this context because it gives customers a way
to operate on their data directly, rather than simply exposing it as bytes they
must download and process before extracting any value. Although the
products enabling this access are currently a small fraction of the universe of
Google services, over time more products will follow suit. Hopefully, the
manner in which the data is exposed will also become more uniform across
Google products. For now, if you are a user of one of these products, you can
use the recipes described in this chapter to get more mileage out of them.