The data problem led to the development of an internal tool called Dremel,
which enabled Google employees to run extremely fast SQL queries on large
datasets. According to Armando Fox, a professor of computer science at
the University of California at Berkeley, “If you told me beforehand what
Dremel claims to do, I wouldn't have believed you could build it.” Dremel
has become extremely popular at Google; Google engineers use it millions of
times a day for tasks ranging from building sales dashboards to analyzing
datacenter temperatures to computing employees' percentile rank by how long
they've worked at the company.
In 2012, at Google I/O, Google publicly launched BigQuery, which allowed
users outside of Google to take advantage of the power and performance
of Dremel. Since then, BigQuery has expanded to become not just a query
engine but a hosted, managed cloud-based structured storage provider. The
following sections describe the main aspects of BigQuery.
SQL Queries over Big Data
The primary function of BigQuery is to enable interactive analytic queries
over Big Data. Although Big Data is a fuzzy term, in practice it just means
“data that is big enough that you have to worry about how big it is.”
Sometimes the data might be small now, but you anticipate it growing
by orders of magnitude later. Sometimes the data might be only a few
megabytes, but your algorithms to process it don't scale well. Or sometimes
you have a million hard drives full of customer data in a basement.
BigQuery tackles Big Data problems by trying to be scale-invariant. That
is, whether you have a hundred rows in your table or
a hundred billion, the mechanism to work with them should be the same.
Although some variance in execution time is expected between running a
query over a megabyte and running the same query over a terabyte, the
latter shouldn't be a million times slower than the former. If you start using
BigQuery when you are receiving 1,000 customer records a day, you won't
hit a brick wall when you scale up to 1 billion customer records a day.
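Both comparisons in this paragraph span about six orders of magnitude, which is where the "million times" figure comes from. Using decimal units (with binary units the first ratio is 2^20, roughly 1.05 million), the arithmetic is:

```latex
\frac{1\ \text{TB}}{1\ \text{MB}}
  = \frac{10^{12}\ \text{bytes}}{10^{6}\ \text{bytes}} = 10^{6},
\qquad
\frac{10^{9}\ \text{records/day}}{10^{3}\ \text{records/day}} = 10^{6}
```

A system whose running time grew in direct proportion to the data would therefore take about a million times longer on the terabyte. Scale invariance, as described here, means the actual slowdown should be far smaller than that factor.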
BigQuery SQL
The lingua franca for data analyses is the SQL query language. Other
systems, such as Hadoop, enable you to write code in your favorite language
to perform analytics, but these languages make it difficult to interactively
ask questions of your data. If you have to write a Java program to query