Replication across nine disks does not make your data completely immune to loss. A buggy software release could cause data to be
inadvertently deleted from all nine of those disks. If you have critical data,
make sure to back it up.
Many organizations are understandably reluctant to move their data into the
cloud. It can be difficult to have your data in a place where you don't control
it. If there is data loss, or an outage, all you can do is take your business
elsewhere—there is no one except support staff to yell at and little you can
do to prevent the problem from happening in the future.
That said, the specialized knowledge and operational overhead required to
run your own hardware are substantial and only growing. The advantages of
scale that companies like Google and Amazon enjoy only get bigger as they
get better at managing their datacenters and improving their data-warehousing
techniques. It seems likely that the days when most companies run their own
IT hardware are numbered.
Multitenancy and Parallel Execution
When you run a query on MySQL that takes one second, you get to occupy
a single processor core for one second. If you have eight processors, you can
run eight queries at once. Amazon Redshift lets you run a single query in
parallel, but on a fixed number of cores that are all yours for the entire time
you are renting the Redshift instance.
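To make the fixed-capacity arithmetic concrete, here is a toy sketch in Python; it assumes nothing about MySQL or Redshift internals. With a pool of eight workers standing in for eight cores, only eight one-second queries make progress at once, so sixteen of them take roughly two seconds of wall time. The fake_query helper and the one-second duration are invented for illustration.

```python
# Toy model of the fixed-capacity claim above: eight workers stand in for
# eight processor cores, and time.sleep(1.0) stands in for a query that
# occupies a core for one second. Sixteen queries on eight workers finish
# in about two seconds of wall time, not sixteen.
import time
from concurrent.futures import ThreadPoolExecutor

def fake_query(i: int) -> int:
    time.sleep(1.0)  # stand-in for a one-second query
    return i

start = time.monotonic()
with ThreadPoolExecutor(max_workers=8) as pool:       # "eight processors"
    results = list(pool.map(fake_query, range(16)))   # sixteen queries
print(f"wall time: {time.monotonic() - start:.1f}s")  # prints roughly 2.0s
```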
BigQuery operates on a fundamentally different model; your query will run
on thousands of cores in parallel. If you have eight queries, those may all
run on a thousand cores in parallel. The query engine will time-slice the
operations and make progress on some queries while others are waiting
for disk or network I/O. All queries perform a mix of I/O and processing;
if the engine simply waited whenever a query needed I/O, the processor
would sit idle. The Dremel
engine underlying BigQuery can maximize the throughput of the system by
pipelining queries so that as some queries are waiting for I/O operations,
other queries will use the processor.
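As a rough illustration of that pipelining idea, here is a toy, single-processor sketch in Python; it is not how Dremel is implemented. The scheduler (asyncio here) lets the two simulated queries' I/O waits overlap and hands the processor to one query's compute while the other is still waiting. The run_query and crunch names, and the one-second I/O and half-second CPU split, are invented for illustration.

```python
# Toy model of pipelining on a single processor: asyncio.sleep() stands in
# for a disk/network read, and a busy loop stands in for processing.
import asyncio
import time

def crunch(seconds: float) -> None:
    """Simulate CPU work by spinning for roughly `seconds`."""
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        pass

async def run_query(name: str, io_s: float, cpu_s: float) -> None:
    await asyncio.sleep(io_s)  # "read" phase: yields the processor while waiting on I/O
    crunch(cpu_s)              # "process" phase: this query now uses the processor
    print(f"{name} finished at t={time.monotonic() - start:.1f}s")

async def main() -> None:
    # Each query needs 1.0s of I/O and 0.5s of CPU. Run one after the other,
    # they would take about 3.0s; interleaved, about 2.0s, because the two
    # I/O waits overlap while the CPU work is still serialized on one core.
    await asyncio.gather(
        run_query("query A", io_s=1.0, cpu_s=0.5),
        run_query("query B", io_s=1.0, cpu_s=0.5),
    )

start = time.monotonic()
asyncio.run(main())
```

At the scale the text describes, the same effect plays out across thousands of cores and many concurrent queries: whenever one query stalls on I/O, another query's work fills the gap.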
By allowing your queries to run on all the hardware in a compute cluster,
you get performance far beyond what you could otherwise achieve unless
you were willing to pay for a dedicated cluster of similar size. There just isn't
another way to process hundreds of gigabytes per second without a
massive amount of hardware, and building and maintaining that yourself
would cost millions of dollars. (Licenses for on-premises solutions
like Netezza generally run in the seven figures.)