has been established, the unknown unknown has been identified. It is now a
known unknown.
Remember that the MapReduce program knows nothing about the data in
the files contained within the Hadoop Distributed File System (HDFS). This
is in stark contrast to the approach taken by relational database systems.
The schema and model imposed by a database designer empower the
database engine to hold statistical information on the data held inside the
tables. The engine can use those statistics to choose cheaper query plans,
which significantly reduces the compute resources required.
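To make the contrast concrete, here is a minimal sketch in Python. It is a toy model, not how Hadoop or any relational engine is actually implemented. The names `Block`, `scan_without_stats`, and `scan_with_stats` are hypothetical, and a per-block minimum and maximum stands in for the much richer statistics a real database engine maintains. A scan that knows nothing about the data must read every block, whereas a scan that can consult statistics skips the blocks that cannot contain a match.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A chunk of one column plus simple statistics (illustrative only)."""
    rows: list
    lo: int  # smallest value in the block
    hi: int  # largest value in the block


def make_blocks(values, size):
    """Split values into fixed-size blocks and record min/max for each."""
    blocks = []
    for i in range(0, len(values), size):
        chunk = values[i:i + size]
        blocks.append(Block(rows=chunk, lo=min(chunk), hi=max(chunk)))
    return blocks


def scan_without_stats(blocks, target):
    """Nothing is known about the data, so every block is read."""
    hits, blocks_read = 0, 0
    for block in blocks:
        blocks_read += 1
        hits += sum(1 for v in block.rows if v == target)
    return hits, blocks_read


def scan_with_stats(blocks, target):
    """Statistics let the scan skip blocks that cannot hold the target."""
    hits, blocks_read = 0, 0
    for block in blocks:
        if target < block.lo or target > block.hi:
            continue  # ruled out by min/max, never read
        blocks_read += 1
        hits += sum(1 for v in block.rows if v == target)
    return hits, blocks_read


blocks = make_blocks(list(range(1000)), size=100)
print(scan_without_stats(blocks, 250))  # (1, 10): one hit, ten blocks read
print(scan_with_stats(blocks, 250))     # (1, 1): one hit, one block read
```

Both scans return the same answer. The difference is how much data had to be touched to get it, which is the saving the statistics buy.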
As Hadoop 2.0 takes hold, it will be interesting to see how the Hadoop
community adopts other engines. We'll have to see how quickly they move
to interactive query with Tez as the underpinning engine, for example.
Furthermore, we'll have to wait and see which engine is adopted by the
community for other types of problems. The battle lines seem drawn
for complex event processing, for example. Are you ready for the Flume
versus Storm showdown? One thing is certain: how each project leverages
the compute at its disposal will be a significant factor. It will, however, only
be one factor. Expect ease of programmatic use to be just as important, if
not more important. After all, the “winner” will be the project that offers the
fastest “time to insight.” In the world of Hadoop, insight is king.
Introducing Parallel Data Warehouse (PDW)
Much like the SETI@Home program, Parallel Data Warehouse (PDW) is a
scale-out solution, designed to bring massive computing resources to bear
on a problem. Both PDW and SETI, therefore, are massively parallel
processing (MPP) systems. However, unlike SETI@Home, PDW is designed to
support many forms of analysis, not just the search for alien life.
PDW is a distributed database technology that supports set-based
operations and the relational model. This is what sets it apart from Hadoop,
even when used with Hive. PDW supports transactions, concurrency, security,
and more. It offers everything you expect from a database, backed by
significantly more resources courtesy of the scale-out architecture.
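The scale-out idea can be sketched in a few lines of Python. This is an illustration of the general MPP pattern only, not PDW's actual implementation. The function names, the `NODES` constant, and the use of threads in place of separate compute nodes are all assumptions made for the example. Rows are hash-distributed on a key, each "node" aggregates its own shard, and a coordinating step merges the partial results.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NODES = 4  # hypothetical number of compute nodes


def distribute(rows, nodes=NODES):
    """Hash-distribute (customer_id, amount) rows across the nodes."""
    shards = [[] for _ in range(nodes)]
    for customer_id, amount in rows:
        shards[customer_id % nodes].append((customer_id, amount))
    return shards


def local_sum(shard):
    """Work done independently on one node: SUM(amount) GROUP BY customer_id."""
    totals = defaultdict(int)
    for customer_id, amount in shard:
        totals[customer_id] += amount
    return dict(totals)


def parallel_group_sum(rows):
    """Scatter the rows, aggregate each shard in parallel, gather the results."""
    shards = distribute(rows)
    with ThreadPoolExecutor(max_workers=NODES) as pool:
        partials = list(pool.map(local_sum, shards))
    merged = {}
    for partial in partials:
        # A given customer_id always lands on the same shard,
        # so the partial results never share a key.
        merged.update(partial)
    return merged


rows = [(1, 10), (2, 5), (1, 7), (6, 3), (2, 1)]
result = parallel_group_sum(rows)
print(result)  # {1: 17, 2: 6, 6: 3}
```

Because the rows are distributed on the grouping key, no node needs data held by another node to finish its part of the query. Adding nodes spreads the same work over more resources.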
Currently, the market sometimes sees PDW as a niche
technology. For starters, Microsoft is not widely known for building
scale-out database products. It has one: PDW. It is known for
building SQL Server, which is a scale-up solution. It is, therefore, not always