systems and having to manage them separately? There's an opportunity for
consolidated management and monitoring here for PDW and Hadoop using
a single interface that goes beyond System Center Management Packs. The
simplicity of an appliance experience will be a factor for some. That said,
I am not sure about this one. Those who have more of a database-centric
view of the world might be swayed by such an argument. Having a clean,
consistent deployment of Hadoop without any configuration headaches, as
well as having only “one throat to choke” when it comes to support, may well
prove attractive. However, it would very much depend on the PDW teams'
execution focus and priorities. I expect it would take a lot of work for this
to prove really valuable to customers.
Architecture Versus Implementation
One positive aspect of Polybase is that, architecturally, it makes no
assumptions. Questions concerning data format, data location within
HDFS, and number of data nodes, for example, are answered at runtime.
This means that the Hadoop ecosystem and PDW can happily coexist
without any dependencies.
However, not everything in that architectural vision made the cut when it
came to the initial implementation that formed the RTM release of PDW 2012. One
example concerns the format of the data in Hadoop. Although it is true
to say that architecturally PDW makes no assumption about the format used
to persist the data in HDFS (the HDFS Bridge provides the necessary
abstraction), the same is not true of the RTM Polybase implementation. At this
moment, Polybase supports only files in delimited text format.
That said, I expect to see Polybase supporting other standard file formats
and possibly even custom file formats in the future. As long as we can
impose that structure on the underlying data file, we should be all good. One
example that I could see in the near term is the optimized row columnar
(ORC) file format, which has been the subject of much development through
the Stinger initiative and is very commonly used by Hive users.
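
To make the idea of imposing structure on a delimited text file more concrete, here is a minimal Python sketch of applying a schema as the file is read. It is an illustration only and says nothing about how Polybase or the HDFS Bridge is actually implemented. The column names, the pipe delimiter, the sample rows, and the read_delimited helper are all hypothetical.

```python
import csv
import io
from datetime import date

# Hypothetical schema: (column name, converter). Not taken from Polybase.
SCHEMA = [
    ("order_id", int),
    ("customer", str),
    ("order_date", date.fromisoformat),
    ("amount", float),
]


def read_delimited(stream, schema, delimiter="|"):
    """Yield one dict per line, applying the schema while the file is read."""
    reader = csv.reader(stream, delimiter=delimiter)
    for line_no, fields in enumerate(reader, start=1):
        # The raw file carries no structure of its own, so a field-count
        # mismatch only surfaces here, at read time.
        if len(fields) != len(schema):
            raise ValueError(
                f"line {line_no}: expected {len(schema)} fields, got {len(fields)}"
            )
        yield {
            name: convert(value)
            for (name, convert), value in zip(schema, fields)
        }


# Invented sample data standing in for a delimited text file in HDFS.
sample = io.StringIO(
    "1|Contoso|2012-11-07|19.99\n"
    "2|Fabrikam|2012-11-08|250.00\n"
)
rows = list(read_delimited(sample, SCHEMA))
```

The file itself stays plain text; the structure lives entirely in the schema supplied by the reader. Supporting a different format would, in principle, mean swapping the parsing step while keeping the same typed rows on the way out.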
Optimized Big Data Queries
In his keynote presentation “Polybase: What, Why, How” at the PASS
Summit in 2012 (http://gsl.azurewebsites.net/Portals/0/Users/dewitt/talks/PolybasePass2012.pptx),
Dr. DeWitt gave some indication of the roadmap for Polybase. At this
illuminating session, he broke the roadmap