Figure 10.11 More complex Polybase query against PDW and Hadoop data
If you look at Figure 10.11, you will notice that the plan changes: PDW
uses a different DMS operation to source the data. Instead of the
ExternalRoundRobinMove, PDW uses the ExternalShuffleOperation. This
operation is required because the query performs joins. We will look at
this new operation in more detail next as we investigate how PDW imports
data into its environment. It should be apparent that even when PDW is
only reading data from the query's perspective, it is in fact importing
the data into PDW first and then selecting from that imported data set.
It therefore makes sense to look at the ExternalShuffleOperation from
both the query and import perspectives to understand the difference in
the respective plans.
Importing Data with CTAS
PDW enables the parallel import of data through its CREATE TABLE AS
SELECT (CTAS) statement. This is akin to a SELECT INTO in an SMP
environment, but you have some added flexibility in terms of table
geometry (distributed or replicated) and indexing.
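
As a rough illustration of the table geometry and indexing options just mentioned, the following sketch shows two CTAS statements, one producing a hash-distributed table and one producing a replicated table. All object and column names (dbo.FactSales, dbo.DimCustomer, CustomerKey, and so on) are hypothetical and chosen only for the example; the choice of distribution column and index is likewise an assumption rather than a recommendation.

```sql
-- All table and column names here are hypothetical, for illustration only.

-- Distributed geometry: rows are hash-distributed on CustomerKey, and the
-- new table is created with a clustered index on OrderDateKey.
CREATE TABLE dbo.FactSales_ByCustomer
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED INDEX (OrderDateKey)
)
AS
SELECT CustomerKey, OrderDateKey, SalesAmount
FROM dbo.FactSales;

-- Replicated geometry: a full copy of the result is stored on every
-- compute node, which typically suits small dimension tables.
CREATE TABLE dbo.DimCustomer_Replicated
WITH
(
    DISTRIBUTION = REPLICATE
)
AS
SELECT CustomerKey, CustomerName
FROM dbo.DimCustomer;
```

Unlike SELECT INTO, the WITH clause lets the new table's distribution and index be stated as part of the same statement that populates it.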
To import data from HDFS, all you need to do is reference an external table
in the SELECT part of the CTAS statement. This external table could have
 