However, before we delve a little deeper into this topic, there is another
tidbit of information from the same whitepaper: “There are no plans
currently to allow customers to co-locate a SQL Server instance and a
Hadoop Datanode.”
This makes quite a bit of sense to me. The virtual machines used by PDW
for the compute nodes are presized to consume almost all the resources
available. Therefore, there isn't much compute processing room for the
data nodes. Furthermore, you probably wouldn't want any unpredictable
resource contention, which might be another reason to keep the two
workloads “disjointed.”
One possible advantage of having Hadoop “inside” of PDW is that you might
be able to have relational database management system (RDBMS)-like
security over the data inside HDFS. As mentioned previously, security isn't
exactly Hadoop's forte. Inside PDW, I would imagine, for example, that I
could grant access privileges only to the people I want working with the
data.
Another reason might be flexibility. Remember that PDW's configuration is
almost completely virtual. The workload definition, for example, is in the
compute nodes (a virtual machine). Would it not be possible, therefore,
to decide that I would like to have four compute PDW nodes with four data
nodes one day and six compute and four data the next? Granted, the amount
of data committed to any data node(s) would represent an obstacle (data
would need to be rebalanced between the remaining compute nodes), but it
would in theory at least be possible.
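
To give a feel for why committed data is an obstacle, here is a minimal
Python sketch that counts how many rows would change nodes when the node
count goes from four to six. It assumes a simple hash-modulo placement rule,
which is purely illustrative and is not claimed to be how PDW or HDFS
actually places data; the names node_for and rows_to_move are hypothetical.

```python
import hashlib


def node_for(key: str, node_count: int) -> int:
    """Map a key to a node index with a stable hash (illustrative rule only)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


def rows_to_move(keys, old_count: int, new_count: int) -> int:
    """Count keys whose assigned node changes when the node count changes."""
    return sum(
        1 for k in keys if node_for(k, old_count) != node_for(k, new_count)
    )


# Hypothetical row keys; a real table would hash its distribution column.
keys = [f"row-{i}" for i in range(10_000)]
moved = rows_to_move(keys, old_count=4, new_count=6)
print(f"{moved} of {len(keys)} rows change nodes going from 4 to 6 nodes")
```

Under this naive rule, roughly two-thirds of the rows land on a different
node after the change, which is why resizing is cheap for the virtualized
compute layer but expensive wherever data has already been committed.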
Earlier in the chapter we discussed having separate Hadoop environments
for various user personas. A partitioned appliance could be a great location
for our power user and consumer community grade data sets, giving them
dedicated compute resources that would be very close to the enterprise data
held in PDW. This workload isolation would also benefit the data scientists
as they wouldn't need to worry about interference from corporate users as
much. Naturally, Polybase would be the perfect fit for migrating the data
between these Bronze, Silver, and Gold environments. Being able to leverage
PDW's ultra-low latency Infiniband network would also come in extremely
handy for those challenging ad-hoc queries.
Finally, manageability: from an operational perspective, will a
single-managed appliance be more attractive than having two distributed