from -ls command. Figure 10.14 shows the value 3 appearing on every row.
It's located on the left, after the file permissions, and is followed by the name
of the user (always pdw_user for data written by PDW) responsible for its
creation. Sadly, the block size used isn't available through -ls.
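As a rough illustration of the column layout just described, the Python sketch below splits one entry line of hadoop fs -ls output into named fields. It assumes the common eight-column layout (permissions, replication, owner, group, size, date, time, path). It is a hypothetical helper, not part of PDW or Hadoop, and the sample line, the supergroup group name, and the part-0001.txt file name are made up for illustration. None of the columns carries the block size, which matches the limitation noted above.

```python
from typing import NamedTuple, Optional


class LsEntry(NamedTuple):
    permissions: str
    replication: Optional[int]  # None for directories, which show "-"
    owner: str
    group: str
    size: int
    path: str


def parse_ls_line(line: str) -> LsEntry:
    """Split one entry line of `hadoop fs -ls` output into named fields."""
    # Assumed columns: permissions, replication, owner, group, size,
    # date, time, path. maxsplit=7 keeps a path containing spaces intact.
    parts = line.split(None, 7)
    if len(parts) != 8:
        raise ValueError(f"unexpected -ls line: {line!r}")
    perms, repl, owner, group, size, _date, _time, path = parts
    replication = int(repl) if repl.isdigit() else None
    # There is no block-size column to extract from this output.
    return LsEntry(perms, replication, owner, group, int(size), path)


# Hypothetical sample line for a file written by PDW.
sample = ("-rw-r--r--   3 pdw_user supergroup    1048576 "
          "2013-10-01 12:34 /files/HDFS_FactInternetSales/part-0001.txt")
entry = parse_ls_line(sample)
print(entry.replication, entry.owner)  # 3 pdw_user
```

The "Found N items" header that -ls prints first is not an entry line, so this sketch rejects it with a ValueError rather than guessing at its fields.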
What about replicated tables? Good question. Replicated tables export only
to a single file, so in that sense there is no parallel export of a replicated
table.
Aside from the ExternalExportDistributedOperation step, the plan is
basically the same suite of steps taken when an external table is created,
including the collection of table size and row count statistics, courtesy
of ExternalStatisticsOperation.
Remember that the external table persists after the CETAS operation. On
the one hand, this is helpful for querying or importing the data later on,
and it means we already have some statistical data associated with the table.
On the other hand, it prevents you from simply re-executing this code (that is,
you have to drop the external table first). (I have to say that it would
be really nice if you could just execute this code with a DROP_EXISTING
clause instead, because remembering to drop the external table is a pain.)
There is one possible reason for omitting DROP_EXISTING syntax from
the CETAS statement: simply dropping the external table does nothing to
the data in Hadoop, which might lead to unexpected behavior for some
people. If I dropped the external table and simply re-executed the CETAS
query, I would in effect append the same data to the “table” in HDFS.
Remember that the table in HDFS is merely a folder containing data files.
Each export via CETAS pushes a new set of files to the same folder within
HDFS, which effectively duplicates the content. To properly clean up the
data and remove it from HDFS first, I must execute a Hadoop command like
the one here:
hadoop fs -rmr /files/HDFS_FactInternetSales
Now the data and the folder have been moved to the trash, as you can see in
Figure 10.15 (IP address and port blacked out).
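The append behavior can be made concrete with a toy model. In the Python sketch below, which is a deliberate simplification and not how PDW or HDFS is implemented, a folder is just a mapping from file names to rows, and each export only ever adds new files. The part-NNNNN file names, the 100 sample rows, and the choice of 8 files per export are assumptions made for illustration. Dropping the external table is not modeled at all, because it does not touch the folder; only remove_all, a rough stand-in for the hadoop fs -rmr command, empties it.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class HdfsFolder:
    """Toy stand-in for an HDFS directory: file name -> list of rows."""
    files: dict[str, list[tuple]] = field(default_factory=dict)
    _counter: itertools.count = field(default_factory=itertools.count, repr=False)

    def export(self, rows: list[tuple], num_files: int) -> None:
        """Mimic an export: write `rows` as `num_files` brand-new files.

        Existing files are never touched, so a repeated export appends.
        """
        for i in range(num_files):
            name = f"part-{next(self._counter):05d}"  # hypothetical naming
            self.files[name] = rows[i::num_files]

    def row_count(self) -> int:
        return sum(len(r) for r in self.files.values())

    def remove_all(self) -> None:
        """Rough analogue of `hadoop fs -rmr <folder>` (ignoring the trash)."""
        self.files.clear()


rows = [(n, n * 10) for n in range(100)]  # hypothetical fact rows
folder = HdfsFolder()
folder.export(rows, num_files=8)  # first CETAS
folder.export(rows, num_files=8)  # drop external table, run CETAS again
print(len(folder.files), folder.row_count())  # 16 200
folder.remove_all()
print(folder.row_count())  # 0
```

The second export leaves the first set of files in place, so the row count doubles; only the explicit cleanup brings the folder back to empty.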