-rwxrwxrwx 1 Administrator None 0 Jul 31 06:28 part-00000
-rwxrwxrwx 1 Administrator None 0 Jul 31 06:28 part-00001
-rwxrwxrwx 1 Administrator None 0 Jul 31 06:29 part-00002
-rwxrwxrwx 1 Administrator None 0 Jul 31 06:29 part-00003
-rwxrwxrwx 1 Administrator None 0 Jul 31 06:29 part-00004
-rwxrwxrwx 1 Administrator None 0 Jul 31 06:29 part-00005
-rwxrwxrwx 1 Administrator None 0 Jul 31 06:29 part-00006
We have a set of files prefixed with chrono and another set prefixed with geo. Note that the program still created the default output files part-* even though it wrote nothing to them explicitly. It's entirely possible to write to these files using the original OutputCollector passed in through the map() method. In fact, if this were not a map-only program, records written to the original OutputCollector, and only those records, would be passed to the reducers for processing.
One of the trade-offs with MultipleOutputs is its rigid naming structure compared to MultipleOutputFormat. Your output collector's name cannot be part, because that name is already in use for the default output. The output filename is also strictly defined: the output collector's name, followed by m or r depending on whether the output was collected at the mapper or the reducer, followed finally by a partition number.
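To make the mechanics concrete, here's a minimal sketch of a map-only mapper using MultipleOutputs from the old org.apache.hadoop.mapred API. The class name and the assumed column positions in the patent records are illustrative, not taken from the original listing.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class ProjectionMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, NullWritable, Text> {

  private MultipleOutputs mos;

  public void configure(JobConf job) {
    // The named outputs "chrono" and "geo" must be registered on the
    // JobConf in the driver, e.g.:
    //   MultipleOutputs.addNamedOutput(job, "chrono",
    //       TextOutputFormat.class, NullWritable.class, Text.class);
    mos = new MultipleOutputs(job);
  }

  public void map(LongWritable key, Text value,
      OutputCollector<NullWritable, Text> output, Reporter reporter)
      throws IOException {
    String[] f = value.toString().split(",", -1);
    // Assumed column positions: 0=PATENT, 1=GYEAR, 2=GDATE,
    // 4=COUNTRY, 5=POSTATE
    mos.getCollector("chrono", reporter)
       .collect(NullWritable.get(), new Text(f[0] + "," + f[1] + "," + f[2]));
    mos.getCollector("geo", reporter)
       .collect(NullWritable.get(), new Text(f[0] + "," + f[4] + "," + f[5]));
    // Nothing is written to the original collector, which is why the
    // default part-* files in the listing above come out empty.
  }

  public void close() throws IOException {
    mos.close(); // flushes and closes all named output collectors
  }
}

Because all records are collected at the mapper, the named outputs land in files such as chrono-m-00000 and geo-m-00000, as the following listings show.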
head output/chrono-m-00000
"PATENT","GYEAR","GDATE"
3070801,1963,1096
3070802,1963,1096
3070803,1963,1096
3070804,1963,1096
3070805,1963,1096
3070806,1963,1096
3070807,1963,1096
3070808,1963,1096
3070809,1963,1096
head output/geo-m-00000
"PATENT","COUNTRY","POSTATE"
3070801,"BE",""
3070802,"US","TX"
3070803,"US","IL"
3070804,"US","OH"
3070805,"US","CA"
3070806,"US","PA"
3070807,"US","OH"
3070808,"US","IA"
3070809,"US","AZ"
Looking at the output files, we see that we've successfully projected the columns of the patent data set into distinct files.
7.4 Inputting from and outputting to a database
Although Hadoop is useful for processing large data sets, relational databases remain the workhorse of many data processing applications. Oftentimes Hadoop will need to interface with databases.
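As a taste of what that interfacing looks like, here's a minimal driver-side sketch using Hadoop's DBConfiguration and DBOutputFormat classes from the org.apache.hadoop.mapred.lib.db package. The JDBC driver, connection URL, credentials, and table/column names are placeholders, not values from the text.

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBOutputFormat;

public class DbOutputDriver {
  public static void main(String[] args) {
    JobConf job = new JobConf(DbOutputDriver.class);
    // Write job output to a relational table instead of HDFS files.
    job.setOutputFormat(DBOutputFormat.class);
    // Driver class, URL, and credentials below are placeholders.
    DBConfiguration.configureDB(job,
        "com.mysql.jdbc.Driver",
        "jdbc:mysql://dbhost/patents",
        "user", "password");
    // Each output value must implement DBWritable so its fields can be
    // bound to the named columns of the target table.
    DBOutputFormat.setOutput(job, "patent_table", "patent_id", "gyear");
    // ... set the mapper and input format/path, then run the job with
    // JobClient.runJob(job);
  }
}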
 