Hadoop I/O - Hadoop: The Definitive Guide

scure (see Example 5-6): turn the Text object into a java.nio.ByteBuffer, then repeatedly call the bytesToCodePoint() static method on Text with the buffer. This method extracts the next code point as an int and updates the position in the buffer. The end of the string is detected when bytesToCodePoint() returns -1.

Example 5-6. Iterating over the characters in a Text object

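import java.nio.ByteBuffer;

import org.apache.hadoop.io.Text;

public class TextIterator {

  public static void main(String[] args) {
    // Four characters: U+0041, U+00DF, U+6771, and U+10400 (a surrogate pair in Java)
    Text t = new Text("\u0041\u00DF\u6771\uD801\uDC00");

    // Wrap only the valid bytes: getLength() bounds the encoded data
    ByteBuffer buf = ByteBuffer.wrap(t.getBytes(), 0, t.getLength());
    int cp;
    // Each call consumes one code point and advances the buffer position
    while (buf.hasRemaining() && (cp = Text.bytesToCodePoint(buf)) != -1) {
      System.out.println(Integer.toHexString(cp));
    }
  }
}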

Running the program prints the code points for the four characters in the string:

% hadoop TextIterator
41
df
6771
10400
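To make concrete what each call consumes from the buffer, the following is a minimal plain-Java sketch of decoding UTF-8 code points from a ByteBuffer. It is not Hadoop's implementation: the class Utf8CodePoints and its methods nextCodePoint() and hexCodePoints() are hypothetical names introduced for illustration, and the decoder assumes well-formed UTF-8 and performs no validation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8CodePoints {

  // Hypothetical helper (not part of Hadoop): decodes the next UTF-8 code
  // point from buf and advances its position, or returns -1 when no bytes
  // remain. Assumes well-formed input; no validation is attempted.
  public static int nextCodePoint(ByteBuffer buf) {
    if (!buf.hasRemaining()) {
      return -1;
    }
    int lead = buf.get() & 0xFF;
    int extra; // number of continuation bytes after the lead byte
    int cp;    // payload bits taken from the lead byte
    if (lead < 0x80) {        // 0xxxxxxx: one-byte sequence (ASCII)
      return lead;
    } else if (lead < 0xE0) { // 110xxxxx: two-byte sequence
      extra = 1;
      cp = lead & 0x1F;
    } else if (lead < 0xF0) { // 1110xxxx: three-byte sequence
      extra = 2;
      cp = lead & 0x0F;
    } else {                  // 11110xxx: four-byte sequence
      extra = 3;
      cp = lead & 0x07;
    }
    for (int i = 0; i < extra; i++) {
      // Each continuation byte (10xxxxxx) contributes six payload bits
      cp = (cp << 6) | (buf.get() & 0x3F);
    }
    return cp;
  }

  // Encodes s as UTF-8 and returns each code point as a hex string.
  public static List<String> hexCodePoints(String s) {
    ByteBuffer buf = ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    List<String> out = new ArrayList<>();
    int cp;
    while ((cp = nextCodePoint(buf)) != -1) {
      out.add(Integer.toHexString(cp));
    }
    return out;
  }

  public static void main(String[] args) {
    for (String hex : hexCodePoints("\u0041\u00DF\u6771\uD801\uDC00")) {
      System.out.println(hex);
    }
  }
}
```

The four characters occupy one, two, three, and four bytes respectively, which is why a byte offset cannot simply be incremented by one to reach the next character.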

Mutability

Another difference from String is that Text is mutable (like all Writable implementations in Hadoop, except NullWritable, which is a singleton). You can reuse a Text instance by calling one of the set() methods on it. For example:

Text t = new Text("hadoop");
t.set("pig");
assertThat(t.getLength(), is(3));
assertThat(t.getBytes().length, is(3));
