scure (see Example 5-6): turn the Text object into a java.nio.ByteBuffer, then
repeatedly call the bytesToCodePoint() static method on Text with the buffer.
This method extracts the next code point as an int and updates the position in the buffer.
The end of the string is detected when bytesToCodePoint() returns -1.
Example 5-6. Iterating over the characters in a Text object
import java.nio.ByteBuffer;
import org.apache.hadoop.io.Text;

public class TextIterator {
    public static void main(String[] args) {
        // Four characters whose UTF-8 encodings take 1, 2, 3, and 4 bytes
        Text t = new Text("\u0041\u00DF\u6771\uD801\uDC00");
        // Wrap only the valid bytes, from offset 0 up to getLength()
        ByteBuffer buf = ByteBuffer.wrap(t.getBytes(), 0, t.getLength());
        int cp;
        while (buf.hasRemaining() && (cp = Text.bytesToCodePoint(buf)) != -1) {
            System.out.println(Integer.toHexString(cp));
        }
    }
}
Running the program prints the code points for the four characters in the string:
% hadoop TextIterator
41
df
6771
10400
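To make the idiom less opaque, here is a minimal sketch in plain Java (no Hadoop dependency) of what reading one code point from a UTF-8 buffer involves. The class Utf8CodePoints and its methods nextCodePoint() and hexCodePoints() are hypothetical names invented for this illustration. This is not the implementation of Text.bytesToCodePoint(), and it assumes well-formed UTF-8 input with no validation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8CodePoints {

    // Hypothetical helper: reads one UTF-8 encoded code point from buf and
    // advances the buffer position past it. Returns -1 if nothing remains.
    // Assumes the bytes are well-formed UTF-8 (no error checking).
    static int nextCodePoint(ByteBuffer buf) {
        if (!buf.hasRemaining()) {
            return -1;
        }
        int lead = buf.get() & 0xFF;
        int extra;  // number of continuation bytes that follow the lead byte
        int cp;     // payload bits collected so far
        if (lead < 0x80) {
            return lead;                       // 1-byte sequence (ASCII)
        } else if ((lead & 0xE0) == 0xC0) {
            extra = 1; cp = lead & 0x1F;       // 2-byte sequence
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2; cp = lead & 0x0F;       // 3-byte sequence
        } else {
            extra = 3; cp = lead & 0x07;       // 4-byte sequence
        }
        for (int i = 0; i < extra; i++) {
            // Each continuation byte contributes its low six bits
            cp = (cp << 6) | (buf.get() & 0x3F);
        }
        return cp;
    }

    // Encodes s as UTF-8 and returns each code point as a hex string.
    static List<String> hexCodePoints(String s) {
        ByteBuffer buf = ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
        List<String> out = new ArrayList<>();
        int cp;
        while ((cp = nextCodePoint(buf)) != -1) {
            out.add(Integer.toHexString(cp));
        }
        return out;
    }

    public static void main(String[] args) {
        // Same string as in Example 5-6
        System.out.println(hexCodePoints("\u0041\u00DF\u6771\uD801\uDC00"));
        // prints [41, df, 6771, 10400]
    }
}
```

The four characters occupy 1, 2, 3, and 4 bytes respectively, which is why the loop must let the decoder advance the buffer position rather than step through the bytes with a fixed increment.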
Mutability
Another difference from String is that Text is mutable (like all Writable
implementations in Hadoop, except NullWritable, which is a singleton). You can reuse a
Text instance by calling one of the set() methods on it. For example:
Text t = new Text("hadoop");
t.set("pig");
assertThat(t.getLength(), is(3));
assertThat(t.getBytes().length, is(3));
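The reuse pattern that mutability enables can be sketched without Hadoop on the classpath. In the sketch below, Utf8Holder is a hypothetical stand-in for a mutable text holder, not Hadoop's Text, and totalEncodedLength() is an illustrative helper. It assumes the simplest possible set() behavior, in which each call replaces the previous contents.

```java
import java.nio.charset.StandardCharsets;

public class MutableHolderDemo {

    // Hypothetical stand-in for a mutable, reusable text holder.
    // It is NOT Hadoop's Text; it only mimics the set()/getLength() shape.
    static final class Utf8Holder {
        private byte[] bytes = new byte[0];

        void set(String s) {
            // Overwrite the previous contents with the new UTF-8 bytes
            bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        int getLength() {
            return bytes.length;
        }

        @Override
        public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Reuses a single holder for every value instead of allocating one
    // holder per value, and returns the total number of encoded bytes.
    static int totalEncodedLength(String... values) {
        Utf8Holder holder = new Utf8Holder();  // allocated once
        int total = 0;
        for (String v : values) {
            holder.set(v);                     // same instance, new contents
            total += holder.getLength();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalEncodedLength("hadoop", "pig"));  // prints 9
    }
}
```

A single instance serves every value in the loop, which is the practical benefit of a mutable holder when many values are processed one after another.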