repeatedly call the bytesToCodePoint() static method on Text with the buffer. This method extracts the next code point as an int and updates the position in the buffer. The end of the string is detected when bytesToCodePoint() returns -1.
Example 5-6. Iterating over the characters in a Text object
import java.nio.ByteBuffer;
import org.apache.hadoop.io.Text;

public class TextIterator {

  public static void main(String[] args) {
    // One 1-byte, one 2-byte, one 3-byte, and one 4-byte (surrogate pair) character
    Text t = new Text("\u0041\u00DF\u6771\uD801\uDC00");

    // Wrap only the valid bytes, from 0 up to getLength()
    ByteBuffer buf = ByteBuffer.wrap(t.getBytes(), 0, t.getLength());
    int cp;
    while (buf.hasRemaining() && (cp = Text.bytesToCodePoint(buf)) != -1) {
      System.out.println(Integer.toHexString(cp));
    }
  }
}
Running the program prints the code points for the four characters in the string:
% hadoop TextIterator
41
df
6771
10400
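To make the buffer-advancing behavior concrete, the following is a minimal sketch in plain Java, with no Hadoop dependency, of what decoding one code point at a time from a UTF-8 byte buffer can look like. The class Utf8CodePoints and the helpers nextCodePoint() and hexCodePoints() are hypothetical names introduced for illustration. This is not Hadoop's implementation of bytesToCodePoint(), and it assumes the input is well-formed UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a stand-in for the iteration pattern, not Hadoop code.
class Utf8CodePoints {

  // Hypothetical helper: reads one UTF-8 encoded code point and advances
  // the buffer position past it. Returns -1 when no bytes remain.
  // Assumes well-formed input; malformed sequences are not detected.
  static int nextCodePoint(ByteBuffer buf) {
    if (!buf.hasRemaining()) {
      return -1;
    }
    int first = buf.get() & 0xFF;
    int extra; // number of continuation bytes that follow the lead byte
    int cp;    // code point bits accumulated so far
    if (first < 0x80) {
      extra = 0;
      cp = first;
    } else if ((first & 0xE0) == 0xC0) {
      extra = 1;
      cp = first & 0x1F;
    } else if ((first & 0xF0) == 0xE0) {
      extra = 2;
      cp = first & 0x0F;
    } else {
      extra = 3;
      cp = first & 0x07;
    }
    for (int i = 0; i < extra; i++) {
      // Each continuation byte contributes its low six bits
      cp = (cp << 6) | (buf.get() & 0x3F);
    }
    return cp;
  }

  // Collects the hex form of every code point in the string's UTF-8 encoding.
  static List<String> hexCodePoints(String s) {
    ByteBuffer buf = ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    List<String> out = new ArrayList<>();
    int cp;
    while ((cp = nextCodePoint(buf)) != -1) {
      out.add(Integer.toHexString(cp));
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(hexCodePoints("\u0041\u00DF\u6771\uD801\uDC00"));
  }
}
```

Run against the same string as Example 5-6, this sketch prints [41, df, 6771, 10400]. The final character occupies two Java char values (a surrogate pair) but four UTF-8 bytes, and it comes back as the single code point 10400.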
Mutability
Another difference from String is that Text is mutable (like all Writable implementations in Hadoop, except NullWritable, which is a singleton). You can reuse a Text instance by calling one of the set() methods on it. For example:
Text t = new Text("hadoop");
t.set("pig");
assertThat(t.getLength(), is(3));
assertThat(t.getBytes().length, is(3));