if (legacySJIS.length == toSJIS.length) {
    // Assume the arrays match until a differing byte is found.
    same = true;
    for (int x = 0; x < legacySJIS.length; x++) {
        if (legacySJIS[x] != toSJIS[x]) {
            same = false;
            break;
        }
    }
}
System.out.printf("Same: %s\n", same);
As expected, the output indicates that the round-trip conversion back to the legacy
encoding was successful. The original byte array and the converted byte array contain
the same bytes:
Same: true
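The byte-by-byte comparison can also be written with java.util.Arrays.equals, which checks both length and content in one call. The following is a minimal sketch, not the recipe's own code: the class name RoundTripCheck, the method roundTrips, and the three-character sample text are assumptions made for illustration, and it assumes the runtime includes the Shift_JIS charset.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class RoundTripCheck {
    // Encodes the text, decodes the bytes back to a String, re-encodes
    // that String, and reports whether both byte arrays are identical.
    static boolean roundTrips(String text, Charset charset) {
        byte[] legacy = text.getBytes(charset);
        String unicode = new String(legacy, charset);
        byte[] back = unicode.getBytes(charset);
        return Arrays.equals(legacy, back);
    }

    public static void main(String[] args) {
        // Assumption: Shift_JIS is available in this Java runtime.
        Charset sjis = Charset.forName("Shift_JIS");
        // Hypothetical sample: three kanji written as Unicode escapes.
        String sample = "\u65E5\u672C\u8A9E";
        System.out.printf("Same: %b%n", roundTrips(sample, sjis));
    }
}
```

Arrays.equals returns false when the lengths differ, so no separate length check is needed.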
How It Works
The Java platform provides conversion support for many legacy character set encodings.
When you create a String instance from a byte array, you must provide a charset
argument to the String constructor so that the platform knows how to perform the
mapping from the legacy encoding to Unicode; without one, the constructor falls back
to the platform's default charset. All Java strings use Unicode as their native encoding.
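To illustrate why the charset argument matters, the sketch below decodes the same single byte twice, once with the charset that produced it and once with a different one. The class name CharsetArgumentDemo and the helper decode are hypothetical names chosen for this example; only standard charsets from java.nio.charset.StandardCharsets are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetArgumentDemo {
    // Decodes the given bytes using the charset named by the caller.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // 0xE9 is the ISO-8859-1 encoding of U+00E9 (e with acute accent).
        byte[] latin1Bytes = { (byte) 0xE9 };

        String right = decode(latin1Bytes, StandardCharsets.ISO_8859_1);
        String wrong = decode(latin1Bytes, StandardCharsets.UTF_8);

        // The matching charset recovers the original character.
        System.out.println(right.equals("\u00E9"));
        // A lone 0xE9 is not valid UTF-8, so the result is a different string.
        System.out.println(wrong.equals("\u00E9"));
    }
}
```

The String constructor does not throw on malformed input; it substitutes a replacement character, so a wrong charset tends to produce silently corrupted text instead of an error.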
The number of bytes in the original array does not usually equal the number of
characters in the result string. In this recipe's example, the original array contains 18
bytes. The 18 bytes are needed by the Shift-JIS encoding to represent the Japanese text.
However, after conversion, the result string contains nine characters. There is not a 1:1
relationship between bytes and characters. In this example, each character requires two
bytes in the original Shift-JIS encoding.
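The byte-to-character ratio can be checked directly by encoding one string with several charsets and comparing the array lengths with String.length(). This is a small sketch under stated assumptions: the class name ByteCountDemo, the helper encodedLength, and the three-character sample are illustrative and are not the recipe's nine-character text; it also assumes the runtime provides Shift_JIS.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteCountDemo {
    // Returns how many bytes the text occupies in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Hypothetical sample: three kanji written as Unicode escapes.
        String sample = "\u65E5\u672C\u8A9E";
        // Assumption: Shift_JIS is available in this Java runtime.
        Charset sjis = Charset.forName("Shift_JIS");

        System.out.printf("chars:     %d%n", sample.length());
        System.out.printf("Shift_JIS: %d bytes%n", encodedLength(sample, sjis));
        System.out.printf("UTF-8:     %d bytes%n",
                encodedLength(sample, StandardCharsets.UTF_8));
        System.out.printf("UTF-16BE:  %d bytes%n",
                encodedLength(sample, StandardCharsets.UTF_16BE));
    }
}
```

For this sample, Shift_JIS and UTF-16BE each use two bytes per character while UTF-8 uses three, so the byte count depends on the encoding as well as on the text.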
There are hundreds of different charset encodings. The exact number depends on your
Java platform implementation. However, you are guaranteed support of several of the
most common encodings, and your platform most likely contains many more than this
minimal set:
• US-ASCII
• ISO-8859-1
• UTF-8
• UTF-16BE