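// Compare the original bytes with the round-tripped bytes.
if (legacySJIS.length == toSJIS.length) {
    same = true;  // assume equal until a mismatch is found
    for (int x = 0; x < legacySJIS.length; x++) {
        if (legacySJIS[x] != toSJIS[x]) {
            same = false;  // mismatch: the arrays differ
            break;
        }
    }
}
System.out.printf("Same: %s\n", same);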
As expected, the output indicates that the round-trip conversion back to the legacy
encoding was successful. The original byte array and the converted byte array contain
the same bytes:
Same: true
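The same round-trip check can be written more compactly with java.util.Arrays.equals instead of the explicit loop. The following is a minimal sketch, not the recipe's own code: the class name RoundTripCheck, the helper roundTrips, and the sample text are assumptions made for illustration. Because Shift_JIS is not one of the guaranteed charsets, the sketch falls back to UTF-8 when it is unavailable.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class RoundTripCheck {
    // Decodes the bytes with the given charset, re-encodes the result,
    // and reports whether the re-encoded bytes match the input.
    static boolean roundTrips(byte[] legacy, Charset charset) {
        String decoded = new String(legacy, charset);
        byte[] reencoded = decoded.getBytes(charset);
        return Arrays.equals(legacy, reencoded);
    }

    public static void main(String[] args) {
        // Shift_JIS is optional on some platforms, so check before using it.
        String name = Charset.isSupported("Shift_JIS") ? "Shift_JIS" : "UTF-8";
        Charset charset = Charset.forName(name);
        // Hypothetical sample text: three Japanese characters.
        byte[] legacy = "\u65E5\u672C\u8A9E".getBytes(charset);
        System.out.printf("Same (%s): %s%n", name, roundTrips(legacy, charset));
    }
}
```

Arrays.equals compares both the lengths and the contents, so no separate length test is needed. Bytes that are not valid in the chosen charset decode to a replacement character, so they do not survive the round trip and the method returns false.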
How It Works
The Java platform provides conversion support for many legacy character set encodings.
When you create a String instance from a byte array, you should provide a charset
argument to the String constructor so that the platform knows how to perform the
mapping from the legacy encoding to Unicode. If you omit the argument, the platform's
default charset is used, which may not match the encoding of your data. All Java
strings use Unicode as their native encoding.
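To illustrate why the charset argument matters, here is a small sketch that decodes the same bytes with two different charsets. The class name DecodeWithCharset, the helper decode, and the sample bytes are assumptions chosen for illustration. The sketch uses only charsets that every Java platform is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeWithCharset {
    // Maps the bytes to Unicode using the charset named by the caller.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The word "café" encoded as ISO-8859-1 (0xE9 is the accented e).
        byte[] latin1 = { (byte) 0x63, (byte) 0x61, (byte) 0x66, (byte) 0xE9 };

        // Correct charset: the accented character is recovered.
        System.out.println(decode(latin1, StandardCharsets.ISO_8859_1));

        // Wrong charset: 0xE9 is not a complete UTF-8 sequence, so the
        // decoder substitutes the replacement character U+FFFD.
        System.out.println(decode(latin1, StandardCharsets.UTF_8));
    }
}
```

Passing the charset explicitly keeps the result independent of the default charset of the machine on which the code runs.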
The number of bytes in the original array does not usually equal the number of
characters in the result string. In this recipe's example, the original array contains 18
bytes. The 18 bytes are needed by the Shift-JIS encoding to represent the Japanese text.
However, after conversion, the result string contains nine characters. There is not a 1:1
relationship between bytes and characters. In this example, each character requires two
bytes in the original Shift-JIS encoding.
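The difference between byte count and character count can be seen with a short sketch. The class name ByteVersusCharCount, the helper byteCount, and the three-character sample string are assumptions for illustration; they are not the recipe's original text. The Shift_JIS line runs only if the platform supports that charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteVersusCharCount {
    // Number of bytes needed to encode the text in the given charset.
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Hypothetical sample: three Japanese characters.
        String text = "\u65E5\u672C\u8A9E";
        System.out.println("chars:     " + text.length());
        System.out.println("UTF-8:     " + byteCount(text, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE:  " + byteCount(text, StandardCharsets.UTF_16BE));
        if (Charset.isSupported("Shift_JIS")) {
            System.out.println("Shift_JIS: "
                    + byteCount(text, Charset.forName("Shift_JIS")));
        }
    }
}
```

For this sample, the string holds three characters, which take nine bytes in UTF-8 and six bytes in UTF-16BE, so the byte count depends on the encoding rather than on the string alone.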
There are hundreds of different charset encodings. The number of encodings available
depends on your Java platform implementation. However, you are guaranteed support for
several of the most common encodings, and your platform most likely contains many more
than this minimal set:
US-ASCII
ISO-8859-1
UTF-8
UTF-16BE
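To see which encodings a particular platform offers, you can query java.nio.charset.Charset at run time. The following is a minimal sketch; the class name ListCharsets and the helper allSupported are assumptions for illustration, and the count printed will vary from one Java implementation to another.

```java
import java.nio.charset.Charset;

class ListCharsets {
    // Returns true only if every named charset is available on this platform.
    static boolean allSupported(String... names) {
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Total number of charsets this Java implementation provides.
        System.out.println("Available charsets: "
                + Charset.availableCharsets().size());

        // Check a few names individually, including the optional Shift_JIS.
        String[] names = { "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "Shift_JIS" };
        for (String name : names) {
            System.out.printf("%-12s supported: %b%n", name, Charset.isSupported(name));
        }
    }
}
```

Checking Charset.isSupported before calling Charset.forName avoids an UnsupportedCharsetException for encodings outside the guaranteed set.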