You need to convert characters in a byte array from a legacy character set encoding to a
Unicode string.
Solution
Convert legacy character encodings from a byte array to a Unicode string using the
String constructor that accepts a Charset. The following code snippet from the
org.java8recipes.chapter12.recipe12_6.Recipe12_6 class demonstrates
how to convert a legacy Shift-JIS encoded byte array to a string. Later in the same
example, the code demonstrates how to convert from Unicode back to a Shift-JIS
byte array.
import java.nio.charset.Charset;

byte[] legacySJIS = {
    (byte)0x82,(byte)0xB1,(byte)0x82,(byte)0xF1,
    (byte)0x82,(byte)0xC9,(byte)0x82,(byte)0xBF,
    (byte)0x82,(byte)0xCD,(byte)0x81,(byte)0x41,
    (byte)0x90,(byte)0xA2,(byte)0x8A,(byte)0x45,
    (byte)0x81,(byte)0x49};
// Convert a byte[] to a String
Charset cs = Charset.forName("SJIS");
String greeting = new String(legacySJIS, cs);
System.out.printf("Greeting: %s\n", greeting);
This code prints the converted text, which is “Hello, world!” in Japanese:
Greeting: こんにちは、世界！
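To make the double-byte structure of Shift-JIS visible, the following sketch decodes the same 18 bytes and lists each resulting character with its Unicode code point. The class name SjisDecodeDemo and its decode helper are hypothetical, introduced only for illustration; they are not part of the Recipe12_6 source.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; not part of the recipe's Recipe12_6 source
public class SjisDecodeDemo {
    // The same Shift-JIS bytes used in the recipe
    static final byte[] LEGACY_SJIS = {
        (byte)0x82,(byte)0xB1,(byte)0x82,(byte)0xF1,
        (byte)0x82,(byte)0xC9,(byte)0x82,(byte)0xBF,
        (byte)0x82,(byte)0xCD,(byte)0x81,(byte)0x41,
        (byte)0x90,(byte)0xA2,(byte)0x8A,(byte)0x45,
        (byte)0x81,(byte)0x49};

    // Decode Shift-JIS bytes into a (UTF-16) Java String
    static String decode(byte[] sjisBytes) {
        return new String(sjisBytes, Charset.forName("SJIS"));
    }

    public static void main(String[] args) {
        String greeting = decode(LEGACY_SJIS);
        // Every character here is encoded as a two-byte pair,
        // so 18 input bytes become 9 chars
        System.out.printf("bytes=%d chars=%d%n",
                LEGACY_SJIS.length, greeting.length());
        // List each decoded character with its Unicode code point
        greeting.codePoints().forEach(cp ->
                System.out.printf("U+%04X %s%n", cp,
                        new String(Character.toChars(cp))));
    }
}
```

Running it should report 18 bytes and 9 characters, beginning with U+3053 (こ) for the pair 0x82 0xB1. Whether the Japanese glyphs display correctly depends on the console's encoding and font.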
Use the getBytes(Charset) method to convert the characters of a string to a byte array.
Building on the previous code, convert back to the original encoding with the following
code and compare the results:
// Convert a String to a byte[]
byte[] toSJIS = greeting.getBytes(cs);
// Confirm that the original array and newly converted
// array are the same
boolean same = false;
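A self-contained version of the whole round trip might look like the following sketch. It assumes java.util.Arrays.equals for the comparison (it returns true only when the lengths and every element match), which is one option and not necessarily the recipe's own approach; the class and method names are hypothetical. It also checks something worth knowing: String.getBytes(Charset) silently replaces any character that has no mapping in the target charset with that charset's replacement bytes, so a byte-for-byte match is only guaranteed when every character is encodable. CharsetEncoder.canEncode can test that up front.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical class illustrating the byte[] -> String -> byte[] round trip
public class SjisRoundTrip {
    static final Charset SJIS = Charset.forName("SJIS");

    // Decode the bytes, re-encode the result, and compare with the original
    static boolean roundTripsExactly(byte[] original) {
        String text = new String(original, SJIS);
        byte[] reencoded = text.getBytes(SJIS);
        // Arrays.equals is true only if lengths and all elements match
        return Arrays.equals(original, reencoded);
    }

    // getBytes(Charset) substitutes replacement bytes for unmappable
    // characters, so check encodability first when data loss matters
    static boolean encodableAsSjis(String text) {
        return SJIS.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        byte[] legacySJIS = {
            (byte)0x82,(byte)0xB1,(byte)0x82,(byte)0xF1,
            (byte)0x82,(byte)0xC9,(byte)0x82,(byte)0xBF,
            (byte)0x82,(byte)0xCD,(byte)0x81,(byte)0x41,
            (byte)0x90,(byte)0xA2,(byte)0x8A,(byte)0x45,
            (byte)0x81,(byte)0x49};
        boolean same = roundTripsExactly(legacySJIS);
        System.out.printf("Original and re-encoded arrays are the same: %s%n", same);

        // Japanese hiragana is representable in Shift-JIS; Korean Hangul is not
        System.out.println(encodableAsSjis("\u3053\u3093\u306B\u3061\u306F")); // true
        System.out.println(encodableAsSjis("\uD55C"));                          // false
    }
}
```

With the recipe's bytes, the comparison should report true, because every character in the greeting maps back to exactly the same two-byte Shift-JIS sequence.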