Solution
Convert the text to or from internal Unicode by specifying a converter when you construct an
InputStreamReader or PrintWriter.
Discussion
Classes InputStreamReader and OutputStreamWriter are the bridge from byte-oriented Streams to
character-based Readers and Writers. These classes read or write bytes and translate them to
or from characters according to a specified character encoding. The UTF-16 encoding used
inside Java (char and String types) is built on 16-bit code units. But most character sets
(such as ASCII, Swedish, Spanish, Greek, Turkish, and many others) use only a small subset of
that range. In fact, many European language character sets fit nicely into 8-bit characters.
Even the larger character sets (script-based and pictographic languages) don't all use the
same bit values for each particular character. The encoding, then, is a mapping between Java
characters and an external storage format for characters drawn from a particular national or
linguistic character set.
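To make the idea of an encoding as a mapping concrete, here is a minimal sketch that encodes one string three ways and counts the resulting bytes. The class name EncodingSizes and the sample word are illustrative choices, not part of the original recipe; the charsets are the standard ones that every Java platform is required to support.

```java
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    /** Returns the byte counts for text in ISO-8859-1, UTF-8, and UTF-16BE, in that order. */
    static int[] encodedLengths(String text) {
        return new int[] {
            text.getBytes(StandardCharsets.ISO_8859_1).length,
            text.getBytes(StandardCharsets.UTF_8).length,
            text.getBytes(StandardCharsets.UTF_16BE).length
        };
    }

    public static void main(String[] args) {
        // 11 characters, two of them (o-umlaut, a-ring) outside ASCII
        String text = "Sm\u00f6rg\u00e5sbord";
        int[] n = encodedLengths(text);
        System.out.println("ISO-8859-1: " + n[0] + " bytes"); // 11: one byte per character
        System.out.println("UTF-8:      " + n[1] + " bytes"); // 13: the two accented letters take two bytes each
        System.out.println("UTF-16BE:   " + n[2] + " bytes"); // 22: two bytes per character, no byte-order mark
    }
}
```

The same eleven characters occupy 11, 13, or 22 bytes depending on the mapping, which is why a reader must know the encoding that the writer used.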
To simplify matters, the InputStreamReader and OutputStreamWriter constructors are the main
places in the stream-based java.io classes where you specify the name of an encoding to be
used in this translation. If you do not specify an encoding, the platform's (or user's)
default encoding is used. PrintWriters, BufferedReaders, and the like, when wrapped around
one of these bridge classes, use whatever encoding the InputStreamReader or
OutputStreamWriter uses. Because these bridge classes only accept Stream arguments in their
constructors, the implication is that if you want to specify a nondefault converter to read
or write a file on disk, you start by constructing not a FileReader or FileWriter, but a
FileInputStream or FileOutputStream! (Since Java 11, FileReader and FileWriter also have
constructors that accept a Charset.)
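// UseConverters.java
import java.io.*;

// Read a Japanese text file, decoding its bytes as EUC-JP
BufferedReader fromKanji = new BufferedReader(
    new InputStreamReader(new FileInputStream("kanji.txt"), "EUC_JP"));
// Write a Swedish text file, encoding its characters as Cp278
PrintWriter toSwedish = new PrintWriter(
    new OutputStreamWriter(new FileOutputStream("sverige.txt"), "Cp278"));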
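The fragment above can be expanded into a small, self-contained round trip. The following is only a sketch: the class name ConverterRoundTrip, its two helper methods, and the temporary file are hypothetical names introduced for illustration, and ISO-8859-1 stands in for the encodings above because it is guaranteed to be present on every Java platform.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ConverterRoundTrip {
    /** Writes text to the file, encoding characters to bytes with the given charset. */
    static void write(File file, String text, Charset charset) throws IOException {
        try (PrintWriter out = new PrintWriter(
                new OutputStreamWriter(new FileOutputStream(file), charset))) {
            out.print(text);
        }
    }

    /** Reads the first line of the file, decoding bytes to characters with the given charset. */
    static String readFirstLine(File file, Charset charset) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("sverige", ".txt");
        file.deleteOnExit();
        String text = "Sverige \u00e4r vackert"; // 18 characters, one of them non-ASCII

        write(file, text, StandardCharsets.ISO_8859_1);
        System.out.println("bytes on disk: " + file.length());

        // Decoding with the charset used for writing recovers the original string.
        String same = readFirstLine(file, StandardCharsets.ISO_8859_1);
        System.out.println("same charset round-trips: " + text.equals(same));

        // Decoding with a different charset does not: the byte for a-umlaut is not valid UTF-8 here.
        String wrong = readFirstLine(file, StandardCharsets.UTF_8);
        System.out.println("wrong charset round-trips: " + text.equals(wrong));
    }
}
```

Passing a Charset object rather than a name string avoids the checked UnsupportedEncodingException that the String-based constructors declare; either form selects the converter in the same place, at construction of the bridge class.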
Not that it would necessarily make sense to read a single file from Kanji and output it in a
Swedish encoding; for one thing, most fonts would not have all the characters of both
character sets, and, at any rate, the Swedish encoding certainly has far fewer characters in
it than the Kanji encoding. Besides, if that were all you wanted, you could use a JDK tool
with the ill-fitting name native2ascii
(see its documentation for details). A list of the supported encod-