Solution
Convert the text to or from internal Unicode by specifying a converter when you construct an
InputStreamReader or PrintWriter.
Discussion
Classes InputStreamReader and OutputStreamWriter are the bridge from byte-oriented
Streams to character-based Readers and Writers. These classes read or write bytes and translate
them to or from characters according to a specified character encoding. The UTF-16 encoding
used inside Java (char and String types) stores text in 16-bit units. But most character
sets (such as ASCII, Swedish, Spanish, Greek, Turkish, and many others) use only a small
subset of that range. In fact, many European language character sets fit nicely into 8-bit
characters. Even the larger character sets (script-based and pictographic languages) don't all
use the same bit values for each particular character. The encoding, then, is a mapping between
Java characters and an external storage format for characters drawn from a particular national
or linguistic character set.
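To make the idea of an encoding as a mapping concrete, here is a minimal sketch (not part of the original recipe) that encodes a single character, U+00E5 (a with ring above), in three different encodings. The class name EncodingBytes and the helper encode are hypothetical names chosen for illustration; the three encodings used are ones that every Java platform is required to support.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class: shows that one Java char maps to different
// external bytes depending on the encoding chosen.
final class EncodingBytes {

    /** Returns the bytes produced when text is stored in the named encoding. */
    static byte[] encode(String text, String encodingName) {
        return text.getBytes(Charset.forName(encodingName));
    }

    public static void main(String[] args) {
        String text = "\u00e5"; // one char inside Java
        for (String name : new String[] {"ISO-8859-1", "UTF-8", "UTF-16BE"}) {
            // Print the signed byte values that each encoding produces.
            System.out.println(name + " -> " + Arrays.toString(encode(text, name)));
        }
    }
}
```

The same character occupies one byte in ISO-8859-1 and two bytes in both UTF-8 and UTF-16BE, and the byte values differ between the encodings, which is why the reader and the writer must agree on the encoding name.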
To simplify matters, the InputStreamReader and OutputStreamWriter constructors are the
main places where you specify the name of an encoding to be used in this translation. If
you do not specify an encoding, the platform's (or user's) default encoding is used.
PrintWriters, BufferedReaders, and the like all use whatever encoding the
InputStreamReader or OutputStreamWriter class uses. Because these bridge classes only
accept Stream arguments in their constructors, the implication is that if you want to specify a
nondefault converter to read or write a file on disk, you must start by constructing not a
FileReader or FileWriter, but a FileInputStream or FileOutputStream!
// UseConverters.java
// Read bytes from kanji.txt and decode them as EUC_JP.
BufferedReader fromKanji = new BufferedReader(
    new InputStreamReader(new FileInputStream("kanji.txt"), "EUC_JP"));
// Encode characters as Cp278 and write the bytes to sverige.txt.
PrintWriter toSwedish = new PrintWriter(
    new OutputStreamWriter(new FileOutputStream("sverige.txt"), "Cp278"));
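The fragment above can be expanded into a small self-contained sketch that writes a line through an OutputStreamWriter and reads it back through an InputStreamReader. This is an illustration under stated assumptions, not the recipe's own program: the class ConverterRoundTrip and its roundTrip method are hypothetical names, a temporary file stands in for the files on disk, and ISO-8859-1 and UTF-8 replace EUC_JP and Cp278 only because those two are guaranteed to be available on every Java runtime.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;

// Hypothetical demo class: writes one line in one encoding and
// reads it back in another (or the same) encoding.
final class ConverterRoundTrip {

    static String roundTrip(String line, String writeEncoding, String readEncoding) {
        try {
            File file = File.createTempFile("converters", ".txt");
            file.deleteOnExit();
            // Characters -> bytes, using the encoding named at construction.
            try (PrintWriter out = new PrintWriter(
                    new OutputStreamWriter(new FileOutputStream(file), writeEncoding))) {
                out.print(line);
            }
            // Bytes -> characters, using the encoding named at construction.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new FileInputStream(file), readEncoding))) {
                return in.readLine();
            }
        } catch (IOException e) {
            // Covers unknown encoding names as well as file errors.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "Sm\u00f6rg\u00e5sbord";
        // Matching encodings: the text comes back unchanged.
        System.out.println(text.equals(roundTrip(text, "ISO-8859-1", "ISO-8859-1")));
        // Mismatched encodings: each two-byte UTF-8 sequence is read as two characters.
        System.out.println(roundTrip(text, "UTF-8", "ISO-8859-1").length());
    }
}
```

When the two encoding names match, the text survives the trip intact. When they differ, the bytes are still read without any exception, but the accented characters come back garbled, so a mismatch is easy to miss unless you inspect the result.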
Not that it would necessarily make sense to read a file in a Kanji encoding and write it out
in a Swedish encoding; for one thing, most fonts would not have all the characters of both
character sets, and, at any rate, the Swedish encoding certainly has far fewer characters in it
than the Kanji encoding. Besides, if that were all you wanted, you could use a JDK tool with the
ill-fitting name native2ascii (see its documentation for details). A list of the supported encod-