Strings and Regular Expressions - The Java Programming Language

13.2.8. Character Set Encoding

A character set encoding specifies how to convert between raw 8-bit "characters" and their 16-bit Unicode equivalents. Character sets are named using their standard and common names. The local platform defines which character set encodings are understood, but every implementation is required to support the following:

US-ASCII      7-bit ASCII, also known as ISO646-US, and as the Basic Latin block of the Unicode character set

ISO-8859-1    ISO Latin Alphabet No. 1, also known as ISO-LATIN-1

UTF-8         8-bit Unicode Transformation Format

UTF-16BE      16-bit Unicode Transformation Format, big-endian byte order

UTF-16LE      16-bit Unicode Transformation Format, little-endian byte order

UTF-16        16-bit Unicode Transformation Format, byte order specified by a mandatory initial byte-order mark (either order accepted on input, big-endian used on output)
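
As a minimal sketch of how the required encodings differ, the following program encodes the same two-character string with each of them and prints the resulting bytes. It assumes a Java 7 or later runtime, where the java.nio.charset.StandardCharsets constants are available; the class name RequiredCharsets and the helper methods encode and decode are hypothetical names chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of any library.
final class RequiredCharsets {

    // Encode text into bytes using the given character set encoding.
    // Characters the encoding cannot represent are replaced with the
    // charset's default replacement bytes (String.getBytes behavior).
    static byte[] encode(String text, Charset cs) {
        return text.getBytes(cs);
    }

    // Decode bytes back into a String using the given encoding.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // 'h' followed by e-acute (U+00E9), which is outside 7-bit ASCII.
        String text = "h\u00e9";

        Charset[] required = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE,
            StandardCharsets.UTF_16   // output starts with the byte-order mark FE FF
        };

        for (Charset cs : required) {
            byte[] bytes = encode(text, cs);
            System.out.println(cs.name() + " -> " + Arrays.toString(bytes));
        }
    }
}
```

The same string yields a different byte sequence under each encoding, and only the encodings that can represent U+00E9 round-trip it through decode unchanged.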

Consult the release documentation for your implementation to see if any other character set encodings are supported.
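
Besides reading the documentation, a running program can ask the platform directly. The sketch below uses java.nio.charset.Charset to list every encoding the current implementation reports and to check a single name; the class name ListCharsets and the helper isSupported are hypothetical, and the output depends entirely on the runtime in use.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Map;

// Hypothetical demo class; not part of any library.
final class ListCharsets {

    // Returns true if the running implementation supports the named
    // encoding. A syntactically illegal name is reported as unsupported
    // instead of propagating IllegalCharsetNameException.
    static boolean isSupported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Canonical name -> Charset, sorted by name.
        for (Map.Entry<String, Charset> e : Charset.availableCharsets().entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue().aliases());
        }
        System.out.println("UTF-8 supported: " + isSupported("UTF-8"));
    }
}
```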
