Character Encodings - Beginning Java 8 Fundamentals

UTF-8 (UCS Transformation Format 8)

UTF-8 is a variable-length encoding that, as originally defined, uses 1 to 6 octets to represent a character from UCS. All ASCII characters are encoded using one octet; all other characters are encoded using two or more octets, as shown in Table A-2. The current definition of UTF-8 (RFC 3629) restricts the encoding to the Unicode range U+0000 to U+10FFFF, so only the 1- to 4-octet forms are in use today.

Table A-2. List of Legal UTF-8 Sequences

Number of Octets   Bit Patterns Used (Octet 1, Octet 2, ...)                UCS Code
1                  0xxxxxxx                                                 00000000-0000007F
2                  110xxxxx 10xxxxxx                                        00000080-000007FF
3                  1110xxxx 10xxxxxx 10xxxxxx                               00000800-0000FFFF
4                  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx                      00010000-001FFFFF
5                  111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx             00200000-03FFFFFF
6                  1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx    04000000-7FFFFFFF
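The rows of Table A-2 translate fairly directly into shift-and-mask code. The following is a minimal sketch, not the JDK's implementation: the class name Utf8TableEncoder and its encode method are hypothetical and introduced only for illustration. It assumes the input is a Java code point, so it covers only the 1- to 4-octet rows (Java code points stop at U+10FFFF) and rejects surrogate values, which have no UTF-8 form of their own.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class; the name and method are illustrative only.
class Utf8TableEncoder {

    // Encodes a single code point by filling the "x" bits of the matching
    // row in Table A-2. Only the 1- to 4-octet rows are implemented.
    static byte[] encode(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("Not a Java code point: " + codePoint);
        }
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF) {
            throw new IllegalArgumentException("Surrogate values are not encoded on their own");
        }
        if (codePoint <= 0x7F) {
            // Row 1: 0xxxxxxx
            return new byte[] { (byte) codePoint };
        }
        if (codePoint <= 0x7FF) {
            // Row 2: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (codePoint >> 6)),
                (byte) (0x80 | (codePoint & 0x3F))
            };
        }
        if (codePoint <= 0xFFFF) {
            // Row 3: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (codePoint >> 12)),
                (byte) (0x80 | ((codePoint >> 6) & 0x3F)),
                (byte) (0x80 | (codePoint & 0x3F))
            };
        }
        // Row 4: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return new byte[] {
            (byte) (0xF0 | (codePoint >> 18)),
            (byte) (0x80 | ((codePoint >> 12) & 0x3F)),
            (byte) (0x80 | ((codePoint >> 6) & 0x3F)),
            (byte) (0x80 | (codePoint & 0x3F))
        };
    }

    public static void main(String[] args) {
        // One sample code point per row of the table.
        int[] samples = { 0x41, 0xE9, 0x20AC, 0x1F600 };
        for (int cp : samples) {
            byte[] manual = encode(cp);
            // Cross-check against the JDK's own UTF-8 encoder.
            byte[] jdk = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            System.out.printf("U+%04X -> %d octet(s), matches JDK: %b%n",
                    cp, manual.length, Arrays.equals(manual, jdk));
        }
    }
}
```

In ordinary code there is no reason to encode by hand; String.getBytes(StandardCharsets.UTF_8) does the same job. The sketch is only meant to make the bit patterns in the table concrete.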

The “x” in the table indicates either a 0 or a 1. In UTF-8, an octet that starts with a 0 bit represents an ASCII character. An octet that starts with the bits 110 is the first octet of the 2-octet representation of a character, an octet that starts with the bits 1110 is the first octet of a 3-octet representation, and so on. Every octet other than the first one in a multi-octet representation starts with the bits 10. Because the role of each octet is evident from its leading bits, security checks are easy to implement for UTF-8 encoded data: octet sequences that do not conform to the patterns shown in the table are considered invalid.
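The leading-bit rules described above can be checked with a few bit masks. The sketch below is an illustration under stated assumptions rather than a complete validator: the class Utf8Structure and its methods are hypothetical names, and the check is purely structural. It follows the six rows of the table as printed and does not reject overlong forms, surrogates, or values above U+10FFFF, all of which a production decoder would also have to refuse.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
class Utf8Structure {

    // Returns the sequence length announced by a first octet (1 to 6, per
    // the table), or -1 if the octet is a continuation octet (10xxxxxx)
    // or matches no row at all (0xFE and 0xFF).
    static int sequenceLength(byte octet) {
        int b = octet & 0xFF;              // treat the byte as unsigned
        if ((b & 0x80) == 0x00) return 1;  // 0xxxxxxx
        if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
        if ((b & 0xFC) == 0xF8) return 5;  // 111110xx
        if ((b & 0xFE) == 0xFC) return 6;  // 1111110x
        return -1;
    }

    // True if the octet has the 10xxxxxx pattern used by every octet
    // after the first one in a multi-octet sequence.
    static boolean isContinuation(byte octet) {
        return (octet & 0xC0) == 0x80;
    }

    // Structural check only: every sequence must start with a valid first
    // octet and be followed by the announced number of continuation octets.
    static boolean hasLegalStructure(byte[] data) {
        int i = 0;
        while (i < data.length) {
            int length = sequenceLength(data[i]);
            if (length < 0 || i + length > data.length) {
                return false;              // stray continuation or truncated sequence
            }
            for (int k = 1; k < length; k++) {
                if (!isContinuation(data[i + k])) {
                    return false;          // expected 10xxxxxx here
                }
            }
            i += length;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] good = "h\u00E9llo \u20AC".getBytes(StandardCharsets.UTF_8);
        byte[] truncated = { (byte) 0xE2, (byte) 0x82 };   // 3-octet sequence cut short
        System.out.println("good: " + hasLegalStructure(good));
        System.out.println("truncated: " + hasLegalStructure(truncated));
    }
}
```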

Java and Character Encodings

Java stores and manipulates all characters and strings as Unicode characters. In serialization and in class files (byte code), Java uses a modified form of the UTF-8 encoding of the Unicode character set. All implementations of the Java virtual machine are required to support the character encodings shown in Table A-3.
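As a rough illustration of working with the required encodings from code, the sketch below uses the constants in java.nio.charset.StandardCharsets, which name the charsets that the Charset documentation says every Java platform implementation must support. That these six match the entries of Table A-3 is an assumption, since the table is not reproduced here. The class RequiredCharsets and its encodedLength method are hypothetical names used only for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
class RequiredCharsets {

    // The six charsets exposed as constants by StandardCharsets.
    static final Charset[] STANDARD = {
        StandardCharsets.US_ASCII,
        StandardCharsets.ISO_8859_1,
        StandardCharsets.UTF_8,
        StandardCharsets.UTF_16BE,
        StandardCharsets.UTF_16LE,
        StandardCharsets.UTF_16
    };

    // Number of octets the given text occupies in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "A\u20AC";   // "A" followed by the euro sign
        for (Charset cs : STANDARD) {
            System.out.printf("%-10s supported: %b, octets for sample text: %d%n",
                    cs.name(), Charset.isSupported(cs.name()), encodedLength(text, cs));
        }
    }
}
```

Passing a Charset constant instead of a charset name string avoids the checked UnsupportedEncodingException that the String-based overloads of getBytes declare.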
