Sending and Receiving Messages - TCP/IP Sockets in Java Practical Guide for Programmers


handles the byte-to-character translation), looking for the delimiter sequence, and returning

the character string preceding it.

Unfortunately, the Reader classes do not support reading binary data. Moreover, the

relationship between the number of bytes read from the underlying InputStream and the

number of characters read from the Reader is unspecified, especially with multibyte encodings.

When a message uses a combination of the two framing methods mentioned above, with some

explicit-length-delimited fields and others using character markers, this can create problems.

The class Framer, defined below, allows an InputStream to be parsed as a sequence of

fields delimited by specific byte patterns. The static method Framer.nextToken() reads bytes

from the given InputStream until it encounters the given sequence of bytes or the stream ends.

All bytes read up to that point are then returned in a new byte array. If the end of the stream is

encountered before any data is read, null is returned. The delimiter can be different for each

call to nextToken(), and the method is completely independent of any encoding.
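The behavior just described can be sketched in a few lines. The class below is a minimal illustration written for this purpose, not the Framer listing itself: the name FramerSketch, the empty-delimiter check, and the sample input in main() are all assumptions. It is deliberately naive, rechecking the end of the buffer after every byte read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the Framer class described in the text.
final class FramerSketch {
    private FramerSketch() {}

    // Reads bytes until the delimiter sequence or end of stream is seen.
    // Returns the bytes preceding the delimiter, or null if the stream
    // had already ended before any data was read.
    static byte[] nextToken(InputStream in, byte[] delimiter) throws IOException {
        if (delimiter.length == 0) {
            // Assumption: an empty delimiter is treated as a caller error.
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        int next = in.read();
        if (next == -1) {
            return null; // end of stream before any data was read
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        do {
            buffer.write(next);
            byte[] soFar = buffer.toByteArray(); // naive: copies the buffer on every byte
            if (endsWith(soFar, delimiter)) {
                // Strip the delimiter from the returned token.
                return Arrays.copyOf(soFar, soFar.length - delimiter.length);
            }
        } while ((next = in.read()) != -1);
        return buffer.toByteArray(); // stream ended without a delimiter
    }

    private static boolean endsWith(byte[] data, byte[] suffix) {
        if (data.length < suffix.length) {
            return false;
        }
        int offset = data.length - suffix.length;
        for (int i = 0; i < suffix.length; i++) {
            if (data[offset + i] != suffix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // For text formats, the caller converts the delimiter to bytes
        // and each returned token back to a String, using one encoding.
        byte[] delimiter = "\r\n".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(
                "Big fox\r\nlazy dog".getBytes(StandardCharsets.US_ASCII));
        byte[] token;
        while ((token = nextToken(in, delimiter)) != null) {
            System.out.println(new String(token, StandardCharsets.US_ASCII));
        }
    }
}
```

Because the method works only on bytes, the same call can be used with a different delimiter for each field of a message.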

A couple of words of caution are in order here. First, nextToken() is terribly inefficient;

for real applications, a more efficient pattern-matching algorithm should be used. Second,

when using Framer.nextToken() with text-based message formats, the caller must convert the

delimiter from a character string to a byte array and the returned byte array to a character

string. In this case the character encoding needs to distribute over concatenation, so that it

doesn't matter whether a string is converted to bytes all at once, or a little bit at a time.

To make this precise, let E( ) represent an encoding, that is, a function that maps

character sequences to byte sequences. Let a and b be sequences of characters, so E(a)

denotes the sequence of bytes that is the result of encoding a. Let “+” denote concatenation

of sequences, so a + b is the sequence consisting of a followed by b. This explicit-conversion

approach (as opposed to parsing the message as a character stream) should only be used with

encodings that have the property that E(a + b) = E(a) + E(b); otherwise, the results may be

unexpected. Although most encodings supported in Java have this property, some do not.

In particular, UnicodeBig and UnicodeLittle encode a String by first outputting a byte-order

indicator (the 2-byte sequence 254-255 for big-endian, and 255-254 for little-endian), followed

by the 16-bit Unicode value of each character in the String, in the indicated byte order. Thus,

the encoding of “Big fox” using UnicodeBig is as follows:

254 255   0 66   0 105   0 103   0 32   0 102   0 111   0 120
[mark]    'B'    'i'     'g'     ' '    'f'     'o'     'x'

while the encoding of “Big” concatenated with the encoding of “ fox” (note the leading space),

using the same encoding, is as follows:

254 255   0 66   0 105   0 103   254 255   0 32   0 102   0 111   0 120
[mark]    'B'    'i'     'g'     [mark]    ' '    'f'     'o'     'x'
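The difference between the two byte sequences can be checked directly. The following is a minimal sketch, assuming that Java's standard UTF-16 charset (which writes a big-endian byte-order mark when encoding) is an acceptable stand-in for UnicodeBig. The class and method names are hypothetical, and a check on one pair of strings illustrates the property without proving it for an encoding in general.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: tests E(a + b) = E(a) + E(b) for one pair of strings.
final class EncodingConcatCheck {
    private EncodingConcatCheck() {}

    static boolean distributes(Charset charset, String a, String b) {
        byte[] whole = (a + b).getBytes(charset);   // E(a + b)
        byte[] first = a.getBytes(charset);         // E(a)
        byte[] second = b.getBytes(charset);        // E(b)
        // Build E(a) + E(b) by appending the second array to the first.
        byte[] pieces = Arrays.copyOf(first, first.length + second.length);
        System.arraycopy(second, 0, pieces, first.length, second.length);
        return Arrays.equals(whole, pieces);
    }

    // Converts Java's signed bytes to the unsigned values 0..255 used in the text.
    static int[] unsigned(byte[] bytes) {
        int[] values = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            values[i] = bytes[i] & 0xFF;
        }
        return values;
    }

    public static void main(String[] args) {
        // UTF-8 and UTF-16BE write no byte-order mark; UTF-16 writes one per call.
        System.out.println(distributes(StandardCharsets.UTF_8, "Big", " fox"));
        System.out.println(distributes(StandardCharsets.UTF_16BE, "Big", " fox"));
        System.out.println(distributes(StandardCharsets.UTF_16, "Big", " fox"));
        System.out.println(Arrays.toString(
                unsigned("Big fox".getBytes(StandardCharsets.UTF_16))));
    }
}
```

An encoding without a byte-order mark, such as UTF-16BE, passes this check for the same pair of strings, which is why the explicit-conversion approach is safer with such encodings.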

Using either of these encodings to convert the delimiter results in a byte sequence that

begins with the byte-order marker. Moreover, if the byte array returned by nextToken() does not

begin with one of the markers, any attempt to convert it to a String using one of these encodings
