Saving files in the old ASCII standard ensures compatibility, but you can
only store a basic range of 128 characters (fine for English).
Otherwise, try to work with UTF-8. It still represents every ASCII character with
a single byte, while all other characters take up two to four bytes.
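To make the byte counts concrete, the following minimal sketch encodes a few strings as UTF-8 and reports how many bytes each one needs. The class name Utf8Lengths and the helper utf8Length are made up for this illustration; only String.getBytes and StandardCharsets come from the standard library.

```java
import java.nio.charset.StandardCharsets;

class Utf8Lengths {
    // Hypothetical helper: number of bytes needed to store the text as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // ASCII letter: 1
        System.out.println(utf8Length("\u00E9"));       // e with acute accent: 2
        System.out.println(utf8Length("\u20AC"));       // euro sign: 3
        System.out.println(utf8Length("\uD83D\uDE00")); // emoji U+1F600: 4
    }
}
```

Passing the charset explicitly, rather than relying on the platform default, keeps the result the same on every machine.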
Binary files are much less structured, in the sense that they contain a sequence of bits that is
structured and organized entirely according to the whims of the program or programmer who created
the file. This does not mean, however, that there cannot be some form of standardization behind
such files. Consider, for example, image formats such as JPG or PNG, which can be opened by many
image-viewing programs. Figure 8-2 shows a JPG file of a fractal: on the left side as opened by an
image viewer capable of interpreting this format, and on the right side as raw bytes.
Figure 8-2
As you can see in Figure 8-2, the contents of this binary file cannot be represented as a series of
characters in a meaningful way, although fragments of text can exist here and there,
representing metadata (where the picture was taken, for instance) or strings. Many
binary files also start with a specific sequence of bytes (called a “magic number”) that denotes
what type of file it is. JPG files, for instance, begin with the bytes FF D8.
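As an illustration of how a program might use such a magic number, the sketch below checks whether a file starts with the bytes FF D8. The class and method names are hypothetical, and InputStream.readNBytes assumes Java 11 or later. A matching prefix only suggests a JPG file; it does not prove that the rest of the file is a valid image.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class MagicNumberCheck {
    // Returns true if the given bytes begin with FF D8.
    // Java bytes are signed, so each one is masked with 0xFF before comparing.
    static boolean startsWithJpegMagic(byte[] header) {
        return header.length >= 2
                && (header[0] & 0xFF) == 0xFF
                && (header[1] & 0xFF) == 0xD8;
    }

    // Reads at most the first two bytes of the file and checks them.
    static boolean looksLikeJpeg(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = in.readNBytes(2); // assumes Java 11+
            return startsWithJpegMagic(header);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MagicNumberCheck some-file
        System.out.println(looksLikeJpeg(Path.of(args[0])));
    }
}
```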
Special Characters
There's more to text files than character encodings alone. When dealing with text,
not every character necessarily represents a letter or a number. Consider,
for instance, a special “character” representing a space or a tab, or a character
representing a line break (the end of a line). Especially regarding the latter, some
differences exist between operating systems. On Windows, a line break is represented
by two characters (0D and 0A in hexadecimal, a carriage return followed by a line
feed). On most Unix-derived operating systems (such as Linux), a line break is
represented by a single character (0A). The latter also holds for Mac systems, except
for older Macs (before Mac OS X), where 0D is used instead.
Some older operating systems also use fixed line lengths or other characters, but
suffice it to say that this is something not all systems agree on. Luckily, Java helps
correctly detect the end of a line in a text file, as you will see later.
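As a small preview, the sketch below feeds a string containing all three line-break conventions to a BufferedReader. Its readLine method treats 0A, 0D, and 0D followed by 0A as the end of a line, and returns each line without the terminator. The class name LineEndings and the helper readLines are made up for this illustration, and the text is read from a string instead of a file to keep the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineEndings {
    // Hypothetical helper: splits the text into lines using BufferedReader.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            // readLine returns null once the end of the input is reached.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Not expected when reading from a String; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Windows (\r\n), Unix (\n), and old Mac (\r) line breaks mixed together.
        String mixed = "one\r\ntwo\nthree\rfour";
        System.out.println(readLines(mixed)); // prints [one, two, three, four]
    }
}
```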
 